Checksums – What They Are and Why They Matter

Checksums are an important concept in cyber security. Grasping the idea behind them, I felt, would be a solid step towards a good basic understanding of today's IT landscape.

So what is a checksum and how does it work?

What is a Checksum?

A checksum is a small block of data, derived from another block of digital data, that is used to detect errors or alterations that may have been introduced during that data's storage or transmission.

Definition by Wikipedia

How is a Checksum Generated?

The process is rather simple; the underlying algorithms are not. To begin with, however, a rough understanding will suffice.

The algorithm takes data as input, often in the form of a file.

------------        ---------------       --------------
|          |        |    Hash     |       |            |
|   File   | ---->  |  algorithm  | ----> |  Checksum  |
|          |        |             |       |            |
------------        ---------------       --------------

The algorithm processes the data and turns it into a checksum. Depending on which algorithm is used, the same file will produce completely different checksums.

# Hash algorithms: MD2, MD4, MD5, SHA, SHA224, SHA256, SHA512, RIPEMD160...

# The same file used as input for different hashing algorithms
# The results are completely different hashes with varying lengths 

MD5       = 9f53caffee2e9bf83778f9674c37282e 
RIPEMD160 = dea18ea779e2bc35ada84acfdde109ab266c1315 
SHA       = c3170b1fe2045687641ca30b9de43c0f27af8bb3 
SHA224    = 791e4d9a1024b454e5152eaed594fdd72f03374eb6464267424b708c 
SHA256    = fee8f65099e5c392e1019e64894a138cfccfcd5a8b966f0aaf93eb2433ff8119
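
The same kind of comparison can be reproduced with Python's standard hashlib module. This is a sketch, not the tool that produced the list above. It assumes the input is the bytes that `echo "hallo"` writes to a file, and it leaves out RIPEMD160 and SHA because their availability in hashlib depends on the OpenSSL build Python is linked against.

```python
import hashlib

# Assumption: the "file" is just these bytes, i.e. what `echo "hallo"` would write.
data = b"hallo\n"

# Run the same input through several hash algorithms from the standard library.
digests = {name: hashlib.new(name, data).hexdigest()
           for name in ("md5", "sha1", "sha224", "sha256", "sha512")}

for name, digest in digests.items():
    # Each algorithm yields a different value with a different length.
    print(f"{name.upper():<7}({len(digest):>3} hex chars) = {digest}")
```

The lengths printed (32, 40, 56, 64 and 128 hex characters) show the same pattern as the list: each algorithm has a fixed output size, regardless of how large the input is. Note that hashlib's sha1 is SHA-1, which is not necessarily the same function as the plain "SHA" entry above.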

2 Useful Checksum Characteristics

Checksums have two very useful characteristics, which is why they are used everywhere in today’s computer world.

  1. They do not change if the underlying data (or file) is not altered
  2. They change dramatically if the underlying data (or file) is altered
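
Both properties can be made concrete with a small Python sketch. It uses SHA-256 from hashlib as an example; the choice is for illustration, and the second property is specifically what cryptographic hash functions like the ones listed above are designed for. The sketch hashes the same input twice, then flips a single bit of the input and counts how many of the 256 digest bits change. For a good cryptographic hash, roughly half of the output bits are expected to flip, although the exact number depends on the input. The helper names are hypothetical.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()

def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count how many bits differ between two equal-length hex digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

original = b"hello\n"
# Flip the lowest bit of the first byte: 'h' (0x68) becomes 'i' (0x69).
flipped = bytes([original[0] ^ 0b00000001]) + original[1:]

# 1. Unchanged data always gives the same checksum.
same = sha256_hex(original) == sha256_hex(original)

# 2. A one-bit change gives a very different checksum (the "avalanche effect").
bits = differing_bits(sha256_hex(original), sha256_hex(flipped))

print("identical input, identical digest:", same)
print(f"one flipped input bit changed {bits} of 256 digest bits")
```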

Because of these two features, checksums are particularly useful for establishing whether files have been altered. A common use is downloading installer files. Since installing a tampered file poses a direct threat to computer security, the creators of a program often publish a checksum alongside the download, so that users can verify that the file is the one the creators say it is.
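
The verification step can be sketched in Python. This is an illustration under assumptions, not any vendor's official tool: it assumes the vendor publishes a SHA-256 checksum as a hex string, and the file name installer.bin and the helpers file_sha256 and matches_published are hypothetical. The file is read in chunks so that large installers do not have to fit in memory.

```python
import hashlib
import tempfile
from pathlib import Path

def file_sha256(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in fixed-size chunks so large installers need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_published(path: Path, published_hex: str) -> bool:
    """Compare a local file's digest with the checksum published by the vendor."""
    # Published checksums are sometimes upper-case or padded; normalise before comparing.
    return file_sha256(path) == published_hex.strip().lower()

# Demo with a hypothetical download stored in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    installer = Path(tmp) / "installer.bin"
    installer.write_bytes(b"hello\n")
    # Stands in for the value a vendor would publish next to the download.
    published = hashlib.sha256(b"hello\n").hexdigest()
    ok_before = matches_published(installer, published)

    installer.write_bytes(b"hallo\n")  # simulate tampering after publication
    ok_after = matches_published(installer, published)

print(ok_before, ok_after)  # True False
```

Comparing against a published value only helps if that value itself comes from a trustworthy source, for example the vendor's own HTTPS site rather than the same mirror that served the file.
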

Generating a Checksum for a File

Here, we will quickly calculate a checksum for a file on Linux or macOS to demonstrate the process. We will create a file called myfile.txt and run the MD5 hash algorithm over it.

Input                                           Output
------------------        ---------------       --------------------------------------
|                |        |    MD5      |       |                                    |
|   myfile.txt   | ---->  |  algorithm  | ----> |  b1946ac92492d2347c6235b4d2611184  |
|                |        |             |       |                                    |
------------------        ---------------       --------------------------------------

First, we create the file.

$ echo "hello" > myfile.txt
$ cat myfile.txt
hello

This creates the file myfile.txt and writes 'hello' to it. We then print the file's contents to verify that everything worked as planned. Next, we use openssl to run the MD5 algorithm on the file.

$ openssl md5 myfile.txt
MD5 (myfile.txt) = b1946ac92492d2347c6235b4d2611184

The algorithm returns a hash, which we can use as a reference point to check later whether the file has been altered. To show that the checksum stays the same as long as the file hasn't changed, we call the MD5 function once more and get exactly the same hash.

$ openssl md5 myfile.txt
MD5 (myfile.txt) = b1946ac92492d2347c6235b4d2611184

Next, we alter the file by changing a single character, turning 'hello' into 'hallo', and re-run the algorithm.

$ echo "hallo" > myfile.txt
$ openssl md5 myfile.txt
MD5 (myfile.txt) = 9f53caffee2e9bf83778f9674c37282e

The output is completely different from the previous one. Comparing the two hashes tells us, without even opening the file, that it has been tampered with.

MD5 (original myfile.txt) = b1946ac92492d2347c6235b4d2611184
MD5 (tampered myfile.txt) = 9f53caffee2e9bf83778f9674c37282e
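
For readers without openssl at hand, the whole walkthrough can be approximated with Python's hashlib. This is a sketch, assuming a temporary directory stands in for the working directory, and md5_line is a hypothetical helper that mimics the openssl output format. Writing the bytes explicitly also makes visible something the shell hides: echo appends a newline, so the file holds "hello\n". That is why its MD5 is b1946ac92492d2347c6235b4d2611184 rather than the MD5 of the five bytes "hello" (5d41402abc4b2a76b9719d911017c592).

```python
import hashlib
import tempfile
from pathlib import Path

def md5_line(path: Path) -> str:
    """Format a file's MD5 digest the way `openssl md5` prints it."""
    digest = hashlib.md5(path.read_bytes()).hexdigest()
    return f"MD5 ({path.name}) = {digest}"

with tempfile.TemporaryDirectory() as tmp:
    myfile = Path(tmp) / "myfile.txt"

    # Equivalent of `echo "hello" > myfile.txt`; bytes avoid newline translation.
    myfile.write_bytes(b"hello\n")
    reference = md5_line(myfile)
    unchanged = md5_line(myfile)  # hashing again without any edits

    # Equivalent of `echo "hallo" > myfile.txt`: one character changed.
    myfile.write_bytes(b"hallo\n")
    tampered = md5_line(myfile)

print(reference)  # MD5 (myfile.txt) = b1946ac92492d2347c6235b4d2611184
print("unchanged file, same hash:", reference == unchanged)
print("tampered file detected:", reference != tampered)
```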

This is the power of checksums.

In upcoming posts, we will see how checksums are used to provide authenticity in combination with digital signatures.