Cryptographic hash functions
Cryptographic hash functions serve a variety of important purposes,
including:
● Ensuring the integrity of messages or files
● Verifying passwords without storing them in clear
text
● Creating “proof of work” for use in cryptocurrencies like
Bitcoin and Ethereum
● Generating and verifying digital signatures
A typical cryptographic hash function takes a message of arbitrary
size as input and produces a fixed-length hash. For example,
MD5 produces 128-bit hashes, while SHA-256 produces 256-bit
hashes. When represented in hexadecimal, MD5 hashes are 32
characters long, and SHA-256 hashes are 64 characters long (since
two hexadecimal characters represent one byte, and one byte equals 8 bits).
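The hexadecimal lengths follow directly from the digest size, since each hexadecimal character encodes 4 bits. Here is a minimal sketch in C of that arithmetic and of printing raw digest bytes as hex. The names `hex_length` and `to_hex` are hypothetical helpers introduced for illustration, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Number of hex characters needed to print a digest of `bits` bits.
 * Each hex character encodes 4 bits, so two characters encode one byte. */
size_t hex_length(size_t bits)
{
    assert(bits % 8 == 0); /* digests are made of whole bytes */
    return bits / 4;
}

/* Write `len` digest bytes as lowercase hex into `out`.
 * `out` must have room for at least 2 * len + 1 characters. */
void to_hex(const uint8_t *digest, size_t len, char *out)
{
    for (size_t i = 0; i < len; i++)
        snprintf(out + 2 * i, 3, "%02x", digest[i]);
    out[2 * len] = '\0';
}
```

With these helpers, `hex_length(128)` gives 32 for MD5 and `hex_length(256)` gives 64 for SHA-256.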
md5 hash translation
Cryptographic hash functions have several fundamental properties:
● Determinism: The same input always produces the same
output.
● Speed: The process should be able to compute hashes for
any input size quickly.
● One-wayness: It should be infeasible to determine the
input given the hash.
● Collision resistance: A cryptographic hash function is
considered collision resistant if it is computationally
infeasible to find two distinct inputs that produce the
same hash output.
● Avalanche effect: Even a tiny change in the input should
result in a significantly different hash.
These properties ensure that hash functions can be used effectively
for cryptographic purposes.
md5
While the MD5 algorithm has historical interest, it is no longer
considered secure for cryptographic purposes: practical collision
attacks make it feasible to construct two different inputs that
produce the same hash output. However, it can still be used for
non-cryptographic purposes, such as checking the integrity of a
downloaded file against unintentional corruption. MD5 should not
be used for security-critical applications.
sha256
The SHA-2 family of cryptographic hash functions, which includes
SHA-256, was developed as an improvement over the SHA-0 and
SHA-1 functions, both of which share design roots with the MD5
algorithm. As such, there are many similarities between MD5 and
the various SHA functions.
SHA-256 is considered a secure algorithm because it is
computationally infeasible to determine the input given the hash
output, and no practical collision attack is known. However, on its
own it is not recommended for storing passwords in databases: it is
cheap to compute, which makes brute-force attacks fast, and unsalted
hashes are also exposed to rainbow table attacks.
The algorithms
Step 1 — Formatting the input
The shuffling functions used in cryptographic hash algorithms have
specific requirements for the input message. Both MD5 and SHA-256
process the message in chunks of 512 bits, so the input must first
be formatted (padded) to fit that layout.
The formatted message size must be a multiple of 512 bits
To ensure that the formatted message has the correct size, the
following steps are taken:
● A “1” bit is appended to the end of the message, and
enough “0” bits are added to make the length of the
formatted message 64 bits short of a multiple of 512 bits
(that is, congruent to 448 modulo 512).
● The size of the original message, in bits, is stored as a
64-bit integer in the remaining 64 bits at the end of the
formatted message.
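The padding steps above can be sketched in C as follows. This is an illustration under stated assumptions, not the project's build_msg: the name `pad_message` and its signature are hypothetical, the message is assumed to be a whole number of bytes, and the size field is written little-endian for MD5 and big-endian for SHA-256.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated, padded copy of `msg`
 * and stores its size in bytes in *out_len. The caller frees the result.
 * Returns NULL if the allocation fails. */
uint8_t *pad_message(const uint8_t *msg, size_t len,
                     int is_little_endian, size_t *out_len)
{
    assert(out_len != NULL);

    /* message + one 0x80 byte + 8-byte size field,
     * rounded up to a multiple of 64 bytes (512 bits) */
    size_t total = (len + 1 + 8 + 63) & ~(size_t)63;

    uint8_t *buf = calloc(total, 1); /* zero fill provides the "0" bits */
    if (buf == NULL)
        return NULL;

    if (len > 0)
        memcpy(buf, msg, len);
    buf[len] = 0x80; /* the single "1" bit followed by seven "0" bits */

    /* store the original size in bits in the last 8 bytes */
    uint64_t bit_len = (uint64_t)len * 8;
    for (int i = 0; i < 8; i++) {
        uint8_t byte = (uint8_t)(bit_len >> (8 * i)); /* i-th lowest byte */
        if (is_little_endian)
            buf[total - 8 + i] = byte; /* MD5 byte order */
        else
            buf[total - 1 - i] = byte; /* SHA-256 byte order */
    }

    *out_len = total;
    return buf;
}
```

For the 3-byte message "abc", the padded result is 64 bytes long: byte 3 holds 0x80, and the size field holds 24 (the message length in bits).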
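The formatted message length can be calculated using the
following formula:
aligned = (nb + (X-1)) & ~(X-1)
This formula rounds a number ‘nb’ up to the next multiple of ‘X’,
provided ‘X’ is a power of two. For example, with X = 64 bytes
(512 bits), nb = 12 gives 64 and nb = 65 gives 128.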
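As a small illustration, the alignment formula can be wrapped in a C function. The name `align_up` is hypothetical. The bit trick relies on X being a power of two, so the sketch checks that assumption with an assert.

```c
#include <assert.h>
#include <stddef.h>

/* Round nb up to the next multiple of x.
 * Only valid when x is a power of two: in that case (x - 1) is a mask of
 * the low bits, and clearing them after adding (x - 1) rounds upward. */
size_t align_up(size_t nb, size_t x)
{
    assert(x != 0 && (x & (x - 1)) == 0);
    return (nb + (x - 1)) & ~(x - 1);
}
```

A value that is already a multiple of X is left unchanged, so `align_up(64, 64)` stays 64 while `align_up(65, 64)` becomes 128.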
The build_msg function may also have an ‘is_little_endian’
parameter, which indicates the byte order the algorithm expects:
MD5 is little-endian, while SHA-256 is big-endian.
Endianness refers to the order in which the bytes of a multibyte
value are stored in memory and can be either little-endian or
big-endian. You can learn more about endianness in this article:
Little & Big Endian (medium.com)
To convert a 64-bit number from one endianness to the
other, you can use the following function. It works by swapping
neighboring bytes first, then neighboring two-byte groups, then
the two four-byte halves, which fully reverses the byte order.
bswap(uint64_t)
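One common way to write such a byte swap in C is shown below. This is a sketch of the general technique and may differ from the project's own listing. The names `bswap64` and `bswap64_selfcheck` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the byte order of a 64-bit value in three passes. */
uint64_t bswap64(uint64_t x)
{
    /* pass 1: swap neighboring bytes */
    x = ((x & 0x00FF00FF00FF00FFULL) << 8) |
        ((x >> 8) & 0x00FF00FF00FF00FFULL);
    /* pass 2: swap neighboring two-byte groups */
    x = ((x & 0x0000FFFF0000FFFFULL) << 16) |
        ((x >> 16) & 0x0000FFFF0000FFFFULL);
    /* pass 3: swap the two four-byte halves */
    x = (x << 32) | (x >> 32);
    return x;
}

/* Swapping twice must give back the original value. */
void bswap64_selfcheck(void)
{
    uint64_t v = 0x0102030405060708ULL;
    assert(bswap64(bswap64(v)) == v);
}
```

For the input 0x0102030405060708, the three passes give 0x0201040306050807, then 0x0403020108070605, and finally 0x0807060504030201.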
Step 2 — Shuffling the data
The shuffling steps for cryptographic hash algorithms differ
slightly depending on whether you use MD5 or SHA-256.
However, in both cases, the message is divided into 512-bit chunks.
The algorithm uses 32-bit buffers to hold data (4 for MD5 and 8
for SHA-256). For each chunk, the algorithm performs
mathematical operations on these buffers, which modify their state.
The details of these operations are complex; you can learn
more about them in the following articles or by exploring the
implementation in my project: SHA-256 and MD5.
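To give a flavor of what those operations look like, here is a sketch in C of the starting buffer values and a few of the bitwise building blocks. The constants and formulas are the ones given in the MD5 and SHA-256 specifications, but every identifier below is a hypothetical name chosen for illustration. This is not a full compression function.

```c
#include <assert.h>
#include <stdint.h>

/* Initial buffer values: four 32-bit words for MD5, eight for SHA-256. */
const uint32_t MD5_INIT[4] = {
    0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476
};
const uint32_t SHA256_INIT[8] = {
    0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
    0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19
};

/* Bit rotations: MD5 rotates left, SHA-256 rotates right. */
uint32_t rotl32(uint32_t x, unsigned n)
{
    assert(n > 0 && n < 32);
    return (x << n) | (x >> (32 - n));
}

uint32_t rotr32(uint32_t x, unsigned n)
{
    assert(n > 0 && n < 32);
    return (x >> n) | (x << (32 - n));
}

/* MD5 round-1 function F: for each bit, pick y where x is 1, else z. */
uint32_t md5_f(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | (~x & z);
}

/* SHA-256 "choose": the same selection, written with XOR. */
uint32_t sha256_ch(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) ^ (~x & z);
}

/* SHA-256 "majority": each output bit is the majority of the three inputs. */
uint32_t sha256_maj(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) ^ (x & z) ^ (y & z);
}
```

Each 512-bit chunk is mixed into the buffers through many rounds built from functions like these, together with modular additions and per-round constants.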