How Hash Functions Work — MD5, SHA-256, CRC32 Algorithm Explained

The Basics

What is a hash function?

A hash function is a deterministic procedure that turns an input of any size into an output of fixed size. Feed in 3 bytes or 3 gigabytes — out comes the same number of bits every time. Hash the same input twice and you get the same result. Change a single bit in the input and the output looks completely different.

🎯

Deterministic

Same input → same output, always. Two people on opposite sides of the planet running SHA-256 on the same file will compute the same 64-character digest.

📏

Fixed-length output

MD5 always outputs 128 bits. SHA-256 always 256 bits. The input can be anything from an empty string to a multi-GB file — the digest is the same size.

💥

Avalanche effect

Flip one bit of input and roughly half the output bits flip too. "hello" and "Hello" differ in a single input bit, yet their SHA-256 digests look unrelated: about half of the 256 output bits differ.

🔒

One-way (cryptographic only)

Cryptographic hashes are designed so that you cannot recover the input from the output — even with massive computing power. This isn't true of non-crypto hashes.
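The first three properties are easy to check with Python's standard hashlib module. The snippet below is only an illustrative sketch; the helper name bit_difference is made up for this example.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Deterministic: hashing the same input twice gives the same digest.
first = hashlib.sha256(b"hello").hexdigest()
second = hashlib.sha256(b"hello").hexdigest()
print(first == second)

# Fixed length: an empty input and a 1 MB input both give 64 hex characters.
empty_len = len(hashlib.sha256(b"").hexdigest())
large_len = len(hashlib.sha256(b"x" * 1_000_000).hexdigest())
print(empty_len, large_len)

# Avalanche: "hello" and "Hello" differ in exactly one input bit.
lower = hashlib.sha256(b"hello").digest()
upper = hashlib.sha256(b"Hello").digest()
print(bit_difference(lower, upper), "of 256 output bits differ")
```

The last line should report a number somewhere near 128, which is what two unrelated random 256-bit values would give.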

Two big families

Cryptographic vs non-cryptographic hashes

Not every hash function is built for security. The split matters because the design goals — and therefore the speed, output size and resistance to attack — are completely different.

🛡️

Cryptographic hashes

MD5, SHA-1, SHA-2 family. Designed to resist three attacks: pre-image (find an input that hashes to a given output), second pre-image (find another input that collides with a given input), and collision (find any two inputs that produce the same hash). Used in TLS, digital signatures, blockchain.

🚀

Non-cryptographic hashes

CRC-32, Adler-32, FNV-1a, DJB2. Designed for speed and good statistical distribution, not security. Easy to reverse, easy to forge collisions. Used inside hash tables, error-detection checksums, bloom filters and load balancers.

Shared blueprint

The Merkle-Damgård construction

MD5, SHA-1 and the SHA-2 family all share the same overall blueprint, called the Merkle-Damgård construction. Understanding it once is enough to understand all of them: they differ only in the details inside the box.

Pad the message so its length is a multiple of the block size (512 bits for MD5/SHA-1/SHA-256, 1024 bits for SHA-512). Padding always ends with the message length encoded as a number.
Split the padded message into fixed-size blocks.
Initialise a small state (4 to 8 words, depending on the algorithm) with constant magic numbers.
For each block, run a compression function that mixes the block into the state. This is where the algorithm-specific math happens.
After the last block, the final state is the digest.
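The five steps above can be sketched in a few lines of Python. This is a toy, not a real hash: toy_compress is a stand-in compression function invented for this example (it simply borrows SHA-256 from hashlib), and the 16-byte state size is an arbitrary assumption. The padding follows the SHA-style layout with a big-endian bit length.

```python
import hashlib

BLOCK = 64  # block size in bytes (512 bits)

def md_pad(message: bytes) -> bytes:
    """Pad to a multiple of BLOCK: 0x80, zero bytes, then the 64-bit bit length."""
    bit_len = len(message) * 8
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % BLOCK)
    return padded + bit_len.to_bytes(8, "big")

def toy_compress(state: bytes, block: bytes) -> bytes:
    """Hypothetical compression function: mixes one block into the state."""
    return hashlib.sha256(state + block).digest()[:16]

def toy_md_hash(message: bytes) -> bytes:
    state = bytes(16)                      # fixed initial state
    padded = md_pad(message)
    for i in range(0, len(padded), BLOCK): # one compression call per block
        state = toy_compress(state, padded[i:i + BLOCK])
    return state                           # the final state is the digest

print(toy_md_hash(b"abc").hex())
```

Real algorithms replace toy_compress with their own round logic and use their own initial state, but the outer loop has the same shape.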

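Encoding the message length in the padding (known as Merkle-Damgård strengthening) is what lets the collision resistance of the compression function carry over to the whole hash. It does not stop length-extension attacks: because the digest is the full final state, anyone who knows H(m) and the length of m can compute H(m ‖ padding ‖ suffix) without knowing m. MD5, SHA-1, SHA-256 and SHA-512 are all vulnerable to this; HMAC and SHA-3 fix it in different ways.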
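For a keyed integrity check, Python's standard hmac module wraps the hash in the HMAC construction, which is not open to length-extension. A minimal sketch follows; the function names sign and verify and the sample key are illustrative assumptions.

```python
import hashlib
import hmac

def sign(key: bytes, message: bytes) -> str:
    """Return an HMAC-SHA-256 tag for the message as a hex string."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def verify(key: bytes, message: bytes, tag: str) -> bool:
    """Check a tag using a constant-time comparison."""
    return hmac.compare_digest(sign(key, message), tag)

key = b"example-secret-key"          # placeholder key for the demo
tag = sign(key, b"amount=100")
print(verify(key, b"amount=100", tag))
print(verify(key, b"amount=100&admin=1", tag))
```

The second check fails because an attacker who appends data cannot produce a matching tag without the key.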

Algorithm 1

How MD5 works

MD5 (Message Digest 5) was published by Ron Rivest in 1991. It compresses any input into a 128-bit (16-byte, 32 hex character) digest. The algorithm is still in widespread use for non-security purposes despite being cryptographically broken.

Step by step

Padding. Append a single 1-bit, then enough 0-bits to make the length 64 bits short of a multiple of 512. Then append the original message length in bits as a 64-bit little-endian integer.
Initialise state. Four 32-bit words: A=0x67452301, B=0xEFCDAB89, C=0x98BADCFE, D=0x10325476.
For each 512-bit block, perform 64 operations divided into 4 rounds of 16 operations each. Each round uses a different non-linear function (F, G, H, I) built from AND/OR/XOR/NOT, plus a per-step rotation amount and a sine-derived constant.
Add the post-round state into the running A/B/C/D values.
Output A‖B‖C‖D after all blocks are processed (little-endian). That's your 128-bit MD5 digest.
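The steps above translate fairly directly into Python. The following is a teaching sketch written for this article, not a production implementation (use hashlib.md5 for real work); the per-step rotation amounts and message-word indices follow the published MD5 specification.

```python
import math
import struct

# Per-step left-rotation amounts, 16 per round.
S = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
# Sine-derived constants: floor(2^32 * |sin(i + 1)|).
K = [int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF for i in range(64)]

def rotl(x: int, n: int) -> int:
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def md5(message: bytes) -> str:
    a0, b0, c0, d0 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476
    # Padding: 0x80, zeros, then the bit length as a 64-bit little-endian integer.
    msg = message + b"\x80"
    msg += b"\x00" * ((56 - len(msg)) % 64)
    msg += struct.pack("<Q", (len(message) * 8) & 0xFFFFFFFFFFFFFFFF)
    for off in range(0, len(msg), 64):
        M = struct.unpack("<16I", msg[off:off + 64])
        A, B, C, D = a0, b0, c0, d0
        for i in range(64):
            if i < 16:                       # round 1: function F
                f, g = (B & C) | (~B & D), i
            elif i < 32:                     # round 2: function G
                f, g = (D & B) | (~D & C), (5 * i + 1) % 16
            elif i < 48:                     # round 3: function H
                f, g = B ^ C ^ D, (3 * i + 5) % 16
            else:                            # round 4: function I
                f, g = C ^ (B | ~D), (7 * i) % 16
            f = (f + A + K[i] + M[g]) & 0xFFFFFFFF
            A, D, C = D, C, B
            B = (B + rotl(f, S[i])) & 0xFFFFFFFF
        # Add the post-round values into the running state.
        a0 = (a0 + A) & 0xFFFFFFFF
        b0 = (b0 + B) & 0xFFFFFFFF
        c0 = (c0 + C) & 0xFFFFFFFF
        d0 = (d0 + D) & 0xFFFFFFFF
    return struct.pack("<4I", a0, b0, c0, d0).hex()

print(md5(b"abc"))
```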

Why it's broken

In 2004, Xiaoyun Wang and colleagues demonstrated the first MD5 collisions (two different messages that hash to the same value), and follow-up work cut the cost of finding one to seconds on commodity hardware. In 2008, researchers used MD5 collisions to forge a rogue Certificate Authority. Today, MD5 collisions are trivial; use it only where no attacker can choose the inputs (e.g. cache keys, deduplication of trusted files, checksums against accidental corruption).

→ Try the MD5 generator

Algorithm 2

How SHA-1 works

SHA-1 (Secure Hash Algorithm 1) was designed by the NSA and published by NIST in 1995. It outputs a 160-bit (20-byte, 40 hex character) digest. Famously used by Git for commit IDs.

SHA-1 uses the same Merkle-Damgård outer structure as MD5 but with a wider state (5 words instead of 4), an 80-step compression function (4 rounds of 20 steps each), and a different mixing pattern that expands each 16-word block into 80 words. The 5-word state means a longer digest — 160 bits — and a different per-step rotation/constant scheme.
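The expansion from 16 to 80 words is the easiest part of SHA-1 to show in isolation. The sketch below covers only that message schedule, not the full compression function; the function names are chosen for this example.

```python
import struct

def rotl32(x: int, n: int) -> int:
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def sha1_schedule(block: bytes) -> list[int]:
    """Expand one 64-byte block into the 80 words consumed by SHA-1's 80 steps."""
    if len(block) != 64:
        raise ValueError("SHA-1 blocks are exactly 64 bytes")
    w = list(struct.unpack(">16I", block))   # 16 big-endian 32-bit words
    for t in range(16, 80):
        # Each new word is the XOR of four earlier words, rotated left by one bit.
        w.append(rotl32(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    return w

# The single padded block for the message "abc" (length 24 bits).
block = b"abc" + b"\x80" + bytes(52) + (24).to_bytes(8, "big")
words = sha1_schedule(block)
print(len(words), hex(words[16]))
```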

Why it's deprecated

In 2017, the SHAttered attack by CWI Amsterdam and Google produced two distinct PDF files with the same SHA-1 hash. The attack required about 9 quintillion (roughly 2⁶³, or 9×10¹⁸) SHA-1 computations: out of reach for most attackers, but well within reach of a determined adversary. NIST deprecated SHA-1 for digital signatures in 2011. Browsers stopped trusting SHA-1 TLS certificates in 2017. Use SHA-256 or higher for anything security-relevant.

→ Try the SHA-1 generator

Algorithm 3

How the SHA-2 family works

SHA-2 is a family of six functions. NIST first published SHA-256, SHA-384 and SHA-512 in 2001, then added SHA-224 in 2004 and SHA-512/224 and SHA-512/256 in 2012. They all share the same general design but use one of two underlying engines:

🟢

SHA-256 engine

32-bit words, 512-bit blocks, 64 rounds, 8-word state. Used by SHA-256 directly, and by SHA-224 (which uses different initial constants and truncates the output to 224 bits).

🟢

SHA-512 engine

64-bit words, 1024-bit blocks, 80 rounds, 8-word state. Used by SHA-512 directly, and by SHA-384, SHA-512/224 and SHA-512/256 (each of which uses its own initial constants and truncates the output).
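Because the truncated variants start from their own initial constants, their digests are not simply prefixes of the full-length ones. A quick check with hashlib illustrates this; the input "abc" is an arbitrary choice.

```python
import hashlib

data = b"abc"
sha224_hex = hashlib.sha224(data).hexdigest()
sha256_hex = hashlib.sha256(data).hexdigest()
sha384_hex = hashlib.sha384(data).hexdigest()
sha512_hex = hashlib.sha512(data).hexdigest()

# Digest sizes in bits (4 bits per hex character).
print(len(sha224_hex) * 4, len(sha256_hex) * 4, len(sha384_hex) * 4, len(sha512_hex) * 4)

# Same engine, different starting state: the short digest is not a prefix.
print(sha224_hex == sha256_hex[:56])
print(sha384_hex == sha512_hex[:96])
```

This separation means learning a SHA-224 digest tells you nothing about the SHA-256 digest of the same input.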

Inside the compression function

Each block is split into 16 words, then expanded to 64 (SHA-256) or 80 (SHA-512) words using a non-linear schedule of rotations, XORs and shifts. The compression loop maintains 8 state words (a–h). At each round:

T1 = h + Σ₁(e) + Ch(e,f,g) + K[t] + W[t]
T2 = Σ₀(a) + Maj(a,b,c)
h ← g; g ← f; f ← e; e ← d + T1
d ← c; c ← b; b ← a; a ← T1 + T2

Σ₀, Σ₁, Ch (choose) and Maj (majority) are simple bit-level mixing functions, and every addition is taken modulo 2³² (SHA-256) or 2⁶⁴ (SHA-512). K[t] is a round constant taken from the fractional parts of the cube roots of successive prime numbers. W[t] is the expanded message word. The whole thing is fast on modern CPUs, and on hardware with SHA extensions (Intel SHA-NI, ARM Crypto) it's extremely fast.
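Putting the schedule and the round equations together gives a complete SHA-256 in under fifty lines of Python. This is a teaching sketch, far slower than hashlib; the helper names first_primes and icbrt are made up for this example, and the constants are derived at import time from square and cube roots of primes rather than pasted in.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF

def first_primes(n: int) -> list[int]:
    primes, candidate = [], 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def icbrt(n: int) -> int:
    """Integer cube root (floor) by binary search."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

PRIMES = first_primes(64)
# First 32 fractional bits of the square roots (initial state) and cube roots (K).
H0 = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]
K = [icbrt(p << 96) & MASK for p in PRIMES]

def rotr(x: int, n: int) -> int:
    return ((x >> n) | (x << (32 - n))) & MASK

def sha256(message: bytes) -> str:
    msg = message + b"\x80"
    msg += b"\x00" * ((56 - len(msg)) % 64)
    msg += struct.pack(">Q", len(message) * 8)
    H = list(H0)
    for off in range(0, len(msg), 64):
        W = list(struct.unpack(">16I", msg[off:off + 64]))
        for t in range(16, 64):              # message schedule: 16 -> 64 words
            s0 = rotr(W[t - 15], 7) ^ rotr(W[t - 15], 18) ^ (W[t - 15] >> 3)
            s1 = rotr(W[t - 2], 17) ^ rotr(W[t - 2], 19) ^ (W[t - 2] >> 10)
            W.append((W[t - 16] + s0 + W[t - 7] + s1) & MASK)
        a, b, c, d, e, f, g, h = H
        for t in range(64):
            big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
            ch = (e & f) ^ (~e & g)
            t1 = (h + big_s1 + ch + K[t] + W[t]) & MASK
            big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
            maj = (a & b) ^ (a & c) ^ (b & c)
            t2 = (big_s0 + maj) & MASK
            h, g, f, e = g, f, e, (d + t1) & MASK
            d, c, b, a = c, b, a, (t1 + t2) & MASK
        H = [(x + y) & MASK for x, y in zip(H, (a, b, c, d, e, f, g, h))]
    return struct.pack(">8I", *H).hex()

# Cross-check against the standard library.
print(sha256(b"abc") == hashlib.sha256(b"abc").hexdigest())
```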

Security status

SHA-256 and SHA-512 have no known practical attacks in 2026, more than two decades after publication. No public attack on full SHA-256 finds collisions faster than the generic birthday bound of roughly 2¹²⁸ operations, which is entirely infeasible. These are the algorithms you should pick by default.

→ SHA-224 · SHA-256 · SHA-384 · SHA-512

Algorithm 4

How CRC-32 works

CRC-32 (Cyclic Redundancy Check, 32-bit) is not a cryptographic hash at all — it's an error-detection code. It's designed to catch accidental bit-flips during transmission or storage, not to resist deliberate tampering.

The math: polynomial division

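CRC-32 treats the input as one long polynomial over GF(2) (think of a binary number where addition is XOR, with no carries) and divides it by a fixed 33-bit "generator polynomial". For the most common variant (used in ZIP, PNG and Ethernet), that polynomial is 0x04C11DB7, usually written in its bit-reversed form 0xEDB88320. The remainder of that division is the CRC.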

In practice, every implementation uses a precomputed 256-entry lookup table to process one byte at a time:

Initialise CRC to 0xFFFFFFFF.
For each input byte b: CRC = TABLE[(CRC ^ b) & 0xFF] ^ (CRC >>> 8)
XOR the final CRC with 0xFFFFFFFF (final inversion).
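The three steps above, plus the table generation they rely on, can be sketched as follows. The table is built from the reversed polynomial 0xEDB88320, and the result is compared against zlib.crc32 from the standard library.

```python
import zlib

POLY = 0xEDB88320  # bit-reversed form of the generator polynomial

def make_table() -> list[int]:
    """Precompute the CRC of every possible byte value."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            # Shift out one bit; XOR in the polynomial if that bit was set.
            c = (c >> 1) ^ POLY if c & 1 else c >> 1
        table.append(c)
    return table

TABLE = make_table()

def crc32(data: bytes) -> int:
    crc = 0xFFFFFFFF                                  # initial value
    for b in data:
        crc = TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)    # one table lookup per byte
    return crc ^ 0xFFFFFFFF                           # final inversion

print(hex(crc32(b"123456789")), crc32(b"123456789") == zlib.crc32(b"123456789"))
```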

CRC-32 detects all single-bit errors, all double-bit errors in messages of any practical length, and any burst error of 32 bits or fewer. It is trivial to forge: you can adjust any message to match any target CRC in O(n) time (patching four bytes is enough), so never use it for security purposes.
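One way to see why CRC-32 offers no protection against tampering is that it is linear over XOR: for equal-length messages, the CRC of the XOR of three messages equals the XOR of their CRCs. The sketch below checks this with zlib.crc32; the sample messages and the helper xor_bytes are invented for the demonstration.

```python
import zlib

def xor_bytes(*chunks: bytes) -> bytes:
    """XOR together several byte strings of the same length."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            out[i] ^= byte
    return bytes(out)

a = b"pay alice $100"
b = b"pay bobby $999"
c = b"some other msg"

predicted = zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)
actual = zlib.crc32(xor_bytes(a, b, c))
print(predicted == actual)
```

A cryptographic hash has no such algebraic structure, which is exactly what an attacker would need in order to steer the output.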

→ Try the CRC-32 generator

Algorithm 5

How Adler-32 works

Adler-32 was designed by Mark Adler (of zlib fame) as a faster, simpler alternative to CRC-32 for error detection in the zlib data format, where it protects the uncompressed data carried in a Deflate stream.

It maintains two running 16-bit sums, A and B, both modulo 65521 (the largest prime less than 2¹⁶):

Initial A = 1, B = 0.
For each byte b: A = (A + b) mod 65521; B = (B + A) mod 65521
Output: (B << 16) | A — 32 bits total.
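The recurrence above is short enough to write out directly. This sketch reduces modulo 65521 on every byte for clarity and checks itself against zlib.adler32.

```python
import zlib

MOD = 65521  # largest prime below 2**16

def adler32(data: bytes) -> int:
    a, b = 1, 0
    for byte in data:
        a = (a + byte) % MOD   # running sum of bytes (plus one)
        b = (b + a) % MOD      # running sum of the A values
    return (b << 16) | a

print(hex(adler32(b"Wikipedia")), adler32(b"Wikipedia") == zlib.adler32(b"Wikipedia"))
```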

Adler-32 is faster than CRC-32 in software (just two additions per byte, and the modular reductions can be deferred across thousands of bytes) but has weaker error-detection properties: on short inputs of a few hundred bytes or less, the sums never wrap around the modulus, so the checksum uses only a small part of its 32-bit range. It's still good enough for zlib's purpose, which is a cheap end-to-end check that decompression reproduced the original bytes.

→ Try the Adler-32 generator

Algorithm 6

How FNV-1a works

FNV-1a (Fowler-Noll-Vo, variant 1a) is a tiny, fast, non-cryptographic hash designed for use inside hash tables and bloom filters. The 32-bit variant fits in three CPU instructions per byte.

Initial hash = 0x811C9DC5 (FNV-32 offset basis).
For each byte b: hash = (hash XOR b) × 0x01000193 (mod 2³²).
The final 32-bit value is the hash.
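In Python, where integers are unbounded, the reduction modulo 2³² has to be written explicitly as a mask. A minimal sketch of the 32-bit variant:

```python
FNV_OFFSET_BASIS = 0x811C9DC5
FNV_PRIME = 0x01000193

def fnv1a_32(data: bytes) -> int:
    h = FNV_OFFSET_BASIS
    for byte in data:
        h ^= byte                            # XOR first ...
        h = (h * FNV_PRIME) & 0xFFFFFFFF     # ... then multiply, mod 2**32
    return h

print(hex(fnv1a_32(b"a")))
```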

The "1a" variant XORs before multiplying (FNV-1 does it the other way around); 1a has slightly better distribution. FNV-1a has excellent statistical properties for general-purpose hashing (its avalanche behaviour is surprisingly good for such a simple function), and it's been a workhorse in databases, compilers (LLVM uses it), and bloom filters for decades.

→ Try the FNV-1a generator

Algorithm 7

How DJB2 works

DJB2 is even simpler than FNV-1a. Posted by Daniel J. Bernstein to comp.lang.c in the early 1990s as a "fast hash function for strings", it's been quietly used inside thousands of C and C++ programs ever since.

Initial hash = 5381.
For each byte b: hash = (hash << 5) + hash + b — equivalent to hash × 33 + b.
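A short sketch of the loop above. The mask to 32 bits is an assumption made here to mimic a 32-bit unsigned integer in C; Python integers would otherwise grow without bound.

```python
def djb2(data: bytes) -> int:
    h = 5381
    for byte in data:
        # (h << 5) + h is h * 33; the mask emulates 32-bit unsigned overflow.
        h = ((h << 5) + h + byte) & 0xFFFFFFFF
    return h

print(djb2(b"a"))
```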

Why 33? Why 5381? Bernstein never published a formal analysis — the constants were chosen empirically based on how well they distributed common English text. The function is fast, has excellent locality (one byte at a time, no table lookups), and remains a great choice for tiny in-memory hash maps even today. Like FNV-1a, it has zero collision resistance against an attacker who can choose inputs.

→ Try the DJB2 generator

Side by side

All algorithms compared

Algorithm	Output	Family	Year	Status	Common use
MD5	128-bit	MD	1991	Broken	Checksums, dedup
SHA-1	160-bit	SHA-1	1995	Deprecated	Git, legacy TLS
SHA-224	224-bit	SHA-2	2004	Secure	Compact digests
SHA-256	256-bit	SHA-2	2001	Recommended	TLS, Bitcoin, signing
SHA-384	384-bit	SHA-2	2001	Secure	Suite B, TLS
SHA-512	512-bit	SHA-2	2001	Secure	Archival, high-security
CRC-32	32-bit	CRC	1975	Non-crypto	ZIP, PNG, Ethernet
Adler-32	32-bit	Checksum	1995	Non-crypto	zlib, Deflate
FNV-1a	32-bit	FNV	1991	Non-crypto	Hash tables, bloom filters
DJB2	32-bit	DJB	~1991	Non-crypto	C hash tables

FAQ

Common questions about how hashes work

What is the difference between hashing and encryption?

Encryption is reversible: you take plaintext + a key, you get ciphertext; later you take ciphertext + the same key, you get the plaintext back. Hashing is one-way: you take input, you get a fixed-size digest, and there is no key and no way to recover the input from the digest. Hashing is for fingerprinting and integrity; encryption is for confidentiality.

Why are MD5 and SHA-1 still useful if they're broken?

"Broken" in cryptography means collisions are findable — an attacker can construct two different inputs with the same hash. That matters for signatures and certificates, but not for non-adversarial use cases: cache keys, file deduplication, casual integrity checks where no attacker is involved. MD5 and SHA-1 are still very fast and still hash uniformly random inputs to uniformly random outputs.

Why doesn't SHA-256 also become weaker over time?

It might, eventually. Cryptanalysis progresses slowly, and the security margins of SHA-256 and SHA-512 are very large. After 25 years, the best public attacks only reach reduced-round variants; nothing beats brute force on the full function. NIST and other bodies are expected to migrate to SHA-3 or longer-output SHA-2 well before any SHA-256 attack becomes practical.

How do hash tables use these hash functions?

A hash table maps keys to "bucket" indices. You hash the key (FNV-1a, DJB2, or a SipHash variant in modern languages), take the hash modulo the bucket count, and look in that bucket. Hash functions used here don't need to be cryptographic — they just need to be fast, deterministic, and distribute inputs evenly across buckets. Speed matters far more than collision resistance.
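The lookup described above fits in a small class. This is a deliberately minimal sketch with separate chaining and no resizing; the class name TinyHashMap and the default of 8 buckets are assumptions made for the example, and FNV-1a is used as the hash.

```python
def fnv1a_32(data: bytes) -> int:
    h = 0x811C9DC5
    for byte in data:
        h = ((h ^ byte) * 0x01000193) & 0xFFFFFFFF
    return h

class TinyHashMap:
    def __init__(self, bucket_count: int = 8):
        self.buckets = [[] for _ in range(bucket_count)]

    def _bucket(self, key: str) -> list:
        # Hash the key, then reduce modulo the bucket count.
        return self.buckets[fnv1a_32(key.encode()) % len(self.buckets)]

    def put(self, key: str, value) -> None:
        bucket = self._bucket(key)
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, value)   # overwrite an existing entry
                return
        bucket.append((key, value))        # colliding keys share a bucket

    def get(self, key: str, default=None):
        for existing, value in self._bucket(key):
            if existing == key:
                return value
        return default

table = TinyHashMap()
table.put("apple", 1)
table.put("banana", 2)
table.put("apple", 3)
print(table.get("apple"), table.get("banana"), table.get("cherry"))
```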

What is the "avalanche effect" in hash functions?

The avalanche effect is the property that flipping a single input bit causes roughly half of the output bits to flip. Good cryptographic hashes have a very strong avalanche; weak hashes don't. You can see this for yourself: hash "hello" and "Hello" with SHA-256 and compare the outputs bit by bit. About half of the 256 bits differ, just as they would for two unrelated random values.

What is a Merkle-Damgård construction?

It's the iterative compression scheme used by MD5, SHA-1 and SHA-2: pad the message, split it into fixed blocks, then for each block run a compression function that mixes the block into a running state. After the last block, the state is the hash. Most pre-SHA-3 hash functions follow this pattern with different compression functions inside.

Why is SHA-3 different from SHA-2?

SHA-3 (Keccak) was selected by NIST in 2012 as a structural alternative to SHA-2 — not a replacement for security reasons. It uses a "sponge" construction instead of Merkle-Damgård, which gives it different properties (immunity to length-extension, simpler arbitrary-length output). HashGenerator.tools doesn't currently include SHA-3; it's on the roadmap.

How fast are these hash functions?

Rough order on a modern CPU, software-only, in MB/sec: DJB2 ≈ 1500, FNV-1a ≈ 1200, Adler-32 ≈ 1000, CRC-32 ≈ 800 (with table), MD5 ≈ 600, SHA-1 ≈ 500, SHA-256 ≈ 300, SHA-512 ≈ 200. Treat these as ballpark figures: they vary widely by implementation and CPU, and on 64-bit CPUs without SHA extensions SHA-512 is often faster than SHA-256 because it processes 128-byte blocks using 64-bit words. With hardware acceleration (Intel SHA-NI, ARMv8 Crypto), SHA-256 jumps to over 2000 MB/sec.
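Numbers like these are easy to measure on your own machine. The sketch below times a few of the functions through Python's hashlib and zlib, which call optimised C code (possibly hardware-accelerated), so expect results that differ from the software-only figures above; the helper name throughput_mb_s and the 8 MB buffer size are arbitrary choices.

```python
import hashlib
import time
import zlib

def throughput_mb_s(func, size_mb: int = 8) -> float:
    """Hash size_mb megabytes of zero bytes once and return MB per second."""
    data = bytes(size_mb * 1024 * 1024)
    start = time.perf_counter()
    func(data)
    elapsed = time.perf_counter() - start
    return size_mb / elapsed

candidates = {
    "CRC-32": zlib.crc32,
    "Adler-32": zlib.adler32,
    "SHA-1": lambda d: hashlib.sha1(d).digest(),
    "SHA-256": lambda d: hashlib.sha256(d).digest(),
    "SHA-512": lambda d: hashlib.sha512(d).digest(),
}

results = {name: throughput_mb_s(func) for name, func in candidates.items()}
for name, speed in sorted(results.items(), key=lambda item: -item[1]):
    print(f"{name:9s} {speed:10.0f} MB/s")
```

A single timed run is noisy; for anything beyond a rough ranking, repeat the measurement and take the best of several runs.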

Why do hash functions use weird-looking initial constants?

The constants are chosen to be publicly verifiable. SHA-2 takes its initial state from the fractional parts of the square roots of small primes and its round constants from their cube roots; SHA-1's round constants come from the square roots of 2, 3, 5 and 10; MD5's come from the sine function. Using "nothing up my sleeve" numbers like these gives confidence that the designers didn't choose constants to enable a hidden backdoor.

Is double hashing more secure than single hashing?

Slightly, for some narrow attack scenarios (hashing the digest a second time blocks length-extension), but it doesn't meaningfully improve collision resistance. For password storage, what you actually want is a slow purpose-built function (bcrypt, Argon2id, scrypt), not raw SHA-256 applied 1,000 times. Don't roll your own KDF.
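Of the three functions named, scrypt is available in Python's standard library as hashlib.scrypt (when Python is built against a recent OpenSSL). The sketch below shows the general shape of storing and checking a password; the cost parameters n=2**14, r=8, p=1 are common illustrative values rather than a recommendation, and the function names are made up for this example.

```python
import hashlib
import hmac
import os

def derive(password: str, salt: bytes) -> bytes:
    return hashlib.scrypt(
        password.encode(),
        salt=salt,
        n=2**14, r=8, p=1,            # CPU/memory cost parameters (assumed values)
        maxmem=64 * 1024 * 1024,      # allow up to 64 MiB of working memory
        dklen=32,
    )

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key); store both alongside the user record."""
    salt = os.urandom(16)             # a fresh random salt per password
    return salt, derive(password, salt)

def check_password(password: str, salt: bytes, expected: bytes) -> bool:
    return hmac.compare_digest(derive(password, salt), expected)

salt, stored = hash_password("correct horse battery staple")
print(check_password("correct horse battery staple", salt, stored))
print(check_password("wrong guess", salt, stored))
```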
