Fundamentals

What Is a Hash Collision? Real Examples (Including SHAttered)

📅 2026-05-12 ⏱ 7 min read ← Back to Blog

A hash collision is when two different inputs produce the same hash output. For a good cryptographic hash function this should be effectively impossible to engineer — finding one would require trying 2^(half the bit size) random inputs by the birthday paradox. For SHA-256 that's 2^128 tries. For MD5 it's supposed to be 2^64. In 2004, researchers found an MD5 collision in a few hours. By 2017, SHA-1 had been broken too.

What actually happened in those attacks, and why SHA-256 is still safe, is one of the more interesting stories in modern cryptography.

Quick definitions

Collision: two different inputs that produce the same hash. hash(A) = hash(B) with A ≠ B.
Birthday paradox: collisions appear after roughly √(2^bits) random tries, not 2^bits.
Theoretical collision: a math paper showing how to find one. Doesn't necessarily mean anyone has.
Practical collision: an actual demonstration with real bytes. MD5 (2004), SHA-1 (2017).
Chosen-prefix collision: the dangerous kind — the attacker picks meaningful starting bytes, and the algorithm finds suffixes that match.

The birthday paradox, applied to hashes

If you hash random inputs into a 32-bit output space, you'd naively expect collisions after about 4 billion (2^32) tries. The birthday paradox says you'd actually start seeing collisions after only ~65,000 tries — the square root.

The intuition is the same as the classic version: in a room of 23 people, there's a 50% chance two share a birthday. The naive calculation says you'd need 365 people for a near-certain match. But you're not asking "does anyone share my birthday?" — you're asking "do any two people share?" The number of pairs grows quadratically with room size.

For hash functions: 2^n possible outputs means a 50% chance of collision after roughly √(2^n) = 2^(n/2) tries. That's why a "256-bit hash" only provides "128 bits of collision resistance."

Hash	Output bits	Brute-force collision attempts
CRC-32	32	2¹⁶ = 65,000 (trivial)
MD5	128	2⁶⁴ = 18 quintillion (theoretical, ~$1B with current GPUs)
SHA-1	160	2⁸⁰ = 1.2 septillion (theoretical, but cryptanalysis cuts it to ~2⁶⁰)
SHA-256	256	2¹²⁸ (impossible with current/foreseeable technology)

The numbers in the third column are upper bounds. They assume the hash function behaves like a perfect random oracle. When mathematical weaknesses are found, the actual attack cost can be much lower than the brute-force bound — that's the story of MD5 and SHA-1.

The MD5 break (2004)

In August 2004, Xiaoyun Wang and Hongbo Yu published a paper showing they could find MD5 collisions in about an hour of computation. The brute-force expectation was 2^64 tries; Wang's attack was 2^39 — about 33 million times faster.

The attack worked by exploiting subtle structure in MD5's compression function. The collisions were "identical-prefix": both colliding messages had to share the same starting bytes, with the attack-generated bytes appended at the end. That made them theoretically possible but not immediately useful — the colliding files looked obviously synthetic.

By 2007, Marc Stevens and Arjen Lenstra extended the attack to chosen-prefix collisions. Now an attacker could pick two completely different prefixes (say, two TLS certificates with different subject names) and the algorithm would find suffix bytes that made both hash to the same MD5.

In December 2008, a group used this to forge a signed Certificate Authority certificate. They got a CA to sign a benign-looking website certificate, then used the chosen-prefix collision to "promote" that certificate into appearing to be an intermediate CA — capable of signing any other TLS certificate. With this, they could impersonate any website. The PoC was demonstrated at the 25th Chaos Communication Congress. Within weeks, browser vendors began phasing out MD5 for certificate signing.

SHAttered (2017): the first practical SHA-1 collision

The brute-force expectation for SHA-1 collisions is 2^80 attempts. Theoretical attacks from 2005 cut that to roughly 2^63. By 2015, Marc Stevens (the MD5 attacker, still at it) and others estimated a SHA-1 collision could be found for around $173,000 in cloud computing.

In February 2017, Google and CWI Amsterdam actually did it. SHAttered produced two PDFs — visibly different documents — with identical SHA-1 hashes. The attack used:

6,500 CPU-years (rented from Google Cloud)
110 GPU-years
Total cost: ~$110,000 in 2017 cloud pricing

This was an identical-prefix collision — the attackers controlled both PDFs. Three years later, in January 2020, Gaëtan Leurent and Thomas Peyrin demonstrated a chosen-prefix SHA-1 collision for about $45,000 in cloud time. That's the dangerous kind — the attacker can collide their malicious content with someone else's legitimate content.

By 2026, the chosen-prefix SHA-1 attack is estimated at a few hundred to a few thousand dollars depending on GPU availability. SHA-1 should be considered fully broken for collision resistance.

Why SHA-256 is still safe

The SHA-2 family (which includes SHA-256) was designed in 2001 with the lessons of MD5 and SHA-1 cryptanalysis in mind. It uses:

A larger output (256 bits vs. SHA-1's 160)
More rounds of internal processing (64 vs. 80)
A different mathematical structure

Twenty-five years of public cryptanalysis on SHA-256 has produced essentially no progress on collisions. The best known attack is still effectively brute force at 2^128. Even with the entire global computing capacity directed at finding a single SHA-256 collision, the universe would end first.

This could change. New mathematical insights occasionally break hash functions that seemed strong. But SHA-256 is the most-attacked hash function in history (it's also Bitcoin's proof-of-work), and nothing has been found in a quarter century. If you're betting on a hash function in 2026, it's a very safe bet.

The "weak hash function" warning signs

How do you know a hash function might be heading for trouble?

Output size under 160 bits. Birthday bound is too small. MD5 and SHA-1 are below this threshold.
Theoretical attacks published that beat brute force. If researchers can find a collision in 2^60 instead of 2^80, that's a 10,000× weakness — and attacks usually only get better.
Algorithmic structure shared with broken hashes. SHA-1 was a tweaked version of SHA-0, which was broken in 1998. That's a hint, not proof — SHA-2 also shares some lineage and is still strong — but it's a signal.
Vendor deprecation. When NIST, Microsoft, Google, and Mozilla all announce phaseout timelines, take the hint.

Birthdays, weddings, and SHA-256

An intuitive way to feel the size of 2^128 collision resistance:

The current world population is ~8 billion (≈ 2^33)
The number of seconds since the Big Bang is ~4×10^17 (≈ 2^58)
The number of atoms in the observable universe is ~10^80 (≈ 2^266)
2^128 is greater than the number of grains of sand on Earth squared

That's the "find a SHA-256 collision by brute force" workload. It is not happening with any imaginable engineering. The only way SHA-256 breaks is through mathematical insight we don't yet have. Maybe we'll have it eventually — but probably not soon.

Want to see a hash collision in action? Hash any text with MD5 and watch how a tiny change ripples through the entire output. Our MD5 generator and SHA-256 generator show this live — try "abc" vs "abd".