(Start at the beginning of the series – and all the source can be found in my github repo)

So FNV1 was easy, and Murmur3 wasn’t too much harder; for a challenge and to see how far I could go, I decided to try to compute an MD5 string hash using C++11 `constexpr`

.

This was significantly harder. I broke out my copy of Applied Cryptography 2e, found a reference implementation of MD5 in C, read through RFC 1321 and the pseudocode on the Wikipedia page.

**Few, few the bird make her nest**

I built up MD5 piece by piece, pulling out parts of the reference implementation to check that I’d got each building block right before moving on. The actual round function primitives were the easy part. As usual for a hash function, they are a mixture of bitwise functions, shifts, rotates, adds. These types of things make for trivial `constexpr`

functions, for example:

constexpr uint32_t F(uint32_t X, uint32_t Y, uint32_t Z)
{
return (X & Y) | (~X & Z);
}
constexpr uint32_t rotateL(uint32_t x, int n)
{
return (x << n) | (x >> (32-n));
}
constexpr uint32_t FF(uint32_t a, uint32_t b,
uint32_t c, uint32_t d,
uint32_t x, int s, uint32_t ac)
{
return rotateL(a + F(b,c,d) + x + ac, s) + b;
} |

constexpr uint32_t F(uint32_t X, uint32_t Y, uint32_t Z)
{
return (X & Y) | (~X & Z);
}
constexpr uint32_t rotateL(uint32_t x, int n)
{
return (x << n) | (x >> (32-n));
}
constexpr uint32_t FF(uint32_t a, uint32_t b,
uint32_t c, uint32_t d,
uint32_t x, int s, uint32_t ac)
{
return rotateL(a + F(b,c,d) + x + ac, s) + b;
}

There are similar functions for the other low-level primitives of the MD5 round functions, conventionally called F, G, H and I.

Now, MD5 works on buffer chunks of 512 bits, or 16 32-bit words. So assuming we have a string long enough, it’s easy, if a bit long-winded, to convert a block of string into a schedule that MD5 can work on:

struct schedule
{
uint32_t w[16];
};
constexpr schedule init(const char* buf)
{
return { { word32le(buf), word32le(buf+4), word32le(buf+8), word32le(buf+12),
word32le(buf+16), word32le(buf+20), word32le(buf+24), word32le(buf+28),
word32le(buf+32), word32le(buf+36), word32le(buf+40), word32le(buf+44),
word32le(buf+48), word32le(buf+52), word32le(buf+56), word32le(buf+60) } };
} |

struct schedule
{
uint32_t w[16];
};
constexpr schedule init(const char* buf)
{
return { { word32le(buf), word32le(buf+4), word32le(buf+8), word32le(buf+12),
word32le(buf+16), word32le(buf+20), word32le(buf+24), word32le(buf+28),
word32le(buf+32), word32le(buf+36), word32le(buf+40), word32le(buf+44),
word32le(buf+48), word32le(buf+52), word32le(buf+56), word32le(buf+60) } };
}

**Seconds out, round one**

It’s not pretty, but such are the constructs that may arise when you only have C++11 `constexpr`

to work with. The output of MD5 is going to be 4 32-bit words of hash (denoted A, B, C and D in the literature), and in the main loop, which happens for each 512 bits of the message, there are four rounds, each round having 16 steps. After each step, the 4 words are rotated so that A becomes the new B, B becomes the new C, etc. So a round step is fairly easy – here’s the round 1 step which uses the F primitive:

struct md5sum
{
uint32_t h[4];
};
constexpr md5sum round1step(const md5sum& sum,
const uint32_t* block,
int step)
{
return { {
FF(sum.h[0], sum.h[1], sum.h[2], sum.h[3],
block[step], r1shift[step&3], r1const[step]),
sum.h[1], sum.h[2], sum.h[3]
} };
} |

struct md5sum
{
uint32_t h[4];
};
constexpr md5sum round1step(const md5sum& sum,
const uint32_t* block,
int step)
{
return { {
FF(sum.h[0], sum.h[1], sum.h[2], sum.h[3],
block[step], r1shift[step&3], r1const[step]),
sum.h[1], sum.h[2], sum.h[3]
} };
}

As you can see, there are some constants (`r1shift`

and `r1const`

) for each stage of round 1. Rotating the words after each round step is also easy:

constexpr md5sum rotateCR(const md5sum& sum)
{
return { { sum.h[3], sum.h[0], sum.h[1], sum.h[2] } };
}
constexpr md5sum rotateCL(const md5sum& sum)
{
return { { sum.h[1], sum.h[2], sum.h[3], sum.h[0] } };
} |

constexpr md5sum rotateCR(const md5sum& sum)
{
return { { sum.h[3], sum.h[0], sum.h[1], sum.h[2] } };
}
constexpr md5sum rotateCL(const md5sum& sum)
{
return { { sum.h[1], sum.h[2], sum.h[3], sum.h[0] } };
}

So now we are able to put together a complete round, which recurses, calling the round step function and rotating the output until we’re done after 16 steps.

constexpr md5sum round1(const md5sum& sum,
const uint32_t* msg, int n)
{
return n == 16 ? sum :
rotateCL(round1(rotateCR(round1step(sum, msg, n)), msg, n+1));
} |

constexpr md5sum round1(const md5sum& sum,
const uint32_t* msg, int n)
{
return n == 16 ? sum :
rotateCL(round1(rotateCR(round1step(sum, msg, n)), msg, n+1));
}

Rounds 2 through 4 are very similar, but instead of using the F primitive, they use G, H and I respectively. A complete MD5 transform for one 512-bit block looks like this (with a helper function that sums the MD5 result parts):

constexpr md5sum sumadd(const md5sum& s1, const md5sum& s2)
{
return { { s1.h[0] + s2.h[0], s1.h[1] + s2.h[1],
s1.h[2] + s2.h[2], s1.h[3] + s2.h[3] } };
}
constexpr md5sum md5transform(const md5sum& sum,
const schedule& s)
{
return sumadd(sum,
round4(
round3(
round2(
round1(sum, s.w, 0),
s.w, 0),
s.w, 0),
s.w, 0));
} |

constexpr md5sum sumadd(const md5sum& s1, const md5sum& s2)
{
return { { s1.h[0] + s2.h[0], s1.h[1] + s2.h[1],
s1.h[2] + s2.h[2], s1.h[3] + s2.h[3] } };
}
constexpr md5sum md5transform(const md5sum& sum,
const schedule& s)
{
return sumadd(sum,
round4(
round3(
round2(
round1(sum, s.w, 0),
s.w, 0),
s.w, 0),
s.w, 0));
}

**The ugly business of padding**

So far so good. This works, as long as we’re processing the complete 512-bit blocks contained in the message. Now to consider how to finish off. The padding scheme for MD5 is as follows:

- Append a 1-bit (this is always done, even if the message is a multiple of 512 bits)
- Add as many 0-bits as you need to, to make up 448 bits (56 bytes)
- Append a 64-bit value of the original length
*in bits* to the 448 bits to make a final 512-bit block

This gets messy in C++11 `constexpr`

-land, but suffice to say that I wrote a `leftover`

function analogous to `init`

that could deal with padding. Now, finally, the complete MD5 calculation, which has three conditions:

- As long as there is a 64-byte (512-bit) block to work on, recurse on that.
- If the leftover is 56 bytes or more, pad it
*without* the length in there and recurse on an “empty block”.
- Otherwise, pad with padding and length, transform and we’re done!

constexpr md5sum md5update(const md5sum& sum, const char* msg,
int len, int origlen)
{
return
len >= 64 ?
md5update(md5transform(sum, init(msg)), msg+64, len-64, origlen) :
len >= 56 ?
md5update(md5transform(sum,
leftover(msg, len, origlen, 64)),
msg+len, -1, origlen) :
md5transform(sum, leftover(msg, len, origlen, 56));
} |

constexpr md5sum md5update(const md5sum& sum, const char* msg,
int len, int origlen)
{
return
len >= 64 ?
md5update(md5transform(sum, init(msg)), msg+64, len-64, origlen) :
len >= 56 ?
md5update(md5transform(sum,
leftover(msg, len, origlen, 64)),
msg+len, -1, origlen) :
md5transform(sum, leftover(msg, len, origlen, 56));
}

**Woot!**

I don’t mind admitting that when I finally got this working, I did a happy dance around my apartment. Even though the code has ugly parts. But of course the thrill of achievement soon gives way to the thirst for more, and MD5 isn’t exactly today’s choice for hash algorithms, even if it is still often used where cryptographic strength isn’t paramount. So I started thinking about SHA256.

Of course, cryptography proceeds largely by incrementally twiddling algorithms when they are found lacking: adding rounds, beefing up functions, etc. And so the NSA’s SHA series proceeded from Ron Rivest’s MD series. In particular, the block size and padding schemes are identical. That was a huge leg-up on SHA256, since I’d already solved the hardest part and I could reuse it.

SHA256 produces a larger digest, and obviously has different magic numbers that go into the rounds, but structurally it’s very similar to MD5. Again I used Wikipedia’s pseudocode as a reference. The only real thing that I needed to do over MD5, besides change up some trivial maths, was the schedule extend step. SHA256 copies the 16 words of the message to a 64-word block to work on, and extends the 16 words into the remaining 48 words. Computing a single value in the schedule, `w[i]`

, is easily written in a recursive style, where `n`

represents the “real” values we already have, initially 16:

constexpr uint32_t extendvalue(const uint32_t* w, int i, int n)
{
return i < n ? w[i] :
extendvalue(w, i-16, n) + extendvalue(w, i-7, n)
+ s0(extendvalue(w, i-15, n)) + s1(extendvalue(w, i-2, n));
} |

constexpr uint32_t extendvalue(const uint32_t* w, int i, int n)
{
return i < n ? w[i] :
extendvalue(w, i-16, n) + extendvalue(w, i-7, n)
+ s0(extendvalue(w, i-15, n)) + s1(extendvalue(w, i-2, n));
}

And to extend all the values, we can then do:

constexpr schedule sha256extend(const schedule& s)
{
return { { s.w[0], s.w[1], s.w[2], s.w[3],
s.w[4], s.w[5], s.w[6], s.w[7],
s.w[8], s.w[9], s.w[10], s.w[11],
s.w[12], s.w[13], s.w[14], s.w[15],
extendvalue(s.w, 16, 16), extendvalue(s.w, 17, 16),
//
// and so on...
//
extendvalue(s.w, 62, 16), extendvalue(s.w, 63, 16) } };
} |

constexpr schedule sha256extend(const schedule& s)
{
return { { s.w[0], s.w[1], s.w[2], s.w[3],
s.w[4], s.w[5], s.w[6], s.w[7],
s.w[8], s.w[9], s.w[10], s.w[11],
s.w[12], s.w[13], s.w[14], s.w[15],
extendvalue(s.w, 16, 16), extendvalue(s.w, 17, 16),
//
// and so on...
//
extendvalue(s.w, 62, 16), extendvalue(s.w, 63, 16) } };
}

When I did this initially, I ran into the compiler’s max step limit for a `constexpr`

computation. Different from the max recursion limit, the max step limit is a guideline for how many expressions should be evaluated within a single constant expression. Appendix B suggests 2^{20}. If a compiler did memoization on `extendvalue`

I suspect this would be fine, but evidently clang doesn’t, so to compromise, I split the function into 3: `sha256extend16`

, `sha256extend32`

and `sha256extend48`

, each of which extends by 16 words at a time. And that worked.

After extending the schedule and changing some of the maths functions, the rest was easy – practically the same as for MD5. Now I was done with string hashing.

For the next experiments with compile-time computation, I wanted to understand a strange thing I’d seen online…