{"id":1278,"date":"2015-10-14T22:06:47","date_gmt":"2015-10-15T05:06:47","guid":{"rendered":"http:\/\/www.elbeno.com\/blog\/?p=1278"},"modified":"2015-10-15T08:51:54","modified_gmt":"2015-10-15T15:51:54","slug":"more-string-hashing-with-c11-constexpr","status":"publish","type":"post","link":"https:\/\/www.elbeno.com\/blog\/?p=1278","title":{"rendered":"More string hashing with C++11 constexpr"},"content":{"rendered":"<p>(Start at the <a href=\"https:\/\/www.elbeno.com\/blog\/?p=1206\">beginning of the series<\/a> &#8211; and all the source can be found in <a href=\"https:\/\/github.com\/elbeno\/constexpr\">my github repo<\/a>)<\/p>\n<p>So FNV1 was easy, and Murmur3 wasn&#8217;t too much harder; for a challenge and to see how far I could go, I decided to try to compute an MD5 string hash using C++11 <code>constexpr<\/code>.<\/p>\n<p>This was significantly harder. I broke out my copy of <a href=\"http:\/\/www.amazon.com\/Applied-Cryptography-Protocols-Algorithms-Source\/dp\/0471117099\">Applied Cryptography 2e<\/a>, found a reference implementation of MD5 in C, read through <a href=\"https:\/\/www.ietf.org\/rfc\/rfc1321.txt\">RFC 1321<\/a> and <a href=\"https:\/\/en.wikipedia.org\/wiki\/MD5#Pseudocode\">the pseudocode<\/a> on the Wikipedia page.<\/p>\n<p><strong>Few, few the bird make her nest<\/strong><\/p>\n<p>I built up MD5 piece by piece, pulling out parts of the reference implementation to check that I&#8217;d got each building block right before moving on. The actual round function primitives were the easy part. As usual for a hash function, they are a mixture of bitwise functions, shifts, rotates, adds. These types of things make for trivial <code>constexpr<\/code> functions, for example:<\/p>\n<pre lang=\"cpp\">\r\nconstexpr uint32_t F(uint32_t X, uint32_t Y, uint32_t Z)\r\n{\r\n  return (X & Y) | (~X & Z);\r\n}\r\n\r\nconstexpr uint32_t rotateL(uint32_t x, int n)\r\n{\r\n  return (x << n) | (x >> (32-n));\r\n}\r\n\r\nconstexpr uint32_t FF(uint32_t a, uint32_t b,\r\n                      uint32_t c, uint32_t d,\r\n                      uint32_t x, int s, uint32_t ac)\r\n{\r\n  return rotateL(a + F(b,c,d) + x + ac, s) + b;\r\n}\r\n<\/pre>\n<p>There are similar functions for the other low-level primitives of the MD5 round functions, conventionally called F, G, H and I.<\/p>\n<p>Now, MD5 works on buffer chunks of 512 bits, or 16 32-bit words. So assuming we have a string long enough, it&#8217;s easy, if a bit long-winded, to convert a block of string into a schedule that MD5 can work on:<\/p>\n<pre lang=\"cpp\">\r\nstruct schedule\r\n{\r\n  uint32_t w[16];\r\n};\r\n\r\nconstexpr schedule init(const char* buf)\r\n{\r\n  return { { word32le(buf), word32le(buf+4), word32le(buf+8), word32le(buf+12),\r\n        word32le(buf+16), word32le(buf+20), word32le(buf+24), word32le(buf+28),\r\n        word32le(buf+32), word32le(buf+36), word32le(buf+40), word32le(buf+44),\r\n        word32le(buf+48), word32le(buf+52), word32le(buf+56), word32le(buf+60) } };\r\n}\r\n<\/pre>\n<p><strong>Seconds out, round one<\/strong><\/p>\n<p>It&#8217;s not pretty, but such are the constructs that may arise when you only have C++11 <code>constexpr<\/code> to work with. The output of MD5 is going to be 4 32-bit words of hash (denoted A, B, C and D in the literature), and in the main loop, which happens for each 512 bits of the message, there are four rounds, each round having 16 steps. After each step, the 4 words are rotated so that A becomes the new B, B becomes the new C, etc. So a round step is fairly easy &#8211; here&#8217;s the round 1 step which uses the F primitive:<\/p>\n<pre lang=\"cpp\">\r\nstruct md5sum\r\n{\r\n  uint32_t h[4];\r\n};\r\n\r\nconstexpr md5sum round1step(const md5sum& sum,\r\n                            const uint32_t* block,\r\n                            int step)\r\n{\r\n  return { {\r\n      FF(sum.h[0], sum.h[1], sum.h[2], sum.h[3],\r\n         block[step], r1shift[step&3], r1const[step]),\r\n        sum.h[1], sum.h[2], sum.h[3]\r\n        } };\r\n}\r\n<\/pre>\n<p>As you can see, there are some constants (<code>r1shift<\/code> and <code>r1const<\/code>) for each stage of round 1. Rotating the words after each round step is also easy:<\/p>\n<pre lang=\"cpp\">\r\nconstexpr md5sum rotateCR(const md5sum& sum)\r\n{\r\n  return { { sum.h[3], sum.h[0], sum.h[1], sum.h[2] } };\r\n}\r\n\r\nconstexpr md5sum rotateCL(const md5sum& sum)\r\n{\r\n  return { { sum.h[1], sum.h[2], sum.h[3], sum.h[0] } };\r\n}\r\n<\/pre>\n<p>So now we are able to put together a complete round, which recurses, calling the round step function and rotating the output until we&#8217;re done after 16 steps.<\/p>\n<pre lang=\"cpp\">\r\nconstexpr md5sum round1(const md5sum& sum,\r\n                        const uint32_t* msg, int n)\r\n{\r\n  return n == 16 ? sum :\r\n    rotateCL(round1(rotateCR(round1step(sum, msg, n)), msg, n+1));\r\n}\r\n<\/pre>\n<p>Rounds 2 through 4 are very similar, but instead of using the F primitive, they use G, H and I respectively. A complete MD5 transform for one 512-bit block looks like this (with a helper function that sums the MD5 result parts):<\/p>\n<pre lang=\"cpp\">\r\nconstexpr md5sum sumadd(const md5sum& s1, const md5sum& s2)\r\n{\r\n  return { { s1.h[0] + s2.h[0], s1.h[1] + s2.h[1],\r\n        s1.h[2] + s2.h[2], s1.h[3] + s2.h[3] } };\r\n}\r\n\r\nconstexpr md5sum md5transform(const md5sum& sum,\r\n                              const schedule& s)\r\n{\r\n  return sumadd(sum,\r\n                round4(\r\n                    round3(\r\n                        round2(\r\n                            round1(sum, s.w, 0),\r\n                            s.w, 0),\r\n                        s.w, 0),\r\n                    s.w, 0));\r\n}\r\n<\/pre>\n<p><strong>The ugly business of padding<\/strong><\/p>\n<p>So far so good. This works, as long as we&#8217;re processing the complete 512-bit blocks contained in the message. Now to consider how to finish off. The padding scheme for MD5 is as follows:<\/p>\n<ul>\n<li>Append a 1-bit (this is always done, even if the message is a multiple of 512 bits)<\/li>\n<li>Add as many 0-bits as you need to, to make up 448 bits (56 bytes)<\/li>\n<li>Append a 64-bit value of the original length <em>in bits<\/em> to the 448 bits to make a final 512-bit block<\/li>\n<\/ul>\n<p>This gets messy in C++11 <code>constexpr<\/code>-land, but suffice to say that I wrote a <code>leftover<\/code> function analogous to <code>init<\/code> that could deal with padding. Now, finally, the complete MD5 calculation, which has three conditions:<\/p>\n<ol>\n<li>As long as there is a 64-byte (512-bit) block to work on, recurse on that.<\/li>\n<li>If the leftover is 56 bytes or more, pad it <em>without<\/em> the length in there and recurse on an &#8220;empty block&#8221;.<\/li>\n<li>Otherwise, pad with padding and length, transform and we&#8217;re done!<\/li>\n<\/ol>\n<pre lang=\"cpp\">\r\nconstexpr md5sum md5update(const md5sum& sum, const char* msg,\r\n                            int len, int origlen)\r\n{\r\n  return\r\n    len >= 64 ?\r\n    md5update(md5transform(sum, init(msg)), msg+64, len-64, origlen) :\r\n    len >= 56 ?\r\n    md5update(md5transform(sum,\r\n                           leftover(msg, len, origlen, 64)),\r\n              msg+len, -1, origlen) :\r\n    md5transform(sum, leftover(msg, len, origlen, 56));\r\n}\r\n<\/pre>\n<p><strong>Woot!<\/strong><\/p>\n<p>I don&#8217;t mind admitting that when I finally got this working, I did a happy dance around my apartment. Even though the code has ugly parts. But of course the thrill of achievement soon gives way to the thirst for more, and MD5 isn&#8217;t exactly today&#8217;s choice for hash algorithms, even if it is still often used where cryptographic strength isn&#8217;t paramount. So I started thinking about SHA256.<\/p>\n<p>Of course, cryptography proceeds largely by incrementally twiddling algorithms when they are found lacking: adding rounds, beefing up functions, etc. And so the NSA&#8217;s SHA series proceeded from Ron Rivest&#8217;s MD series. In particular, the block size and padding schemes are identical. That was a huge leg-up on SHA256, since I&#8217;d already solved the hardest part and I could reuse it.<\/p>\n<p>SHA256 produces a larger digest, and obviously has different magic numbers that go into the rounds, but structurally it&#8217;s very similar to MD5. Again I used <a href=\"https:\/\/en.wikipedia.org\/wiki\/SHA-2#Pseudocode\">Wikipedia&#8217;s pseudocode<\/a> as a reference. The only real thing that I needed to do over MD5, besides change up some trivial maths, was the schedule extend step. SHA256 copies the 16 words of the message to a 64-word block to work on, and extends the 16 words into the remaining 48 words. Computing a single value in the schedule, <code>w[i]<\/code>, is easily written in a recursive style, where <code>n<\/code> represents the &#8220;real&#8221; values we already have, initially 16:<\/p>\n<pre lang=\"cpp\">\r\nconstexpr uint32_t extendvalue(const uint32_t* w, int i, int n)\r\n{\r\n  return i < n ? w[i] :\r\n    extendvalue(w, i-16, n) + extendvalue(w, i-7, n)\r\n    + s0(extendvalue(w, i-15, n)) + s1(extendvalue(w, i-2, n));\r\n}\r\n<\/pre>\n<p>And to extend all the values, we can then do:<\/p>\n<pre lang=\"cpp\">\r\nconstexpr schedule sha256extend(const schedule& s)\r\n{\r\n  return { { s.w[0], s.w[1], s.w[2], s.w[3],\r\n        s.w[4], s.w[5], s.w[6], s.w[7],\r\n        s.w[8], s.w[9], s.w[10], s.w[11],\r\n        s.w[12], s.w[13], s.w[14], s.w[15],\r\n        extendvalue(s.w, 16, 16), extendvalue(s.w, 17, 16),\r\n        \/\/\r\n        \/\/ and so on...\r\n        \/\/\r\n        extendvalue(s.w, 62, 16), extendvalue(s.w, 63, 16) } };\r\n}\r\n<\/pre>\n<p>When I did this initially, I ran into the compiler's max step limit for a <code>constexpr<\/code> computation. Different from the max recursion limit, the max step limit is a guideline for how many expressions should be evaluated within a single constant expression. Appendix B suggests 2<sup>20<\/sup>. If a compiler did memoization on <code>extendvalue<\/code> I suspect this would be fine, but evidently clang doesn't, so to compromise, I split the function into 3: <code>sha256extend16<\/code>, <code>sha256extend32<\/code> and <code>sha256extend48<\/code>, each of which extends by 16 words at a time. And that worked.<\/p>\n<p>After extending the schedule and changing some of the maths functions, the rest was easy - practically the same as for MD5. Now I was done with string hashing.<\/p>\n<p>For the next experiments with compile-time computation, I wanted to <a href=\"https:\/\/www.elbeno.com\/blog\/?p=1284\">understand a strange thing<\/a> I'd seen online...<\/p>\n","protected":false},"excerpt":{"rendered":"<p>(Start at the beginning of the series &#8211; and all the source can be found in my github repo) So FNV1 was easy, and Murmur3 wasn&#8217;t too much harder; for a challenge and to see how far I could go, I decided to try to compute an MD5 string hash&#8230;<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[22],"tags":[],"class_list":["post-1278","post","type-post","status-publish","format-standard","hentry","category-cpp"],"_links":{"self":[{"href":"https:\/\/www.elbeno.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/1278","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.elbeno.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.elbeno.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.elbeno.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.elbeno.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1278"}],"version-history":[{"count":7,"href":"https:\/\/www.elbeno.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/1278\/revisions"}],"predecessor-version":[{"id":1300,"href":"https:\/\/www.elbeno.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/1278\/revisions\/1300"}],"wp:attachment":[{"href":"https:\/\/www.elbeno.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1278"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.elbeno.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1278"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.elbeno.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1278"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}