Lameness Explained

OK, more than one person wanted explanations of The C++ <random> Lame List, so here are some of my thoughts, if only to save people searching elsewhere.

  1. Calling rand() is lame because it's typically an LCG with horrible randomness properties, and we can do better. And if you're not calling rand(), there's no reason to call srand().
  2. Using time(NULL) to seed your RNG is lame because it doesn't have enough entropy. It only has one-second resolution, so in particular, starting multiple processes at the same time (e.g. bringing up a bunch of servers) is likely to give them all the same seed.
  3. No, rand() isn't good enough even for simple uses, and it's easy to do the right thing these days. The lower-order bits of rand()'s output are particularly non-random, and odds are that if you're using rand() you're also using % to get a number in the right range. See item 6.
  4. In C++14 random_shuffle() is deprecated, and it's removed in C++17, which ought to be reason enough. If you need more reason, one version of it is inconvenient to use properly (it uses a Fisher-Yates/Knuth shuffle, so it takes an RNG that has to return random results in a shifting range) and the other version of it can use rand() under the hood. See item 1.
  5. default_random_engine is implementation-defined, but in practice it's going to be one of the standard generators, so why not just be explicit and cross-platform-safe (hint: item 10)? Microsoft's default is good, but libc++ and libstdc++ both use LCGs as their default at the moment, so they're not much better than rand().
  6. It is overwhelmingly likely that whatever RNG you use, it will output something in a power-of-two range. Using % to squeeze that into your target range introduces bias unless the target range evenly divides the generator's range. Re item 3, consider a canonical simple use: rolling a d6. No power of two is divisible by 6, so inevitably, % will bias the result. Use a distribution instead. STL (and others) have poured a lot of time into making sure the distributions aren't biased.
  7. random_device is standard, easy to use, and should be high quality randomness. It may not be very well-performing, which is why you probably want to use it for seeding only. But you do want to use it (mod item 8).
  8. Just know your platform. It might be fine in desktop-land, but random_device isn’t always great. It’s supposed to be nondeterministic and hardware based if that’s available… trust but verify, as they say.
  9. Not handling exceptions is lame. And will bite you. I know this from experience with random_device specifically.
  10. The Mersenne twisters are simply the best randomness currently available in the standard.
  11. Putting mt19937 on the stack: a) it's large (about 2.5 KB of state) and b) you're going to be initializing it each time through. So probably not the best. See item 17 for an alternative.
  12. You’re just throwing away entropy if you don’t seed the generator’s entire state. (This is very common, though.)
  13. Simply, uniform_int_distribution works on a closed interval (as it must – otherwise it couldn’t produce the maximum representable value for the given integral type). If you forget this, it’s a bug in your code – and maybe one that takes a while to get noticed. Not good.
  14. Forgetting ref() around your generator means you’re copying the state, which means you’re probably not advancing the generator like you thought you were.
  15. seed_seq is designed to seed RNGs, it’s that simple. It tries to protect against poor-quality data from random_device or whatever variable-quality source of entropy you have.
  16. Not considering thread safety is always lame. Threads have been a thing for quite a while now.
  17. thread_local is an easy way to get “free” thread safety for your generators.
  18. You should be using a Mersenne twister (item 10) so just use the right thing for max(). Job done.
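
To make item 3 concrete: a linear congruential generator with a power-of-two modulus has especially weak low bits. The sketch below uses a hypothetical 32-bit LCG (Lcg32 is a made-up name; the multiplier and increment are the ones from the portable example rand() in the C standard, used here only for illustration, and your platform's rand() may well be implemented differently). Because both constants are odd, the lowest bit of each output simply alternates.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical power-of-two-modulus LCG: x' = (a*x + c) mod 2^32.
// Unsigned 32-bit arithmetic wraps, which gives the mod 2^32 for free.
struct Lcg32 {
    std::uint32_t state;
    std::uint32_t next()
    {
        state = state * 1103515245u + 12345u;
        return state;
    }
};

// Collect the lowest bit of n consecutive outputs.
// With odd a and odd c, parity flips on every step: 1, 0, 1, 0, ...
std::vector<int> low_bits(std::uint32_t seed, int n)
{
    Lcg32 g{seed};
    std::vector<int> bits;
    for (int i = 0; i < n; ++i)
        bits.push_back(static_cast<int>(g.next() & 1u));
    return bits;
}
```

The C standard's own example sidesteps this by dividing by 65536 before reducing, i.e. it returns higher bits; taking % of the raw low bits gets no such protection.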
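
Items 6 and 13 in code. The first function (a hypothetical helper) feeds every value a generator with outputs in [0, range) could produce through % 6 exactly once and counts the results. With a 3-bit generator (range 8), faces 0 and 1 come up twice as often as the rest; even with range 256 (= 6 * 42 + 4) the first four faces each get one extra hit. The second function is the fix: a uniform_int_distribution over the closed interval [1, 6].

```cpp
#include <array>
#include <random>

// Feed each value in [0, range) through "% 6" once and
// count how often each result 0..5 appears.
std::array<int, 6> modulo_counts(unsigned range)
{
    std::array<int, 6> counts{};
    for (unsigned x = 0; x < range; ++x)
        ++counts[x % 6];
    return counts;
}

// The unbiased alternative: let the distribution map the generator's
// output onto the closed interval [1, 6].
int roll_d6(std::mt19937& gen)
{
    std::uniform_int_distribution<int> d6(1, 6); // both 1 and 6 are possible
    return d6(gen);
}
```

Writing uniform_int_distribution<int>(1, 7) by analogy with a half-open range would let 7 through, which is exactly the item 13 bug.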
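
For item 4, the C++11 replacement is std::shuffle, which takes a uniform random bit generator such as mt19937 directly (by forwarding reference, so it isn't copied). There's no rand() underneath and no shifting-range functor to write. A minimal sketch, with shuffle_deck as a hypothetical helper:

```cpp
#include <algorithm>
#include <random>
#include <vector>

// std::shuffle uses the engine itself; it asks for whatever
// ranges it needs internally.
void shuffle_deck(std::vector<int>& deck, std::mt19937& gen)
{
    std::shuffle(deck.begin(), deck.end(), gen);
}
```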
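
Items 7, 9, 10, 12 and 15 combine into a seeding routine along these lines: draw one 32-bit word from random_device per word of mt19937 state (state_size is 624), run the words through seed_seq, and construct the engine from that. random_device can throw if no random number can be obtained, so the sketch catches that. The fallback it uses (a clock reading plus the word index) is an assumption for illustration only and is far weaker than real entropy; reporting the failure instead may suit you better. make_seeded_mt19937 is a hypothetical name.

```cpp
#include <algorithm>
#include <array>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <exception>
#include <functional>
#include <random>

// Hypothetical helper: seed the whole of mt19937's state via seed_seq.
std::mt19937 make_seeded_mt19937()
{
    // mt19937's state is state_size (624) words of 32 bits.
    std::array<std::uint32_t, std::mt19937::state_size> seed_data{};
    try {
        std::random_device rd; // the constructor may throw...
        // ...and so may each call. random_device isn't copyable,
        // so it goes to std::generate through std::ref.
        std::generate(seed_data.begin(), seed_data.end(), std::ref(rd));
    } catch (const std::exception&) {
        // Assumed fallback for this sketch only: a clock reading is NOT
        // real entropy, and two processes may still end up colliding.
        const auto t = static_cast<std::uint64_t>(
            std::chrono::high_resolution_clock::now().time_since_epoch().count());
        for (std::size_t i = 0; i < seed_data.size(); ++i)
            seed_data[i] = static_cast<std::uint32_t>(t + i);
    }
    // seed_seq mixes the raw words before they reach the engine.
    std::seed_seq seq(seed_data.begin(), seed_data.end());
    return std::mt19937(seq);
}
```

Usage: std::mt19937 gen = make_seeded_mt19937(); then hand gen to a distribution. Reading 624 words from random_device is slower than reading one, but it happens once per engine, which fits item 7's "use it for seeding only".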
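
Item 14 is easy to trip over because standard algorithms such as std::generate take their generator by value. In the sketch below (both function names are hypothetical), the first version fills the vector from a copy of the engine, so calling it twice returns the same numbers; the second wraps the engine in std::ref, so the caller's engine really advances.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

using Values = std::vector<std::mt19937::result_type>;

// std::generate takes its generator by value: this copies gen's
// whole state and advances the copy, leaving gen untouched.
Values fill_by_copy(std::mt19937& gen, std::size_t n)
{
    Values v(n);
    std::generate(v.begin(), v.end(), gen);
    return v;
}

// std::ref passes a reference_wrapper instead, so gen itself advances.
Values fill_by_ref(std::mt19937& gen, std::size_t n)
{
    Values v(n);
    std::generate(v.begin(), v.end(), std::ref(gen));
    return v;
}
```

The same applies anywhere an engine is passed or bound by value, std::bind included.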
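
Items 11, 16 and 17 together suggest a function-local thread_local engine: each thread gets its own mt19937, constructed and seeded once on that thread's first call rather than on every call, and no locking is needed because no two threads share an engine. This is a sketch: thread_generator and roll_die are hypothetical names, and for brevity it seeds from only eight random_device words (item 12 argues for seeding the whole state) and doesn't handle a throwing random_device (item 9).

```cpp
#include <random>

// Hypothetical helper: one engine per thread, seeded on first use.
std::mt19937& thread_generator()
{
    thread_local std::mt19937 gen = [] {
        std::random_device rd;
        // Only 8 words of seed material here, for brevity.
        std::seed_seq seq{rd(), rd(), rd(), rd(), rd(), rd(), rd(), rd()};
        return std::mt19937(seq);
    }();
    return gen;
}

// A uniform_int_distribution is small, so a local one per call is fine.
int roll_die(int sides)
{
    std::uniform_int_distribution<int> dist(1, sides); // closed interval [1, sides]
    return dist(thread_generator());
}
```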

If you want more, see rand() Considered Harmful (a talk by Stephan T. Lavavej), or The bell has tolled for rand() (from the Explicit C++ blog), or see Melissa O'Neill's Reddit thread, her talk on PCG, and the associated website.

And of course,