Have you ever needed to generate a random number in code? whether it's for rolling a dice, or shuffling a set, this tweet thread is here for you! There's no reason that it should be easy or obvious, very experienced programmers repeat common mistakes. I did, before I learned ...
-
-
And when you choose multiples, you usually don't want duplicates, so what you often need is a shuffle. How do you shuffle? Well you can call sort() with some kind of customer random as-me comparator. DON'T DO THIS.
Show this thread -
Also, DON'T LET YOUR FRIENDS DO THIS. Instead do a Fisher-Yates shuffle. They are super easy ...
Show this thread -
jumble = [ A , B , C, D, E, F ] for (i = 0; i < jumble.length; i++) { int r = rand(i) save = jumble[i] jumble[i] = jumble[r] jumble[r] = save It's O(N) and it works every time. Super easy to remember if you code it a few times.
Show this thread -
O.k. so that's shuffling, like for your playlist or whatever. But what about weighted sets and selection? What if you want to choose elements but also put a thumb on the scale?
Show this thread -
Well you could build a set or an array that just repeats elements in them w times, where w is the weight. Eats a lot of space though! and gets super inefficient if you don't want to have duplicates. The COOLEST answer here is to use Vose's Alias.
Show this thread -
Sadly, this one is too long for a tweet until next year when twitter decides that we need more space to offend one another, so for now, bookmark this page http://www.keithschwarz.com/darts-dice-coins/ … , or print it out and frame it in case the Internet ever dies. It's one of the best bits.
Show this thread -
You can read it for yourself, but it uses a 2d approach to do weighted selection in O(1) time with O(n) space. MAD THAT THIS WORKS. And it reminds me of something else, the last and BONUS randomness item I'll get into ...
Show this thread -
How do we generate numbers that honor a normal distribution?
Show this thread -
A normal distribution is a super common statistical distribution, it describes the distribution of lots of phenomena and the central limit theorem says that basically any distribution is secretly just one step removed from the normal distribution.
Show this thread -
That probably didn't make any sense. Here's a Wikipedia page that also won't make much sense: https://en.wikipedia.org/wiki/Normal_distribution … . It especially doesn't make sense that those totally differently looking lines are supposed to be "the same". That's ok though.
Show this thread -
It's ok because STATISTICS DOESN'T MAKE SENSE. They work really well, but if you think about them for too long and too deeply, you fall into a transcendent state. That's also how we know that statistics are pure science. Anyway, back to the topic ...
Show this thread -
A cool, though not common any more, way to generate normally distributed numbers is called the Box-Muller transform, https://en.wikipedia.org/wiki/Box–Muller_transform …, and it combines the "Just throw the bad crap away" (aka rejection sampling) and two-dimensional approaches we've seen already.
Show this thread -
With Box-Muller, we choose two random values, between -2^^31 and 2^^31 say, we plot them as x and y on a two dimensional plane. If (x,y) lies within the circle of radius size 2^^31, we keep the point, otherwise we go again. r is the distance from the origin to (x, y) squared.
Show this thread -
Here's a picture from Wikipedia, but basically we're throwing darts at a square and if they land in a circle we're good. It's amazing how much low-level stuff is dumb-as-rocks.pic.twitter.com/pQ97dvdD8x
Show this thread -
O.k. some parting thoughts before ending this thread. First, if you need to generate random numbers in constant time, or exotic distributions, get super deep into this stuff. There are seriously rough weeds to tackle.
Show this thread -
Second: if you find yourself building a whole RNG, it's really very hard, again, get deep in the weeds and learn about DRBGs, fork-safety, thread-safety, /dev/urandom, getrandom() and so on. Avoid if you can!
Show this thread -
Third: always use a secure RNG, your language or programming environment should have one. *Don't* ever seed an RNG yourself. One exception: for fuzz inputs and other tests, where you may want repeat deterministically for debugging. But DON'T LET IT LEAK INTO PRODUCTION.
Show this thread -
Another exception is games, where you may want to generate content and play based on a small seed value, BUT UNDERSTAND THAT THIS IS NOT SECURE.
Show this thread -
Last tip: always measure your little random functions with a histogram or whatever. I still code these wrong and have to check. Thanks for reading!
Show this thread
End of conversation
New conversation -
Loading seems to be taking a while.
Twitter may be over capacity or experiencing a momentary hiccup. Try again or visit Twitter Status for more information.