Humble Pi

Page 22

by Matt Parker

Scott runs a website where people can play games over email, which means he requires about twenty thousand dice rolls per day. People who play board games take their dice rolls seriously, so he went to all the effort in 2009 to build a machine capable of physically rolling enough dice. He was sure to engineer the Dice-O-Matic so it was future-proofed with plenty of spare capacity, hence the maximum output of 1.3 million rolls per day. Scott currently has about a million unused dice rolls saved on his server and the Dice-O-Matic fires up for an hour or two each day to top up the randomness reservoir, filling his house in Texas with the thundering sound of hundreds of dice rolling at once.

While it has the authentic charm of rolling actual dice, the Dice-O-Matic is clearly not the most efficient computer peripheral ever built. When the UK government announced its issue of premium bonds in 1956 it suddenly had the need to produce random numbers on an industrial scale. Unlike normal government bonds, which pay out a fixed amount of interest, the ‘interest’ on premium bonds is grouped into prizes and handed out to bond holders randomly.

And so ERNIE (Electronic Random Number Indicator Equipment) was built and powered up in 1957. It was designed by Tommy Flowers and Harry Fensom, who we now know had been involved in building the first computers to break Nazi codes during the Second World War (this was still classified information at the time). I have visited ERNIE, long decommissioned, at the Science Museum in London.fn1 Taller than me, and wider than me (by several metres, in fact), it looks exactly like you would expect a beige series of 1950s computer cabinets to look like. But I knew that in there somewhere was ERNIE’s heart of randomness: a series of neon tubes.

ERNIE pictured here with a puny human.

Neon tubes are normally used for illumination, but ERNIE used them to generate random numbers. The path of any given electron through the neon gas it is lighting up is chaotic, so the resulting current is largely random. Turning on a neon tube is like rolling quadrillions of nanodice at once. Which means that, even if the electrons are fed into a neon lamp at a very steady rate, they will all bounce around differently and come out at slightly different times. ERNIE took the current coming out of neon lamps and extracted the random noise to use as the basis for random numbers.

Over half a century later, premium bonds are still sold in the UK, with the prizes drawn once a month. The Electronic Random Number Indicator Equipment is now in its fourth iteration and ERNIE 4 uses thermal noise from transistors to generate random numbers: electrons are forced through resistors and the change in voltage and heat produced are used as the random noise.

If the Press Your Luck designers had really wanted an unbreakable system, they would have needed some kind of physical random system to connect to their Big Board. The board was already illuminated with an ostentatious number of lights; if a few of them had been neon lamps, they would have been good to go. A conveyor belt of dice in the next room would have also worked, if it could move fast enough. And for the ultimate in unpredictability, there are random quantum systems available off the shelf.

It does feel like overkill, but you can currently buy your own quantum random system for about €1,000. This contains an LED to emit photons into a beam splitter where quantum interactions determine which way that photon should go. Where the photon comes out determines the next bit of the random number. Plug it into your computer via USB and the base model will immediately start serving up 4 million random 1s and 0s every second. (Greater rates of randomness are available at higher price points.)

If you’re being random on a budget, the Australian National University have you covered. They have set up their own quantum random-number generator by listening to the sound of nothing. Even in a vacuum of nothingness there is something going on. Thanks to the quirks of quantum mechanics, it is possible for a particle and its antiparticle pair to spontaneously appear from literally nowhere and then annihilate each other faster than the universe can notice they shouldn’t be there. This means empty space is actually a writhing foam of particles popping in and out of reality.

In the Department of Quantum Sciences at ANU they have a detector listening to a vacuum and it converts the quantum foam into random numbers and then streams them live at https://qrng.anu.edu.au around the clock. For tech people, they have a great range of secure delivery systems (never use the built-in random.random() function in Python again!). And if you like your background noise binary, they have an audio version so you can listen in to the sounds of random.

Random by rote

Let’s say you’re really trying to cut back on your number budget. The bargain basement of randomness is a ‘pseudo-random’ number. Like a kind of off-brand version of the real thing, pseudorandom numbers look and taste a lot like the original but are made to much lower standards.

Pseudorandom numbers are what your computer, phone or anything without its own random number drive serves up when you ask for a random number. Most phones will have a built-in calculator and, if you tip it sideways, you should get the full range of scientific calculator options. I’ve just hit the ‘Rand’ random button on mine and 0.576450227330181 has popped up on the screen.fn2 After a second mash it now reads 0.063316529365828. Each time I get a new random number between zero and one, ready to be scaled up to whatever my random needs may be.

Having a pocket random-number generator is incredibly handy if you want to randomize all sorts of decisions in your life. When I go out for a drink with my brother we use a random number to decide who pays the bill (even lead digit: I pay; odd, he does). If you want to add some digits to the end of a password, now you can be less predictable. Need to give someone a believable fake phone number? The Rand button is your new friend.

But sadly, those numbers are not truly random. Like the Big Board, they are following a predetermined sequence of values. Except, instead of memorizing a list in advance, they generate it on the fly. Pseudorandom number generators use mathematical equations to generate numbers which have all the hallmarks of being random but are just pretending to be.

To make your own pseudorandom numbers, start with an arbitrary number with four digits. I’m going to use the year I was born: 1980. We now need a way to turn this into a new, seemingly unrelated four-digit number. If we cube it, we end up with 7,762,392,000, and I’m going to ignore the first digit and take the second through fifth positions: 7623. Repeating the process of cubing and removing digits gives us 4297, 9340, 1478, and so on.

This is a sequence of pseudorandom numbers. They are being procedurally generated with no uncertainty. The digits 9340 will always follow 4297, but not in any obvious way. My sequence is not great because there are only so many four-digit numbers and, eventually, we’ll hit the same one twice and the numbers will start to repeat in a loop. In this case, the 150th term is the same as the third term: 4297 again. That is then followed by 9340, and the same run of 147 numbers will repeat for ever. Real pseudorandom sequences use far more complicated calculations so the numbers don’t loop as quickly and to help to obfuscate the process.

I used 1980 as the first number to ‘seed’ my sequence, but I could have picked a different seed and got a different sequence. Industrial-grade pseudorandom algorithms will spit out completely different numbers for slight changes in their seeds. Even if you’re using a known pseudorandom generator, if you choose a ‘random’ seed, the numbers it spits out will be unpredictable. But the best pseudorandom number generator is of no use if you get lazy when seeding it. Ever since the early internet, web traffic has been kept safe by encrypting it with random numbers. But when one browser picked random numbers for use in Secure Sockets Layer (SSL) encryption, the seed could be easily guessed by other people who might want to listen in.

The World Wide Web burst into public consciousness around 1995 and, for me, nothing is more nineties than the Netscape Navigator web browser. Forget being saved by bells or having sex in the cities: for me, the nineties was a comet swirling around a capital N while I waited for a website to load. That was b
ack when everything was ‘cyber’ and people could use the phrase ‘information superhighway’ with a straight face.

When searching around for a seed to generate random numbers, Netscape would use a combination of the current time and its process identifiers. On most operating systems, whenever a program is running it is given a process ID number so your computer can keep track of it. Netscape would use the process ID of the current session as well as the process ID of the parent program which opened Netscape, combined with the current time (seconds and microseconds) to seed its pseudorandom-number generator.

But those numbers are not hard to guess. I’m currently using Chrome as my web browser and the window I last looked at has a process identifier of 4122. It was opened by a different Chrome window when I hit ‘new window’ and that parent window has a process identifier of 298. As you can see, these are not big numbers! If a malicious agent knew the rough time I opened that window before doing something in need of encryption (like logging in to my online bank), they could work out a list of all the possible combinations of times and process identifiers. It would be a long list to look at for a human, but not much for a computer to crunch through and check all the options.

In 1995 Ian Goldberg and David Wagner (then computer science PhD students at the University of California, Berkeley) showed that a clever malicious agent could produce a list of possible random seeds small enough that a computer could check them all in a matter of minutes, rendering the encryption useless.fn3 Netscape had previously turned down offers of help from the security community but, after the work of Goldberg and Wagner, they patched the problem and released their solution to be independently scrutinized by anyone who wanted to go through it with a fine code comb.

Modern browsers get their random seeds from the computer they are running on by mashing together over a hundred different numbers: as well as the time and process identifiers, they also use things like the current state of free space on the hard drive and the time between when the user hits keys or moves the mouse. Because using an amazing pseudorandom-sequence generator with an easy-to-guess seed is like buying an expensive lock and then using it as a doorstop. Or indeed buying an expensive lock and leaving the screws visible and unscrewable.

Random numbers fall mainly in the planes

Algorithms to generate pseudorandom numbers are constantly evolving and adapting. They need to balance their apparent randomness with being efficient, easy to use and secure. Because random numbers are vital to digital security, some of these algorithms are kept under lock and key. Microsoft has never released how Excel generates its pseudorandom numbers (nor are users allowed to choose their own seeds). Thankfully, enough of these algorithms are in the public domain that we can take a critical look at them.

One of the first standard methods for generating pseudorandom numbers was to multiply each number in your sequence by a large multiplier K then divide the answer by a different number M and keep the remainder as your next pseudorandom term. This was used by almost all early computers, until George Marsaglia, a mathematician at Boeing Scientific Research Laboratories, spotted a fatal flaw in 1968. If you took the sequence of random numbers coming out and plotted them as coordinates on a graph, they would line up. Admittedly, this could require complicated graphs with upwards of ten dimensions.

Marsaglia’s research was looking at these multiply-then-divide generators in general but, for a sloppy choice of K and M values, the situation could get much worse. And IBM nailed it when it came to a bad choice of K and M. The RANDU function used by IBM machines got each new pseudorandom number by multiplying by K = 65,539 and then dividing by M = 2,147,483,648, which are almost impressively bad. The K value is only three more than a power of two (specifically, 65,539 = 216 + 3) and, combined with a modulus which was also a power of two (2,147,483,648 = 231), all the supposedly random data ended up being disturbingly well organized.

While Marsaglia’s work had to use alignments in abstract mathematical spaces, the IBM random number could be plotted as 3D points which fall into just fifteen neat planes. That’s about as random as a spork.

Getting quality pseudorandom numbers continues to be a problem. In 2016 the Chrome browser had to fix its pseudorandom-number generator. Modern browsers are now pretty good at producing seeds for their pseudorandom numbers but, unbelievably, the generators themselves can still have problems. Chrome was using an algorithm called MWC1616, which was based on a combination of multiplication with carry (the MWC from the name) and concatenation to generate pseudorandom numbers. But it accidentally repeated itself, over and over. What a bore.

That’s not what you want random data to look like.

Some programmers had released a Chrome extension people could download and use. To anonymously keep track of everyone who had installed it, upon installation it would generate a random number as an arbitrary user ID and send that back to the company’s database. They had a graph in the office showing a nice increase in installations of their extension until, one day, the number of new installs dropped to zero. Had the whole world suddenly decided to stop using their extension? Or was there some fatal flaw in their code which had caused it to stop working?

No. Their extension was working fine and people were still installing it. But it used the JavaScript programming language and called the built-in function Math.random() to get a new user ID number for each new install. This worked fine for the first few million cases but, from then on, it was only returning numbers which had already been used. This meant that all new users looked like duplicates of those already in the database.

These user ID numbers were 256-bit values with possibilities measured in the quattuorvigintillions (around 1077). There is no way they should repeat so quickly. Something had gone wrong, and the MWC1616 algorithm was to blame. The pseudorandom numbers had looped. This was not the only case of Chrome users having random problems and, thankfully, the developers behind the browser’s JavaScript engine set about fixing the problem. As of 2016 Chrome has switched to an algorithm called xorshift128+ which provides pseudorandom numbers via a crazy amount of raising numbers to the power of other numbers.

So, for now, the world of pseudorandom numbers is calm and browsers are serving them up with no problems. But that does not mean it is the end of the story. One day xorshift128+ will be usurped. Anything involving computing power is a constant arms race, as more powerful computers can disentangle ever bigger numbers. It is only a matter of time before our current pseudorandom-number-generator algorithms are no longer fit for purpose. Hopefully, by then, a new generation of computer scientists will have given us something even better. We need more randomness in our lives.

Randomly wrong

When I was a high-school maths teacher one of my favourite pieces of homework to set was to ask students to spend their evening flipping a coin one hundred times and recording the results. They would each come back to class with a long list of heads and tails. I could then take those lists and, by the end of the lesson, I had split them into two piles: those who actually did the homework as requested and flipped a physical coin, and those who could not be bothered and just wrote out a long list of heads and tails off the top of their head.

Most cheating students would remember to have roughly as many heads as tails, as would be expected from a real, random coin, but they forgot about longer runs. Heads and tails on a fair coin are both equally likely and subsequent flips are independent, so this random data should be nice and uniform. This means it does not just have every possible event happening equally as often, but any combinations of events. Eight coin flips in a row will produce HTHHTHHH about as often as HTHTHTHH.

In the case of my students, they forgot that HHHHHH is as likely as any other run of six flips. And in a hundred random coin results you expect at least a run of six the same, if not more than ten in a row. Writing something like TTTTTTTTTT when you are faking random data feels wrong, but it’s what we should expect. Just like teenagers trying to cheat on their bori
ng homework is what we should expect.

Not that adults are very different. As the saying goes, there are only three things certain in life: death, taxes and people trying to cheat on their taxes. Fudging a tax return can require making up some random numbers to look like real financial transactions. And instead of a teacher checking homework, there are ‘forensic accountants’ going through tax returns to look for the tell-tale signs of fake data.

If financial fraud is not done randomly enough, it is easy to spot. There is a standard financial data check which involves looking at the first few digits of all available transactions and seeing if any are more or less frequent than expected. Deviations from the expected frequency do not necessarily mean something nefarious is going on, but where there are too many transactions to all be checked manually, the unusual ones are a good starting point. Investigators at a bank in the US analysed the first two digits of all credit-card balances where the bank had written off the debt as unrecoverable and there was a huge spike at 49. This was traced to a single employee, who was giving credit cards to friends and family who would then run them up to between $4,900 and $4,999 owed. The max the employee could write off without authorization was $5,000.

Even auditors themselves are not immune. A large auditing company ran a check of the first two digits on all expense claims put in by their employees. This time there were way more claims starting with 48 than there should have been. Again, a single employee was responsible. The auditors who worked for the firm could claim expenses when they were working offsite but one person was consistently claiming their breakfast on the way to the office as well. And they always bought the same coffee and muffin, which cost $4.82.

In these cases, if the people had been more random and less greedy, they could have camouflaged themselves in the rest of the data and not drawn attention to their transactions. But they would need to be the right type of random. Not all random data matches a uniform distribution like you would expect from flipping a fair coin. A dice with five numbers all the same and only one face different is still random, but the results will not be uniform. If you pick days at random you will not get equal numbers of weekdays and weekends. And if you shout, ‘Hey, Tom, it’s been ages, how are you?’ at strangers in the street, you will not get uniform responses (but when you accidentally get it right, it’s worth it).

‹ Prev Next ›