Shannon summarized his war work on secret communications in a 114-page opus, “A Mathematical Theory of Cryptography,” which he finished in 1945. The paper was immediately deemed classified and too sensitive for publication, but those who read it found a long treatise exploring the histories and methodologies of various secrecy systems. Moreover, he had offered a persuasive analysis of which methods might be unbreakable (what he called “ideal”) and which cryptographic systems might be most practical if an unbreakable system were deemed too complex or unwieldy. His mathematical proofs presented the few people cleared to read the paper with a number of useful insights and an essential observation: that language, especially the English language, is filled with redundancy and predictability. Indeed, he later calculated that English was about 75 to 80 percent redundant. This had ramifications for cryptography: The less redundancy you have in a message, the harder it is to crack its code. By extension, it also had implications for how you might send a message more efficiently. Shannon would often demonstrate that the sender of a message could strip out its vowels and still leave it intelligible. To illustrate Shannon’s point, David Kahn, a historian of cryptography who wrote extensively on Shannon, used the following example:
FCT S STRNGR THN FCTN
To transmit the message fact is stranger than fiction, one could send fewer letters. You could, in other words, compress it without subtracting any of its content. Shannon suggested, moreover, that it wasn’t only individual letters or symbols that were sometimes redundant. Sometimes you could take entire words out of a sentence without altering its meaning.
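A minimal sketch of the vowel-dropping idea, written here in Python: the message is Kahn’s, but the code itself and its character counts are only illustrative, not anything from Shannon’s or Kahn’s actual work.

```python
# Toy illustration of redundancy: drop the vowels from a message and count how
# many characters the recipient no longer needs to receive. English is
# redundant enough that a reader can usually restore them.
VOWELS = set("aeiouAEIOU")

def drop_vowels(message: str) -> str:
    """Return the message with every vowel removed."""
    return "".join(ch for ch in message if ch not in VOWELS)

original = "fact is stranger than fiction"
compressed = drop_vowels(original)
print(compressed)                                    # fct s strngr thn fctn
print(f"{len(compressed)} of {len(original)} characters sent")  # 21 of 29 characters sent
```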
The completion of the cryptography paper coincided with the end of the war. But Shannon’s personal project—the one he had been laboring on at home in the evenings—was largely worked out a year or two before that. Its subject was the general nature of communications. “There is this close connection,” Shannon later said of the link between sending an encoded message and an uncoded one. “They are very similar things, in one case trying to conceal information, and in the other case trying to transmit it.” In the secrecy paper, he referred briefly to something he called “information theory.” This was a bit of a coded message in itself, for he offered no indication of what this theory might say.
ALL WRITTEN AND SPOKEN EXCHANGES, to some degree, depend on code—the symbolic letters on the page, or the sounds of consonants and vowels that are transmitted (encoded) by our voices and received (decoded) by our ears and minds. With each passing decade, modern technology has tended to push everyday written and spoken exchanges ever deeper into the realm of ciphers, symbols, and electronically enhanced puzzles of representation. Spoken language has yielded to written language, printed on a press; written language, in time, has yielded to transmitted language, sent over the air by radio waves or through a metal cable strung on poles. First came telegraph messages—which contained dots and dashes (or what might just as well have been the 1s and 0s of Boolean algebra) that were translated back into English upon reception. Then came phone calls, which were transformed during transmission—changing voices into electrical waves that represented sound pressure and then interleaving those waves in a cable or microwave transmission. At the receiving end, the interleaved messages were pulled apart—decoded, in a sense—by quartz filters and then relayed to the proper recipients.
In the mid-1940s Bell Labs began thinking about how to implement a new and more efficient method for carrying phone calls. PCM—short for pulse code modulation—was a technique that was not invented at the Labs but was perfected there, in part with Shannon’s help and that of his good friend Barney Oliver, an extraordinarily able Bell Labs engineer who would later go on to run the research labs at Hewlett-Packard. Oliver would eventually become one of the driving forces behind the invention of the personal calculator. Shannon and Oliver had become familiar with PCM during World War II, when Labs engineers helped create secret communication channels between the United States and Britain by using the technology. Phone signals moved via electrical waves. But PCM took these waves (or “waveforms,” as Bell engineers called them) and “sampled” them at various points as they moved up and down. The samples—8,000 per second—could then be translated into on/off pulses, or the equivalent of 1s and 0s. With PCM, instead of sending waves along phone channels, one could send information that described the numerical coordinates of the waves. In effect what was being sent was a code. Sophisticated machines at a receiving station could then translate these pulses back into electrical waves, which would in turn (at a telephone) become voices again without any significant loss of fidelity. The reasons for PCM, if not its methods, were straightforward. It was believed that transmission quality could be better preserved, especially over long distances that required sending signals through many repeater stations, by using a digital code rather than an analog wave. Indeed, PCM suggested that telephone engineers could create a potentially indestructible format that could be periodically (and perfectly) regenerated as it moved over vast distances.
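A minimal sketch of that sampling-and-quantizing step: the 8,000-per-second rate comes from the passage above, but the 8-bit code words and the test tone are assumptions made for illustration, not details of the Bell System’s hardware.

```python
# Sketch of the PCM idea: sample a waveform 8,000 times per second, quantize
# each sample to one of 256 levels, and send the result as binary pulses
# instead of the analog wave itself.
import math

SAMPLE_RATE = 8_000        # samples per second, as in telephone PCM
BITS_PER_SAMPLE = 8        # 256 quantization levels; an assumption for this sketch

def pcm_encode(signal, duration_s=0.001):
    """Sample and quantize a waveform, returning one binary code word per sample."""
    code_words = []
    for i in range(int(SAMPLE_RATE * duration_s)):
        t = i / SAMPLE_RATE
        amplitude = signal(t)                          # expected range: -1.0 .. 1.0
        level = round((amplitude + 1.0) / 2.0 * (2**BITS_PER_SAMPLE - 1))
        code_words.append(format(level, f"0{BITS_PER_SAMPLE}b"))
    return code_words

# A 1 kHz test tone standing in for a voice waveform.
tone = lambda t: math.sin(2 * math.pi * 1000 * t)
print(pcm_encode(tone)[:4])    # first four 8-bit code words
```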
Shannon wasn’t interested in helping with the complex implementation of PCM—that was a job for the development engineers at Bell Labs, and would end up taking them more than a decade. “I am very seldom interested in applications,” he later said. “I am more interested in the elegance of a problem. Is it a good problem, an interesting problem?” For him, PCM was a catalyst for a more general theory about how messages move—or in the future could move—from one place to another. What he’d been working on at home during the early 1940s had become a long, elegant manuscript by 1947. In July 1948, soon after the press conference in lower Manhattan unveiling the invention of the transistor, the first part of Shannon’s manuscript was published as a paper in the Bell System Technical Journal; a second installment appeared in the Journal that October. “A Mathematical Theory of Communication”—“the Magna Carta of the information age,” as Scientific American later called it—wasn’t about one particular thing, but rather about general rules and unifying ideas. “He was always searching for deep and fundamental relations,” Shannon’s colleague Brock McMillan explains. And here he had found them. One of his paper’s underlying tenets, Shannon would later say, “is that information can be treated very much like a physical quantity, such as mass or energy.” To consider it on a more practical level, however, one might say that Shannon had laid out the essential answers to a question that had bedeviled Bell engineers from the beginning: How rapidly, and how accurately, can you send messages from one place to another?
“The fundamental problem of communication,” Shannon’s paper explained, “is that of reproducing at one point either exactly or approximately a message selected at another point.” Perhaps that seemed obvious, but Shannon went on to show why it was profound. If “universal connectivity” remained the goal at Bell Labs—if indeed the telecommunications systems of the future, as Kelly saw it, would be “more like the biological systems of man’s brain and nervous system”—then the realization of those dreams didn’t only depend on the hardware of new technologies, such as the transistor. A mathematical guide for the system’s engineers, a blueprint for how to move data around with optimal efficiency, would be crucial, too; that was what Shannon offered. Shannon maintained that all communications systems could be thought of in the same way, regardless of whether they involved a lunchroom conversation, a postmarked letter, a phone call, or a radio or television transmission. Messages all followed the same fairly simple pattern:
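information source → transmitter → signal (joined along the way by a noise source) → receiver → destination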
All messages, as they traveled from the information source to the destination, faced the problem of noise. This could be the background clatter of a cafeteria, or it could be static (on the radio) or snow (on television). Noise interfered with the accurate delivery of the message. And every channel that carried a message was, to some extent, a noisy channel.
To a non-engineer, Shannon’s drawing seemed sensible but didn’t necessarily explain anything. His larger point, however, as his mathematical proofs demonstrated, was that there were ways to make sure messages got where they were supposed to go, clearly and reliably. The first place to start, Shannon suggested, was to think about the information within a message. The semantic aspects of communication were irrelevant to the engineering problem, he wrote. Or to say it another way: One shouldn’t necessarily think of information in terms of meaning. Rather, one might think of it in terms of its ability to resolve uncertainty. Information provided a recipient with something that was not previously known, was not predictable, was not redundant. “We take the essence of information as the irreducible, fundamental underlying uncertainty that is removed by its receipt,” a Bell Labs executive named Bob Lucky explained some years later. If you send a message, you are merely choosing from a range of possible messages. The less the recipient knows about what part of the message comes next, the more information you are sending. Some language choices, Shannon’s research suggested, occur with certain probabilities, and some messages are more frequent than others; this fact could therefore lead to precise calculations of how much information these words or messages contained. (Shannon’s favorite example was to explain that one might need to know that the word “quality” begins with q, for instance, but not that a u follows. The u gives a recipient no information if they already have the q, since u always follows q; it can be filled in by the recipient.)
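A minimal sketch of how that measure works in practice: the less probable an outcome, the more bits it carries. The probabilities below are illustrative choices, not figures from Shannon’s paper.

```python
# Information conveyed by an outcome, measured in bits: -log2(probability).
# A certainty carries no information; an unlikely event carries a lot.
import math

def surprisal_bits(p: float) -> float:
    """Information, in bits, conveyed by receiving an outcome of probability p."""
    return 0.0 if p == 1.0 else -math.log2(p)

print(surprisal_bits(0.5))     # 1.0 bit: a fair coin toss
print(surprisal_bits(1.0))     # 0.0 bits: the u after a q, fully predictable
print(surprisal_bits(1 / 26))  # ~4.7 bits: a letter drawn uniformly from the alphabet
```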
Shannon proposed that it was most useful to calculate a message’s information content and rate in a unit that he suggested engineers call “bits”—a word that had never before appeared in print with this meaning. Shannon had borrowed it from his Bell Labs math colleague John Tukey as an abbreviation of “binary digits.” The bit, Shannon explained, “corresponds to the information produced when a choice is made from two equally likely possibilities. If I toss a coin and tell you that it came down heads, I am giving you one bit of information about this event.” All of this could be summed up in a few points that might seem unsurprising to those living in the twenty-first century but were in fact startling—“a bolt from the blue,” as one of Shannon’s colleagues put it—to those just getting over the Second World War: (1) All communications could be thought of in terms of information; (2) all information could be measured in bits; (3) all the measurable bits of information could be thought of, and indeed should be thought of, digitally. This could mean dots or dashes, heads or tails, or the on/off pulses that comprised PCM. Or it could simply be a string of, say, five or six 1s and 0s, each grouping of numerical bits representing a letter or punctuation mark in the alphabet. For instance, in the American Standard Code for Information Interchange (ASCII), which was worked out more than a decade after Shannon’s theory, the binary representation for FACT IS STRANGER THAN FICTION would be as follows:
01000110 01000001 01000011 01010100 00100000 01001001 01010011 00100000
01010011 01010100 01010010 01000001 01001110 01000111 01000101 01010010
00100000 01010100 01001000 01000001 01001110 00100000 01000110 01001001
01000011 01010100 01001001 01001111 01001110
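A short sketch of that encoding step, using Python’s built-in character tables; ASCII postdates Shannon’s paper, but the digital idea is the same.

```python
# Encode a message as 8-bit ASCII code words, one per character.
message = "FACT IS STRANGER THAN FICTION"
code_words = [format(ord(ch), "08b") for ch in message]

print(" ".join(code_words))
# 01000110 01000001 01000011 01010100 00100000 01001001 ...
print(len(code_words) * 8, "bits for", len(message), "characters")  # 232 bits for 29 characters
```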
Thus Shannon was suggesting that all information, at least from the view of someone trying to move it from place to place, was the same, whether it was a phone call or a microwave transmission or a television signal.
This was a philosophical argument, in many respects, and one that would only infiltrate the thinking of the country’s technologists slowly over the next few decades. To the engineers at the Labs, the practical mathematical arguments Shannon was also laying out made a more immediate impression. His calculations showed that the information content of a message could not exceed the capacity of the channel through which you were sending it. In much the same way that a pipe could only carry so many gallons of water per second and no more, a transmission channel could only carry so many bits of information at a certain rate and no more. Anything beyond that would reduce the quality of your transmission. The upshot was that by measuring the information capacity of your channel and by measuring the information content of your message you could know how fast, and how well, you could send your message. Engineers could now try to align the two—capacity and information content. To anyone who actually designed communications systems with wires or cables or microwave transmitters, Shannon had handed not only an idea but a new kind of yardstick.
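The quantitative form of that pipe analogy is Shannon’s capacity formula for a noisy channel, C = B log2(1 + S/N); the bandwidth and signal-to-noise figures below are illustrative, not values taken from the book.

```python
# Shannon's capacity for a band-limited channel with Gaussian noise, in bits
# per second: the "size of the pipe" that a message's information rate cannot exceed.
import math

def channel_capacity(bandwidth_hz: float, signal_to_noise: float) -> float:
    """Maximum error-free data rate, in bits per second, for a noisy channel."""
    return bandwidth_hz * math.log2(1 + signal_to_noise)

# Roughly the bandwidth of an analog phone line and a 30 dB signal-to-noise ratio:
print(round(channel_capacity(3_000, 1_000)))   # about 29,902 bits per second
```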
Excerpted from ‘The Idea Factory’ by Jon Gertner, pages 124–130