Though the subtitle pinpoints the topic a bit more precisely than just “information”, I began reading this not completely sure of where it would take me, although I hoped it would provide a discussion of the following three points:
1. information theory as a scientific field pioneered by people like Alan Turing and Claude Shannon in the mid-20th century
2. the idea that the universe itself is built on information, known popularly as “it from bit”
3. the exponential increase in our rate of information exchange over the last 20, 50, or 150 years (take your pick)
TI does eventually get to all of these, though only after a laborious first 200 pages that expend a lot of ink on early English dictionaries, early telegraph systems, and Charles Babbage. What I inferred from this, though the theme is never stated explicitly, is how murky technological progress really is – a murkiness that retrospective histories tend to minimize by neatly claiming that invention X was made by person P in year Y to solve Z. Through a long discussion of long-forgotten long-distance communication attempts and various telegraph encoding schemes, one sort of gets this picture. We also come to realize how the absence of technology we take for granted must have shaped our ancestors’ mental paradigms. For example, when asked to name some possible advantages of the newly invented telegraph, Babbage hesitated, then speculated that it could provide an early warning system for storms. Before the telegraph, no long-distance communication could travel faster than a horse, so people’s understanding of weather must have been very different from our own. Any theory that weather could be forecast based on events elsewhere, if it arose at all, would have been practically useless. Even Newton struggled to find the right words for the physical concepts he was trying to describe, such as force. The right term simply wasn’t in the language yet, and thus not in society’s consciousness.
But I feel the first half of the book has more to do with the nature of technological change, itself a complex topic, than with real information theory. The first chapter, on the African drum language, was to me the most interesting. It turns out that African villages had a system of long-distance communication long before the telegraph, though it took Europeans a long time to figure it out, since the drums carried only two tones combined with pauses. Nevertheless, they relayed complex messages. The key is that many African languages are tonal: words spelled alike mean very different things depending on the pitch at which they are spoken. The drums would simply bang out the tones of a phrase. However, each combination of tones corresponds to dozens of words, so the drum messages were laden with redundant words and stock phrases, letting the listener eliminate combinations that make no sense. This obviously takes a lot of practice, so it helps immensely to be born into the language. But come to think of it, we probably have a nontrivial amount of redundancy in our language too; by adding a few words or saying the same thing in multiple ways, we insure our core message against being misunderstood.
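To make the drummers’ trick concrete, here’s a toy sketch in Python. Everything in it – the tone patterns, the vocabulary, the stock phrases – is invented for illustration, not real drum-language data; the point is only that redundant context lets a listener collapse an ambiguous tone sequence into a single message:

```python
# Tone patterns (H = high, L = low) each map to many candidate words.
# Vocabulary and patterns below are invented for illustration.
CANDIDATES = {
    "HLH": {"moon", "fowl", "river"},
    "LHH": {"looks", "rises", "burns"},
    "HHL": {"above", "red"},
    "LHL": {"down", "small"},
    "LLH": {"earth", "canoe", "sky"},
}

# Stock phrases the listener already knows; redundant wording lets
# them reject candidate combinations that never occur together.
KNOWN_PHRASES = {
    ("moon", "rises", "above", "earth"),
    ("fowl", "looks", "down", "earth"),
}

def decode(tone_patterns):
    """Return the known phrases consistent with every tone pattern."""
    return [
        phrase for phrase in KNOWN_PHRASES
        if len(phrase) == len(tone_patterns)
        and all(word in CANDIDATES[t]
                for word, t in zip(phrase, tone_patterns))
    ]

print(decode(["HLH", "LHH", "HHL", "LLH"]))
```

Both stock phrases open with the same ambiguous HLH pattern, but only one is consistent with the whole sequence – the redundancy does the disambiguating.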
The best material in TI is found somewhere between pages 200 and 350. Here we delve into Claude Shannon’s scientific definitions of information and follow with applications to genetics, highlighted by the always fascinating perspectives of Richard Dawkins. Shannon defined the information content of a message in terms of the minimum number of bits required to transmit it. Thus, if many messages are possible, the chosen message carries more information. If there are only two possible weather forecasts (hot or cold), one bit suffices. If instead there are 2^10 possible forecasts (covering variables such as precipitation, temperature, wind, etc.), selecting one requires 10 bits – much more information. Switching to another example, if the set of possible messages includes anything that can be written in English, encoding one letter would seemingly require 5 bits (26 letters: 2^4 = 16 is too few, but 2^5 = 32 is enough). However, some letters occur more frequently than others, and some combinations of letters and words occur more frequently than others, so Shannon estimated that the average information content of each English letter is actually closer to one bit. To illustrate this, he selected a random passage from a book and asked his wife to predict each letter as he uncovered them with his thumb – this becomes easier as larger portions of words and phrases are revealed, and the more predictable the next letter is, the less information it carries.
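Shannon’s bookkeeping is easy to reproduce. Here’s a small Python sketch – the skewed letter distribution at the end is invented for illustration, not real English frequency data – showing that equally likely messages cost log2(N) bits, while a lopsided distribution drives the average below that:

```python
import math

# Equiprobable messages: information = log2(number of possibilities).
assert math.log2(2) == 1.0         # hot/cold forecast: 1 bit
assert math.log2(2 ** 10) == 10.0  # 1024 forecasts: 10 bits

# Shannon entropy of a probability distribution, in bits per symbol.
def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# 26 equally likely letters cost log2(26), about 4.7 bits each...
uniform = [1 / 26] * 26
print(round(entropy(uniform), 2))

# ...but a lopsided distribution (invented numbers, not real English
# frequencies) drives the average well below that.
skewed = [0.5, 0.2, 0.1] + [0.2 / 23] * 23
print(round(entropy(skewed), 2))
```

Predictability is the whole game: the more probability mass piles onto a few likely symbols, the fewer bits each symbol carries on average, which is exactly what Mrs. Shannon’s guessing demonstrated.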
Shannon also equated his concept of information to the thermodynamic concept of entropy (a measure of the number of states a system can occupy, popularly known as its degree of disorder). Since high-entropy systems have more possible states, it takes more information to describe them. However, this link between entropy and information is mathematical; it gets confusing if we try to attach semantic meaning to it. For example, suppose we have a pool of water divided into hot and cold partitions. If we remove the divider, there are suddenly many more possible arrangements of water molecules, thus higher entropy, and it takes more information to describe the particular state the system is in at a molecular level. At a macro level, the molecules will probabilistically mix and converge to an equilibrium temperature. To many non-physicists, this would imply a loss of useful information, and indeed some of Shannon’s contemporaries flipped the sign of his formulas and defined information as negative entropy. On top of that, it can be confusing to describe high entropy as “disorder”: since entropy predicts a relatively homogeneous equilibrium, some would call such a state more “orderly” than before. As in Newton’s time, we struggle with these concepts partly because they have not yet been reconciled with common language.
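The pool-of-water example can be caricatured with a toy lattice model (my own invented setup, not anything from the book): count the possible arrangements of n molecules before and after removing the divider, and take log2 to get the bits needed to specify one exact arrangement:

```python
import math

# Toy lattice model of the pool: n "hot" molecules occupy distinct
# cells. With the divider in, they are confined to the left half
# (m cells); removing it makes all 2*m cells available.
def microstates(cells, molecules):
    return math.comb(cells, molecules)

def bits(cells, molecules):
    # Bits needed to single out one exact arrangement.
    return math.log2(microstates(cells, molecules))

m, n = 50, 10
confined = bits(m, n)    # divider in place
mixed = bits(2 * m, n)   # divider removed
print(round(confined, 1), round(mixed, 1))

# More possible arrangements -> higher entropy -> more bits to pin
# down the molecular state, even as the macro picture gets blander.
assert mixed > confined
```

This is the sense in which mixing *increases* information: the bits measure how much it costs to describe the microscopic state, not how useful the macroscopic state is to us.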
Maybe we could try to think about information in a more abstract sense instead. The universe, in a very general sense, can be thought of as a giant medium of energy exchange, with matter being a special form of energy. We describe the sun, stars, other physical objects, radiation, etc. by what they do with energy. Likewise, all life forms on earth ultimately can be described by how they convert energy. But it might make more sense to think about information exchange instead of energy exchange. That light beam from a star 10,000 light years away is encoding information about a particular area of spacetime. A photograph can be described in a physical sense by the atoms and energy it contains, but it also is information, preserving a particular pattern of information from another point in spacetime. A person, by his very existence, is encoding information in many forms, not the least of which is his genetic code. We tend to think of information as being consciously communicated among people, but a much more general type of information exists in ourselves long before we try to attach meaning to it.
That physical processes can be described in terms of information exchange is not, I think, too much of a philosophical stretch. Some have, however, gone one step further by suggesting that the universe is actually built on information (mathematics or bits) in a physical sense. There’s an old snippet of advice for college freshmen that what you used to know as psychology is really biology, but biology is really chemistry, and chemistry is really physics, and physics is really math. The fundamental laws of the universe govern the behavior of matter and energy, which we call physics. However, because of these laws, matter exhibits very regular forms and properties, and bonds together repeatedly. We find it more useful to describe these phenomena not by the fundamental laws themselves, but by shortcut formulas and descriptions we call chemistry. Likewise, we are familiar with certain stable and self-replicating sets of molecules whose chemical processes we describe as biology.
Going the other way, one could postulate that the fundamental laws are all that really “exist”, and are really a specific mathematical pattern that we experience as matter, energy, and spacetime. Everything else – physics, chemistry, biology – is a substructure of the pattern, which we find useful to label as ‘atoms’, ‘electromagnetic waves’, and ‘chemical bonds’ because they occur regularly within the pattern. As a much simplified analogy, consider how the digit 0 appears regularly in the sequence of integers (0, 10, 20, … 100, 101, 102, …). If it seems far-fetched that matter itself could somehow be a (rather complicated) pattern of bits, consider that human minds have great difficulty comprehending large numbers and patterns, and the complexity that can arise from them. Before modern times, I don’t think anyone could have fathomed that the program describing him was written in a four-letter genetic alphabet. A pattern of a few billion of those letters, arranged into 30,000 words called genes, is what builds us. The evidence is now plainly in front of us, but I don’t think we really understand what happens when 30,000 genes are all interacting among themselves plus an external environment influenced by still more genes. A few years ago, the mapping of our genome came with great hope for genetic-based therapies in medicine. It still might happen, but we’ve only made progress on a few very specific cases, and our most successful model (for intervening in our own genetics or any such complicated system) is still trial and error – lots of trial, lots of error. We find shortcuts to describe the system here and there, we see its ultimate progress, but we’re not close to understanding how all the little parts work together. We cannot reliably predict what changing a bit of the genome will do, not without lots of trial and error.
So I don’t dismiss the idea that something as simple as information could turn out to drive systems we observe in the universe at many levels, including those of fundamental physical units.
The biggest mystery of the universe to me is not its enormity, but why, inside an observable universe 10^26 times larger than us, composed of atoms on the order of 10^11 times smaller than us, we experience a singular consciousness. How does this consciousness come about? Why do I wake up every day feeling that I am the same entity, instead of something else? It cannot be something in the atoms themselves; the atoms within a human body are recycled about once every seven years. It has to be the pattern those atoms make. Once again, the pattern that defines us is ironically more complex than we can comprehend – billions of neurons connected in trillions of possible ways. The best I can speculate is that consciousness is closely related to memory. What makes a singular individual is that he can recall many past experiences all linked by his own participation in them, thus forming a sense of being. These memories are stored as neural patterns, which encode information. The brain must develop this ability and build an identity by accumulating many memories, which explains why I have no recollection of being conscious as an infant. Thus there must be degrees of consciousness; full consciousness does not come about all at once. One of the features of a conscious being, far away on a spectrum from, say, a collection of organic molecules, is the ability to alter behavior based on past experience – in other words, to behave based on large amounts of stored information in addition to what can be immediately sensed. Even if I cannot see ‘Dad’ now, I have the concept of ‘Dad’ stored as a particular neural pattern, and through memory I can learn from repeated past occurrences that ‘Dad’ will be present late in the day after ‘Dad’ finishes ‘Work’, and can make decisions accordingly now.
And why should I expect ‘Dad’ to give me attention later? Well, because Dad and I share a lot of genetic code, and successful genes have evolved that prompt an organism which they control to also promote the survival and well-being of other organisms that have copies of the same genes. Dawkins famously pointed out that the evolutionary idea that genes are the means by which organisms replicate is backwards – we are the means by which genes replicate. All the machinery that comprises us – from a nuclear membrane to a cell membrane to organs and brains – is scaffolding that genes built around themselves to increase their own chances of survival. It’s the genes – the genetic code, the information – that’s evolving and trying to spread. We are hosts for propagating the information that is our genetic code.
And that’s not all we’re propagating. Dawkins also came up with the idea of a ‘meme’ – an idea, catchphrase, tune, image, cliche – which survives and spreads by leaping from brain to brain. (Recursively, ‘meme’ as a concept is itself a meme.) Memes don’t have to be logically true to survive; they must be memorable and make us want to pass them on. As Nicholas Humphrey wrote, “When you plant a fertile meme in my mind you literally parasitize my brain, turning it into a vehicle for the meme’s propagation in just the way that a virus may parasitize the genetic mechanism of a host cell. And this isn’t just a way of talking – the meme for, say, ‘belief in life after death’ is actually realized physically, millions of times over, as a structure in the nervous systems of individual men the world over.” If you find any of the ideas in this article worth remembering and passing on, our brains will have served as replicators for the ideas themselves.
Here’s an idea I find worth remembering (and I’m paraphrasing Dawkins): that the process of evolution in a biological sense is really a special instance of a more general type of evolution – the evolution of information. For information to be passed on it needs a replicator, like a gene or meme. For information to evolve, the replication process must introduce mutations that differentially affect the survival of the replicators. The surviving memes are those which are successful at implanting themselves in our brains and getting us to pass them on.
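That general recipe – replicate, mutate, select – fits in a few lines of code. The sketch below is my own toy illustration, with an invented “memorability” score standing in for whatever actually makes a meme spread:

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

ALPHABET = "ab"

def mutate(meme, rate=0.1):
    # Imperfect replication: each character occasionally flips.
    return "".join(
        random.choice(ALPHABET) if random.random() < rate else ch
        for ch in meme
    )

def fitness(meme):
    # Invented stand-in for "memorability": how many 'a's it contains.
    return meme.count("a")

def evolve(population, generations=30, keep=10):
    for _ in range(generations):
        # Each replicator leaves three imperfect copies...
        offspring = [mutate(m) for m in population for _ in range(3)]
        # ...and only the most "memorable" copies survive to replicate.
        population = sorted(offspring, key=fitness, reverse=True)[:keep]
    return population

start = ["bbbbbbbb"] * 10
final = evolve(start)
print(final[0], fitness(final[0]))
```

Nothing in the loop knows anything about biology or brains; give any kind of information a replicator, a mutation rate, and differential survival, and it evolves.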