Information Theory without the Theory: I lied.

"The fundamental problem of communication is that of reproducing at one point, either exactly or approximately, a message selected at another point..." Claude Shannon

Back in 1948, a mathematician named Claude Shannon published a paper, "A Mathematical Theory of Communication", that quantified information in a new unit: the bit.

The idea is that once we can quantify information, we can perform transformations on data. Shannon started his research to find the fundamental limits of signal processing operations, such as compression, and of reliably storing and communicating data. As time has gone on, though, Information Theory has found uses in a number of other disciplines. Computer Science, Electrical Engineering and other Applied Mathematics disciplines have adapted Shannon's research and applied it in a number of different applications, many of which are now standard - take .zip files, for example.

In this series, I hope to show you the cool things that Information Theory has allowed us to do over the last half century, mainly in the area of Computer Science. While I want to keep away from just boring words and stupid theories and instead show code, some theory is necessary today.

Light On, Light Off.

I realise signal processing feels like a foreign term, so, going off on a tangent, here's a quick way to help visualise what is going on:
Think of Morse code. You know, flashing lights and whatnot. Well, say we write a quick flash (a dot) as 100 - that is, 1 when the light is on and 0 when the light is off - and a long flash (a dash) as 110. We have now assigned that signal a code. SOS in Morse code is ... --- ..., and in our binary code, 100 100 100 110 110 110 100 100 100 - we have encoded the same information, the same signal, into our own language.
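To make that concrete, here's a minimal Python sketch of our made-up encoding (the 100/110 codes are the ones we just invented, not real Morse timing):

    # Our made-up binary code: a dot (.) becomes "100" (on, off, off)
    # and a dash (-) becomes "110" (on, on, off).
    CODES = {".": "100", "-": "110"}

    def encode(morse: str) -> str:
        """Turn a Morse string like '... --- ...' into our binary code."""
        return " ".join(CODES[symbol] for symbol in morse if symbol in CODES)

    print(encode("... --- ..."))  # 100 100 100 110 110 110 100 100 100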

Funnily enough, that encoding manages to illustrate a problem with signal processing: sometimes our code is going to be pretty long. I mean, ... --- ... actually looks smaller than 100 100 100 110 110 110 100 100 100, and surely there is a way to use fewer zeros when we send out our code...

Well there is, it's Compression.

Compression (feat. Maths).

Take an image, one you made in Paint. Have you ever made a huge image in MS Paint and saved it as a .bmp? Was it huge? It probably was. That is because the image is saved as a bitmap. Even in a simple 200 x 400 image, filled entirely with red, we store the colour of each of the 80,000 (200 * 400) pixels separately. Since each colour takes up 3 bytes on your computer, you're storing 240 KB of data! In reality it would be easier to just store the size of the image and the colour together.
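To put rough numbers on that, here's a minimal back-of-the-envelope sketch in Python (assuming 3 bytes per pixel and ignoring the header a real .bmp file would add):

    # A 200 x 400 image, every pixel the same red, stored as a raw bitmap:
    width, height = 200, 400
    bytes_per_pixel = 3  # one byte each for red, green and blue
    raw_size = width * height * bytes_per_pixel
    print(raw_size)  # 240000 bytes, i.e. 240 KB

    # Versus just storing the size of the image and the colour together
    # (say, 4 bytes per dimension plus one 3-byte colour):
    compact_size = 4 + 4 + 3
    print(compact_size)  # 11 bytes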

And that is redundancy: the parts of the data that can be compressed, or referred back to later, to avoid being told twice what you already know. Determining what information is, and quantifying it, is what Information Theory is all about.

So, how could we remove redundant pieces of information? Well, like everything before, there is a little bit more background I need to give you first.

Take a probability, P.

  1. P = 0 means you don't believe the event will happen.
  2. P = 1 means you believe that it certainly will.
  3. Therefore, 0 < P < 1 means you're not sure of the outcome, but the event could occur.

So when tossing a coin, you know that the probability of getting heads should be P = 0.5.

Cool. Now, two events are classed as independent if they don't affect each other. Tossing a coin twice consists of two independent events, as the second result has nothing to do with the previous toss. To get the probability of two independent events both happening, you multiply: P = 0.5 * 0.5 = 0.25 is the probability of getting two heads in a row (and likewise for two tails).
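If you don't want to take that 0.25 on faith, you can watch it appear with a quick simulation (a minimal sketch using Python's random module):

    import random

    # Toss two fair coins a lot of times; count how often both come up heads.
    trials = 100_000
    both_heads = sum(
        random.random() < 0.5 and random.random() < 0.5
        for _ in range(trials)
    )
    print(both_heads / trials)  # hovers around 0.5 * 0.5 = 0.25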

We're getting there, I promise.

Let's go Shopping.

Take two stores. One is a Chemist, one is a Menswear store. Their respective head offices are doing market research, and are wondering what the gender balance of their patrons is. To differentiate between them, they assign 0 to males and 1 to females.

So the Chemist gets the data streamed to the head office:

1100110011001100110011001100110011001100 ...

However, it turns out the customers are 50% male, 50% female. Which is a problem in the world of Information Theory - there is no way to predict what the next customer's gender may be, so unfortunately each bit of data (each 1 and 0) carries a full bit of information.

What about the Menswear store? They know from previous research that 80% of their patrons are male. That's pretty cool - you can now safely guess that the next person into the store is indeed male.

So, with the magic of Compression, there must be a way to harness that probability, right?

What if we grouped the people entering the store into twos, and assigned each pair a short series of bits to say what they are (I'll explain how to come up with these particular codes in a later post, but for now, don't worry):

  • M + M = 0
  • M + F = 10
  • F + M = 110
  • F + F = 111

Wait a minute. We just tacked two extra bits onto the last two!
Indeed we did, but look at the probabilities:

  • M + M = 0.64
  • M + F = 0.16
  • F + M = 0.16
  • F + F = 0.04

So you can see that we are less likely to send the longer codes than the shorter ones. If you don't believe me, here is the maths (probability * bits for each pair, summed):

( 0.64 * 1 ) + ( 0.16 * 2 ) + ( 0.16 * 3 ) + ( 0.04 * 3 ) = 1.56

Boom. So for every two patrons, we end up sending 1.56 bits on average. You have successfully squeezed each one-bit person down to 0.78 bits, achieving 22% compression! Woohoo!
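Here's a minimal Python sketch of the whole scheme - the patron stream is randomly generated with the 80/20 split, so your exact averages will wobble around 1.56 and 0.78:

    import random

    # The pair codes from above: two patrons -> one short bit string.
    PAIR_CODES = {"MM": "0", "MF": "10", "FM": "110", "FF": "111"}

    # Simulate a stream of patrons: 80% male (M), 20% female (F).
    patrons = "".join(random.choice("MMMMF") for _ in range(10_000))

    # Encode the stream two patrons at a time.
    encoded = "".join(PAIR_CODES[patrons[i:i + 2]]
                      for i in range(0, len(patrons), 2))

    print(len(encoded) / (len(patrons) // 2))  # close to 1.56 bits per pair
    print(len(encoded) / len(patrons))         # close to 0.78 bits per patron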

And you can reliably work back from that data - I can recreate the exact sequence of patrons again (but more on how that works on another day).

This is pretty much what Claude Shannon proved: with an 80/20 spread, you can compress the data down to about 0.72 bits of information for each bit of data (our 0.78 gets close), letting you send fewer bits while keeping all the information you intended to send.
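For the curious, that 0.72 figure comes from Shannon's entropy formula, H = -P * log2(P) - (1 - P) * log2(1 - P). A quick check in Python:

    from math import log2

    def entropy(p: float) -> float:
        """Shannon entropy (in bits) of a yes/no event with probability p."""
        return -p * log2(p) - (1 - p) * log2(1 - p)

    print(entropy(0.8))  # ~0.72: the best possible bits per bit of data
    print(entropy(0.5))  # 1.0: the Chemist's 50/50 stream can't be compressed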

Phew. What a mission. I think that's enough to take in for one post.

Over the next few weeks, I want to keep looking into Information Theory and its subparts. I would also like to look at various algorithms and how, by using them on your data, you can compress, store and send information more reliably.

I don't know if it's just me or what, but this is the cool stuff of Computer Science, the stuff that makes things... work...

If you like me, and don't think I spurt utter poop, follow me on twitter @rack_jobinson. I promise I tweet, I really do!

Note: I want to thank Victoria University of Wellington's ENGR 101 class, where the Chemist/Menswear example came from. That course covered a lot of ground, but Information Theory has been the one lecture that has stuck with me ever since! I would thoroughly recommend looking into Information Theory on Google, and reading through the Wikipedia page. It's a lot more theory than what I went into, but the main ideas are there. Especially Entropy.