Claude Shannon

Claude Shannon: The Most Underappreciated Mathematical Genius


Computer Science


Claude Shannon is one of the most underappreciated mathematicians/scientists of all time. People compare his contributions to that of Einstien, Turing, Newton made in their respective fields. He is known as the <strong>"father of information theory"</strong> after he published his paper <i>"A Mathematical Theory of Communication"</i>. This paper found a new field of <b>Information Theory<b> and is the basis of all the work done in the telecommunication industry. It was also the first time the term "bit" was used for representing information. Apart from that, he was also one of the first mathematicians ever to create probabilistic models using Markov chains and using them for automation. He was the first person ever to create an <b>AI chess player</b> and is also credited for creating the first probabilistic language model for predicting unknown (or next) letters/words in a sentence in the English language. Here we will discuss some of his work in AI. <h2><b><center>Use of Information Theory in Telecommunication</h2></b></center> The paper <i>"The Mathematical Theory of Communication"<i> was vastly based on probability theory but it talked about communication. Shannon's article laid out the basic elements of communication: <ul><li>An information source that produces a message</li> <li>A transmitter that operates on the message to create a signal which can be sent through a channel.</li> <li>A channel, is the medium over which the signal, carrying the information that composes the message, is sent.</li> <li>A receiver, which transforms the signal back into the message intended for delivery.</li> <li>A destination, which can be a person or a machine, for whom or which the message is intended.</li></ul> It also developed the concepts of information entropy and redundancy and introduced the term bit as a unit of information. He found that a channel had a certain maximum transmission rate that could not be exceeded. Today we call that the bandwidth of the channel. Shannon demonstrated mathematically that <strong>even in a noisy channel with low bandwidth, essentially perfect, error-free communication could be achieved by keeping the transmission rate within the channel's bandwidth</strong> and by using error-correcting schemes: the transmission of additional bits that would enable the data to be extracted from the noise-ridden signal. <h2><b><center>Word Generation using Probabilistic Models</h2></b></center> Shannon's paper on Mathematical Theory of Communication wasn't just a big breakthrough in the field of telecommunication. It also laid foundations for the field of Information Theory. In exploring the application of his newly founded theory of information to human language, thought of purely as a statistical source, Shannon measured how well simple <strong>n-gram models did at predicting, or compressing, natural text.</strong> An n-gram model means predicting the next element of the sequence by looking at the last 'n' known elements of the sequence. He used <i>Markov chain</i> to create a statistical model of the sequences of letters in a piece of English text. Markov chains are now widely used in speech recognition, handwriting recognition, information retrieval, data compression, and spam filtering. Markov model work by training the model of some already existing data and calculating probabilities of a sequence with 'n' elements <strong><i>given that</i></strong> a sequence of 'n-1' elements. For example: if the text has 100 occurrences of "th", with 60 occurrences of "the", 25 occurrences of "thi", 10 occurrences of "tha", and 5 occurrences of "tho", the Markov model of order 2 predicts that the next letter following the 2-gram "th" is 'e' with probability 3/5, 'i' with probability 1/4, 'a' with probability 1/10, and 'o' with probability 1/20. <h2><b><center>Computer Chess Program</h2></b></center> In 1949, Shannon presented a paper called <i>"Programming a Computer for Playing Chess"</i>. The paper presented a method to automate chess players and was the first attempt to do so. There were attempts to automate games before this. Computer Scientists (called Mathematicians at that time) used the Minimax algorithm to automate the Tic-tac-toe game. The <strong>Minimax algorithm</strong> tries to minimize the chances of losing the game by going through all the possible moves a player and the other player can play and give the best possible result. This was possible for tic-tac-toe as it's a small game with a small board (3 X 3) but replicating this on a game like chess where there are </i>way too many possible moves</i> was quite difficult (even today the best supercomputers will take minutes to make one move using minimax algorithm). Claude's process for having the computer decide on which move to make was a minimax procedure, based on an <strong>evaluation function</strong> of a given chess position. Shannon gave a rough example of an evaluation function in which the value of the black position was subtracted from that of the white position while each piece had a different rating, eg 1 point for a pawn, 3 points for a knight or bishop, etc. His algorithm would calculate the value to be maximized (to minimize the chances of loss) after "thinking" let's say the next 3 moves. In the 1990s, <i>IBM used a similar algorithm</i> to defeat the highest-rated Chess played at that time and became the first AI to do so. <h2><b><center>Shannon's mouse</h2></b></center> Shannon also used ML techniques to create an artificial mouse called <b>Theseus</b>. If you put the mouse in the maze it would first go through all possible paths it can go to and will reach the target position it was supposed to reach. After that, if you put the mouse in any place that it has already been to, it would be able to find its way to the target position without clashing with any walls. Basically <b>it would learn the map of the places it has been to.</b> Claude Shannon not only played a big role in the field of communication and information theory, but he also made major contributions to the field of Artificial Intelligence. The development of different AI-related techniques can be divided into two phases, one from the 1940s to 1966 (followed by an AI winter with no real progress in AI due to lack of funding) and one from the late 1980s to now. Claude Shannon played a big role in the first phase and his techniques are still quite useful though nowadays neural networks are found to do better work in most fields. Shannon's discoveries were way ahead of his time.

- Ojas Srivastava, 12:24 AM, 18 Jul, 2022

Information Theory