6.004 Basic of information
6.004 Basic of information
Q: what is good representation of information?
What is information? Information is data that resolves uncertainty about a particular fact.
$X$ is discrete random variable, which has $N$ possible values ($x_i$) and associated probability $p_i$, Information received when learning that choice was $x_i$:
$$I(x_i)=log_2(\frac{1}{p_i})$$
the unit is bits, the number of bits required to encode the choice.
The entropy, $H(X)$, of a discrete random variable $X$ is average amount of information received when learning the value of X:
$$H(X)=E(I(X))=\sum_ilog_2(\frac{1}{p_i})$$
encoding is a unambiguous mapping between bits string to set of possible data.
Encoding as binary tree is a good way to think.
Huffman’s algorithm, given a set of symbols and their probability, constructs an optima variable-length encoding 1 symbol per time.
Build encoding tree bottom up with lower probability.
If we do same algorithm but using multiple symbols, we can do better.
Hamming distance: same length bits, how many different bits in the same position.
single-bit error detection and correction by add hamming distance of valid word encoding.