6.004 Basic of information

6.004 Basic of information

Q: what is good representation of information?

What is information? Information is data that resolves uncertainty about a particular fact.

$X$ is discrete random variable, which has $N$ possible values ($x_i$) and associated probability $p_i$, Information received when learning that choice was $x_i$:

$$I(x_i)=log_2(\frac{1}{p_i})$$

the unit is bits, the number of bits required to encode the choice.

The entropy, $H(X)$, of a discrete random variable $X$ is average amount of information received when learning the value of X:

$$H(X)=E(I(X))=\sum_ilog_2(\frac{1}{p_i})$$

encoding is a unambiguous mapping between bits string to set of possible data.

Encoding as binary tree is a good way to think.

Huffman’s algorithm, given a set of symbols and their probability, constructs an optima variable-length encoding 1 symbol per time.
Build encoding tree bottom up with lower probability.

If we do same algorithm but using multiple symbols, we can do better.

Hamming distance: same length bits, how many different bits in the same position.

single-bit error detection and correction by add hamming distance of valid word encoding.

Ref

6.004

tags: 6.004, Information