Information entropy

In 1948, while investigating the problem of efficiently transmitting information over a noisy communication channel, Claude Shannon proposed a revolutionary probabilistic approach to thinking about communication and, at the same time, created the first truly mathematical theory of entropy. His sensational ideas quickly gave rise to two main fields: information theory, which uses the notion of probability and ergodic theory to study the statistical characteristics of data and communication systems, and coding theory, which uses mainly algebraic and geometric tools to devise efficient codes.

The entropy of a discrete random event x, with possible states 1..n, is:

    H(x) = -\sum_{i=1}^{n} p(i) \log_2 p(i)

That is, the entropy of the event x is the sum, over all possible outcomes i, of the product of the probability of outcome i and the logarithm of that probability, taken with a negative sign. We can equally apply this to a probability distribution rather than a discrete-valued event.
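
A minimal sketch of this definition in Python (the function name shannon_entropy and the example distribution are illustrative choices, not part of the original article):

    import math

    def shannon_entropy(probabilities, base=2):
        """H = -sum(p * log_base(p)) over all outcomes with p > 0."""
        return -sum(p * math.log(p, base) for p in probabilities if p > 0)

    # A fair six-sided die has entropy log2(6), about 2.585 bits
    print(shannon_entropy([1/6] * 6))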

Shannon showed that any definition of entropy satisfying his assumptions will be of the form:

    H = -K \sum_{i=1}^{n} p_i \log p_i

where K is a constant (and is really just a choice of measurement units).
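
As an illustration (this identity is standard and not stated in the original text): taking K = 1 with base-2 logarithms measures the entropy in bits, while natural logarithms measure it in nats, and the two differ only by a constant factor:

    H_{\text{bits}} = -\sum_i p_i \log_2 p_i, \qquad H_{\text{nats}} = -\sum_i p_i \ln p_i = H_{\text{bits}} \cdot \ln 2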

Shannon defined a measure of entropy (H = − p1 log2 p1 − ... − pn log2 pn) that, when applied to an information source, could determine the minimum channel capacity required to reliably transmit the source as encoded binary digits. Shannon's formula can be derived by calculating the mathematical expectation of the amount of information contained in a digit from the information source. Shannon's entropy measure came to be taken as a measure of the uncertainty about the realization of a random variable. It thus served as a proxy capturing the concept of information contained in a message, as opposed to the portion of the message that is strictly determined (hence predictable) by inherent structures, for example redundancy in language structure or statistical properties relating to the occurrence frequencies of letter or word pairs, triplets, etc. See Markov chain.
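
The derivation mentioned above amounts to taking the expectation of the self-information −log2 p(x) of each outcome (a standard restatement, not a quotation from the article):

    H(X) = \mathbb{E}\left[-\log_2 p(X)\right] = -\sum_{i=1}^{n} p_i \log_2 p_i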

Shannon's definition of entropy is closely related to thermodynamic entropy as defined by physicists and many chemists. Boltzmann and Gibbs did considerable work on statistical thermodynamics, which became the inspiration for adopting the word entropy in information theory. There are relationships between thermodynamic and informational entropy. For example, Maxwell's demon reverses thermodynamic entropy using information, but acquiring that information exactly balances out the thermodynamic gain the demon would otherwise achieve.

It is important to remember that entropy is a quantity defined in the context of a probabilistic model for a data source. Independent fair coin flips have an entropy of 1 bit per flip. A source that always generates a long string of A's has an entropy of 0, since the next character will always be an 'A'. Empirically, the entropy of English text seems to be about 1.5 bits per character (try compressing it with the PPM compression algorithm!), though clearly that will vary from one text source to another. The entropy rate of a data source is the average number of bits per symbol needed to encode it.
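
A sketch that estimates the order-0 entropy of a string from its observed character frequencies, reproducing the two cases above (the function name and sample strings are illustrative):

    import math
    from collections import Counter

    def empirical_entropy(text):
        """Order-0 entropy estimate in bits per character, from observed frequencies."""
        counts = Counter(text)
        total = len(text)
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    print(empirical_entropy("HTHTTHHTHTTHHHTT"))  # equal mix of H and T: exactly 1 bit per symbol
    print(empirical_entropy("AAAAAAAAAAAAAAAA"))  # constant source: 0 bits per symbol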

  1. Many data bits may not convey information. For example, data structures often store information redundantly, or have identical sections regardless of the information in the data structure.
  2. The amount of entropy is not always an integer number of bits.
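
For instance (a standard worked example, not from the original article), a biased coin with p(heads) = 0.7 has entropy

    H = -0.7 \log_2 0.7 - 0.3 \log_2 0.3 \approx 0.881 \text{ bits},

which is not an integer number of bits.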

Entropy effectively bounds the performance of the strongest lossless (or nearly lossless) compression possible, which can be realized in theory by using the typical set or in practice using Huffman, Lempel-Ziv or arithmetic coding.
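
The following sketch (not from the original article) builds a Huffman code for a sample string and compares its average code length with the order-0 entropy; the source coding theorem guarantees that the average length can never drop below the entropy:

    import heapq
    import math
    from collections import Counter

    def huffman_code_lengths(freqs):
        """Return {symbol: code length in bits} for a Huffman code built from symbol frequencies."""
        heap = [(f, i, {sym: 0}) for i, (sym, f) in enumerate(freqs.items())]
        heapq.heapify(heap)
        tiebreak = len(heap)
        while len(heap) > 1:
            f1, _, d1 = heapq.heappop(heap)
            f2, _, d2 = heapq.heappop(heap)
            # Merging two subtrees pushes every symbol in them one level deeper
            merged = {sym: depth + 1 for sym, depth in {**d1, **d2}.items()}
            heapq.heappush(heap, (f1 + f2, tiebreak, merged))
            tiebreak += 1
        return heap[0][2]

    text = "this is an example for huffman coding"
    freqs = Counter(text)
    total = len(text)
    lengths = huffman_code_lengths(freqs)
    avg_len = sum(freqs[s] * lengths[s] for s in freqs) / total
    entropy = -sum((f / total) * math.log2(f / total) for f in freqs.values())
    print(f"entropy = {entropy:.3f} bits/char, Huffman average = {avg_len:.3f} bits/char")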

A common way to define entropy for text is based on the Markov model of text. For an order-0 source (each character is selected independently of the preceding characters), the binary entropy is:

    H_0 = -\sum_i p_i \log_2 p_i

where p_i is the probability of character i. For a first-order Markov source (one in which probabilities depend on the immediately preceding character but not on older history except through the immediately preceding character), the entropy rate is:

    H_1 = -\sum_i p_i \sum_j p_i(j) \log_2 p_i(j)

where i is a state (determined by certain preceding characters) and p_i(j) is the probability of character j given i as the previous character(s).
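
A sketch estimating both quantities from a sample text (the sample string and function names are illustrative; observed frequencies are used as probability estimates):

    import math
    from collections import Counter, defaultdict

    def order0_entropy(text):
        """H_0 = -sum_i p_i log2 p_i, estimated from single-character frequencies."""
        counts = Counter(text)
        n = len(text)
        return -sum((c / n) * math.log2(c / n) for c in counts.values())

    def order1_entropy(text):
        """H_1 = -sum_i p_i sum_j p_i(j) log2 p_i(j), estimated from character bigrams."""
        followers = defaultdict(Counter)
        for prev, cur in zip(text, text[1:]):
            followers[prev][cur] += 1
        n_pairs = len(text) - 1
        h = 0.0
        for prev, nexts in followers.items():
            state_total = sum(nexts.values())
            p_state = state_total / n_pairs          # p_i
            for count in nexts.values():
                p_cond = count / state_total         # p_i(j)
                h -= p_state * p_cond * math.log2(p_cond)
        return h

    sample = "the quick brown fox jumps over the lazy dog " * 20
    print(order0_entropy(sample), order1_entropy(sample))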

In general the b-ary entropy of a source \mathcal{S} = (S, P) with source alphabet S = {a1, ..., an} and discrete probability distribution P = {p1, ..., pn}, where pi is the probability of ai (say pi = p(ai)), is defined by:

    H_b(\mathcal{S}) = -\sum_{i=1}^{n} p_i \log_b p_i
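
Changing the base only rescales the entropy; this standard identity (not stated in the original text) connects the b-ary entropy to the binary one:

    H_b(\mathcal{S}) = \frac{H_2(\mathcal{S})}{\log_2 b}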

Another way to define the entropy function H (not using the Markov model) is by proving that H is uniquely defined (as mentioned earlier) if and only if H satisfies conditions 1) - 3):

1) H(p1, ..., pn) is defined and continuous for all p1, ..., pn with pi ∈ [0,1] for all i = 1, ..., n and p1 + ... + pn = 1. (Note that the function depends solely on the probability distribution, not on the alphabet.)

2) For all positive integers n, H satisfies

    H\left(\frac{1}{n}, \ldots, \frac{1}{n}\right) < H\left(\frac{1}{n+1}, \ldots, \frac{1}{n+1}\right)

3) For positive integers bi where b1 + ... + bk = n, H satisfies

    H\left(\frac{1}{n}, \ldots, \frac{1}{n}\right) = H\left(\frac{b_1}{n}, \ldots, \frac{b_k}{n}\right) + \sum_{i=1}^{k} \frac{b_i}{n} H\left(\frac{1}{b_i}, \ldots, \frac{1}{b_i}\right)
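
A small numerical check (not part of the original article) that the formula H(p1, ..., pn) = -Σ pi log2 pi satisfies the grouping condition 3), here with n = 6 split into groups of sizes 1, 2 and 3:

    import math

    def H(ps):
        """Shannon entropy in bits of a probability vector."""
        return -sum(p * math.log2(p) for p in ps if p > 0)

    n = 6
    groups = [1, 2, 3]                   # b1 + b2 + b3 = n
    lhs = H([1 / n] * n)                 # H(1/n, ..., 1/n)
    rhs = H([b / n for b in groups]) + sum((b / n) * H([1 / b] * b) for b in groups)
    print(lhs, rhs)                      # both equal log2(6), about 2.585 bits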