The American engineer R. Hartley in 1928 considered the process of obtaining information as the selection of one message from a finite, predetermined set of N equally probable messages, and the amount of information I contained in the selected message was defined as the binary logarithm of N.
Hartley's formula:
I = log2 N.
Let's say you need to guess one number from a set of numbers from one to one hundred. Using Hartley's formula, you can calculate how much information is required for this: I = log2 100 ≈ 6.644. Thus, a message about a correctly guessed number contains an amount of information approximately equal to 6.644 bits.
Let's give other examples of equally probable messages:
1. when tossing a coin: "it came up heads", "it came up tails";
2. on a book page: "the number of letters is even", "the number of letters is odd".
Let us now determine whether the messages "a woman will be the first to leave the door of the building" and "a man will be the first to leave the door of the building" are equally probable. It is impossible to answer this question unambiguously. It all depends on what kind of building we are talking about. If it is, for example, a metro station, then the probability of leaving the door first is the same for a man and a woman, but if it is a military barracks, then for a man this probability is much higher than for a woman.
For problems of this kind, the American scientist Claude Shannon proposed in 1948 another formula for determining the amount of information, taking into account the possible unequal probability of messages in the set.
Shannon's formula:
I = − (p1 log2 p1 + p2 log2 p2 + ... + pN log2 pN),
where pi is the probability that exactly the i-th message is selected from the set of N messages.
It is easy to see that if the probabilities p1, ..., pN are equal, then each of them is equal to 1/N, and Shannon's formula turns into Hartley's formula.
Claude Shannon defined information as removed uncertainty. More precisely, obtaining information is a necessary condition for removing uncertainty. Uncertainty arises in a situation of choice. The task solved in the course of removing uncertainty is reducing the number of options under consideration (reducing diversity) and, ultimately, choosing one option appropriate to the situation from among the possible ones. Removing uncertainty makes it possible to make informed decisions and take action. This is the controlling role of information.
Imagine that you went into a store and asked them to sell you chewing gum. A saleswoman who has, say, 16 varieties of chewing gum is in a state of uncertainty. She cannot comply with your request without obtaining additional information. If you specified, say, “Orbit”, and out of 16 initial options the saleswoman now considers only 8, you have reduced her uncertainty by half (looking ahead, let’s say that halving the uncertainty corresponds to receiving 1 bit of information ). If you, without further ado, simply pointed your finger at the display window - “this one!”, then the uncertainty was completely removed. Again, looking ahead, let's say that with this gesture in this example you gave the saleswoman 4 bits of information.
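To make the counting in this example concrete, here is a minimal Python sketch (the numbers 16, 8 and 1 are taken from the chewing-gum scenario above):

```python
from math import log2

# Information received = log2(options_before / options_after)
options = 16
print(log2(options / 8))   # naming the brand leaves 8 options -> 1.0 bit
print(log2(options / 1))   # pointing at one pack removes all uncertainty -> 4.0 bits
```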
A situation of maximum uncertainty assumes the presence of several equally probable alternatives (options), i.e. no option is preferable. The more equally probable options there are, the greater the uncertainty, the more difficult it is to make an unambiguous choice, and the more information is required to make it. For N options, this situation is described by the probability distribution {1/N, 1/N, …, 1/N}.
The minimum uncertainty is 0; this is a situation of complete certainty, meaning that the choice has been made and all the necessary information has been received. The probability distribution for a situation of complete certainty looks like this: {1, 0, …, 0}.
The quantity characterizing the amount of uncertainty in information theory is denoted by the symbol H and is called entropy, or more precisely, information entropy.
Entropy (H) is a measure of uncertainty, expressed in bits. Entropy can also be considered a measure of the uniformity of the distribution of a random variable.
Fig. 3.4 Behavior of entropy for the case of two alternatives
Fig. 3.4 shows the behavior of entropy for the case of two alternatives as the ratio of their probabilities (P, 1 − P) changes.
The entropy reaches its maximum value in this case when both probabilities are equal to each other and equal to 1/2; the zero entropy value corresponds to the cases (P0 = 0, P1 = 1) and (P0 = 1, P1 = 0).
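The curve in Fig. 3.4 can be reproduced with a short Python sketch; the function below computes H(P) = −P·log2 P − (1 − P)·log2(1 − P) and shows that the maximum of 1 bit is reached at P = 0.5:

```python
from math import log2

def binary_entropy(p: float) -> float:
    """Entropy of a two-outcome situation with probabilities p and 1-p, in bits."""
    if p in (0.0, 1.0):          # by convention 0*log2(0) = 0
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

for p in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(f"P = {p:4.2f}  H = {binary_entropy(p):.3f} bits")
# H is 0 at P = 0 and P = 1, and reaches its maximum of 1 bit at P = 0.5
```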
The amount of information I and the entropy H characterize the same situation, but from qualitatively opposite sides. I is the amount of information required to remove the uncertainty H. According to Leon Brillouin's definition, information is negative entropy (negentropy).
When uncertainty is completely removed, the amount of information received I is equal to the initial uncertainty H.
When uncertainty is only partially removed, the amount of information received and the remaining, unresolved uncertainty add up to the initial uncertainty: Ht + It = H (Fig. 3.5).
Fig. 3.5 Relationship between entropy and amount of information
For this reason, the formulas presented below for calculating the entropy H are also formulas for calculating the amount of information I: when uncertainty is completely removed, H in them can be replaced by I.
In general, the entropy H and the amount of information I obtained as a result of removing uncertainty depend on the initial number of options considered, N, and the a priori probabilities of realization of each of them, P: {p0, p1, …, pN−1}, i.e. H = F(N, P). The entropy in this case is calculated using Shannon's formula, proposed by him in 1948 in the article "A Mathematical Theory of Communication".
In the special case when all options are equally probable, only the dependence on the number of options considered remains, i.e. H = F(N). In this case, Shannon's formula is significantly simplified and coincides with Hartley's formula, which was first proposed by the American engineer Ralph Hartley in 1928, i.e. 20 years earlier.
Shannon's formula is as follows:
H = − (p0 log2 p0 + p1 log2 p1 + … + pN−1 log2 pN−1). (2.1)
The minus sign in formula (2.1) does not mean that entropy is negative. It is explained by the fact that pi ≤ 1 by definition, and the logarithm of a number less than one is negative. By the property of the logarithm, −log2 pi = log2(1/pi), so the formula can also be written in a second form, without the minus sign before the sum.
The expression Ii = log2(1/pi) is interpreted as the partial amount of information obtained when the i-th option is realized. The entropy in Shannon's formula is an average characteristic: the mathematical expectation of the distribution of the random variable {I0, I1, …, IN−1}.
Let's give an example of calculating entropy using Shannon's formula. Let the staff of some institution be distributed as follows: 3/4 are women, 1/4 are men. Then the uncertainty about, for example, whom you will meet first when entering the institution is calculated by the series of steps shown in Table 3.1.
Table 3.1
| | pi | 1/pi | Ii = log2(1/pi), bits | pi · log2(1/pi), bits |
| W (women) | 3/4 | 4/3 | log2(4/3) ≈ 0.42 | 3/4 · 0.42 ≈ 0.31 |
| M (men) | 1/4 | 4/1 | log2(4) = 2 | 1/4 · 2 = 0.5 |
| Σ | | | | H ≈ 0.81 bits |
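The same calculation can be checked in a few lines of Python; the probabilities 3/4 and 1/4 are those of the example above, and the function works for any distribution:

```python
from math import log2

def shannon_entropy(probs):
    """H = -sum(p * log2(p)) over all outcomes with p > 0, in bits."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(round(shannon_entropy([3/4, 1/4]), 2))   # 0.81 bits, as in Table 3.1
```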
We have already mentioned that Hartley's formula is a special case of Shannon's formula for equally probable alternatives.
Substituting into formula (2.1) the value 1/N for each pi (in the equally probable case this value does not depend on i), we get:
H = − (1/N · log2(1/N) + … + 1/N · log2(1/N)) = −N · (1/N) · log2(1/N) = log2 N.
Thus, Hartley's formula looks very simple:
H = log2 N. (2.2)
It clearly follows from this that the greater the number of alternatives ( N), the greater the uncertainty ( H). Logarithm to base 2 reduces the number of options to units of information - bits. Figure 3.6 shows the dependence of entropy on the number of equally probable choices.
Fig. 3.6 Dependence of entropy on the number of equally probable choices (equivalent alternatives)
To solve the inverse problem, when the uncertainty (H), or the amount of information (I) obtained as a result of removing it, is known and you need to determine how many equally probable alternatives correspond to this uncertainty, the inverse Hartley formula is used, which looks even simpler:
N = 2^H. (2.3)
For example, if it is known that, as a result of determining that Kolya Ivanov of interest to us lives on the second floor, 3 bits of information were obtained, then the number of floors in the house can be determined using formula (2.3) as N = 2^3 = 8 floors.
If the question is: “The house has 8 floors, how much information did we receive when we learned that Kolya Ivanov of interest to us lives on the second floor?”, we need to use formula (2.2): I = log2(8) = 3 bits.
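Both directions of this calculation, assuming equally probable floors, can be written directly:

```python
from math import log2

floors = 2 ** 3          # inverse Hartley formula: N = 2^I, 3 bits -> 8 floors
bits = log2(8)           # Hartley formula: I = log2 N, 8 floors -> 3.0 bits
print(floors, bits)
```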
Until now, we have given formulas for calculating the entropy (uncertainty) H, noting that H in them can be replaced by I, because the amount of information received when the uncertainty of some situation is completely removed is quantitatively equal to the initial entropy of that situation.
But uncertainty can also be removed only partially, so the amount of information I received from some message is calculated as the decrease in entropy that results from receiving that message.
For the equally probable case, using Hartley's formula to calculate the entropy, we obtain:
I = H_before − H_after = log2 N_before − log2 N_after = log2(N_before / N_after). (3.5)
The second equality follows from the properties of the logarithm. Thus, in the equally probable case, I depends on how many times the number of options considered (the diversity considered) has changed.
Based on (3.5), the following can be deduced:
If N_after = 1, then I = log2 N_before = H: complete removal of uncertainty; the amount of information received in the message is equal to the uncertainty that existed before the message was received.
If N_after = N_before, then I = log2 1 = 0: the uncertainty has not changed and, therefore, no information was received.
If N_after < N_before, then I > 0;
if N_after > N_before, then I < 0.
I.e., the amount of information received will be positive if, as a result of receiving the message, the number of alternatives under consideration has decreased, and negative if it has increased.
If the number of alternatives considered is halved as a result of receiving the message, i.e. N_after = N_before / 2, then I = log2 2 = 1 bit. In other words, receiving 1 bit of information excludes half of the equivalent options from consideration.
Let us consider, as an example, an experiment with a deck of 36 cards (Fig. 3.7).
Fig. 3.7 Illustration for an experiment with a deck of 36 cards
Have someone take one card from the deck. We are interested in which of the 36 cards he took out. The initial uncertainty, calculated using formula (2.2), is H = log2 36 ≈ 5.17 bits. The person who draws the card tells us part of the information. Using formula (3.5), we determine how much information we receive from each of these messages:
Option A: “This is a red card.”
I = log2(36/18) = log2(2) = 1 bit (half of the cards in the deck are red; the uncertainty has decreased by 2 times).
Option B: “This is a card of the spades suit.”
I=log2(36/9)=log2(4)=2 bits (cards of spades make up a quarter of the deck, uncertainty has decreased by 4 times).
Option C. “This is one of the high cards: jack, queen, king or ace.”
I=log2(36)–log2(16)=5.17-4=1.17 bits (the uncertainty has decreased by more than half, so the amount of information received is more than one bit).
Option D. “This is one card from the deck.”
I=log2(36/36)=log2(1)=0 bits (uncertainty has not decreased - the message is not informative).
Option E. “This is the queen of spades.”
I=log2(36/1)=log2(36)=5.17 bits (uncertainty completely removed).
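The five options can be verified with a short Python loop; each value is I = log2(36 / n), where n is the number of cards compatible with the message:

```python
from math import log2

DECK = 36
messages = {
    "A: a red card": 18,
    "B: a spade": 9,
    "C: a high card (J, Q, K, A)": 16,
    "D: some card from the deck": 36,
    "E: the queen of spades": 1,
}
for text, remaining in messages.items():
    print(f"{text:30s} I = {log2(DECK / remaining):.2f} bits")
```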
Task 1. How much information will be contained in the visual message about the color of the removed ball if there are 50 white, 25 red, 25 blue balls in an opaque bag?
Solution.
1) total balls 50+25+25=100
2) ball probabilities 50/100=1/2, 25/100=1/4, 25/100=1/4
3) I = −(1/2 · log2(1/2) + 1/4 · log2(1/4) + 1/4 · log2(1/4)) = −(1/2 · (−1) + 1/4 · (−2) + 1/4 · (−2)) = 0.5 + 0.5 + 0.5 = 1.5 bits
Task 2. There are 16 balls of different colors in a basket. How much information does the message that you have taken out a white ball convey?
Solution. Because N = 16 balls, then I = log2 N = log2 16 = 4 bits.
Task 3. There are black and white balls in the basket. Among them are 18 black balls. The message that a white ball has been drawn carries 2 bits of information. How many balls are there in the basket?
1) 18 2) 24 3) 36 4)48
Solution. From the relation I = log2(1/p) we find the probability of drawing a white ball: log2(1/p) = 2, so 1/p = 4 and the probability of drawing a white ball is 1/4 (25%); the probability of drawing a black ball is then 3/4 (75%). If 75% of all balls are black and their number is 18, then the white balls make up 25% of the total and their number is (18 · 25)/75 = 6.
It remains to find the number of all balls in the basket 18+6=24.
Answer: 24 balls.
Task 4. In some countries, a 5-character license plate is made up of capital letters (30 letters in total) and decimal digits in any order. Each character is encoded with the same and minimum possible number of bits, and each number is encoded with the same and minimum possible number of bytes. Determine the amount of memory required to store 50 license plates.
1) 100 bytes 2) 150 bytes 3) 200 bytes 4) 250 bytes
Solution. The number of symbols used to encode a plate is: 30 letters + 10 digits = 40 symbols. The amount of information carried by one symbol is 6 bits (from 2^I ≥ 40: the number of bits must be an integer, so we take the nearest power of two not less than the number of symbols, 2^6 = 64).
We have found the amount of information carried by each symbol; the number of symbols in a plate is 5, therefore one plate requires 5 · 6 = 30 bits. According to the problem, each plate is encoded with the same, minimum possible whole number of bytes, so we must find how many bytes hold 30 bits. Dividing 30 by 8 gives a fractional number, and we need a whole number of bytes per plate, so we take the nearest multiple of 8 that is not less than the number of bits: 32 bits, i.e. 4 bytes. Each plate is encoded in 4 bytes.
To store 50 license plates you will need: 4*50=200 bytes.
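The same reasoning can be sketched in Python; the constants (40 symbols, 5 characters per plate, 50 plates) come from the problem statement:

```python
from math import ceil, log2

symbols = 30 + 10                           # letters + digits
bits_per_char = ceil(log2(symbols))         # 6 bits (nearest power of two is 64)
bits_per_plate = 5 * bits_per_char          # 30 bits
bytes_per_plate = ceil(bits_per_plate / 8)  # rounded up to whole bytes -> 4
print(bytes_per_plate * 50)                 # 200 bytes for 50 plates
```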
Choosing the optimal strategy in the game "Guess the Number". The choice of the optimal strategy in this game is based on obtaining the maximum amount of information. The first participant thinks of an integer (for example, 3) from a given interval (for example, from 1 to 16), and the second must "guess" the intended number. From the information point of view, the initial uncertainty of knowledge for the second participant is 16 possible events (variants of the hidden number).
With an optimal strategy, the interval of numbers should always be divided in half, then the number of possible events (numbers) in each of the resulting intervals will be the same and guessing the intervals is equally probable. In this case, at each step the answer of the first player (“Yes” or “No”) will carry the maximum amount of information (1 bit).
As can be seen from Table 1.1, guessing the number 3 took four steps, at each of which the uncertainty of the second participant's knowledge was halved by receiving a message from the first participant containing 1 bit of information. Thus, the amount of information required to guess one of 16 numbers was 4 bits.
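The optimal strategy is ordinary binary search over the interval. The sketch below (the helper name guess is ours, not from the text) counts how many yes/no answers, i.e. bits, are needed to find a number hidden among 16:

```python
def guess(secret: int, low: int = 1, high: int = 16) -> int:
    """Return the number of yes/no questions needed to find `secret` by halving."""
    questions = 0
    while low < high:
        mid = (low + high) // 2
        questions += 1
        if secret <= mid:          # answer "yes, it is in the lower half"
            high = mid
        else:                      # answer "no" -> the upper half
            low = mid + 1
    return questions

print(guess(3))   # 4 questions, i.e. 4 bits, for a number hidden among 16
```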
Test questions and assignments
1. It is known a priori that the ball is in one of three urns: A, B or C. Determine how many bits of information the message that it is in urn B contains.
Options: 1 bit, 1.58 bits, 2 bits, 2.25 bits.
2. The probability of the first event is 0.5, and of the second and third 0.25 each. What is the information entropy of such a distribution? Options: 0.5 bits, 1 bit, 1.5 bits, 2 bits, 2.5 bits, 3 bits.
3. Here is a list of employees of a certain organization:
Determine the amount of information missing to fulfill the following requests:
Please call Ivanova to the phone.
I am interested in one of your employees, she was born in 1970.
4. Which message carries more information:
· As a result of tossing a coin (heads, tails), the result is tails.
· The traffic lights (red, yellow, green) are now green.
· As a result of tossing the dice (1, 2, 3, 4, 5, 6), 3 points were obtained.
The American engineer R. Hartley in 1928 considered the process of obtaining information as selecting one message from a finite predetermined set of N equally probable messages, and the amount of information I contained in the selected message was defined as the binary logarithm of N.
Hartley's formula: I = log2 N, or N = 2^I.
Let's say you need to guess one number from a set of numbers from one to one hundred. Using Hartley's formula, you can calculate how much information is required for this: I = log2 100 ≈ 6.644. Thus, a message about a correctly guessed number contains an amount of information approximately equal to 6.644 bits.
Let's give other examples of equally probable messages:
1. when tossing a coin: "it came up heads", "it came up tails";
2. on a book page: "the number of letters is even", "the number of letters is odd".
Let us now determine whether the messages "a woman will be the first to leave the door of the building" and "a man will be the first to leave the door of the building" are equally probable. It is impossible to answer this question unequivocally. It all depends on what kind of building we are talking about. If it is, for example, a metro station, then the probability of leaving the door first is the same for a man and a woman, but if it is a military barracks, then for a man this probability is much higher than for a woman.
For problems of this kind, the American scientist Claude Shannon proposed in 1948 another formula for determining the amount of information, one that takes into account the possibly unequal probabilities of the messages in the set.
Shannon's formula: I = − (p1 log2 p1 + p2 log2 p2 + ... + pN log2 pN),
where pi is the probability that exactly the i-th message is selected from a set of N messages.
It is easy to see that if the probabilities p 1, ..., p N are equal, then each of them is equal to 1 / N, and Shannon’s formula turns into Hartley’s formula.
In addition to the two considered approaches to determining the amount of information, there are others. It is important to remember that any theoretical results are applicable only to a certain range of cases, outlined by the initial assumptions.
As the unit of information, Claude Shannon proposed to take one bit (from the English bit, binary digit).
A bit in information theory is the amount of information necessary to distinguish between two equally probable messages (such as "heads" vs "tails", "even" vs "odd", etc.).
In computing, a bit is the smallest “portion” of computer memory required to store one of the two characters “0” and “1” used to represent data and commands internally.
A bit is too small a unit of measurement. In practice, a larger unit is more often used - byte, equal to eight bits. It is precisely eight bits that are required to encode any of the 256 characters of the computer keyboard alphabet (256 = 2 8).
Even larger derived units of information are also widely used:
1 Kilobyte (KB) = 1024 bytes = 2^10 bytes,
1 Megabyte (MB) = 1024 KB = 2^20 bytes,
1 Gigabyte (GB) = 1024 MB = 2^30 bytes.
Recently, due to the increase in the volume of processed information, such derived units have come into use as:
1 Terabyte (TB) = 1024 GB = 2^40 bytes,
1 Petabyte (PB) = 1024 TB = 2^50 bytes.
As the unit of information, one could instead choose the amount of information needed to distinguish between, for example, ten equally probable messages. This would not be a binary (bit) but a decimal (dit) unit of information.
The amount of information contained in a message is determined by the amount of knowledge that this message carries to the person receiving it. A message contains information for a person if the information contained in it is new and understandable for this person, and, therefore, expands his knowledge.
The information that a person receives can be considered a measure of reducing the uncertainty of knowledge. If some message leads to a decrease in the uncertainty of our knowledge, then we can say that such a message contains information.
A unit of information is taken to be the amount of information that we obtain when uncertainty is reduced by 2 times. This unit is called bit.
In a computer, information is represented in binary code, or machine language, whose alphabet consists of two digits (0 and 1). These digits can be considered as two equally probable states. When one binary digit is written, a choice is made of one of the two possible states (one of the two digits), and therefore one binary digit carries 1 bit of information. Two binary digits carry 2 bits of information, three digits carry 3 bits, etc.
Let us now pose the inverse problem and determine: "How many different binary numbers N can be written using I binary digits?" Using one binary digit you can write 2 different numbers (N = 2 = 2^1), using two binary digits you can write four binary numbers (N = 4 = 2^2), using three binary digits you can write eight binary numbers (N = 8 = 2^3), etc.
In general, the number of different binary numbers can be determined by the formula N = 2^I, where
N is the number of possible (equally probable) events and I is the number of binary digits.
In mathematics there is a function used to solve such an exponential equation; this function is called the logarithm. The solution of this equation is: I = log2 N.
If the events are equally probable, then the amount of information is determined by this formula.
The amount of information for events with different probabilities is determined by Shannon's formula:
I = − (P1 log2 P1 + P2 log2 P2 + … + PN log2 PN),
where I is the amount of information;
N – number of possible events;
P i – probability of individual events.
Example 3.4
There are 32 balls in the lottery drum. How much information does the message about the first number drawn contain (for example, number 15 was drawn)?
Solution:
Since drawing any of the 32 balls is equally probable, the amount of information about one drawn number is found from the equation: 2^I = 32.
But 32 = 2^5. Therefore, I = 5 bits. Obviously, the answer does not depend on which number is drawn.
Example 3.5
How many questions do you need to ask your interlocutor to determine for sure the month in which he was born?
Solution:
Let's consider 12 months as 12 possible events. If you ask about a specific month of birth, you may have to ask 11 questions (if a negative answer was received to the first 11 questions, then it is not necessary to ask the 12th, since it will be correct).
It is more correct to ask “binary” questions, that is, questions that can only be answered with “yes” or “no.” For example, “Were you born in the second half of the year?” Each such question splits the set of options into two subsets: one corresponding to the answer “yes”, and the other to the answer “no”.
The correct strategy is to ask questions in such a way that the number of possible options is halved each time. Then the number of possible events in each of the resulting subsets will be the same and their guessing is equally probable. In this case, at each step the answer (“yes” or “no”) will carry the maximum amount of information (1 bit).
Using the formula I = log2 N and a calculator, we get:
I = log2 12 ≈ 3.58 bits.
The number of bits of information received corresponds to the number of questions asked, but the number of questions cannot be fractional. Rounding up to the nearest integer, we get the answer: with the right strategy, you need to ask no more than 4 questions.
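The same bound follows directly from the formula; since log2 12 is not an integer, it is rounded up to a whole number of questions:

```python
from math import ceil, log2

months = 12
print(log2(months))        # about 3.58 bits of uncertainty
print(ceil(log2(months)))  # 4 binary questions are enough
```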
Example 3.6
After the computer science exam that your friends took, the grades (“2”, “3”, “4” or “5”) are announced. How much information will be conveyed by the message about the grade of student A, who has learned only half of the tickets, and the message about the grade of student B, who has learned all the tickets.
Solution:
Experience shows that for student A all four grades (events) are equally probable, and then the amount of information carried by the message about the grade can be calculated using formula (1): I = log2 4 = 2 bits.
Based on experience, we can also assume that for student B the most likely grade is "5" (p1 = 1/2), the probability of a "4" is half as large (p2 = 1/4), and the probabilities of a "2" and a "3" are smaller still by half (p3 = p4 = 1/8). Since the events are not equally probable, we use formula 2 to calculate the amount of information in the message: I = −(1/2 log2 1/2 + 1/4 log2 1/4 + 1/8 log2 1/8 + 1/8 log2 1/8) = 1.75 bits.
Calculations have shown that with equally probable events we receive more information than with unequally probable events.
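A quick numerical check of both cases (the probabilities for student B are the assumptions stated above):

```python
from math import log2

def info(probs):
    """Shannon's formula: average amount of information in bits."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(info([1/4] * 4))               # student A: 2.0 bits (equally probable grades)
print(info([1/2, 1/4, 1/8, 1/8]))    # student B: 1.75 bits
```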
Example 3.7
An opaque bag contains 10 white, 20 red, 30 blue and 40 green balls. How much information will be contained in the visual message about the color of the removed ball?
Solution:
Since the numbers of balls of different colors are not the same, the probabilities of the visual messages about the color of a ball taken out of the bag also differ and are equal to the number of balls of a given color divided by the total number of balls:
P_white = 0.1; P_red = 0.2; P_blue = 0.3; P_green = 0.4.
The events are not equally probable, therefore, to determine the amount of information contained in the message about the color of the ball, we use formula 2:
I = −(0.1 log2 0.1 + 0.2 log2 0.2 + 0.3 log2 0.3 + 0.4 log2 0.4) ≈ 1.85 bits. (A calculator can be used to evaluate this expression containing logarithms.)
Example 3.8
Using Shannon's formula, it is easy to determine how many bits of information, or binary digits, are needed to encode 256 different characters. 256 different symbols can be considered as 256 different equally probable states (events). According to the probabilistic approach to measuring the amount of information, the required amount of information to binary encode 256 symbols is:
I=log 2 256=8 bits=1 byte
Therefore, to binary encode 1 character, 1 byte of information or 8 binary bits is required.
How much information is contained, for example, in the text of the novel "War and Peace", in the frescoes of Raphael, or in the human genetic code? Science does not provide answers to these questions and, in all likelihood, will not provide them soon. Is it possible to measure the amount of information objectively? The most important result of information theory is the following conclusion: "Under certain, very broad conditions, one can neglect the qualitative features of information, express its quantity as a number, and also compare the amounts of information contained in different groups of data."
Currently, the most widespread approaches to defining the concept of "amount of information" are based on the idea that the information contained in a message can be interpreted in terms of its novelty or, in other words, in terms of the reduction of the uncertainty of our knowledge about an object. These approaches use the mathematical concepts of probability and logarithm.
1.6. What can you do with the information?
Information can be created, transmitted, perceived, stored, processed, and so on. All these processes associated with certain operations on information are called information processes.
1.7. What properties does information have?
Information properties:
Information is reliable if it reflects the true state of affairs. Unreliable information can lead to misunderstanding or to poor decisions.
Reliable information may become unreliable over time, since it tends to become outdated, that is, it ceases to reflect the true state of affairs.
Information is complete if it is sufficient for understanding and decision-making. Both incomplete and redundant information hinder decision-making and may lead to errors.
The accuracy of information is determined by the degree of its closeness to the real state of the object, process, or phenomenon.
The value of information depends on how important it is for solving the problem at hand, as well as on how much it will be used later in any kind of human activity.
Only information received in a timely manner can bring the expected benefit. Both the premature presentation of information (when it cannot yet be assimilated) and its delay are equally undesirable.
If valuable and timely information is expressed in an unclear way, it can become useless.
Information becomes understandable if it is expressed in the language spoken by those for whom this information is intended.
Information must be presented in an accessible (according to the level of perception) form. Therefore, the same questions are presented differently in school textbooks and scientific publications.
Information on the same issue can be presented briefly (concisely, without unimportant details) or extensively (detailed, verbose). Conciseness of information is necessary in reference books, encyclopedias, textbooks, and all kinds of instructions.
Control questions:
1. What does the term “computer science” mean and what is its origin?
2. What areas of knowledge have been officially assigned to the concept of “computer science” since 1978?
3. What areas of human activity and to what extent does computer science affect them?
4. Name the main components of computer science and the main directions of its application.
5. What is meant by the concept of “information” in everyday, scientific and technical senses?
6. From whom (or what) does the person receive information? Who does the information pass on to?
7. What can you do with the information?
8. Give examples of human information processing. What are the results of this processing?
9. Give examples of technical devices and systems designed to collect and process information.
10. What determines the information content of a message received by a person?
11. Why is it more convenient to evaluate the amount of information in a message not by the degree of increase in knowledge about the object, but by the degree of decrease in the uncertainty of our knowledge about it?
12. How is the unit of measurement for the amount of information determined?
13. In what cases and by what formula can you calculate the amount of information contained in a message?
14. Why is the number 2 taken as the base of the logarithm in Hartley’s formula?
15. Under what condition does Shannon’s formula transform into Hartley’s formula?
16. What defines the term “bit” in information theory and computer science?
17. Give examples of messages, the information content of which can be determined unambiguously.
When studying various phenomena and objects of the surrounding world, people sought to associate numbers with these objects and introduce their quantitative measure. People learned to measure distances, weigh various objects, calculate the areas of figures and volumes of bodies. Having learned to measure time and its duration, we are still trying to understand its nature. The thermometer was invented many years before scientists understood what it measured: approximately three centuries passed from the invention of the first thermometer to the development of thermodynamics. The quantitative study of a certain phenomenon or object may be ahead of its qualitative study, and the process of forming the corresponding concept may follow the quantitative study.
A similar situation has developed with regard to information. R. Hartley in 1928, and then K. Shannon in 1948, proposed formulas for calculating the amount of information, but they never answered the question of what information is. In communication theory, information appears in the form of various messages: for example, letters or numbers, as in telegraphy, or as a continuous function of time, as in telephony or radio broadcasting. In any of these examples, ultimately, the task is to convey the semantic content of human speech. In turn, human speech can be represented in sound vibrations or in written form.
This is another property of this type of information: the ability to represent the same semantic content in different physical forms. W. Ashby was the first to draw special attention to this. Representing information in different physical forms is called encoding. In order to communicate with other people, a person has to constantly engage in encoding, recoding and decoding. It is obvious that information can be transmitted through communication channels in a variety of coding systems.
R. Hartley was the first to introduce the methodology of “measuring the amount of information” into the theory of information transfer. At the same time, R. Hartley believed that the information that he was going to measure was “... a group of physical symbols - words, dots, dashes, etc., which by general agreement have a known meaning for the corresponding parties.” Thus, Hartley set himself the task of introducing some kind of measure to measure encoded information.
Let a sequence of n characters a1 a2 a3 … an be transmitted, each of which belongs to an alphabet Am containing m characters. What is the number K of different variants of such sequences? If n = 1 (one character is transmitted), then K = m; if n = 2 (a sequence of 2 characters is transmitted), then K = m·m = m^2; in the general case, for a sequence of n characters we get K = m^n.
Hartley proposed calculating the amount of information contained in such a sequence as the logarithm of the number K to base 2:
I = Log 2 K, (2.1)
where K = m^n.
That is, the amount of information contained in a sequence of n characters from the alphabet Am, in accordance with Hartley's formula, is equal to
I = Log2(m^n) = n·Log2 m. (2.2)
Remark 1. Hartley assumed that all symbols of the alphabet A m can occur with equal probability (frequency) anywhere in the message. This condition is violated for natural language alphabets: for example, not all letters of the Russian alphabet occur in the text with the same frequency.
Remark 2. Any message of length n in the alphabet A m will contain the same amount of information. For example, in the alphabet (0; 1), messages 00111, 11001 and 10101 contain the same amount of information. This means that when calculating the amount of information contained in a message, we are distracted from its semantic content. A “meaningful” message and a message derived from it by an arbitrary permutation of symbols will contain the same amount of information.
Example. A telegraph message uses two symbols - a period (.) and a dash (-), i.e. the alphabet consists of m = 2 characters. Then, when transmitting one character (n = 1), the amount of information I = Log 2 2 = 1. This amount was taken as a unit of measurement of the amount of information and is called 1 bit (from English binary unit = bit). If a telegraph message in the alphabet (. ; -) contains n characters, then the amount of information I = n Log 2 2 = n (bits).
Using the symbols 0 and 1, information is encoded in a computer and when transmitted over computer networks, i.e. the alphabet consists of two characters (0; 1); one symbol in this case also contains I = Log 2 2 = 1 bits of information, therefore a message of length n characters in the alphabet (0; 1) in accordance with Hartley’s formula (2.2) will contain n bits of information.
If we consider the transmission of messages in the Russian alphabet, consisting of 33 letters, then the amount of information contained in a message of n characters, calculated using Hartley's formula, is I = n·Log2 33 ≈ 5.04·n bits. The English alphabet contains 26 letters; one character carries Log2 26 ≈ 4.7 bits, so a message of n characters, calculated using Hartley's formula, contains n·Log2 26 ≈ 4.7·n bits of information. However, this result is not correct, since not all letters appear in a text with the same frequency. In addition, separating characters must be added to the letters of the alphabet: space, period, comma, etc.
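Under Hartley's assumption of equally frequent symbols, the information in a message of n characters is simply n·Log2 m. A small Python sketch for several alphabets:

```python
from math import log2

def hartley_info(n_chars: int, alphabet_size: int) -> float:
    """Information (in bits) of a message of n_chars symbols, all symbols equally likely."""
    return n_chars * log2(alphabet_size)

for name, m in [("binary", 2), ("English letters", 26), ("Russian letters", 33)]:
    print(f"{name:16s} 10 characters -> {hartley_info(10, m):.1f} bits")
```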
Formula (2.1) superficially resembles the Boltzmann formula for calculating the entropy of a system with N equally probable microstates:
S= - k*Ln(W), (2.3)
where k is Boltzmann's constant, 1.38·10^−23, and W is the probability of spontaneous adoption of one of the microstates of the system per unit time t = 10^−13 seconds, W = 1/N, i.e.
S= -k*Ln(1/N) = k*Ln(N), (2.4)
which is completely consistent with formula (2.1), except for the factor k and the base of the logarithm. Because of this external similarity, the quantity Log2 K in information theory is also called entropy and is denoted by the symbol H. Information entropy is a measure of the uncertainty of the state of some random variable (physical system) with a finite or countable number of states. A random variable (r.v.) is a quantity that, as a result of an experiment or observation, takes on a numerical value that is not known in advance.
So, let X be a random variable that can take N different values x 1, x 2, ... x N; if all values of r.v. X are equally probable, then the entropy (measure of uncertainty) of the quantity X is equal to:
H(X) = Log 2 N. (2.5)
Comment. If a random variable (system) can only be in one state (N=1), then its entropy is equal to 0. In fact, it is no longer a random variable. The greater the number of possible equally probable states, the higher the uncertainty of a system.
Entropy and the amount of information are measured in the same units - bits.
Definition. 1 bit is the entropy of a system with two equally probable states.
Let system X be in two states x1 and x2 with equal probability, i.e. N = 2; then its entropy H(X) = Log 2 2 = 1 bit. An example of such a system is given by a coin, when tossed, either heads (x1) or tails (x2) appear. If the coin is “correct”, then the probability of getting heads or tails is the same and equal to 1/2.
Let's give another definition of the unit of measurement of information.
Definition. The answer to a question of any nature (any character) contains 1 bit of information if it can be “yes” or “no” with equal probability.
Example. The "which hand?" game. You hide a small object in one hand and ask your partner to guess which hand you hid it in. He asks you, "Is it in your left hand?" (or simply chooses a hand: left or right). You answer "yes" if he guessed correctly, or "no" otherwise. Whatever the answer, the partner receives 1 bit of information, and the uncertainty of the situation is completely removed.
Hartley's formula can be used when solving problems of determining the selected element of a given set. This result can be formulated as the following rule.
If in a given set M, consisting of N elements, some element x is selected, about which nothing else is known, then to determine this element it is necessary to obtain Log 2 N bits of information.
Let's consider several problems using Hartley's formula.
Problem 1. Someone has thought of a natural number in the range from 1 to 32. What is the minimum number of questions that must be asked in order to be guaranteed to guess the intended (selected) number? The answers can only be "yes" or "no".
A comment. You can try to guess the intended number by simple search. If you're lucky, you'll only have to ask one question, but in the worst case scenario, you'll have to ask 31 questions. In the proposed task, you need to determine the minimum number of questions with which you are guaranteed to determine the intended number.
Solution. Using Hartley's formula, you can calculate the amount of information that needs to be obtained to determine the selected element x of the set of integers {1, 2, 3, …, 32}: H = Log2 32 = 5 bits. Questions must be asked in such a way that the answers to them are equally probable; then the answer to each question will bring 1 bit of information. For example, you can divide the numbers into two equal groups, from 1 to 16 and from 17 to 32, and ask which group the intended number is in. Next, you do the same with the selected group, which now contains only 16 numbers, and so on. Suppose, for example, the number 7 was chosen.
Question No. 1: Does the intended number belong to the set (17; 32)? The answer "no" gets you 1 bit of information. We now know that number belongs to the set (1; 16).
Question No. 2: Does the conceived number belong to the set (1; 8)? Answering “yes” brings you 1 more bit of information. We now know that number belongs to the set (1; 8).
Question No. 3: Does the conceived number belong to the set (1; 4)? The answer “no” brings you 1 more bit of information. We now know that number belongs to the set (5; 8).
Question No. 4: Does the conceived number belong to the set (7; 8)? Answering “yes” brings you 1 more bit of information. We now know that number belongs to the set (7; 8).
Question No. 5: Is the intended number equal to 8? The answer "no" brings you 1 more bit of information. We now know that the intended number is 7. The problem is solved: five questions were asked, 5 bits of information were received in response, and the intended number was determined.
Problem 2. (Problem about a counterfeit coin). There are 27 coins, of which 26 are real and one is fake. What is the minimum number of weighings on a lever scale for which one counterfeit coin out of 27 can be reliably identified, using the fact that the counterfeit coin is lighter than the real one?
Lever scales have two cups and with their help you can only determine whether the contents of the cups are the same in weight, and if not, then the contents of which cup is heavier.
Solution. This is a task of identifying one selected element out of 27. Using Hartley's formula, we can immediately determine the amount of information that needs to be obtained to identify the counterfeit coin: it is equal to I = Log2 27 = Log2(3^3) = 3·Log2 3 bits. Note that, without yet knowing the weighing strategy, we can already say how much information we need to obtain to solve the problem.
If you put an equal number of coins on the scales, then three equally probable outcomes are possible:
1. The left cup is heavier than the right (L > R);
2. The left cup is lighter than the right (L < R);
3. The left cup is in balance with the right (L = R);
The “lever scale” system can be in three equally probable states, so one weighing gives Log 2 3 bits of information. In total, to solve the problem you need to get I = 3 Log 2 3 bits of information, which means you need to do three weighings to determine a counterfeit coin. We already know the minimum number of weighings, but we do not yet know how they should be carried out. The strategy should be such that each weighing provides the maximum amount of information. Let's divide all the coins into three equal piles A, B and C, 9 pieces each. A counterfeit coin, denoted by the letter f, can be found in any of the three piles with equal probability. Let's choose any two of them, for example A and B, and weigh them.
There are three possible outcomes:
1) A is heavier than B (A > B); this means f ∈ B;
2) A is lighter than B (A < B); this means f ∈ A;
3) A is in equilibrium with B (A = B); this means f ∈ C.
For any outcome, we will determine in which pile the counterfeit coin f is located, but in this pile there will be only 9 coins. Divide it into three equal piles A1, B1, C1, 3 coins in each. Let's choose any two and weigh them. As in the previous step, we will determine the pile of coins in which the fake coin is located, but now the pile consists of only three coins. Let's choose any two coins and weigh them. This will be the last, third weighing, after which we will find the counterfeit coin.
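A sketch of this strategy in Python (the coin weights are made-up test data): at each step the remaining candidates are split into three equal piles, and one comparison of two piles eliminates two thirds of them:

```python
from math import log

def find_fake(weights):
    """Index of the single lighter coin, found by splitting candidates into thirds."""
    candidates = list(range(len(weights)))
    weighings = 0
    while len(candidates) > 1:
        third = len(candidates) // 3
        a = candidates[:third]
        b = candidates[third:2 * third]
        c = candidates[2 * third:]
        left = sum(weights[i] for i in a)
        right = sum(weights[i] for i in b)
        weighings += 1
        if left < right:
            candidates = a          # the fake is in the lighter pile A
        elif right < left:
            candidates = b          # the fake is in the lighter pile B
        else:
            candidates = c          # the pans balance, the fake is in pile C
    return candidates[0], weighings

coins = [10.0] * 27
coins[13] = 9.0                     # one lighter (fake) coin
print(find_fake(coins))             # (13, 3): three weighings, as log3(27) predicts
print(log(27, 3))                   # 3.0
```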
Problem 3. Without using a calculator, estimate to one bit the entropy of a system that could be in 50 states with equal probability.
Solution. Using Hartley's formula, H = Log 2 50. Let's evaluate this expression.
Obviously 32 < 50 < 64; taking the base-2 logarithm of this inequality gives Log2 32 < Log2 50 < Log2 64, i.e. 5 < Log2 50 < 6. Thus, to within one bit, the entropy of the system satisfies 5 < H < 6.
Task 4. It is known that the entropy of the system is 7 bits. Determine the number of states of this system if it is known that they are all equally probable.
Solution. Let us denote by N the number of states of the system. Since all states are equally probable, H = Log2 N, so N = 2^H, i.e. N = 2^7 = 128.
We will define information through its basic properties (since, along with matter and energy, it is the primary concept of our world and therefore cannot be defined in a strict sense):
- information brings knowledge about the surrounding world that was not available at the point in question before it was received;
- information is not material and cannot exist in isolation from the form of information presentation (sequences of signals or signs - messages);
- messages contain information only for those who are able to recognize it.
Messages contain information not because they copy objects of reality, but according to a social agreement about the connection between the carriers and the objects designated by this carrier (for example, a word denotes some object of objective reality). In addition, carriers can be formed by naturally occurring physical processes.
In order for a message to be transmitted to the recipient, it is necessary to use some physical process that can propagate at one speed or another from the source to the recipient of the message. A time-varying physical process that reflects a transmitted message is called a signal.
To use mathematical tools to study information, one needs to abstract from the meaning and content of the information. This approach was common to the researchers mentioned above, since pure mathematics operates with quantitative relationships without going into the physical nature of the objects behind those relationships. Therefore, if the meaning is stripped from messages, the starting point for an informational assessment of an event is only the set of events that are distinct from one another and, accordingly, the messages about them.
Let us be interested in the following information about the state of some objects: in which of the four possible states (solid, liquid, gaseous, plasma) is some substance located? Which of the four technical school courses is the student studying in? In all these cases, there is uncertainty of the event of interest to us, characterized by the presence of a choice of one of four possibilities. If in the answers to the above questions we abstract from their meaning, then both answers will carry the same amount of information, since each of them identifies one of the four possible states of the object and, therefore, removes the same uncertainty of the message.
Uncertainty is inherent in the concept of probability. Reducing uncertainty is always associated with the choice (selection) of one or more elements (alternatives) from a certain set of them. This mutual reversibility of the concepts of probability and uncertainty served as the basis for the use of the concept of probability when measuring the degree of uncertainty in information theory. If we assume that any of the four answers to questions is equally probable, then its probability in all questions is equal 1/4 .
The equal probability of answers in this example also determines the equal uncertainty removed by the answer in each of the two questions, which means that each answer carries the same information.
Now let's try to compare the following two questions: which of the four courses at the technical school is the student in? How will a tossed coin land: heads up or tails up? In the first case, four equally probable answers are possible; in the second, two. Therefore, the probability of a particular answer in the second case is greater than in the first (1/2 > 1/4), while the uncertainty removed by the answer is greater in the first case. Any possible answer to the first question removes more uncertainty than any answer to the second question. Therefore, the answer to the first question carries more information. Consequently, the lower the probability of an event, the more uncertainty the message about its occurrence removes and, consequently, the more information it carries.
Suppose that some event has m equally probable outcomes. Such an event could be, for example, the appearance of any character from an alphabet containing m such characters. How can we measure the amount of information that can be transmitted using such an alphabet? This can be done by determining the number N of possible messages that can be transmitted using this alphabet. If a message is formed from one character, then N = m; if from two, then N = m·m = m^2. If a message contains n characters (n is the message length), then N = m^n. It would seem that the desired measure of the amount of information has been found. It can be understood as a measure of the uncertainty of the outcome of an experiment, if by experiment we mean the random selection of one message from a certain number of possible ones. However, this measure is not entirely convenient.
If there is an alphabet consisting of one character, i.e. when m = 1, only this symbol can appear. Therefore, there is no uncertainty in this case, and the appearance of this symbol does not convey any information. Meanwhile, the value of N at m = 1 does not go to zero. For two independent message sources (or alphabets) with N1 and N2 possible messages, the total number of possible messages is N = N1·N2, while it would be more logical to expect that the amount of information obtained from two independent sources should be a sum, not a product, of the component quantities.
A way out was found by R. Hartley, who proposed that the information I per message be determined by the logarithm of the total number of possible messages N:
I(N) = log N.
If the entire set of possible messages consists of a single message (N = m = 1), then
I(N) = log 1 = 0,
which corresponds to the absence of information in this case. If there are two independent sources of information with N1 and N2 possible messages, then
I(N) = log N = log(N1·N2) = log N1 + log N2,
i.e. the amount of information per message is equal to the sum of the amounts of information that would be obtained from the two independent sources taken separately.
The formula proposed by Hartley satisfies these requirements. Therefore, it can be used to measure the amount of information. If the appearance of any symbol of the alphabet is equally probable (and we have so far assumed that this is exactly the case), then this probability is p = 1/m. Setting N = m, we get
I = log N = log m = log (1/p) = – log p,
The resulting formula allows us to determine the amount of information in some cases. However, for practical purposes it is necessary to specify a unit of measurement. To do this, recall that information is removed uncertainty. Then, in the simplest case, the uncertainty is a choice between two mutually exclusive, equally probable messages, for example, between two qualitative features: positive and negative impulses, impulse and pause, etc.
The amount of information transmitted in this simplest case is most conveniently taken as the unit of the amount of information. The resulting unit, corresponding to a choice between two equally probable events, is called a binary unit, or bit (a contraction of the English binary digit).
A bit is not only a unit of information, but also a unit of uncertainty. This refers to the uncertainty contained in one experiment that has two equally probable outcomes. The amount of information received from a message is influenced by its surprise factor for the recipient, which depends on the probability of receiving that particular message. The lower this probability, the more unexpected the message and, therefore, the more informative it is. A message whose probability is high, and whose degree of surprise is accordingly low, carries little information.
R. Hartley understood that messages have different probabilities and, therefore, that the surprise of their appearance for the recipient is not the same. But in determining the amount of information, he tried to eliminate the factor of "surprise" completely. Therefore, Hartley's formula allows one to determine the amount of information in a message only for the case when the appearance of symbols is equally probable and the symbols are statistically independent. In practice, these conditions are rarely met. When determining the amount of information, it is necessary to take into account not only the number of different messages that can be received from a source, but also the probability of receiving them.
The most widely used approach to determining the average amount of information contained in messages from sources of the most varied nature is the approach of K. Shannon.
Consider the following situation. A source transmits elementary signals of k different types. Let us follow a fairly long segment of a message. Let it contain N1 signals of the first type, N2 signals of the second type, ..., Nk signals of the k-th type, where N1 + N2 + ... + Nk = N is the total number of signals in the observed segment, and f1, f2, ..., fk are the frequencies of the corresponding signals. As the length of the message segment increases, each frequency tends to a fixed limit, i.e.
lim fi = pi, (i = 1, 2, ..., k),
where pi can be considered the probability of the i-th signal. Suppose a signal of the i-th type, received with probability pi, contains −log2 pi units of information. In the segment under consideration, the i-th signal will occur approximately N·pi times (we assume that N is large enough), so the total information delivered by signals of this type is approximately −N·pi·log2 pi. The same applies to signals of every other type, so the total amount of information delivered by a segment of N signals is approximately −N·(p1·log2 p1 + p2·log2 p2 + ... + pk·log2 pk). To determine the average amount of information per signal, i.e. the specific information content of the source, this quantity must be divided by N. With unlimited growth of N, the approximate equality turns into an exact one.
As a result, we obtain an asymptotic relation, Shannon's formula: H = −(p1·log2 p1 + p2·log2 p2 + ... + pk·log2 pk). It turned out that the formula proposed by Hartley is a special case of Shannon's more general formula.
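The same formula can be applied to an observed message segment by simply counting signal frequencies; a minimal Python sketch:

```python
from collections import Counter
from math import log2

def empirical_entropy(message: str) -> float:
    """Average information per signal, estimated from the observed frequencies f_i."""
    counts = Counter(message)
    n = len(message)
    return -sum((c / n) * log2(c / n) for c in counts.values())

print(empirical_entropy("aaaabbbbccdd"))  # mixed frequencies -> less than log2(4) = 2 bits
```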
In addition to this formula, Shannon proposed an abstract communication scheme consisting of five elements (information source, transmitter, communication line, receiver and recipient), and formulated theorems on throughput, noise immunity, coding, etc.