Some comments on information theory

@Cleopatre VII Nice thread. I'm very interested.
I have some questions which seem easy to answer, but in my opinion are the hardest.
1. What does it mean for something to be truly random? How do we define true randomness?
2. What is the opposite of a truly random process (in information theory or mathematics)?
3. (A helping question) If I decide to grab the cup of tea on my table and drink a bit, was randomness involved in that? Which of the two poles from question 2 was it?
 
In general, randomness is said to be the apparent or actual lack of pattern or predictability in events. According to Ramsey theory, pure randomness is impossible, especially for large structures. However, I maintain that even for microscopic structures randomness is not something obvious. The mathematician Theodore Motzkin suggested that "while disorder is more probable in general, complete disorder is impossible".

Does total randomness really exist? This seems to me to be a question that is not purely objective, but one that also depends on human judgment. Randomness and entropy are closely related. I will write about entropy soon. I will also write about your tea! I will also write about paint.

What is the opposite of randomness? Any kind of purposefulness, but pinning down the definitions is by no means easy.

And there are even more problems here. To write about entropy, one has to make some initial assumptions that are not necessarily true. Hence, it is not an easy topic, and that is probably why Ark writes that he doesn't understand entropy.
 
I am a theoretical physicist, but I must admit: I never understood the concept of entropy (or negentropy aka information). Therefore I am looking forward to learning something new from the discussions in this series....

This is an interesting topic to be sure. I'm not an expert and lack the mathematical training (or brain?) to understand the technical details if they are not very basic, but I have been fascinated by information theory ever since I first read Shannon's famous paper back at university.

Part of the difficulty here seems to be that there is massive confusion around the relationship (or non-relationship) between entropy in thermodynamics and "entropy" in information theory. Lore has it that John von Neumann suggested the word entropy to Shannon as a political maneuver, kind of a marketing gag. And there are those who strongly oppose any mixing up of the two, even if just by analogy. An excellent paper about that (almost a book really) is this (no math needed to understand it):

Thermodynamics ≠ Information Theory: Science's Greatest Sokal Affair
Libb Thims, Institute of Human Thermodynamics, Chicago, IL

Even if one does not agree with the author in everything he says, he gives a great overview of the controversy and the fascinating history behind it. It is also useful IMO to prevent one from jumping to conclusions too soon, and he provides tons of useful references.

Then there is the other side, those who try to bring physics/statistical mechanics and information theory together. One must be careful though, because a lot of it seems to be somewhat woo-woo, unclear, and based on playing with words, often resulting in circular reasoning. But it seems to me there IS something going on there, so in that respect, the die-hard critics of applying information theory to other things might be too black and white in their thinking.

At the root of this conflict, it seems to me, once again lies materialism. Ironically, the materialist assumptions seem (in part) to fuel both sides of the argument.

There are those who desperately want to get any talk about information out of physics (or thermodynamics), because they (perhaps rightly) fear that this opens the Pandora's box of going beyond materialism and beyond dealing with "pure matter" alone.

But then there are also those who seek to turn information into "stuff", i.e. something material. They actually WANT to sneak information into physics or other sciences, while treating it as some kind of material commodity. I think they are driven by the hope that by that maneuver, they can explain away some of the contradictions with materialism (but IMO they are just sneaking in language related to consciousness and are under the illusion that this preserves their materialist outlook...)

From the above cited paper:

American physical economics historian Philip Mirowski comments on this blurred confusion in his usual straight from the shoulder manner:

“Shannon’s theory of information has set in motion one of the most farcical trains of misconceptions and misunderstandings in the modern history of the sciences, namely, that ‘information’ is a palpable thing with sufficient integrity to be measured and parceled out like so many dollops of butterscotch.”

Related to that, it is obvious that Shannon didn't really talk about information, but about the transmission of information. None of it makes any sense without the meaning of the information, which in Shannon's case is the agreed-upon code and interpretation at both ends of the transmission. He even initially used the word "intelligence" rather than "information", following Boole.

This inevitably involves consciousness. It's easy to see this: for example, for written communication, you need the alphabet. Then you can calculate the probability of a letter occurring and develop compression algorithms based on that. (For example, you use a shorter code for a letter that is used often, or you use a shorthand for the English word "the" instead of the three letters, and so on.) But for all that you need the alphabet, and indeed, language. And both receiver and sender need to know what to do with it. And even if both sides are machines, someone still needs to construct and program them; they must be designed for a specific purpose, etc. And there is also the question: when you have maximally compressed a signal, what do you get? That's what Shannon originally called "intelligence"... And "intelligible" again implies that there is some intelligence/mind at the end of the transmission chain...
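For anyone who wants to see the letter-probability idea in action, here is a minimal Python sketch (the sample text and variable names are purely illustrative, not anything from Shannon's paper):

```python
from collections import Counter
from math import log2

# Estimate per-letter probabilities from a toy message, then compute the
# Shannon entropy H = -sum(p * log2(p)) in bits per letter.
text = "the quick brown fox jumps over the lazy dog the end".replace(" ", "")
counts = Counter(text)
total = sum(counts.values())
probs = {ch: n / total for ch, n in counts.items()}

entropy = -sum(p * log2(p) for p in probs.values())
print(f"estimated entropy: {entropy:.3f} bits per letter")

# The ideal code length for a letter is about -log2(p): frequent letters get
# short code words, rare letters get long ones (as in Huffman coding).
for ch, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(ch, f"p={p:.3f}", f"ideal length = {-log2(p):.2f} bits")
```

Note that the sketch only works because the alphabet and the text are already given, which is exactly the point about the agreed-upon code above.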

Well, I have no idea either what exactly is going on. In philosophy there is the idea that a computer (and maybe even the brain) is some kind of physical instantiation of higher information processes. So a computer with its Boolean logic is sort of a physical replica of processes at a higher level, "bringing it down to earth" so to say, a bit like how in object-oriented programming a certain object is "instantiated". But that might again be just cheap talk and doesn't help...

I recently stumbled upon this paper by Ben-Naim, fwiw:


I'm out of my depth here because you need a math/physics background to really understand it, except for his witty criticism of some of the "crimes" various thinkers/scientists have committed in the name of Shannon. But, this is exactly what Libb Thims accuses him of (!):

Ben-Naim, however, is of a peculiar variety of rare physical chemists, who thinks that Shannon entropy should be renamed (see: rename table below), and is under the pied piper impression that the entire field of physical chemistry, in fact the entire SI unit system, the bedrock of science itself, can be reformulated into units of bits, e.g. by redefining the absolute temperature scale, throwing out Planck's constant, and making thermodynamic entropy unitless (Brillouin's idea), among other absurdities—which of course is a delusional view. Beyond this, Ben-Naim is so brainwashed—or rather does not see the light, as Planck put it—by the idea that thermodynamics needs to be reformulated in terms of information theory, that in regards to giving his reason for why he agrees with Denbigh that Neumann did science a disservice, he states 'I would simply say that I shall go back to Clausius' choice of the term, and suggest that he should have not used the term entropy in the first place.' This is an example of someone afflicted by Shannon syndrome.

A horrible mess all of that, and I can't help but think that at some level, this may have been by design. The understanding of information (whatever it is) seems to be crucial in some way for science to advance, but it seems there are more dead-ends, illusions and conceptual errors than real progress...?

Thanks for this thread btw, these are just some thoughts going on in my mind at the moment.
 
Thank you very much for your interesting comment! I think the technical details will not be a problem. We will discuss them very slowly. I will talk a little more about entropy soon. Its relationship with information theory, however, will come a little later.
 
In general, randomness is said to be the apparent or actual lack of pattern or predictability in events.
Predictability alone still does not imply that a process is not random. A process may have a distribution whose values are very low from 0 to 0.99 and then spike sharply between 0.99 and 1 (on a 0-to-1 scale of chances of occurring), and that is still a kind of random process.
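Here is a minimal Python sketch of that kind of distribution (the 95%/5% split of the probability mass is an arbitrary illustrative choice, not something from the description above):

```python
import random

# Density very low on [0, 0.99) and spiking sharply on [0.99, 1].
def skewed_sample():
    if random.random() < 0.95:           # most of the mass sits in the spike
        return random.uniform(0.99, 1.0)
    return random.uniform(0.0, 0.99)     # a little mass spread over the rest

draws = [skewed_sample() for _ in range(100_000)]
in_spike = sum(1 for x in draws if x >= 0.99) / len(draws)
print(f"fraction of draws in [0.99, 1]: {in_spike:.3f}")
# "Somewhere near 1" is highly predictable, yet each individual draw is
# still random.
```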
Also, we cannot forget our inability to measure all the needed and available information - e.g., our hand rolling the dice (we as humans cannot measure all the variables involved in rolling a die by hand, because our senses are limited).

According to Ramsey theory, pure randomness is impossible, especially for large structures.
Let me disagree and I will explain why.

Let's say our smallest possible 'atom of randomness' is a six-sided die. It has 6 equally possible states, 1 to 6, each with probability 1/6 of occurring when rolled. As in OUR world, bigger objects are created from smaller objects: atoms, chemical elements, and so on.

So let's build a bigger object from our single random objects (this implies that additional forces exist in our new simple world made of dice, such as an internal attraction force between the dice, but only over very small distances).
We build a huge wall-like set of dice, placing for example six dice on the bottom and 5 more rows of 6 dice on top of it.

D - a single six-sided die

DDDDDD
DDDDDD
DDDDDD
DDDDDD
DDDDDD
DDDDDD - a wall of six-sided dice: a new object in our experimental world

This new object (the wall) also has 6 states, because it has 6 sides. But now the probability of landing with a certain side facing up is different from what it was for a single die.
The six sides look, for example, like this:
First side (our front) - side nr 1
111111
111111
111111
111111
111111
111111
Second side (our back of the wall) - side nr 2
222222
222222
222222
222222
222222
222222
Left side - side nr 3
3
3
3
3
3
3
Right side - side nr 4
4
4
4
4
4
4
Top side - side nr 5
555555
Bottom side - side nr 6
666666

Now, when you roll such an object (built from dice that are individually fair), we have an object with an uneven distribution, because sides nr 1 and nr 2 have a much higher probability of occurring after a roll than the remaining 4 sides.
We could build a much taller wall-die with, let's say, 100k rows. Then, from our point of view, only sides nr 1 and nr 2 would ever come up. We could even build a die that almost every time reveals only 1 of its 6 sides, but according to the law of large numbers, the remaining sides may still happen sometimes...

So our experimental world is still random, but its randomness is ruled by the order of the objects created within it. Yet it is still random.
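A small Monte Carlo sketch of the wall-die in Python, for illustration. No physics is simulated here; it simply assumes, purely for the sake of the example, that the chance of coming to rest on a face is proportional to that face's area (front/back 6x6, the other four faces 6x1):

```python
import random
from collections import Counter

# The 6x6x1 wall of dice: faces 1 and 2 (front/back) have area 36,
# the remaining four faces have area 6.
face_areas = {1: 36, 2: 36, 3: 6, 4: 6, 5: 6, 6: 6}
faces = list(face_areas)
weights = list(face_areas.values())

rolls = random.choices(faces, weights=weights, k=100_000)
counts = Counter(rolls)
for face in sorted(counts):
    print(f"side {face}: {counts[face] / len(rolls):.3f}")
# Sides 1 and 2 dominate, but every side still turns up sometimes:
# the composite object is biased, yet the process remains random.
```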

We may of course ask, what was behind the process of 'rolling the dice', but I will leave this topic for the next time ;)
 
Also, we cannot forget our inability to measure all the needed and available information - e.g., our hand rolling the dice (we as humans cannot measure all the variables involved in rolling a die by hand, because our senses are limited).
Thank you very much for your interesting comments. Regarding the above, I will shortly be writing about Maxwell's demon and entropy. Now, however, I will add one more post below. I am writing it deliberately just before introducing the topic of entropy.
 
Returning to the probability distribution in the roll of the dice, let's assume a hypothetical situation in which we roll the dice 10 times. Which of the following results do we find more likely?

{1,4,3,6,2,5,6,4,5,2}

or

{6,6,6,6,6,6,6,6,6,6}

and why?

The first set of results seems to be completely accidental, and if it had been obtained during a dice roll, it would not have surprised anyone. The second one seems to be very interesting and puzzling! But is it more likely to get the first set of results? Let us analyze this problem from the mathematical point of view.

As we remember, the probability p_i of obtaining each value of a random variable in the dice roll is equal to 1/6. Therefore, the probability of obtaining the first set of results will be expressed by the formula

1/6 ∙ 1/6 ∙ 1/6 ∙ 1/6 ∙ 1/6 ∙ 1/6 ∙ 1/6 ∙ 1/6 ∙ 1/6 ∙ 1/6 = (1/6)¹⁰ = 1¹⁰/6¹⁰ = 1/6¹⁰ = 1/60466176.

In the second case, however, the probability of obtaining the set of results is expressed by exactly the same formula. Both sequences are therefore elementary events in the set Ω₁₀, which consists of 60466176 elements, i.e.

Ω₁₀ = {(1,1,1,1,1,1,1,1,1,1), (1,1,1,1,1,1,1,1,1,2), …, (6,6,6,6,6,6,6,6,6,6)}.

This result may seem surprising at first, but from a mathematical point of view there is nothing surprising in it, because any particular sequence of ten results from 1 to 6 is just as likely to be obtained as any other.
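A quick check of this arithmetic in Python (the two sequences are the ones from the example above):

```python
from fractions import Fraction

# Probability of any one specific sequence of 10 rolls of a fair die.
p_single = Fraction(1, 6) ** 10
print(p_single)            # 1/60466176
print(6 ** 10)             # 60466176 elementary events in total

# The "random-looking" sequence and the run of sixes are equally likely.
seq_mixed = (1, 4, 3, 6, 2, 5, 6, 4, 5, 2)
seq_sixes = (6,) * 10
p_mixed = Fraction(1, 6) ** len(seq_mixed)
p_sixes = Fraction(1, 6) ** len(seq_sixes)
print(p_mixed == p_sixes)  # True
```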

This fact will be of great importance during the discussion on entropy, which will be the topic of the next post.

Earlier, the issue of the empty set was also discussed, and I referred to the probability of hitting a point. How is it actually with this point? Hegel wrote a lot about points. He even argued that they were in fact a contradiction of the space which, after all, is actually made of them! Through such considerations, he finally came to the conclusion that time is the truth of space. Heidegger also referred to Hegel's conclusions.

Returning to mathematics, however, it should be said that in classical Euclidean geometry a point is a primitive notion (just like a set and its elements) that models an exact location in space and has no length, width, or thickness. In modern mathematics, a point refers more generally to an element of some set called a space.

We know, however, that the point is infinitesimally small. Thus, if the area of a point is zero, then its ratio to any chosen, non-zero area is expressed as 0/r, where r is any real number greater than zero. So what is the probability of hitting the point?

The measure of the event of hitting the point is m(A) = 0. The measure of the whole set Ω is m(Ω) > 0. Hence,

P(A) = m(A)/m(Ω) = 0/m(Ω) = 0.

Nevertheless, hitting the chosen point is by no means impossible. Why is it like that? As mentioned earlier, in the case of infinite sets, the probability of a given event is not an inherent feature of the set and the event, but depends on how the event is selected. By choosing fancifully, the probability can be manipulated. In particular, it makes no sense to talk about probability if we choose from an infinite set and do not specify how we choose an element.
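For intuition, here is a small Python sketch (floating-point numbers are only a finite stand-in for the real line, so this is an illustration, not a proof):

```python
import random

# Draw many uniform samples from [0, 1) and count exact hits on one point.
target = 0.5
N = 1_000_000
exact_hits = sum(1 for _ in range(N) if random.random() == target)
print("exact hits:", exact_hits)   # almost certainly 0

# A small interval around the point, by contrast, is hit with a frequency
# roughly equal to its length (its measure).
eps = 1e-3
near_hits = sum(1 for _ in range(N) if abs(random.random() - target) < eps)
print("hits within 0.001 of the point:", near_hits / N)   # roughly 0.002
```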

In the example above, I also used the concept of the measure. It is simply a function that determines the "sizes" of measurable subsets of a fixed set by assigning them non-negative numbers or infinity, assuming that the empty set measure is zero and the measure of the sum of disjoint sets is the sum of their measures. For now, however, we will not need a formal definition of the measure, so I will not introduce it.

I am including some simple exercises as usual.

Exercise 1.
We roll the dice 5 times. Calculate the probability of obtaining the following results:
a) {1,1,1,1,1},
b) {2,6,3,1,4}.

Exercise 2.
Prove the following formula:

P(A∪B) = P(A) + P(B) - P(A∩B).
Hint: Use what I wrote about the sum and intersection of sets.

Please do not hesitate to ask questions if you would like me to elaborate on any given point in this post.
 
@Cleopatre VII

P(A) = |A|/|Ω|
P(B) = |B|/|Ω|
P(A∩B) = |A|/|Ω| + |B|/|Ω| - |A∪B|/|Ω|
Because, for example, for the two sets A = {1, 2, 3, 4} and B = {3, 4, 5, 6}, |A| = 4 and |B| = 4, while |A∪B| = 6 because the union has six distinct elements; then 4 + 4 - 6 = 2 (the common elements 3 and 4).

P(A∪B) = |A|/|Ω| + |B|/|Ω| - |A|/|Ω| - |B|/|Ω| + |A∪B|/|Ω|
P(A∪B) = |A∪B|/|Ω|

Good thinking or not?
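A quick numerical check of that counting argument, using the same example sets (with Ω taken as {1,...,6} just for illustration):

```python
from fractions import Fraction

# Uniform probability on a finite set: P(S) = |S| / |Omega|.
omega = {1, 2, 3, 4, 5, 6}
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

P = lambda S: Fraction(len(S), len(omega))

print(len(A) + len(B) - len(A | B))         # 2, the two common elements
print(P(A & B) == P(A) + P(B) - P(A | B))   # True
print(P(A | B) == P(A) + P(B) - P(A & B))   # True
```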
 
Part of the difficulty here seems to be that there is massive confusion around the relationship (or non-relationship) between entropy in thermodynamics and "entropy" in information theory. Lore has it that John von Neumann suggested the word entropy to Shannon as a political maneuver, kind of a marketing gag. And there are those who strongly oppose any mixing up of the two, even if just by analogy.

I don't know information theory well enough (or at all) to make a 'relevant/educated' statement about the bolded part above, but reading in the Wave how von Neumann 'handled' Einstein and his research on gravity/relativity (and also the potential introduction of a 5th dimension as a physical one, i.e. A. Einstein and P. Bergmann, On a Generalization of Kaluza's Theory of Electricity, see attachment), it wouldn't surprise me at all if he also did some 'handling' in this case.

Regarding the statistical interpretation of entropy (thermodynamics; the probabilistic interpretation), I find Boltzmann's interpretation, and later Gibbs' one (since Boltzmann considered "all the component particles of a thermodynamic system as statistically independent"), easy to understand (from wiki: Boltzmann's entropy formula - Wikipedia):

Boltzmann used a ρ ln ρ formula as early as 1866. He interpreted ρ as a density in phase space - without mentioning probability - but since this satisfies the axiomatic definition of a probability measure we can retrospectively interpret it as a probability anyway. Gibbs gave an explicitly probabilistic interpretation in 1878.

How this is connected to information theory, or whether it can be used there, I have no idea.
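Purely as a formal, numerical comparison (and not a claim about how, or whether, the two are physically related), the two expressions can be put side by side like this:

```python
from math import log, log2

# For a probability distribution p_i:
#   Gibbs/Boltzmann entropy:  S = -k_B * sum(p_i * ln p_i)   [J/K]
#   Shannon entropy:          H = -sum(p_i * log2 p_i)       [bits]
# They share the -sum(p log p) form and differ by the constant k_B and the
# base of the logarithm.
k_B = 1.380649e-23                      # Boltzmann constant, J/K
p = [0.5, 0.25, 0.125, 0.125]           # an arbitrary example distribution

S = -k_B * sum(pi * log(pi) for pi in p)
H = -sum(pi * log2(pi) for pi in p)
print(f"S = {S:.3e} J/K")
print(f"H = {H:.3f} bits")
print(f"S / (k_B * ln 2) = {S / (k_B * log(2)):.3f}  (equals H)")
```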

Thank you for starting this thread @Cleopatre VII, will be following with interest.
 

Attachments

  • Einstein_Bergmann-Generalization_Kaluza_Theory_Electricity-1968642.pdf
Good thinking or not?
Good thinking in general, but I would write the exact proof as follows:

A=(A\B)∪(A∩B),
B=(B\A)∪(A∩B).

Therefore, by the additivity of probability (the sets in each union are disjoint),

P(A)=P(A\B)+P(A∩B),
P(B)=P(B\A)+P(A∩B).

For any sets the following relation is satisfied:

(A\B)∩(B\A)=∅.

In fact, the three sets A\B, A∩B and B\A are pairwise disjoint, and their union is exactly A∪B.

Hence,

P(A)+P(B)=P(A\B)+2P(A∩B)+P(B\A)=P((A\B)∪(A∩B)∪(B\A))+P(A∩B)=P(A∪B)+P(A∩B).

And therefore,

P(A∪B)=P(A)+P(B)-P(A∩B).

Q.E.D.

Do you understand the specified stages of the proof?
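For readers who like to verify such identities concretely, here is a tiny Python check of the decompositions used above, on an arbitrary pair of example sets:

```python
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

# A = (A\B) ∪ (A∩B) and B = (B\A) ∪ (A∩B)
print(A == (A - B) | (A & B))   # True
print(B == (B - A) | (A & B))   # True

# A\B, A∩B and B\A are pairwise disjoint, and their union is A∪B
parts = [A - B, A & B, B - A]
print(all(p.isdisjoint(q) for p in parts for q in parts if p is not q))  # True
print((A - B) | (A & B) | (B - A) == A | B)                              # True
```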
 
Thank you for starting this thread @Cleopatre VII, will be following with interest.
Thank you very much for the book and, above all, for your presence. In time we will come to the relationship between entropy and information theory, but before that happens, I will introduce the concept of entropy and the assumptions that must be made for this.

We will talk about it for now in the context of mathematical and physical models, but I intend to draw attention to the mysterious and unknown in the future. That's what I'm actually getting at, but I start with mathematics.

Mathematics can sometimes be associated with a dry view of the material world, but in my opinion it is not. Yes, mathematicians construct models that do not necessarily correspond to our reality. Few people know, however, that what is most abstract in mathematics often began in theological and mystical considerations. Mathematics, physics, theology, and mysticism have many points in common. They may seem difficult to find, but I am currently dealing with them in my research work.

Apart from that, I also work in mathematical physics and neurobiology. It all comes together, and the mention of consciousness in one of the earlier comments did not escape me either. We will also talk about consciousness.

Consciousness and time are my two greatest fascinations.
 
One more note. I don't know if everyone is familiar with the concept of set difference (I used it in my proof) and the complement of sets. I will explain these concepts briefly.

We write the difference of sets A and B as A\B.
From definition:
x∈(A\B)⇔(x∈A)∧(x∉B).

Thus, the difference of sets A and B includes all those elements that belong to A and at the same time do not belong to B.

Example 1.
A = {1, 2, 3},
B = {2, 3, 6},
A\B = {1}.

The concept of the difference of sets is related to the concept of the complement of a set, i.e. A\B = A∩Bᶜ,
where Bᶜ is the complement of the set B, i.e. the set of all those elements (of Ω) that do not belong to B.

Example 2.
Ω = {1,2,3,4,5,6,7,8,9},
B = {1,2,3},
Bᶜ = {4,5,6,7,8,9}.

Above we assume that this set Ω is our whole world. There is only the set Ω and its individual subsets.
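If it helps, both examples can be checked directly with Python's built-in set operations (the variable names below simply mirror the examples above):

```python
# Example 1: set difference
A = {1, 2, 3}
B = {2, 3, 6}
print(A - B)                 # {1}

# Example 2: complement relative to the whole world Omega
omega = {1, 2, 3, 4, 5, 6, 7, 8, 9}
B2 = {1, 2, 3}
print(omega - B2)            # {4, 5, 6, 7, 8, 9}

# The relation A\B = A ∩ Bᶜ, with the complement taken inside Omega
Bc = omega - B
print(A - B == A & Bc)       # True
```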

Additionally, it can be presented graphically (in attachment).

If you still have any questions or doubts or anything is not clear enough for you - please ask. As already mentioned, I love answering questions.

I also have a request that people who are more mathematically advanced forgive me. I would like everyone to be able to take part in this discussion to an equal extent, not only people who have a more advanced mathematical apparatus. That is why I care about your understanding, dear readers. I care about every single person.
 

Attachments

  • Complement.png