Continued Halting
Sunday, November 12th, 2006The last post A Halting Followup was meant to be the end of the Halting. But I’ve written before about my difficulty with writing. I lean forward to pick a berry from the thicket only to peer into the thicket, see not a linear trail of berries, but berries in every direction. I’m immersed in berries. The spectacle of glistening berries distracts me not because “they’re good too”. But they it’s the same thing in many directions. And as I start to peer around, if I look hard enough, almost everything becomes a berry. So what do you pick?
Hunh?
Writing about the Halting Problem (HP) and Logical Positivism (LP) and Turing Machines (TMs) brought back to me my discussion with Lumpy, (L??), that sage grad student who commented on my blog at one time. Having leaned into the bush, I can see there are a lot more berries to pick.
The reason that Lumpy comes to mind was that he is fond of saying that all things not provable in a set of axioms A, may still be provable in A + B. B is a new set of additional axioms.
Aside from Lumpy being right–which I don’t contest– I see this as a central weakness in his argument against my central argument: that we need to have more than Turing Machine minds to comprehend the concept of a Turing Machine.
I’ll get to that eventually (or maybe I won’t). What I want to discuss in this chapter is Encoding.
Cracking into Code
I feel that I have failed to explain encoding before. So here is another stab at it. It contains much that might be already known to you, and perhaps some stuff that somebody is going to wonder why I waste bytes on it. Nonetheless, I think it provides a brief overview of what I want to specify about encodings.
Chances are you are viewing this page in a windowed environment. The browser is just another window in a stack on your desktop. A “window” is a defined rectangle which causes mouse events to go to certain programs. All of this behavior needs to be encoded in an understandable structure, before the computer can do any of it.
While we’ve mentioned a group of abstractions there, I’m going to focus on the rectangle. We have a number of choices that a windowed system could have made to represent (encode) a rectangle.
We could have: [xmin, xmax, ymin,ymax].
We could also have: [(xmin, ymin), (xmax,ymax)] (The parenthesis are mainly visual cues.)
Or we could have: [(x, y), length, width].
Mainly, you need 4 integers. The choice of representation will result in a system that knows what each slot means. On displays that exist, this pretty much encodes a rectangle lossless-ly. There are other encodings that are lossy.
For example, if you’re “John Smith”, you might have an encoding of yourself in a program that knows you as “John Smith”. It almost might know your age and your shoe size. If you’re dying to know how to encode this, it would likely be
P= [N,age, shoe-size].
Then if you have a packet of instructions that understands that the first slot is N, the second is the age and the third a shoe size, you can stack these people to the skies. But what is N?
Well, N = [’J', ‘o’, …, ‘t’, ‘h’, 0] in most modern programs. What we put in P is the location of memory where N resides. When your computer reads N it enters into a passel of logic which knows that there is an arbitrarily long list of integers (encoding letters). But as no letter is encoded as 0, it knows when it hits 0 that it’s done. But there is another popular style of string which would have it encoded as N = [10, ‘J’…]. That is first, “Here comes 10 characters,” followed by ten characters.
So now that we have Pm as [ Nm, 25, 11 ], and N is either of those two ways of storing names. It doesn’t matter which, but we have to know what way we have chosen.
Thus we can speak about an “encoding” as a way to scan a row of numbers to get back from them what we stored in them. I have described to you pretty standard ways of storing information. But they all contain conscious effort to handle information being placed in that order and the recovery of that information. All of this is contained in an algorithm, captured in a row of numeric instructions.
Of course, it’s interesting to note that the atom of calculation is “the integer”. But it’s not Cantor’s integer. Cantor proved that the set of integers has no greatest member. But any good programmer should know a concept of “Max-Integer” on his or her chosen environment. So integercomputer is not the same thing as integerMathematics.
Oddly enough, this integer is an encoding. And it is a lossy encoding for any integer over roughly 4 billion. On most systems “Max Integer” (I see Gregor Cantor scratching his head.) is a touch over 4 billion–for the most common way of encoding it.
So now I’ve shown you lossless and lossy encodings. The joke in Monty Python’s sketch about the artist Johann Gambolputty de von Ausfern … Von Hauptkopf of Ulm relies on the idea that Johann doesn’t need all those names. Some subset would give enough of an indication of his name. Unless there were also a Johann Gambolputty de von Ausfern schpledensclittcrasscrenbonfrieddigger dingledangledonglebursteinvonknackerthrasherapplebangerhorowitzticolensicgranderknottyspelltinkle grandlichgrumbelmeyerspelterwasserkurstlichhimbleeisenbahnwagengutenabendbitteeinnürnburger bratwurstlegerspurtenmitzweimacheluberhundsfutgumberaberschonendankerkalbsfleischmittleraucher Von Hauptkopf of Munich as well, he might well be understood as Johan Gambolputty (…de von Ausfern….). There is no mainframe format that has ever existed that would store Johann’s name, and no COBOL copybook that would take it back out for you.
We could, however, even in COBOL world, store a unique-enough string. But storing a name that is “good enough” to identify Johann never helps us get Johann’s real name back. Just the same way that storing your name, age, and shoe size does not help us find the average number of times you laugh during an episode of Yes, Dear (which throws a NullPointerException for me (Geek-talk for a value for which there is no value)). Were we able to stick you in the computer, anything that is “you” could tell us, or at least in the encoding system for “you” we might be able to parse your mind and find out.
Person-encoding–among many other types of encoding–have few if any lossless encodings. So databases store what concerns them. Rest assured that none of it is you.
