Wednesday, December 30, 2009

A Very Large Structure

You'd think I'd learn. You really would.

I've posted a number of entries about issues with ZFC set theory, all of which, through the patient help of a number of readers, have proven not to live up to my expectations. As it stands, nothing I've said so far has shown any real problems within set theory. ZFC set theory has proven itself durable, and although I find some of its attributes non-intuitive, it has held up. It is exactly as it was designed to be.

I am going to present another possible argument (I've learned not to say proof :-) that most likely will crash itself on the rocks of set theory, like all of the others. Still, I am too much of a stubborn fool to just turn and run ...

This argument will come in three parts:

1. I will define a data-structure that exists OUTSIDE of set theory.
2. I will use the data-structure to define a set.
3. I will apply Cantor's diagonal construction to the set, and examine the results.


THE DATA-STRUCTURE

This section will define a data-structure that will be useful later in defining a set. The data-structure itself is not intended to be within set theory, and is only really useful as a convenient method of enumerating the number values to be placed in the set.

Let's start by specifying three irrational values:

sqrt(2)/10 = 0.141421356...
e/10 = 0.271828183...
Pi/10 = 0.314159265...

Each of these values is an irrational number; that is, each has an infinitely long, non-repeating sequence of digits. The division by 10 just shifts the digits; it doesn't alter the non-repeating representation.

Let's construct a subtree P0 such that:

- it is an N-ary tree, where each and every node has exactly 10 children.
- the value placed at the root is 0.
- the value of each child node, from left to right, is 0, 1, 2, 3, 4, 5, 6, 7, 8, 9; this applies to every node's children, all of the way down to infinity.
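(As a concrete aside, here is a minimal sketch of what a finite-depth truncation of P0 looks like in code; the build_P0 helper and its dictionary fields are just my own illustration, since the real P0 never terminates.)

def build_P0(depth):
    # The root holds 0; every node has exactly 10 children valued 0 through 9.
    # Only a finite truncation can actually be materialized.
    def node(value, remaining):
        children = [] if remaining == 0 else [node(d, remaining - 1) for d in range(10)]
        return {"value": value, "children": children}
    return node(0, depth)

tree = build_P0(3)   # 1 + 10 + 100 + 1000 nodes of the infinite tree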

We can define a path through P0 as the sequence of digits starting at the root and continuing all of the way to infinity. Thus:

- All paths are infinitely long.
- There are an infinite number of these paths in the structure.

Each path expresses either a finite rational, an infinitely repeating rational, or an infinitely long non-repeating irrational. A few examples of each are:

a) 0.000000000... is a finite rational. (commonly written as 0)
b) 0.000110100... is a finite rational. (commonly written as 0.0001101)
c) 0.333333333... is a repeating rational (commonly written as 1/3)
d) 0.151515151... is a repeating rational (commonly written as 1/6.6, i.e. 5/33)
e) 0.999999999... is a repeating rational
f) 0.141421356...  is an irrational (commonly written as sqrt(2)/10)
g) 0.271828183... is an irrational (commonly written as e/10)
h) 0.314159265... is an irrational (commonly written as Pi/10)


We can see that a) is in the structure, starting at the root and continuing down the far left-hand child (0) at every level, until infinity. b) goes from the root through three zeros, then a 1, another 1, a 0, and a final 1, and then continues down the zero branches for the rest of the way. c) follows the 3 child (the 4th branch) at every level, and d) alternates between 1 and 5. e) follows the far right child (9) all of the way down to infinity.

f), g) and h) follow much more complex paths from the root, twisting and turning at each level, with digit sequences that never settle into any repeating pattern.

By construction, this tree contains a path for every possible infinite digit sequence, and hence for every possible real value in the range [0,1). Although each node has exactly 10 children, there are an infinite number of paths, and each path is infinitely long.

Because it is fully specified, we can say that 0.0000000..., 0.333333333... and 0.999999999... are paths in P0. We can also say that sqrt(2)/10 is a path in P0. And we can say that both e/10 and Pi/10 are paths in P0. There are an infinite number of these defined paths in P0.

We can now construct a tree called T that has an infinite number of children, stretching out to infinity on the right. For convenience we can say that the value contained at the root of T is 0.

To construct the children we can start by adding in P0 as the left-most child. To the right, increasing by 1 each time, we can add in an infinite number of children:

P0, P1, P2, P3, P4, ...

P1 is similar to P0, in that it contains all possible combinations of paths, but its root value is 1 instead of 0. P2 is 2, P3 is 3, etc. Each Pn subtree covers the range [n,n+1) of real numbers, and contains all possible combinations of the infinite paths. Each Pn subtree fits snugly against its siblings, so that together the children cover the range [0, infinity).

Now we need to show that T encodes all possible representations for all possible real values. We can start by noting that any arbitrarily small positive values are contained in P0, located somewhere down the left-hand branches of P0, such as example b) from above.

Any large irrational value, such as Pi^e, can be found thus:

Pi^e = 22.4591577... which starts under the child P22 (the 23rd child of T), and works its way down to infinity.

Because there are an infinite number of Pn subtrees, every real, no matter how large, is contained in a subtree somewhere. Pi starts under P3, e under P2, etc.
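As a rough illustration of the lookup (a sketch of my own; locate_in_T is a made-up helper, and floating point stands in for the full infinite expansion): the integer part picks the subtree, and the fractional digits spell out the path below it.

import math

def locate_in_T(x, depth=8):
    # The integer part selects the subtree Pn; the fractional digits (base 10)
    # give the first `depth` branches of the path inside that subtree.
    n = int(math.floor(x))
    frac, digits = x - n, []
    for _ in range(depth):
        frac *= 10
        d = int(frac)
        digits.append(d)
        frac -= d
    return n, digits

print(locate_in_T(math.pi ** math.e))   # roughly (22, [4, 5, 9, 1, 5, 7, 7, 1])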

So the structure contains all of the large numbers, the small ones, and every possible infinite variation of the digits on both sides of the decimal place. Thus the tree T contains an infinite representation for every possible non-negative value in R.


THE SETS

By definition, any irrational or real value can be used as an element of a set. That is, we can define a finite set

A = { sqrt(2), e, Pi }

and although the actual values of each of these elements are infinitely long non-repeating values, they are completely included in the set, and the set itself is finite and therefore countable.

So, we can define a finite set from our 8 example numbers in P0:

B = { 0, 0.0001101, 1/3, 1/6.6, 0.999999..., sqrt(2)/10, e/10, Pi/10 }

This set is countable.

We can extend this method of defining a set to handle an infinite number of elements. Thus we can define a set P0 from the above subtree P0 as follows:

The set P0 contains exactly one element for each and every infinite path in the subtree P0: the real number that the path spells out. Since order doesn't matter in a set, we can express this set as follows:

P0 = { 0, 1/3, Pi/10, sqrt(2)/10, e/10, r1, r2, r3, ... }

Where r1, r2, r3, ... are all of the other reals matching the infinite number of different paths in the subtree P0. Based on the elementary properties of sets we can say that:

- The set P0 contains irrationals such as Pi/10.
- The set P0 contains rationals such as 0 and 1/3.
- The set is infinite.

We can show that:

- The set P0 is countable.
- The set P0 is countably infinite.

because we can trivially construct a function f: P0 -> N that is an injection. If we have an injection f on an infinite set, then by a basic theorem of set theory we can always find a bijection.

From the construction and because the set is countably infinite we can say that:

- The set P0 contains all of the infinite paths that exist in the subtree P0.

As well as P0, we can define the sets P1, P2, P3, ... to have the exact same properties as the set P0. That is, they have unordered elements such as:

P1 = { 1, 4/3, r1_1, r1_2, r1_3, ... }
P2 = { 2, 7/3, r2_1, r2_2, r2_3, ... }

We can now define a set T, such that it is the union of all of the elements of the sets P0, P1, P2, ... Because the order of elements in a set is unimportant, we can list the elements of the set T by the following method:

The first element in the set T is the 1st element of P0.
The second element in the set T is the 2nd element of P0.
The third element in the set T is the 1st element of P1.
The fourth element in the set T is the 3rd element of P0.
The fifth element in the set T is the 2nd element of P1.
The sixth element in the set T is the 1st element of P2.
....

This enumeration shows that there is an injective function between the elements of T and N.
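As a small illustration of that listing order (a sketch only, assuming a hypothetical P(n) that yields the elements of the set Pn in the order they were listed):

def enumerate_T(P):
    # P(n) returns an iterator over the elements of Pn, in their listed order.
    streams = []
    while True:
        streams.append(iter(P(len(streams))))   # bring the next Pn into play
        for s in streams:
            yield next(s)                       # take one more element from each active Pn

# The first six values produced are P0's 1st, P0's 2nd, P1's 1st, P0's 3rd,
# P1's 2nd, and P2's 1st elements, matching the listing above.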

The enumeration gives us the set:

T = { 0, 1/3, 1, Pi/10, 4/3, 2, ... }

Because this set is constructed from a countable union of countably infinite sets, and because we can trivially construct an f: T -> N, this set too is countably infinite.

Please note that although we defined the sets from the corresponding data structures, the data structures themselves are not used in the basis of this argument; they serve only as a syntactic way to enumerate all of the elements for the definition of each set.

I say this because many counter-arguments will attempt to show that the subtree P0 is uncountable. Whether it is or is not doesn't matter, because within the rules of set theory we can define the sets Pn to have an infinite number of elements, and we use the tree only as a method of enumerating these elements in a descriptive way for the definition.

At the very least, we could use an intervening sequence constructed by descending the tree, one infinite path at a time, in some fixed order. As an infinite sequence, it could contain all of the infinitely long paths. We can always construct such a sequence from the infinite number of infinite paths contained within the tree, because otherwise Cantor's diagonal argument would never be able to test such an arrangement of numbers, and that would be an inconsistency in set theory. It has been shown multiple times that Cantor's argument can test any possible arrangement of real numbers. Because of that, constructing such a sequence must be possible.


THE TEST

So, now we have a set T, which we believe contains all of the possible real values. Of course we must test this with Cantor's diagonal argument. We do this by constructing a sequence from our set T, and applying the diagonal argument to that sequence.
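(For reference, this is roughly what the diagonal construction does to any listed sequence; a sketch on a finite prefix only, with diagonal_element as a made-up helper and each number given as a list of its fractional digits.)

def diagonal_element(rows):
    # rows[k] is the digit list of the k-th number in the sequence. Change the
    # k-th digit of the k-th number (to 5, or to 6 if it was already 5), so the
    # result differs from every listed number in at least one digit.
    return [5 if row[k] != 5 else 6 for k, row in enumerate(rows)]

# diagonal_element([[1, 4, 1], [3, 3, 3], [9, 9, 9]]) == [5, 5, 5]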

Now, there are four possible outcomes from Cantor's argument:

1. It constructs an element E, found in T.
2. It constructs an element E, not found in T.
3. It never halts.
4. It constructs an element E, but not from all of the elements in T.

#3 and #4 can be shown to be trivially not possible. That is, they rely on there being structures in set theory that are larger than just sequences (aka magic sequences), but over the last few weeks there has been more than ample proof that such structures do not, and cannot, exist in set theory, and that the first principles of set theory itself keep them from existing. Thus Cantor's diagonal argument can, by definition, always construct an element E from all of the elements in the sequence built from T.

#2 is the normal, expected behavior for Cantor's argument. That is, we expect to go through the construction and create an element E that is a real number, but that is not contained in the set T. Since T is countably infinite, there is definitely some enumeration to which we can apply the construction. Now, since T contains all of the elements from the sets P0, P1, P2, ..., and these in turn contain all of the possible elements from their data-structure counterparts, the only way for any elements to get missed is for them to have dropped out when we went from the data structures to the sets and then to the sequence. That is, the data structure self-evidently contains paths for all possible real numbers, so either Cantor's argument is wrong, set theory is inconsistent, or something got lost along the way (i.e. R itself is not able to fit into set theory, so it's not even uncountable, it is just outside looking in).

#1 would confirm that T, which is R, is countable. Given its accepted status, I think this may actually show a contradiction as well, because T is countably infinite and R is uncountable, yet they would be isomorphic to each other.

My guess would be that #1 is the likely result, in that we could take the element E that has been constructed, and look it up in the tree T, in some subtree Pn. We will be able to find it there, and thus show that there are no missing elements.

Given everyone's faith in set theory, there is likely some problem with the construction of this argument.

Thursday, December 24, 2009

Cantor's Lost Surjection (Revised)

Revision 2.0: Dec 28, 2009 -- I've made the changes suggested by Chad Groft, and boris in their excellent comments.

Before you go any further, PLEASE read this introductory paragraph fully and carefully. Over the last couple of weeks, I've been writing several posts about a very subtle problem in ZFC set theory, whose results wind through a great deal of the existing work. I may be right, I may be wrong, but I have been very careful to try to stay within the first principles of set theory, in order to shed some illumination onto the issue. The issue is not as simple as "I am a crank", or "Cantor's proves..." or any other quick knee-jerk reaction to what I am saying. You are welcome to comment, to express your opinions, BUT only if you've read what I've written carefully, and with an open and objective mind. Only if you're not succumbing to your emotions. Mathematics is about careful, objective thought. We must never lose sight of that.

This post is NOT about:

- Reals being (un)countable
- missing irrationals in a bi-infinite spreadsheet
- missing irrational seeds in an infinite number of bi-infinite spreadsheets
- magic sequences


First, I want to talk very explicitly about a wikipedia page:

http://en.wikipedia.org/wiki/Countable_set

In particular I want to closely examine the "Definition" section. This part of the document contains exactly 13 lines (although if your window isn't wide enough some of the lines wrap). In the first four lines, we have the following two definitions:

D1: Line 1 -- a set is "countable" if an injective function exists to N
D2: Line 4 -- a set is "countably infinite" if a bijective function exists to N

Before we go any further: A "formal system", as defined in GEB by Douglas Hofstadter, contains a finite number of axioms, an infinite number of theorems, and a finite number of rules of production. A number of people have also included "definitions" as being separate and distinct pieces from the basic axioms in the system; in particular, ZFC set theory differentiates between its base definitions (such as countable) and its axioms (such as the axiom of choice).

All theorems are built on top of the many self-evident pieces below them, and on other theorems. Everything that isn't self-evident or independent is a theorem. Everything that stands on its own is an axiom or a definition.

In this way, definitions and axioms act as the basic building blocks on which all of the theorems are based. The different pieces may be interspersed with each other, but some are primitive building blocks (axioms and definitions), while the others are composite building blocks (theorems).

Rigor means the entire construction is based on a finite number of solid, self-evident pieces. Neither the axioms, the definitions, nor the theorems should contradict each other, or be inconsistent. If that happens, there is a problem with the formal system.

Lines 1 and 4 basically specify two definitions. Together we can make the following statements about the way terminology is defined for set theory:

For all sets within set theory:
a1: "countable" is not always equal to "countably infinite"

For all functions within set theory:
a2: bijective is both injective and surjective

Getting back to the wikipedia page, in the next part:

T1: line 7 -- starts a "theorem"

This is then followed by three statements, all of which are deemed to be equivalent to each other. They are:

T1-1: S is countable (injective) (f: S -> N)

T1-2: Either S is empty OR there exists a surjection (g: N -> S)

T1-3: Either S is finite OR there exists a bijection (h: N -> S)

The lines T1-1 and T1-3 basically say that for any infinite set, if it is countable, then there exists a bijection between the natural numbers and it. So for any such infinite set:

S = { a, b, c, ... }

there is a bijection between it and N.

PLEASE keep in mind that according to the page itself, T1 isn't an axiom or a definition; it is a theorem. The proof for the theorem isn't given on the page; it is cited from a textbook.

By T1, the following set:

A = { a1, a2, a3, ..., b1, b2, b3, .... }

is claimed to have a bijection between it and N. The idea is that the above can be rewritten as:

B = { a1, b1, a2, b2, a3, b3, ... }

And because this is countable, there is a bijection between this set and N. If we map these two onto each other we get assignments as follows:

1 <- a1, 2 <- b1, 3 <- a2, 4 <- b2, 5 <- a3, 6 <- b3,  ...

That is, it takes the first 6 elements in N to map to the first three elements for each of the ak and bk subsets of elements. This mapping is considered to be a bijective function.
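Written out explicitly, the mapping above is just the following (a sketch, indexing the elements as a1, a2, ... and b1, b2, ...):

def n_for(letter, k):
    # The k-th 'a' element gets the odd number 2k - 1; the k-th 'b' element gets 2k.
    return 2 * k - 1 if letter == 'a' else 2 * k

# n_for('a', 1) == 1, n_for('b', 1) == 2, n_for('a', 2) == 3, n_for('b', 2) == 4, ...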

However, if we have the finite set:

D = { a1, a2, a3, b1, b2, b3 }

and another

E = { 1, 2, 3 }

Since E has fewer elements than D, we can only create an injective function from E to D. The size difference means that no bijective function can exist.  We only get something like:

1 <- a1, 2 <- a2, 3 <- a3

In this sense, it is accepted that in set theory, infinite sets that are countable can have a bijective function between them, even if one set appears to be somewhat "larger", with respect to infinity, than the other one. However, this does not apply to finite sets.

As a comment on my last post, delosgatos said:

"One of the cool and weird things you first encounter with infinite sets is that infinite sets are bijective with their own proper subsets."

Actually, it is far more than that. There is basically a bijective function with "any" other infinite set if that set is considered countable. If you can find an injective function, then you can claim a bijective one based on T1.

In this way set theory essentially splits all (non-empty) sets into three groups:

a) finite
b) countably infinite
c) uncountable

Finite sets differ based on size, and thus there are an infinitely large number of different possibilities. By definition there are also an infinite number of uncountable sets.

But for countably infinite sets, since there exists a bijective function between any pair of sets, they are all essentially isomorphic to the same underlying set N. A bijection can always be found.

This behavior of the system is very different for countably infinite sets than it is for finite sets.

As a consequence, the "definitions" of injective and surjective become watered down for countably infinite sets. They can still exist as function types, in that you can create an injective function, but by definition, you can also always create a bijective one between the same two sets.

Rephrased as I did earlier, we get the following relationships in the terminology:

For all countably infinite sets within set theory:
b1: "countable" is always equal to "countably infinite"

For all functions on countably infinite sets within set theory:
b2: a bijective function exists whenever there exists a function that is injective and surjective OR just injective

Under the theorem T1, as an example, we can find an f: N -> NxN  that is considered to be a bijection, for exactly this reason. Where the set NxN is:

{ (0,0), (0,1), (1,1), (1,0), (0,2), (1,2), (2,2), (2,1), (2,0), ... }

Which is interesting, given that if NxN were finite, it would contain n^2 elements compared to the n elements of a finite N, if both were limited to the same base elements (like 1, 2 and 3). If f were between finite sets, then there could be no bijection, but because these are infinite sets, that is ignored.

Thus f is widely believed to be bijective, and its underlying sets are considered countable, and countably infinite.
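For concreteness, one standard explicit pairing looks like this (a sketch using the anti-diagonal order, which is slightly different from the shell-style listing above):

def nth_pair(n):
    # Walk the anti-diagonals of NxN: (0,0), (0,1), (1,0), (0,2), (1,1), (2,0), ...
    d = 0
    while n > d:
        n -= d + 1
        d += 1
    return (n, d - n)

# nth_pair(0) == (0, 0), nth_pair(3) == (0, 2), nth_pair(5) == (2, 0)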

BUT, the above statement is only true because of the "theorem" T1, not because of the two basic definitions D1 and D2. By themselves the definitions are not strong enough to say that.

Just to be sure I figured I'd check a source other than wikipedia. The following:

http://plato.stanford.edu/entries/set-theory/primer.html

is a well-written introduction to set theory. It uses the term "one-to-one" often, but it is careful to define this in section 3 as being "injective". It avoids the word bijection.

Read through section 7 very carefully. It starts by defining countable as having the same cardinality as N, but then it says in the comments that the definition of "countable" is actually injective. Also, it defines "at most countable" as being finite or countable (wikipedia warned of this).

Throughout the various theorems, "countable" keeps falling back on being one-to-one (injective). The authors carefully avoid using "bijective" throughout the entire document.

Later in section 9, they do something really weird, they say:

"Naturally, a question arises whether perhaps all infinite sets are countable."

But then they use Cantor's diagonal argument to show that this can't be true.

Still, note that throughout this entire reference, the authors never once go back to the idea that "countable" is "bijective" for infinite sets. Countable stays as a definition which is injective.

So the wikipedia page probably isn't wrong. It accurately describes how the definition starts with countable, and because of a theorem gets to bijection for all infinite sets that are deemed to not be "uncountable".


"That's just the way ZFC set theory is ..."

Well, maybe this is the way it is taught, but the "definitions" initially state that it is not the case. The wikipedia page clearly casts all of the blame on the unnamed theorem T1: a theorem whose proof is causing inconsistencies within set theory.

For finite and uncountable sets, there can be different sets within those groups where the relationship between the sets in terms of functions is "only-injective" or "only-surjective". That is, no bijection exists or can exist. They are clearly and consistently defined.

For all of the infinite sets in the middle (between finite and uncountable), there is always a bijection between any two sets, no matter what. That is a fundamental inconsistency in the way the formal system treats its different mathematical objects.

It starts with pretty clear and simple definitions about how there is a difference between injective and bijective functions, but for one huge segment of the sets, those differences become negligible. Meaningless.

Just for countably infinite sets, the theorem T1 essentially "hides" any differences in the size of the sets. It is just one big giant special case that does not match the behavior of the rest of set theory, the behavior set forth by the definitions and the axioms.

And what's going on with counting? The wikipedia page starts with a nice intuitive definition. "We can count things, one at a time, ...". But because of the T1 theorem the terminology and usage become extremely complex. Counting the elements of finite sets is radically different than counting the elements of infinite sets.

Why?

According to the wikipedia page, the theorem T1 is proven in Lang's text. I don't have access to that proof, but from the other text I included, we can guess that the problem is probably Cantor's diagonal argument. Everything is fine in ZFC set theory when the text is talking about "countable" as just being "injective". It's a pretty simple definition.

The existence of theorem T1, however, causes a strange ripple throughout the system that effectively treats all countably infinite sets in a special and strange way, inconsistent with the basic foundations of set theory.

That's one potential problem, but I've already shown that there are more problems lurking about. The magic sequences, for example. I provided an example of one, but because of T1, everybody believes that an infinite set that is countable automatically has a bijection to N. This means that the example can just be "bijected" away. The set:

C = { a1, a2, ..., b1, b2, ..., ... }

where the third "..." means creating elements such as c1, c2, ..., has two very different uses of "...", each of which heads off to a different type of infinity.

With T1, numerous commenters have shown that this is now countably infinite, and has a bijection to N.

Thus we can pound this thing down into a sequence (even though it doesn't actually fit). This probably points to another inconsistency, because at the very least I would expect C to be classed with R as being "uncountable", but it falls the other way, as countably infinite.

So we have three problems now, all acting as a perfect storm with each other:

- Cantor's diagonal argument relies solely on regular sequences.
- magic sequences are believed to not exist.
- T1 allows claiming a bijection between 'all' countably infinite sets.

The net result is that T1 declares that wherever there is any mapping that is injective, there is automatically a bijective one too, whether or not that original mapping is itself bijective. This basically undoes the meaning of injective and surjective for all functions on infinite sets (except those that are proven to be uncountable).

It is this T1 theorem which starts a chain of problems within set theory (although I suspect that the diagonal argument is still the originator of all of this).

Ultimately, what happens is that given any possible structure for reals that could fit into set theory, forcing it to be bijective with T1 incorrectly stuffs it into a sequence. No sequence can get past Cantor's argument. And T1 reduces anything larger than a sequence to just being a sequence. Thus they all work together to ensure that set theory can't be applied to complex structures.

Thus any potential structure to hold R, no matter if it is correct or not, cannot be examined objectively. They all fail regardless of what they hold, or how they hold it.

To some, this is the essence of the proof that reals are uncountable. That there is no way to get around these behaviors. To me this appears to be circular logic, with each flaw protecting each other from detection.

But if we remove T1, and use only D1 and D2 instead, then countable and countably infinite don't blur (and we can ignore countably infinite). Then the rationals are only countable, and any structure with multiple infinities isn't incorrectly flattened into a single infinity. Then magic sequences really do exist and they get past Cantor's and then there is actually a chance (although not definite) that R is in fact countable (injective).

Removing T1 starts a landslide in set theory. And given that parts of it are inconsistent with its own initial definitions, it is a good thing to question this.

PLEASE, before you rush off and comment, accusing me of missing something elementary, please reread this post very carefully. Over the last few weeks, I have had an excellent stream of comments from which to learn. In writing this, I took many of them into account, and tried to make sure that I didn't pass over anything obvious. I reread my references, and tried to account for all of the details. I do really appreciate comments, but I hope that people can stay constructive, and really dig into the substance of the problem. The problem is subtle, deep and has wedged its way past a great number of people.

Saturday, December 19, 2009

Crank it Up

What I am about to say, if true, will really surprise a lot of people. We all have a tendency to believe that once something is locked in stone, then by virtue of that, it is unshakable. Well, that's mostly true, but mankind has a vast amount of learning left to do...

Picture a spreadsheet, but it's no ordinary spreadsheet. To the left and to the right it extends out to infinity. Up and down it also extends out infinitely.

In each cell we place a real number. We start with one in the center and generate the rest of the spreadsheet.

Now, starting from the middle, we make some program to generate each cell to the right by adding the following values:

.1,  .111...,  .10101...,  .1001001...,  .100010001...,  ...

and to the left by

...,  10001,  1001,  101,  11,  1

and we leave the center at 0.

Going upwards we shift the digits one place to the left (multiply by 10), and going downwards we shift them one place to the right (divide by 10).

Now, we place an irrational in the center, let's say Pi, and we do the generation:

...
... 141.4159... 41.4159... 31.4159... 32.4159... 32.5270... ...
...  14.1415...  4.1415...  3.1415...  3.2415...  3.2527... ...
...   1.4141...  0.4141...  0.3141...  0.3241...  0.3252... ...
...
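One way to read those rules as a program (a rough sketch of my own, not a definitive implementation; the column offsets are 1/10, 1/9, 10/99, 100/999, ... to the right and 1, 11, 101, 1001, ... to the left, and floating point is used only for display):

import math

def column_offset(j):
    # j > 0 gives .1, .111..., .10101..., .1001001..., ...
    # j < 0 gives 1, 11, 101, 1001, 10001, ...
    if j == 0:
        return 0.0
    if j > 0:
        return 0.1 if j == 1 else 10 ** (j - 2) / (10 ** (j - 1) - 1)
    k = -j
    return 1 if k == 1 else 10 ** (k - 1) + 1

def cell(center, i, j):
    # i rows above the center multiplies by 10**i; the column offset is added first.
    return (center + column_offset(j)) * 10 ** i

for i in (1, 0, -1):   # print a 3 x 5 window around the center value Pi
    print("  ".join(f"{cell(math.pi, i, j):11.4f}" for j in (-2, -1, 0, 1, 2)))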

What exactly are we looking at?

This is a spreadsheet that, when stretched out to infinity in the left, right, up and down directions, will generate irrationals. Lots of them. Possibly all of them.

Since we can circle around the spreadsheet, starting in the middle and making gradually larger and larger loops, we can map each and every cell onto N. This means that our spreadsheet is countable, and thus, if this is all of the irrationals, then the irrationals themselves are countable. This would also mean that R itself, which is made up of the countable rationals and the now countable irrationals, is countable.

How do we know this works? One interesting way is to start at Pi and generate a table large enough to get to sqrt(2). If this works, then sqrt(2) will be in the table, and we can navigate to it.

In fact, all other known irrationals are also in the table.

Why?

There are lots of reasons, which I have been circling around for a couple of weeks now. Let's start with some basic premises:

- All irrationals are a non-repeating path that is infinitely long.
- If you add a repeating (or finite) path to a non-repeating one, the result is a non-repeating one.
- If there is a path between any two irrationals, and it is countable, then the path is finite.
- Irrationals in base 10 are just a representation of the underlying number.
- the +1 or +.1 pattern permutes the number, but not the distance between the permutation, so we need to distance the changes from each other, thus +11, +.111..., +101, +.10101... etc.
- the infinitely long permutations permute the tail.
- we have an infinitely long number, so we need infinitely long permutations (this was the most painful part to figure out).
- the shift operators, left and right accomplish the goal of moving the permutations into infinity.
- the table, itself generates all of the permutations and combinations, stretching out to infinity in four different directions.

We can start with any irrational, and get to all of the others.

But there is more:

The bi-infinite spreadsheet (because normally a spreadsheet only goes to infinity in two directions, rows and columns), is actually a binary infinite node tree, as I discussed in an earlier post.

Iterating the tree or the spreadsheet produces a magic sequence, as my other post also showed.

We can see that the spreadsheet is trivially countable by just starting in the center and circling around.

Still, the fact that it is countable shows that Cantor's diagonalization doesn't apply to the enumeration, precisely because the two different sets of infinities make the result a magic sequence.

And just a little bit more:

In one of my comments in Good Math, Bad Math I talked about how I saw infinities. Pretty much, that perspective and what I wrote this morning in my own comments on the Magic Sequence post, confirm that my understanding is pretty solid.

So another set of results, that is related:

All binary trees are countable, whether or not they contain irrationals.

A "possible" result is that there is no such thing as uncountability. Everything in a formal system is countable.

There are at least two categories of infinity: calculus infinities and set theory ones. They are different, but relatable.

There are two different types of calculus infinities, the type that is "going" to infinity, and the type that is already "there". The first type is "dynamic", while the second type is "static".

As set theory infinities, this is static:

3.1415...

And this is dynamic:

{ 1, 2, ... }

Infinities are pretty much interchangeable, but they do come in axes. Thus we can have 1D, 2D and 3D infinities, and probably more beyond that.

Axes are static or dynamic.

A bi-infinite spreadsheet by itself is 2D; filled with irrationals it is 3D. Cantor's diagonal proof only works on 1D or 2D. Sequences are 1D or 2D; magic sequences are 3D.

It may be true that two dynamic axes don't map to a static/dynamic pair; I'm not sure, I'll think about it.

You can map from a dynamic infinity to a static one, as I showed in my comment about counting binary trees, but you have to be careful.

To go from a dynamic to a static one, the concept of actually going "through" infinity with a path, and onto another infinity, is crucial. This whole piece is crucial to make set theory consistent, or the fact that "uncountable binary trees" exist causes it to be ill-defined.

We can frame any set theory infinity, either static or dynamic, in terms of calculus infinities. We can step through calculus infinities, one step at a time, to view the behavior.

The set:

{ 3, 3.1, 3.14, 3.1415, ..., Pi }

In calculus infinities, this starts as a dynamic infinity, that "teleports" itself to the static infinity containing Pi at the end.  In set theory infinities, this just is.

Another ambiguity that should be noted:

{ 3, 3.1, 3.14, 3.1415, ..., }

This "syntax" should only be used to specify all of the numbers in the set "approaching Pi", but not Pi itself. Doing this resolves another problem.

I think that's all of it for now.

A special thanks for all of the people who patiently put up with my questions and provided answers over the last two weeks, without your help I could never have debugged the program.

Wednesday, December 16, 2009

The Magic Sequence

This post is a continuation of both my discussions at:

http://scienceblogs.com/goodmath/2009/12/another_cantor_crank_represent.php

and at:

http://theprogrammersparadox.blogspot.com/2009/12/infinite-node-tree.html

To understand what I am saying you have to get through all of my comments (and other responses) at Mark's site, my post and all of the comments for my post. I know. It's a lot of stuff, and currently it is badly organized.

This post will clear up the discussion at Good Math, Bad Math about Cantor's Diagonalization Argument, and later today I'll add another one which will (hopefully) show that R -> N.


Consider the following sequence of real numbers:

0.100000000...
0.110000000...
0.111000000...
...
0.010000000...
0.011000000...
0.011100000...
...
...

Please note the second terminating ellipsis (...); it is not a typo. The first two ellipses are symbolic notation explaining how to extend the two existing sub-sequences of numbers. The third one, in this case, means that we extend the collection of infinite sub-sequences itself.

Now, this structure and this notation will provoke two questions:

Can't you enumerate this sequence and re-write it as such?
Is this syntax even legal?

I'd like to answer both of these questions together. Starting with another example:

1  | [2]. 1 0 0 0 0 0 0 0 0 ...
2  |  0 . 1 1 0 0 0 0 0 0 0 ...
4  |  0 . 1 1 1 0 0 0 0 0 0 ...
...
3  |  0 .[2]1 0 0 0 0 0 0 0 ...
5  |  0 . 0 1 1 0 0 0 0 0 0 ...
8  |  0 . 0 1 1 1 0 0 0 0 0 ...
...
6  |  0 . 0[2]1 0 0 0 0 0 0 ...
9  |  0 . 0 0 1 1 0 0 0 0 0 ...
13 |  0 . 0 0 1 1 1 0 0 0 0 ...
...
...

In this example, on the left I have added a unique number from N to each row, using a method of counting where I gradually add in each new infinite sub-sequence, one-by-one and extend the count to include them. This structure is countable.
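The numbers on the left can be computed directly; a quick sketch, assuming each row is identified by its sub-sequence (block) and its position within that sub-sequence, both counted from 0:

def row_number(block, position):
    # Rows are numbered anti-diagonally across (block, position): the triangular
    # number of the diagonal, plus the block index, plus one.
    d = block + position
    return d * (d + 1) // 2 + block + 1

# row_number(0, 0) == 1, row_number(1, 0) == 3, row_number(2, 2) == 13,
# matching the labels in the example above.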

On the right I have attempted to apply Cantor's Diagonalization, but because there are an infinite number of sub-sequences, I can go forward to each, but I can never get back. The structure can NOT be diagonalized.

So we can clear up the first problem I keep hearing. The above example shows explicitly that:

countable != diagonalizable.

Now the really interesting bit, the coolest part of all of this, is the enumeration on the left-hand side. It is clearly enumerable, so we can write it out as a list of elements, but I won't dare! Why?

Because once I start writing out the elements I can never quit! That is, if at any point in the writing I stop and finish with ..., then that list (the one that I've written) is no longer the same as the one above.

Yes, it's an enumerable list, but no, it cannot be written out as a finite list of elements. The ellipsis is not strong enough to describe it. We need a symbol more like [...|...] which would clearly show that the list was going off to "two" infinities at exactly the same time, not just one (although it won't explain how to construct the missing elements).

Grasping for a name, I decided that "magic sequence" adequately reflects the fact that it is a valid countable sequence that we can write, but if and only if we never stop writing it!

Is this really part of set theory? Yes, absolutely. The magic sequence is a simple sequence that can be counted. The alternative double ellipsis syntax may or may not be acceptable, but even if it isn't it is just a convenience for being able to write down (in a finite state) a magic sequence.

Again, it is a simple countable sequence that is unwritable as such. The multiple ellipsis is the only syntactic way we have of describing this mathematical object. How can this happen?

If we have:

- an infinite number of infinite elements.

everybody can agree that this is both countable and diagonalizable. But if we have:

- an infinite number of an infinite number of infinite elements

then it can't be diagonalized; the second infinity clearly gums up the works. But if we group it as:

- [ an infinite number of an infinite number ] of infinite elements

The first part of this is countable, and the entire thing is a magic sequence of infinite elements. So we can count it, and use it as a sequence.

Cantor's diagonalization argument starts with a sequence. If I supply it with a magic sequence, it becomes invalid. It cannot say one way or the other that there are missing elements.

This is a very deep and wide hole.

What's really interesting though, is that if R can be counted, then the only way it can be counted is with a magic sequence. A regular infinite sequence will ALWAYS fail the diagonalization argument. There is no question about this. The diagonalization argument was originally constructed to ensure this.


So, if you've not been following my last post (and all of the comments at both sites), I will summarize the current state of my argument.

I have:

a) a mathematical object (binary infinite node tree), that I think is expressible enough to contain the reals (R). This object fits in set theory.

b) a mapping of R onto this structure (for which there are known problems, this mapping is invalid!).

c) another mathematical object (magic sequence).

d) a simple example that shows that a magic sequence is countable, but not diagonalizable, thus Cantor can't be applied to a magic sequence.

e) a way of traversing the structure in a), to generate a magic sequence.


At this moment, the only real hole I have in my argument comes from the irrationals. In the instance of the tree I created in the comments (the 22nd one I think, but there are no numbers), I put ALL of them in the 2nd level, cut into two sets. That clearly doesn't work.

If we start enumerating the tree, starting at root, the first item is Pi, and the second item is the first element of the set of irrationals over the range [0,Pi).

Unfortunately, there is no first element until we get to infinity, but we're not allowed to get there.

Still, although that particular mapping doesn't work, after a bit of thinking, I believe I have one that does. But that will have to wait for another post this evening, since I have to get back to work now ...

Sorry to leave you all hanging :-)

Saturday, December 12, 2009

The Infinite Node Tree

I have to stop dreaming about infinities; they keep waking me up at night ...

We can define an 'infinite node tree' as a tree similar to an N-ary tree, except that at each and every node, there are an infinite number of children.

Can this tree be traversed?

Yes, if we start at the first level, and create a set of all of the children at each level we get something like:

level1 = { c1, c2, .... }
level2 = { {c11, c12, ... }, {c21, c22, ...}, ... }
level3 = { {c111, c112, ...}, {c211, c212, ...}, ... }
...

In my previous post, I suggested that if we have several infinite sets we can combine them as follows:

If we have the set:

S = { { a1, b1, c1, ...}, {a2, b2, c2, ...}, ....}

Is there an injective and surjective mapping onto

{ a, b, c, d, ....}

Hypothesis: Yes

If we define a 'pass' to be taking the next element from each of the finite number of sets currently in play, then adding in a new set, we get an algorithm like:

def traverse(S):                 # S(n) returns the n-th infinite set as an iterator
    n = 1
    sets = [iter(S(n))]
    while True:                  # forever
        for s in sets:
            yield next(s)        # get next element in current set
        n += 1
        sets.append(iter(S(n)))  # add in the next set (n must increase each pass)

So we traverse the above as follows:

a1,   b1, a2,    c1, b2, a3,   d1, c2, b3, a4,   ...

We can see that eventually we will get through all elements of S.

Then:

a = a1, b = b1, c = a2, d = c1, e = b2, f = a3, ....

Thus there appears to be a simple mapping between a single infinite set and an infinite set of infinite sets.

Now we can do that for our infinite node tree, by adding in each new set, one at a time, for each new level. (two iterators, instead of one).

So for each iteration, we add in the next set of each new level (one at a time) and then add in the next elements of each new set in the list of sets. Something like:

def traverse_tree(L):                # L(m) yields the child sets at level m, each an infinite iterator
    m = 1
    levels = [{"sets": [], "more": iter(L(m))}]
    while True:                      # forever
        for level in levels:         # for each level in levels
            level["sets"].append(iter(next(level["more"])))   # bring in this level's next set
            for s in level["sets"]:
                yield next(s)        # get next element in current set
        m += 1
        levels.append({"sets": [], "more": iter(L(m))})       # add in the next level

(or something similar, it's late :-)

So, slowly, we will eventually traverse all of the elements in the tree exactly once (as we approach infinity).

So what can we do with this infinite node tree?

To make it simpler, we can remove all of the finite elements out of R, and replace them with infinite representations, so for instance:

1 = 1.0000000...
4 = 4.0000000...

NOTE: I'm not sure if this is really necessary, but it does mean that I am left with a consistent set of infinite representations for all of the possible numbers in R. I'd rather deal with only one type of number, like Pi or sqrt(2).

Now if we consider that between any two numbers in R, there lies an infinite number of other numbers such that as a set they look like:

{ ..., A, ..., B, ... }

Then starting somewhere arbitrarily like 0, we can build an infinite node tree containing all of the numbers in R.

UPDATE2: To clarify, each real is contained in its own node in the tree. That is, there is a node for Pi and a node for sqrt(2), as well as nodes for all of the other numbers (rational and irrational).

Another, perhaps easier way to look at it is to add in all of the known numbers in R to a level in the tree (like in Cantor's first set of the diagonalization argument). Between each of these numbers is an infinite number of children, and beneath all of them are still more infinite numbers of children, etc. As we find more, we can add them to the lower levels (so we can use Cantor's construction as the way to build the tree itself).

Either way, I think the entire structure of R fits into the structure of the infinite node tree.

UPDATE: I just woke up and realized that I am slightly wrong here. The child and the parent are right next to each other. To fix this, we actually need a binary infinite node tree, that is, a tree where there are two infinite sets of children, the left set and the right set. Fortunately, having this extra set at each node doesn't significantly change how we traverse the tree (instead of adding the next child, we add in the next one from the left and the next one from the right). Now between any and every parent and child is an infinite number of other children.

AND this tree is traversable.

Thus, in iterating through the tree as we visit each node we can assign a number in N. This appears to form an injective and surjective mapping between N and R.

"You can't refute Cantor's proof using an enumeration without addressing Cantor's proof." -  Mark C. Chu-Carroll

In the first part of Cantor's diagonalization argument, we construct a list of all of the elements from the mapping.

That's the first problem. The structure is actually a tree of infinite breadth and infinite depth, and in this case one that is not easily expressible as a simple list.

The second part of Cantor's construction is to create an element from the diagonal. As we start this, we descend down one branch of the tree, but as the tree is infinite in length, we can't ever get to the bottom, so we can't go to the next branch.

What I believe this means (although it is late), is that we can't construct the element. Ever.

So we can't construct an element that isn't already included in our tree. I take this to mean either we can't apply this construction method OR it definitely shows that all elements are now contained in the tree. Either way it doesn't invalidate the mapping.

OK, that's enough, it's late and I need my beauty sleep.

UPDATE2: I thought of another interesting bit. If we see Cantor's diagonalization construction as a number generator, since it will generate an infinite amount of numbers, then we can use it as the means to count. That is, we seed it with some finite list of initial values, assigning them all a number. Then at each iteration, a new unique number will be generated. Since this goes on for infinity, it will create an infinitely large set of countable numbers. However, the question is, what if there are gaps? Someone could prove that as the list approaches infinity, the gaps decrease in size, but another way of dealing with it is to use two of these generators at each node in the binary infinite node tree. Then we can parallelize all of them at once, and construct a tree that eventually won't have any gaps at any level (I think). That's an interesting way to create the tree.

UPDATE3: There are a couple of problems with my above arguments, that I was thinking about today.

The first is that I keep falling back to the calculus notion of infinity, instead of the set theory one. That is, I am relying on "generating" the tree, not just having it created (all at once). I keep trying to sneak up on infinity, and it keeps running away and hiding.

The second is that my tree construction is vague. I can fix both problems at once.

First we'll start with: if there are any trees that will satisfy the bijection, there are an infinite number of them.

I can envision a number of ways to create these trees, but I realized the simplest way was there all along. The tree can be constructed by using Dedekind cuts. That is, each and every cut is actually a node in the tree, with the two sets on either side of it as the left and right child sets. It's a very natural extension of the cuts into an (infinite) tree structure.

Since the cuts resolve the differences between rationals and irrationals, I'm assuming that we can cut down to an infinite number in both the left and right sets, thus getting to every point on the R number line, which is more or less what Wikipedia says the cuts accomplish. Well it seems to say that it applies to the irrational real numbers, but I think that it works for both (or at least for my altered reals where I made everything infinite in length).

Then the tree created in that manner is assured of getting to every member of R. There are no duplicates, and there are no gaps. And the whole structure can be created as a set in one shot.

An issue that might bug some people is that I am using the term 'tree' when I believe there is no such thing in ZFC set theory. That's OK because in a level by level manner, sets can nicely express trees. Thus using the term tree in this context is a convenience to make the whole idea easier to understand, not a necessity.

However, the real meat of this idea is how the structure of the tree itself is used to get around Cantor's diagonalization. If we can't create the element, then it can't be missing. But we can't create it because we get lost down one branch of the tree. I don't get into the trap of having to create the element all at once precisely because Cantor's argument is based on 'constructing' the element, not just having it exist in a completed form. Where I was playing fast and loose before, with time, in this case I believe that I am allowed to rely on it (existing).

(Also, I thought about a bunch of other construction techniques, and also whether or not the tree can have duplicates (or be a DAG instead of a tree), and whether the tree actually needs to be ordered (which I was actually assuming all along, but forgot to mention). But I realized that the Dedekind cut construction was by far the cleanest and that it has already been established as being correct (although not necessarily my usage of it)).

Thursday, December 10, 2009

Quickie: Infinite Sets

 This is just a leftover from the primes stuff, but I thought I'd explain it a bit better (in case it might be of interest to others in relation to Cantor):

If we have the set:

S = { { a1, b1, c1, ...}, {a2, b2, c2, ...}, ....}

Is it mappable onto:

{ a, b, c, d, ....}

Hypothesis: Yes

If we define a pass to be taking the next element from each of the finite number of sets currently in play, then adding in a new set, we get an algorithm like:

def traverse(S):                 # S(n) returns the n-th infinite set as an iterator
    n = 1
    sets = [iter(S(n))]
    while True:                  # forever
        for s in sets:
            yield next(s)        # get next element in current set
        n += 1
        sets.append(iter(S(n)))  # add in the next set (n must increase each pass)

So we traverse the above as follows:

a1,   b1, a2,    c1, b2, a3,   d1, c2, b3, a4,   ...

We can see that eventually we will get through all elements of S.

Then:

a = a1, b = b1, c = a2, d = c1, e = b2, f = a3, ....

And thus there is a simple mapping between an infinite set and an infinite set of infinite sets.

This implies that any number of infinities in a set is identical to a single infinity.

Is this a problem?

UPDATE:

The number Pi can be seen as a series of "lower" precision approximations that are approaching the actual infinitely precise version of the number. Thus as a set we can see it as:

Pi ~= { 3, 3.1, 3.14, 3.141, 3.1415, ... , Pi }

And that set is equivalent to:

{ Pi, 3, 3.1, 3.14, 3.141, 3.1415, ... }

The same is true for all irrationals: they can be represented as a set of increasingly precise approximations to the infinite-precision real representation of the number:

{ 1/3, .3, .33, .333, .3333, .... }
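A small sketch of how such a set of approximations can be generated (truncating rather than rounding; floating point and the Decimal type are used only for illustration):

import math
from decimal import Decimal, ROUND_DOWN

def approximations(x, n):
    # The first n increasingly precise decimal truncations of x.
    d = Decimal(x)
    return [d.quantize(Decimal(1).scaleb(-k), rounding=ROUND_DOWN) for k in range(n)]

print(approximations(math.pi, 5))   # Decimal('3'), Decimal('3.1'), ..., Decimal('3.1415')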

And any two different infinite sets can be traversed in one infinite pass.

Also, since our traversal of the sets essentially has an infinite amount of time and space, we can merge any two duplicate-plagued sets together, being sure to list each element exactly once in the new set construction. It may be unnervingly slow, but it is still computable.

Wednesday, December 9, 2009

Quickie: Let it be Free

I was thinking about how the "scientists" in the Climategate scandal are hiding their raw data. If the science is really objective, then by its own merits other researchers will reach the same conclusions.

If you're afraid that they won't, then you're hiding something.

As such, no scientific researcher should ever hide their data. Ever! Not only that, but their work should be out there too -- formulas, theories, methodologies, everything -- to be judged not only by their peers, but also by the public at large. If science is a quest for the truth, then the truth has nothing to fear from inspection.

But I realize that it is not just our disgraced climate researchers who have adopted policies of secrecy. You can't, for example, get most of the academic work in Computer Science unless you join the ACM. In fact you can't get most works in academia unless you join some tight-knit clique. It's all closely guarded.

Long ago, I could understand how these organizations needed to charge money in order to recoup the costs of publishing on paper. It makes sense that they shouldn't take a loss. But amazingly that changed in the Information Age, and instead of offering free downloads it appears that most organizations choose to monetize their IP (business speak for "go for the profits").

You could make some well thought-out arguments about how the funds collected go back into the organizations to help spread their good works, but oddly one can easily suspect that most of the income is really used to protect the revenue generation (business speak for "stuff that causes profits") and pay salaries.

Academia -- data, papers, methodology, everything -- should be free and done out in the public. It should be open, and visible. Approachable. Companies hide their research because they are locked in competitive battles. Universities (and other research institutes) are not companies.

Research is not competition, it should be co-operative. And it shouldn't matter who contributes (particularly when we live in a day and age were so many people are both eager to help, but also have the available time).

I should never be told that I can't download a paper unless I pay a big sum of money. It should be free.

Saturday, December 5, 2009

Transformers

First, some Administrivia. Given that I haven't sold any copies for quite a while now, I've decided to make the electronic copy of "The Programmer's Paradox" free for a while (a few months probably). It can be found at Lulu:

http://www.lulu.com/content/paperback-book/the-programmers-paradox/178872

Lack of sales is really my fault, as I was never really strong about pushing the book (as it was never edited as well as I would have liked). It was my first real act of writing, so please go easy on me :-)


TRANSFORMATIONS

While I was fiddling with prime numbers, I also had another collection of un-related ideas invade my brain. At the high level, it involves how we have constructed Computer Science as a formal system.

Computers deal with two distinct things: code and data. This is an inherent duality in the basic concept.

However, all of our theoretical models are extremely "code-centric". That is, they focus on the boundaries of the code within a formal system, and leave the data as just a secondary player.

This is true of Turing machines, deterministic finite automata, and all of the other formal models of computing that I am aware of. The data exists, but only as an unimportant set of "symbols" that come and go on some "tape". It's the code we are really studying, not the data.

As I've said before, that is not entirely unexpected as most people's introduction to computers come from a code perspective, and they see the machines in terms of the user functionality and how they can apply it.

Still, the counter-approach of seeing computers in terms of machines that assemble mass piles of data is an equally correct, yet significantly less complex way of visualizing them.

Code is intrinsically complex and hard to unwind; data is structurally complex and mostly normalized by virtue of its usage. That is, there are an infinitely large number of ways to write a program, but there is only a small (and finite) number of reasonable mappings from the internal data back to reality. The data is either correct, or it isn't (not accounting for bugs, of course).


A MODEL

So, I started thinking about some way to model the data "within" the system, without having it overshadowed by the code.

I've worked out a number of the basic elements, but these ideas are far from complete. Because of this, I'll try to include my rationale in this discussion, along with the rather sparse ideas, in the hope that collectively these ideas will get pushed farther along by anyone interested in them.

Starting at the top, if we aren't interested in the code, then it shouldn't appear in the model explicitly. With that in mind we can define a transform T that maps the data from its value A to another value B.

T(A) -> B

A transformer is some block of (unknown) code that takes an explicit set of inputs, and returns an explicit set of outputs. To make life easier, there won't be a concept of global variables. Everything necessary for the transformation must be passed in explicitly. And there is also no concept of "side effects". Everything changed in any way is passed back outwards.  Transforms can return multiple values.

So we might have something more complex like:

T(a,b,c,d,e,f,g,h) -> A,B,C

A transform that takes 8 parameters, applies some type of algorithm to the incoming data and returns three other new parameters. It's a nice simple start.
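In code form, a transformer of that shape is just a pure function with explicit inputs and multiple explicit outputs; the body here is made up purely for illustration:

def T(a, b, c, d, e, f, g, h):
    # No globals are read and nothing is mutated; every result is returned explicitly.
    A = a + b + c
    B = max(d, e, f)
    C = (g, h)
    return A, B, C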

Now, none of this would be interesting if we couldn't use this to model something relevant. What might be interesting to know, is "given that there is some infinite number of possible algorithms for a class of transforms (or there are none), what are the bounds of these algorithms as they are running?"


SORTING

I woke up one night, a while back, with an idea about how to use transforms to calculate the bounds on algorithms. While asleep it seemed to be a very strong idea, but awake I've been unable to fully pin it down. It's mostly about continuously decomposing the algorithms to get a better idea of how they work, but without ever having to examine the inner workings themselves.

The general idea is pretty simple (if it works). While we can't just splat a bunch of algorithms into the transforms and calculate the bounds, we can sub-divide the transforms into smaller ones (endlessly) until we can use those atomic pieces to calculate the best- and worst-performing boundaries.

So for instance, we might be interested in studying the class of "sorting" transformers. That is, any transformer that takes N elements in one arrangement and returns them in another re-arranged "sorted" (lexically, numerically, etc.) order.

If we look at the trivial case, we find that:

Ts(A) -> A [ O(1) .. O(1) ]

In English, this says that any transform that "sorts" a single entity requires, at minimum, "constant time" to sort, and at maximum also constant time (I used a range notation, but I'd guess that little-o and big-O notation may be more expected, or acceptable).

What's really interesting here is to figure out the range of growth of the performance of the family of algorithms. What I'd really like to know is what is the range for:

Ts(A1,A2,...,An) -> Aw,Ac,...,Am

That is, the various families of algorithms that sort a list of N elements in different ways.

We can deal with this by taking the two bounding cases at the same time: the best and the worst case.

In the best case, the elements are sorted because the algorithm "knows" the behavior. That is, the above algorithm decomposes into:

Ts(A1,A2,...,An,L) -> Al,P1
Ts(A1,A2,...,An,K) -> Ak,P2
...
Ts(A1,A2,...,An,M) -> Am,Pn

That is, taking each different element (L, K, ..., M) one at a time, a transformation takes the whole list and the current element, and returns that element and its position in the new list.

Since we're still working with the best case, there are N decompositions of the original algorithm that generate N positions, which are used to re-arrange the list. Each decomposition could use something "intrinsic" from its input to calculate P. As such, the best possible sort algorithm, one which can use "intrinsic information" from its incoming data, can only ever re-order the list in n*O(1) time, which works out to O(n) time.

In the worst case, however, assuming that the code is all practical and really does things, we can decompose the sort algorithm down into N transforms, each of which has to examine all N elements in order to calculate P. As such, it works out to n*n*O(1), or O(n^2). So we get something like:

Ts(A1,A2,...,An) -> Aw,Ac,...,Am [ O(n) .. O(n^2) ]
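To make the two bounding decompositions a little more concrete, here is a hedged Python sketch (my own illustration; the function names and the element encoding are assumptions). The best-case transforms use "intrinsic" information from each element to compute its position in constant time, giving n*O(1), or O(n), overall; the worst-case transforms have to examine all N elements for every element, giving O(n^2).

# Best case: the element itself carries its final position, so each
# decomposition is O(1). (Here the elements are just the values 0..n-1,
# so the value *is* the position: an intentionally contrived "intrinsic".)
def best_case_position(element):
    return element

# Worst case: each decomposition has to scan all N elements to work out
# where this one belongs, so O(n) per element. (Assumes distinct elements.)
def worst_case_position(elements, element):
    return sum(1 for other in elements if other < element)

def sort_best(elements):                 # n * O(1) -> O(n)
    out = [None] * len(elements)
    for e in elements:
        out[best_case_position(e)] = e
    return out

def sort_worst(elements):                # n * O(n) -> O(n^2)
    out = [None] * len(elements)
    for e in elements:
        out[worst_case_position(elements, e)] = e
    return out

print(sort_best([3, 0, 2, 1]))           # [0, 1, 2, 3]
print(sort_worst([30, 5, 20, 10]))       # [5, 10, 20, 30]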


THE POWER OF SEARCHING

In an analogous manner to sorting, we could start examining the class of transformers that take a large number of inputs and return a subset of "found" outputs. I won't go into too much detail, but at each level of decomposition we can reduce the number of arguments and split off a best and a worst case. In that way we might start with:

Tf(A1,A2,...,An,C1,C2,...,Cm) -> (null | Ak,...,Al)

We could simplify this by adding null in as an input:

Tf(null, A1,A2,...,An,C1,C2,...,Cm) -> Ak,..,Al

Then this describes the "find" family of transforms, which takes a set of inputs and a set of criteria, and then returns either nothing or some smaller subset of "found" items.

At the very bottom, as a best case, a transform would either "know" whether or not the data was what it was looking for. Otherwise the transform would have to look through everything and determine which elements were the correct ones.

The best and worst cases are similar to those for sorting.
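Again, a small hedged sketch in Python (my own names and structure) of the two bounding cases for the "find" family: the worst case has to check every element against every criterion, while the best case somehow already "knows" the answer.

# Worst case: every element is checked against every criterion, O(n*m).
def find_worst(elements, criteria):
    found = [e for e in elements if all(c(e) for c in criteria)]
    return found or None                 # None plays the role of "null"

# Best case: the transform already "knows" where the data is (or that it
# isn't there at all), effectively constant time. The known_index
# parameter is just a stand-in for that intrinsic knowledge.
def find_best(elements, known_index=None):
    return None if known_index is None else [elements[known_index]]

print(find_worst([1, 8, 3, 12], [lambda e: e > 2, lambda e: e % 2 == 0]))   # [8, 12]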


SUMMARY

So why do I think this is useful? Primarily because it can be used to draw hard boundaries on a family of algorithms. From the standard Computer Science perspective, you can work with algorithmic complexity to find better algorithms, but you have no way of knowing if you've found the best one.

There is some computational bound which cannot be crossed, and it is nice to know where that bound is.

Of course, in my examples I took the best case to be a trivially impossible one, but that was primarily because I really haven't gone deep enough to understand that lower bound in a better fashion. It's likely, from the outputs and from the decompositions, that we should be able to figure out a way to calculate a much more accurate best-case boundary.

Another place where I think this could be useful is in studying the computational complexity classes themselves. We've had categories of complexity, like NP, for a long time, but we've been unable to make a dent in elementary questions like P=NP.

These problems, too, represent important boundaries on the things that we can do, and would like to do, with our computers. Unfortunately, right now the studies in these areas have been tied to working with existing algorithmic problems and with specific algorithms.

If we can work with the complexity classes, without having to descend into finding the actual algorithms themselves, we may be able to construct higher order proofs about our boundaries, without getting too wrapped up in the specifics of existing algorithms.

Another interesting issue about this approach is that it takes us one more step down a path I've talked about before:

http://theprogrammersparadox.blogspot.com/2009/04/end-of-coding-as-we-know-it.html

With our current code-centric perspectives on software systems, it seems as if the systems can quickly exceed everyone's threshold of understanding as they swell in complexity.

What this means in practical terms is that while the system may start under control, with enough people contributing, and enough work getting done, the system rapidly moves into a "stupid zone", where it has become so large and so complex, that nobody can fully understand it anymore.

If nobody can understand it, then nobody can use it (fully), and as such it is just a monolithic nightmare of related functionality.

Once large enough, the only way programmers can deal with adding in new functionality is by carving off little sections and ignoring the rest. This divide and conquer strategy works, but it also means the functionality itself starts becoming disjoint, and the system gets even more complex as a result.

So we get these behemoths that are so large and so disjoint that people can't make use of the available functionality. We pass over some maximum point in combined functionality, where on the other side our usage degenerates horribly.

The perfect example of this is Microsoft Word. Maybe a few decades ago, most users understood most of the functionality and could use it well. Now the software is so convoluted that most users are forced to use fewer features, just because the interactions between the features rarely do what is expected.

We used to heavily "type" our documents to allow the formatter to make them auto-magically consistent. Now anybody would have to be crazy to turn on the formatting; it hasn't worked (behaved consistently) in years. So we're back to working with our documents in the most crude manner possible.

The general case is that if you keep building on some underlying complexity, at some point the complexity itself will overwhelm what you are doing, and become its own issue.

If instead of building up ever-increasing mountains of complexity, we just worked at one low level abstraction and used the machines themselves to dynamically assemble the complexity, we could avoid a lot of problems.

The abilities of our computers are strong enough that they can be used to map pathways through a sea of transformations. Intellect is necessary for "how" to do the transformations, but it is not necessary to calculate a deterministic path between a large series of them.

We've based our current architectures and technologies implicitly on our existing theories. The Turing machine model drove the hardware construction and that drove the way we've built software. This in turn controls how we see our software problems.

We see every problem as a nail, because we started out with using hammers. And after a very long time of pounding nails into anything that moved, we're now seeing the intrinsic weaknesses in our approach to building things.

If we can't control and contain complexity, we can't exceed a fairly low threshold for the sophistication of our systems. We are stuck just crudely re-writing the same things over and over again, with simple variations on the same underlying technologies. After twenty years, this lack of progress is getting boring.

Sunday, November 29, 2009

Spaghetti Code

And now back to our regularly scheduled programming :-)

I think I've managed to get past my prime fetish for a while (although I'm still intrigued by Goldbach's conjecture). Fun stuff, but clearly not good for one's popularity. At least somebody said my diagrams were pretty ...

Continuing on from my earlier post:

http://theprogrammersparadox.blogspot.com/2009/10/ok-technically-im-under-influence.html

My second set of laws was:

Fundamental Laws of Spaghetti Code:

1. You can't understand it just by looking at it.
2. Repetitive code will always fall out of synchronization.
3. The work will always get harder and slower as it progresses.
4. Fixing one bug creates several more.
5. You can't enhance the code if you are spending all of your time fixing the bugs.

But first, some comments from the earlier post. Astrobe commented:
Spaghetti code:
I think that characterizing "bad code" is interesting only if one says by which process the original code "degrades" or "rusts". For instance:
"backwards compatibility is a poison for software"

Hans-Eric Grönlund also commented (on a later law, but still relevant here):
"2. Well-structured software is easier to build."

While I can't argue with the above "law" it makes me think. If well-structured software really is easier to build then why isn't everybody building it so? Why is so much of our code, despite good intentions, ending up as "bad design"?
Could it be that well-structured software is easy to build but requires good design, which is where the experienced difficulty lies?

I figured for this post I could dig deeply into my notions of "spaghetti code". All in all it is actually a controversial issue, since one programmer's spaghetti approach can be another's set of well-applied principles. And vice versa.


JUST A NUMBER

Back in school, in one of our computer theory courses, we were taught this useful concept of assigning a "number" to various different programs. In the days of statically compiled code that was pretty easy to accomplish because the number was just the bits of the code itself.

We can interpret all of the bits in an executable as one giant string representing a fairly huge number. A massive number, but still just a number.

Now, if we take the uniqueness of a program to mean that it handles a specific set of inputs by producing a specific set of outputs, and that two programs are identical if the inputs and outputs are the same, this leaves us with what are essentially "groups" of programs that perform nearly identically to each other.

From the outside, the programs all do the same thing, but they do not necessarily execute the same statements in the same order to get there. Nor do they use the same amount of resources (CPU, memory, etc.)

Simplifying this a bit, for any set of user functionality there are an infinitely large number of programs that can satisfy it. Infinitely large? Yes, because there is always code you can add to a program that doesn't change the results, but does force the program itself to be larger (unless your compiler is super-intelligent) and more complex.

For every possible program, we can construct an infinite number of larger programs that do exactly the same thing, but use up more space, memory, disk, etc. without providing any extra functionality.
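A hedged Python sketch of both points (my own illustration; the function names are invented, and program_number is shown but not run, since it would need a real binary to read): a program's bytes can be read as one enormous integer, and a padded-but-equivalent version of the same code maps to a larger one.

# The whole binary, read as a single (enormous) integer.
def program_number(path):
    with open(path, "rb") as f:
        return int.from_bytes(f.read(), "big")

def double_small(x):
    return x * 2

def double_large(x):
    y = x * 1                  # dead work that changes nothing
    padding = [0] * 10         # more padding a naive compiler won't remove
    return y * 2

# Identical outputs for every input, but the second compiles to more
# bytecode, a "larger number" in the same group of programs.
print(len(double_small.__code__.co_code), len(double_large.__code__.co_code))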

Within this group of programs, some of the larger numbers are clearly going to be obfuscated for absolutely no reason. They may contain more instructions, but the instructions contribute nothing towards the overall functionality. There are an infinitely large number of these variations.

Interestingly, something similar is probably true about some of the smallest versions (numbers) of the code as well. Way back I wrote about simplification:

http://theprogrammersparadox.blogspot.com/2007/12/nature-of-simple.html

Our human perspective of simple is not the same as the computer's. We don't actually prefer the "simplest" version at any specific level, but rather something that has been well-balanced as it has been reduced.

For example, simple could come from the code being contained in one great giant function or method, instead of being broken down into smaller well-managed calls. It's simpler with respect to the overall size of the code (and the run-time), but certainly not decomposed into something more manageable for a human to understand.

Because of that, for many of these groups of programs, the smallest numbers are likely simplified in ways that are not easy for us to understand. They're not desirable versions.

So, along this range of numbers, representing programs, we find that if we were looking for the best, most reasonable, version of our program, it would be located somewhere towards the middle. The smaller code lacks enough structure, while the bigger code is probably arbitrarily over-complicated; just a tangle of spaghetti. We need something balanced between both sides.


MIDDLE OF THE ROAD

If we are interested in a reasonable definition for spaghetti code, it would have to be any code where the internal structure is way more complex than necessary; where the complexity of the code exceeds the complexity required. Most of the problems lie at the structural level.

Almost by definition you can't mess with the individual lines themselves. Well, that's not entirely true. You can use something like a pre-processor to put a complex second layer over the primary language one.

You can also use the language features themselves to make any conditions grossly complex, or use some crazy number of pointer de-references, or any other syntax that ultimately amounts to the programmer being able to "hide" something important in the line of code. It's not the syntax itself, but what it obscures from the processing that matters.
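A small hedged example in Python (my own, and deliberately contrived) of how a single dense line can "hide" an important rule that an expanded version keeps visible:

orders = [{"total": 120, "status": "paid"}, {"total": 40, "status": "new"}]

# Dense: the discount rule is buried inside one comprehension.
flagged = [o for o in orders
           if (o.get("status") or "new") == "paid"
           and o["total"] - (o["total"] > 100) * 20 > 90]

# Expanded: every step of the same rule sits on its own line.
flagged2 = []
for o in orders:
    status = o.get("status") or "new"
    discount = 20 if o["total"] > 100 else 0
    if status == "paid" and o["total"] - discount > 90:
        flagged2.append(o)

assert flagged == flagged2               # same behaviour, different clarity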

Now, in general, hiding things isn't necessarily a bad thing.

We do want to "encapsulate away" most of the complexity into nice little sealed areas where we can manage it specifically and not allow it to leak over into other parts of the program.

However, the success of what is being done, is related absolutely to what we are trying to hide in the code. That is, if you hide the "right stuff" (coming up from a lower level), then the code appears far more readable. If you hide the "wrong stuff", then it becomes impossible to tell exactly what is going on with the code.

The different "levels" in the code need to be hidden from above, but they should be perfectly clear when you are looking directly at them. Everything in the code should be obvious and look like it belongs, exactly where it ended up.

We don't want it to be a giant tangled mess of instructions. To avoid being spaghetti code, there must be some reasonable, consistent structure overlaid, that does more than just add extra complexity.

The last thing programs need is unnecessary complexity.

Design patterns are one technique that is often abused in this manner. I saw a post a while ago about how one should replace most switch statements with the strategy pattern.

It's a classic mistake, since by adding in this new level of complexity, you shift the program into one of the higher numbers in its range. But there have been no significant gains in any other attribute (performance, readability, etc) in order to justify the change. If no useful attributes come out of the increase, then the code has passed over into spaghetti territory. It is arbitrarily more complex for no real reason.

The motivation for using the strategy pattern in this manner is simply to have a program that is deemed (by pattern enthusiasts) to be more correct. The reality is just more complexity with no real gain.
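For comparison, a hedged Python sketch (my own example, with made-up names) of the same dispatch written both ways; the behaviour is identical, the second form just carries more machinery.

# Plain conditional (the switch-statement equivalent).
def shipping_cost_plain(method, weight):
    if method == "ground":
        return 5 + 0.5 * weight
    elif method == "air":
        return 10 + 1.5 * weight
    else:
        return 0                          # pickup

# Strategy-pattern version: a class per branch, plus a registry.
class Ground:
    def cost(self, weight): return 5 + 0.5 * weight

class Air:
    def cost(self, weight): return 10 + 1.5 * weight

class Pickup:
    def cost(self, weight): return 0

STRATEGIES = {"ground": Ground(), "air": Air(), "pickup": Pickup()}

def shipping_cost_strategy(method, weight):
    return STRATEGIES.get(method, Pickup()).cost(weight)

assert shipping_cost_plain("air", 2) == shipping_cost_strategy("air", 2)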

It's a negative effort, but oddly a common approach to programming. It turns out that one of the seemingly common characteristics of many programmers is their love of complexity.

We're a bunch of people that are always fascinated by small complex, inter-working machinery, so it is no wonder that the second greatest problem in most people's code is over-complexity (the first is always inconsistency, the third encapsulation).

Most programs contain far more machinery than is actually necessary to solve the problem. Many contain far more machinery than will actually ever be used at any point in the future to solve the problems.

From all of this we can easily understand that over-complexity essentially equals spaghetti code. They are the same thing. And spaghetti code makes it hard to verify that the code works, to find bugs, and to extend future versions.

All in all, spaghetti code is a bad thing.


FUNDAMENTAL LAWS

Now getting back to the (possible) fundamental laws, the first one was that:

1. You can't understand it just by looking at it.

I'm not sure that's an entirely correct statement. It's more like: if you're skilled, and you should be able to understand it, but you don't, then it is probably over-complicated.

Some code is just inherently ugly and hard to understand by any non-practitioners. But once understood by an experienced professional it can be managed.

The classic example is always the APL programming language. The language is a brutal collection of algebraic symbols, where the language experts use every vector-processing trick in the book to avoid doing things like simple loops. If they can find some strange set of operators to accomplish their string processing, for example, they're considered to be gurus in the language. If they start to use C-like if/for processing logic, then they're just using the language wrong.

But it's not just complex syntax like APL. Any non-programmer would have a tough time understanding anything other than the most trivial C or Java. And they certainly wouldn't catch all the little ugly inherent subtleties in the language (like using StringBuffers and appends in Java).

If you give some C code to an experienced C programmer, while he (or she) might not understand the point or the domain aspects of the code, they certainly should be able to understand what the code is doing. And with some investigation (at the various levels in the code) they should be able to take that code and alter it in 'predictable' ways.

Even more to the point, in most languages we have enough flexibility to encapsulate the different aspects of the logic so that virtually all of the code written in the system should look like it belongs as an example in a text-book. It should always be presented in a way that it looks like it is just a simple example of "how to do something".

Of course, anyone with experience on big systems knows that this is never the case. Sadly, most of the code out there is horrifically un-readable. It's not anywhere close to being usable as an example. It shouldn't even see the light of day.

With good conventions and some normalization there is no practical reason why it should be such a mess; still, it ends up that way.

It's probably that our industry doesn't insist on reasonable code, and thus the code is ugly and messy. That allows horrible problems to hide gracefully, which ultimately makes it way harder to work with the code.

It's the incredibly stupid, but most common way, that groups of programmers shoot themselves in the foot.


THE PERILS OF DUPLICATION

2. Repetitive code will always fall out of synchronization.

So, by now we have all of these great acronyms like KISS and DRY, but that doesn't stop us from encoding the same thing, in multiple ways, in multiple files, over and over again. When we do this, it increases the probability that some future set of changes will cause a bug because two different "sources" of the same information are no longer the same.

You've got to love it when someone sticks the "constants" in the code AND in the database AND in the config files. It's just an accident waiting to happen, and eventually it does.
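A hedged sketch in Python (the names are my own invention) of the alternative: keep the value in exactly one place and generate, or seed, the other copies from it so they can't drift apart.

MAX_LOGIN_ATTEMPTS = 5                    # the single source of truth

def write_config(path):
    # The config file is generated from the code, never typed in by hand.
    with open(path, "w") as f:
        f.write("max_login_attempts=%d\n" % MAX_LOGIN_ATTEMPTS)

def seed_database(cursor):
    # The database row is seeded from the same constant at install time.
    cursor.execute(
        "INSERT INTO settings (name, value) VALUES (?, ?)",
        ("max_login_attempts", MAX_LOGIN_ATTEMPTS),
    )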

If you consider the sum total of all of the work (code, scripts, installations, documentation, packaging, etc.), then there are a massive number of places where a relatively small amount of key information is replicated over and over again.

If it is fundamentally the same thing (even in a slightly different form) and it is contained in multiple places, and the programmers don't have the underlying discipline in process to make sure it is all updated, then at some point it will go "kablooey".

You can always trace a large number of bugs back to duplicated information. It's a popular problem.


SPEED KILLS

3. The work will always get harder and slower as it progresses.

If you've got a mess, then everything you're trying to do with it either makes it worse, or it takes way longer than it should. Messes are the tar pits in which most programmers work daily.

What's really funny is how many excuses people will make for not cleaning up the code.

The favorite is invoking the term "history", as if that in itself is a reasonable justification for not cleaning up.

Another horrible one, mentioned earlier by Astrobe, is "backwards-compatibility". A freakin' huge amount of our unnecessary complexity has come from bad choices percolating upwards, year after year, under this banner.

Software vendors claim that their lazy-onion approaches to hacking code come from the market's need to support the older versions, but realistically there have always been "points" where it would have been acceptable to fix the earlier problems and remove the backwards compatibility.

For example, nothing has chained OSes to their own stinking tar pits more cataclysmically than their own marketing groups claiming to be backward compatible.

Apple is one of the few companies that ever really turfed their old weak and crippled OS (versions 1 to 9), to jump to a hugely different one with ten (OS X) and above.

A gamble, for sure, but proof that it can be done. Since for most OS versions you inevitably have to upgrade most software packages anyways, the whole notion of backwards compatibility being important is more myth than good sense.

If the programmers don't fix the rapidly multiplying problems with inconsistent and bad structure, then how does one expect them to go away? Why wouldn't they make it more expensive to dance around? You can't bury your head in the sand and expect that the coding problems will just disappear. It won't happen.


CASCADING TIME

4. Fixing one bug creates several more.

The correct description for this is stuffing straw into a sack full of holes. As you push it in one side, it pops out the others. It's impossible to finish. A hopeless task.

Anytime a group of programmers generates more bugs than they fix, it is either because they don't have the necessary experience to understand and fix the code, or because the code itself is hopelessly scrambled.

In the first case, there is a significant number of programmers out there that will leap off quickly and start working on code that they do not have the pre-requisite knowledge to understand. It is endemic in our industry to think we know more than we do, to think that we are smarter than we actually are (and user comments don't help).

Still, I've seen enough programmers try to tackle issues like parsers or resource management or even multi-threaded programming without having anything close to enough understanding.

It probably comes from the fact that within the domain itself, we are certainly not experts, and often not even novices. We get a perspective of what our users are trying to accomplish, but particularly for complex domains, we could never use the software ourselves to meet the same goals.

I guess if we learn to become comfortable with our weak domain knowledge, that transfers over to our approach towards our own technical knowledge.

The crux, then, is that a group of programmers may be expending a great deal of time and effort trying to fix some multi-threaded code, but to no real great effect. They just move the bugs around the system.

In truth, the programmer's lack of knowledge is the source of the problem, but even as they acquire more understanding, the code itself is probably not helping.

And that comes back to the second case.

If the code is structurally a mess, then it is very hard to determine, with only a small amount of effort, whether or not the code will do as advertised.

If the code is refactored, and cleaned up, then its functioning will be considerably more obvious. And if it is obvious, even if the programmer is not fully versed in all of the theoretical knowledge, they have a much better chance of seeing how to convince the code to do what they desire.

Bugs, and how they relate to each other, become strong indicators of development problems.


TIME MANAGEMENT

5. You can't enhance the code if you are spending all of your time fixing the bugs.

The single biggest issue with code is time.

The time to write it, the time to debug it, the time to document it, the time to package it, the time to distribute it, and the time to correctly support it. Time, time, and less time drive all programming issues.

And as such, it is always so hard to hear programmers making excuses about how they don't have enough time to fix something, when it's actually that something that is eating up their time in the first place.

Strangely, for a group of people who like detail, we tend not to notice the cause of our troubles very often.

Time is such a big problem that I think that's why I always find it so disconcerting when I hear industry experts pushing processes that require large amounts of time but don't return much in exchange. It is easy to make up a process that wastes time; it is hard to create one that uses it efficiently in a series of reasonable trade-offs.

But we seem to be suckers for easy and fun solutions that don't respect the fact that we can't afford the luxury of taking our time.

Programming is at least an order of magnitude more expensive than most people are willing to pay for. It survives because the total final costs get hidden over time, and by a sleazy industry that knows that truth is certainly not the best way to sell things. Software sales and consulting teams often rival our worst notions of used-car salesmen.

So the real underlying fundamental question that every programmer has to always ask about every new process, and every new technology is simply: "does using this process or technology pay for itself in terms of time, or not?"

If it's not helping, then it is hurting.

One popular example these days is unit testing. Somehow, in some strange way, a bunch of people have taken a rather simple idea and sent it off on a crash course.

In the end, we should test most of the system in its most complete, most integrated state. Testing a partially built system, or just some of the pieces is not going to catch enough of the problems.

Testing is very limited, and based on guess-work, so it certainly is an area where we need to be cautious with our resources.

In the past I have built a couple of really great, fully automated regression test environments to ensure that the attached products had exceptionally high quality. Automated testing assures that, but it is expensive. It's basically a two-for-one deal: you write twice as much code, and that gives you a small bump in quality. Expensive, but if that is the priority on the project, it is a necessary cost.

Still, if the automated tests are being run against the full and final integrated system then they conclusively prove, for the coverage of the tests, that the system is working as expected. You can run them, and then release the system.

A unit test, on the other hand is a very low-level test.

In the past, I've used them for two reasons: one is when the permutations of the code make it unreasonable to do the test manually at the integration level, but the code is encapsulated enough that the unit test is nearly identical to an integrated manual one.

That is, I wanted to save time by not having a person test it, and I wanted the test to really permute through all of the possible combinations (which a person is unlikely to do correctly). 
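A hedged sketch (my own example; the function under test is made up) of that first kind of unit test: one that sweeps every combination of inputs automatically, because no person would ever cover them all by hand.

import itertools
import unittest

def clamp(value, low, high):
    # The (encapsulated) code under test.
    return max(low, min(value, high))

class ClampTest(unittest.TestCase):
    def test_all_combinations(self):
        values = range(-3, 4)
        for value, low, high in itertools.product(values, repeat=3):
            if low > high:
                continue
            result = clamp(value, low, high)
            self.assertTrue(low <= result <= high)

if __name__ == "__main__":
    unittest.main()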

The other times I've used unit tests are as scaffolding to work on some particularly algorithmically nasty code, that's been having a lot of trouble.

It's really just a temporary bit of scaffolding to make debugging easier.

Generally, once the code has stabilized (over a few releases) I nuke the tests because I don't want them lying around as possible work sink holes. They did their job, now it's time they get cleaned up. You put up scaffolding with the intention of taking it down someday.

Unit tests are just more code in the system; code that is going to be wrong, need fixing, require updating, and take time. Time that I'd rather apply to areas that need more help. To areas where we can be more successful (like cleaning up the code).

If the choice is: a) write a unit test, or b) refactor a mess, then between the two, I've learned that the second one has a way higher probability of paying off.

But in our industry, the first one has become a newly popular process. As such, a lot of programmers have been attracted to it because they would rather write something new than actually fix what they have. The avoidance of work appeals directly to most people's weaknesses.

Since no one can see that the code is a mess, and it is much more fun to write some new (but not too complex) code instead of fixing the real problems, programmers choose to add new stuff. And the new stuff eats up more time than fixing the old stuff would have.

In the end, any poor use of time, gradually over the course of a development starts to show up badly. Our industry says that two thirds of all of our projects fail, which means that the majority of programmers don't actually have any real clue on how to build and release systems reliably.

My experience in working in many different organizations has shown that to be mostly true (although some large companies often support completely impractical projects, and that leaves the programmers thinking that their success rates are far higher than they actually are).

With this law, I might have been more direct and said something like: recovering time lost to maintaining spaghetti code should be the first priority of any development project. It's a sink for wasted time, and it shouldn't be.

Once we figured out how to refactor (mostly non-destructively), we had the tools available to get our code bases in order. It's a critically bad mistake to not fix code early on. The longer it lingers, the more odorous the onion gets, the more resources that get tied into testing, support, patching, covering up and apologizing for the code.


SUMMARY

We all know spaghetti code is bad, but it still shows up frequently in our systems. It is everywhere. In fact, one might guess that most of our code is grossly over-complicated in some easily fixable way. Misuse of techniques like Design Patterns only makes it worse.

Hans-Eric Grönlund was really close in his earlier comments. I do think it is actually easier for programmers to build over-complicated code than it is for them to build simple and elegant code. That accounts for the difference in volume between the two.

I think in general, it is way easier to be a bad programmer than a good one, and that most people choose the bad route (whether or not they know it).

Few humans are capable of hammering out minimally complex versions of something. It's not how we think. Even incredibly experienced gurus have to go back, again and again, to simplify their calls and make their work consistent. The first pass is always the brute force one. And the first pass is often an extremely over-complicated one.

We get bad code because, left on our own, we build bad code. And left again on our own, we'd rather do anything else than clean up our own mess. So the code stays around, and people just build worse things on top of it.

We are also an industry of extremists.

The Web always has someone, somewhere, taking some idea and pushing it way past its point of usefulness.

A good idea in moderation can become awesomely stupid when applied too pedantically. The machinery we build doesn't need to be complex, we don't want it to be complex, and complex isn't going to help.

So why do people keep dreaming up these crazy ideas that only obfuscate the underlying solutions? Why do programmers keep flocking to them?

Some of it, as I wrote earlier is about simplifying with respect to a different set of variables, but a lot of it is probably about ego.

Programmers have big egos, and they like to work on the hard projects. So, even if their task isn't particularly hard, they'll do everything in their power to make it as difficult and hard as possible. It's a common weakness.

In a similar manner, many people believe that writing big, long, complex sentences with lots of obscure terms makes them sound smarter, for example. But the truth is (and always has been) that the smartest people are not the ones that take a simple problem and make it appear complicated. They are instead the ones that make all of their solutions look as simple and trivial as possible. Ultimately that's what makes them famous, makes them stand out from the crowd. Being able to obfuscate something isn't intelligence; it's just patience, and an inability to see the larger picture.

The same is true in the programming world. The systems that survive are the ones that do their job as simply and painlessly as possible (that's why main-frames are still popular, they're ugly but straight-forward). The really complex stuff generally gets replaced fairly often (or never used). But there are just too many people in the industry trying too hard to prove that they are as smart as they think they are. And as such, we are awash in an endless sea of bad code.