Reasoning With Computer Code: A New Mathematical Logic
Abstract
A logic is a mathematical model of knowledge used to study how we reason, how we describe
the world, and how we infer the conclusions that determine our behavior. The logic presented
here is natural. It has been experimentally observed, not designed. It represents knowledge as a
causal set, includes a new type of inference based on the minimization of an action functional,
and generates its own semantics, making it unnecessary to prescribe one. This logic is suitable
for high-level reasoning with computer code, including tasks such as self-programming, object-
oriented analysis, refactoring, systems integration, code reuse, and automated programming from
sensor-acquired data.
A strong theoretical foundation exists for the new logic. The inference derives laws of
conservation from the permutation symmetry of the causal set, and calculates the corresponding
conserved quantities. The association between symmetries and conservation laws is a fundamental
and well-known law of nature and a general principle in modern theoretical Physics. The conserved
quantities take the form of a nested hierarchy of invariant partitions of the given set. The logic
associates elements of the set and binds them together to form the levels of the hierarchy. It is
conjectured that the hierarchy corresponds to the invariant representations that the brain is known
to generate. The hierarchies also represent fully object-oriented, self-generated code that can be
directly compiled and executed (when a compiler becomes available), or translated to a suitable
programming language.
The approach is constructivist because all entities are constructed bottom-up, with the
fundamental principles of nature being at the bottom, and their existence is proved by construction.
The new logic is mathematically introduced and later discussed in the context of trans-
formations of algorithms and computer programs. We discuss what a full self-programming
capability would really mean. We argue that self-programming and the fundamental question
about the origin of algorithms are inextricably linked. We discuss previously published, fully
automated applications to self-programming, and present a virtual machine that supports the logic,
an algorithm that allows for the virtual machine to be simulated on a digital computer, and a fully
explained neural network implementation of the algorithm.
Keywords: AGI, emergent inference, mathematical logic, self-programming, virtual machine
1. Introduction
The original motivation for the logic discussed in this paper was the discovery in the course
of computational experiments of extraordinary self-organization properties in canonical matrices
when a certain functional was minimized. This was later confirmed by additional computational
experiments. The fact that this logic was directly discovered, rather than engineered or derived
from another theory, is important, because it implies that the new logic is natural.
This work is licensed under the Creative Commons Attribution 3.0 License.
PISSANETZKY
Causal sets are a particular case of partially ordered sets. Canonical matrices have certain
advantages for the computational experiments, but they are equivalent to causal sets and causal
sets are more fundamental, so the decision was made to continue developing the theory in terms
of causal sets. In addition, causal sets are used to represent causality in physical systems, thus
establishing a strong link between causality and the new logic. I therefore propose that the new logic
be known as causal logic (CL), and the corresponding inference as causal inference (CI). There
have been previous attempts at establishing a causal logic with a proper semantics and inference,
see for example Shafer (1998). But these attempts were limited, in large part due to the lack of
an appropriate inference. CI has been discussed in Pissanetzky (1984) and Pissanetzky (2011c).
Mathematical evidence for the existence of CI was discussed in Pissanetzky (2011a, Section 4).
CL finds its theoretical foundation in the fundamental principles of Physics: causality,
symmetry, least-action, and thermodynamics. The Principle of Causality states that effects follow
their causes. The Principle of Symmetry states that every symmetry of the action in a physical
system has a corresponding self-organized, invariant structure, also known as an attractor. The
Principle of Least Action specifies the trajectory of a dynamical system in its state space as one
of least action. And the laws of Thermodynamics establish that the energy of a closed system cannot
change, and its entropy cannot decrease. These four principles summarize the supreme law
of nature, the law of laws. CL is grounded on the four principles because it was derived directly
from them. By interpreting the functional mentioned above as physical action, as discussed by
Pissanetzky (2012e), CL becomes a theory of Physics in its own right. The theory is explained in
more detail in the supplementary material (Pissanetzky, 2012b).
The conservation laws and resulting invariant structures of behavior are of particular interest
in artificial intelligence because of the well-known ability of our brain to create structures of
knowledge that are conserved (also known as invariant representations of knowledge), in the sense
that they remain invariant under certain transformations. For example, a person recognizes a chair
even if the chair moves to a different position or is turned upside down, or if the person moves the
eyes or the head, because the representation of the chair in the brain remains invariant under these
transformations. These matters are further discussed in Section 2.1 below.
One way to assess the power of representation of a mathematical model of the world is to
enumerate the different physical systems it can represent. By this measure, computer models are
likely to be the most powerful. But any computer program is a causal set. A finite algorithm is
defined as one with a finite number of variables and finite domains for the states of the variables. All
computer programs used in real-world computers satisfy these conditions and are finite algorithms,
even if they are meant to approximate real-valued functions. Finite algorithms satisfy the definition
of causet, and are causets. The set of variables of a finite algorithm, when appropriately renamed
to take causality into account, satisfies the definition of causal set. A finite algorithm will either
halt or be periodic. If it halts, it is itself a causal set. If it is periodic, the period is a causal
set. Detailed algebraic transformations between notation used for causal sets and programming
languages, both ways, have been obtained and published (see Pissanetzky, 2009, Sec. 3). Although
the transformations are somewhat cumbersome, they can be automated for each programming
language once and for all. Note that a causal set can be disconnected and represent multiple causal
sequences, or admit random events that initiate new causal chains, just like computer programs.
These issues, and the sources of causal sets, are further discussed in Section 3 below, in connection
with the question about the origin of algorithms. Algorithmic aspects of partially ordered sets have
also been considered by Schröder (2002).
Hence, causal sets are at least as powerful as computer models to represent the world. In Section
2.4, we will see that, because of the inference, they are in fact far more powerful than that. Causal
sets are also equivalent to acyclic digraphs and canonical matrices.
When working with causal sets and computer models, it is important to keep in mind that causal
logic works very differently than traditional software development. In traditional development, a
human developer acquires knowledge about something and then writes a program about it. In causal
logic, knowledge is not represented as a traditional computer program. Instead:
In causal logic, knowledge is entered as input. There is no program representing
knowledge. If causal logic is implemented on a computer, then a program is
necessary to handle input/output and to minimize the functional, but not for processing
knowledge. The inference, not the programmer, generates the program as output from
the knowledge provided as input.
The fact that knowledge is input and program is output, is what makes causal logic essential for
self-programming. It is also essential for all applications where knowledge of the world comes
from sensors, not necessarily from humans. This is the case for all autonomous agents.
Causal logic, with the functional that powers it, is a pure mathematical object. It is natural and
immutable. It exists in and of itself, independently of anything else. This fact has been verified
by applying CL to hundreds of causal sets. CL has, however, strong and direct connections with
numerous aspects of Physics, Computer Science, Artificial Intelligence, Complex Systems Science,
and other sciences. CL starts from the symmetry that all causal sets have, and finds the attractor. The
symmetry is represented by the set of legal permutations of the causal set. To this symmetry, there
corresponds a partition of the set known as a block system that is invariant under the permutations.
More detail is given in Section 2.1.
A very strong connection between CL and intelligence was found at the precise moment where
CL was discovered. CL was discovered in the course of a very simple computational experiment
performed by the author, using himself both as the observer and the subject. The experiment was one
of refactoring, and the problem was a simple law of Analytical Mechanics, the law of independence
of the components of motion of a point mass. The experiment is not objective, but it is reproducible
by anyone. Its value is not in its objectivity, but in the fact that it facilitated a discovery, the discovery
of CL, that had eluded so many other researchers. This was achieved because the experiment
was simple enough and the selected problem was appropriate for this type of research. Several
similar experiments were later performed using documented and published data from other human
subjects. These experiments are objective and observer-independent. They confirmed that CL does
describe high brain function in the examined subjects and established the connection between CL
and intelligence. Of course, nothing experimental is definitive, and many more experiments are
needed.
There really isn’t any preliminary work by other authors conducive to CL. CL solves the famous
binding problem. A quick way to introduce the binding problem is the following statement by
Douglas Hofstadter, written in his characteristic style (Hofstadter, 1985, page 633):
The major question of AI is this: What in the world is going on to enable you to convert
100,000,000 retinal dots into one single word ‘mother’ in one tenth of a second?
In other words, how do the retinal dots bind together and produce ‘mother’? It can be said that
the binding problem started in mid-19th century with Hermann von Helmholtz and continued in the
20th century with Bertrand Russell and many of the greatest names of that century. An account of this
research can be found in the overview of related work (Pissanetzky, 2012c) in the supplementary
material. But the problem remains unsolved to this day.
Section 2 introduces the theory. Included are the fundamentals of causets and causal inference,
the nature and properties of the resulting structures, an example with an application to parallel
programming, and an introduction to second-order causal logic.
Section 3 covers applications and the practical use of causal logic to solve problems. Material
in this Section is strongly focused on the issue of setting up a problem in terms of causal sets.
Experience has shown that many developers, not accustomed to using causal sets, tend to believe
that using them would be more difficult than writing a program. That is not so, and the Section
attempts to dispel that impression.
Section 4 deals with implementations. Included in Section 4 are the virtual machine, a new,
much improved version of a previously published algorithm known as SCA, and a neural network
implementation of causal logic that can be installed as a massively parallel computer, with a fully
explained example. The connection between causal logic and the brain is further explored, and
several computational experiments based on causal logic are referenced and briefly discussed.
Additional material that is either too detailed or less directly related to self-programming
is provided as supplementary material. Section 4.6 contains a description of the material and
links to the supplementary files. The material includes an overview of the theory, the reading
of which requires a background in Physics, and a special section on verification of the theory,
where predictions obtained from the theory are compared against experimental results and heuristic
theories from several other disciplines, including Neuroscience, Biophysics, Evolution, Artificial
Intelligence, Computer Science, even Google patents. Of critical importance is the prediction that
dendritic trees in the brain must be optimally short, made in (Pissanetzky, 2011a, Sec. 3.8), which
was recently independently confirmed by Cuntz, Mathy, and Häusser (June 2012), who proposed a
2/3 power law for all regions of all brains across species, with extensive experimental support and
replacing a previous 4/3 power law. This is a dramatic confirmation of the theory proposed in the
present paper.
Also included as supplementary material is an extensive and detailed case study (Pissanetzky,
2012a), based on a publicly available Java program, and a study on some aspects of object
recognition (see Pissanetzky, 2012d).
Second-order logic is only briefly discussed in Section 2.6 because
it is beyond the scope of this paper.
The most important application of causets is causal analysis, the mathematical analysis of causal
physical systems. In this case the binary relation ω is one of precedence, and the symbol ≺ is used
to represent it. Here is an example of a causet with 9 elements and 8 precedence relations:
S = {a, b, c, d, e, f, g, h, i} (1)
ω = {a ≺ b, a ≺ c, b ≺ d, c ≺ e, c ≺ g, d ≺ f, e ≺ h, g ≺ i}
Σ = (S, ω).
The nature or meaning of the elements of S is irrelevant. In the example, they represent tasks to be
assigned to the processors of a parallel computer.
To every causet Σ = (S, ω) there corresponds an acyclic directed graph G = (V, E) where to
every x ∈ S there corresponds a vertex in V , and (x, y) is a directed edge in E iff (x ≺ y) ∈ ω.
If G is connected, Σ is also said to be connected. If G is not connected, then G has connected
components, and to every connected component in G there corresponds a connected component in
Σ, and the connected component is itself a causet. For definitions and algorithms on digraphs see
for example Pissanetzky (1984, Ch. 5).
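The correspondence just described is easy to make concrete. The following Python sketch is an illustration added here, not code from the paper, and all function names are my own; it builds the digraph of a causet and checks the two properties used throughout the paper, acyclicity and connectedness:

```python
from collections import defaultdict

def causet_to_digraph(S, omega):
    """The digraph G = (V, E) of a causet (S, omega): one vertex per
    element of S, one directed edge (x, y) per relation x precedes y."""
    return set(S), set(omega)

def is_acyclic(V, E):
    """Kahn's algorithm: the digraph is acyclic iff every vertex can be
    consumed in some topological order."""
    indeg = {v: 0 for v in V}
    succ = defaultdict(list)
    for x, y in E:
        succ[x].append(y)
        indeg[y] += 1
    queue = [v for v in V if indeg[v] == 0]
    seen = 0
    while queue:
        v = queue.pop()
        seen += 1
        for w in succ[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    return seen == len(V)

def connected_components(V, E):
    """Components of the underlying undirected graph."""
    adj = defaultdict(set)
    for x, y in E:
        adj[x].add(y)
        adj[y].add(x)
    unvisited, comps = set(V), []
    while unvisited:
        stack = [unvisited.pop()]
        comp = set(stack)
        while stack:
            for w in adj[stack.pop()]:
                if w in unvisited:
                    unvisited.discard(w)
                    comp.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps
```

Applied to the causet Σ of Eq. (1), this reports an acyclic digraph with a single connected component.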
Causal sets are finite partially ordered sets, and share their properties. There is a vast literature
on partially ordered sets, much of it dedicated to continuous partially ordered sets. For finite
partially ordered sets, see for example Caspard, Leclerc, and Monjardet (2012).
To perform its task, which is to find the subset of selected permutations from set Π(S, ω), CI
uses a functional, defined as an expression in terms of the partial order ω. Its minimization
restricts the space of legal permutations to a subspace that has the properties being described.
Let now π ∈ Π(S, ω) be a legal permutation of S. The functional L(π), called the cost of
permutation π, is defined over set Π as follows:
1. number the elements of S in the order they appear in π, starting, say, from 1, and let ν(α) be the
number assigned to some element α ∈ S;
2. if r : α ≺ β is a relation in ω, then the cost of relation r is defined to be C(r) = ν(β) − ν(α),
and C(r) > 0 because π is legal; and
3. L(π), the cost of permutation π, is twice the sum of the costs of all relations in ω. The expression
for the functional is:

L(π) = 2 Σ_{r∈ω} C(r).    (2)
A functional is a map that assigns a number, not necessarily distinct, to each permutation. In this
case, the number is a positive integer. For example, permutation (a, b, d, f, c, g, e, h, i) of the causet
in Eq. (1) is legal. The numbers assigned to elements of S are ν(a) = 1, ν(b) = 2, ν(d) = 3, etc.
The cost of this permutation is 28.
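The definition above translates directly into code. The following Python sketch is illustrative, not code from the paper; it checks legality and evaluates the cost functional of Eq. (2):

```python
def is_legal(perm, omega):
    """A permutation of S is legal iff, for every relation x precedes y
    in omega, x appears before y in the permutation."""
    pos = {x: k for k, x in enumerate(perm)}
    return all(pos[x] < pos[y] for x, y in omega)

def cost(perm, omega):
    """L(pi) = 2 * sum over relations x precedes y of C = nu(y) - nu(x),
    where nu numbers the elements in their order in pi, starting from 1."""
    nu = {x: k + 1 for k, x in enumerate(perm)}
    return 2 * sum(nu[y] - nu[x] for x, y in omega)
```

For the causet of Eq. (1), a least-cost permutation such as (a, b, d, f, c, e, h, g, i) evaluates to 26, the minimum reported in Section 2.3.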
The functional plays the role of a hash function on the space Π(S, ω) of legal permutations.
Like a hash function, it partitions Π into subsets, each containing permutations with the same cost.
It then becomes possible to minimize the functional, which means finding the subset of permutations
with the minimum cost, say Πmin (S, ω) ⊆ Π(S, ω). Subset Πmin is non-empty and may contain
more than one permutation. However, unlike a hash function, the minimization of L(π) creates
associations among the elements of S, and the associations result in structure, which is detectable.
Πmin has the following remarkable mathematical property: its permutations determine a block
system on S. A block system is a partition of S into blocks, or disjoint subsets, that have the
property of being stable, meaning that they are invariant under the action of Πmin. In other words,
the same blocks are found in all permutations in Πmin. For our purposes, we use minimal blocks,
that is, blocks that cannot be reduced in size without losing the property of being blocks.
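For the small examples in this paper, stability admits a simple operational test: the elements of a block occupy consecutive positions in every least-cost permutation. The following Python sketch is my own reading of the definition, not code from the paper:

```python
def is_invariant_block(block, perms):
    """Operational test of stability: the elements of the candidate
    block occupy consecutive positions in every permutation given."""
    for perm in perms:
        positions = sorted(perm.index(x) for x in block)
        if positions[-1] - positions[0] != len(block) - 1:
            return False
    return True
```

Applied to the two least-cost permutations of Eq. (4), the test accepts {e, h} and {g, i} and rejects, for example, {c, e}.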
When the block system is generated, the relations in the partial order ω of the original causet
are separated into two groups. Some of them are encapsulated into the blocks. For all practical
purposes, they become a property of the blocks. The remaining relations are induced to the block
system itself, and become relations among the blocks. Induction makes the block system itself
a causal set, the elements of which are the blocks, and the partial order of which consists of the
induced relations. In summary, the process just described can be viewed as a function that takes
a causet and obtains another causet that represents a block system over the first. To formalize this
statement, define:
E : H → H × B,    E(Σ) = (Σ′, B),    (3)

where H is the collection of all causets, B is the collection of all inheritance maps, both countably
infinite, and E is the function that takes a causet Σ ∈ H to the pair (Σ′, B), where Σ′ ∈ H is the
resulting causet, possibly smaller, and B ∈ B is the resulting inheritance map. An inheritance map
is a function that maps each element of Σ′ to its corresponding block in Σ, including the order of
the elements of Σ contained in the block and the list of their associations and their types as found
by CI.
Function E is believed to be uncomputable, and also unprovable, because it can only be
proved by construction, and constructing all cases is not possible. To be more precise about the
uncomputability of E, the part that is uncomputable is the form of the expression of the functional
L that powers CI, Eq. (2). But once the expression is given, as it is in this paper, its value is easily
computable on any machine. It is very easy to implement CI on a PC. The map E is bijective,
and the inverse function E⁻¹ exists and is computable. E⁻¹ reconstructs set Σ from Σ′ and the
information contained in the inheritance map. Note that everything said here about causets
can also be said about algorithms or computer programs. Transformations reported in Pissanetzky
(2009, Sec. III) can be used to compute E⁻¹ in the case of computer code, but for causets
in causet notation it is easier to reconstruct Σ directly.
When function E is applied to a causet, the result is another causet and the corresponding
inheritance map. Function E can be re-applied to the new causet, and the process can be repeated
until an exhaustion condition is reached. The result of the multiple application of E to the original
causet is a nested hierarchy of causets, organized by inheritance into levels of the hierarchy. The
levels of the hierarchy possess the properties of a mathematical fractal. Under certain conditions,
such as the requirement for using minimal blocks in order to maximize the depth of the hierarchy,
the hierarchy is unique. This hierarchy is the conserved structure or invariant representation that
corresponds to the symmetry of the original causet.
E and E −1 can operate on entire algorithms, or computer programs, hence the title of this paper
“Reasoning with computer code.” The elements of the causet can be lines of code, subroutines,
or even entire programs. Of course, the critical question still remains: what, if anything, do these
conserved structures have to do with the brain's invariant representations? This question can only be
addressed by experiment. Experiments in Section 4.5 suggest that they are the same.
A solution to this problem will now be developed using CI, and compared with the solution
obtained by the human analyst of Fig. 1(a). The procedure involves 5 iterations, illustrated in Fig.
1(b). System Σ corresponds to level L1 in the figure. Each iteration associates tasks, and creates an
invariant partition of set S, taking the system from its current level to the next higher level.
Causet Σ has 336 legal permutations, is reduced (no redundant relations), and connected (one
connected component). The cost of the 336 permutations, according to the functional of Eq. (2),
ranges from 26 to 40. The least cost is 26, and there are 2 permutations with that cost. They are:
(a, b, d, f, c, e, h, g, i) (4)
(a, b, d, f, c, g, i, e, h)
By simple inspection, block system B can be verified to be invariant under the action of the two
permutations of Eq. (4). The difference between set S and system B is the presence of the two non-
trivial blocks (e − h) and (g − i), where the − sign indicates a relation of immediate precedence.
In other words, e immediately precedes h and g immediately precedes i. Block (e − h) not only
encapsulates relation e ≺ h from partial order ω, but also confirms that there exist no other relations
in ω that could affect that relation. The same applies to block (g − i). The two blocks represent
associations that the inference has automatically created, associations that exist in B but did not
exist in S. The process of emergence that leads from S to B is one of inference, because the
associations in B are a new fact derived from known facts. Block system B corresponds to level L2
in Fig. 1(b). Many more examples with small systems like this one are given in Pissanetzky (2011a,
Sec. 4).
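The figures quoted in this example can be reproduced by brute force. The following Python sketch is illustrative only, and feasible only for small causets, since it enumerates all 9! orderings of S; it finds the legal permutations of Eq. (1), their cost range, and the least-cost subset Πmin:

```python
from itertools import permutations

S = "abcdefghi"
omega = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "e"),
         ("c", "g"), ("d", "f"), ("e", "h"), ("g", "i")]

def legal(perm):
    pos = {x: k for k, x in enumerate(perm)}
    return all(pos[x] < pos[y] for x, y in omega)

def cost(perm):
    nu = {x: k + 1 for k, x in enumerate(perm)}
    return 2 * sum(nu[y] - nu[x] for x, y in omega)

# Enumerate all orderings of S and keep the legal ones.
legal_perms = [p for p in permutations(S) if legal(p)]
costs = [cost(p) for p in legal_perms]
pi_min = [p for p, c in zip(legal_perms, costs) if c == min(costs)]
```

It reports 336 legal permutations with costs from 26 to 40, and exactly the two least-cost permutations of Eq. (4).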
ω = {a ≺ b, a ≺ c, b ≺ d, c ≺ e, c ≺ g, d ≺ f, e ≺ h, g ≺ i} (7)
Induction of the relations in ω into block system B of Eq. (5) results in the following partial order:
ω:  (a ≺ b)   (a ≺ c)   (b ≺ d)   (c ≺ e)   (c ≺ g)   (d ≺ f)   (e ≺ h)   (g ≺ i)
ω2: (a2 ≺ b2) (a2 ≺ e2) (b2 ≺ c2) (e2 ≺ f2) (e2 ≺ g2) (c2 ≺ d2) f2        g2        (8)
The bottom line contains the partial order ω2 for the blocks. The first six relations have been
induced into ω2 . The last two have been encapsulated into blocks f2 and g2 , respectively, and do
not participate in ω2. Block system B of Eq. (5) can now be written as a causal set, say Σ2, given
in alphabetical order:
S2 = {a2 , b2 , c2 , d2 , e2 , f2 , g2 } (9)
ω2 = {a2 ≺ b2 , a2 ≺ e2 , b2 ≺ c2 , c2 ≺ d2 , e2 ≺ f2 , e2 ≺ g2 }
Σ2 = (S2 , ω2 ).
In Fig. 1(b), causal set Σ of Eq. (1) has been represented as level L1, and causal set Σ2 of Eq.
(9) as level L2 . In summary, the process just described is: given a causal set, find the associations,
permute accordingly, calculate the block partition, and use it to obtain a new, usually smaller causal
set.
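The induction step of this summary can be sketched as follows. Given a partition of S into named blocks (here the renaming used above, e.g. f2 = {e, h}; the function name and the dictionary encoding are mine, added for illustration), the relations of ω split into encapsulated and induced ones:

```python
def induce(omega, partition):
    """Split the relations of a causet under a partition of S into named
    blocks: a relation inside one block is encapsulated; a relation
    between distinct blocks is induced on the block system."""
    block_of = {x: name
                for name, members in partition.items()
                for x in members}
    encapsulated, induced = [], set()
    for x, y in omega:
        if block_of[x] == block_of[y]:
            encapsulated.append((x, y))   # becomes a property of the block
        else:
            induced.add((block_of[x], block_of[y]))
    return sorted(induced), encapsulated
```

With the partition of Section 2.3, the induced relations are exactly the six relations of ω2 in Eq. (9), and the encapsulated ones are e ≺ h and g ≺ i.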
[Figure 1(a), a diagram of the analyst's parallel computer, is not reproduced here.
Figure 1(b) shows the lowest levels of the hierarchy:
L1: a b d f c e h g i
L2: a b d f c (e − h) (g − i)
L3: a b (d − f) c (e − h)|(g − i)]
Figure 1: (a) The solution to the problem of parallel programming. (b) The 6-level hierarchy of
structures obtained in Section 2.3. “–” indicates immediate precedence, in that order, and
“|” indicates no precedence.
We will now apply the same procedure to Σ2 . Causal set Σ2 is of order n2 = 7, has 6 relations,
is reduced and connected, and has 40 legal permutations with a cost range from 20 to 28. There are
4 least-cost permutations, and they are the following:
(a2 , b2 , c2 , d2 , e2 , f2 , g2 ) (10)
(a2 , b2 , c2 , d2 , e2 , g2 , f2 )
(a2 , e2 , f2 , g2 , b2 , c2 , d2 )
(a2 , e2 , g2 , f2 , b2 , c2 , d2 )
The corresponding block system B2, where the "|" sign indicates association but not precedence,
is a new causal set, say Σ3, with 5 elements and 4 relations. It corresponds to level L3 in Fig. 1(b).
Renaming the 5 blocks yields:
S3 = {a3 , b3 , c3 , d3 , e3 } (12)
ω3 = {a3 ≺ b3 , a3 ≺ d3 , b3 ≺ c3 , d3 ≺ e3 }
Σ3 = (S3 , ω3 ),
corresponding to level L3 of the hierarchy. System Σ3 has only 6 legal permutations, the least cost
is 12, and there are 2 permutations with that cost. They are:
(a3 , b3 , c3 , d3 , e3 ) (13)
(a3 , d3 , e3 , b3 , c3 )
which corresponds to level L4 in the hierarchy. Two more systems and two more iterations are
needed to complete the remainder of the hierarchy, for a total of 6 levels. They are simple enough to
be solved manually, and are left as an exercise for the reader. The highest level in the hierarchy
corresponds exactly to the human analyst’s solution of Fig. 1(a), which was discussed at the
beginning of Section 2.2. This is the nested hierarchy of associations and invariant partitions
determined by causal set Σ of Eq. (1).
This example reveals in a dramatic way the progressive build up of associations and structure
obtained by the inference in each one of the 5 iterations. The first iteration associates e with h and g
with i, each in immediate sequence but independently of each other, indicating that the two pairs of
tasks can execute in parallel. The second iteration associates d with f in sequence, and pairs e − h
and g − i, not in sequence, creating a parallel computer with 2 CPUs, shown in level L3. Iteration
3 associates b to d − f in sequence, creating the sequence b − d − f , suitable for a single CPU.
Iteration 3 also associates task c as a predecessor of both e − h and g − i. In a similar way, one step
at a time, iterations 4 and 5 find the remaining associations and create the parallel computer in Fig.
1 (a).
At this point, it is very important to review the process that led from the definition of system Σ
in Eq. (1) to the resulting structures shown in Fig. 1(b). This process is mechanical, its only input is
system Σ, and its output is the hierarchy of block systems. Nothing in the process depends on Σ or
contains any intelligence related to Σ. The only parameter in the process is the functional of Eq. (2),
but the form of the functional is universal and independent of Σ, and its value is determined only
by the partial order ω. It follows that all the structures in Fig. 1(b) depend on, and are determined
only by the partial order ω in system Σ. This statement gives rise to the following principle, which
is actually a consequence of the Principle of Emergence:
Principle of Structure of the Information. The causal information describing a
system determines its structure.
The process by which the structures are found is called emergence or self-organization. This process
removes entropy from the physical system and causes it to converge to attractors, which are the
structures being sought. At that point, the system becomes conservative. Its state space shrinks to
the attractors, and it is forced to remain there unless it again gains energy, and therefore entropy. In
AGI, the process of gaining entropy is called learning. As the system learns, it gains entropy and
becomes disordered. At the same time, if it is open, it again loses energy to the environment and
converges to a different order. The two processes compete, causing the system to constantly evolve
between states of disorder and constantly changing attractors. An outside observer can only detect
the attractors. The observer will see the system as if it were in a state of constant change, of constant
motion.
t = {a, b} (15)
s = {c, d}
x = {e, f, g, h}
v = {i, j, k, ℓ}
fx : t × s → x (16)
fv : t × s → v
fK : x×v →K
Functions fx , fv , and fK are defined in Fig. 2. The symbol ∗ used in the table defining fK indicates
values that are not used.
fx : t × s → x            fv : t × s → v
      s: c   d                  s: c   d
t: a     e   f             t: a    i   j
t: b     g   h             t: b    k   ℓ

fK : x × v → K (∗ = unused)
      v: ℓ   i   k   j
x: g     ∗   ∗   m   ∗
x: f     ∗   ∗   ∗   n
x: e     ∗   m   ∗   ∗
x: h     n   ∗   ∗   ∗

fK⋆ : t × s → K
      s: c   d
t: a     m   n
t: b     m   n

Figure 2: Definitions of functions fx , fv , fK , and fK⋆ (fK⋆ is used in Section 2.6).
The independent variables are t and s, where t corresponds to time and s will be shown to be a
symmetry. Since each of x and v depends on t and s, and K depends on x and v, one would expect K to
also depend on t and s. However, the fact that each of x and v depends on both t and s establishes
a bijection between x and v and reduces set x × v from 16 elements to only 4. In fact, if t = a and
s = c, then x = e and v = i, and other combinations of the value x = e with other values of v are
eliminated. The result is that only 4 values of K are possible. For example, if x = e, then the value
v = j cannot happen. This property is the equivalent of a Lagrangian formulation in Physics.
If, in addition, the 4 allowed values of K are pairwise the same, indicating the existence of a
symmetry, then set K has only 2 elements, m and n in this case, or K = {m, n}, and K becomes
independent of t, as indicated by function fK⋆ in Fig. 2. Set K is the conserved quantity.
This effect is a cancellation of causality, one where two causal relations cancel each other as a result
of properties of the elements. The cancellation of causality is symmetry, and the symmetry results
in a conserved quantity, in this case set K.
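The cancellation just described can be checked mechanically. The following Python sketch encodes the function tables of Fig. 2 as dictionaries (the encoding is my own, with ℓ written as "l"; only the 4 reachable (x, v) pairs of fK are given, the remaining entries being the unused ∗ values) and verifies that K depends on s alone:

```python
# Function tables of Fig. 2, encoded as dictionaries.
f_x = {("a", "c"): "e", ("a", "d"): "f", ("b", "c"): "g", ("b", "d"): "h"}
f_v = {("a", "c"): "i", ("a", "d"): "j", ("b", "c"): "k", ("b", "d"): "l"}
# Only the 4 reachable (x, v) pairs carry values; all others are unused.
f_K = {("e", "i"): "m", ("f", "j"): "n", ("g", "k"): "m", ("h", "l"): "n"}

def K(t, s):
    """Compose the tables: the bijection between x and v means every
    reachable (x, v) pair comes from a single (t, s)."""
    return f_K[(f_x[(t, s)], f_v[(t, s)])]

# K turns out to be independent of t: the conserved quantity K = {m, n}.
```

Evaluating K over all four (t, s) pairs gives m for s = c and n for s = d, regardless of t, reproducing the table of fK⋆.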
3.1 Functions
According to the definition of causet, any function f : x → y is a causet if sets x and y are finite. If
x and y are disjoint, then S = x ∪ y. If they are not, the causet can still be obtained by renaming
the elements of y. Consider, for example, the function y = x² on the natural numbers x < 4. The
causal set describing this function is:

S = {1, 2, 3, 1′, 4′, 9′}    (17)
ω = {1 ≺ 1′, 2 ≺ 4′, 3 ≺ 9′}
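The conversion illustrated by Eq. (17) can be sketched as a generic routine (Python, added here for illustration; the priming convention follows the example above):

```python
def function_to_causet(f, domain):
    """Convert a finite function f to causet form, renaming each output
    value y to a primed copy y' so that S = domain plus renamed range."""
    S = set(domain)
    omega = set()
    for x in domain:
        y = str(f(x)) + "'"   # rename to keep the range disjoint from S
        S.add(y)
        omega.add((x, y))
    return S, omega
```

Applied to y = x² on {1, 2, 3}, it reproduces the causet of Eq. (17).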
More generally, any finite-domain algorithm or computer program that halts can be converted to
causet form. If it does not halt, then it is periodic, and each period is a causet. It follows that any
computer model of a physical system can be converted to a causet model. Conversely, any causet
can be written as an algorithm or computer program. The fact that computer models have been used
to simulate virtually everything gives a good idea of the representational power of causets. These
issues have been examined in full detail in Pissanetzky (2009, Sec. III), and are briefly revisited in
Section 3.3 of this paper.
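As a sketch of the construction behind Eq. (17), the conversion of a function over a finite domain into a causet can be mechanized as follows; the function name and the priming convention used to keep the codomain disjoint are mine, not from the paper:

```python
def function_to_causet(f, domain, rename=lambda y: f"{y}'"):
    """Convert a finite function f : x -> y into a causet (S, omega).

    Elements of the codomain are renamed (primed) so that the domain
    and the codomain are disjoint, as required by the definition.
    """
    S = set()
    omega = set()
    for x in domain:
        y = rename(f(x))   # primed copy keeps the codomain disjoint
        S.update({x, y})
        omega.add((x, y))  # x causally precedes its image
    return S, omega

# y = x**2 on the natural numbers x < 4, as in Eq. (17):
S, omega = function_to_causet(lambda x: x * x, [1, 2, 3])
# S     = {1, 2, 3, "1'", "4'", "9'"}
# omega = {(1, "1'"), (2, "4'"), (3, "9'")}
```

The same helper works for any finite function, since only the domain and the rule are needed.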
A function over an infinite domain can be described in the same way we humans do in our own
mental representations: by reference to the notion of mathematical limit. This is covered in Section
3.2.
An interesting case is that of a system of simultaneous algebraic equations, where “simultaneous”
means that all the equations must be satisfied at the same time and are therefore not causal.
However, every method used to actually solve such a system, for example substitution, the Gauss
method, or the inverse matrix method, is sequential and causal, and can therefore be described by
a causal set. The case of differential equations, including simultaneous differential equations, is
discussed in Section 3.4.
The value found in memory location A at a certain time t is increased by 1 and the result is stored in
the same location at a later time, say t + τ. Formally, this same operation can be written as follows:

A_{t+τ} = A_t + 1; (19)

which is now clearly causal and can be represented by causet S = {A_t, A_{t+τ}, 1}, ω = {A_t ≺
A_{t+τ}, 1 ≺ A_{t+τ}}. Converting a computer program to causet notation entails unrolling all loops and
making explicit the time dependence of variables. The result can be a very large causet. However,
if certain allowances are made, it should be possible to develop a C-like or Java-like language that
allows loops and memory and code reuse and can still be mathematically treated as a causet for uses
such as applying this logic and reasoning with the code, and automatic object-oriented analysis and
refactoring.
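A minimal sketch of this conversion, applied to the increment operation of Eq. (19) repeated in a loop; the function name and the time-indexing convention are illustrative, not from the paper:

```python
def unroll_increment_loop(iterations):
    """Unroll the loop `repeat N times: A = A + 1` into a causet.

    Each write to memory location A becomes a distinct, time-indexed
    element A_t, so the loop's implicit causality becomes explicit:
    both A_t and the constant 1 precede A_{t+1}, as in Eq. (19).
    """
    S = {"A_0", "1"}
    omega = set()
    for t in range(iterations):
        S.add(f"A_{t + 1}")
        omega.add((f"A_{t}", f"A_{t + 1}"))  # old value precedes new value
        omega.add(("1", f"A_{t + 1}"))       # the addend precedes the result
    return S, omega

S, omega = unroll_increment_loop(2)
# S = {'A_0', 'A_1', 'A_2', '1'}; omega chains A_0 -> A_1 -> A_2
```

The causet grows linearly with the number of iterations, which illustrates why unrolled causets can become very large.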
Causets can also be obtained directly from certain models of computation (Bolognesi, 2010).
This statement specifies four elements, credit card, name, address, and number, and declares that
name, address, and number precede credit card (one can’t have a credit card without a name,
an address, or a number). An implementation of the causal set may realize that “name” has no
predecessors, meaning it has not been defined, and ask the question:
what is a name?
whereupon first name, middle initial, and last name are added as elements of the set and declared as
predecessors of name. That way, a large causal set is built step by step. If, in addition, a process of
causal inference exists, that process can go on uninterrupted while the process of learning is taking
place. Since CI is behavior-preserving, the two processes, CI and learning, are independent, and
can work concurrently. This is a characteristic of host-guest systems. A more detailed discussion of
learning with a teacher is available in (Pissanetzky, 2009, Sec. V.F).
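The step-by-step growth described above can be sketched as follows; the class and method names are hypothetical, not part of the paper:

```python
class Causet:
    """A growing causal set; elements and relations are added by a teacher."""

    def __init__(self):
        self.S = set()       # elements
        self.omega = set()   # precedence relations (a, b) meaning a < b

    def teach(self, element, predecessors=()):
        """The teacher declares that each predecessor precedes `element`."""
        self.S.add(element)
        for p in predecessors:
            self.S.add(p)
            self.omega.add((p, element))

    def questions(self):
        """Elements with no predecessors: the system asks 'what is a ...?'"""
        defined = {b for (_, b) in self.omega}
        return sorted(self.S - defined)

c = Causet()
c.teach("credit card", ["name", "address", "number"])
print(c.questions())  # ['address', 'name', 'number'] -- all still undefined
c.teach("name", ["first name", "last name"])
# 'name' now has predecessors; learning proceeds step by step
```

Since teaching only ever adds elements and relations, it can run concurrently with causal inference, as noted above.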
Note that the nature of the elements introduced by the teacher is irrelevant. An element can
have a structure of its own, or be as complicated or as abstract as desired. When the teacher states
that name has first name and last name, he is specifying additional structure for the existing variable
name. At a higher level of learning, most of the elements are of a much higher level, but they are
still elements that can be used in a causal set and/or refined to any desired degree. Think of the
many computer-based courses that teach just about everything. If the course is a program, then it is
already causal and can be converted to causet notation, even without any natural language
processing. Computer-based courses are certainly a very important source of causets.
compiled to regular machine code, once suitable interpreters and compilers are developed. It also
communicates directly with humans because transformations between causets and programming
languages are possible, both ways, and will be fully automated once suitable translators become
available. See (Pissanetzky, 2009, Sec. III) for details, and Section 4.3 in this paper for a small
example.
A random number generator works as a source for causets. The system prompts the generator and
obtains a random number in response. That’s a causal relation. The response of the system to the
value of the random number is a collection of causal sequences. To each value there corresponds a
different causal sequence, and they all can be represented as a single causal set or used to grow an
existing one. Note that random numbers used in computers are always finite integers, even if the
generator is a chip based on truly random quantum noise, so the condition that causets are finite is
not violated.
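A sketch of this construction, with all names illustrative: each possible value of the random number roots its own causal sequence, and all sequences are merged into a single causet:

```python
def rng_causet(responses):
    """Merge the causal sequences triggered by each random value.

    `responses` maps each possible random value to the sequence of
    actions the system performs in response; all sequences share the
    single 'prompt' element, forming one causal set.
    """
    S = {"prompt"}
    omega = set()
    for value, actions in responses.items():
        v = f"value={value}"
        S.add(v)
        omega.add(("prompt", v))      # prompting the generator causes the value
        prev = v
        for a in actions:
            step = f"{a}@{value}"
            S.add(step)
            omega.add((prev, step))   # each action causes the next one
            prev = step
    return S, omega

# Two possible random values, each with its own causal sequence:
S, omega = rng_causet({0: ["turn left"], 1: ["turn right", "stop"]})
```

The finiteness condition holds because the generator can produce only finitely many integer values.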
The same considerations apply to any random interactions or unexpected events that may happen
inside the system being modeled. For example, a radioactive nucleus may suddenly decay, where
the term “suddenly” gives rise to two causal sequences, one representing the reaction of the system
to the decay, and the other representing the normal evolution of the system for the case where the
decay has not happened.
A partition of a set S is a collection of subsets of S such that every element of S belongs to exactly
one of the subsets. If set S has a partial order ω, then ω will induce a partial order, say ω′, among
the subsets. If the collection of subsets is designated as set S′, so that the elements of S′ are the
subsets of S, then S′ together with partial order ω′ is a causet. Partition of a causet followed by
order induction generates another, smaller causet.
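Partition followed by order induction can be sketched as follows; the function and variable names are mine:

```python
def quotient_causet(S, omega, blocks):
    """Partition causet (S, omega) and induce the order among the blocks.

    `blocks` is a partition of S.  Block B precedes block B' in the
    induced order omega' if some element of B precedes some element of
    B' in omega; the blocks are the elements of the smaller causet S'.
    """
    block_of = {e: i for i, B in enumerate(blocks) for e in B}
    S_prime = set(range(len(blocks)))
    omega_prime = {
        (block_of[a], block_of[b])
        for (a, b) in omega
        if block_of[a] != block_of[b]  # relations inside one block vanish
    }
    return S_prime, omega_prime

# The causet of Eq. (17), partitioned into domain and codomain:
S_p, o_p = quotient_causet(
    {1, 2, 3, "1'", "4'", "9'"},
    {(1, "1'"), (2, "4'"), (3, "9'")},
    [{1, 2, 3}, {"1'", "4'", "9'"}],
)
# o_p == {(0, 1)}: the domain block precedes the codomain block
```

All three relations of ω collapse into the single induced relation, which is what makes the quotient causet smaller.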
Causal inference (CI), discussed in Section 2.1, systematically generates partitions of a given
causet with certain properties of invariance. CI is an important mechanism for causet generation.
Section 3 is about using CL to solve problems. It may be useful to close it with the following general
remark. Assume you are in doubt. You are working with CL and you are not sure how to interact
with it and get it to do something useful. Then, I recommend the following state of mind: imagine
you are working with a human: an infant, a student, a professional. Just imagine. I am not saying
CL will do what they do, but just imagine. If you work with a human, you would either teach her,
or make sure she already knows whatever you need her to know.
With CL, it is just the same. You train it. I have done it many times; see my computational
experiments. CL does not care about meaning (in first-order CL). If you have a problem, you train
it with the knowledge pertinent to that problem. Of course, it must be all the pertinent knowledge,
but there is no need to dig any deeper; for example, there is no need for the CL to learn everything
that an infant is supposed to learn. If you are solving a problem in self-programming, there is no
need for the CL to know how to acquire or understand a natural language. If the CL is for a washing
machine, it does not need to know advanced Mathematics. Just like humans. But unlike humans, once a
CL prototype for a washing machine is built, copies can be made and mass-produced.
If you think of CL as if it were a human, then my computational experiments, reported or
referenced in this paper, will serve as general directions. You will be building prototypes. Mass-
production will come later.
A prototype virtual machine installed on a personal computer has been used for all the
computational experiments reported in this paper.
In the machine, the input stream brings causal information from the outside world. The causet
implemented by the virtual machine grows as it learns that information. The process that minimizes
the functional self-organizes the information by associating elements and organizing them into
blocks, and generates the invariant representations as a hierarchy of block systems. The block
systems are algorithms, or self-programs, automatically generated by the virtual machine from the
causal input and ready to be compiled and executed on a regular computer or sent to a robotic
actuator.
An analysis of some of the mathematical properties of causets and their expected impact on Artificial
Intelligence and Software Engineering is available (Pissanetzky, 2011a). A chronology of discovery
is also included.
same. The last mechanism may be the more realistic, as biological neurons in the brain are known
to have up to 10,000 synapses per neuron. For our purpose here, which is to explain the algorithm,
it is easier to think that neurons do switch their positions physically.
This version of the neural net is one-dimensional. Two-dimensional and three-dimensional
versions are possible, and are possibly much more interesting than the 1D version, but have not been
explored. Also, for comparison with the brain, a 3D version would possibly be more appropriate.
The initial setup of the algorithm requires the definition of a causet, such as the one given in
Eq. (1); let n = |S|.
Step 1. Initial setup. To each element e, associate three numbers: the number of predecessors p(e),
the number of successors s(e), and their difference p(e) − s(e). I will refer to this difference as
the “net pull” on element e. Predecessors are said to “pull up” from the element, successors “pull
down” (pull left and pull right may also be used). These 3 numbers will remain constant during
the entire algorithm, which guarantees the preservation of behavior and makes the algorithm more
efficient because they are never recalculated. Finally, arrange all elements, or “neurons,”
along a line in the order of some arbitrary but legal permutation.
Step 2. Pair selection. Consider any arbitrary pair of adjacent neurons. Two neurons are said to
commute if they are not related by any precedence relation. If the pair is commutative, go to Step 3;
otherwise discard it and repeat Step 2. If no commutative pairs are found, then stop.
Step 3. Commutation. Calculate the “differential pull,” which is the net pull of the neuron on the
left less the net pull of the neuron on the right of the pair. If the differential pull is negative, commute;
a negative differential pull is said to be profitable, because the commutation reduces the cost of the
functional. If the differential pull is positive, do not commute. If it is 0, then commutation is
optional: it is usually effected if some profit for doing so can be found, and not effected otherwise.
Examples are given below. In all cases, go back to Step 2.
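Steps 1 to 3 can be sketched as follows. This is a minimal, sequential rendering of the basic algorithm only: the optional zero-pull commutations and the shifts discussed below are omitted, and treating precedence as the transitive closure of ω is my assumption, since an element must not be allowed to cross an indirect predecessor or successor:

```python
def transitive_closure(S, omega):
    """All pairs (a, b) such that a precedes b directly or indirectly."""
    reach = {e: {b for (a, b) in omega if a == e} for e in S}
    changed = True
    while changed:
        changed = False
        for e in S:
            extra = set()
            for f in reach[e]:
                extra |= reach[f]
            if not extra <= reach[e]:
                reach[e] |= extra
                changed = True
    return {(a, b) for a in S for b in reach[a]}

def sca(S, omega, perm):
    """Basic SCA (Steps 1-3): repeatedly swap adjacent commutative pairs
    with negative differential pull.  `perm` must be a legal permutation."""
    pred = {e: {a for (a, b) in omega if b == e} for e in S}
    succ = {e: {b for (a, b) in omega if a == e} for e in S}
    pull = {e: len(pred[e]) - len(succ[e]) for e in S}  # fixed once, Step 1
    closure = transitive_closure(S, omega)
    related = closure | {(b, a) for (a, b) in closure}

    perm = list(perm)
    improved = True
    while improved:
        improved = False
        for i in range(len(perm) - 1):
            left, right = perm[i], perm[i + 1]
            if (left, right) in related:
                continue                       # not commutative: skip, Step 2
            if pull[left] - pull[right] < 0:   # profitable commutation, Step 3
                perm[i], perm[i + 1] = right, left
                improved = True
    return perm

# Tiny example: with omega = {a < c} and b unrelated, b (pull 0)
# ends up left of a (pull -1):
print(sca({"a", "b", "c"}, {("a", "c")}, ["a", "b", "c"]))  # ['b', 'a', 'c']
```

Randomizing the sweep order would yield different least-cost permutations, which is exactly what is needed to populate the set Πmin discussed below.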
This completes the basic algorithm. A more advanced version should include the notion of shift,
or migration of an element. Consider one element, say e, and its adjacent element to the right, say
e′. Let also e″ be the element next to e′ on its right. If the commutation (e, e′) is performed, then
element e becomes adjacent to element e″. If e and e″ are commutative, and the commutation is
effected, then the net effect is that e has shifted right by two positions. Element e can continue
shifting right until it meets one of its successors, where it must stop.
The net shift may be very short or very long. Elements can also shift left. The shift of an
element is characterized by its direction and its extent, the number of steps involved in the shift.
When performing a shift, the algorithm has two undetermined parameters: the selection of the seed
element, and the direction of the shift, left or right, for that step. Results may be different for different
combinations of selections. This arbitrariness is actually an advantage, because it can be used to
find many different least-cost permutations and populate the set Πmin(S, ω) by simply randomizing
both parameters. As explained above, it is necessary to have as many least-cost permutations as
possible in set Πmin, short of having them all, in order to obtain a good block system.
This algorithm is local. The actual value of the global cost is irrelevant. It is fully parallelizable,
by assigning one adjacent pair per processor, and it is much simpler than the matrix-based version,
because there is no need to access the tables of rows and columns in the matrix. The algorithm
operates directly from the given causet, and preserves behavior.
When a causet contains more than one connected component, this algorithm will find them all
and will create an initial block system with one block per component. The components are subsets
of the causet, and can be given in any order, which is precisely the condition for a subset to be a
block. Finding a partition in components is very convenient. When it happens, it becomes possible
to divide the problem into smaller subproblems, one per each component, and solve each one of
them separately.
Experience gained with the study of small systems (Pissanetzky, 2011a, Sec. 4) indicates that
a higher-order version of this algorithm that includes block commutation is necessary. Block
commutation considers the commutation of two adjacent blocks of elements, at least one of which
contains more than one element. It is possible for the commutation of two adjacent blocks to be
profitable as a whole even though no profitable single-element commutations are available inside
the blocks or at their boundary. Shifts, considered above, are an example of block commutation.
Block commutation would follow the same rules outlined above, but is not further expanded here.
Because of the conjecture made in this paper that meaning comes from associations, and that
associations require inference, so that the causal order of things is

inference ≺ association ≺ meaning,

and because CI creates associations and structure and therefore meaning, but ignores any previous
meaning of the elements of the set, the presentation in this Section intentionally starts with a causet
that appears meaningless to the reader. Of course, the interpretation of the results, including the
meaning that the elements have for us humans, is discussed at the end of the Section.
The model presented here is minimalist. It extracts only a few features deemed basic and
essential for the operation of a physical system. Minimalist models do not explain the system in
full, but they serve as proof of principle and provide a foundation for further, more detailed exploration.
A researcher, working long before Newton, had observed the motion of a billiard ball in three
dimensions, and had measured a number of parameters that appeared to describe the motion. He
named the parameters arbitrarily, because, for him, they didn’t have any meaning.
The researcher followed an approach known as a thought experiment. After interpolating his
experimental points, he came up with 18 simple equations that represent the motion with good
accuracy. To help the reader into a state of mind similar to that of the researcher, I will do the
same thing: I will write the 18 equations, using arbitrary symbols, and not telling what their modern
names are.
S = {a, b, c, d, e, f, g, h, i, j, k, ℓ, α, β, γ, δ, ε, ζ} (21)
ω = {a ≺ k, b ≺ α, c ≺ j, d ≺ ℓ, e ≺ k, f ≺ j, g ≺ ℓ, h ≺ β, i ≺ ε, j ≺ ζ, k ≺ γ, ℓ ≺ δ}
Σ = (S, ω).
The input variables, defined as variables that participate in the equations but do not appear in set S,
and are assumed to be given, are: κ, θ, η, τ, σ, π, ξ, φ, ρ, µ, ν, λ.
The application of the SCA algorithm to the causet of Eq. (21) is displayed in full detail in Fig.
3. The original set S is listed, in an arbitrary but legal order, in line 1. In this case, the order is
that of the given equations. The pull for each element is listed below the name of the element. This
pull is carried along with the element as the element shifts. Pairs eligible for commutation are those
with a negative differential pull. In this case, the only pair is (i, j), which has a differential pull of −2.
Once i and j are commuted, i becomes adjacent to α. Pair (i, α) again has a differential pull of −2, so
it is also commuted, causing i to become adjacent to β, and so on. In all, element i shifts all the
way to ε, but no further, because pair (i, ε) is not commutative. Line 2 indicates the result.
In line 2, h is shifted right to just left of β. In line 3, g is shifted right to just left of ℓ. Note that,
at some point in the course of this last shift, g reaches h, where the differential pull of (g, h) is 0.
When the differential pull is 0, commutation is optional, and a decision must be made on whether
to commute or not to commute. In this case, it is convenient to effect the commutation and allow g
to reach β, where the pull of pair (g, β) is favorable. The same strategy is applied many times in the
course of the process.
In line 4, since pair (f, j) is not commutative, e is shifted right to k. In line 5, d is shifted right
to ℓ. In line 6, since commuting pair (c, f) would bring no profit, b is shifted right to just left of α.
In line 7, a is shifted right to e. The boundary for a is k, but it is not worth commuting a with e.
Finally, in line 8, since f cannot be shifted right, ζ is shifted left all the way to the right of j, where
it stays. Line 9 contains the final least-cost permutation. In this case, no more profitable shifts are
possible.
In this process, a particular choice was made for the direction and extent of each shift. But
the choice is arbitrary, and the resulting permutation depends on the choice. By randomizing the
selection and extent of each shift, or, preferably, of each commutation, different permutations will
be obtained. This idea constitutes a systematic procedure for obtaining the set of all least-cost
permutations of set S, which is required for the calculation of the block system. See an example
in Eq. (10). In this particular case, the given causet has n = 18 elements and r = 12 relations.
1. Original              a  b  c  d  e  f  g  h  i  j  α  β  k  γ  ℓ  δ  ε  ζ
                        −1 −1 −1 −1 −1 −1 −1 −1 −1  1  1  1  1  1  1  1  1  1
2. Shift i right to ε    a  b  c  d  e  f  g  h  j  α  β  k  γ  ℓ  δ  i  ε  ζ
                        −1 −1 −1 −1 −1 −1 −1 −1  1  1  1  1  1  1  1 −1  1  1
3. Shift h right to β    a  b  c  d  e  f  g  j  α  h  β  k  γ  ℓ  δ  i  ε  ζ
                        −1 −1 −1 −1 −1 −1 −1  1  1 −1  1  1  1  1  1 −1  1  1
4. Shift g right to ℓ    a  b  c  d  e  f  j  α  h  β  k  γ  g  ℓ  δ  i  ε  ζ
                        −1 −1 −1 −1 −1 −1  1  1 −1  1  1  1 −1  1  1 −1  1  1
5. Shift e right to k    a  b  c  d  f  j  α  h  β  e  k  γ  g  ℓ  δ  i  ε  ζ
                        −1 −1 −1 −1 −1  1  1 −1  1 −1  1  1 −1  1  1 −1  1  1
6. Shift d right to ℓ    a  b  c  f  j  α  h  β  e  k  γ  g  d  ℓ  δ  i  ε  ζ
                        −1 −1 −1 −1  1  1 −1  1 −1  1  1 −1 −1  1  1 −1  1  1
7. Shift b right to α    a  c  f  j  b  α  h  β  e  k  γ  g  d  ℓ  δ  i  ε  ζ
                        −1 −1 −1  1 −1  1 −1  1 −1  1  1 −1 −1  1  1 −1  1  1
8. Shift a right to e    c  f  j  b  α  h  β  a  e  k  γ  g  d  ℓ  δ  i  ε  ζ
                        −1 −1  1 −1  1 −1  1 −1 −1  1  1 −1 −1  1  1 −1  1  1
9. Shift ζ left to j     c  f  j  ζ  b  α  h  β  a  e  k  γ  g  d  ℓ  δ  i  ε
                        −1 −1  1  1 −1  1 −1  1 −1 −1  1  1 −1 −1  1  1 −1  1
Figure 3: The 9 steps performed by the CI algorithm SCA to find one least-cost permutation.
The number m of connected components must satisfy m ≥ n − r, which is 6 in this case, and
6 components have been found. However, in a general case, the actual number is not known
beforehand. It must be determined, and the components identified. The procedure to do so is very
simple, once some permutation with the least cost has been identified by SCA. Figure 4 illustrates
the procedure. The central row of the table lists the permutation from line 9 of Fig. 3. The top
row lists the number of connections that “pull up” from each element, and the bottom row lists
those that “pull down.” Connected components are disconnected from each other. A connected
pull up     0  0  2  1  0  1  0  1  0  0  2  1  0  0  2  1  0  1
elements    c  f  j  ζ  b  α  h  β  a  e  k  γ  g  d  ℓ  δ  i  ε
pull down   1  1  1  0  1  0  1  0  1  1  1  0  1  1  1  0  1  0

Figure 4: The connected components for the causal set of Eq. (21). The presence of a 0 at the bottom
of some column, and a 0 at the top of the following column, identifies a boundary
between two connected components, located between the two columns. This figure
represents the structures existing but hidden in the original information, Eqs. (20).
component must begin at an element with 0 connections pulling up and end at an element with 0
connections pulling down. The table very clearly shows 6 components, 3 with 2 elements each, and
3 with 4 elements each, as indicated by the vertical divisions. In addition, all three 2-element
components have pull-up counts (0, 1) and pull-down counts (1, 0), indicating that they are objects
of the same class. The same happens with all three 4-element components, where the pull-up and
pull-down counts are (0, 0, 2, 1) and (1, 1, 1, 0), respectively. Furthermore, there exists a one-to-one
association between each 2-element object and a corresponding 4-element object. In conclusion, the
SCA algorithm has identified 6 connected components, consisting of two classes with 3 objects each.
Meaning can now be derived from the various structures and associations just described. For
example, they can be named, and those names will be “meaningful.” The 4-element class could be
named “coordinate,” and its 3 objects the rx-, ry-, and rz-components. The 2-element class would
be “velocity,” with elements vx, vy, and vz. The associations relate rx with vx, ry with vy, and rz
with vz, effectively separating the 6 objects into 3 groups of 2 objects each. This property is known
in Physics as the law of independence of the components of motion. It is usually taught in Physics 101.
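The boundary test of Fig. 4 can be sketched as follows (the function name is mine); it assumes a least-cost permutation has already been produced by SCA:

```python
def components_from_permutation(perm, omega):
    """Split a least-cost permutation into connected components (cf. Fig. 4).

    A boundary lies between an element with 0 connections pulling down
    (no successors) and a following element with 0 connections pulling
    up (no predecessors).
    """
    p = {e: sum(1 for (a, b) in omega if b == e) for e in perm}  # pull up
    s = {e: sum(1 for (a, b) in omega if a == e) for e in perm}  # pull down
    blocks, current = [], [perm[0]]
    for prev, e in zip(perm, perm[1:]):
        if s[prev] == 0 and p[e] == 0:   # boundary between two components
            blocks.append(current)
            current = [e]
        else:
            current.append(e)
    blocks.append(current)
    return blocks

# The least-cost permutation from line 9 of Fig. 3, with the causet of Eq. (21):
omega = {("a", "k"), ("b", "α"), ("c", "j"), ("d", "ℓ"), ("e", "k"), ("f", "j"),
         ("g", "ℓ"), ("h", "β"), ("i", "ε"), ("j", "ζ"), ("k", "γ"), ("ℓ", "δ")}
perm = ["c", "f", "j", "ζ", "b", "α", "h", "β", "a", "e",
        "k", "γ", "g", "d", "ℓ", "δ", "i", "ε"]
print(components_from_permutation(perm, omega))
# 6 components: [c f j ζ], [b α], [h β], [a e k γ], [g d ℓ δ], [i ε]
```

Each component found this way becomes an initial block, so the problem can then be divided into one subproblem per component, as noted below.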
These conclusions have not been created by CL. They obviously exist in the original
observations, Eqs. (20), but they are not recognizable there. CL is the process that organizes the
information by creating meaningful structure. This process is also known as binding, or sometimes
as binding of the qualia. The binding process is not limited to software. There are many other
applications, but this is a paper on software, and the other applications are beyond its scope.
Just as Eqs. (20) are a computer program, so also are the structures defined in Fig. 4. But
with one difference: an object-oriented (OO) design exists in the last case, which did not exist in
the first. CL has created a full OO design for an originally very poorly organized and non-OO piece
of software. CL, which is not a program and has not even been told what OO is, has, nevertheless
and of its own initiative, created the OO design. It is possible to write a program for converting Fig.
4 directly into Java code.
As explained above, Eqs. (20) were intentionally written in a confusing notation. This was
done to insist on the point that CL does not depend on meaning. Quite on the contrary, CL is
believed to be the source of meaning. To complete this argument, it is now necessary to explain the
physical meaning of the various variables, as understood by a human physicist, and compare with
the meaning of the results obtained by CL. The problem at hand is that of a billiard ball of mass m
whose center of mass is initially at position r0 and has velocity v0 . The ball is subject to the action
of a constant force F (bold-face characters indicate vectors). The problem is to find the position r
and velocity v of the center of mass after a time interval ∆t and under the action of a constant force.
• CL is natural, a property of nature, not man-made. It has been experimentally observed, not
derived from another theory, not engineered or designed for any particular purpose. The brain is
a natural organ, and not man-made. It evolved by natural selection, because individuals with bigger
brains survived better, not because evolution intended to “engineer” it for some particular purpose.
• The theory of CL is consistent with the foundations of theoretical Physics. The brain is a physical
system that obeys the laws of Physics.
• CL can be viewed as a neural network (Section 4.3), where elements of the causet correspond to
neurons and causal relations to synaptic connections (of the on-off type), and the connections serve
as memory to store knowledge.
• CL admits of a massively parallel implementation on specialized hardware, where execution time
is approximately independent of the size of the problem. This is possible because the functional is
naturally local, not because it was designed to be that way. By being local, a big problem can
be subdivided into many small problems, each of which can be solved by a local processor, with all
processors working at the same time. The “massive parallelism” of the brain is legendary and
needs no introduction.
In view of all of this, I have conjectured (Pissanetzky, 2011a, conjecture K2), but did not
claim, that CL is the logic used by high brain function to perform those tasks. This conjecture can
only be addressed experimentally. CL has not been actually observed in the brain, although some
indirect evidence is available, such as the existence of neural cliques (Lin, Osan, and Tsien, 2006),
which are reminiscent of the clusters of artificial neurons that form in the neural network of Section 4.3,
and the existence of a regular 3D grid structure underlying the complexity of the primate brain
(Wedeen et al., 2012).
To support this research, I have also proposed an experimental program based on computational
experiments that avoid the complexities of implementation of the brain by considering it as a host-
guest system with an input and a corresponding output. This approach is discussed in the next
Section.
published material is used as the source of the necessary information. Three suitable types of
sources of such material have been identified, and, fortunately, they are abundant and easily
accessible. They are: object-oriented software designs, theories of Physics, and visual images.
However, depending on their particular characteristics, they may be used in different ways in their
corresponding experiments.
I have performed only one experiment using a visual image. The input is a digitized picture, in
this case a set of black points on a white background, and the output is the resulting structural
hierarchy. Actual recognition would have required keeping a database of previously generated
hierarchies in memory, which is not allowed by the initial condition, so machine recognition was
not included in the experiment. To deal with this limitation, human observers (in one case the
entire audience at a workshop) were asked to interpret a display of the digitized image, which they
had never seen before, and describe their reaction. The experiment is described in Section 4.5 of
Pissanetzky (2011a), and it showed satisfactory results.
I have performed several experiments in the area of object-oriented software design. In a typical
case, a human analyst receives a problem statement, for example describing a company, and possibly
some additional explanations from the company’s subject-matter engineers. The analyst is not
expected to know anything about the company prior to receiving the document or the explanations,
so the initial condition is satisfied. Actual development is not included in the experiments in order
to avoid the need for a programming language, which developers usually learn beforehand and keep
in memory at the time they engage in the analysis. Since the actual information received by the
analyst is usually considered confidential and is not published, I had to recreate it by applying function
E⁻¹ to the published structures. This involves a behavior-preserving process where all structure is
removed from the software (for example by translating from Java to single-assignment C), and the
resulting causal set is randomized to destroy any remaining order before using it as input to CL.
I have performed several experiments with software. One of them is discussed in Section 4.3. I
call this experiment the foundational experiment for CL, because it is the experiment where CL
was discovered. A detailed account of the process of discovery is available in the Introduction
of Pissanetzky (2011a). Based on the fact that natural systems self-organize, and computers and
programs do not, I knew I had to look for self-organization in a natural system. I chose myself as
the system, specifically the high function in my own brain. I intended only for the experiment to be simple
enough to search for CL, but not necessarily objective or observer-independent. It is, however,
easily reproducible by anyone. For the setup, I selected one of the GUAPs, and I wanted to find out
why it was so easy for people but so difficult for computers. I started with the simplest problem I
could find, the one discussed in Section 4.3.
Since then, a number of observer-independent computational experiments have been conducted
in the area of software self-organization, all of them based on independently created published
material, described in full detail by other authors and publicly available. The final product, obtained
by a human analyst, was converted first to an unstructured causet by application of function E⁻¹,
and a process of learning was used to train the CL virtual machine with that material. CL was
then applied to generate structures, and the final structures obtained by CL were compared with the
original man-made structures.
One such experiment is a rather extensive and detailed case study on object-oriented analysis
and design, supported by a complete set of files, available as supplementary material. This case
study is based on a publicly available Java program that describes a real-world token ring, and is
used to teach refactoring in European universities. The reader, particularly if interested in self-
programming, is encouraged to peruse this material.
Another experiment on software is the example on parallel programming discussed in Section
2.2 of this paper.
In the area of theories of Physics, I have performed only two experiments. One of them is the
foundational experiment, which represents a law of Physics based on Newton’s equations of motion.
The other is a set of equations known as the Euler equations for the motion of a rigid body. This
last one was not published. There was also a study of hundreds of small causal systems, as reported
in Section 4 of Pissanetzky (2011a), intended not so much as experiments but as an analysis of the
properties of CL.
Results were satisfactory in all cases. Not one single disagreement was found. Many of these
experiments are toy problems. But the size of an experiment is not a good measure of its value. The
size depends on factors such as the power of the computer, the amount of money available to buy a
bigger one, and the time available for developing and perfecting the programs. Better measures are
the significance of the principle underlying the work, and the scalability of the process. By these
measures, the computational experiments reported here are very strong.
Having established all that, I proceeded to argue in support of my claim that causal logic can
be used for reasoning with computer code. I noted that, under certain conditions, any algorithm is
a causal set because it satisfies the definition, and that any program is also a causal set because
a program is a finite algorithm. Read in reverse, these statements say that causal sets themselves
are executable programs, and that anything said about causal sets or causal logic also applies to
computer programs. It follows that causal logic applies to computer programs and qualifies as a
mathematical approach to computer programming that allows reasoning with code.
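The algorithm-to-causal-set correspondence can be made concrete with a hedged sketch. In the Python fragment below, the statements, variables, and the dependence rule are all invented for illustration: each statement of a straight-line program is modeled as an element, and a causal relation u → v is recorded whenever v reads a variable that u wrote, so the data dependences of the code induce a partial order:

```python
# Each statement is modeled by its name, the variables it writes,
# and the variables it reads. The statements are hypothetical.
program = [
    ('s1', {'x'}, set()),        # x = input()
    ('s2', {'y'}, set()),        # y = input()
    ('s3', {'z'}, {'x', 'y'}),   # z = x + y
    ('s4', {'w'}, {'z'}),        # w = 2 * z
]

# A relation u -> v holds when a later statement v reads a variable
# that an earlier statement u wrote: a def-use data dependence.
relations = set()
for i, (u, writes_u, _) in enumerate(program):
    for v, _, reads_v in program[i + 1:]:
        if writes_u & reads_v:
            relations.add((u, v))

print(sorted(relations))   # -> [('s1', 's3'), ('s2', 's3'), ('s3', 's4')]
```

The resulting set of relations is exactly the kind of finite partial order that the definition of a causal set requires; any legal reordering of the statements that respects it computes the same result.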
I have further noted that the hierarchies of structures created by causal logic represent classes of
objects and inheritance relationships among them, that the blocks in the levels of the hierarchy are
objects of those classes, and that an entire hierarchy can be considered as a UML class diagram
representing the originally given knowledge in an organized and structured way. To qualify
the reference to “originally given knowledge,” I examined a number of sources of knowledge
about the world, and the particular ways in which these various sources always produce causal
information about the world that can be expressed in the form of a causal set, or a computer
program. By enumerating several of those sources, I meant to imply that their number includes
nearly every process used in computation to represent knowledge and work with it. Based on
which, I have noted that causal sets represent a means for universal connectivity, and proposed
a software development platform based on causal sets that can communicate with man by way of
understandable programming languages, and with machine by way of executable code.
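How blocks bind together to form such structures can also be sketched. The Python fragment below is a simplified illustration with an invented toy causal set, not the paper's construction: elements that remain adjacent in every action-minimizing permutation are taken to bind into blocks, and those blocks play the role of the objects in the hierarchy:

```python
from itertools import permutations

# Invented toy causal set, used only to illustrate block formation.
elements = ['a', 'b', 'c', 'd', 'e']
relations = [('a', 'b'), ('a', 'c'), ('b', 'd'), ('c', 'd'), ('a', 'e')]

def action(seq):
    """Total distance between related elements; None if not a linear extension."""
    pos = {x: i for i, x in enumerate(seq)}
    if any(pos[u] >= pos[v] for u, v in relations):
        return None
    return sum(pos[v] - pos[u] for u, v in relations)

scores = {p: action(p) for p in permutations(elements)}
m = min(s for s in scores.values() if s is not None)
minimal = [p for p, s in scores.items() if s == m]

# Pairs of elements adjacent in every minimal permutation bind into blocks.
def adjacent(seq):
    return {frozenset(pair) for pair in zip(seq, seq[1:])}

blocks = set.intersection(*(adjacent(p) for p in minimal))
print(sorted(sorted(b) for b in blocks))   # -> [['a', 'e'], ['b', 'c']]
```

In this toy case two blocks emerge, and each could be read as a class-like grouping of its members; repeating the process on the blocks themselves would yield the higher levels of the hierarchy.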
Still pursuing the same approach, I have considered traditional computer programming as a
process where a human analyst learns from the world, codes the acquired knowledge, and reasons
with the code in order to organize it and find structures for an algorithm. Finally, the analyst
translates the algorithm to a programming language that both man and machine can understand.
In view of all this, I have proposed that a self-programming machine should be able to do the
same.
But machines cannot reason with code. People can. The inability of machines to reason with
code has left the deepest scar in the whole of modern computation: the separation of input and
program. The input is information about some system in the outside world. The program is
information about the same system, except that this information must first be reasoned upon by a
human analyst. Information about the system of interest has thus been artificially separated into
two parts, one for the machine and the other for the programmer. A self-programming machine should
be able to do the reasoning on its own and not depend on a human programmer.
The machine proposed in this paper has gone a long way in that direction. It has no program
specific to the system under analysis. It still has, at least in this version, a program to deal with
input/output and to minimize the functional, but the reasons for that are only circumstantial, dictated
by limitations in time and resources. With time, that program too will become part of the input. The
unification of input and program is now possible in view of the discovery of causal inference and
the new ability of machines to reason with code.
With those goals in mind, I have presented a virtual machine that implements the inference and
needs no problem-specific program. I have discussed an implementation of the virtual machine
as a neural network, and presented and fully explained a direct application of the neural network
to a real problem. I have also presented or referenced a number of examples or experiments that
illustrate actual experience in self-programming with real-world problems, without any need for
human intervention. They include a case study in Java code for a token ring, an example in parallel
programming where tasks are assigned to processors, and a small example in image recognition
that serves as a proof of principle. The experiments cover major tasks in Software Engineering
such as object-oriented analysis, refactoring, and system integration. All these tasks are in the area
of computer programming, if done by a human developer, or of self-programming, if done by the
machine.
All the technology needed to proceed with further development of these projects is currently
available. An easy (but slow) implementation of causal logic on a personal computer is accessible
to anyone with computer experience. To make it faster, a massively parallel neural network-style
implementation is necessary, using specialized hardware, and following the guidelines given in the
corresponding section. It may be possible to develop a commercial hardware unit for causal logic
that plugs into a USB port. The parallelization of causal logic is quite possibly one of the very
first tasks that need to be undertaken next.
The development of Causal Logic (CL) is just beginning. This long paper is only an
introduction. There is a very large number of connections with a variety of branches of science,
each one of which needs to be explored and appropriately accounted for. A large volume of future
research is expected after publication of this paper. My vision is that, with time, mass-produced
CL machines trained for different purposes will become available, in the form of LSI chips. Users
or manufacturers will be able to specialize the machines by providing additional training specific
to their interests, or even directly from sensors. For now, our task is to make all that come into
existence.
References
Bolognesi, T. 2010. Causal Sets from Simple Models of Computation. arXiv:1004.3128, 1–33.
Available electronically from http://arxiv.org/abs/1004.3128.
Caspard, N.; Leclerc, B.; and Monjardet, B. 2012. Finite Ordered Sets. New York: Cambridge
University Press.
Cuntz, H.; Mathy, A.; and Häusser, M. 2012. A scaling law derived from optimal dendritic
wiring. PNAS, DOI: 10.1073/pnas.1200430109, 1–5. Available electronically from
http://www.pnas.org/content/109/27/11014.
Hofstadter, D. R. 1985. Metamagical Themas: Questing for the Essence of Mind and Pattern. New
York: Basic Books, Inc.
Lin, L.; Osan, R.; and Tsien, J. Z. 2006. Organizing principles of real-time memory encoding:
neural clique assemblies and universal neural codes. Trends in Neuroscience 29(1):48–57.
Available electronically from http://www.ncbi.nlm.nih.gov/pubmed/16325278.
Pissanetzky, S. 2011b. Emergent inference and the future of NASA. Workshop, NASA,
NASA Gilruth Center, Johnson Space Center, Clear Lake, TX. Available electronically at
http://www.scicontrols.com/Publications/AbstractNASA2011.pdf.
Wedeen, V. J.; Rosene, D. L.; Wang, R.; Dai, G.; Mortazavi, F.; Hagmann, P.; Kaas,
J. H.; and Tseng, W. I. 2012. The geometric structure of the brain fiber pathways.
Science 335(6076):1628–1634. DOI: 10.1126/science.1215280. Available electronically from
http://www.sciencemag.org/content/335/6076/1628.abstract.