4-input 2-output Boolean circuits

In 2005, Donald Knuth determined the minimum cost required to implement each of the 2^32 different 5-input 1-output Boolean functions as a circuit composed entirely of:

  • 2-input gates (there are 16 of these), each of which has cost 1;
  • 1-input gates (there are only 2 of these, namely the identity function and the NOT gate), each of which has cost 0.

Given that NOT gates are free, every 2-input gate is either silly (ignoring one or both of its inputs) or is equivalent to either AND or XOR.
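
This trichotomy is easy to check mechanically. Here is a quick sketch (in Python) which classifies a 2-input gate by its truth table: a gate that depends on both inputs is equivalent (up to free NOT gates) to XOR precisely when its truth table has even parity, and to AND otherwise.

    def classify(t00, t01, t10, t11):
        # tij = f(i, j): the truth table of a 2-input gate f.
        if t00 == t01 and t10 == t11:    # ignores its second input
            return 'silly'
        if t00 == t10 and t01 == t11:    # ignores its first input
            return 'silly'
        if t00 ^ t01 ^ t10 ^ t11 == 0:   # even parity: XOR up to NOTs
            return 'XOR-type'
        return 'AND-type'                # odd parity: AND up to NOTs

Running this over all 16 truth tables yields 6 silly gates, 2 XOR-type gates, and 8 AND-type gates.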

A previous cp4space post discusses how we can efficiently determine the NPN equivalence class (that is to say, the equivalence class of functions up to permuting and negating the inputs and/or output(s)) of a function given its truth table, and therefore query a database of optimal circuits for the 616126 equivalence classes.

I decided to attempt the analogous feat for 4-input 2-output functions. There are still 2^32 of them, but the equivalence classes are smaller:

  • For 5-input 1-output functions, the symmetry group (by whose action we’re quotienting) has 2^5 × 5! × 1! × 2^1 = 7680 elements;
  • For 4-input 2-output functions, the symmetry group has 2^4 × 4! × 2! × 2^2 = 3072 elements;

and, as such, there are more equivalence classes: 1476218, as mentioned here. This number can be calculated exactly using Burnside’s lemma, or bounded from below by the ceiling of the leading-order term: ceil(2^32 / 3072) = 1398102.
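
For concreteness, here is a sketch (in Python) of that Burnside computation. The 3072-element group is the product of the 384 signed permutations of the four inputs and the 8 signed permutations of the two outputs; a pair (π, σ) fixes a function F if and only if F(π(x)) = σ(F(x)) for all x, and the number of such F is the product, over the cycles of π, of the number of points of {0,1}^2 fixed by σ^ℓ, where ℓ is the cycle length.

    from itertools import permutations

    def signed_perms(n):
        # The 2^n * n! signed permutations, as lookup tables on {0,1}^n.
        for perm in permutations(range(n)):
            for flips in range(1 << n):
                yield tuple(
                    sum((((x ^ flips) >> perm[i]) & 1) << i for i in range(n))
                    for x in range(1 << n)
                )

    def cycle_lengths(table):
        seen, lens = [False] * len(table), []
        for i in range(len(table)):
            j, length = i, 0
            while not seen[j]:
                seen[j] = True
                j = table[j]
                length += 1
            if length:
                lens.append(length)
        return lens

    def power(sigma, k, y):
        # Apply the permutation sigma to the point y, k times.
        for _ in range(k):
            y = sigma[y]
        return y

    total = 0
    outputs = list(signed_perms(2))    # 8 symmetries of the 2 outputs
    for pi in signed_perms(4):         # 384 symmetries of the 4 inputs
        lens = cycle_lengths(pi)
        for sigma in outputs:
            fixed = 1
            for l in lens:
                fixed *= sum(power(sigma, l, y) == y for y in range(4))
            total += fixed
    print(total // 3072)               # expected: 1476218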

The breadth-first search

With about 3750 CPU core-hours and a lot of memory usage, I was able to determine the optimal circuits for all of these 1476218 equivalence classes of 4-input 2-output Boolean functions. The number of classes and functions of each cost are tabulated below:

[Table: the number of equivalence classes and functions at each cost]

Representatives of the four classes of cost 0, for example, are:

  • f(x1, x2, x3, x4) = (0, 0);
  • f(x1, x2, x3, x4) = (0, x1);
  • f(x1, x2, x3, x4) = (x1, x1);
  • f(x1, x2, x3, x4) = (x1, x2);

and representatives for the eight classes of cost 1 are:

  • f(x1, x2, x3, x4) = (0, x1 AND x2);
  • f(x1, x2, x3, x4) = (0, x1 XOR x2);
  • f(x1, x2, x3, x4) = (x1, x1 AND x2);
  • f(x1, x2, x3, x4) = (x1, x1 XOR x2);
  • f(x1, x2, x3, x4) = (x3, x1 AND x2);
  • f(x1, x2, x3, x4) = (x3, x1 XOR x2);
  • f(x1, x2, x3, x4) = (x1 AND x2, x1 AND x2);
  • f(x1, x2, x3, x4) = (x1 XOR x2, x1 XOR x2).

The methodology was that of a breadth-first search, taking advantage of symmetry to vastly reduce the search space. The search up to depth 8, described here, was conducted using a multithreaded program (taking 115 core-hours), outputting a hefty 27-gigabyte file containing the entire search tree.

Each node in the tree at depth n is an equivalence class of sets of n distinct (from each other and their complements) nontrivial 4-input 1-output functions which can be implemented with minimum cost exactly n. Intuitively, the nodes in the tree represent initial segments of circuits, up to equivalence. Even though the tree grows super-exponentially* as a function of depth, it was still possible to explicitly compute and store the first eight levels:


  • 1 node at depth 0;
  • 2 nodes at depth 1;
  • 15 nodes at depth 2;
  • 156 nodes at depth 3;
  • 2396 nodes at depth 4;
  • 50865 nodes at depth 5;
  • 1376962 nodes at depth 6;
  • 45189111 nodes at depth 7;
  • 1733295202 nodes at depth 8.

* technically, the entire tree is finite, because there are only finitely many sets of distinct 4-input Boolean functions, so ‘super-exponentially’ does not apply asymptotically. This is the same (very pedantic!) reason that it’s hard to make precise the notion that the number of positions in a Rubik’s cube that require k moves to solve ‘grows exponentially’ as a function of k — there are only finitely many positions, and Tom Rokicki showed that indeed the tree stops at depth 20. It’s also why a population of bacteria or viruses can’t grow exponentially forever: they run out of limited resources, and the volume that they occupy is bounded above by the sphere which contains the population at time 0 and expands at the speed of light — an upper bound on volume which is cubic as a function of time!


From there, I checked every irreducible way to append 2 more gates to one of these circuits, expanding the search to depth 10, and keeping track of the optimal solutions for each of the equivalence classes obtained in this manner. All but nineteen of the equivalence classes were solved with circuits of cost <= 10, providing a lower bound of 11 for those nineteen difficult classes. This lower bound of 11 turned out to be achievable in all nineteen cases, thereby conclusively answering the question.

Partially verifying the results

How do we check that the results are correct?

In addition to computing the minimum cost of each of these equivalence classes, the search yielded explicit witnesses (circuits that achieve the minimum cost). A much simpler program can run through all 2^32 functions and verify that these witnesses indeed implement the purported functions. This verifies the results in one direction, showing that they are correct upper bounds on the minimum cost of each function.

I can’t see any way to fully verify that they are correct lower bounds, without someone else independently running an exhaustive search themselves (ideally using a different algorithm) and checking that the results match the table above. This is because it’s easy to verify that a circuit is correct, but hard to verify that it’s minimum-cost.

That said, there was a certain amount of self-verification when running the breadth-first search: for example, creating an on-disk file of depth 6 and extending it by 2 gates produced identical results to the on-disk file of depth 8. (In an earlier attempt, I’d overlooked an edge-case, and these results didn’t agree — this was how I was able to catch the bug.)


That group of order 348364800

In nested lattices, we talked about the E8 lattice and its order-696729600 group of origin-preserving symmetries. In minimalistic quantum computation, we saw that this group of 8-by-8 real orthogonal matrices is generated by a set of matrices which are easily describable in terms of quantum gates.

However, something that’s important to note is that quantum states are equivalence classes of vectors, where scalar multiples of vectors are identified. That is to say, the state space of n qubits is the complex projective space (\mathbb{C}^{2^n} \setminus \{ 0 \}) / (\mathbb{C} \setminus \{ 0 \}). So, if U is a unitary matrix, then all unitary matrices of the form e^{i \theta} U induce the same transformation.

Consequently, the order-696729600 subgroup of O(8) is not exactly the group of transformations in which we’re interested. Rather, we’re interested in its quotient by its centre {±I}. The resulting order-348364800 group G turns out to be very easy to describe without having to mention the E8 lattice!

G is isomorphic to the group of 8-by-8 matrices over the field of two elements which preserve the quadratic form \sum_{i \leq j} x_i x_j

That is to say, each of the 256 binary vectors of length 8 can be classified as either:

  • odd norm, if the number of ‘1’s in the vector is either 1 or 2 (modulo 4);
  • even norm, if the number of ‘1’s in the vector is either 0 or 3 (modulo 4).

(Equivalently, the norm is the parity of c(c+1)/2, where c is the popcount (number of ‘1’s) in the binary vector.)

Then G is the group of invertible 8-by-8 binary matrices U which are norm-preserving, i.e. Ux has the same norm as x for all 256 of the choices of vector x.
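
As a concrete sketch in Python, packing each 8-by-8 binary matrix row-major into a 64-bit integer (bit 8i + j holds the entry in row i, column j, a representation we will exploit below):

    def norm(x):
        # The parity of c(c+1)/2, where c is the popcount of the vector x.
        c = bin(x & 0xFF).count('1')
        return (c * (c + 1) // 2) & 1

    def apply(m, x):
        # Multiply the packed matrix m by the vector x over GF(2):
        # component i of the result is the parity of (row i AND x).
        return sum((bin((m >> (8 * i)) & 0xFF & x).count('1') & 1) << i
                   for i in range(8))

    def preserves_norm(m):
        return all(norm(apply(m, x)) == norm(x) for x in range(256))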

Why?

To understand this isomorphism, we need to return to the description in terms of the E8 lattice, Λ. We’ll form a quotient of this lattice by a copy of the lattice scaled by a factor of two — the resulting set Λ / 2Λ contains 256 points which form an 8-dimensional vector space over the field of two elements!

Moreover, this set Λ / 2Λ consists of:

  • 1 zero point (of even norm);
  • 120 = 240 / 2 nonzero points of odd norm (each of which corresponds to a pair of antipodal vectors in the first shell of the E8 lattice);
  • 135 = 2160 / 16 nonzero points of even norm (each of which corresponds to an orthoplex of 16 vectors in the second shell of the E8 lattice).

There isn’t an orthonormal basis for this set, so we choose the next best thing: a basis of 8 odd-norm vectors which are pairwise at 60° angles from each other! These, together with the origin, form the 9 vertices of a regular simplex in the E8 lattice. This choice of basis results in the norm having the simple expression we described earlier.

For concreteness, we’ll give an explicit set of vectors:

  • e_0 = (½, ½, ½, ½, ½, ½, ½, ½);
  • e_1 = (1, 1, 0, 0, 0, 0, 0, 0);
  • e_2 = (1, 0, 1, 0, 0, 0, 0, 0);
  • e_3 = (1, 0, 0, 1, 0, 0, 0, 0);
  • e_4 = (1, 0, 0, 0, 1, 0, 0, 0);
  • e_5 = (1, 0, 0, 0, 0, 1, 0, 0);
  • e_6 = (1, 0, 0, 0, 0, 0, 1, 0);
  • e_7 = (1, 0, 0, 0, 0, 0, 0, 1).

These satisfy 〈e_i, e_j〉 = 1 + δ_ij, where δ_ij is the Kronecker delta.

Computational convenience

In the article on minimalistic quantum computation, we mentioned that the choice of gates with dyadic rational coefficients was particularly convenient, as the matrix entries are exactly representable as IEEE 754 floating-point numbers. For the three-qubit case, this binary matrix representation is vastly more convenient still!

Firstly, an 8-by-8 binary matrix can be stored in a 64-bit processor register. Composing rotations is then reduced to multiplying these binary matrices.

Certain processor architectures support the operation of multiplying two 8-by-8 binary matrices in a single processor instruction! Knuth’s MMIX architecture calls this instruction ‘MXOR’, and there are at least two commercial computer architectures which also support this instruction: the original Cray supercomputer had this for government cryptographic reasons, and (up to one of the operands being transposed in the process) there is a vectorised implementation in the GFNI extension for x86_64. There was also a proposal for an extension for RISC-V with this instruction.

For instruction sets which don’t support MXOR in hardware, you can achieve the same effect with a bunch of bitwise manipulations. Challenge: try to implement MXOR in as few operations as possible using only bitwise Boolean operations and logical shifts. If I counted correctly, I can manage it in 113 total operations (73 Boolean operations and 40 shifts), but I expect it’s possible to do better than this.
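
For reference, here is a deliberately straightforward software emulation of MXOR in Python, using the same row-major 64-bit packing as above; it makes no attempt at the low operation counts discussed in the challenge.

    def mxor(a, b):
        # 8-by-8 matrix product over GF(2): row i of the result is the
        # XOR of those rows j of b for which a[i][j] = 1.
        result = 0
        for i in range(8):
            row, acc = (a >> (8 * i)) & 0xFF, 0
            for j in range(8):
                if (row >> j) & 1:
                    acc ^= (b >> (8 * j)) & 0xFF
            result |= acc << (8 * i)
        return result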

What about inverting one of these matrices in G? It turns out that the exponent of G (the lowest common multiple of the orders of the elements) is 2520, so we can invert an element by raising it to the power of 2519. Using a technique which generalises repeated squaring, this can be accomplished by a chain of 15 MXOR instructions.
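
Here is a sketch of the idea, using plain square-and-multiply (which spends a few more multiplications than the optimised 15-MXOR chain):

    IDENTITY = sum(1 << (9 * i) for i in range(8))   # bit 8i + i for each row i

    def invert(m):
        # The exponent of G is 2520, so m^2519 is the inverse of m in G.
        result, base, e = IDENTITY, m, 2519
        while e:
            if e & 1:
                result = mxor(result, base)
            base = mxor(base, base)
            e >>= 1
        return result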


More quantum gates and lattices

The previous post ended with unanswered questions about describing the Conway group, Co0, in terms of quantum gates with dyadic rational coefficients. It turned out to be easier than expected, although the construction is much more complicated than the counterpart in the previous post.

We’ll start with a construction for a much smaller simple group — the order-168 group PSL(2, 7) — which can be generated by three CNOT gates acting on the three lower qubits in the diagram below:

[Diagram: three CNOT gates acting on the three lower qubits]

For the moment, ignore the two unused upper qubits; their purpose will become clear soon enough. The three lower qubits have eight computational basis states (000, 001, 010, 011, 100, 101, 110, and 111). Viewing each of these states as a three-dimensional vector over \mathbb{F}_2, the CNOT gates induce linear shearing maps (and together generate the full 168-element group of linear automorphisms).
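
Concretely, a CNOT maps each basis state, viewed as a bit-vector over \mathbb{F}_2, by the shear which XORs the control bit into the target bit; a sketch:

    def cnot(state, control, target):
        # Permute the basis states 0b000..0b111: flip the target bit
        # whenever the control bit is set (a linear map over GF(2)).
        if (state >> control) & 1:
            state ^= 1 << target
        return state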

Now we introduce another two CNOT gates:

[Diagram: two CNOT gates acting on the two upper qubits]

These act only on the two upper qubits, and generate the symmetric group S_3 (freely permuting the states 01, 10, and 11). These two gates, together with the three gates in the previous diagram, therefore generate the direct product S_3 \times PSL(2, 7) of order 1008.

It is helpful to partition the 32 computational basis states into four 8-element sets:

  • the set W consisting of all basis states of the form 00xyz;
  • the set A consisting of all basis states of the form 01xyz;
  • the set B consisting of all basis states of the form 10xyz;
  • the set C consisting of all basis states of the form 11xyz;

where x, y, z are elements of {0, 1}. Then the two gates on the upper qubits are responsible for bodily permuting the three sets {A, B, C}; the three gates on the lower qubits induce the same linear permutation in each of W, A, B, and C (viewed as 3-dimensional vector spaces over the field of two elements).

Note that the permutation group is transitive on the 8-element set W, and transitive on the 24-element set V (the complement of W, or equivalently the union of A, B, and C), but no elements of V are ever interchanged with elements of W. This will remain the case as we continue to add further gates.

For the time being, we’ll therefore suspend thinking about the smaller 8-element set W, and concentrate on the larger 24-element set V.

We now introduce a sixth CNOT gate, bridging the upper and lower qubits:

[Diagram: a sixth CNOT gate bridging the upper and lower qubits]

This now expands the size of the group by a factor of 64, resulting in a 64512-element group called the trio group. The vector spaces A, B, and C are effectively upgraded into affine spaces which can be semi-independently translated (the constraint is that the images of their ‘origins’ — 01000, 10000, and 11000 — always have a modulo-2 sum of zero).

The trio group, considered as a permutation group on the 24 basis states in V, is a maximal subgroup of the simple sporadic group M24. That means that adding a single further gate to break out of the trio group will immediately upgrade us to the 244823040-element Mathieu group! Unfortunately, I wasn’t able to find a particularly simple choice of gate:

[Diagram: the seventh, more complicated, gate]

The effect of this complicated gate is to do the following:

  • Within each of the sets W and A, apply a particular non-affine permutation which exchanges the eight elements by treating them as four ‘pairs’ and swapping each element with its partner:
    • 000 ⇔ 100;
    • 001 ⇔ 110;
    • 010 ⇔ 111;
    • 011 ⇔ 101;
  • Swap four elements of B with four elements of C by means of the Fredkin gate on the far right.

In terms of its action on V, this is an element of M24, but does not belong to the trio group. Interestingly, it belongs to many of the other important subgroups of M24 — namely PSL(3, 4) (also called M21), the ‘sextet group’, and the ‘octad group’. At this point, the group generated by these seven gates is now the direct product of the alternating group A8 (acting on the set W) and the Mathieu group M24 (acting on the set V).

The elements of the Mathieu group are permutations on the set V, which can be viewed as 24-by-24 permutation matrices. Permutation matrices are matrices with entries in {0, 1}, where there’s exactly one ‘1’ in each row and column. If we throw in a Z-gate acting on the uppermost qubit, we’ll expand this to a group 2^12:M24 of ‘signed permutation matrices’ instead, where some of the ‘1’s are replaced with ‘−1’s. (The sets of rows and columns containing negative entries must form codewords of the binary Golay code; this is why each permutation matrix has only 2^12 sign choices instead of 2^24.) This group is interesting inasmuch as it’s the group of permutations and bit-flips which preserve the binary Golay code. It’s also a maximal subgroup, called the monomial subgroup, of the Conway group Co0.

Instead of using this Z-gate, we’ll bypass the monomial subgroup and jump straight to the Conway group by adding a single additional three-qubit gate:

[Diagram: the final three-qubit gate, which yields the Conway group]

By a slight abuse of notation, we’ll reuse the symbols W and V to refer to the vector spaces spanned by the first 8 basis states and the final 24 basis states, respectively. After introducing this final gate, the resulting group of 32-by-32 matrices is a direct product of:

  • an order-348364800 group (the orientation-preserving index-2 subgroup of the Weyl group of E8) acting on the 8-dimensional vector space W;
  • an order-8315553613086720000 group (the Conway group, Co0) acting on the 24-dimensional vector space V.

These are the groups of rotational symmetries of the two most remarkable* Euclidean lattices — the E8 lattice and the Leech lattice, respectively. Indeed, if we take all linear combinations (with integer coefficients!) of the images (under the group we’ve constructed) of a particular computational basis state, then we recover either the E8 lattice or the Leech lattice (depending on whether we used one of the basis states in W or in V).

* as well as being highly symmetric, they give the optimal packings of unit spheres in 8- and 24-dimensional space, as proved by Maryna Viazovska.

These two groups each have a centre of order 2 (consisting of the identity matrix and its negation), modulo which they’re simple groups:

  • the quotient of the order-348364800 group by its centre is PSΩ(8, 2);
  • the quotient of Co0 by its centre is the sporadic simple group Co1.

This process of quotienting by the centre is especially natural in this quantum gate formulation, as scalar multiples of a vector correspond to the same quantum state.

Further connections

If we take n \geq 4 qubits and the gates mentioned in the previous post, we remarked that the group generated is a dense subgroup of the full rotation group SO(2^n). If instead we replace the Toffoli gate in our arsenal with a CNOT gate, we get exactly the symmetry groups of the Barnes-Wall lattices! The orders of the groups are enumerated in sequence A014115 of the OEIS.

Acknowledgements

Thanks go to Conway and Sloane for their magnificent and thoroughly illuminating book, Sphere Packings, Lattices and Groups. Also, the website jspaint.app (which is undoubtedly the most useful thing to have ever been written in JavaScript) was helpful for creating the illustrations herein.


Minimalistic quantum computation

In the usual ‘circuit model’ of quantum computation, we have a fixed number of qubits, {q1, q2, …, qn}, and allow quantum gates to act on these qubits. The diagram below shows a Toffoli gate on the left, and an equivalent circuit of simpler gates on the right:

[Diagram: a Toffoli gate (left) and an equivalent circuit of CNOT, T, and Hadamard gates (right)]

These diagrams represent qubits as horizontal lines (so there are 3 qubits in the circuit above), and the operations are applied from left to right. The circuit has 6 controlled-NOT gates (each acting on an ordered pair of qubits) and 9 single-qubit gates (4 T-gates, 3 inverse T-gates, and 2 Hadamard gates).

Whereas the internal state of a classical computer with n bits of memory can be described by a length-n vector of binary values, a quantum computer with n qubits of memory requires a length-(2^n) vector of complex numbers. A k-qubit gate is a unitary linear map described by a (2^k)-by-(2^k) matrix of complex numbers.
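
To make this concrete, here is a sketch (using NumPy) of how a k-qubit gate acts on an n-qubit state vector; apply_gate is a hypothetical helper, and qubit-ordering conventions vary between texts.

    import numpy as np

    def apply_gate(state, gate, qubits):
        # state: length-2^n complex vector; gate: 2^k-by-2^k unitary;
        # qubits: the k qubit indices acted on (axis 0 of the reshaped
        # state corresponds to the most significant bit of the index).
        n = state.size.bit_length() - 1
        k = len(qubits)
        psi = state.reshape([2] * n)
        # Contract the gate's input indices against the chosen qubit axes,
        # then move the output indices back into place.
        psi = np.tensordot(gate.reshape([2] * (2 * k)), psi,
                           axes=(list(range(k, 2 * k)), list(qubits)))
        return np.moveaxis(psi, list(range(k)), list(qubits)).reshape(-1)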

Importantly, it’s the exponential increase in the dimension of this vector (from n to 2^n), and not the involvement of complex numbers, which makes quantum computers [believed to be] able to solve more problems [in polynomial time] than is possible with a mere classical computer. To see this is the case, note that a (2^k)-by-(2^k) matrix of complex numbers can be emulated by a (2^(k+1))-by-(2^(k+1)) matrix of real numbers. Specifically, replace each complex entry with a real 2-by-2 block:

[Diagram: replacing each complex entry a + bi with a real 2-by-2 block]

Consequently, a k-qubit complex gate can be emulated with a (k+1)-qubit real gate.
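
A sketch of this emulation via Kronecker products, where the matrix j2 plays the role of multiplication by i:

    import numpy as np

    def realify(u):
        # Replace each complex entry a + bi of u with the real 2-by-2
        # block [[a, -b], [b, a]]; unitary in, orthogonal out.
        i2 = np.eye(2)
        j2 = np.array([[0.0, -1.0], [1.0, 0.0]])
        return np.kron(u.real, i2) + np.kron(u.imag, j2)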

Indeed, it’s possible to restrict to not only real entries, but even to dyadic rational entries. Specifically, the most common universal set of logic gates {T, H, CNOT} consists of matrices whose entries belong to the ring \mathbb{Z}[\frac{1}{2}, \zeta] where ζ is a primitive eighth root of unity. A similar trick means we can work over the ring of dyadic rationals instead, at the cost of just two extra qubits:

[Diagram: the corresponding trick for emulating entries of \mathbb{Z}[\frac{1}{2}, \zeta] over the dyadic rationals]

This is helpful for simulation in software: all finite values representable as IEEE 754 floating point numbers are dyadic rationals, and a partial converse is true: all dyadic rationals with numerator and denominator less than some bound (2^53 for double-precision and 2^24 for single-precision) are representable as IEEE 754 floating point numbers.

A universal pair of 3-qubit dyadic rational gates

Consider the Toffoli gate (left) and a new gate that I’m going to call the ‘XHH’ gate (it’s simply a tensor product of a Pauli X-gate and two Hadamard gates, all acting on separate qubits):

[Diagram: the Toffoli gate (left) and the XHH gate (right)]

In an n-qubit circuit, each of these gates yields n(n – 1)(n – 2)/2 different matrices depending on the choice of qubits on which it acts, so this set expands to n(n – 1)(n – 2) matrices in total. Then we have the following universality theorem:

When n >= 4, the group generated by these matrices is a dense subgroup of the complete rotation group SO(2^n).

This is the best that we could hope for: when tensored by (n – 3) copies of the 2-by-2 identity matrix, these gates yield orthogonal matrices of determinant 1. It means that any special orthogonal gate can be approximated arbitrarily closely (in a number of gates polylogarithmic in the required precision, by the Solovay-Kitaev theorem), which (together with the above discussion of emulating complex unitary gates with real orthogonal gates) yields universal quantum computation.

An eight-dimensional surprise

More interesting is what happens when n = 3: the gates do not form a dense subgroup of O(8) as we might expect from extrapolating this result downwards (and noting that the Toffoli matrix has determinant -1, so the matrices lie in O(8) instead of SO(8) when n = 3).

Rather, they form a finite group of order 696729600.

This number should be familiar from the last post, because it’s the order of the E8 Weyl group. Every entry of every matrix in this group is not only dyadic rational, but in fact an integer multiple of 1/4. Inter alia, this finite group contains the familiar single-qubit Pauli gates X and Z, as well as the two-qubit CNOT gate.


Orthogonal projection of the 8-dimensional E8 root system into its 2-dimensional Coxeter plane, drawn by J. G. Moxness. The symmetry group of this set of 240 points is the aforementioned E8 Weyl group of order 696729600.

Given a starting state where all qubits are off (so the vector is (1, 0, 0, 0, 0, 0, 0, 0)), by applying these two gates in various combinations it is possible to reach any of 2160 different vectors (specifically, rescaled copies of the 2160 vectors in the second shell of the E8 lattice). If we instead began with an equal superposition of the eight computational basis states, there would be 240 reachable vectors — the root lattice vectors illustrated in the picture above! Again, the numbers 240 and 2160 should be very familiar from the previous article.

(Vectors that are related by scalar multiplication are identified as the same quantum state, so antipodal pairs of vectors in the previous paragraph correspond to the same quantum state. Consequently, there are only 1080 distinct quantum states reachable from the all-off quantum state, or 120 distinct quantum states reachable from the equal superposition.)

Before I found the set {Toffoli, XHH}, I tried the deceptively similar pair {Toffoli, ZHH}. To my surprise, the group generated by those elements, as computed by my program, had order 2903040, significantly smaller than the 696729600-element group I had expected! This is the Weyl group of E7, and in this case we get the smaller subgroup because all six matrices fix the vector (1, 1, 1, -1, 1, -1, -1, -1) and therefore only act nontrivially on its seven-dimensional orthogonal complement. Fortunately, the set {Toffoli, XHH} does not have a fixed vector and generates the full Weyl group of E8.

Unanswered questions

The {Toffoli, ZHH} construction of the E7 Weyl group demonstrates that we can realise the origin-preserving isometry group of a 7-dimensional lattice, even though 7 is not a power of two. An even more beautiful and exceptional lattice than the E8 lattice is the 24-dimensional Leech lattice, whose origin-preserving symmetry group is the Conway group Co0. Is there an elegant set of matrices which generate a group isomorphic to Co0 and have a simple description in terms of quantum gates?

Edit: depending on whether you’d consider it elegant and/or simple, there’s an affirmative answer in the next post.

The first nonempty shell of the Leech lattice consists of 196560 points, as opposed to the 240 points in the first shell of the E8 lattice. David Madore has plotted some beautiful projections of this set — they’re as close as possible to being analogous to the Petrie projection of the E8 root system shown above, except inasmuch as Co0 is not a Coxeter group and therefore it’s unclear which plane (if any!) is analogous to the Coxeter plane.


Nested lattices

1, 240, 2160, 6720, 17520, 30240, 60480, 82560, 140400, …

These terms count the number of points at distance \sqrt{2n} from the origin in the E8 lattice, a highly symmetric arrangement of points which Maryna Viazovska recently (in 2016) proved is the densest way to pack spheres in 8-dimensional space.

Even more recently, Warren D. Smith noticed a rather exceptional numerical coincidence — the sum of the first three terms in this sequence is a perfect fourth power:

1 + 240 + 2160 = 2401 = 7^4

Is this merely a coincidence? To begin with, we’ll look at the E8 lattice through a different lens: as a subset not of \mathbb{R}^8, but of complex (alas, not projective) 4-space, \mathbb{C}^4. Specifically, recall the Eisenstein integers, the ring of complex numbers generated by a cube root of unity:

[Image: the Eisenstein integers, 3-coloured by residue modulo √−3]

These points have been 3-coloured according to whether the Eisenstein integer is congruent (modulo \sqrt{-3} ) to 0, +1, or −1. The E8 lattice is then concisely expressible as the set of points (w, x, y, z) where:

  • each coordinate is an Eisenstein integer;
  • the colours of the four coordinates form a codeword in the tetracode.

The tetracode is a 2-dimensional subspace of the 4-dimensional vector space \mathbb{F}_3^4. A point (w, x, y, z) \in \mathbb{F}_3^4 belongs to the tetracode if and only if w = y − x = z − y = x − z. Equivalently, the coordinates are of the form (d, a, a + d, a − d).

The real E8 lattice has a symmetry group of order 696729600; this complex E8 lattice has more structure (complex scalar multiplication) which reduces the symmetry group to order 155520. This extra structure is what we need to explain Warren’s coincidence.

Now that we have defined the E8 lattice as a complex 4-dimensional lattice, consider scalar multiplication by 3 + ω. Returning to an 8-dimensional real viewpoint, this linear map corresponds to composing a rotation with a scaling by sqrt(7), and therefore has a determinant of sqrt(7)^8 = 2401. The image of the E8 lattice under this map is a sublattice, geometrically similar to the original E8 lattice, and the Voronoi cells induced by this sublattice each consist of 2401 points from the original lattice (namely one central point, surrounded by an inner shell of 240 points and an outer shell of 2160 points).

Linear subspaces and positional number systems

If we sum the first three terms of the theta series of the E6 lattice, 1 + 72 + 270, we get exactly 7^3. And similarly for the D4 lattice, 1 + 24 + 24 = 7^2. For the A2 (hexagonal) lattice, we have 1 + 0 + 6 = 7^1. These correspond to taking 3-, 2-, and 1-dimensional (complex) linear subspaces of the Voronoi partition described above.

The latter is particularly easy to describe: every Eisenstein integer z can be uniquely expressed in the form (3 + ω)q + r, where q is an Eisenstein integer and r belongs to the set {0, ±1, ±ω, ±ω²} consisting of zero together with the sixth roots of unity. We can decompose q in the same manner, and continue recursively; this is tantamount to writing an Eisenstein integer in a positional number system with radix 3 + ω and the seven digits {0, ±1, ±ω, ±ω²}.
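
Here is a sketch of the digit-extraction step in Python, representing a + bω as the integer pair (a, b); it relies on the seven digits covering all seven residues modulo 3 + ω, whose norm is (3 + ω)(2 − ω) = 7.

    # Zero together with the sixth roots of unity, in (a, b) coordinates;
    # since w^2 = -1 - w, the digit w^2 is (-1, -1) and -w^2 is (1, 1).
    DIGITS = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1), (-1, -1), (1, 1)]

    def emul(a, b, c, d):
        # The product (a + bw)(c + dw) of Eisenstein integers.
        return (a * c - b * d, a * d + b * c - b * d)

    def digit_step(a, b):
        # Write a + bw = (3 + w)q + r with r a digit; return (q, r).
        for r in DIGITS:
            na, nb = emul(a - r[0], b - r[1], 2, -1)  # times conj(3 + w)
            if na % 7 == 0 and nb % 7 == 0:
                return (na // 7, nb // 7), r
        raise AssertionError("the digits cover every residue mod 3 + w")

    def digits(a, b):
        # Full radix-(3 + w) expansion, least significant digit first.
        out = []
        while (a, b) != (0, 0):
            (a, b), r = digit_step(a, b)
            out.append(r)
        return out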

If we allow digits after the ‘decimal point’, we can express any complex number in this positional number system. The set of numbers which ‘round to zero’ after discarding the digits after the decimal point forms the Gosper island, a self-similar set with a fractal boundary:

[Image: the Gosper island]

There are two different principal cube roots of unity, and each one yields a different (albeit isomorphic) positional number system for complex numbers.

The Hurwitz integral quaternions form a scaled copy of the D4 lattice, and we similarly obtain a positional number system for the quaternions with radix 3 + ω and 49 different digits: the Hurwitz integral quaternions with squared norm at most 2. There are 8 principal cube-roots of unity, and each one determines a positional number system for quaternions. It’s worth commenting that this isn’t the only nice* number system for integral quaternions: radix-(2 + i) with 25 digits (the Hurwitz integers of norm at most 1) also works.

*where ‘nice’ is defined to mean that the digit set consists of all integers with norm <= r for some value of r.

Finally, the Cayley integer octonions form a scaled copy of the E8 lattice, and there are 56 principal cube-roots of unity. Any one of these results in a positional number system for the octonions (which is well-defined despite the non-associativity of the octonions, since any pair of octonions together generate an associative algebra) with radix 3 + ω and 2401 different digits: the integer octonions with squared norm at most 2. We’ll call this number system ‘Warrenary’ after its discoverer. Unlike in the real, complex, and quaternionic cases, Warrenary is the unique positional number system for the octonions satisfying the aforementioned niceness property.

There is no analogue in six dimensions: normed division algebras only exist in dimensions 1, 2, 4, and 8, so there is no positional number system corresponding to the recursive nesting of E6 lattices.

Disappointingly, there appear to be no similarly elegant lattice nestings for higher-dimensional lattices beyond D4, E6 and E8, such as the 24-dimensional Leech lattice (in particular, no early initial segment of the theta series yields a perfect 12th power as its sum). As such, the recursive nesting of E8 lattices is quite exceptional indeed.

Further reading

The integral quaternions and octonions have many other fascinating and elegant properties, including analogues of unique prime factorisation, which are explained in the book On Quaternions and Octonions by Derek Smith and the late, great John Conway (1937–2020).

Positional number systems for the real and complex numbers are described in the Seminumerical Algorithms volume of Donald Knuth’s The Art of Computer Programming.


Self-replicator caught on video

In a previous article, an announcement was made of a complex self-replicating machine (known as the 0E0P metacell) in a simple 2-state cellular automaton. In the interim between then and now, Thomas Cabaret has prepared a most illuminating video* explaining the method with which the machine copies itself:

Note: the video is in French; recently, Dave Greene added an English translation of the subtitles.

* the video is part of Cabaret’s Passe-Science series. You may enjoy some of his other videos, including an explanation of the P vs NP problem and a reduction of Boolean satisfiability to the 3-colourability of planar graphs.

Anachronistic self-propagation

In related news, Michael Simkin recently created a wonderfully anachronistic self-propagator entitled Remini: it uses the same single-channel/slow-salvo construction mechanism as the 0E0P metacell, but it is built from oscillatory components instead of static ones. That is to say, it implements modern ideas using components available in the 1970s.

The project involved slmake together with a suite of additional tools developed by Simkin. There isn’t a video of this machine self-replicating, so you’d need to download a program such as Golly in order to watch it running.

Further reading

For further reading, I recommend (in order):

  • The wiki entry (under construction) for the 0E0P metacell;
  • An article unveiling various simpler examples of self-constructing circuitry;
  • The slmake repository;
  • A tutorial on effective use of slmake;
  • A challenge thread proposing another contraption, that no-one has yet built. This would require the use of slmake followed by some ‘DNA-splicing’ to interleave the construction recipe with extra operations.



Five-input Boolean circuits

Over the past few weeks, I’ve been investigating Boolean optimisation. That is to say, given some circuit of logic gates that implements a particular n-input m-output function, find a more efficient circuit that implements the same function. In practical applications, ‘more efficient’ is a multi-objective optimisation problem, with the two highest priorities generally being:

  1. number of logic gates (smaller is better);
  2. depth of circuit (lower is better).

One of the best pieces of software out there is Berkeley’s ABC tool. It represents a circuit in a form called an AIG (AND-inverter graph), which is a directed acyclic graph of 2-input AND gates and 1-input NOT gates (the latter of which are considered to be free). Then, it performs multiple rounds of local optimisations, such as:

  • searching for 4-input 1-output subcircuits and ‘rewriting’ them by replacing with equivalent subcircuits of fewer logic gates;
  • searching for subcircuits that can be ‘refactored’ as compositions of smaller subcircuits;
  • ‘balancing’ the graph to minimise the circuit depth.

In 2011, Nan Li and Elena Dubrova wrote an article which demonstrated significant improvements by including a selection of 5-input 1-output replacements. Instead of restricting to AIGs, the authors allowed elementary XOR gates in the graph as well, which (in the presence of costless 1-input inverters) has the elegant effect that every 2-input Boolean gate has unit cost.

There are exactly 2^32 = 4294967296 Boolean functions with 5 inputs and 1 output, so it would be infeasible in practice to directly store optimal circuits for all of them. However, up to inverting the inputs, permuting the inputs, and negating the outputs, there are only 616126 equivalence classes (called NPN classes, for ‘negate-permute-negate’). The authors cherry-picked approximately 1000 of those, and used a Boolean matcher to sequentially test a given subcircuit against each of these classes in turn. Doing so for all 616126 equivalence classes would soon get rather slow…

Knuth’s exhaustive search

Earlier, in 2005, Donald Knuth wrote a collection of computer programs to find the lowest-cost implementations of all 616126 NPN classes of 5-input 1-output functions. Instead of Boolean matching, Knuth’s approach was to ‘canonise’ functions: find the lexicographically smallest truth table which is NPN-equivalent to a given function, and use that as the representative for the NPN class. The serious advantage is that lookup only takes constant time, by using the canonical truth table as a key into a hashtable.
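
As a sketch, here is that definition made concrete in Python, with none of Knuth’s bitwise streamlining: brute force over all 3840 input transforms, taking the smaller of the two output polarities.

    from itertools import permutations

    def canonise(tt, n=5):
        # Lexicographically smallest truth table (as a 2^n-bit integer)
        # which is NPN-equivalent to tt.
        size = 1 << n
        mask = (1 << size) - 1
        best = mask
        for perm in permutations(range(n)):      # permute the inputs
            for flips in range(size):            # negate subsets of inputs
                t = 0
                for x in range(size):
                    y = x ^ flips
                    z = sum(((y >> perm[i]) & 1) << i for i in range(n))
                    t |= ((tt >> z) & 1) << x
                best = min(best, t, t ^ mask)    # negate the output
        return best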

To avoid a full brute-force search, Knuth cleverly approached the problem by induction: try to describe a larger circuit (implementing a harder function) in terms of smaller circuits (implementing easier functions). He separated the inductive step into three cases:

  • Top-down: If we can compute A in n gates and B in m gates, then f(A, B) can be computed in n + m + 1 gates, where f is an arbitrary gate.
  • Bottom-up: If we can compute C(x1, x2, x3, x4, x5) in n gates, then we can compute C(f(x1, x2), x2, x3, x4, x5) in n + 1 gates, and C(f(x1, x2), g(x1, x2), x3, x4, x5) in n + 2 gates.
  • Special: Anything not of the above form. By assuming that it’s not of either of the previous cases, the possible structure of such a circuit can be constrained considerably, reducing the size of the brute-force search.

Eventually, he had solved all but 6 NPN classes of functions (each of which he knew required either 11 or 12 gates). By some extra computational horsepower, he eventually solved these last holdouts, finding that all but one could be managed in 11 gates, and therefore the last one required exactly 12.

Optimal5: an efficient database of Knuth’s solutions

One slight impasse from a usability perspective is that the above results were separated across several databases (for top-down and bottom-up steps), text files (for the majority of the special chains), and even in the README file (for the last 6 NPN classes). As such, I realised that it’s worth organising Knuth’s results into a more convenient form.

This was the motivation behind optimal5: a database I created with two aims:

  • Consolidating Knuth’s research into a uniform database;
  • Making function canonisation as efficient as possible, allowing fast lookup;

The first of these tasks was routine — it just involved tracing the inductive constructions (including keeping track of permutations and negations of intermediate results) and ‘unpacking’ them into complete normalised circuits. It was rather laborious owing to the piecemeal structure of the inductive proof, but not particularly noteworthy.

The second of these tasks was both much more mathematically interesting and challenging. In Knuth’s original code, a function is canonised by iterating through all 3840 (2^5 × 5!) permutations and negations of the inputs, negating the output if necessary to ensure the circuit is zero-preserving, and taking the lexicographic minimum over all of those choices.

But 3840 is quite a large number, so even with Knuth’s very streamlined bitwise tricks, it still took a whole 10 microseconds to canonise a function. After Admiral Grace Hopper’s unforgettable lecture about nanoseconds and microseconds and what length of wire would be hung around my neck per microsecond, I wanted to improve upon that.

If all of this discussion about 5-input 1-output Boolean functions is rather abstract, imagine a 5-dimensional hypercube such as the one below, which is deservedly the logo for the project:

[Image: a 5-dimensional hypercube, the logo of the optimal5 project]

A 5-input 1-output Boolean function corresponds to a way to colour the vertices of this hypercube red and green. Two such functions are NPN-equivalent if you can rotate/reflect one hypercube, and possibly alternate the colours, to make it identical to the other. (And if 5-dimensional hypercubes are too difficult to visualise, just visualise 3-dimensional cubes instead — this simplification doesn’t actually ruin any of the intuition.)

This 5-dimensional (resp. 3-dimensional) hypercube has 10 faces (resp. 6). So we can systematically place each one of those face-down, and look at just the 16 vertices (resp. 4) on the top face, and find out the top face’s canonical form by looking it up in a 2^16-element lookup table. So we’ve made 10 lookups so far, one for each face.

Now, a canonical hypercube must have a canonical top face, so we can discard whichever subset of those 10 orientations (in most cases, it will be 9 out of 10) don’t achieve the lexicographical minimum, and concentrate only on the others. At that point we could do an exhaustive search over 384 permutations, instead of 3840, and save ourselves a factor of 10 in most cases (and gain nothing for very highly symmetric functions, such as the parity function). If I recall correctly, this gave an improvement to about 1.6 microseconds. Much better, but I’d still prefer not to have Admiral Hopper suspend half a kilometre of conducting wire around my neck, thereby necessitating even more mathematics:

Hamiltonian paths

Of course, there’s no point traversing all 384 permutations, since you know that (once you’ve made the top face lexicographically minimal) only the elements in the stabiliser subgroup of the top face have any chance of resulting in the lexicographically smallest colouring of the entire cube. So we can instead traverse this subgroup. I decided to ask on MathOverflow whether anyone knew how to solve the Travelling Salesman Problem efficiently on a Cayley graph, but they didn’t, so I implemented the Held-Karp algorithm instead. Specifically, I opted for:

  • If the stabiliser has at most 24 elements, use the optimal Hamiltonian path determined by Held-Karp;
  • Otherwise (and this case is sufficiently rare that it doesn’t matter that it’s slower), just traverse all 384 elements as before.

Being far too lazy to manually write code for all 75 subgroups that arise in this manner, I instead wrote a much shorter program to generate this code on my behalf. (If you’re wondering whence the constant 1984 arises, it’s the smallest modulus such that all 222 canonical 4-input functions have distinct residues; this is a rudimentary example of perfect hashing.)
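
Both claims about 1984 can be verified directly, if slowly, by reusing the brute-force canonise sketched earlier with n = 4; expect this sketch to take some minutes in CPython.

    # All 2^16 functions of 4 inputs fall into 222 NPN classes, and 1984
    # separates their canonical truth tables into distinct residues.
    classes = {canonise(tt, n=4) for tt in range(1 << 16)}
    assert len(classes) == 222
    assert len({c % 1984 for c in classes}) == 222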

By this point, it took a total of 686 nanoseconds on average to canonise a function, look up the circuit in the hashtable, transform that circuit back to the original function, and check the result.

Further optimisations

Using the profiler perf I was able to see that the canonisation was no longer the bottleneck, and the other things were taking the lion’s share of the time. Satisfied with the algorithm, I slightly rewrote parts of the implementation to make it faster (e.g. fixed-size data structures instead of std::vectors for representing circuits), and slashed the total time down to 308 nanoseconds.

Observing that the hashtable lookup itself was taking much of the time, Tom Rokicki helpfully suggested replacing the std::unordered_map with a custom implementation of a hashtable (ideally using perfect hashing, as with the Hamiltonian path lookup, or a ‘semi-perfect’ compromise). Back-of-the-envelope calculations suggested that such a hashtable would end up being very sparse, with lots of empty space, annihilating much of the memory advantage of only storing one representative per NPN equivalence class.

Then finally I did something that required ε of effort to accomplish: I simply searched the Internet for the fastest hashtable I could find, swapped the std::unordered_map with this fancy ‘flat hashmap’, and crossed my fingers. The result? 209 nanoseconds. The performance profile is now sufficiently uniform, with no clear bottlenecks or obvious sources of algorithmic inefficiency, that I’m happy to leave it there and not try to squeeze out any extra performance. Moreover, 60 metres of wire isn’t nearly as uncomfortable as the three kilometres we started with…

Future work

I was having a discussion with Rajan Troll, who wondered whether some multi-output rewriting steps could be useful. A back-of-the-envelope calculation (taking the leading term of the Polya enumeration formula and discarding the other terms) suggests that there are about 1.4 million NPPN* classes of 4-input 2-output functions.

*the two outputs can be freely permuted, as well as the four inputs, ergo the extra P. (I suppose that if I had multiple interchangeable inputs and outputs, whatever that means, I would be an APPG.)

Since using 4-input 2-output rewriting could enable logic sharing (where two different computations share intermediate results), there seems to be a significant amount of utility in embarking on a Knuth-style search for optimal 4-input 2-output (as opposed to 5-input 1-output) circuits.

I’ve started working on that now, including having written a script to enumerate all of the possible shapes of optimal n-input 1-output Boolean chains. This is sufficient, since any 4-input 2-output circuit can be decomposed into a 4-input 1-output chain (computing one of the outputs) and an n-input 1-output chain (computing the other output), where the second chain’s inputs may include intermediate values from the first chain.

Updates to follow as events warrant…


(W^2 + X^2) / (W^2 + X^2 + Y^2 + Z^2)

The Box–Muller transform is a method of transforming pairs of independent uniform distributions into pairs of independent standard Gaussians. Specifically, if U and V are independent uniform [0, 1], then define the following:

  • ρ = sqrt(–2 log(U))
  • θ = 2π V
  • X = ρ cos(θ)
  • Y = ρ sin(θ)

Then it follows that X and Y are independent standard Gaussian distributions. On a computer, where independent uniform distributions are easy to sample (using a pseudo-random number generator), this enables one to produce Gaussian samples.
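
As code, the transform is just a few lines (a sketch; U is nudged away from zero to keep the logarithm finite):

    import math, random

    def box_muller():
        # Two independent uniforms in, two independent standard Gaussians out.
        u = 1.0 - random.random()    # uniform in (0, 1], avoiding log(0)
        v = random.random()
        rho = math.sqrt(-2.0 * math.log(u))
        theta = 2.0 * math.pi * v
        return rho * math.cos(theta), rho * math.sin(theta)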

As the joint probability density function of a pair of independent uniform distributions is shaped like a box, it is thus entirely reasonable to coin the term ‘Muller’ to refer to the shape of the joint probability density function of a pair of independent standard Gaussians.

What about the reverse direction?

It transpires that it’s even easier to manufacture a uniform distribution from a collection of independent standard Gaussian distributions. In particular, if W, X, Y, and Z are independent standard Gaussians, then we can produce a uniform distribution using a rational function:

U := \dfrac{W^2 + X^2}{W^2 + X^2 + Y^2 + Z^2}
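
A sketch of the recipe, which can be checked empirically by confirming that the samples are uniform on [0, 1]:

    import random

    def uniform_from_gaussians():
        # Four independent standard Gaussians in, one Uniform[0, 1] out.
        w, x, y, z = (random.gauss(0.0, 1.0) for _ in range(4))
        return (w * w + x * x) / (w * w + x * x + y * y + z * z)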

The boring way to prove this is to note that W^2 + X^2 is an exponential random variable, and we are dividing it by the sum of itself and an independent identically-distributed exponential random variable. But is there a deeper reason? Observing that the function is homogeneous of degree 0, it is equivalent to the following claim:

Take a random point on the unit sphere in 4-dimensional space (according to its Haar measure), and orthogonally project onto a 2-dimensional linear subspace. Then the squared length of the projection is uniformly distributed in the interval [0, 1].

This has a very natural interpretation in quantum theory (which seems to be a special case of a theorem by Bill Wootters, according to this article by Scott Aaronson arguing why quantum theory is more elegant over the complex numbers as opposed to the reals or quaternions):

Take a random qubit. The probability p of measuring zero in the computational basis is uniformly distributed in the interval [0, 1].

Discarding the irrelevant phase factor, qubits can be viewed as elements of S² rather than S³. (This quotient map is the Hopf fibration, whose discrete analogues we discussed earlier). Here’s a picture of the Bloch sphere, taken from my 2014 essay on quantum computation:

[Image: the Bloch sphere and an explanation thereof]

Then, the observation reduces to the following result first proved by Archimedes:

Take a random point on the unit sphere (in 3-dimensional space). Its z-coordinate is uniformly distributed.

Equivalently, if you take any horizontal slab through a sphere and its bounding cylinder, the areas of the two curved surfaces agree precisely.

There are certainly more applications of Archimedes’ theorem on the 2-sphere, such as the problem mentioned at the beginning of Poncelet’s Porism: the Socratic Dialogue. But what about the statement involving the 3-sphere (i.e. the preimage of Archimedes’ theorem under the Hopf fibration), or the construction of a uniform distribution from four independent standard Gaussians?


Complexity of integer multiplication almost solved

Whilst not quite as close together as the proofs of the ternary Goldbach conjecture and of bounded gaps between primes, there has been a quick succession of two important and somewhat complementary breakthroughs on the computational complexity of integer multiplication:

  • Afshani, Freksen, Kamma, and Larsen proved a lower bound of Ω(n log n) on the circuit complexity of integer multiplication, conditional on a conjecture in network coding.
  • Harvey and van der Hoeven published an algorithm for large integer multiplication, establishing an unconditional upper bound of O(n log n). This is only marginally faster than the O(n log n log log n) Schönhage–Strassen algorithm, overtaking it only for unimaginably large numbers, but is of great theoretical interest because it coincides with the conjectural lower bound. (The authors also showed that the same complexity can be achieved by a multi-tape Turing machine.)

Essentially all modern integer multiplication algorithms are recursive in nature, and the computational complexity depends on the number of levels of recursion together with computational complexity of each level. To summarise:

[Table: a summary of recursive multiplication algorithms, their levels of recursion, and resulting complexities]

In practice, it is common to mix-and-match these algorithms: using FFT-based algorithms (typically Schönhage–Strassen) near the root of the recursion, and switching to Toom-Cook at lower levels, before finally falling back on hardware multiplication at the leaves. This new Harvey–van der Hoeven algorithm is only suitable for really large integers, and switches to older algorithms (in the manner described) for numbers with fewer than 2^(1729^12) binary digits.

A refinement of the algorithm reduces that to 2^(9^12) = 2^282429536481 binary digits, but that is still much much larger than any number that could be practically stored, even storing one digit per atom in the observable universe.


Fully self-directed replication

A new form of artificial life has been born — and there are no doubts that it directs its own self-replication:

So, what exactly is happening?

  • At 0:06, the organism begins to sequentially construct four identical copies of itself.
  • At 0:14, the original organism self-destructs to leave room for its offspring.
  • At 0:16, each of the four children begin to sequentially construct copies of themselves. By 0:18, there are eight organisms.
  • By 0:24, there are a total of thirteen organisms.
  • At 0:27, the four from the previous generation self-destruct, followed shortly by the eight outermost organisms.
  • By 0:34, the apoptosis of the outermost organisms finishes, leaving behind a clean isolated copy indistinguishable from the original cell.

How does it work? Why did the cells suddenly choose to die, and how did the middle cell know that it was due to survive? And how does this relate to multicellular life?

Update, 2019-05-12: Here’s a high-definition video of the construction of the south-east daughter machine:

History

The birth of the field of artificial life is often ascribed to Christopher Langton’s self-replicating loops, which we have discussed previously. A sequence of simple LOGO-like instructions circulates in an ensheathed loop. This information is executed 4 times to construct another copy of the loop (taking advantage of the symmetry of the daughter loop), and then the same tape is copied into the daughter loop:


Langton’s loop

If we quantify the number of times the loop’s instruction tape is utilised, we can represent it as the formal sum 4E + 1C (where ‘E’ represents one tape execution and ‘C’ represents tape copying).

However, there’s more. If the loop were only able to produce one child, the number of fertile loops would remain bounded (at 1), and it is disputed whether such bounded-fecundity ‘linear propagators’ are actually true self-replicators. Note that at the end of the animation above, the loop has extended a new pseudopodium upwards, and will begin constructing a second offspring.

This continues for each of the sides of the parent loop, thereby giving an overall tape utility of 4(4E + 1C) = 16E + 4C. Note that the inner ‘4E’ comes from the fourfold symmetry of the daughter loop, whereas the outer ‘4E’ comes from the fourfold symmetry of the parent loop.

Anyway, after a while, the colony of self-replicating loops resembles this:


Colony of Langton’s self-replicating loops. The number of fertile loops grows linearly without bound, and the total number of loops (including the necrotic core at the centre) grows quadratically as a function of time.

Race to the bottom

Five years after Langton’s loops were invented, John Byl removed the inner sheath of the loop to result in a more minimalistic self-replicator, with only 4 tape cells surrounded by 8 sheath cells:


Byl’s simpler self-replicating loop. Image courtesy of Claudio Rocchini

Moreover, the underlying rule is simpler: only 6 states instead of 8. This comes at the expense of reduced flexibility; whereas one could build a larger Langton’s loop by increasing each side-length by n and inserting n ‘move forward’ instructions into the loop, there is no way to construct a Byl loop with any other genome.

Nor does it stop with Byl. In 1993, Chou and Reggia removed the outer sheath from the loop by adding two more states (returning to 8, same as Langton). The loops, which are barely recognisable as such, are only 6 cells in size: half of Byl’s loop and an order of magnitude smaller than Langton’s.

If minimality were the only concern, all of these examples would be blown out of the water by Edward Fredkin’s single-cell replicator in the 2-state XOR rule. However, every configuration in that rule replicates, including a photograph of Fredkin, so it is hard to claim that this is self-directed.

Ancestors of Langton’s Loops

The inspiration for Langton’s loop was an earlier (1968) 8-state cellular automaton by E. F. Codd (the inventor of the relational database). Codd’s cellular automaton was designed to support universal computers augmented with universal construction capabilities: unlike Langton’s loops, the instruction tape can program the machine to build any configuration of quiescent cells, not just a simple copy of itself.

It took until 2010 before Codd’s machine was actually built, with some slight corrections, by Tim Hutton. It is massive:


Tim Hutton’s implementation of Codd’s self-replicating computer

Codd’s cellular automaton itself was borne out of a bet in a pub, where Codd challenged a friend that he could create a self-replicating computer in a cellular automaton with fewer states than von Neumann’s original 29-state cellular automaton.

Comparison of replicators

For an n-state k-neighbour cellular automaton, there are n^x different rules, where x \leq n^k is the number of distinct neighbourhoods that can occur. (We get equality x = n^k in the case of asymmetric rules, but for rules with symmetries the count is more complex and depends on the Polya Enumeration Theorem.) Consequently, we can concretely define the ‘complexity’ of the rule (in bits) to be x \dfrac{\log{n}}{\log{2}}.

For instance, Langton’s, Codd’s and Chou-Reggia’s cellular automata all have a complexity of 25056 bits, whereas Nobili’s 32-state adaptation of von Neumann’s original 29-state rule has a complexity of 167772160 bits. Conway’s two-state rule, by comparison, has only 18 bits of complexity.
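
These figures can be reproduced as follows (a sketch, assuming the loop rules are rotationally-symmetric 5-neighbour rules, whose neighbourhoods are counted via Burnside’s lemma over the four rotations, whilst Nobili’s rule is counted without any symmetry):

    from math import log2

    def complexity_symmetric(n):
        # n-state, 5-cell von Neumann neighbourhood, up to rotation:
        # x = n * (number of 4-bead necklaces over n states).
        necklaces = (n**4 + n**2 + 2 * n) // 4
        return n * necklaces * log2(n)

    def complexity_asymmetric(n, k):
        # Fully asymmetric n-state, k-cell neighbourhood: x = n^k.
        return n**k * log2(n)

    print(complexity_symmetric(8))       # 25056.0: Langton, Codd, Chou-Reggia
    print(complexity_asymmetric(32, 5))  # 167772160.0: Nobili's 32-state rule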

We can plot the population count (including the tape) of different self-replicating machines on one axis, and the complexity of the rule on the other axis. Interestingly, qualitative categories of replicator such as ‘universal constructor’, ‘loop’, and ‘parity-rule replicator’ form visually distinct clusters in the space:

Near the top of the plot are two rough hypothetical designs of replicators which have never been built:

  • Conway’s original blueprint for a universal constructor in his 2-state 9-neighbour cellular automaton, as described in Winning Ways and The Recursive Universe;
  • An estimate of how large a self-replicating machine would need to be in Edwin Roger Banks’ ‘Banks-IV’ cellular automaton, described in his 1971 PhD thesis.

The third point from the top (Codd’s 1968 self-replicating computer) also fell into this category, until Tim Hutton actually constructed the behemoth. This has been estimated to take 1000 years to replicate, which is why it is firmly above the threshold of ‘full simulation is beyond present computational capabilities’.

Everything else in this plot has been explicitly built and simulated for at least one full cycle of replication. Immediately below Codd’s machine, for instance, is Devore’s machine (built by Hightower in 1992), which is much more efficient and can be simulated within a reasonable time. The other patterns form clusters in the plot:

  • On the right-hand side of the plot is a cluster of self-replicating machines in von Neumann cellular automata, along with Renato Nobili’s and Tim Hutton’s modifications of the rule.
  • The green points in this centre at the bottom are loop-like replicators. As well as Langton’s loops, this includes evolvable variants by Sayama and Oros + Nehaniv.
  • The bottom-left cluster comprises trivial parity-rule replicators which have no tape and are passively copied by the underlying rule.

The yellow points on the left edge are self-propagating configurations which move by universal construction, but are not replicators in the strictest sense. They are all bounded-fecundity self-constructors, and with the exception of Greene 2013, they do not even copy their own tapes.

Why is the new organism interesting?

Finally, we have the new organism (shown in white on the left-hand side of the log-log plot, immediately below the threshold of practicality). Suitably programmed, this is a parity-rule replicator, and a loop-like replicator, and a universal constructor. It is also the first unbounded-fecundity replicator in Conway’s 2-state cellular automaton.

If we look again at the video:

we can see that, macroscopically, it copies itself in all four directions similar to Langton’s loops. The circuitry is designed such that each new child is placed in the same orientation and phase as the parent. Moreover, we see that the organism is programmed to self-destruct — either before or after constructing up to four children.

Whether or not it self-destructs prematurely depends on what signals it has received from its neighbours. Effectively, the machine receives a signal (a positive integer between 1 and 7, inclusive) from each of the (up to four) neighbours, and a 0 from any empty spaces if there are fewer than four neighbours. It then computes the quantity 8^3 a + 8^2 b + 8^1 c + d, where (a, b, c, d) are the four input signals, and indexes into a 4096-element lookup table to retrieve a value between 0 and 7 (the new ‘state’ of the machine). If 0, it immediately self-destructs without constructing any children; if nonzero, it constructs a daughter machine in each vacant space. Finally, it broadcasts the new state as a signal to all four neighbours, before self-destructing anyway.
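
In code, each generation of the metacell amounts to a single table lookup (a sketch, where lut stands for the 4096-entry table encoded in the metacell’s construction recipe):

    def metacell_step(lut, a, b, c, d):
        # a, b, c, d: signals (0..7) from the four neighbours, with 0
        # standing in for an empty space. Returns the new state; 0 means
        # self-destruct without constructing any children.
        return lut[8**3 * a + 8**2 * b + 8 * c + d]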

In doing so, this loop-like replicator behaves as a single cell in any 8-state 4-neighbour cellular automaton; the rule is specified by the lookup table inside the replicator. We call this construct a metacell because it emulates a single cell in a (8-state 4-neighbour) cellular automaton using a large collection of cells in the underlying (2-state 9-neighbour) cellular automaton.

This is not the first metacell (David Bell’s Unit Life Cell being the first example), but it is unique in having a 0-population ground state. As such, unlike the Unit Life Cell (which requires the entire plane to be tiled with infinitely many copies), any finite pattern in the emulated rule can be realised as a finite pattern in the underlying rule.

Interestingly, every 2-state 9-neighbour cellular automaton can be emulated, at half speed, by an 8-state 4-neighbour cellular automaton. As such, we can ‘import’ any pattern from any such cellular automaton into Conway’s rule, thereby obtaining the first examples of:

  • a parity-rule replicator (by emulating Fredkin, HighLife, or ThighLife);
  • a reflectorless rotating oscillator;
  • a spaceship made of perpetually colliding copies of smaller spaceships;

or even the metacell itself, recursively, obtaining an infinite sequence of exponentially larger and slower copies thereof (as if the existing metacell isn’t already too large and slow!).

To simplify the process of ‘metafying’ a pattern from an arbitrary isotropic 2-state 9-neighbour cellular automaton, I have included a Python script; this programs the metacell for the desired rule and assembles many copies (one for each cell in the original pattern) thereof into an equivalent pattern ‘writ large’.

Next time, we shall discuss in greater detail how the metacell itself was built. Until then, you may want to read Dave Greene’s recent article about some of the technology involved.
