Associative universal gates

The Boolean function NAND is famously universal, in that any Boolean function on n inputs and m outputs can be implemented as a circuit composed entirely of NAND gates. For example, the exclusive-or operation, A XOR B, can be written as:


This raises the question: are there any associative universal functions? It is routine to check that none of the 16 two-input Boolean functions are simultaneously associative and universal, but perhaps we can replace the set {true, false} with a larger set such that a function of this form exists?

Even better, can we find a nontrivial finite group G such that any function from G^n to G^m can be implemented as an n-input m-output circuit composed entirely of 2-input ‘multiplication gates’ (which merely multiply the two inputs, viewed as group elements)?

This is not possible if there is a homomorphism from G to [the additive group of] a finite field F: the induced function from F^n to F^m must be linear, and there exist nonlinear functions from F^n to F^m, so certain functions from F^n to F^m are not realisable as a circuit of multiplication gates. It follows that there exist functions from G^n to G^m which are not realisable as a circuit of multiplication gates either.

If the group G is solvable, then it must have a homomorphism to a nontrivial Abelian group (the quotient of G by its commutator subgroup), which in turn has a homomorphism to a nontrivial prime-power cyclic group (by the structure theorem), which is indeed the additive group of a finite field.

Consequently, we can restrict attention to unsolvable groups, the smallest example of which is A_5, the alternating group on five elements.

Universality of group operation in A_5

Let γ = (1 2 3 4 5) and consider the function:

f(x) = ((x^15 γ)^6 γ^4)^4

Usefully, the image of f contains exactly two elements — γ and the identity — which we’ll call ‘true’ and ‘false’.

Following the idea from the proof of Barrington’s theorem, we can also implement Boolean logic gates. Specifically, the elements γ = (1 2 3 4 5), δ = (1 3 5 4 2), and ε = (1 3 2 5 4) are all conjugate to each other in A_5, and ε is the group commutator of γ and δ. A Boolean AND gate (which expects each input to be either the identity or γ) can be implemented by:

  • conjugating one of the inputs by an appropriate constant (so that if it were originally γ, it becomes δ);
  • taking the commutator of that with the other input (resulting in ε iff both inputs were γ);
  • conjugating the output by an appropriate constant (so that the output is γ rather than ε).

Similarly, a NOT gate can be implemented by multiplying the input by γ^4 and then conjugating by an appropriate double-transposition. This means that we can construct arbitrary Boolean logic circuits out of ‘A_5 composition gates’ together with constants.

Finally, since every element of A_5 can be obtained by multiplying together a bunch of conjugates of γ, then for each pair of elements α, β of A_5 we can create a ‘demultiplexer gate’ which outputs α when the input is γ and outputs β when the input is the identity — this is analogous to the ‘ternary operator’ (x ? α : β) familiar from many programming languages.

Consequently, we can implement an arbitrary function from (A_5)^n to A_5 by doing the following:

  • for each of the n inputs, multiply it by each of the 60 elements of A_5 and apply f to each of these 60 results (so for each of the n inputs, we get a 60-bit ‘signature’ which acts as an (inefficient!) injective binary encoding of that input);
  • apply some complicated Boolean logic circuit composed of the AND and NOT gates described above, which has 60 Boolean outputs (containing a one-hot encoding of the output of the function);
  • apply appropriate demultiplexer gates to the outputs (so that one of these 60 outputs is the desired output, and the other 59 outputs are the identity element) and multiply the results together (in any order!).

An arbitrary function from (A_5)^n to (A_5)^m can then be implemented by m non-interacting single-output circuits of the above form, all sharing the same n inputs.

Note that this is a very inefficient construction, but it suffices to establish the universality of the composition gate for A_5-valued logic. We’ll discuss efficiency more in a later post, when we discuss Barrington’s theorem and applications thereof.

Posted in Uncategorized | Leave a comment

Another two rational dodecahedra

Since finding one rational dodecahedron inscribed in the unit sphere, I decided to port the search program to CUDA so that it can run on a GPU and thereby search a larger space in a reasonable amount of time. Firstly, let us recall our search methodology:


We exhaustively look for all small positive integer points (a, b) and (c, d) satisfying b < d and ad > bc. The first of these inequalities ensures that (c, d) is further up the upper-left circle than (a, b); the second inequality ensures that the line through the origin and (a, b) intersects the upper-right circle at two distinct points.

For each such candidate pair of points, we analytically compute x and y, checking that they are rational. If so, we check that the four indicated points on the lower circle are indeed concyclic by performing a final determinant check.

This is embarrassingly parallel, so well-suited for a GPU. We launch one thread for each pair of 1 ≤ a, dN, where N = 10000, so a total of 10^8 threads are launched. Each thread performs two nested loops: the outer loop searches each value of b up to (but excluding) d; the inner loop starts with c = 1 and increments it until the inequality ad > bc is violated or c exceeds the bound N.

Dealing with integer overflow

One rather annoying bugbear for searching a larger space was that of integer overflow. In the process of computing x, we need to take the square-root of a degree-6 polynomial in the four integer parameters a, b, c, d and check that this square-root is an integer*:


*since the expression for x is rational if and only if \sqrt{K^2 - L^2(c^2 + d^2)} is an integer.

When the integer parameters grow beyond 1000, this degree-6 polynomial can easily overflow the maximum representable 64-bit integer. To circumvent this problem, we take the following ugly approach:

  • Compute K^2 - L^2(c^2 + d^2) in both double-precision floating-point arithmetic (which is approximate) and in 64-bit integer arithmetic (which is exact, but reduced modulo 2^64).
  • ‘Repair’ any loss of relative accuracy (caused by the subtraction) in the double-precision approximation by rounding to the nearest multiple of 2^64 and adding the exact result reduced into the interval [-2^63, 2^63 – 1].
  • Compute the double-precision square-root. This will have comparable relative error, and therefore small absolute error (i.e. less than 1).
  • Cast the square-root back to a 64-bit integer (which won’t overflow).
  • Check this integer and nearby values to see whether they square (modulo 2^64) to the desired result.

Source code: here.

The analogous computation for y involves polynomials of degree at most 4, so we can safely search all a, b, c, d <= 10000 without overflow. The final determinant check is just the evaluation of a polynomial, so we can use arithmetic modulo 2^64 and not worry about integer overflow until later.


I opted to use an NVIDIA Volta V100 because it’s the most powerful GPU to which I currently have access (via Amazon Web Services). The V100 is huge, consisting of 80 streaming multiprocessors, each of which is capable of simultaneously issuing 4 instructions per cycle, where each instruction is vectorised over 32 ‘threads’. (For this reason, the GPU is sometimes advertised as having 80 × 4 × 32 = 5120 ‘CUDA cores’, but these are not comparable with CPU cores; a modern CPU core with vectorisation capabilities and multithreading is more akin to an entire streaming multiprocessor.)


Schematic of the Volta V100 from the whitepaper. Only 80 of the 84 streaming multiprocessors are actually ‘activated’; this means that the fabrication of the chip is allowed to contain up to 4 manufacturing defects without affecting the validity of the resulting GPU.

Each of the 80 streaming multiprocessors also has 8 ‘tensor cores’, which perform reduced-precision matrix multiplications useful for neural networks. These tensor cores were unused by this search program, as the nature of this particular problem requires high-precision 64-bit integer and double-precision floating-point computations.

Compiling the program with the switch -Xptxas=-v shows that there is no expensive register spillage or other performance problems:

$ nvcc -O3 -Xptxas=-v -lineinfo -o cudodeca
ptxas info : 56 bytes gmem
ptxas info : Compiling entry function '_Z12dodecakernelv' for 'sm_52'
ptxas info : Function properties for _Z12dodecakernelv
64 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 47 registers, 320 bytes cmem[0], 24 bytes cmem[2]

Sixteen hours of continual uninterrupted runtime on the V100 (costing a total of about fifty US dollars) were sufficient to exhaust this space and show that there exist at most 3 primitive solutions within these bounds (full output here). The second and third solutions are much larger than the first:

Solution 1:   a=22,   b=21,   c=22,   d=54,   x=40,   y=10
Solution 2:   a=1680, b=1474, c=2023, d=2601, x=2890, y=578
Solution 3:   a=2860, b=3915, c=3297, d=6594, x=7065, y=2355

The reason for saying ‘at most 3 primitive solutions’ is that the final determinant check is performed modulo 2^64, so false positives can slip through. To be absolutely sure of these results, we need an additional manual verification step using unbounded (‘bigint’) integer arithmetic. Unbounded integers are natively provided in languages such as Python, Haskell, and Wolfram Mathematica; we use the latter for convenience.


We can check the three determinants for the putative solution (1680, 1474, 2023, 2601, 2890, 578)…

$ wolfram12.0.0
Mathematica 12.0.0 Kernel for Linux x86 (64-bit)
Copyright 1988-2019 Wolfram Research, Inc.

In[1]:= f = Function[{x, y}, {1, x, y, x^2 + y^2}] 

Out[1]= Function[{x, y}, {1, x, y, x^2 + y^2}]

In[2]:= Det[{f[a, b], f[x, 0], f[0, y], f[0, -y]}] /. {a -> 1680, b -> 1474, c -> 2023, d -> 2601, x -> 2890, y -> 578} 

Out[2]= 0

In[3]:= Det[{f[(c^2+d^2)/x, 0], f[c, d], f[a, b], f[x, 0]}] /. {a -> 1680, b -> 1474, c -> 2023, d -> 2601, x -> 2890, y -> 578} 

Out[3]= 0

In[4]:= Det[{f[c, d], f[a, b], f[-a, b], f[0, y]}] /. {a -> 1680, b -> 1474, c -> 2023, d -> 2601, x -> 2890, y -> 578} 

Out[4]= 0

…and likewise for the putative solution (2860, 3915, 3297, 6594, 7065, 2355)…

In[5]:= Det[{f[a, b], f[x, 0], f[0, y], f[0, -y]}] /. {a -> 2860, b -> 3915, c -> 3297, d -> 6594, x -> 7065, y -> 2355}

Out[5]= 0

In[6]:= Det[{f[c, d], f[a, b], f[-a, b], f[0, y]}] /. {a -> 2860, b -> 3915, c -> 3297, d -> 6594, x -> 7065, y -> 2355}

Out[6]= 0

In[7]:= Det[{f[(c^2+d^2)/x, 0], f[c, d], f[a, b], f[x, 0]}] /. {a -> 2860, b -> 3915, c -> 3297, d -> 6594, x -> 7065, y -> 2355}

Out[7]= 0

…confirming that these are indeed valid primitive solutions.

Posted in Uncategorized | 4 Comments

Banach-Tarski and the Axiom of Choice

Tomasz Kania and I recently coauthored a paper about Banach spaces. The paper makes extensive use of the axiom of choice, involving a transfinite induction in the proof of Theorem B as well as several appeals to the fact that every vector space admits a Hamel basis.

The axiom of choice is often misunderstood, as is many of its consequences. I often hear the Banach-Tarski ‘paradox’ being quoted as a philosophical argument against the truth of the axiom of choice. However, the statement of the Banach-Tarski theorem is not paradoxical at all. One way to state Banach-Tarski is:

  • The set of points in a closed unit ball can be rearranged into the disjoint union of two copies of the set of points in a closed unit ball through a finite sequence of operations of splitting a set [into a disjoint union of two subsets], rotating a set, translating a set, and recombining two disjoint sets into their union.

Note that the set of points in a closed unit ball is an [uncountably] infinite set. We are perfectly accustomed to the points in an infinite set being in bijective correspondence with two copies of the original set: for instance, the even integers and the odd integers are each isomorphic to the integers. So we could write the following uncontroversial statement which differs from the Banach-Tarski theorem in only the indicated locations:

  • The set of integers can be rearranged into the disjoint union of two copies of the set of integers through a finite sequence of operations of splitting a set [into a disjoint union of two subsets], applying an affine transformation to a set, and recombining two disjoint sets into their union.

In particular, we split the integers into the odd integers and the even integers, and affine-transform each of these sets into a copy of all of the integers. This is uncontroversial, and doesn’t require the axiom of choice. No-one would non-ironically argue that this implies the arithmetic statement 1 = 1 + 1, because it is intuitively obvious that the set of integers is infinite and that the only statement about cardinals that this immediately implies is that ℵ_0 = ℵ_0 + ℵ_0.

However, people often feel differently about the Banach-Tarski ‘paradox’ because a closed unit ball feels like a tangible solid object. It is often joked that the Banach-Tarski paradox can be used to duplicate approximately-spherical real-world objects, such as oranges, as was the subject of this rather bizarre music video:

Notwithstanding any physical intuition, a closed unit ball is nonetheless an uncountable set of points. The axiom of choice merely gives us extra freedom in manipulating certain infinite sets and is particularly useful for constructions that involve induction over uncountable sets.

The Banach-Tarski theorem is still mathematically interesting and nontrivial, unlike the statement we made about rearranging the integers. It proves that there is no translation-invariant rotation-invariant finitely-additive measure on \mathbb{R}^n (n >= 3), whereas such a measure does exist in n = 2 dimensions as proved by Banach.

The proof of Banach-Tarski is even more interesting than the statement, and is well worth reading. It relies on the fact that the free group on two generators can be embedded as a subgroup of SO(n) when n >= 3; this is not the case for n = 2.

But what Banach-Tarski certainly does not imply is this nonsense*:


A misunderstanding of Banach-Tarski

(*ordinarily I would be more polite when someone is wrong on the internet, but the author of this tweet has been engaging in a highly dubious trolling campaign. Tim Gowers has weighed in on the discussion with an informative thread.)

The constructible universe

There’s actually a more fundamental reason why the axiom of choice cannot possibly be blamed for any results in arithmetic, false or otherwise. Assuming ZF set theory, then inside the von Neumann universe V is a subclass** (which may or may not be the whole of V) called L, also known as Kurt Gödel’s constructible universe.

**like a subset, but too big to be a set in the set-theoretic sense of the word set.

L is very well-behaved. Firstly, it is an internally consistent model of ZF set theory. Moreover, this operation of taking the constructible universe is idempotent: if we take the constructible universe within L (instead of within V), we still get the whole of L. This means that L is a model of V=L, together with anything that logically follows from V=L such as the axiom of choice or the Generalised Continuum Hypothesis. That is to say that L unconditionally satisfies the axiom of choice even if the full universe V does not.

Finally, and importantly for us, the Shoenfield absoluteness theorem states that certain statements (namely those that are at most Σ^1_2 or Π^1_2 in the analytical hierarchy, which subsumes all statements in first-order Peano arithmetic) are true in V if and only if they are true in L.

In particular, if a statement about first-order Peano arithmetic is proved in ZFC, then the result is also true in ZF (because you can ‘run the proof inside L’ where the axiom of choice holds, and then use the Shoenfield absoluteness theorem to transfer the result back to V). Indeed, you can also freely assume anything else that follows from V=L, such as the truth of the Generalised Continuum Hypothesis. This meant that reliance on the axiom of choice could easily be removed from Wiles’s proof of FLT, for instance.

If the author of that tweet or anyone else managed to prove 2+2=5 using ZFC, then the proof could be modified to also operate in ZF without requiring choice. This would, of course, mean that ZF is inconsistent and mathematics would reënter a state of foundational crisis.

Anyway, this is something of a distraction from the main purpose of this post, which is to briefly discuss one of the many*** useful applications of the axiom of choice.

***other applications include proving Tychonoff’s theorem in topology, the compactness theorem for first-order logic, the existence of nontrivial ultrafilters, that every vector space has a Hamel basis, et cetera.

Transfinite induction

One equivalent form of the axiom of choice is that every set can be bijected with an ordinal. Ordinals have the property that every non-empty subset of an ordinal has a least element, which makes them ideal for inductive proofs: if you want to prove that a property P holds for all elements, you need only show that there isn’t a least counterexample.

An application of this is to be able to perform a step-by-step construction involving uncountably many ‘steps’. For example, a fun question is:

Can three-dimensional space be expressed as a union of disjoint circles?

Using a transfinite induction, it is possible to place each of the uncountably many circles one at a time, avoiding any previous circles and ensuring that every point in the ambient space has been accounted for. Peter Komjath described such a construction in an answer to the question when it was asked on MathOverflow:


It is worth emphasising that this uses the least ordinal of cardinality continuum. These ‘initial ordinals’ have the useful property that all previous ordinals are strictly smaller from a cardinality perspective. This means that at each stage in the transfinite induction, the number of circles that have already been emplaced is strictly lower than the cardinality of the continuum, so there’s plenty of room to insert another circle passing through a specified point. This same idea was used in the paper I coauthored with Tomasz Kania.

A generalisation of this problem which remains open is whether there exists such a partition where every pair of circles is pairwise linked. The Hopf fibration provides a solution where \mathbb{R}^3 is augmented with an extra point at infinity, where every pair of circles interlock in the same manner as these two rings of dodecahedra:


Without this point at infinity, though, the problem is much harder and has evaded solution. Transfinite induction can show that we can cover all but one point with disjoint linked circles, but there is no easy way to modify the proof to cover the last point.

Posted in Uncategorized | Leave a comment

Rational dodecahedron inscribed in unit sphere

Moritz Firsching asked in 2016 whether there exists a dodecahedron, combinatorially equivalent to a regular dodecahedron, with rational vertices lying on the unit sphere. The difficulty arises from the combination of three constraints:

  1. The twelve pentagonal faces must all be planar;
  2. The vertices must all be rational points;
  3. The vertices must all lie on the unit sphere.

Conditions 1 and 2 are realised by Schulz’s dodecahedron (shown in Firsching’s question). Conditions 1 and 3 are realised by a regular dodecahedron. But finding a solution that satisfies all three conditions simultaneously turns out to be hard. A computer-assisted search only found a single example, up to Möbius transformations of the unit sphere:


The vertices are obtained from the following set of integer lattice points:

iverts = [[ 1840720, 1798335, 0],
[ 2573375, 0, 0],
[ 773500, 1963500, -1472625],
[ 773500, 1963500, 1472625],
[ 794070, 323510, -2426325],
[ 794070, 323510, 2426325],
[ 905828, -2223396, -926415],
[ 905828, -2223396, 926415],
[ 1925420, -863940, -1472625],
[ 1925420, -863940, 1472625],
[-1840720, -1798335, 0],
[-2573375, 0, 0],
[ -773500, -1963500, 1472625],
[ -773500, -1963500, -1472625],
[ -794070, -323510, 2426325],
[ -794070, -323510, -2426325],
[ -905828, 2223396, 926415],
[ -905828, 2223396, -926415],
[-1925420, 863940, 1472625],
[-1925420, 863940, -1472625]]

by dividing everything by 2573375. The planarity of the pentagonal faces and the fact that the points lie on a radius-2573375 sphere are checked by this notebook.

The dodecahedron has an order-8 symmetry group (it has three orthogonal planes of reflectional symmetry) and is visually very close to being regular. The symmetry group was a consequence of the search methodology; I specifically looked for solutions with this symmetry group in order to narrow down the search space.

By polar reciprocation, this also answers the equivalent question: ‘does there exist an icosahedron with rational vertices that contains an inscribed unit sphere?’


Subject to the points being on the unit sphere, the planarity is equivalent to concyclicity. After stereographically projecting the unit sphere to the plane, which preserves rationality of points, the problem becomes:

Does there exist an embedding of the dodecahedral graph into the plane, with each vertex being a rational point, such that the five vertices of each face are all concyclic?

I restricted attention to symmetrical solutions invariant under reflection in each of the coordinate axes and under inversion in a circle centred on the origin:


(Note that the two points furthest from the origin are not shown in the above diagram.)

Here the red lines and circle are the axes of reflection and inversion, respectively. The three green circles are the necessary concyclicities, implying that the other nine pentagons are also concyclic by symmetry. These can be translated into Diophantine equations by using the fact that the four points {(x1, y1), (x2, y2), (x3, y3), (x4, y4)} are concyclic or collinear if and only if the following 4-by-4 determinant vanishes:

  • 1, x1, y1, x1² + y1²;
  • 1, x2, y2, x2² + y2²;
  • 1, x3, y3, x3² + y3²;
  • 1, x4, y4, x4² + y4²;

I wrote a program to search over integer values of a,b,c,d and check whether the values of x and y implied by the upper-right and upper-left green circles, respectively, are rational. If so, the program proceeds to check that the remaining (lower) green circle holds; if so, it reports a solution. The program reported several solutions, but they were all scalar multiples of a single ‘primitive’ solution and therefore correspond to the same polyhedron.

Do there exist any more solutions?

My search program (available here) only found a single primitive solution, even when the threshold was increased to N = 1000. This is not particularly conclusive: it’s often the case that Diophantine equations have solutions that are very few and far between, even if there are infinitely many solutions; the Pell equation is a simple example of this phenomenon.

EDIT: after running on a GPU with N = 10000, another two primitive solutions emerged.

Of course, if history is any guide, it’s entirely likely that Noam Elkies will storm in at any moment wearing a superhero cape and brandishing an infinite family of solutions to this problem.

Posted in Uncategorized | 4 Comments

Fast-growing functions revisited

There have been many exciting results proved by members of the Googology wiki, a website concerned with fast-growing functions. Some of the highlights include:

  • Wythagoras’s construction of an 18-state Turing machine which takes more than Graham’s number of steps to terminate.
  • LittlePeng9’s construction of a Turing machine which simulates the Buchholz hydra, showing that ‘BB(160,7)>>most of defined computable numbers’.

Recently, however, there have been claims that several of the results on cp4space pertaining to fast-growing functions have never been proved. For the record, here are the original (hitherto unpublished) proofs with all of the missing details filled in such that they are no longer ‘exercises for the reader’:

Theorem 1: TREE(3) > f^5(8), where f(n) := tree^n(7)


To prove this (see here for definitions), it is sufficient to exhibit a sequence of trees whose nodes are each coloured red, green, or blue, such that there is no colour-preserving embedding of an earlier tree into a later tree. We begin the sequence as follows:


The first tree is a single red node, and we henceforth have no red nodes in any of the later trees. The twelfth tree, T12, and all subsequent trees have a single green root node connected to:

  • n branches containing a single blue node attached to a single green leaf;
  • m branches containing a single blue leaf;
  • one or more green trees.

Trees T1 through to T11 can be seen to not embed into any tree of this form, so it remains to define the sequence {T12, ….} and show that none of these trees embeds into any later tree.

The sequence begins as follows:

T1 {}
T2 [[]]
T3 [()()]
T4 [((()))]
T5 ([(())][])
T6 ([(())](()))
T7 ([(())]()()())
T8 ([(())]()())
T9 ([(())]())
T10 ([(())])
T11 [(())]
T12 ([()][()][()][()][()][])
T13 ([()][()][()][()][()](()))
T14 ([()][()][()][()][()]()()())
T15 ([()][()][()][()][()]()())
T16 ([()][()][()][()][()]())
T17 ([()][()][()][()][()])
T18 ([()][()][()][()][][][][][][][][][])
T19 ([()][()][()][()][][][][][][][][](()))
T20 ([()][()][()][()][][][][][][][][]()()())
T21 ([()][()][()][()][][][][][][][][]()())
T22 ([()][()][()][()][][][][][][][][]())
T23 ([()][()][()][()][][][][][][][][])
T24 ([()][()][()][()][][][][][][][]X7)

where X7 is any (monochromatically green) tree on 7 vertices. This can be followed by the sequence:

T25 ([()][()][()][()][][][][][][][]X8)
T26 ([()][()][()][()][][][][][][][]X9)
T27 ([()][()][()][()][][][][][][][]X10)
T_(23 + tree(7)) ([()][()][()][()][][][][][][][]())

where X7, X8, X9, X10, … () is a maximal-length sequence for tree(7). If any earlier tree here embeds into a later tree, then the same must have been true in the maximal-length sequence for tree(7), contradicting the definition.

Consequently, we’ve already proved that TREE(3) > tree(7). We can then extend the sequence further by ‘burning’ another blue node (pair of square brackets):

T_(24 + tree(7)) ([()][()][()][()][][][][][][][])
T_(25 + tree(7)) ([()][()][()][()][][][][][][]Y1)

where Y1 is the first tree in the maximal-length sequence for tree(tree(7)). We proceed in the same way as before with a sequence of tree(tree(7)) terms, culminating in:

T_(23 + tree(7) + tree(tree(7))) ([()][()][()][()][][][][][][]())
T_(24 + tree(7) + tree(tree(7))) ([()][()][()][()][][][][][][])
T_(25 + tree(7) + tree(tree(7))) ([()][()][()][()][][][][][]Y2)

where Y2 is the first tree in the maximal-length sequence for tree(tree(tree(7))). Repeating this process, we eventually reach the tree:


at time 24 + tree(7) + tree(tree(7)) + … + tree^8(7). We can then make the next tree in our process have the form:


where we have created tree^8(7) new blue nodes by ‘burning’ a branch of the form [()]. The same argument as before allows us to reach the tree:


at time well beyond tree^(tree^8(7))(7). Repeating this outer iteration another three times gets us to the claimed bound.

To see that no tree embeds homeomorphically into a later tree, we note that if T precedes T’, then we have at least one of the following:

  • T contains more copies of [()] than T’;
  • T contains more copies of [] than T’;
  • The monochromatic green subtree of T precedes (in some tree(k) sequence) the monochromatic green subtree of T’;

where the latter condition was ensured by using a sequence for tree(k) which has this property by definition.

Theorem 2: tree(n) outgrows f_alpha(n) for every alpha preceding the small Veblen ordinal

This is implied by the statement in Harvey Friedman’s e-mails to the FOM mailing list:

One of our many finite forms of Kruskal's theorem asserts that

for all k, TREE[k] exists.

This function eventually dominates every provably recursive
function of the system ACA_0 + Pi12-BI.

and this is elaborated upon in Deedlit’s excellent MathOverflow answer.

Theorem 3: SSCG(4n + 3) >= SCG(n)

Conveniently, someone had e-mailed me to ask for an explicit proof of this statement, so I’ll include the reply I sent:

Suppose you have a sequence of N = SCG(n) (not necessarily simple)
subcubic graphs, (G_1, ..., G_N), such that G_i has at most i + n
vertices and no graph is a minor of a later graph.

Now, we let H_i be obtained from G_i by replacing each vertex with
a copy of the 4-vertex 3-edge Y-shaped graph, K3,1. For each pair
of adjacent vertices in the original graph, we join two as-yet-unused
'limbs' of the corresponding Y-shapes in the new graph. Then H_i is
simple and subcubic, and we can show that no graph is a minor of a
later graph. This is most easily shown by the fact that for subcubic
graphs, 'minor' is equivalent to 'topological minor', so it suffices
to show that no graph is homeomorphically embeddable into a later
graph. (Proof: if H_i embeds into H_j, the degree-3 vertices of H_i
must map to degree-3 vertices of H_j; this induces a homeomorphic
embedding from G_i to G_j.)

At this point we're almost done. The graphs (H_1, H_2, H_3, ...)
have at most (4n + 4, 4n + 8, 4n + 12, ...) vertices, respectively.
To prove the claim, we actually want a sequence with at most
(4n + 4, 4n + 5, 4n + 6, ...).

Now, since G_1 contains at least 2 vertices, it must contain at
least 1 edge which is not a self-loop (otherwise, every vertex has
degree <= 2 and we can add another edge, producing a more optimal
initial graph G_0, contradicting the assumption that our original
sequence of graphs is maximal). Consequently, H_1 contains a path
of vertices of degrees (3, 2, 2, 3). Let H_1' be the graph obtained
by contracting this path once, and H_1'' be the graph obtained by
contracting it again. Then the graphs (H_1, H_1', H_1'', H_2, H_3,
...) have the property that no graph is a minor of a later graph.

Hence, we can construct the sequence which begins:

- H_1 with at most 4n + 4 vertices;
- (H_1' + K_2) with at most 4n + 5 vertices;
- (H_1' + K_1 + K_1) with at most 4n + 6 vertices;
- (H_1' + K_1) with at most 4n + 7 vertices;
- H_1' with at most 4n + 8 vertices;
- (H_1'' + C_7) with at most 4n + 9 vertices;
- (H_1'' + C_6) with at most 4n + 10 vertices;
- (H_1'' + C_5) with at most 4n + 11 vertices;

where + denotes disjoint union of graphs. At this point, we've
'bought ourselves enough vertices' to continue the sequence as

- (H_2 + K_1 + K_1 + K_1 + K_1) with at most 4n + 12 vertices;
- (H_2 + K_1 + K_1 + K_1) with at most 4n + 13 vertices;
- (H_2 + K_1 + K_1) with at most 4n + 14 vertices;
- (H_2 + K_1) with at most 4n + 15 vertices;

- (H_3 + K_1 + K_1 + K_1 + K_1) with at most 4n + 16 vertices;
- (H_3 + K_1 + K_1 + K_1) with at most 4n + 17 vertices;
- (H_3 + K_1 + K_1) with at most 4n + 18 vertices;
- (H_3 + K_1) with at most 4n + 19 vertices;

and so forth. The property of no graph being a minor of a later
graph can easily be seen to be true, because at each iteration
we either have:

 - the 'big component' (the union of connected components which
contain at least one degree-3 vertex) gets strictly smaller (in
the poset of graphs under the minorship relation);
 - or the 'big component' stays identical and the 'small
component' gets strictly smaller;

and any homeomorphic embedding of one graph into another must map
the big component of the first graph into the big component of the
other graph.

I'm sure that there's enough slack in this proof to improve the
upper bound to SSCG(4n + 2) (by starting with H_1' instead of H_1),
but this proof suffices to show the claimed statement.

Theorem 4: SSCG(3) > TREE^(TREE^2(3))(3)

Again, we want to exhibit a sequence for SSCG(3) which exceeds TREE^(TREE^2(3))(3). We can begin as follows:


All subsequent graphs are the disjoint union of:

  • p copies of the double-square with leaves attached to two diametrically opposite degree-2 vertices;
  • q copies of the double-square with leaves attached to two degree-2 vertices which are neither adjacent nor diametrically opposite;
  • r copies of the double-square with a single leaf;
  • s copies of the double-square;
  • a graph T which does not contain the double-square as a graph minor.

For example, the last graph in the above sequence is (1, 0, 0, 1, Ø), where Ø is the graph with no vertices or edges.

We define a sequence such that if the graph (p, q, r, s, T) precedes the graph (p’, q’, r’, s’, T’) in the sequence we define, then we have one of the following:

  • p > p’;
  • p = p’ and q > q’;
  • (p, q) = (p’, q’) and r > r’;
  • (p, q, r) = (p’, q’, r’) and s > s’;
  • (p, q, r, s) = (p’, q’, r’, s’) and T is not a graph minor of T’.

With this convenient notation, we can write down some more terms of the sequence:

G18 (1, 0, 0, 0, 15-cycle)
G19 (1, 0, 0, 0, 14-cycle)
G29 (1, 0, 0, 0, square)
G30 (1, 0, 0, 0, triangle = H1)
G31 (1, 0, 0, 0, H2)
G32 (1, 0, 0, 0, H3)
G(2^(3 × 2^95) + 20) (1, 0, 0, 0, isolated_vertex)
G(2^(3 × 2^95) + 21) (1, 0, 0, 0, Ø)

where H1, H2, H3, … is the sequence of length 2^(3 × 2^95) − 9 that provides a lower bound for SSCG(2).

We can encode a rooted ordered coloured tree as a graph using the construction described here; this graph satisfies the properties required of T. The long initial segment of length 2^(3 × 2^95) + 21 means that we can use lots of vertices in the next step:

G(2^(3 × 2^95) + 22) (0, 2^(2^96), 0, 0, T1)

where T1 is the encoding of the first tree in a maximal-length sequence of rooted coloured trees for TREE(3). The ‘shrinking counter’ mechanism allows us to obtain a sequence of (l+7) TREE(3) graphs (none of which is a minor of any later graph) culminating in:

G(2^(3 × 2^95) + 22 + (l+7) TREE(3)) (0, 2^(2^96), 0, 0, Ø)
G(2^(3 × 2^95) + 23 + (l+7) TREE(3)) (0, 2^(2^96)-1, TREE(3), 0, U1)

where U1 is the encoding of the first tree in a maximal-length sequence of rooted coloured trees for TREE^2(3). At the end of the graph sequence corresponding to this tree sequence, we have the graph represented by the 5-tuple:

(0, 2^(2^96)-1, TREE(3), 0, Ø)

which can then be followed by:

(0, 2^(2^96)-1, TREE(3)-1, TREE^2(3), V1)

where V1 is the encoding of the first tree in a maximal-length sequence of rooted coloured trees for TREE^3(3). We can continue to repeat this same argument a total of TREE^2(3) times, decrementing the penultimate element of this 5-tuple each time, with the final iteration exceeding TREE^(TREE^2(3))(3) steps before we reach:

(0, 2^(2^96)-1, TREE(3)-1, 0, Ø)

This already proves what we set out to show, but it is clear that the iteration can go much further.

Posted in Fast-growing functions | 7 Comments

4-input 2-output Boolean circuits

In 2005, Donald Knuth determined the minimum cost required to implement each of the 2^32 different 5-input 1-output Boolean functions as a circuit composed entirely of:

  • 2-input gates (there are 16 of these), each of which has cost 1;
  • 1-input gates (there are only 2 of these, namely the identity function and the NOT gate), each of which has cost 0.

Given that NOT gates are free, every 2-input gate is either silly (ignoring one or both of its inputs) or is equivalent to either AND or XOR.

A previous cp4space post discusses how we can efficiently determine the NPN equivalence class (that is to say, the equivalence class of functions up to permuting and negating the inputs and/or output(s)) of a function given its truth table, and therefore to query a database of optimal circuits for the 616126 equivalence classes.

I decided to attempt the analogous feat for 4-input 2-output functions. There are still 2^32 of them, but the equivalence classes are smaller:

  • For 5-input 1-output functions, the symmetry group (by whose action we’re quotienting) has 2^5 × 5! × 1! × 2^1 = 7680 elements;
  • For 4-input 2-output functions, the symmetry group has 2^4 × 4! × 2! × 2^2 = 3072 elements;

and, as such, there are more equivalence classes: 1476218, as mentioned here. This number can be calculated exactly using Burnside’s lemma, or approximated from below using the ceiling of the leading-order term: ceil(2^32 / 3072) = 1398102.

The breadth-first search

With about 3750 CPU core-hours and a lot of memory usage, I was able to determine the optimal circuits for all of these 1476218 equivalence classes of 4-input 2-output Boolean functions. The number of classes and functions of each cost are tabulated below:


Representatives of the four classes of cost 0, for example, are:

  • f(x1, x2, x3, x4) = (0, 0);
  • f(x1, x2, x3, x4) = (0, x1);
  • f(x1, x2, x3, x4) = (x1, x1);
  • f(x1, x2, x3, x4) = (x1, x2);

and representatives for the eight classes of cost 1 are:

  • f(x1, x2, x3, x4) = (0, x1 AND x2);
  • f(x1, x2, x3, x4) = (0, x1 XOR x2);
  • f(x1, x2, x3, x4) = (x1, x1 AND x2);
  • f(x1, x2, x3, x4) = (x1, x1 XOR x2);
  • f(x1, x2, x3, x4) = (x3, x1 AND x2);
  • f(x1, x2, x3, x4) = (x3, x1 XOR x2);
  • f(x1, x2, x3, x4) = (x1 AND x2, x1 AND x2);
  • f(x1, x2, x3, x4) = (x1 XOR x2, x1 XOR x2).

The methodology was that of a breadth-first search, taking advantage of symmetry to vastly reduce the search space. The search up to depth 8, described here, was conducted using a multithreaded program (taking 115 core-hours), outputting a hefty 27-gigabyte file containing the entire search tree.

Each node in the tree at depth n is an equivalence class of sets of n distinct (from each other and their complements) nontrivial 4-input 1-output functions which can be implemented with minimum cost exactly n. Intuitively, the nodes in the tree represent initial segments of circuits, up to equivalence. Even though the tree grows super-exponentially* as a function of depth, it was still possible to explicitly compute and store the first eight levels:


  • 1 node at depth 0;
  • 2 nodes at depth 1;
  • 15 nodes at depth 2;
  • 156 nodes at depth 3;
  • 2396 nodes at depth 4;
  • 50865 nodes at depth 5;
  • 1376962 nodes at depth 6;
  • 45189111 nodes at depth 7;
  • 1733295202 nodes at depth 8.

* technically, the entire tree is finite, because there are only finitely many sets of distinct 4-input Boolean functions, so ‘super-exponentially’ does not apply asymptotically. This is the same (very pedantic!) reason that it’s hard to make precise the notion that the number of positions in a Rubik’s cube that require k moves to solve ‘grows exponentially’ as a function of k — there are only finitely many positions, and Tom Rokicki showed that indeed the tree stops at depth 20. It’s also why a population of bacteria or viruses can’t grow exponentially forever: they run out of limited resources, and the volume that they occupy is bounded above by the sphere which contains the population at time 0 and expands at the speed of light — an upper bound on volume which is cubic as a function of time!


From there, I checked every irreducible way to append 2 more gates to one of these circuits, expanding the search to depth 10, and keeping track of the optimal solutions for each of the equivalence classes obtained in this manner. All but nineteen of the equivalence classes were solved with circuits of cost <= 10, providing a lower bound of 11 for those nineteen difficult classes. This lower bound of 11 turned out to be achievable in all nineteen cases, thereby conclusively answering the question.

Partially verifying the results

How do we check that the results are correct?

In addition to computing the minimum cost of each of these equivalence classes, the search yielded explicit witnesses (circuits that achieve the minimum cost). A much simpler program can run through all 2^32 functions and verify that these witnesses indeed implement the purported functions. This verifies that the results are correct in one direction, showing that they are correct upper bounds for the minimum cost of each gate.

I can’t see any way to fully verify that they are correct lower bounds, without someone else independently running an exhaustive search themselves (ideally using a different algorithm) and checking that the results match the table above. This is because it’s easy to verify that a circuit is correct, but hard to verify that it’s minimum-cost.

That said, there was a certain amount of self-verification when running the breadth-first search: for example, creating an on-disk file of depth 6 and extending it by 2 gates produced identical results to the on-disk file of depth 8. (In an earlier attempt, I’d overlooked an edge-case, and these results didn’t agree — this was how I was able to catch the bug.)

Posted in Boolean optimisation | 2 Comments

That group of order 348364800

In nested lattices, we talked about the E8 lattice and its order-696729600 group of origin-preserving symmetries. In minimalistic quantum computation, we saw that this group of 8-by-8 real orthogonal matrices is generated by a set of matrices which are easily describable in terms of quantum gates.

However, something that’s important to note is that quantum states are equivalence classes of vectors, where scalar multiples of vectors are identified. That is to say, the state space of n qubits is the complex projective space (\mathbb{C}^{2^n} \setminus \{ 0 \}) / (\mathbb{C} \setminus \{ 0 \}). So, if U is a unitary matrix, then all unitary matrices of the form e^{i \theta} U induce the same transformation.

Consequently, the order-696729600 subgroup of O(8) is not exactly the group of transformations in which we’re interested. Rather, we’re interested in its quotient by its centre {±I}. The resulting order-348364800 group G turns out to be very easy to describe without having to mention the E8 lattice!

G is isomorphic to the group of 8-by-8 matrices over the field of two elements which preserve the quadratic form \sum_{i \leq j} x_i x_j

That is to say, each of the 256 binary vectors of length 8 can be classified as either:

  • odd norm, if the number of ‘1’s in the vector is either 1 or 2 (modulo 4);
  • even norm, if the number of ‘1’s in the vector is either 0 or 3 (modulo 4).

(Equivalently, the norm is the parity of c(c+1)/2, where c is the popcount (number of ‘1’s) in the binary vector.)

Then G is the group of invertible 8-by-8 binary matrices U which are norm-preserving, i.e. Ux has the same norm as x for all 256 of the choices of vector x.


To understand this isomorphism, we need to return to the description in terms of the E8 lattice, Λ. We’ll form a quotient of this lattice by a copy of the lattice scaled by a factor of two — the resulting set Λ / 2Λ contains 256 points which form an 8-dimensional vector space over the field of two elements!

Moreover, this set Λ / 2Λ consists of:

  • 1 zero point (of even norm);
  • 120 = 240 / 2 nonzero points of odd norm (each of which corresponds to a pair of antipodal vectors in the first shell of the E8 lattice);
  • 135 = 2160 / 16 nonzero points of even norm (each of which corresponds to an orthoplex of 16 vectors in the second shell of the E8 lattice).

There isn’t an orthonormal basis for this set, so we choose the next best thing: a basis of 8 odd-norm vectors which are pairwise at 60º angles from each other! These, together with the origin, form the 9 vertices of a regular simplex in the E8 lattice. This choice of basis results in the norm having the simple expression we described earlier.

For concreteness, we’ll give an explicit set of vectors:

  • e_0 = (½, ½, ½, ½, ½, ½, ½, ½);
  • e_1 = (1, 1, 0, 0, 0, 0, 0, 0);
  • e_2 = (1, 0, 1, 0, 0, 0, 0, 0);
  • e_3 = (1, 0, 0, 1, 0, 0, 0, 0);
  • e_4 = (1, 0, 0, 0, 1, 0, 0, 0);
  • e_5 = (1, 0, 0, 0, 0, 1, 0, 0);
  • e_6 = (1, 0, 0, 0, 0, 0, 1, 0);
  • e_7 = (1, 0, 0, 0, 0, 0, 0, 1);

These satisfy 〈e_i, e_j〉 = 1 + δ_ij, where δ_ij is the Kronecker delta.

Computational convenience

In the article on minimalistic quantum computation, we mentioned that the choice of gates with dyadic rational coefficients were particularly convenient, as the matrix entries are exactly representable as IEEE 754 floating-point numbers. For the three-qubit case, this binary matrix representation is vastly more convenient still!

Firstly, an 8-by-8 binary matrix can be stored in a 64-bit processor register. Composing rotations is then reduced to multiplying these binary matrices.

Certain processor architectures support the operation of multiplying two 8-by-8 binary matrices in a single processor instruction! Knuth’s MMIX architecture calls this instruction ‘MXOR’, and there are at least two commercial computer architectures which also support this instruction: the original Cray supercomputer had this for government cryptographic reasons, and (up to one of the operands being transposed in the process) there is a vectorised implementation in the GFNI extension for x86_64. There was also a proposal for an extension for RISC-V with this instruction.

For instruction sets which don’t support MXOR in hardware, you can achieve the same effect with a bunch of bitwise manipulations. Challenge: try to implement MXOR in as few operations as possible using only bitwise Boolean operations and logical shifts. If I counted correctly, I can manage it in 113 total operations (73 Boolean operations and 40 shifts), but I expect it’s possible to do better than this.

What about inverting one of these matrices in G? It turns out that the exponent of G (the lowest common multiple of the orders of the elements) is 2520, so we can invert an element by raising it to the power of 2519. Using a technique which generalises repeated squaring, this can be accomplished by a chain of 15 MXOR instructions.

Posted in Uncategorized | 3 Comments

More quantum gates and lattices

The previous post ended with unanswered questions about describing the Conway group, Co0, in terms of quantum gates with dyadic rational coefficients. It turned out to be easier than expected, although the construction is much more complicated than the counterpart in the previous post.

We’ll start with a construction for a much smaller simple group — the order-168 group PSL(2, 7) — which can be generated by three CNOT gates acting on the three lower qubits in the diagram below:


For the moment, ignore the two unused upper qubits; their purpose will become clear soon enough. The three lower qubits have eight computational basis states (000, 001, 010, 011, 100, 101, 110, and 111). Viewing each of these states as a three-dimensional vector over \mathbb{F}_2, the CNOT gates induce linear shearing maps (and together generate the full 168-element group of linear automorphisms).

Now we introduce another two CNOT gates:


These act only on the two upper qubits, and generate the symmetric group S_3 (freely permuting the states 01, 10, and 11). These two gates, together with the three gates in the previous diagram, therefore generate the direct product S_3 \times PSL(2, 7) of order 1008.

It is helpful to partition the 32 computational basis states into four 8-element sets:

  • the set W consisting of all basis states of the form 00xyz;
  • the set A consisting of all basis states of the form 01xyz;
  • the set B consisting of all basis states of the form 10xyz;
  • the set C consisting of all basis states of the form 11xyz;

where x, y, z are elements of {0, 1}. Then the two gates on the upper qubits are responsible for bodily permuting the three sets {A, B, C}; the three gates on the lower qubits induce the same linear permutation in each of W, A, B, and C (viewed as 3-dimensional vector spaces over the field of two elements).

Note that the permutation group is transitive on the 8-element set W, and transitive on the 24-element set V (the complement of W, or equivalently the union of A, B, and C), but no elements of V are ever interchanged with elements of W. This will remain the case as we continue to add further gates.

For the time being, we’ll therefore suspend thinking about the smaller 8-element set W, and concentrate on the larger 24-element set V.

We now introduce a sixth CNOT gate, bridging the upper and lower qubits:


This now expands the size of the group by a factor of 64, resulting in a 64512-element group called the trio group. The vector spaces A, B, and C are effectively upgraded into affine spaces which can be semi-independently translated (the constraint is that the images of their ‘origins’ — 01000, 10000, and 11000 — always have a modulo-2 sum of zero).

The trio group, considered as a permutation group on the 24 basis states in V, is a maximal subgroup of the simple sporadic group M24. That means that adding a single further gate to break out of the trio group will immediately upgrade us to the 244823040-element Mathieu group! Unfortunately, I wasn’t able to find a particularly simple choice of gate:


The effect of this complicated gate is to do the following:

  • Within each of the sets W and A, apply a particular non-affine permutation which exchanges the eight elements by treating them as four ‘pairs’ and swapping each element with its partner:
    • 000 ⇔ 100;
    • 001 ⇔ 110;
    • 010 ⇔ 111;
    • 011 ⇔ 101;
  • Swap four elements of B with four elements of C by means of the Fredkin gate on the far right.

In terms of its action on V, this is an element of M24, but does not belong to the trio group. Interestingly, it belongs to many of the other important subgroups of M24 — namely PSL(3, 4) (also called M21), the ‘sextet group’, and the ‘octad group’. At this point, the group generated by these seven gates is now the direct product of the alternating group A8 (acting on the set W) and the Mathieu group M24 (acting on the set V).

The elements of the Mathieu group are permutations on the set V, which can be viewed as 24-by-24 permutation matrices. Permutation matrices are matrices with entries in {0, 1}, where there’s exactly one ‘1’ in each row and column. If we throw in a Z-gate acting on the uppermost qubit, we’ll expand this to a group 212:M24 of ‘signed permutation matrices’ instead, where some of the ‘1’s are replaced with ‘−1’s. (The sets of rows and columns containing negative entries must form codewords of the binary Golay code; this is why each permutation matrix has only 212 sign choices instead of 224.) This group is interesting inasmuch as it’s the group of permutations and bit-flips which preserve the binary Golay code. It’s also a maximal subgroup, called the monomial subgroup, of the Conway group Co0.

Instead of using this Z-gate, we’ll bypass the monomial subgroup and jump straight to the Conway group by adding a single additional three-qubit gate:


By a slight abuse of notation, we’ll reuse the symbols W and V to refer to the vector spaces spanned by first 8 basis states and final 24 basis states, respectively. After introducing this final gate, the resulting group of 32-by-32 matrices is a direct product of:

  • an order-348364800 group (the orientation-preserving index-2 subgroup of the Weyl group of E8) acting on the 8-dimensional vector space W;
  • an order-8315553613086720000 group (the Conway group, Co0) acting on the 24-dimensional vector space V.

These are the groups of rotational symmetries of the two most remarkable* Euclidean lattices — the E8 lattice and the Leech lattice, respectively. Indeed, if we take all linear combinations (with integer coefficients!) of the images (under the group we’ve constructed) of a particular computational basis state, then we recover either the E8 lattice or the Leech lattice (depending on whether we used one of the basis states in W or in V).

* as well as being highly symmetric, they give the optimal packings of unit spheres in 8- and 24-dimensional space, as proved by Maryna Viazovska.

These two groups each have a centre of order 2 (consisting of the identity matrix and its negation), modulo which they’re simple groups:

  • the quotient of the order-348364800 group by its centre is PSΩ(8, 2);
  • the quotient of Co0 by its centre is the sporadic simple group Co1.

This process of quotienting by the centre is especially natural in this quantum gate formulation, as scalar multiples of a vector correspond to the same quantum state.

Further connections

If we take n \geq 4 qubits and the gates mentioned in the previous post, we remarked that the group generated is the full rotation group SO(2^n). If instead we replace the Toffoli gate in our arsenal with a CNOT gate, we get exactly the symmetry groups of the Barnes-Wall lattices! The orders of the groups are enumerated in sequence A014115 of the OEIS.


Thanks go to Conway and Sloane for their magnificent and thoroughly illuminating book, Sphere Packings, Lattices, and Groups. Also, the website (which is undoubtedly the most useful thing to have ever been written in JavaScript) was helpful for creating the illustrations herein.

Posted in Uncategorized | 1 Comment

Minimalistic quantum computation

In the usual ‘circuit model’ of quantum computation, we have a fixed number of qubits, {q1, q2, …, qn}, and allow quantum gates to act on these qubits. The diagram below shows a Toffoli gate on the left, and an equivalent circuit of simpler gates on the right:


These diagrams represent qubits as horizontal lines (so there are 3 qubits in the circuit above), and the operations are applied from left to right. The circuit has 6 controlled-NOT gates (each acting on an ordered pair of qubits) and 9 single-qubit gates (4 T-gates, 3 inverse T-gates, and 2 Hadamard gates).

Whereas the internal state of a classical computer with n bits of memory can be described by a length-n vector of binary values, a quantum computer with n qubits of memory requires a length-(2^n) vector of complex numbers. A k-qubit gate is a unitary linear map described by a (2^k)-by-(2^k) matrix of complex numbers.

Importantly, it’s the exponential increase in the dimension of this vector (from n to 2^n), and not the involvement of complex numbers, which makes quantum computers [believed to be] able to solve more problems [in polynomial time] than is possible with a mere classical computer. To see this is the case, note that a (2^k)-by-(2^k) matrix of complex numbers can be emulated by a (2^(k+1))-by-(2^(k+1)) matrix of real numbers. Specifically, replace each complex entry with a real 2-by-2 block:


Consequently, a k-qubit complex gate can be emulated with a (k+1)-qubit real gate.

Indeed, it’s possible to restrict to not only real entries, but even to dyadic rational entries. Specifically, the most common universal set of logic gates {T, H, CNOT} consists of matrices whose entries belong to the ring \mathbb{Z}[\frac{1}{2}, \zeta] where ζ is a primitive eighth root of unity. A similar trick means we can work over the ring of dyadic rationals instead, at the cost of just two extra qubits:


This is helpful for simulation in software: all finite values representable as IEEE 754 floating point numbers are dyadic rationals, and a partial converse is true: all dyadic rationals with numerator and denominator less than some bound (2^53 for double-precision and 2^24 for single-precision) are representable as IEEE 754 floating point numbers.

A universal pair of 3-qubit dyadic rational gates

Consider the Toffoli gate (left) and a new gate that I’m going to call the ‘XHH’ gate (it’s simply a tensor product of a Pauli X-gate and two Hadamard gates, all acting on separate qubits):


In an n-qubit circuit, each of these gates yields n(n – 1)(n – 2)/2 different matrices depending on the choice of qubits on which it acts, so this set expands to n(n – 1)(n – 2) matrices in total. Then we have the following universality theorem:

When n >= 4, the group generated by these matrices is a dense subgroup of the complete rotation group SO(2^n).

This is the best that we could hope for: when tensored by (n – 3) copies of the 2-by-2 identity matrix, these gates yield orthogonal matrices of determinant 1. It means that any special orthogonal gate can be approximated arbitrarily closely (in a number of gates polylogarithmic in the required precision, by the Solovay-Kitaev theorem), which (together with the above discussion of emulating complex unitary gates with real orthogonal gates) yields universal quantum computation.

An eight-dimensional surprise

More interesting is what happens when n = 3: the gates do not form a dense subgroup of O(8) as we might expect from extrapolating this result downwards (and noting that the Toffoli matrix has determinant -1, so the matrices lie in O(8) instead of SO(8) when n = 3).

Rather, they form a finite group of order 696729600.

This number should be familiar from the last post, because it’s the order of the E8 Weyl group. Every entry of every matrix in this group is not only dyadic rational, but in fact an integer multiple of 1/4. Inter alia, this finite group contains the familiar single-qubit Pauli gates X and Z, as well as the two-qubit CNOT gate.


Orthogonal projection of the 8-dimensional E8 root system into its 2-dimensional Coxeter plane, drawn by J. G. Moxness. The symmetry group of this set of 240 points is the aforementioned E8 Weyl group of order 696729600.

Given a starting state where all qubits are off (so the vector is (1, 0, 0, 0, 0, 0, 0, 0)), by applying these two gates in various combinations it is possible to reach any of 2160 different vectors (specifically, rescaled copies of the 2160 vectors in the second shell of the E8 lattice). If we instead began with an equal superposition of the eight computational basis states, there would be 240 reachable vectors — the root lattice vectors illustrated in the picture above! Again, the numbers 240 and 2160 should be very familiar from the previous article.

(Vectors that are related by scalar multiplication are identified as the same quantum state, so antipodal pairs of vectors in the previous paragraph correspond to the same quantum state. Consequently, there are only 1080 distinct quantum states reachable from the all-off quantum state, or 120 distinct quantum states reachable from the )

Before I found the set {Toffoli, XHH}, I tried the deceptively similar pair {Toffoli, ZHH}. To my surprise, the program I used to compute the group generated by those elements had an order of 2903040 — significantly smaller than the 696729600-element group I had expected! This is the Weyl group of E7, and in this case we get the smaller subgroup because all six matrices fix the vector (1, 1, 1, -1, 1, -1, -1, -1) and therefore only act nontrivially on its seven-dimensional orthogonal complement. Fortunately, the set {Toffoli, XHH} does not have a fixed vector and generates the full Weyl group of E8.

Unanswered questions

The {Toffoli, ZHH} construction of the E7 Weyl group demonstrates that we can realise the origin-preserving isometry group of a 7-dimensional lattice, even though 7 is not a power of two. An even more beautiful and exceptional lattice than the E8 lattice is the 24-dimensional Leech lattice, whose origin-preserving symmetry group is the Conway group Co0. Is there an elegant set of matrices which generate a group isomorphic to Co0 and have a simple description in terms of quantum gates?

Edit: depending on whether you’d consider it elegant and/or simple, there’s an affirmative answer in the next post.

The first nonempty shell of the Leech lattice consists of 196560 points, as opposed to the 240 points in the first shell of the E8 lattice. David Madore has plotted some beautiful projections of this set — they’re as close as possible to being analogous to the Petrie projection of the E8 root system shown above, except inasmuch as Co0 is not a Coxeter group and therefore it’s unclear which plane (if any!) is analogous to the Coxeter plane.

Posted in Uncategorized | 10 Comments

Nested lattices

1, 240, 2160, 6720, 17520, 30240, 60480, 82560, 140400, …

These terms count the number of points at distance \sqrt{2n} from the origin in the E8 lattice, a highly symmetric arrangement of points which Maryna Viazovska recently (in 2016) proved is the densest way to pack spheres in 8-dimensional space.

Even more recently, Warren D. Smith noticed a rather exceptional numerical coincidence — the sum of the first three terms in this sequence is a perfect fourth power:

1 + 240 + 2160 = 2401 = 7^4

Is this merely a coincidence? To begin with, we’ll look at the E8 lattice through a different lens: as a subset not of \mathbb{R}^8, but of complex (alas, not projective) 4-space, \mathbb{C}^4. Specifically, recall the Eisenstein integers, the ring of complex numbers generated by a cube root of unity:


These points have been 3-coloured according to whether the Eisenstein integer is congruent (modulo \sqrt{-3} ) to 0, +1, or −1. The E8 lattice is then concisely expressible as the set of points (w, x, y, z) where:

  • each coordinate is an Eisenstein integer;
  • the colours of the four coordinates form a codeword in the tetracode.

The tetracode is a 2-dimensional subspace of the 4-dimensional vector space \mathbb{F}_3^4. A point (w, x, y, z) \in \mathbb{F}_3^4 belongs to the tetracode if and only if w = y − x = z − y = x − z. Equivalently, the coordinates are of the form (d, a, a + d, a − d).

The real E8 lattice has a symmetry group of order 696729600; this complex E8 lattice has more structure (complex scalar multiplication) which reduces the symmetry group to order 155520. This extra structure is what we need to explain Warren’s coincidence.

Now that we have defined the E8 lattice as a complex 4-dimensional lattice, consider scalar multiplication by 3 + ω. Returning to an 8-dimensional real viewpoint, this linear map corresponds to composing a rotation by a scaling by sqrt(7), and therefore has a determinant of sqrt(7)^8 = 2401. The image of the E8 lattice under this map is a sublattice, geometrically similar to the original E8 lattice, and the Voronoi cells induced by this sublattice each consist of 2401 points from the original lattice (namely one central point, surrounded by an inner shell of 240 points and an outer shell of 2160 points).

Linear subspaces and positional number systems

If we sum the first three terms of the theta series of the E6 lattice, 1 + 72 + 270, we get exactly 7^3. And similarly for the D4 lattice, 1 + 24 + 24 = 7^2. For the A2 (hexagonal) lattice, we have 1 + 0 + 6 = 7^1. These correspond to taking 3-, 2-, and 1-dimensional (complex) linear subspaces of the Voronoi partition described above.

The latter is particularly easy to describe: every Eisenstein integer z can be uniquely expressed in the form (3 + ω)q + r, where q is an Eisenstein integer and r belongs to the set {0, 1, ω, ω², 1, ω, ω²} consisting of zero together with the sixth roots of unity. We can decompose q in the same manner, and continue recursively; this is tantamount to writing an Eisenstein integer in a positional number system with radix 3 + ω and the seven digits {0, 1, ω, ω², 1, ω, ω²}.

If we allow digits after the ‘decimal point’, we can express any complex number in this positional number system. The set of numbers which ’round to zero’ after discarding the digits after the decimal point form the Gosper island, a self-similar set with a fractal boundary:


There are two different principal cube roots of unity, and each one yields a different (albeit isomorphic) positional number system for complex numbers.

The Hurwitz integral quaternions form a scaled copy of the D4 lattice, and we similarly obtain a positional number system for the quaternions with radix 3 + ω and 49 different digits: the Hurwitz integral quaternions with squared norm at most 2. There are 8 principal cube-roots of unity, and each one determines a positional number system for quaternions. It’s worth commenting that this isn’t the only nice* number system for integral quaternions: radix-(2 + i) with 25 digits (the Hurwitz integers of norm at most 1) also works.

*where ‘nice’ is defined to mean that the digit set consists of all integers with norm <= r for some value of r.

Finally, the Cayley integer octonions form a scaled copy of the E8 lattice, and there are 56 principal cube-roots of unity. Any one of these results in a positional number system for the octonions (which is well-defined despite the non-associativity of the octonions, since any pair of octonions together generate an associative algebra) with radix 3 + ω and 2401 different digits: the integer octonions with squared norm at most 2. We’ll call this number system ‘Warrenary’ after its discoverer. Unlike in the real, complex, and quaternionic cases, Warrenary is the unique positional number system for the octonions satisfying the aforementioned niceness property.

There is no analogue in six dimensions: normed division algebras only exist in dimensions 1, 2, 4, and 8, so there is no positional number system corresponding to the recursive nesting of E6 lattices.

Disappointingly, there appears not to be any similarly elegant lattice nestings for higher-dimensional lattices beyond D4, E6 and E8, such as the 24-dimensional Leech lattice (in particular, no early initial segment of the theta series yields a perfect 12th power as its sum). As such, the recursive nesting of E8 lattices is quite exceptional indeed.

Further reading

The integral quaternions and octonions have many other fascinating and elegant properties, including analogues of unique prime factorisation, which are explained in the book On Quaternions and Octonions by Derek Smith and the late, great John Conway (1937 — 2020).

Positional number systems for the real and complex numbers are described in the Seminumerical Algorithms volume of Donald Knuth’s The Art of Computer Programming.

Posted in Uncategorized | 3 Comments