Deep Learning with the Analytical Engine

In December we held a hackathon to celebrate the 200th birthday of Ada Lovelace. In the last post, I mentioned the results of the baking competition along with our efforts to program the Analytical Engine emulator. I also briefly alluded to a project, undertaken last month, which built upon these ideas; since the implementation was only completed a few hours before I had to present it in a talk, the actual results are hitherto undocumented, except in the deepest recesses of a GitLab repository.

The project was to implement a machine-learning algorithm within the confines of the Analytical Engine (which offers only about 20 kilobytes of memory and a very minimalistic instruction set). If you have 35 minutes to spare, here is my talk introducing the project. My apologies for the hesitant beginning; the talk was unscripted:

It transpired that, at the time of the talk, there were a couple of mistakes in the implementation; I was later able to detect and remove these bugs by writing additional Analytical Engine code to facilitate debugging. After making these modifications, the machine-learning code learned how to recognise handwritten digits surprisingly well.

As mentioned in the talk, a simple fully-connected neural network would occupy too much memory, so I implemented a deep convolutional network instead (with refinements such as a cross-entropy cost function and L2 regularisation in the final fully-connected layer). Specifically, there are 5 feature-maps in the first convolutional layer, and 10 in the second convolutional layer:

convnet-complete
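
For readers who want a concrete picture of the architecture, here is a rough sketch in modern Python. Only the feature-map counts (5 and 10), the cross-entropy cost and the L2-regularised final layer come from the description above; the kernel sizes, pooling, activation function and regularisation strength are merely plausible guesses, and of course bear no resemblance to the actual punched-card implementation:

```python
# Hypothetical re-creation of the architecture described above, for illustration only.
# Feature-map counts (5 and 10), cross-entropy cost and L2-regularised final layer are
# from the post; everything else (kernel sizes, pooling, ReLU, weight decay) is assumed.
import torch
import torch.nn as nn

class TinyConvNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 5, kernel_size=5),    # first convolutional layer: 5 feature maps
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(5, 10, kernel_size=5),   # second convolutional layer: 10 feature maps
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(10 * 4 * 4, 10)   # final fully-connected layer

    def forward(self, x):                      # x has shape (batch, 1, 28, 28)
        return self.classifier(self.features(x).flatten(1))

model = TinyConvNet()
criterion = nn.CrossEntropyLoss()              # cross-entropy cost function
optimiser = torch.optim.SGD([
    {"params": model.features.parameters()},
    {"params": model.classifier.parameters(), "weight_decay": 1e-4},  # L2 regularisation
], lr=0.1)
```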

Code generation was performed by a combination of a Bash script (for overall orchestration) and several Python scripts (for tasks such as generating code for each layer, implementing a call-stack to enable subroutines, and including the image data in the program). The generated code comprises 412,663 lines, each one corresponding to a punched card for the actual Analytical Engine. Even if the punched cards were each just 2 mm thick, the stack would be as tall as the Burj Khalifa in Dubai!
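
As a quick sanity check of that comparison (taking 828 m as the commonly quoted height of the Burj Khalifa):

```python
# 412663 cards at a hypothetical 2 mm each:
cards = 412663
stack_height_m = cards * 2e-3          # thickness in metres
print(round(stack_height_m, 1))        # 825.3 -- just shy of the Burj Khalifa's 828 m
```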

The following progress report shows a dot for an incorrect guess and an asterisk for a correct guess; it is evident that the code is rapidly learning:

first-epoch

You can see that during the first training epoch its performance increases from 10% (as expected, since an untrained network is equivalent to random guessing over the 10 distinct digits {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}) to 79% after training on 5000 images. This was followed by a testing phase, in which a different set of 1000 images was presented to the network; unlike in the training phase, the network is not given the labels, so it cannot learn from the test and subsequently ‘cheat’. The testing performance was 79.1%, broadly in line with its performance towards the end of the training epoch, as one would expect.

We then repeated this process of training on 5000 images and evaluating on 1000 images. The performance improved again, from 79.1% to 84.9%. After the third epoch it had increased to 86.8%, and by the end of the ninth epoch it had exceeded 93%. At that point I accidentally switched off my remote server, so the results end there. If you set the program running on your own computer, it should reach this point after about 24 hours.

Theoretically, the architecture should eventually reach 98.5% accuracy if trained on 50000 images (as confirmed by implementing the same algorithm in C++, which runs about 1000 times more quickly); I urge you to try it yourself. Complete source code is available in the Deep Learning with the Analytical Engine repository, together with additional information and links to various resources.


Lovelace hackathon: results

In the previous post, I mentioned the hackathon we were organising to celebrate the 200th birthday of Ada Lovelace. The event was a huge success, and it is safe to say that all attendees had a most enjoyable, creative and productive time. For posterity, here is an account of the various aspects of the hackathon.

The baking competition

Many of the attendees produced delicious treats in advance of the hackathon. These were designed to sustain us throughout the latter half of the event (i.e. after the café upstairs closed at 16:00). Everyone else constituted a committee of ‘impartial judges’, who would decide amongst themselves on the winning entry. But before I announce the results, here are the entries:

Edible Tantrix tiles (Simon Rawles, Deepa Shah, Adam P. Goucher):

edible-tantrix

We only had three colours of icing and an insufficient amount of gingerbread to constitute a standard-issue Tantrix set (cf. the decidedly inedible tiles surrounding the plate). Also, I was responsible for cutting the hexagons and was in something of a rush due to having to catch a bus back to central Cambridge in time for a feast.

Hackathon title (Jade-Amanda Laporte, Jardin-Alice Laporte)

laporte-title

Delicious, pertinent, and with a stylised ‘200’ resembling the digits embossed on the wheels of the Analytical Engine.

Miscellaneous shapes of variable genus (origin unknown)

variable-genus

Definitely the tastiest three-holed torus I’ve encountered.

Hyperbolic tessellation (Tim Hutton, Neela Hutton)

hyperbolic-tiling

Somewhere between the Poincaré and Beltrami–Klein models.

The results

The impartial judges were far too terrified to vote! For fear of offending the other entrants, they didn’t reach any conclusion! As such, no entry was declared the winner, much to the disappointment of the contestants.

Programming the emulator

We succeeded in implementing an absolute-value function and (later) an activation function for a neural network, verifying their correctness using the Analytical Engine Emulator.

kali-aes

After a while, we managed to get an actual plot of the activation function using the Curve Plotting Apparatus (which Babbage proposed as part of his design for the Analytical Engine). This research eventually culminated in a project to realise a machine learning algorithm on the Analytical Engine (cp4space post coming soon!).


Ada Lovelace Day: 10th December 2015

After several months of planning, I would like to announce that there will indeed be an event in Cambridge to celebrate the 200th birthday of Ada Lovelace (10th December 2015):

http://www.cambridgelovelace.org/

The organising committee is composed entirely of volunteers, so the event is completely free to attend: just turn up on the day! Details, including the date, time and location, are on the official website in addition to the Facebook event page.

We hope to see you there!


Superflip composed with fourspot

Somehow, it slipped under my radar that Tomas Rokicki and Morley Davidson have established the maximum number of quarter-turns necessary to solve a Rubik’s cube (the more widely-popularised figure of 20 was for the half-turn metric). In the quarter-turn metric, a single move rotates any of the six faces by 90° clockwise or anticlockwise.

We can consider the Cayley graph, which is 12-regular (since at each position there are 12 possible moves one can make) and bipartite; the latter is most easily seen by noting that a quarter-turn induces an odd permutation (a 4-cycle) on the eight corner pieces of the cube. The maximum number of moves necessary is then, by definition, the diameter of this Cayley graph.
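
If you want to double-check the parity claim, here is a toy calculation (not a cube solver): it computes the sign of the corner permutation induced by a single quarter-turn.

```python
# A clockwise U turn 4-cycles the four top corner pieces and fixes the other four,
# which is an odd permutation of the eight corners -- hence the Cayley graph is bipartite.

def sign(perm):
    """Sign of a permutation given as a tuple of images of 0..n-1."""
    seen, parity = set(), 1
    for start in range(len(perm)):
        if start in seen:
            continue
        length, x = 0, start
        while x not in seen:
            seen.add(x)
            x = perm[x]
            length += 1
        parity *= (-1) ** (length - 1)
    return parity

# Corners 0..3 on the top layer, 4..7 on the bottom; U sends 0 -> 1 -> 2 -> 3 -> 0.
u_turn_on_corners = (1, 2, 3, 0, 4, 5, 6, 7)
assert sign(u_turn_on_corners) == -1   # odd, as claimed
```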

Anyway, the diameter of the graph is 26, and surprisingly there are very few positions which take 26 moves to solve (compared with billions of distance-20 positions in the half-turn metric). Indeed, it is conjectured that there are only three such positions, or one up to isomorphism: the so-called superflip composed with fourspot. It is worth explaining this terminology.

The superflip is the unique non-identity element in the centre of the Rubik’s cube group: every piece remains in its original position, but all 12 edge pieces have their orientations flipped. Since it commutes with every element, the order in which it is composed with the fourspot is unimportant.

The fourspot is a common pattern in which four of the six face centres undergo a derangement, and all other cells remain unchanged. Somehow, the fourspot has acquired something of a cult following, and even boasts its own music video:

In other news, the Erdős discrepancy problem was recently solved by Terry Tao. There is a natural way to turn this into a two-player game: Alice and Bob take turns colouring positive integers (cobalt blue for Alice, moss green for Bob), and the game terminates when there exists a progression {n, 2n, 3n, …, kn} (for some positive integers n and k) in which the numbers of cobalt blue and moss green elements differ by more than d (a predetermined constant that governs the difficulty of winning).
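
To make the terminating condition precise, here is a sketch of a checker for it (the function name and interface are my own invention, not from any reference implementation). Each coloured integer is mapped to +1 for cobalt blue or −1 for moss green, so the partial sums along a progression are exactly the differences in question:

```python
def game_over(colouring, d, limit):
    """Return a witnessing (n, k) if some progression {n, 2n, ..., kn} within 1..limit
    has discrepancy exceeding d, else None.  Uncoloured integers contribute 0."""
    for n in range(1, limit + 1):
        partial = 0
        for k in range(1, limit // n + 1):
            partial += colouring.get(k * n, 0)
            if abs(partial) > d:
                return (n, k)
    return None

# Toy example: if 1, 2 and 3 are all cobalt blue, the progression {1, 2, 3} exceeds d = 2.
print(game_over({1: +1, 2: +1, 3: +1}, d=2, limit=3))   # -> (1, 3)
```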

Also, there’s a petition for LEGO to produce a Lovelace and Babbage themed set. You’re strongly encouraged to support the petition; more information is available over at the Aperiodical. This is currently in the limelight due to Ada Lovelace’s impending 200th birthday (this December); watch this cp4space for more details of Lovelace-themed events in the coming months…


Coverings, convolutions and Corbynatorics

Amidst Saturday’s turmoil, the following problem came to mind:

There are plans to build a nuclear power station on an initially empty 12-by-12 chessboard. Doing so would require an empty 4-by-3 or 3-by-4 rectangular region of squares. To foil this plan, Jeremy decides to erect coal mines on squares of the chessboard (with each mine occupying a single square) to ensure there is no sufficiently large empty rectangle. What is the minimum number of coal mines necessary for Jeremy to prevent the construction of the nuclear power station?

With little effort, one can obtain a lower bound of 12 by partitioning the chessboard into twelve disjoint 3-by-4 rectangles and noting that each one must contain at least one mine:

lower-bound

Similarly, we can establish an upper bound of 16 by placing a mine at (x, y) if and only if x and y are both divisible by 3. Can this bound be improved? Note that it is never wise to place a coal mine near the edge of the board, since it could be moved inwards without reducing its effectiveness at blocking potential power stations. Without loss of generality, we will therefore only consider arrangements where the mines lie in the 8-by-8 ‘reasonable region’ in the middle of the board:

reasonable-region
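
Claims like these are easy to check exhaustively. Here is a small brute-force verifier (with the board size and rectangle dimensions hard-coded to match the problem); as a demonstration, it confirms that the 16-mine construction above blocks every 4-by-3 and 3-by-4 rectangle:

```python
def blocks_all_rectangles(mines, size=12):
    """mines: set of (x, y) with 1 <= x, y <= size.  Returns True if no empty rectangle remains."""
    for w, h in [(4, 3), (3, 4)]:
        for x0 in range(1, size - w + 2):
            for y0 in range(1, size - h + 2):
                cells = {(x0 + i, y0 + j) for i in range(w) for j in range(h)}
                if not cells & mines:
                    return False      # found an empty rectangle: a power station fits here
    return True

# The 16-mine construction: a mine wherever both coordinates are divisible by 3.
lattice16 = {(x, y) for x in range(1, 13) for y in range(1, 13) if x % 3 == 0 and y % 3 == 0}
print(len(lattice16), blocks_all_rectangles(lattice16))   # -> 16 True
```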

One reasonable approach is to place mines on the boundary of this region, and then concentrate on the centre of the board. This reduces the upper bound from 16 to 12; since this matches the lower bound, 12 is optimal:

construction-12

What about a larger board? In particular, what is the lowest density of mines that can be placed on the infinite plane to leave no 4-by-3 or 3-by-4 rectangle unoccupied? Again we have a lower bound of 1/12 and an upper bound of 1/9. The upper bound can actually be reduced to 1/10 by using the following lattice:

det10lattice

It appears that it would be difficult, if not impossible, to improve upon this. Observe that if an empty 3-by-3 box occurs in a valid configuration, then there must be a coal mine in each of the four surrounding 1-by-3 boxes (since extending the empty box by one row or column in any direction gives a 3-by-4 or 4-by-3 rectangle, which must contain a mine). Moreover, the lattice shown above achieves this bound exactly, and has no 3-by-3 box containing more than one mine.

But how would one prove optimality? Given an arrangement of mines, we define a scoring function over \mathbb{Z}^2 by means of a convolution. Specifically, each mine contributes a score to its own square and those surrounding it as follows:

mine-convolution

That is to say, each mine contributes a score of 20 to its own cell, 15 to each of the four edge-adjacent cells, 16 to each of the four diagonally-adjacent cells, 4 to each of the four cells at distance 2 in a horizontal or vertical direction, and 5 to each of the eight cells a knight’s move away from the mine.

Then the score associated with a given square is simply the sum of the contributions from nearby mines.

This may seem an arbitrary scoring function, but it was chosen to exhibit the following properties:

  • The score associated with the density-1/10 lattice is uniform across all cells (and equal to 20);
  • For any valid configuration (one with no 3-by-4 or 4-by-3 empty rectangle), every cell gets a reasonably high score (in this case, the minimum possible is 15).
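
The first of these properties is easy to check numerically. In the sketch below I take the density-1/10 lattice to be {(x, y) : 3x + y ≡ 0 (mod 10)}, which blocks every 3-by-4 and 4-by-3 rectangle and has density 1/10; this is my guess at the lattice in the figure, not a transcription of it:

```python
import numpy as np

# Kernel transcribed from the description above (its entries sum to 200).
kernel = np.array([
    [0,  5,  4,  5,  0],
    [5, 16, 15, 16,  5],
    [4, 15, 20, 15,  4],
    [5, 16, 15, 16,  5],
    [0,  5,  4,  5,  0],
])
assert kernel.sum() == 200

# Mines on the (assumed) density-1/10 lattice, laid out on a 60-by-60 torus so that
# the period-10 structure wraps around cleanly and there are no edge effects.
N = 60
mines = np.array([[(3 * x + y) % 10 == 0 for y in range(N)] for x in range(N)], dtype=int)
assert mines.sum() * 10 == N * N           # density is exactly 1/10

# score[c] = sum over nearby mines of the kernel value at the offset from mine to cell;
# the kernel is centrally symmetric, so the direction of the convolution is immaterial.
score = np.zeros((N, N), dtype=int)
for dx in range(-2, 3):
    for dy in range(-2, 3):
        score += kernel[dx + 2, dy + 2] * np.roll(mines, (dx, dy), axis=(0, 1))

print(score.min(), score.max())            # -> 20 20: the score is indeed uniformly 20
```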

Since the total score contributed by each mine is 200, we can determine the density by dividing the average score by 200. Hence, proving that the density-1/10 lattice is optimal is equivalent to the following statement:

The average score in any valid configuration must be at least 20.

It transpires that this is true, and can be proved by locally perturbing the scoring function (without increasing the total score) so that no cell has a score below 20. This is a finite check, and sufficiently small that it can be performed without the assistance of a computer.

Suppose we have a cell with a score strictly less than 20. It is easy enough to show that one of the following two possibilities must arise:

  • Case I: The score is 19, and the cell is positioned between two mines like so:

two-mines

  • Case II: The cell belongs to precisely one empty 3-by-3 region, which is surrounded by four mines.

We deal with all instances of Case I before Case II, by incrementing each ’19’ and decrementing the two cells which are edge-adjacent to the ’19’ and vertex-adjacent to the neighbouring mine. This actually decreases the overall score, so (if we can show that the average score is still at least 20) any configuration with an example of Case I is suboptimal.

Case II is more complicated to address than Case I, because each of the four mines surrounding the 3-by-3 region can be in any of three possible positions. Moreover, empty 3-by-3 regions can potentially overlap slightly. Nevertheless, if we consider only the portion of the 3-by-3 region which belongs to no other 3-by-3 regions, the average score is still at least 20, with equality if and only if the mines are arranged to resemble a portion of the density-1/10 lattice. This is true even after we perform any reductions associated with Case I.

Convincing yourself of this is a matter of checking each of the possible sub-cases for Case II (there are at most 81, a number which quickly reduces once symmetry is taken into account).

What have convolutions ever done for us?

In addition to solving this rather niche problem in Corbynatorial geometry, convolutions have a wealth of other applications.

  • Even if we restrict ourselves to considering convolutions of an arbitrary function on a square grid with a small finitely-supported kernel, this is useful for image-processing: common edge-detection algorithms use this concept (a minimal example follows this list), right up to the sophisticated convolutional neural networks underlying Google’s image recognition software. These are explained very clearly in various places.
  • A continuous analogue is taking a convolution of measurable functions over \mathbb{R}^n or the torus obtained by quotienting by a lattice. Gaussian kernels are often used.
  • More generally, one can convolve continuous functions over an arbitrary compact group. In the cases where the group is Abelian, this can be computed quickly by means of a Fourier transform (although this is typically overkill when the kernel has small finite support).
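
As a minimal illustration of the first point, here is a grid convolution with a small finitely-supported kernel (a discrete Laplacian) applied to a toy image; the nonzero output is concentrated around the boundary of the shape, which is the essence of edge detection:

```python
import numpy as np

# Toy edge detection: convolve a small image with a discrete Laplacian kernel.
def convolve2d(image, kernel):
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(image)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

laplacian = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]])

image = np.zeros((8, 8), dtype=int)
image[2:6, 2:6] = 1                        # a white square on a black background
print(convolve2d(image, laplacian))        # nonzero only around the square's boundary
```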

∂ eclipse

There was a solar eclipse today.

eclipse

Certain regions (including the Faroe Islands, familiar to anyone who has listened to the Shipping Forecast) fell within the umbra and experienced a total eclipse. I was slightly less fortunate, finding myself in the penumbra (and thereby seeing only a partial eclipse, which itself was largely obscured by cloud cover).

Pi day and the media

Since citizens of the United States feel the need to use middle-endian date formats (mm/dd/yy, instead of the standard yyyy-mm-dd), and because a date can be read as a string of decimal digits irrespective of whether or not it was intended as one, the 14th March 2015 was proclaimed ‘pi day’. Consequently, after a 10.5-mile run, the founders of Oligomath baked a pie containing blackberries, blueberries and raspberries. Unfortunately we didn’t photograph the pie, so here’s a plan view of the run instead:

pi-day-run

As one would expect, this has been covered in an extensive barrage of posts in the Aperiodical, including an ode* to constrained writing by Alex Bellos.

* the type of poem, rather than differential equation. Feel free to write an ode to ordinary differential equations.

A sesquimonth ago, the same Alex Bellos invited me down to London to watch a media screening of X+Y. I then wrote a review for his Guardian column, which attracted a similar amount of controversy to Noa Lessof-Gendler’s review of a particular coffee house in Cambridge. If you enjoyed the latter, you may also want to read her brother’s more positive synopsis of the Rado graph.

Distributed searching

Around the same time, I launched a distributed search that I had been working on since August. It collects data from people running a particular Golly script, which simulates millions of random initial seeds in a variety of cellular automaton rules. It’s currently gathering around 8 × 10^9 objects per day (depending on how many machines are running the script), and has already found previously-undiscovered patterns. We’ve also had a few other surprises, such as these pairs of interacting spaceships at the bottom of this list:

spaceships

The third column gives the total number of occurrences so far in the census. So whilst over 19 billion gliders have made an appearance, along with millions of each of the other ‘standard spaceships’, there are only a handful of occurrences of other moving objects (in this case, pairs of interacting standard spaceships).

If you want to get involved, you can download the requisite software (Golly, Python, and the search script) from here. To maximise your machine’s potential, run one instance of the search program per CPU core. For instance, if you have a quad-core computer, run four instances of Golly, each running the apgsearch script.

The script will prompt you for the number of soups to search between successive uploads (default: 5000000), the rule to use (default: B3/S23 = Conway’s Game of Life), and the seed symmetry (default: C1 = no symmetry). You can leave all of these parameters unchanged.

Happy searching…


Proto-Penrose tilings

A while ago, Roger Penrose famously filed a lawsuit against Kimberly-Clark when his wife discovered that a roll of quilted lavatory paper was adorned with his aperiodic tiling. Although the tiling occurs naturally in quasicrystals, Penrose was probably the first person to discover and abstractly formalise the pattern. Nevertheless, there have been several historical ‘near-misses’, which would indeed have yielded isomorphic tilings if applied recursively. Since we’ve already mentioned Penrose, it seems only natural to proceed in inverse chronological order.

Kepler’s monsters

It is not known exactly why Johannes Kepler investigated this particular tiling, although one sensible suggestion is that he was trying to construct a tiling of the plane involving only shapes with D10 symmetry. For example, ten regular pentagons fit neatly around a regular decagon of the same edge length (the angles match, since 144° + 2 × 108° = 360°), and it is possible to continue the pattern further if pentacles are also allowed:

kepler-fragment

Unfortunately, however, it is not possible to continue this process forever. Pretty soon one is required to have decagons overlapping, as in this particular drawing by Kepler:

Kepler’s original drawing of his ‘monsters’ tiling.

With the assistance of these fused decagons, termed monsters by Kepler, it is possible to create various periodic and nonperiodic tilings of the plane. Craig Kaplan explored many variations on this theme in his article, with the Keplerian problem of finding a tiling using only (finitely many distinct) shapes of D10 symmetry remaining unsolved.

Even earlier were several attempts by Albrecht Dürer, whom you may know from his Melencolia I engraving, inter alia. These featured pentagons and rhombi, similar to Penrose’s original tiling but with less sophisticated matching rules, incapable of enforcing aperiodicity.

Girih tiles

I was in the Persian section of the Victoria and Albert Museum during the post-Christmas weekend at the end of last year, and happened to notice various geometric designs. Some of these were periodic carvings of knotwork, not unlike similar examples in the Alhambra in Moorish Spain.


Others were more exciting. One particularly ornate example features angles commensurate with the internal angles of a pentagon, and bears a striking similarity to the aforementioned tilings of Penrose, Kepler and Dürer:


Apparently these are formed from a set of five so-called Girih tiles, and patterns of a similar nature have appeared throughout the Islamic world since the Middle Ages. In particular, I was intrigued to see the more sophisticated Girih tilings featuring patterns on two scales, where the small-scale pattern is created by applying a set of subdivision rules to the large-scale pattern. This is analogous to the method by which Penrose tilings can be constructed, as you will know from my online demonstration:

wieringa
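
If you want to experiment with subdivision yourself, here is a compact sketch of the standard Robinson half-triangle ‘deflation’ for the rhombic Penrose tiling; this is a generic textbook construction, not the code behind the demonstration linked above:

```python
import cmath, math

# Triangles are (colour, A, B, C) with complex vertices; colour 0 is a 'thin' half-rhombus
# and colour 1 a 'thick' half-rhombus.
PHI = (1 + math.sqrt(5)) / 2

def subdivide(triangles):
    result = []
    for colour, A, B, C in triangles:
        if colour == 0:                      # thin half-rhombus splits into two children
            P = A + (B - A) / PHI
            result += [(0, C, P, B), (1, P, C, A)]
        else:                                # thick half-rhombus splits into three children
            Q = B + (A - B) / PHI
            R = B + (C - B) / PHI
            result += [(1, R, C, A), (1, Q, R, B), (0, R, Q, A)]
    return result

# Start from a 'wheel' of ten thin half-rhombi around the origin and deflate repeatedly;
# each pass replaces every tile by smaller tiles according to the subdivision rules.
tiles = []
for i in range(10):
    B = cmath.rect(1, (2 * i - 1) * math.pi / 10)
    C = cmath.rect(1, (2 * i + 1) * math.pi / 10)
    if i % 2 == 0:
        B, C = C, B                          # mirror alternate triangles
    tiles.append((0, 0j, B, C))

for _ in range(5):
    tiles = subdivide(tiles)
print(len(tiles))                            # number of half-tiles after five subdivisions
```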

This surprising and fascinating connection between Islamic architecture and quasicrystalline tilings was first discovered and investigated by Paul Steinhardt and Peter Lu, the latter of whom presented an exposition on the subject at the Harvard Physics Colloquium:

If you found this interesting, there is a talk on early Islamic mathematics by Dr. Bursill-Hall at 16:00 today in Meeting Room 3 of the Centre for Mathematical Sciences.

Miscellany

Finally, I have a few late items of news. Stuart Gascoigne has just sent in the first 8192 2-adic valuations of polylogarithms. The ‘spike’ in the last row (associated with 8192) penetrates significantly deeper into the grid, with Q(3, 8181) being the first integer of the form 64n + 53:

polylog-8192

Here are some other recent highlights:

  • Eugenia Cheng has committed a third instance of gastro-mathematical marketing, namely a formula for the perfect doughnut.
  • Alex Bellos and I attended a pre-release viewing of X+Y, an action thriller following the romantic life of a British olympiad contestant. Expect a review from us next month.
  • Jeffrey Ventrella has been investigating self-similar fractal curves for a while, and recently sent me this analysis of fractal curves based on Gaussian and Eisenstein integers.