Machine Learning for Program Synthesis
Programming is at the heart of intelligence. If we want to develop “Artificial General Intelligence”, it will certainly be necessary to have algorithms that can write code, because this is something that humans can do. I claim it will also be sufficient: if we can get computers to code, they’ll be able to build many other systems autonomously — in particular, they’ll be able to code new AI systems, and will therefore be able to self-improve. In this sense, getting computers to synthesize new, original code is the most fundamental problem we can pose in AI.
I am intrigued by the idea of using machine learning for program synthesis — inspired by Differentiable Neural Computers, Neural Programmer Interpreters and related techniques — but endowing them with more capabilities, some of which are influenced by classical AI programming techniques.
Consider the Differentiable Neural Computer (DNC). From the paper:
A DNC is a neural network coupled to an external memory matrix. … If the memory can be thought of as the DNC’s RAM, then the network, referred to as the ‘controller’, is a differentiable CPU whose operations are learned with gradient descent.
What instructions are available on this “differentiable CPU”?
The heads use three distinct forms of differentiable attention. The first is content lookup … A second attention mechanism records transitions between consecutively written locations … The third form of attention allocates memory for writing.
The primitive operations this machine knows how to do are:
- locate a particular piece of content in memory
- re-trace its steps through the different locations it’s written to, in-order
- allocate and free memory for writing.
It is amazing how much DNCs are able to do with just this simple set of primitive operations, by learning which instructions to apply when. But, this begs the question about what would happen if there were more primitive instructions made available, which it then learned to use. What about common processor instructions like AND, OR, NOT, add, multiply, conditional jump (enabling if/then/else behavior), etc.? What about a full x86 instruction set? If a neural network could learn how to effectively use these as primitives for doing computation, this would expand the reach of what the machine could do in a given number of steps — because the set of primitive moves it can make is a richer one.
This brings us to Neural Programmer Interpreters (NPI). NPIs have a different set of primitive operations to take at each step, based on current state observation and previous states:
- select a program to invoke
- end the current program and return control to the caller
This is starting to get more generic, since the NPI is actually writing programs that can call other programs, rather than just executing low-level instructions one-by-one. However, the NPI requires strong supervision, including the whole hierarchy of low-level to high-level programs. Can we do away with this requirement too? The NPI isn’t able to choose which helper functions it should define — something a human programmer has to think about all the time. Defining the right abstractions is the key to programming. Can we have an algorithm learn to do this on its own? I hypothesize that this is possible.
Just like we can have a DNC or NPI determine the best action to take at a given step, we could take a similar approach but add one more primitive operation for it to choose from at each time step: “Define new function here”.
Instead of only having options like reading/writing to memory, calling another function, etc., we would give our algorithm the option to start defining a new abstraction, by writing the syntax for defining a new function/method. When the algorithm chooses this option, whatever code it generates next would become part of this new function that it’s defining — until it decides to end the current function and return-to-caller (the same way the NPI has this option).
This requires us to think about the machine’s “actions” as writing lines of code, rather than as procedural steps in an execution trace like the DNC does. It may be easier to do this in a functional-programming style, rather than procedural, since we can easily exploit compositional structure of programs that have no side effects.
Learning how to define useful abstractions is the key to programming — and getting computers to program would be a huge step towards general AI. So it is essential that we start trying to train machine learning algorithms to define abstractions automatically.
If we want to make our programming-machines even smarter, we should teach them not just the mechanics of defining new abstractions, but the art of creating good abstractions and building robust systems. It is harder for me to concretely say how we would do this. But one direction to think in, is to try to have program-synthesizers use techniques like the ones described in Professor Gerald Sussman’s essay, “Building Robust Systems”, when deciding where to form a new abstraction and which abstraction to make.
Ideas such as degenerate and redundant design, extensible generic operations, generate-and-test approaches, constraint-based programming, data propagation systems, and many others, greatly help with writing programs that solve a class of problems rather than one particular problem, and which generalize to inputs other than those anticipated be the designer. This latter issue is a big problem in NPIs and DNCs (and neural networks in general): they often fail to generalize to inputs beyond the range seen in the training set. If we can encode some of the principles for how to form good abstractions, it is possible that we would see algorithms that generate code which generalizes much further than that of existing techniques.
- Argued that getting machines to program is the key to AI, as it is both necessary and sufficient.
- Further argued that defining the right abstractions is the key to effective programming.
- Proposed a high-level approach for getting machines to form abstractions: provide a learning algorithm with an option to start defining a new function at every time step.
- Argued that this approach has promise in theory, for the same reason that DNCs and NPIs successfully learn to program: given a set of basic operations they can choose from at each step, these algorithms are miraculously trainable to select the right choices at the right time.