On Hallucinatory Intelligence
Let’s look at one example of something amazing that our human visual apparatus can do. Take a look at the image below (a version of the well-known Kanizsa triangle), and say the first word that comes to mind:
Do you see the triangle?
Many people say “triangle” before they would say “three pac-man / pie-chart objects” (which is what’s really there). Why does our vision work this way? Would state-of-the-art computer vision systems see this “triangle that isn’t there”? In general, do our AI/ML systems infer this kind of missing information and take that leap of imagination?
When I was an undergraduate at MIT, Professor Gerry Sussman showed me this image as evidence that our visual system has both bottom-up and top-down processing. The bottom-up system recognizes a set of three 60-degree corners in a particular orientation, and sends these up for higher-level processing. The higher levels of processing somehow fill in the blanks in a way that makes sense; “connect the dots”, if you will. The top-down system then creates a partial hallucination in the lower levels of processing, of something that looks like an actual triangle! These new edges aren’t really in the image, and we don’t confuse ourselves into thinking they’re really there. And yet we somehow “see” them in our mind’s eye.
With the hallucinated data plus the real data, we get an object that’s more recognizable than the original one. We see that the information is “flowing both ways” (that is, from low-level perception of line segments up to higher-level imagination of a shape, and then back down to low-level “connecting the dots” in which we actually start to see the rest of the shape). Input determines output, which extrapolates some extra input, which produces another iteration of output, and so on.
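This iterative loop — real input, upward pass, downward “hallucination”, blend, repeat — can be sketched in a few lines of Python. This is a toy numeric illustration, not a model of vision: the linear up/down maps, the number of iterations, and the 0.8/0.2 blending weights are all choices of mine for the sketch.

```python
import numpy as np

def two_way_loop(x_real, up, down, n_iters=5, alpha=0.2):
    """Iterate: bottom-up pass, top-down pass, blend the hallucinated
    input back into the real input, and repeat."""
    x = x_real.copy()
    for _ in range(n_iters):
        h = up(x)        # bottom-up: low-level input -> higher-level summary
        x_hat = down(h)  # top-down: summary -> imagined (hallucinated) input
        # blend real evidence with the hallucination; real data dominates
        x = (1 - alpha) * x_real + alpha * x_hat
    return x, h

# Toy linear up/down maps: "up" averages the three corner detectors,
# "down" broadcasts that average back to all three positions.
W = np.array([[1.0, 1.0, 1.0]]) / 3.0
up = lambda x: W @ x
down = lambda h: W.T @ h * 3.0

x_real = np.array([1.0, 1.0, 0.0])   # two corners seen, one "missing"
x_filled, _ = two_way_loop(x_real, up, down)
```

After the loop, the missing third element is no longer zero: the top-down pass has partially “filled in the blank”, while the two real detections remain close to their original values.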
If our brain has information “flow both ways” like this, then why do so many of our machine learning systems — and most computer programs in general, for that matter — have information flowing just one way? Even in artificial neural networks, which are loosely based on the brain, we often have data flow just one way, from the input layer to the output layer. (During training, of course, errors back-propagate the other way — but that’s not the same thing.)
The exciting thing is that we could imagine implementing this in a new neural network structure where information flows both ways (a bit similar to a bidirectional recurrent neural network). Suppose we replicate all the layers in a neural network (say, a convolutional neural network), but this time reverse all the edge directions, having them point backwards instead of forwards. We go from a network that looks roughly like Figure 1, to one that looks more like Figure 2.
Each “backward layer” combines its values with those of its corresponding “forward layer”, and sends that on to the next forward layer, as seen below.
The combiner function, represented by the circles in Figure 2, is some way of merging the “real” data from the forward layers, and the “hallucinated” data from the backward layers. We’d like to weight the real data more heavily than the hallucinated data, lest our neural network fail to distinguish its own imagination from reality. This could be implemented by a linear combination like f(x, y) = 0.8x + 0.2y, where x and y are vectors of values from the feed-forward and feed-back layers, respectively.
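As a concrete sketch of Figure 2’s structure, here is a minimal numpy version of a tiny network with mirrored backward layers and the combiner above. The layer sizes, random weights, ReLU nonlinearity, and the choice of using transposed forward weights for the backward edges are all illustrative assumptions, not a prescription.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(v):
    return np.maximum(v, 0.0)

def combine(x_fwd, y_back):
    # Weight the "real" forward data more heavily than the
    # "hallucinated" backward data: f(x, y) = 0.8x + 0.2y
    return 0.8 * x_fwd + 0.2 * y_back

# Forward weights for a tiny net; the backward layers reuse the
# transposes (one possible way to "reverse all the edge directions").
W1 = rng.normal(size=(5, 4))   # input(4)  -> hidden(5)
W2 = rng.normal(size=(2, 5))   # hidden(5) -> output(2)

def bidirectional_pass(x, n_iters=3):
    h_back = np.zeros(5)                 # backward layer starts silent
    for _ in range(n_iters):
        h = relu(W1 @ x)                 # forward pass to hidden layer
        h = combine(h, h_back)           # merge real + hallucinated values
        out = relu(W2 @ h)               # forward pass to output layer
        h_back = relu(W2.T @ out)        # backward (mirrored) pass
    return out

out = bidirectional_pass(rng.normal(size=4))
```

Each iteration lets the output layer’s conclusions flow back down and nudge what the hidden layer “sees” on the next pass, exactly the forward/backward dialogue described above.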
In the above figures, I have purposefully left out details specific to how a vision system would work, such as convolutional layers. This is, first, for simplicity in illustrating the idea, and second, to let us think about using this approach on problems beyond vision. Our brains likely use this kind of forward-backward behavior all over the place; indeed, information seems to flow in all directions at once. To name just two examples, what we see influences what we hear (known as the McGurk effect), and what we hear influences what we see.
Such multi-directional flow of information seems essential in how our brains compute, and our computer programs should have it too. Not just our artificial neural networks, either — but other machine learning & AI algorithms, and just computer programs in general. (The Art of the Propagator shows an elegant way to achieve multi-directional computation in everyday programming — it is an enjoyable and mind-expanding read!) If we can successfully implement architectures like the one I propose here, and if we go on to form multi-directional connections between many other types of learning systems, we may just reach the holy grail of merging classical AI techniques (symbolic reasoning based systems) with modern machine learning.
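To give a flavor of the propagator idea in ordinary code, here is a minimal Python sketch in the spirit of that paper (this is my own stripped-down rendering, not the authors’ implementation): cells hold values, propagators connect cells, and information can enter from either side of a constraint.

```python
class Cell:
    """Holds at most one value; attached propagators re-run
    whenever the value becomes known."""
    def __init__(self):
        self.value = None
        self.neighbors = []

    def add_content(self, value):
        if self.value is None:
            self.value = value
            for run in self.neighbors:
                run()
        elif self.value != value:
            raise ValueError("contradiction")

def propagator(inputs, output, fn):
    """Wire fn from input cells to an output cell; fire when ready."""
    def run():
        if all(c.value is not None for c in inputs):
            output.add_content(fn(*[c.value for c in inputs]))
    for c in inputs:
        c.neighbors.append(run)
    run()

# A two-way constraint: fahrenheit = celsius * 9/5 + 32.
c, f = Cell(), Cell()
propagator([c], f, lambda x: x * 9 / 5 + 32)
propagator([f], c, lambda x: (x - 32) * 5 / 9)

f.add_content(212.0)
# information flowed "backwards": c now holds 100.0
```

Because both directions of the constraint are wired up, it doesn’t matter which cell receives a value first; the network fills in the rest. That is the multi-directional computation this essay is after, available even in everyday programming.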
Professor Marvin Minsky, in his book The Emotion Machine, proposed six layers of mental processes, ranging from Instinctive Reactions to Self-Conscious Reflection, shown below:
The lowest level, Instinctive Reactions, has to do with processing stimuli. This is a domain where deep learning has made significant contributions, namely in object recognition and speech recognition. The next level, Learned Reactions, is being addressed by ongoing work in reinforcement learning.
With the bottom-most layers of the mental process now reasonably well-modeled, much work remains in building the rest of the mental “stack”: connecting higher-level reasoning and planning with low-level perception and instinct. These are areas where approaches from classical, symbolic AI can lend a hand. There are many techniques for symbolic reasoning and meta-reasoning that are neat on their own, but combined with a system that can actually sense the outside world (through vision, hearing, etc.), they will make for one darn intelligent machine. We need to build a connection between low-level perceptual systems (e.g. vision, speech recognition) and high-level reasoning systems. These connections need to go both ways: low-level perceptions should influence high-level reasoning, but high-level reasoning should also influence where we direct our perception next (where we choose to look, etc.), and through what frame we perceive.
A nice example of all this in action would be as follows. Suppose a robot sees an image which its vision system recognizes as smoke. It also sees a few tiny orange specks near the smoke, but these don’t qualify on their own as any detectable object. When the next level of reasoning receives the message that there’s smoke, it infers a few possible explanations, and considers that one cause of smoke is a nearby fire. This idea of a possible fire then feeds back to the lower-level perceptual machinery, producing a minor hallucination of fire in the visual field. That minor hallucination amplifies the small orange specks that were already there, further confirming the hypothesis that there may be a fire. This should trigger some decision-making and planning about what to do next: walk slowly towards it to see better whether it’s a fire, alert other people, etc. We may want this to connect up with some probabilistic inference, some other contextual knowledge (don’t yell “fire” in a crowded theater), and so on. A lot becomes possible here because the Instinctive Reactions layer from Figure 3 has a two-way dialogue with the Learned Reactions layer, which in turn has a two-way dialogue with the Deliberative Thinking layer. By building layer after layer like this, we can start to get a very rich set of possible behaviors. Importantly, we need to develop these capabilities not just as standalone algorithms, but as modules in a system where they communicate skillfully.
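The “minor hallucination amplifies weak evidence” step can be made concrete with a toy sketch. The numbers here — the evidence level, the gain, and the detection threshold — are invented for illustration; a real system would use calibrated probabilistic inference.

```python
def detect(evidence, top_down_prior=0.0, gain=0.5, threshold=0.5):
    """Bottom-up evidence alone may miss a weak signal; a top-down
    expectation ("there may be a fire") adds gain that can push the
    same evidence over the detection threshold."""
    activation = evidence + gain * top_down_prior
    return activation >= threshold, activation

orange_specks = 0.3                  # weak bottom-up evidence of fire
seen, _ = detect(orange_specks)      # no top-down expectation: missed

fire_prior = 0.8                     # reasoning layer: "smoke suggests fire"
seen_with_prior, _ = detect(orange_specks, top_down_prior=fire_prior)
```

Without the top-down prior, the orange specks fall below threshold and go unnoticed; with the smoke-driven expectation of fire feeding back down, the very same specks are detected. The hallucination didn’t invent evidence — it re-weighted evidence that was already there.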
To summarize, this essay has:
- Argued that a form of mild hallucination is at the heart of how human minds work, and that information flows in all directions at once.
- Proposed a design pattern for creating neural network architectures that allow information to flow both forward and backward, in which the conclusions of later layers can back-influence what the earlier layers see as their input.
- Argued that deep learning is poised for a strong partnership with classical AI if the two are connected with such two-way data-flow. In particular, proposed that deep learning handles low-level perception and learned reactions; classical AI techniques provide deliberation, conceptual reasoning and reflection; and the interface between them allows raw perception to influence high-level reasoning, while also allowing the reverse: high-level reasoning back-influences lower-level reasoning and even raw perception.
I gratefully acknowledge and thank:
- Professor Gerry Sussman, who got me thinking about these problems, and planted many of the core ideas in my head that enabled me to write this essay.
- Danny Hillis, for reminding me and my peers that in the early days of AI, there was a 6-step plan for building full AI, in which perception and pattern recognition would sit at the first layer — and for pushing us to think about the other 5 steps, now that we finally have the first one more-or-less solved!