Artificial Intelligence - Getting from A to B
How does something 'smart' get 'smarter'?
At the very core of the search for Artificial Intelligence lies the question "how does something 'smart' get 'smarter'?" To answer this question, let's look at an extreme example of intelligence: that of Albert Einstein, and specifically his theory of gravity. He knew of fabrics and how a ball (a weight) resting on a fabric causes it to bend. He then applied this pattern to a larger phenomenon, namely space-time itself, and realized that he could treat space-time as a fabric which mass bends or deforms in some way. In essence, he took a pattern he had recognized in one setting and applied it to a (seemingly) completely different question. The core of self-improving intelligence lies herein: recognizing patterns and applying them to novel situations.
Most of us are incapable (or so we think) of Einstein-caliber Herculean jumps of intellect. For our benefit, we will work through a simpler example to explain how something 'smart' can become 'smarter'. Many of us are familiar with the idea of Optical Character Recognition (OCR). For those not familiar, this is the process whereby a computer looks at a character of the alphabet and recognizes it. OCR is used, for example, to read the addresses we write on our envelopes so that each letter can be routed to the correct location.
I will briefly explain how a Neural Network can be used to solve this problem. Each character under consideration is pixelated, that is, broken down into equal-sized pieces. Typically for this type of problem, the shapes are pixelated into around 100 pixels. Each of these pixels is then either black or white, which corresponds to an on or off input to a neural network. An example of just such a solution can be found here: http://neil.fraser.name/software/recog/ This program can run on just about any Windows computer.
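To make the pixelation step concrete, here is a minimal sketch in Python. It is not taken from the program linked above; the function name `pixelate`, the 10 x 10 grid and the '#' drawing convention are all assumptions chosen for illustration.

```python
def pixelate(glyph_rows, size=10):
    """Turn a text drawing of a character into a flat list of 0/1 inputs.

    glyph_rows is a list of strings in which '#' marks ink (a black pixel)
    and anything else is blank (white). The drawing is padded with blanks
    to size x size, so the result always has size * size entries.
    """
    inputs = []
    for r in range(size):
        row = glyph_rows[r] if r < len(glyph_rows) else ""
        for c in range(size):
            ch = row[c] if c < len(row) else " "
            inputs.append(1 if ch == "#" else 0)  # on = black, off = white
    return inputs


# A hypothetical hand-drawn 'b' (illustration only).
LETTER_B = [
    "#",
    "#",
    "#",
    "####",
    "#  #",
    "#  #",
    "####",
]

b_inputs = pixelate(LETTER_B)  # 100 on/off values, one per network input
```

Each entry of `b_inputs` would feed one input neuron of the brute force network.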
Given enough training, a neural network can effectively tune itself. Training involves exposing the network to an input, allowing it to guess an answer, telling the network whether it was right or wrong, and allowing it to adjust its values accordingly. Improvements in the network are accomplished by adjusting the weights between neurons to 'evolve' the circuitry that produces the correct answers. (Adjusting weights from right-or-wrong feedback in this way is usually called supervised learning; 'genetic programming' is a related but distinct family of techniques that evolves programs through selection and mutation.)
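The guess, feedback and adjust cycle can be sketched with the simplest possible network, a single neuron trained by the classic perceptron rule. This is an illustrative stand-in, not the method used by any particular OCR program; the names `guess` and `train` and the two-pixel toy data are assumptions.

```python
def guess(weights, bias, inputs):
    """The neuron fires (1) when the weighted sum of its inputs is positive."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0


def train(examples, n_inputs, epochs=20, rate=1.0):
    """Perceptron rule: guess, compare with the right answer, adjust weights."""
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            error = target - guess(weights, bias, inputs)  # 0 when correct
            for i, x in enumerate(inputs):
                weights[i] += rate * error * x
            bias += rate * error
    return weights, bias


# Toy problem (assumed for illustration): answer 1 only when both pixels are on.
EXAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train(EXAMPLES, n_inputs=2)
results = [guess(weights, bias, inputs) for inputs, _ in EXAMPLES]
```

A real character recognizer would use many neurons and on the order of 100 inputs, but the loop has the same shape: every wrong guess nudges the weights toward the right answer.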
The process described above represents the current paradigm of AI. This has led to the development of 'intelligences' which are subject matter experts and completely 'dumb' in every other respect. That is because the true nature of intelligence is to take experiences from one domain and use them to learn and grow in an entirely different domain. AI as modeled above has no chance of accomplishing this.
The reason is a notion I term 'the limit of brute force intelligence'. The method described above will only ever grow so 'intelligent' or accurate at the task it's assigned. No matter how finely you pixelate the character, no matter how many neurons you have, it will only improve to a bounded accuracy.
We happen to know of another form of OCR whereby parallel nets work in concert to produce an analysis of the character. For example, in the case of the character 'b', one net looks for 'enclosed space (circle)'. Another looks for 'vertical line', and another still checks that the bottom of the vertical line is near the leftmost point of the enclosed space. If all these conditions are met, chances are you have the letter 'b'. As you might intuitively guess, this constellation of patterns is much better at character recognition than its 'brute force' counterpart. With this constellation, one can imagine writing a 'b' with the vertical line not quite touching the circle part, and still having the constellation of neural nets consider it close enough, whereas the brute force solution would likely get tripped up.
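The constellation idea can be sketched with hand-written detectors standing in for the parallel nets. Everything here is an assumption for illustration: real detectors would be trained networks, and the thresholds (`min_len`, `tolerance`) and the '#' drawings are invented.

```python
def vertical_line(grid, min_len=5):
    """Return (col, top_row, bottom_row) of the longest vertical ink run, or None."""
    best = None
    for c in range(max(len(row) for row in grid)):
        start = None
        for r in range(len(grid) + 1):
            ink = r < len(grid) and c < len(grid[r]) and grid[r][c] == "#"
            if ink and start is None:
                start = r
            elif not ink and start is not None:
                length = r - start
                if length >= min_len and (best is None or length > best[2] - best[1] + 1):
                    best = (c, start, r - 1)
                start = None
    return best


def enclosed_space(grid):
    """Return the set of blank cells that cannot be reached from outside the drawing."""
    height, width = len(grid), max(len(row) for row in grid)

    def blank(r, c):
        return not (c < len(grid[r]) and grid[r][c] == "#")

    seen, stack = set(), [(-1, -1)]  # flood-fill from a one-cell margin
    while stack:
        r, c = stack.pop()
        if (r, c) in seen or not (-1 <= r <= height and -1 <= c <= width):
            continue
        if 0 <= r < height and 0 <= c < width and not blank(r, c):
            continue  # ink blocks the fill
        seen.add((r, c))
        stack.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
    return {(r, c) for r in range(height) for c in range(width)
            if blank(r, c) and (r, c) not in seen}


def looks_like_b(grid, tolerance=2):
    """All detectors must agree: line, enclosed space, and their relation."""
    line, hole = vertical_line(grid), enclosed_space(grid)
    if line is None or not hole:
        return False
    col, _, bottom = line
    left_of_hole = col < min(c for _, c in hole)
    bottoms_close = abs(bottom - max(r for r, _ in hole)) <= tolerance
    return left_of_hole and bottoms_close


LETTER_B = ["#", "#", "#", "####", "#  #", "#  #", "####"]
GAPPED_B = ["#", "#", "#", "# ###", "# # #", "# # #", "# ###"]  # stem not touching
LETTER_D = ["    #", "    #", "    #", " ####", " #  #", " #  #", " ####"]
LETTER_P = ["####", "#  #", "#  #", "####", "#", "#", "#"]
```

With these drawings, `looks_like_b` accepts both `LETTER_B` and `GAPPED_B` and rejects `LETTER_D` and `LETTER_P`, because the relation checks (line to the left, bottoms aligned) separate 'b' from its look-alikes.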
The key question is: how do we get from 'brute force' to parallel nets without having to use our own intelligence (programming) to do it? Embedded in this question is the larger one: how does something smart (the brute force algorithm is pretty good) get smarter (parallel nets are better)?
The answer to this question lies in the notion of the 'intelligence kernel'. Others call this 'seed AI'. Nomenclature aside, both terms point to a notion of recursive self-improvement. Returning to our example of OCR, an intelligence kernel could look for new (better) ways of solving the problem.
What follows is a high-level description of how an intelligence kernel could operate. When first exposed to a new type of pattern (problem) to recognize, the kernel would first see if any patterns it is already aware of are sufficient to classify this problem. (More details follow...) Failing that, it would spawn a neural net with inputs facing the problem space to classify the pattern. In our specific case, the kernel would spawn a 'brute force' neural net to try to recognize the characters.
The success of a brute force network depends on both the amount of training and the number of neurons involved in solving the problem. The kernel would have enough intelligence to iteratively add neurons and provide training until the optimal brute force solution is reached. In the case of our OCR problem, the kernel would add neurons and provide training until a maximum efficiency is reached. Let's say the maximum efficiency for the brute force algorithm for OCR is 80% (this number is invented).
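The 'add neurons until it stops helping' loop might look like the sketch below. The accuracy curve `toy_accuracy` is entirely made up so that it levels off at the invented 80% figure; in a real kernel, `evaluate` would train and test an actual network of the given size. The names and the `min_gain` threshold are assumptions.

```python
def toy_accuracy(neurons):
    """Hypothetical accuracy curve that saturates at the invented 80% ceiling."""
    return 0.80 * (1 - 0.5 ** (neurons / 10))


def grow_until_plateau(evaluate, start=10, step=10, min_gain=0.01):
    """Keep adding neurons while each addition still buys a worthwhile gain."""
    neurons = start
    best = evaluate(neurons)
    while True:
        candidate = evaluate(neurons + step)
        if candidate - best < min_gain:  # the plateau: more neurons barely help
            return neurons, best
        neurons += step
        best = candidate


neurons, accuracy = grow_until_plateau(toy_accuracy)
```

With this toy curve the loop settles on 60 neurons at an accuracy just under 80%, which is the plateau the kernel is 'stuck' on.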
Recall that the parallel neural nets solution to the OCR problem was the better solution. So our kernel has only managed to create a brute force neural net solution and is 'stuck' with this 80% efficient solution. Were it not for the notion of 'experience' our AI could only ever reach this plateau (in and of itself).
However, suppose we also teach this intelligence to recognize a closed space (like a circle). Again the kernel has never been exposed to this type of problem and evolves a brute force solution. At some other point in time this intelligence is exposed to the problem of recognizing a vertical line. Again, the kernel develops a brute force solution for this type of problem. There are actually a few other types of patterns that are necessary but let us omit them temporarily to see how this system could evolve.
Now return to our original problem of OCR. When the OCR network is asked to recognize the letter 'b', the kernel notices that the 'enclosed space' pattern registers true and the 'vertical line' pattern also registers true. The kernel also notices that when the brute force network guesses that a suspected 'b' is not a 'b' when the character is in fact a 'b' (i.e. a false negative), the constellation of 'vertical line' and 'enclosed space' is true. Over time and with experience, the kernel creates pathways and mappings such that when the brute force neural net isn't 'highly negative' and this constellation of patterns is true, it overrides the negative of the brute force network and guesses positive. This is a hybridized model between brute force neural networks and parallel networks. It is more efficient than the brute force network, but still not optimal. For example, in certain cases when a 'd' is entered and the brute force network isn't highly negative, the constellation of vertical line and enclosed space will be true and the hybrid model will guess 'b', but it will be wrong.
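The override rule of the hybridized model can be sketched in a few lines. The scoring convention (positive means 'b', negative means 'not b') and the `strongly_negative` threshold are assumptions; the article does not fix either.

```python
def hybrid_guess_b(brute_score, vertical_line, enclosed_space,
                   strongly_negative=-0.5):
    """Hybrid of the brute force net and the constellation of patterns.

    brute_score > 0 means the brute force net says 'b'; below
    strongly_negative it is 'highly negative' and cannot be overridden.
    """
    if brute_score > 0:
        return True  # the brute force net is already positive
    constellation = vertical_line and enclosed_space
    # Override a weak negative only when the whole constellation is true.
    return constellation and brute_score > strongly_negative


# A sloppy 'b': the brute force net is weakly negative, the constellation rescues it.
rescued = hybrid_guess_b(-0.2, vertical_line=True, enclosed_space=True)

# A 'd' has the same two patterns, so the same rule produces a false positive.
fooled_by_d = hybrid_guess_b(-0.2, vertical_line=True, enclosed_space=True)
```

The two calls are deliberately identical: with only 'vertical line' and 'enclosed space' to go on, the hybrid cannot tell a sloppy 'b' from a 'd'.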
This situation is critical to understanding how to evolve intelligence. The question now remains: what should the kernel do in such a case? It just guessed incorrectly, so something is wrong, but what? There are 3 possibilities: 1) the brute force network needs more neurons or training, 2) the link to the constellation of other patterns should be weakened because it's just plain wrong (it has nothing to do with the nature of the problem), or 3) the constellation of other patterns represents a set of 'necessary' conditions but does not yet include the full set of sufficient conditions to classify the problem.
In our example, the miscategorization of a 'd' as a 'b' due to vertical line and enclosed space is a case of (3): the constellation doesn't represent the sufficient set of conditions to characterize a 'b'. (Vertical line and enclosed space are necessary, but not sufficient, conditions for b, p, d, and q.) However, on average, this hybridized model will be more correct than the original brute force model. For example, let's use some human pattern recognition. Which single character is this most similar to: "| o"? This would break a brute force neural net because the vertical line and the circle are too far apart. However, it would trip the constellation of vertical line and enclosed space. Thus the hybridized model gets this one whereas the brute force model would miss it. The kernel can recognize that it is in situation 3 because the hybridized efficiency is generally better than that of the standalone brute force model, and because that efficiency reaches some sort of maximum, as did the brute force model's. Let's call the favouring of a constellation of patterns 'beneficial inclusion'.
How could the kernel recognize that it is in situation 2? We need an example to make this clear. The US military invested millions in an AI to study images of forests and determine whether a tank is hidden in the forest. They trained up the AI (brute force, mind you) with hundreds of images and got it to better than 90% correctness. They deployed this in the field and it was a disaster. It turns out that the pictures of the forests with tanks in them were coincidentally taken on cloudy days whereas the pictures of the forests without tanks were taken on sunny days. So the military had developed a multimillion dollar weather analyzer. In this case the AI had homed in on a constellation of patterns (weather) which had nothing to do with the problem it was being asked to solve.
Let's suppose that there are two networks involved, one looking at the sky and one looking at the forest. Both of those networks are brute force networks, one detecting sun/no sun and the other detecting tank/no tank. Suppose again that the kernel is in the middle of considering the viability of combining (hybridizing) the inputs of the sky network and the forest network. The kernel can recognize that it is in situation 2 when, with added exposure to the problem, it notices that the hybrid efficiency drops relative to that of the brute force network looking at the forest area only. That is, with enough exposure to the problem, the kernel will notice that the sun/no sun network is offering input entirely irrelevant to the problem at hand. Let us call the disfavouring of a constellation of patterns 'pruning'.
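Here is a minimal sketch of how the kernel might detect that pruning is called for. The field records are fabricated purely for illustration, and the way the hybrid combines its inputs (guess 'tank' if the forest network says so or the sky is cloudy) is an assumption, not something the tank story specifies.

```python
# Each hypothetical field record: (forest_net_says_tank, sky_is_cloudy, tank_present)
FIELD_RECORDS = [
    (True, False, True),
    (False, True, False),
    (False, False, False),
    (True, True, True),
    (False, True, False),
    (False, True, True),
]


def accuracy(predict, records):
    """Fraction of records on which the prediction matches the truth."""
    hits = sum(1 for forest, cloudy, truth in records
               if predict(forest, cloudy) == truth)
    return hits / len(records)


def brute_force(forest, cloudy):
    return forest  # looks at the forest only


def hybrid(forest, cloudy):
    return forest or cloudy  # also trusts the (irrelevant) sky network


brute_acc = accuracy(brute_force, FIELD_RECORDS)
hybrid_acc = accuracy(hybrid, FIELD_RECORDS)
should_prune = hybrid_acc < brute_acc  # the sky input is hurting, so prune it
```

On these made-up records the forest-only network is right 5 times out of 6 and the hybrid only 4 times out of 6, so the kernel would prune the link to the sky network.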
So, let's now complete the evolution of the optimum OCR by continuing with the experience of our created intelligence. Over time the intelligence is exposed to proportion patterns, and to relation patterns such as one shape being to the left or right of another shape. In the end, the kernel notices that when the following constellation of patterns is true, it's likely a 'b':
- vertical line
- enclosed space
- vertical line approximately twice the size of the enclosed space
- bottom of the vertical line near the left side of the enclosed space
When the kernel comes up with this constellation and notices that the constellation alone (without the base brute force network) is better than either the hybridized model or the brute force network, it prunes out both older networks and favours the constellation. Let's call this action a 'pruning replacement'.
If you'd like proof that this sort of thing is in effect in your own brain right now, here it is. You agree that you have a pattern recognition capability which can detect individual letters. A E I O U... you just detected the vowels. But I can prove to you that your brain has found a better way of reading than just detecting each letter, stringing the letters together to form words and then detecting the words. Your brain performed a pruning replacement a long time ago when it learned to read. Here is the proof: Cn y rd ths sntnc? Hw dd y d tht? You did it by using the context of likely word combinations to detect words instead of individual letters.
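A program can imitate a crude version of this trick. The sketch below is a deliberately simpler stand-in for what the brain does: it recognizes whole words from their consonant 'skeletons' using a tiny vocabulary, and it does not model the context of likely word combinations. The vocabulary and function names are assumptions.

```python
VOWELS = set("aeiou")


def skeleton(word):
    """Strip the vowels from a word: 'sentence' becomes 'sntnc'."""
    return "".join(ch for ch in word.lower() if ch not in VOWELS)


def build_index(vocabulary):
    """Map each consonant skeleton to the known words that share it."""
    index = {}
    for word in vocabulary:
        index.setdefault(skeleton(word), []).append(word)
    return index


def read_without_vowels(text, index):
    """Replace each token by the first known word with that skeleton."""
    words = []
    for token in text.lower().split():
        key = token.strip("?.,!")
        candidates = index.get(key, [key])  # unknown tokens pass through
        words.append(candidates[0])
    return " ".join(words)


# Hypothetical vocabulary; 'road' is included to show an ambiguous skeleton.
VOCABULARY = ["can", "you", "read", "this", "sentence", "road"]
INDEX = build_index(VOCABULARY)
decoded = read_without_vowels("Cn y rd ths sntnc?", INDEX)
```

Note that 'rd' matches both 'read' and 'road'. The sketch simply takes the first candidate; choosing correctly in general is exactly where the context of neighbouring words comes in.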
A few axioms for the kernel to consider. First, there is no such thing as an atomic pattern. All patterns are infinitely decomposable and composable. The decision to decompose or compose patterns must always be made so as to optimize correctness. (In the case of the forest tank example, we got something out of decomposing the problem so that a neural network looks at the forest and not the sky. In the case of character recognition we wouldn't get anything out of having 4 networks looking at the four quadrants of the character.) When exposed to a new problem, the kernel will always look for a constellation of patterns which solves the problem. If none is sufficient, it will spawn a brute force network to bootstrap itself until such time as there is sufficient experience to come up with a constellation-type solution. If the kernel notices that a hybridized model is losing efficiency compared with the brute force model, it knows that an inclusion of a coincidental constellation has occurred, and it prunes the constellation. If the kernel notices a situation where a constellation of patterns is more efficient by itself than both the hybridized model and the brute force network, it prunes the hybridized and brute force networks and replaces them with the constellation.
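These rules can be condensed into one small decision function. This is only a sketch under stated assumptions: the three accuracies are taken as already measured, and the `margin` used to ignore insignificant differences is my own addition.

```python
def kernel_decision(brute, hybrid, constellation, margin=0.01):
    """Pick the kernel's action from three measured accuracies (0.0 to 1.0).

    brute:         the brute force network alone
    hybrid:        brute force network plus the constellation
    constellation: the constellation of patterns alone
    """
    if constellation > max(brute, hybrid) + margin:
        return "pruning replacement"   # the constellation beats both older models
    if hybrid < brute - margin:
        return "pruning"               # the constellation was a coincidence
    if hybrid > brute + margin:
        return "beneficial inclusion"  # necessary but not yet sufficient
    return "keep brute force"          # no evidence either way yet


decisions = [
    kernel_decision(0.80, 0.85, 0.60),  # hybrid helps
    kernel_decision(0.80, 0.70, 0.50),  # hybrid hurts
    kernel_decision(0.80, 0.85, 0.95),  # constellation alone is best
    kernel_decision(0.80, 0.80, 0.80),  # nothing to choose between them
]
```

The accuracies passed in above are invented, like the 80% figure earlier; the point is only the order in which the kernel checks its options.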
Epilogue: Project Multivac is an open source project seeking to explore issues related to the kernel. One of the first experiments will be in trying to have an AI evolve itself from a brute force character recognition AI to a parallel networks AI. Any interested academic or developer is invited to inquire at: http://sourceforge.net/projects/multivac
About the Author
Martin Winer is the project lead at Project Multivac (http://sourceforge.net/projects/multivac), which is currently looking for interested academics and developers.
Article Published/Sorted/Amended on Scopulus 2006-12-29 19:57:51 in Computer Articles