Machine-vision systems are great for analyzing relatively simple images. They can quickly pick out a bad part from a sea of good ones when identification involves matching fixed dimensions. Machine-vision systems count, measure, and compute the dimensions of features in the images they capture. They typically compare these calculations to those for “known good” parts they’ve memorized. If the two sets of data are within some programmed tolerance, the software decides the part is good and passes it.
Who, What, Where
Edited by Leland Teschler, [email protected]
In a nutshell
Neural ID, www.Neuralid.com, (650) 288-1180
Orbitform Group, www.orbitform.com, (517) 787-9447
Wikipedia page on neural networks: tinyurl.com/3dponx
Basic introduction to neural nets used in machine vision: tinyurl.com/6qb2d
Neural network FAQ: tinyurl.com/5h8wc3
Orbitform WTS news item: tinyurl.com/6jsd92
Problems arise, however, when the definition of a “good” part involves some variations or subjectivity. The conventional approach to machine vision typically doesn’t do well in such situations. The system may not be able to memorize enough examples to cover the universe of what might constitute an acceptable part.
The reason is that measuring the appropriate features in a scene, then comparing them to the memorized examples, can be time-consuming in real-life machine-vision applications. Moreover, such systems struggle when a wide variety of scenarios can constitute a valid example. The classic case is one where a human would exercise subjective judgment in classifying the part of interest as good or bad.
Neural networks were devised to overcome this kind of drawback. Popularized in the 1980s, they are made up of interconnected artificial neurons, basically programming constructs that mimic the properties of biological neurons. These neurons connect to each other through paths that carry different weighting factors. The values of the weighting factors get determined through a training process, which adjusts the weights of each unit to minimize the error between the desired output and the output the network actually produces.
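The weight-adjustment idea can be sketched with a single artificial neuron trained by the classic perceptron rule. The function names and toy data here are purely illustrative; they are not drawn from any particular vendor's system.

```python
# Minimal artificial neuron: a weighted sum of inputs compared to a
# threshold.  Training nudges the weights to shrink the error between
# the desired output and the output the neuron actually produces.

def neuron_output(weights, bias, inputs):
    s = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if s >= 0 else 0

def train(samples, labels, rate=1, epochs=20):
    weights = [0] * len(samples[0])
    bias = 0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            error = target - neuron_output(weights, bias, x)
            weights = [w + rate * error * xi for w, xi in zip(weights, x)]
            bias += rate * error
    return weights, bias

# Toy example: learn a logical AND of two binary inputs.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]
w, b = train(samples, labels)
```

After training, the learned weights and threshold separate the one “good” input pattern from the rest, which is the same error-minimizing adjustment the article describes, just at toy scale.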
In the case of a machine-vision application, the training process consists of showing the system some examples of “good” parts, as with a conventional vision system. The resulting weights then get used to classify unknown parts as either acceptable or not.
The use of a weighting scheme was viewed as a way to introduce more flexibility into the interpretation of images. That was because a network of neurons could adapt to a particular situation by changing its weights and the threshold at which individual neurons would generate a particular output. The general feeling was that use of a weighting scheme could more quickly distinguish a wider variety of valid examples than could conventional machine-vision systems.
The problem with neural-net schemes, though, has been one of performance. Despite their promise, neural nets generally have not been able to recognize scenes more quickly or with any more flexibility than conventional methods. The use of a weighting scheme to classify scenes has also introduced an additional difficulty: It has been hard to decipher why such systems make the decisions they do. Ferreting out the reason for a decision involves tracking back from the output to the original inputs through the various neuron paths and their weighting factors. For real-world problems, this track-back process may be complicated and impractical.
However, neural-net technology has progressed to a point where it can handle decisions for which a human would use subjectivity. Moreover, modern neural technology builds in mechanisms that make it easier to see the role specific inputs play in generating a given output.
An example of such advances comes from Neural ID, San Mateo, Calif. The firm has devised what’s called Cure, for Concurrent Universal Recognition Engine. The technology uses neural-network concepts to recognize patterns. In contrast to earlier methods, the Cure technique builds in ways of deducing how specific inputs lead to specific outputs. Moreover, Cure has proven to be better than ordinary machine-vision methods at recognizing “good” and “bad” cases in conditions characterized by a lot of variability.
One of the first applications of the technique is in recognizing whether welding electrode tips that have been through a dressing process are acceptable for use. Conventional machine-vision processes have proven unable to manage the subjective nature of the good/bad tip decision and also struggle with scene variations caused by factors such as debris in the weld cell and different lighting conditions. A Cure-based Weld Tip System (WTS), distributed by Orbitform Group, Jackson, Mich., recently came on the market and is now deployed in several spot-welding applications. (See Machine Design, 9/25/08, “Tips Any Good? Weld Rod Inspector Can Tell,” p. 35.)
In the WTS, the Cure technique is implemented in the form of a combo FPGA/processor as a means of gaining processing speed through parallel operations. The basic technique, however, could also be handled entirely in software if speed were less crucial. Additionally, Cure is not limited to machine-vision applications. Its creators say it is a candidate for any task that involves recognizing complicated patterns in any type of data.
Inside an advanced neural system
The workings of the Cure technique are easiest to understand by considering how it would handle a simple recognition problem in machine vision. The first step would consist of showing the system examples of “good” parts. A typical system might use a smart camera to digitize the image of each part and then perform some processing to extract features from each digitized scene.
Image preprocessing is no different with the Cure system than with a conventional machine-vision setup. Next, image-processing algorithms detect and isolate key shapes or dimensions within the scene. Feature extraction can involve a variety of techniques such as edge or corner detection, blob analysis (sizing up bright or dark regions), and numerous others. As with any vision system, different feature-extraction mechanisms would be brought to bear depending on what qualities were important in the digitized scenes.
The feature-extraction process produces a set of measurements (termed vectors in machine-vision parlance) for each feature in the example part. These measurements are then fed to the Cure algorithm.
In preparation for memorizing part features, Cure’s neurons (Neural ID calls them knowledge elements rather than neurons) have been divided, or partitioned. Each partition corresponds to a specific feature extracted from the digitized scene. So, for example, a “right-edge” feature vector would go to knowledge elements (KEs) in the “right-edge” partition; “left-corner” vectors would go to KEs in the “left-corner” partition, and so forth. Within each partition, the “good” parts form a category. Other application-specific categories can also be used.
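The feature-to-partition routing can be pictured roughly as follows. This is an illustrative sketch only; the class and the feature names are hypothetical and do not reflect Neural ID's actual data structures.

```python
from collections import defaultdict

# Hypothetical sketch of partitioned knowledge elements (KEs): each
# extracted feature vector is routed to the partition named for that
# feature, where it is stored as one KE under a category ("good" by
# default, but other application-specific categories are possible).
class PartitionedMemory:
    def __init__(self):
        self.partitions = defaultdict(list)   # feature name -> list of KEs

    def memorize(self, feature_name, vector, category="good"):
        self.partitions[feature_name].append((category, vector))

# Training on two example parts: each contributes vectors to the
# partitions for the features it exhibits.
memory = PartitionedMemory()
memory.memorize("right-edge", (12.1, 0.8))
memory.memorize("right-edge", (12.3, 0.7))
memory.memorize("left-corner", (3.9, 4.1))
```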
The key to handling features that must be interpreted with subjectivity lies in how the KEs store information from examples and use it to make decisions about unknown parts. For the sake of simplicity, consider the KEs devoted to one feature, say, an edge. During the training phase, each edge dimension memorized from a known “good” part would occupy one KE. Of course, for real parts, each memorized edge dimension could be slightly different because of ordinary manufacturing tolerances or analogous factors. Together, all the memorized edge features make up what is called an influence region. This region basically corresponds to how much the feature of interest can vary and still be considered a “good” part feature.
So when the system attempts to classify an edge from an unknown part, it will compare the unknown’s edge dimension to the influence region defined by the edge KEs. If the unknown edge lies within the influence region, the conclusion is that the edge dimension is “good.” The edge partition at that point generates a signal indicating it found a “good” edge.
The situation is more complicated if the unknown edge lies outside the KE region of influence. In this case the edge partition may generate a signal that has some relationship to the distance between the region of influence and the unknown part’s edge dimension.
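The influence-region behavior described above can be sketched as a nearest-example distance test with a tolerance radius. The Euclidean metric, the radius value, and the graded fall-off formula are assumptions made for illustration, not details published by Neural ID.

```python
import math

# Illustrative influence-region test: an unknown feature counts as
# "good" if it lies within some radius of at least one memorized
# example.  Otherwise the partition reports a graded signal that
# shrinks with distance from the nearest memorized example.

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def partition_signal(examples, unknown, radius=1.0):
    nearest = min(distance(e, unknown) for e in examples)
    if nearest <= radius:
        return 1.0                  # inside the influence region
    return radius / nearest         # graded signal outside the region

# Memorized edge vectors from known-good parts, with ordinary
# manufacturing variation among them.
edge_examples = [(12.0, 0.8), (12.3, 0.7), (11.8, 0.9)]

inside = partition_signal(edge_examples, (12.1, 0.75))   # near the examples
outside = partition_signal(edge_examples, (20.0, 0.2))   # far from them
```

The `inside` case returns the full-strength signal, while the `outside` case returns a weaker value related to how far the unknown edge sits from the region, mirroring the behavior described above.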
The number of examples necessary to define an influence region depends on the variability of the underlying process generating the parts being categorized. In the Orbitform application, for instance, the system can make decisions about welding electrodes after seeing as few as 15 examples. However, some operations may need as many as 50 examples before the system can make decisions. Orbitform says WTS installations that need more than 50 examples reveal an unstable underlying process, one that can’t be reliably classified.
In making a decision about a specific unknown part, the Cure algorithm would take the output of each KE partition and use the results to reach a conclusion. In a simple case, for example, each KE partition might generate a one if it found the feature was within its influence region, a zero if it was outside. A summing function might then total the outputs of the KE partitions. If the total exceeded a certain threshold, the unknown part would be judged good.
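The simple voting scheme just described amounts to only a few lines of code. The threshold value here is an arbitrary choice for the sketch.

```python
# Each KE partition contributes a 1 if the unknown part's feature fell
# inside that partition's influence region, 0 otherwise.  A threshold
# on the sum decides whether the part passes.

def classify_part(partition_outputs, threshold):
    return "good" if sum(partition_outputs) >= threshold else "bad"

# e.g. four of five feature partitions matched their influence regions
verdict = classify_part([1, 1, 0, 1, 1], threshold=4)
```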
A point to note about the Cure algorithm is that many of its steps can happen in parallel. For instance, the comparison of an unknown feature to a specific influence region takes place independently of operations on other features. Each of these comparisons can happen simultaneously if implemented in, say, a sufficiently large FPGA chip. Designers can use this parallelism to speed up the classification process when rapid real-time operation is important.
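Because each per-partition comparison is independent, the comparisons can run concurrently. The WTS does this in FPGA hardware; a thread pool in software merely illustrates the same independence. All names and data here are invented for the sketch.

```python
from concurrent.futures import ThreadPoolExecutor

# Each job holds one partition's memorized examples (1-D here for
# brevity), the unknown part's value for that feature, and a radius.
# The checks share no state, so they can all run at the same time.

def check_partition(job):
    examples, unknown, radius = job
    nearest = min(abs(e - unknown) for e in examples)
    return 1 if nearest <= radius else 0

jobs = [
    ([12.0, 12.3, 11.8], 12.1, 0.5),   # edge partition: in region
    ([3.9, 4.1], 4.0, 0.5),            # corner partition: in region
    ([7.0, 7.2], 9.5, 0.5),            # hole partition: out of region
]
with ThreadPoolExecutor() as pool:
    outputs = list(pool.map(check_partition, jobs))
```

The resulting per-partition votes would then feed whatever summing or thresholding step the application uses.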
There are also a variety of complexities that could be introduced, if need be, to the basic classification scheme outlined here. For example, some classification tasks need more than a simple summing of KE outputs to make a decision about unknown parts. KE outputs can be more than simple yes/no signals and can be processed in more sophisticated ways.
And it is also possible to construct KEs so they are in a hierarchy. For example, the KEs classifying part features might be processed to generate a signal that, in turn, serves as one of the inputs to a second layer of KEs. This kind of organization is good for handling situations characterized by lower-level and higher-level features of interest as, for example, when recognizing faces in a crowd scene.
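A two-layer arrangement of this kind can be sketched as below. The score functions, the 0.75 threshold, and the face-feature grouping are hypothetical, chosen only to echo the face-in-a-crowd example.

```python
# Hypothetical two-layer hierarchy: first-layer partitions judge
# low-level features; their combined signals feed a second layer
# that makes the higher-level call.

def first_layer(feature_signals):
    # fraction of low-level partitions that matched their regions
    return sum(feature_signals) / len(feature_signals)

def second_layer(region_scores, threshold=0.75):
    # higher-level KE fires only if every region scored well enough
    return all(score >= threshold for score in region_scores)

eyes = first_layer([1, 1, 1, 0])    # low-level eye features
mouth = first_layer([1, 1])         # low-level mouth features
is_face = second_layer([eyes, mouth])
```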