New Test Compares AI Reasoning With Human Thinking

The way in which artificial intelligence reaches insights and makes decisions is often mysterious, raising concerns about how trustworthy machine learning can be. Now, in a new study, researchers have developed a method for rapidly analyzing an AI program's behavior by comparing how well its reasoning matches that of a human.

As machine learning increasingly finds real-world applications, it becomes critical to understand how it reaches its conclusions and whether it does so correctly. For example, an AI program may appear to have accurately predicted that a skin lesion was cancerous, but it may have done so by focusing on an unrelated blot in the background of a clinical image.

“Machine-learning models are infamously challenging to understand,” says Angie Boggust, a computer science researcher at MIT and lead author of a new study concerning AI’s trustworthiness. “Knowing a model’s decision is easy, but knowing why that model made that decision is hard.”

A common strategy for making sense of AI reasoning is to examine which features of the input data, say an image or a sentence, the program focused on in order to make its decision. However, such so-called saliency methods often yield insights on just one decision at a time, and each must be manually inspected. AI software is often trained on millions of instances of data, making it nearly impossible for a person to analyze enough decisions to identify patterns of correct or incorrect behavior.
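To illustrate the idea behind saliency, here is a minimal sketch for a toy linear classifier, where each feature's contribution to the score is simply its weight times its value. The weights and input are hypothetical; real saliency methods apply analogous gradient- or perturbation-based scoring to deep networks.

```python
import numpy as np

# Toy saliency for a linear model score w.x + b: the contribution of
# feature i to this particular decision is w_i * x_i, so its absolute
# value indicates how strongly the model relied on that feature.
w = np.array([2.0, -0.5, 0.0, 1.5])   # learned weights (hypothetical)
x = np.array([1.0, 2.0, 3.0, 0.5])    # one input example (hypothetical)

contributions = np.abs(w * x)                  # per-feature saliency scores
saliency = contributions / contributions.sum() # normalize to sum to 1

print(saliency.round(3))
```

Inspecting `saliency` shows which features dominated this one decision; the article's point is that doing this by hand for millions of decisions does not scale.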

Now scientists at MIT and IBM Research have created a way to collect and inspect the explanations an AI gives for its decisions, thus allowing a quick analysis of its behavior. The new technique, named Shared Interest, compares saliency analyses of an AI’s decisions with human-annotated databases.

For example, an image-recognition program might classify a picture as that of a dog, and saliency methods might show that the program highlighted the pixels of the dog's head and body to make its decision. The Shared Interest approach might then compare the results of these saliency methods with databases of images in which people annotated which parts of the pictures showed dogs.
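A comparison of this kind can be sketched as overlap scores between two binary masks: the pixels the saliency method highlighted and the pixels a human annotated. The masks below are hypothetical, and the metric names are illustrative labels, not necessarily the paper's exact terminology.

```python
import numpy as np

# Hypothetical binary masks over a 4x4 image: True = highlighted pixel.
saliency_mask = np.array([[0, 1, 1, 0],
                          [0, 1, 1, 0],
                          [0, 0, 0, 0],
                          [0, 0, 0, 0]], dtype=bool)  # what the AI focused on
human_mask = np.array([[0, 1, 1, 0],
                       [0, 1, 1, 0],
                       [0, 1, 1, 0],
                       [0, 0, 0, 0]], dtype=bool)      # what a person annotated

intersection = np.logical_and(saliency_mask, human_mask).sum()
union = np.logical_or(saliency_mask, human_mask).sum()

iou = intersection / union                       # overall overlap of the regions
human_coverage = intersection / human_mask.sum() # share of the human region the AI covered
saliency_coverage = intersection / saliency_mask.sum()  # share of the AI region humans also marked

print(iou, human_coverage, saliency_coverage)
```

Here the AI's highlighted region lies entirely inside the human annotation (`saliency_coverage` is 1.0) but covers only part of it, the kind of "recognizes the dog from just its head" behavior described next.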

Based on these comparisons, the Shared Interest method quantifies how much an AI's decision-making aligns with human reasoning, classifying each decision into one of eight patterns. On one end of the spectrum, the AI may prove completely human-aligned: the program makes the correct prediction and highlights the same features in the data as humans do. On the other end, the AI is completely distracted: it makes an incorrect prediction and highlights none of the features that humans did.

The other patterns into which AI decision-making might fall highlight the ways in which a machine-learning model correctly or incorrectly interprets details in the data. For example, Shared Interest might find that an AI correctly recognizes a tractor in an image based solely on a fragment of it, say its tire, instead of identifying the whole vehicle as a human might, or that an AI recognizes a snowmobile helmet in an image only if a snowmobile is also in the picture.

In experiments, Shared Interest helped reveal how AI programs worked and whether they were reliable. For example, it helped a dermatologist quickly see examples of a program's correct and incorrect cancer predictions from photos of skin lesions. Ultimately, the dermatologist decided he could not trust the program because it made too many predictions based on unrelated details rather than on actual lesions.

In another experiment, a machine-learning researcher used Shared Interest to test a saliency method he was applying to the BeerAdvocate data set, helping him analyze thousands of correct and incorrect decisions in a fraction of the time needed with traditional manual methods. Shared Interest showed that the saliency method generally behaved as hoped but also revealed previously unknown pitfalls, such as overvaluing certain words in reviews in ways that led to incorrect predictions.