How Adversarial Attacks Could Destabilize Military AI Systems

Articles

Adversarial attacks threaten the safety of AI and robotic technologies. Can we stop them?

By David Danks

Artificial intelligence and robotic technologies with semi-autonomous learning, reasoning, and decision-making capabilities are increasingly being incorporated into defense, military, and security systems. Unsurprisingly, there is increasing concern about the stability and safety of these systems. In a different sector, runaway interactions between autonomous trading systems in financial markets have produced a series of stock market “flash crashes,” and as a result, those markets now have rules to prevent such interactions from having a significant impact¹.

Could the same kinds of unexpected interactions and feedback loops lead to similar instability with defense or security AIs?

Adversarial attacks on AI systems

General concerns about the impacts of defense AIs and robots on stability, whether in isolation or through interaction, have only been exacerbated by recent demonstrations of adversarial attacks against these systems². Perhaps the most widely-discussed attack cases involve image classification algorithms that are deceived into “seeing” images in noise³, or are easily tricked by pixel-level changes so they classify, say, a turtle as a rifle⁴. Similarly, game-playing systems that outperform any human (e.g., AlphaGo) can suddenly fail if the game structure or rules are even slightly altered in ways that would not affect a human⁵. Autonomous vehicles that function reasonably well in ordinary conditions can, with the application of a few pieces of tape, be induced to swerve into the wrong lane or speed through a stop sign⁶. And the list of adversarial attacks continues to grow and grow over time.

Adversarial attacks pose a tangible threat to the stability and safety of AI and robotic technologies. The exact conditions for such attacks are typically quite unintuitive for humans, so it is difficult to predict when and where the attacks could occur. And even if we could estimate the likelihood of an adversarial attack, the exact response of the AI system can be difficult to predict as well, leading to further surprises and less stable, less safe military engagements and interactions. Even overall assessments of reliability are difficult in the face of adversarial attacks.

We might hope that adversarial attacks would be relatively rare in the everyday world, since “random noise” that targets image classification algorithms is actually far from random: The tape on the stop sign must be carefully placed, the pixel-level perturbations added to the image must be carefully calculated, and so on. Significant effort is required to construct an adversarial attack, and so we might simply deploy our AI and robotic systems with the hope that the everyday world will not conspire to deceive them.

Unfortunately, this confidence is almost certainly unwarranted for defense or security technologies. These systems will invariably be deployed in contexts where the other side has the time, energy, and ability to develop and construct exactly these types of adversarial attacks. AI and robotic technologies are particularly appealing for deployment in enemy-controlled or enemy-contested areas since those environments are riskiest for our human soldiers, in large part because the other side has the most control over the environment.

Defenses against adversarial attacks

Although adversarial attacks on defense and military AIs and robots are likely, they are not necessarily destabilizing, particularly since humans are typically unaffected by these attacks. We can easily recognize that a turtle is not a rifle even with random noise, we view tape on a stop sign as an annoyance rather than something that disrupts our ability to follow the rules of the road, and so on. Of course, there are complexities, but we can safely say that human performance is strongly robust to adversarial attacks against AIs. Adversarial attacks will thus not be destabilizing if we follow a straightforward policy recommendation: Keep humans in (or on) the loop for these technologies. If there is human-AI teaming, then people can (hopefully!) recognize that an adversarial attack has occurred, and guide the system to appropriate behaviors. [READ MORE]