Mayhem, the Machine That Finds Software Vulnerabilities, Then Patches Them

The machine triumphed in DARPA’s Cyber Grand Challenge, where teams automated white-hat hacking

Back in 2011, when the venture capitalist Marc Andreessen said that “software is eating the world,” it was still a fresh idea. Now it’s obvious that software permeates our lives. From complex electronics like medical devices and autonomous vehicles to simple objects like Internet-connected lightbulbs and thermometers, we’re surrounded by software.

And that means we’re all more exposed to attacks on that software than ever before.

Every year, 111 billion lines are added to the mass of software code in existence, and every line presents a potential new target. Steve Morgan, founder and editor in chief at the research firm Cybersecurity Ventures, predicts that system break-ins made through a previously unknown weakness—what the industry calls “zero-day exploits”—will average one per day in the United States by 2021, up from one per week in 2015.

It was to solve this problem that my colleagues and I at Carnegie Mellon University (CMU), in Pittsburgh, spent nearly 10 years building technology that would make software safe, automatically. Then, in 2012, we founded ForAllSecure to bring our product to the world. The one thing we needed was a way to prove that we could do what we said we could do, and we got it in the form of a prize competition.

Fast-forward to 2016: My team huddled in a hotel ballroom in Las Vegas, chewing our fingernails, fairly sure we had just lost a competition we’d spent thousands of hours preparing for. It was the DARPA Cyber Grand Challenge (CGC), one of several such events—like the one for self-driving vehicles back in the early 2000s—staged by the U.S. Defense Advanced Research Projects Agency to stimulate technological breakthroughs for national security. The CGC grew out of DARPA’s recognition that the United States might one day find itself without the manpower or the tools to fend off cyberthreats.

The cybersecurity battleground is populated by hackers who are technically skilled and, at the highest levels, creative in exploiting weaknesses in software to penetrate an organization’s defenses. The criminals who do this for their own gain are commonly called black hats, and they often create tools that legions of amateur “script kiddies” can use to unleash havoc, like the IoT botnets that in 2016 launched a massive attack on the Internet after gaining control over minicams and video recorders in people’s homes. In contrast, “white hats” use their skills to thwart such attacks. But there simply aren’t enough white-hat hackers to protect all the software proliferating in the commercial world, let alone the common infrastructure and the military platforms vital to national and global security.

In 2014, DARPA announced the Cyber Grand Challenge as a two-year project with the goal of testing whether it was possible to develop AI systems that could find, verify, and patch software weaknesses. In 2015, some 100 teams entered the prequalification stage. In 2016, the top seven advanced to the grand championship finale, where they’d need to enter a full cyber-reasoning system—one that would not merely notice a problem but could also infer its nature. The champion would win US $2 million, and the second- and third-place finishers would get $1 million and $750,000, respectively.