Originally published in ycombinator.com
Recent studies by Google Brain have shown that any machine learning classifier can be tricked to give incorrect predictions, and with a little bit of skill, you can get them to give pretty much any result you want.
This fact steadily becomes worrisome as more and more systems are powered by artificial intelligence — and many of them are crucial for our safe and comfortable life. Banks, surveillance systems, ATMs, face recognition on your laptop — and very very soon, self-driving cars. Lately, safety concerns about AI were revolving around ethics — today we are going to talk about more pressuring and real issues.
What is an Adversarial Attack?
Machine learning algorithms accept inputs as numeric vectors. Designing an input in a specific way to get the wrong result from the model is called an adversarial attack.
How is this possible? No machine learning algorithm is perfect and they make mistakes — albeit very rarely. However, machine learning models consist of a series of specific transformations, and most of these transformations turn out to be very sensitive to slight changes in input. Harnessing this sensitivity and exploiting it to modify an algorithm’s behavior is an important problem in AI security.
In this article we will show practical examples of the main types of attacks, explain why is it so easy to perform them, and discuss the security implications that stem from this technology.
Types of Adversarial Attacks
Here are the main types of hacks we will focus on:
About the Authors