Machine Learning Times
How Adversarial Attacks Work


Originally published in ycombinator.com

Recent studies by Google Brain have shown that any machine learning classifier can be tricked into giving incorrect predictions, and with a little bit of skill, you can get it to give pretty much any result you want.

This becomes more worrisome as more and more systems are powered by artificial intelligence, many of them crucial to our safety and comfort: banks, surveillance systems, ATMs, face recognition on your laptop, and very soon, self-driving cars. Lately, safety concerns about AI have revolved around ethics; today we are going to talk about more pressing and immediate issues.

What is an Adversarial Attack?

Machine learning algorithms accept inputs as numeric vectors. Designing an input in a specific way to get the wrong result from the model is called an adversarial attack.

How is this possible? No machine learning algorithm is perfect, and all of them make mistakes, albeit rarely. More importantly, machine learning models consist of a series of specific transformations, and most of these transformations turn out to be very sensitive to slight changes in the input. Harnessing this sensitivity and exploiting it to modify an algorithm’s behavior is an important problem in AI security.
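To get a feel for how little it can take, here is a minimal sketch using a toy linear classifier invented for this illustration (it is not a model from the original article): nudging every feature of an input vector by the same tiny, carefully chosen amount is enough to push the score across the decision boundary.

```python
# Toy illustration (hypothetical model): a linear classifier on numeric vectors.
# Shifting each feature by epsilon in the direction of sign(w) changes the score
# by epsilon * ||w||_1, which in high dimensions crosses the decision boundary
# even though each individual feature barely moves.
import numpy as np

rng = np.random.default_rng(0)
d = 3072                              # roughly the size of a small flattened image
w = rng.normal(size=d)                # weights of the toy linear classifier
x = rng.normal(size=d)                # an input vector
score = w @ x                         # the sign of the score decides the class

# Smallest per-feature budget that crosses the boundary, plus a 10% margin.
epsilon = 1.1 * abs(score) / np.abs(w).sum()
x_adv = x - np.sign(score) * epsilon * np.sign(w)

print(f"per-feature change:  {epsilon:.4f}")    # a tiny nudge on every feature
print(f"original score:      {score:+.2f}")     # one class...
print(f"adversarial score:   {w @ x_adv:+.2f}") # ...now the other class
```

For a linear model like this, the per-feature change needed shrinks as the input grows higher-dimensional, which is part of why image classifiers, whose inputs have thousands of dimensions, can be fooled by perturbations that are invisible to a human.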

In this article we will show practical examples of the main types of attacks, explain why it is so easy to perform them, and discuss the security implications of this technology.

Types of Adversarial Attacks

Here are the main types of hacks we will focus on:

  1. Non-targeted adversarial attack: the most general type of attack, where all you want is to make the classifier give an incorrect result.
  2. Targeted adversarial attack: a slightly more difficult attack that aims to make the model output a particular class of your choosing for your input. (Both variants are sketched in the code below.)
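
As a rough sketch of the difference between the two, here is a single-step, FGSM-style example in PyTorch. The `model`, input `x`, true label `y`, `target` class, and `epsilon` are placeholders, not artifacts from the original article: the non-targeted version steps the input in the direction that increases the loss for the true label, while the targeted version steps in the direction that decreases the loss for the chosen target.

```python
import torch
import torch.nn.functional as F

def non_targeted_fgsm(model, x, y, epsilon):
    """Nudge `x` in the direction that INCREASES the loss for the true label
    `y`, so the classifier outputs anything other than `y`."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + epsilon * x.grad.sign()).detach()

def targeted_fgsm(model, x, target, epsilon):
    """Nudge `x` in the direction that DECREASES the loss for the chosen
    `target` class, so the classifier outputs exactly `target`."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), target)
    loss.backward()
    return (x - epsilon * x.grad.sign()).detach()

# For image inputs you would typically also clamp the result back into the
# valid pixel range, e.g. adv = adv.clamp(0, 1).
```

In practice, stronger attacks repeat such a step many times with a small `epsilon`, but the single-step sketch already shows the difference between the two goals.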

Click here to continue this article.

About the Authors

Emil Mikhailov is the founder of XIX.ai (YC W17). Roman Trusov is a researcher at XIX.ai.
