Machine Learning Times
Machine Learning Times
EXCLUSIVE HIGHLIGHTS
Effective Machine Learning Needs Leadership — Not AI Hype
 Originally published in BigThink, Feb 12, 2024.  Excerpted from The...
Today’s AI Won’t Radically Transform Society, But It’s Already Reshaping Business
 Originally published in Fast Company, Jan 5, 2024. Eric...
Calculating Customer Potential with Share of Wallet
 No question about it: We, as consumers have our...
A University Curriculum Supplement to Teach a Business Framework for ML Deployment
    In 2023, as a visiting analytics professor...
SHARE THIS:

6 years ago
How Adversarial Attacks Work

 

Originally published in ycombinator.com

Recent studies by Google Brain have shown that any machine learning classifier can be tricked to give incorrect predictions, and with a little bit of skill, you can get them to give pretty much any result you want.

This fact steadily becomes worrisome as more and more systems are powered by artificial intelligence — and many of them are crucial for our safe and comfortable life. Banks, surveillance systems, ATMs, face recognition on your laptop — and very very soon, self-driving cars. Lately, safety concerns about AI were revolving around ethics — today we are going to talk about more pressuring and real issues.

What is an Adversarial Attack?

Machine learning algorithms accept inputs as numeric vectors. Designing an input in a specific way to get the wrong result from the model is called an adversarial attack.

How is this possible? No machine learning algorithm is perfect and they make mistakes — albeit very rarely. However, machine learning models consist of a series of specific transformations, and most of these transformations turn out to be very sensitive to slight changes in input. Harnessing this sensitivity and exploiting it to modify an algorithm’s behavior is an important problem in AI security.

In this article we will show practical examples of the main types of attacks, explain why is it so easy to perform them, and discuss the security implications that stem from this technology.

Types of Adversarial Attacks

Here are the main types of hacks we will focus on:

  1. Non-targeted adversarial attack:the most general type of attack when all you want to do is to make the classifier give an incorrect result.
  2. Targeted adversarial attack:a slightly more difficult attack which aims to receive a particular class for your input.

Click here to continue this article.

About the Authors

Emil Mikhailov is the founder of XIX.ai (YC W17). Roman Trusov is a researcher at XIX.ai.

Leave a Reply