Machine Learning Times

8 years ago
How Adversarial Attacks Work

 

Originally published in ycombinator.com

Recent studies by Google Brain have shown that any machine learning classifier can be tricked into giving incorrect predictions, and with a little skill, you can get it to give pretty much any result you want.

This becomes more worrisome as more and more systems are powered by artificial intelligence, many of them crucial to our safety and comfort: banks, surveillance systems, ATMs, face recognition on your laptop, and very soon, self-driving cars. Until recently, safety concerns about AI revolved around ethics. Today we are going to talk about more pressing and concrete issues.

What is an Adversarial Attack?

Machine learning algorithms accept inputs as numeric vectors. Designing an input in a specific way to get the wrong result from the model is called an adversarial attack.

How is this possible? No machine learning algorithm is perfect, and they all make mistakes, albeit very rarely. More importantly, machine learning models consist of a series of specific transformations, and most of these transformations turn out to be very sensitive to slight changes in the input. An attacker can harness this sensitivity to modify an algorithm's behavior, which makes it an important problem in AI security.
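To make the sensitivity argument concrete, here is a minimal sketch using a plain linear score, chosen only because the arithmetic is easy to follow. The weights, the input, and the budget `eps` are all made-up assumptions, not values from the article. Each feature is nudged by at most `eps`, yet the score moves by `eps` times the sum of the absolute weights, which grows with the number of features.

```python
import random

def linear_score(weights, x):
    """Score of a linear model: the dot product of weights and input."""
    return sum(w * xi for w, xi in zip(weights, x))

def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)

random.seed(0)
n = 1000                                             # number of input features (assumed)
weights = [random.uniform(-1, 1) for _ in range(n)]  # hypothetical model weights
x = [random.uniform(0, 1) for _ in range(n)]         # hypothetical input
eps = 0.01                                           # largest change allowed per feature

base = linear_score(weights, x)

# Push the score toward the opposite side of the decision boundary:
# every feature moves by eps, in the direction given by its weight's sign.
direction = -1 if base > 0 else 1
x_adv = [xi + direction * eps * sign(w) for xi, w in zip(x, weights)]

shift = linear_score(weights, x_adv) - base

print("largest change to any feature:", max(abs(a - b) for a, b in zip(x_adv, x)))
print("change in the model's score:  ", shift)
print("eps * sum(|w|):               ", eps * sum(abs(w) for w in weights))
```

The per-feature change stays at 0.01, while the change in the score matches `eps * sum(|w|)`. Real classifiers are not linear, but many of their layers behave similarly enough for the same accumulation effect to apply.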

In this article we will show practical examples of the main types of attacks, explain why it is so easy to perform them, and discuss the security implications that stem from this technology.

Types of Adversarial Attacks

Here are the main types of hacks we will focus on:

  1. Non-targeted adversarial attack: the most general type of attack, where all you want is to make the classifier give an incorrect result.
  2. Targeted adversarial attack: a slightly more difficult attack, which aims to make the classifier assign a particular class to your input.
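The difference between the two attack types can be sketched on a toy model. The example below assumes a hypothetical 3-class linear softmax classifier with made-up weights `W`, and uses an iterative sign-gradient step. That is one common way to build such attacks and not necessarily the authors' method. The non-targeted variant increases the loss of the true class until the prediction changes. The targeted variant decreases the loss of a chosen class until the model predicts it.

```python
import math

# Hypothetical 3-class linear classifier over 4 features (made-up weights).
W = [[1.0, -0.5, 0.2, 0.0],
     [-0.3, 0.8, -0.1, 0.4],
     [0.1, 0.1, 0.9, -0.6]]

def logits(x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def predict(x):
    z = logits(x)
    return z.index(max(z))

def loss_gradient(x, label):
    """Gradient of the cross-entropy loss for `label` with respect to x."""
    p = softmax(logits(x))
    err = [p[k] - (1.0 if k == label else 0.0) for k in range(len(W))]
    return [sum(err[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]

def sign(v):
    return (v > 0) - (v < 0)

def attack(x, true_label, target=None, step=0.05, max_steps=50):
    """Non-targeted if target is None, otherwise targeted at class `target`."""
    x_adv = list(x)
    for _ in range(max_steps):
        pred = predict(x_adv)
        if target is None and pred != true_label:
            break  # non-targeted: any wrong class is a success
        if target is not None and pred == target:
            break  # targeted: only the chosen class is a success
        if target is None:
            # Move uphill on the loss of the true class.
            g = loss_gradient(x_adv, true_label)
            x_adv = [xi + step * sign(gi) for xi, gi in zip(x_adv, g)]
        else:
            # Move downhill on the loss of the target class.
            g = loss_gradient(x_adv, target)
            x_adv = [xi - step * sign(gi) for xi, gi in zip(x_adv, g)]
    return x_adv

x = [1.0, 0.2, 0.1, 0.3]  # hypothetical input
true_label = predict(x)
x_untargeted = attack(x, true_label)
x_targeted = attack(x, true_label, target=1)

print("original prediction:    ", true_label)
print("non-targeted prediction:", predict(x_untargeted))
print("targeted prediction:    ", predict(x_targeted))
```

The loop may stop at `max_steps` without succeeding, so a caller should check the final prediction instead of assuming the attack worked.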


About the Authors

Emil Mikhailov is the founder of XIX.ai (YC W17). Roman Trusov is a researcher at XIX.ai.
