Gradient Descent Models Are Kernel Machines (Deep Learning)

By: Steven Hsu

Originally published in infoproc.blogspot.com, Feb 7, 2021.

This paper shows that models trained by gradient descent (e.g., deep neural nets) can be expressed as a weighted sum of similarity functions (kernels) that measure how similar a given instance is to the examples used in training. The kernels are defined by the inner product of model gradients in parameter space, integrated over the descent (learning) path.
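In symbols, the statement can be sketched as follows. The notation is an assumption made here for illustration, not taken from the post: w denotes the parameters, f_w(x) the model output, (x_i, y_i^*) the training pairs, L the per-example loss, and c(t) the path traced by w under continuous-time gradient descent (gradient flow) on the summed loss.

```latex
% Assumed notation: w = parameters, f_w(x) = model output, y_i = f_w(x_i),
% y_i^* = training targets, L = per-example loss, c(t) = parameter path.
\begin{align}
K^{\mathrm{path}}(x, x') &= \int_{c(t)} \nabla_w f_w(x) \cdot \nabla_w f_w(x') \, dt, \\
\frac{d f_w(x)}{dt} &= -\sum_i \frac{\partial L(y_i^*, y_i)}{\partial y_i} \,
  \nabla_w f_w(x) \cdot \nabla_w f_w(x_i), \\
f(x) &= \sum_i a_i \, K^{\mathrm{path}}(x, x_i) + b .
\end{align}
```

Here b is the output of the initial (untrained) model, and each a_i is the negative loss derivative for example i averaged along the path, weighted by the integrand of the path kernel. The second line is just the chain rule applied to gradient flow; integrating it over training gives the third. Note that in this form the a_i can themselves depend on x, which is one reason the correspondence with a classical kernel machine is usually described as approximate.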

Roughly speaking, two data points x and x’ are similar, i.e., have a large kernel value K(x,x’), if they have similar effects on the model parameters during gradient descent. With respect to the learning algorithm, x and x’ have similar information content. The learned model y = f(x) matches x to similar data points x_i: the resulting value y is simply a weighted (linear) sum of kernel values K(x,x_i).
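The weighted-sum picture can be checked numerically on a toy problem. The sketch below is illustrative only: the tiny tanh network, the squared loss, the learning rate, and every function name (`predict`, `grad_params`, `train_and_track`) are assumptions made here, not anything from the paper. It trains with plain gradient descent and, at every step, accumulates the inner products of parameter gradients between a query point and each training point, weighted by the loss derivative. With a small learning rate, that accumulated sum should track the actual change in the model's prediction at the query point.

```python
import numpy as np

def predict(params, x):
    """Toy one-hidden-layer net on a scalar input: f(x) = sum_j v_j * tanh(u_j * x + c_j)."""
    u, c, v = params
    return float(np.sum(v * np.tanh(u * x + c)))

def grad_params(params, x):
    """Gradient of f(x) with respect to all parameters, flattened as [d/du, d/dc, d/dv]."""
    u, c, v = params
    h = np.tanh(u * x + c)
    dh = v * (1.0 - h ** 2)  # derivative through tanh, scaled by the output weights
    return np.concatenate([dh * x, dh, h])

def train_and_track(x_train, y_train, x_query, lr=2e-4, steps=5000, hidden=4, seed=0):
    """Run gradient descent on squared loss; return (actual change in f(x_query),
    change predicted by summing gradient inner products along the training path)."""
    rng = np.random.default_rng(seed)
    params = [0.5 * rng.standard_normal(hidden) for _ in range(3)]
    f_start = predict(params, x_query)
    kernel_sum = 0.0
    for _ in range(steps):
        grads = np.stack([grad_params(params, xi) for xi in x_train])  # shape (n, p)
        # Derivative of 0.5 * (f(x_i) - y_i)^2 with respect to f(x_i)
        residuals = np.array([predict(params, xi) for xi in x_train]) - y_train
        # Gradient inner products between the query point and each training point
        tangent_kernel = grads @ grad_params(params, x_query)  # shape (n,)
        kernel_sum += -lr * float(residuals @ tangent_kernel)
        # Ordinary gradient descent step on the flattened parameter vector
        step = -lr * (residuals @ grads)
        du, dc, dv = np.split(step, 3)
        params = [params[0] + du, params[1] + dc, params[2] + dv]
    return predict(params, x_query) - f_start, kernel_sum

x_train = np.linspace(-1.0, 1.0, 6)
y_train = np.sin(3.0 * x_train)
direct_change, kernel_change = train_and_track(x_train, y_train, x_query=0.3)
print(direct_change, kernel_change)
```

The two printed numbers should agree up to a discretization error that shrinks with the learning rate, since the identity is exact only in the continuous-time limit. A training point whose gradient is nearly orthogonal to the query point's gradient throughout training contributes almost nothing to the sum, which is the sense in which it is "dissimilar" under the kernel.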

This result makes it very clear that without regularity imposed by the ground truth mechanism which generates the actual data (e.g., some natural process), a neural net is unlikely to perform well on an example which deviates strongly (as defined by the kernel) from all training examples. See note added at bottom for more on this point, re: AGI, etc. Given the complexity (e.g., dimensionality) of the ground truth model, one can place bounds on the amount of data required for successful training.

This formulation locates the nonlinearity of deep learning models in the kernel function. The superposition of kernels is entirely linear as long as the loss function is additive over training data.

The Machine Learning Times © 2020 • 1221 State Street • Suite 12, 91940 • Santa Barbara, CA 93190
Produced by: Rising Media & Prediction Impact