Machine Learning Times
Machine Learning Times
EXCLUSIVE HIGHLIGHTS
The Great AI Myth: These 3 Misconceptions Fuel It
 Originally published in Forbes, July 29, 2024. The hottest thing...
How to Sell a Machine Learning Project
 Originally published in Built In, February 6, 2024. Never...
The 3 Things You Need To Know About Predictive AI
 Originally published in Forbes, June 29, 2024. Some problems are...
Alphabet Uses AI To Rush First Responders To Disasters—Takeaways For Businesses
 Originally published in Forbes, July 7, 2024. The National Guard...
SHARE THIS:

3 years ago
Do Wide and Deep Networks Learn the Same Things?

 
Originally published in Google AI Blog, May 4, 2021.

A common practice to improve a neural network’s performance and tailor it to available computational resources is to adjust the architecture depth and width. Indeed, popular families of neural networks,  including  EfficientNetResNet and Transformers, consist of a set of architectures of flexible depths and widths. However, beyond the effect on accuracy, there is limited understanding of how these fundamental choices of architecture design affect the model, such as the impact on its internal representations.

In “Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary with Width and Depth”, we perform a systematic study of the similarity between wide and deep networks from the same architectural family through the lens of their hidden representations and final outputs. In very wide or very deep models, we find a characteristic block structure in their internal representations, and establish a connection between this phenomenon and model overparameterization. Comparisons across models demonstrate that those without the block structure show significant similarity between representations in corresponding layers, but those containing the block structure exhibit highly dissimilar representations. These properties of the internal representations in turn translate to systematically different errors at the class and example levels for wide and deep models when they are evaluated on the same test set.

Comparing Representation Similarity with CKA

We extended prior work on analyzing representations by leveraging our previously developed Centered Kernel Alignment (CKA) technique, which provides a robust, scalable way to determine the similarity between the representations learned by any pair of neural network layers. CKA takes as input the representations (i.e., the activation matrices) from two layers, and outputs a similarity score between 0 (not at all similar) and 1 (identical representations).

To continue reading this article, click here.

Leave a Reply