Machine Learning Times

Do Wide and Deep Networks Learn the Same Things?

 
Originally published in Google AI Blog, May 4, 2021.

A common practice to improve a neural network’s performance and tailor it to available computational resources is to adjust the architecture depth and width. Indeed, popular families of neural networks, including EfficientNet, ResNet and Transformers, consist of a set of architectures of flexible depths and widths. However, beyond the effect on accuracy, there is limited understanding of how these fundamental choices of architecture design affect the model, such as the impact on its internal representations.

In “Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary with Width and Depth”, we perform a systematic study of the similarity between wide and deep networks from the same architectural family through the lens of their hidden representations and final outputs. In very wide or very deep models, we find a characteristic block structure in their internal representations, and establish a connection between this phenomenon and model overparameterization. Comparisons across models demonstrate that those without the block structure show significant similarity between representations in corresponding layers, but those containing the block structure exhibit highly dissimilar representations. These properties of the internal representations in turn translate to systematically different errors at the class and example levels for wide and deep models when they are evaluated on the same test set.

Comparing Representation Similarity with CKA

We extended prior work on analyzing representations by leveraging our previously developed Centered Kernel Alignment (CKA) technique, which provides a robust, scalable way to determine the similarity between the representations learned by any pair of neural network layers. CKA takes as input the representations (i.e., the activation matrices) from two layers, and outputs a similarity score between 0 (not at all similar) and 1 (identical representations).
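To make the inputs and output of CKA concrete, here is a minimal sketch of the linear-kernel variant in NumPy. It is an illustration under stated assumptions, not the authors' implementation: the function name `linear_cka` and the random example data are hypothetical, and the study itself may use a different kernel or a minibatch estimator. Each activation matrix is assumed to have one row per example and one column per feature (neuron), with both layers evaluated on the same examples.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two activation matrices.

    x: array of shape (n_examples, n_features_x)
    y: array of shape (n_examples, n_features_y)
    Returns a similarity score in [0, 1].
    (Hypothetical helper for illustration only.)
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    if x.shape[0] != y.shape[0]:
        raise ValueError("both layers must be evaluated on the same examples")

    # Center every feature so the score ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)

    # Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return cross / (norm_x * norm_y)


# Usage with made-up activations: 200 examples, layers of width 16 and 8.
rng = np.random.default_rng(0)
layer_a = rng.normal(size=(200, 16))
layer_b = rng.normal(size=(200, 8))

same = linear_cka(layer_a, layer_a)        # a layer compared with itself
unrelated = linear_cka(layer_a, layer_b)   # independent random activations
print(f"self-similarity: {same:.3f}, unrelated layers: {unrelated:.3f}")
```

Comparing a layer with itself gives a score of 1, while independent random activations score close to 0. The linear variant is also unchanged by rotating or uniformly rescaling either representation, which is part of what makes it usable across layers of different widths.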

