Predicting Gene Expression With AI

data analytics, data science, DNA sequence, gene expression, Machine Learning, Predictive Analytics

Originally published in DeepMind, Oct 4, 2021.

Based on Transformers, our new Enformer architecture advances genetic research by improving the ability to predict how DNA sequence influences gene expression.

When the Human Genome Project succeeded in mapping the DNA sequence of the human genome, the international research community were excited by the opportunity to better understand the genetic instructions that influence human health and development. DNA carries the genetic information that determines everything from eye colour to susceptibility to certain diseases and disorders. The roughly 20,000 sections of DNA in the human body known as genes contain instructions about the amino acid sequence of proteins, which perform numerous essential functions in our cells. Yet these genes make up less than 2% of the genome. The remaining base pairs — which account for 98% of the 3 billion “letters” in the genome — are called “non-coding” and contain less well-understood instructions about when and where genes should be produced or expressed in the human body. At DeepMind, we believe that AI can unlock a deeper understanding of such complex domains, accelerating scientific progress and offering potential benefits to human health.

Today Nature Methods published “Effective gene expression prediction from sequence by integrating long-range interactions” (first shared as a preprint on bioRxiv), in which we, in collaboration with our Alphabet colleagues at Calico, introduce a neural network architecture called Enformer that greatly increases accuracy in predicting gene expression from DNA sequence. To advance further study of gene regulation and causal factors in diseases, we also made our model and its initial predictions of common genetic variants openly available.

Previous work on gene expression has typically used convolutional neural networks as fundamental building blocks, but their limitations in modelling the influence of distal enhancers on gene expression have hindered their accuracy and application. Our initial explorations relied on Basenji2, which could predict regulatory activity from relatively long DNA sequences of 40,000 base pairs. Motivated by this work and the knowledge that regulatory DNA elements can influence expression at greater distances, we saw the need for a fundamental architectural change to capture long sequences.
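To make the limitation of convolutions concrete, here is a toy calculation of how many stacked stride-1 convolutional layers are needed before one output position can "see" 40,000 base pairs. This is an illustrative sketch only: the kernel size of 3, the dilation-doubling schedule, and the function name `layers_needed` are assumptions chosen for the example, not the actual configuration of Basenji2 or Enformer, and real models also use pooling to widen their view.

```python
def layers_needed(kernel_size, target, doubling=False):
    """Count stacked stride-1 1D conv layers needed for the receptive
    field to reach `target` positions.

    Each layer widens the receptive field by (kernel_size - 1) * dilation.
    With doubling=True the dilation doubles at every layer (1, 2, 4, ...);
    otherwise every layer uses dilation 1.
    """
    receptive_field, layers, dilation = 1, 0, 1
    while receptive_field < target:
        receptive_field += (kernel_size - 1) * dilation
        layers += 1
        if doubling:
            dilation *= 2
    return layers


TARGET = 40_000  # sequence length quoted in the text for Basenji2

plain_layers = layers_needed(3, TARGET)                    # hypothetical kernel size 3
dilated_layers = layers_needed(3, TARGET, doubling=True)   # hypothetical dilation schedule

print(f"plain convolutions:   {plain_layers} layers")
print(f"dilated convolutions: {dilated_layers} layers")
```

Under these assumptions, plain convolutions need 20,000 layers and dilated ones need 15. In a Transformer-style self-attention layer, by contrast, every position can attend directly to every other position within a single layer, which is the property that motivates an architectural change for long-range regulatory effects.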



The Machine Learning Times © 2020 • 1221 State Street • Suite 12, 91940 • Santa Barbara, CA 93190
Produced by: Rising Media & Prediction Impact
