ESMFold: Protein Language Models Solve Folding, Too
Today, the Meta AI Protein Team announced ESMFold - a protein folding model that uses representations taken directly from a protein LM. Meta AI has been working on BERT-style protein language models for a while, e.g., its family of ESM models is currently SOTA on masked protein sequence prediction tasks.
“A key difference between ESMFold and AlphaFold2 is the use of language model representations to remove the need for explicit homologous sequences (in the form of an MSA) as input.”
To this end, the authors design a new family of protein LMs, ESM-2. ESM-2 models are much more parameter-efficient than ESM-1b: the 150M ESM-2 is on par with the 650M ESM-1b, and the 15B ESM-2 leaves all ESM-1 models far behind. On top of the pre-trained LM, ESMFold applies Folding Trunk blocks (simplified Evoformer blocks from AlphaFold2) to produce 3D structure predictions.
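To get a feel for the first stage, here is a minimal sketch of pulling per-residue representations from a pre-trained ESM-2 checkpoint via the fair-esm package (the checkpoint name and layer index are assumptions based on the released 650M model; the folding trunk would consume representations like these):

```python
# Hedged sketch: extracting per-residue ESM-2 representations with the
# fair-esm package (pip install fair-esm); checkpoint name is an assumption.
import torch
import esm

model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
model.eval()
batch_converter = alphabet.get_batch_converter()

data = [("protein1", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")]
_, _, tokens = batch_converter(data)

with torch.no_grad():
    out = model(tokens, repr_layers=[33])  # last layer of the 33-layer model
residue_reps = out["representations"][33]  # (batch, seq_len, hidden_dim)
```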
ESMFold outperforms AlphaFold2 and RoseTTAFold when given only a single-sequence input without MSAs, and it is also much faster! Check out the attached illustration with the architecture and charts.
“On a single NVIDIA V100 GPU, ESMFold makes a prediction on a protein with 384 residues in 14.2 seconds, 6X faster than a single AlphaFold2 model. On shorter sequences we see a ~60X improvement. … ESMFold can be run reasonably quickly on CPU, and an Apple M1 Macbook Pro makes the same prediction in just over 5 minutes.”
Finally, ESMFold shows remarkable scaling properties:
“We see non-linear improvements in protein structure predictions as a function of model scale, and observe a strong link between how well the language model understands a sequence (as measured by perplexity) and the structure prediction that emerges.”
Are you already converted to the church of Scale Is All You Need - AGI Is Coming? 😉
Upcoming Graph Workshops
If you are finishing a project and would like to probe your work and get a first round of reviews, consider submitting to these recently announced workshops:
- Federated Learning with Graph Data (FedGraph) @ CIKM 2022 - deadline August 15
- Trustworthy Learning on Graphs (TrustLOG) @ CIKM 2022 - deadline September 2
- New Frontiers in Graph Learning (GLFrontiers) @ NeurIPS 2022 - deadline September 15
- Symmetry and Geometry in Neural Representations (NeurReps) @ NeurIPS 2022 - deadline September 22
Graph Machine Learning @ ICML 2022
In case you missed all the ICML’22 fun, we prepared a comprehensive overview of graph papers published at the conference: 35+ papers in 10 categories:
- Generation: Denoising Diffusion Is All You Need
- Graph Transformers
- Theory and Expressive GNNs
- Spectral GNNs
- Explainable GNNs
- Graph Augmentation: Beyond Edge Dropout
- Algorithmic Reasoning and Graph Algorithms
- Knowledge Graph Reasoning
- Computational Biology: Molecular Linking, Protein Binding, Property Prediction
- Cool Graph Applications
Towards Geometric Deep Learning IV: Chemical Precursors of GNNs
In the final post of the series, Michael Bronstein covers the role of chemistry and computational chemistry in developing the mathematical concepts that were later used to create GNNs. For instance, patent offices registering new drugs needed a way to compare a new molecule with those already in the database - starting from string representations, continuing with molecular fingerprints, and finally arriving at the WL test and its modern variants.
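To make the WL idea concrete, here is a minimal sketch of 1-dimensional Weisfeiler-Leman color refinement on an adjacency-list graph: two molecules whose refined color histograms differ are certainly non-isomorphic (the converse does not hold, which is exactly where modern variants come in).

```python
from collections import Counter

def wl_colors(adj, labels, iterations=3):
    """1-WL color refinement: repeatedly hash each node's color together
    with the sorted multiset of its neighbors' colors."""
    colors = dict(labels)  # node -> initial label (e.g., atom type)
    for _ in range(iterations):
        colors = {
            v: hash((colors[v], tuple(sorted(colors[u] for u in adj[v]))))
            for v in adj
        }
    return Counter(colors.values())  # color histogram as a graph signature

# Toy example: a triangle vs. a 3-node path, all nodes labeled "C"
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}
labels = {0: "C", 1: "C", 2: "C"}
print(wl_colors(triangle, labels) == wl_colors(path, labels))  # False
```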
Geometric Deep Learning Course: 2022 Update
The go-to GDL course by Michael M. Bronstein, Joan Bruna, Taco Cohen, and Petar Veličković has just been updated! There are new materials in the introduction and in the graph transformers section, more on category theory (don’t forget your vegetables 🥦), differential geometry, and topology, as well as a new set of invited speakers covering recent hot topics from subgraph GNNs to AlphaFold 2.
Geometric DL News: 200M proteins in AlphaFold DB, Euclidean nets, Italian GDL Summer School, Diffusers
This week brought us a bunch of news and new materials:
- DeepMind announced the expansion of the AlphaFold DB to 200 million protein structures. Celebrating the one-year anniversary of the groundbreaking AlphaFold 2 release, DeepMind notes the huge success of the system among scientists worldwide - more than 500,000 researchers from 190 countries have accessed AlphaFold predictions - and sketches plans to apply the results in other areas such as drug discovery, fusion, and climate change.
- Mario Geiger (MIT) and Tess Smidt (MIT) released an updated version of the write-up on e3nn - the most popular Python library for building Euclidean neural networks and the basis for many cool new works like Steerable GNNs and SE(3)-Transformers. The write-up includes simple intuitions behind spherical harmonics, tensor products, irreducible representations, and other key building blocks - if you work on equivariant architectures, you probably do it with e3nn 😉 (see the sketch after this list)
- 🇮🇹 The First Italian School on Geometric Deep Learning released all slides and Colab notebooks on equivariance, topology, differential geometry, and other topics covered by top speakers including Michael Bronstein, Cristian Bodnar, Maurice Weiler, Pim de Haan, and Francesco Di Giovanni.
- Following the hottest 2022 trend, HuggingFace 🤗 aims to tame the wilds of diffusion models and released Diffusers 🧨, a single library for building and training diffusion models across modalities - image generation, text generation, and, of course, graph generation! A PR with GeoDiff, a SOTA molecule generation model from ICLR 2022, is already prepared 🚀
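For the e3nn item above, a minimal sketch of an equivariant tensor product between a scalar-plus-vector feature and a vector feature, with an explicit rotation-equivariance check (the irreps choices are illustrative):

```python
# Minimal e3nn sketch (pip install e3nn); irreps choices are illustrative.
import torch
from e3nn import o3

irreps_x = o3.Irreps("1x0e + 1x1o")  # one scalar + one vector per node
irreps_y = o3.Irreps("1x1o")         # one vector per node
irreps_out = o3.Irreps("1x0e + 1x1o")

tp = o3.FullyConnectedTensorProduct(irreps_x, irreps_y, irreps_out)
x, y = irreps_x.randn(8, -1), irreps_y.randn(8, -1)

# Equivariance check: rotating the inputs must equal rotating the output.
R = o3.rand_matrix()
out_of_rotated = tp(x @ irreps_x.D_from_matrix(R).T,
                    y @ irreps_y.D_from_matrix(R).T)
rotated_out = tp(x, y) @ irreps_out.D_from_matrix(R).T
assert torch.allclose(out_of_rotated, rotated_out, atol=1e-5)
```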
Sampling from Large Heterogeneous Graphs with TF-GNN
Recently, we covered the release of TensorFlow-GNN (TF-GNN), a new framework by Google for training GNNs on very large graphs that often do not fit into main memory. Today’s post is a more hands-on tutorial with concrete code examples you can try yourself 🛠️.
In the new blogpost, Brandon Mayer and Bryan Perozzi detail how to organize scalable neighborhood sampling over large heterogeneous graphs (with many node and edge types) using the OGB MAG dataset (2M nodes, 20M edges) as an example. Sampling is defined via Apache Beam configs and can fetch data directly from Google Cloud Platform through the Dataflow engine.
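As a taste of the training side, here is a minimal sketch of consuming the sampler's output as GraphTensors (the file names are placeholders; the sampling itself is configured via pbtxt specs and run on Dataflow, as the post describes):

```python
# Hedged sketch: parsing Beam-sampled subgraphs back into GraphTensors
# for training; file names are placeholders.
import tensorflow as tf
import tensorflow_gnn as tfgnn

schema = tfgnn.read_schema("mag_graph_schema.pbtxt")  # node/edge set definitions
graph_spec = tfgnn.create_graph_spec_from_schema_pb(schema)

# The sampler writes sampled subgraphs as serialized tf.Example records.
dataset = tf.data.TFRecordDataset(["sampled_subgraphs.tfrecord"])
dataset = dataset.map(lambda s: tfgnn.parse_single_example(graph_spec, s))
for graph in dataset.take(1):
    print(graph.node_sets["paper"].sizes)  # inspect one sampled subgraph
```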
KDD 2022
KDD 2022, one of the premier graph & data mining venues, will take place in Washington, DC in two weeks (Aug 14-18). As always, the published program of Research Track and Applied Data Science Track papers is full of graph papers, so check them out.
Furthermore, there will be a rich selection of workshops:
- International Workshop on Mining and Learning with Graphs (MLG) (co-located with DLG)
- Deep Learning on Graphs: Methods and Applications (DLG-KDD’22) (co-located with MLG)
- International Workshop on Knowledge Graphs: Open Knowledge Network
- International Workshop on Data Mining in Bioinformatics (BIOKDD 2022)
And even more tutorials:
- Trustworthy Graph Learning: Reliability, Explainability, and Privacy Protection (Tencent AI)
- Graph-based Representation Learning for Web-scale Recommender Systems (Twitter)
- Algorithmic Fairness on Graphs: Methods and Trends (U. Illinois at Urbana-Champaign)
- Toward Graph Minimally-Supervised Learning (Arizona State University)
- Accelerated GNN training with DGL and RAPIDS cuGraph in a Fraud Detection Workflow (NVIDIA)
- Graph Neural Networks: Foundation, Frontiers and Applications
- Temporal Graph Learning for Financial World: Algorithms, Scalability, Explainability & Fairness (MasterCard)
- Efficient Machine Learning on Large-Scale Graphs (TigerGraph)
- Frontiers of Graph Neural Networks with DIG (Texas A&M University)
- Graph Neural Networks in Life Sciences: Opportunities and Solutions (Amazon)
New Software and Library Updates
August is a notoriously quiet month without big news, but there is something new in graph software:
- Uni-Fold - a PyTorch re-implementation of AlphaFold and AlphaFold-Multimer. The authors emphasize that this is the first open-source repo for training AlphaFold-Multimer and that their AlphaFold implementation trains 2x faster than the original.
- PyKEEN 1.9 features new tools for adding textual representations to KG embedding models and brings significant speedups for NodePiece on large graphs (5M nodes / 30M edges in 10 minutes on a laptop) thanks to the METIS partitioning algorithm and GPU-accelerated BFS (see the sketch after this list).
- GRAPE - a Rust/Python library for graph processing and embedding with many compbio datasets integrated.
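For the PyKEEN item above, a hedged sketch of training a NodePiece-based model through the pipeline API (the dataset choice and hyperparameters are illustrative, not a recommended recipe):

```python
# Hedged PyKEEN sketch (pip install pykeen); settings are illustrative.
from pykeen.pipeline import pipeline

result = pipeline(
    dataset="FB15k237",   # a standard link-prediction benchmark
    model="NodePiece",    # tokenizes entities instead of a full embedding table
    training_kwargs=dict(num_epochs=20),
)
print(result.metric_results.get_metric("hits@10"))
result.save_to_directory("nodepiece_fb15k237")
```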
Recordings from the Italian School on Geometric DL and Graph ML for Visual Computing @ CVPR 2022
- The full playlist of 14 lectures from the recent Italian School on Geometric Deep Learning is now available on YouTube, featuring 10 long talks and 4 introductory lectures on Group Theory, Manifolds, Topology, and Category Theory (remember, Category Theory is your veggies 🥦 - take it regularly). Slides and Colab notebooks are already available on the website.
- All videos from the CVPR workshop on graphs in visual computing are now available, covering graph-based approaches to video understanding, 3D vision, and scene understanding.
Graphcore IPUs for GNNs are freely available on Paperspace
IPUs (Intelligence Processing Units) by UK-based Graphcore are a new type of hardware (chips and servers) tailored for AI compute - including optimized sparse matrix multiplications. Sparse operations are the main building block of GNNs but remain among the slowest operations on GPUs (which are tailored for dense matrix multiplications).
The ImageNet moment in 2012 happened thanks to the hardware lottery as well - that is when we found that GPUs are dramatically better than CPUs at training deep nets. IPUs may well be the winning hardware-lottery ticket for GNNs!
In a recent blog post, Michael Bronstein, Emanuele Rossi, and Daniel Justus report spectacular performance gains when training Temporal Graph Networks (TGN): 3-11x faster on a single IPU chip compared to an A100. IPUs also deliver strong general performance in MLPerf, the go-to benchmark for efficient training of large vision and language models.
Today, thanks to the partnership between Paperspace and Graphcore, you can try running code for free on an IPU-POD16. In addition to standard BERT, RoBERTa, and ViT models, Graphcore prepared modules with Cluster-GCN, TGN, and SchNet (a popular baseline for molecular dynamics).
You can run most PyTorch / TensorFlow code, and IPUs should natively support XLA, so it’s a good time to catch up with JAX and its GNN libraries like Jraph 😉
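If you want to try it, here is a minimal sketch of wrapping an existing PyTorch module for IPU execution with Graphcore's PopTorch (the toy model is a stand-in; the Paperspace images ship ready-made Cluster-GCN and TGN examples):

```python
# Hedged PopTorch sketch (requires Graphcore's Poplar SDK); the model is a
# stand-in for the prepared Cluster-GCN / TGN / SchNet modules.
import torch
import poptorch

model = torch.nn.Sequential(
    torch.nn.Linear(128, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 10),
)

opts = poptorch.Options()
inference_model = poptorch.inferenceModel(model, opts)  # compiles for the IPU
out = inference_model(torch.randn(32, 128))             # executes on the IPU
print(out.shape)  # torch.Size([32, 10])
```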
Proteins, Galaxies, and Robotaxis: GraphML News August’22
August is a notoriously quiet month when it comes to research and news. As folks slowly come back from vacations, we see more and more interesting articles and releases:
🧬 Meta AI released the weights of the 3B and 15B ESM-2 models - we recently covered how cool these models are and how you can predict 3D protein structure directly from frozen language model hidden states. Now you can try them on your own hardware!
💫 Yesukhei Jagvaral from the Department of Physics at CMU wrote a wonderful post, with cool graphics, on how the team uses graph GANs to model scalar and vector quantities of real galaxies. Each galaxy is a node with a set of physical features (like mass or tidal fields), and galaxies are connected via a radius nearest-neighbors construction (see the sketch after this list). The trained generative models yield good approximations of real physical properties that agree with simulations.
🚕 Zoox, a robotaxi startup, employs GNNs to model road dynamics and improve estimates of what’s happening around the car. The post is a bit vague about the exact prediction task, but we can hypothesize it has to do with vehicle dynamics (like molecular dynamics, but for cars and pedestrians).
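For the galaxy item above, the radius nearest-neighbors graph construction is easy to reproduce over 3D positions (the radius value and random coordinates are illustrative):

```python
# Minimal sketch: connect nodes (galaxies) within a fixed radius.
import numpy as np
from sklearn.neighbors import radius_neighbors_graph

positions = np.random.rand(1000, 3)  # stand-in for 3D galaxy coordinates
adjacency = radius_neighbors_graph(positions, radius=0.1, mode="connectivity")
print(adjacency.shape, adjacency.nnz)  # sparse N x N adjacency, edge count
```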
Graph ML position at The Trade Desk
An interesting position at The Trade Desk. As a researcher in the AI Lab working on graph ML, you will be part of the mission to make TTD’s ML tech stack graph-ML based. You will also have opportunities to do R&D on cutting-edge graph ML technologies and publish at top conferences, or to build innovative product PoCs that shape future product roadmaps. One day in the office per week; tech hubs in London, Madrid & Munich, or in the US!
Upcoming GraphML Venues: LoG and Stanford Graph Learning Workshop
September finally brings some fresh news and updates:
- The abstract deadline for the upcoming Learning on Graphs (LoG) conference is September 9th AoE, with two tracks: full papers and extended abstracts. LoG aims to be the premier venue for Graph ML research, so consider publishing your best stuff there.
- Stanford organizes the 2nd iteration of its Graph Learning Workshop on September 28th, covering the latest updates in PyG and cool industrial applications. In addition to Stanford speakers, there will be invited talks from NVIDIA, Intel, Meta, Google, Spotify, and Kumo.ai.
A nice relaxing event after the ICLR deadline 🙂 We will be keeping an eye on interesting ICLR submissions as well.
👃 GNNs Learn To Smell & Awesome NeurReps
1) Back in 2019, Google AI started a project on learning representations of smells. From basic chemistry we know that aromaticity depends on molecular structure, e.g., on cyclic compounds. In fact, the whole group of ”aromatic hydrocarbons” was so named because they actually have a smell (unlike many inorganic molecules). Given a molecular structure, we can employ a GNN on top of it and learn representations - that is the tl;dr of smell representation learning with GNNs.
Recently, Google AI released a new blogpost describing the next phase of the project - the Principal Odor Map, which groups molecules into “odor clusters”. The authors conducted 3 cool experiments: classifying 400 never-before-smelled molecules and comparing against the averaged ratings of a panel of human smellers; linking odor quality to fundamental biology; and probing aromatic molecules for their mosquito-repelling qualities. The GNN-based model shows very good results - now we can finally claim that GNNs can smell! Looking forward to GNNs transforming the perfume industry 📈 (a molecules-as-graphs sketch follows below)
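To make the molecules-as-graphs setup concrete, here is a minimal sketch of turning a SMILES string into a graph and running a GCN over it (the features and the 138-descriptor odor head are illustrative, not the Principal Odor Map model):

```python
# Hedged sketch: SMILES -> graph -> GCN readout; feature choices and the
# odor head are illustrative, not Google's model.
import torch
from rdkit import Chem
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv, global_mean_pool

mol = Chem.MolFromSmiles("c1ccccc1O")  # phenol
x = torch.tensor([[a.GetAtomicNum()] for a in mol.GetAtoms()], dtype=torch.float)
edges = [(b.GetBeginAtomIdx(), b.GetEndAtomIdx()) for b in mol.GetBonds()]
edge_index = torch.tensor(edges + [(j, i) for i, j in edges]).t()  # undirected
data = Data(x=x, edge_index=edge_index)

conv = GCNConv(1, 32)
h = conv(data.x, data.edge_index).relu()
graph_emb = global_mean_pool(h, torch.zeros(data.num_nodes, dtype=torch.long))
odor_logits = torch.nn.Linear(32, 138)(graph_emb)  # e.g., 138 odor descriptors
```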
2) The NeurReps community (Symmetry and Geometry in Neural Representations) is curating an Awesome List of resources and research on the geometry of representations in the brain, deep networks, and beyond. A great resource for neuroscience and Geometric DL folks to learn about the adjacent field!
Workshop: Hot Topics in Graph Neural Networks
Uni Kassel and Fraunhofer IEE organize a GNN workshop on October 25th; the announced line-up of speakers includes Fabian Jogl (TU Wien), Massimo Perini (University of Edinburgh), Hannes Stärk (MIT), Maximilian Thiessen (TU Wien), Rakshit Trivedi (Harvard), and Petar Veličković (DeepMind). Quoting the chairs:
“Find out about our current projects and follow exciting talks about new advances in Graph Neural Networks by international speakers. The work of the GAIN group addresses dynamic GNN models, the expressivity of GNN models, and their application in the power grid. Among others, the speakers will enlighten us with their work on Algorithmically-aligned GNNs, the Improvement of Message-passing, and Geometric Machine Learning for Molecules.
The public part of the event will take place on the 25th of October 2022 from 10am to 6pm. The workshop will be held in a hybrid format, but we are happy if you could come in person! To make the workshop more interactive for everyone who cannot participate in person, we have built a virtual 2D world which you can join to network with other participants!”
Upcoming NeurIPS’22 Workshops & Submission Deadlines
As NeurIPS’22 decisions are out, you might want to submit your work to some cool upcoming domain-specific graph workshops:
1. Temporal Graph Learning Workshop @ NeurIPS’22 organized by researchers from Mila and Oxford - deadline September 19th
2. New Frontiers in Graph Learning @ NeurIPS’22 organized by researchers from Stanford, Harvard, Yale, UCLA, Google Brain, and MIT - deadline September 22nd
3. Symmetry and Geometry in Neural Representations @ NeurIPS’22 organized by researchers from UC Berkeley, Institut Pasteur, ENS, and UC Santa Barbara - deadline September 22nd
4. Workshop on Graph Learning for Industrial Applications @ NeurIPS’22 organized by JP Morgan, Capital One, Bank of America, Schonfeld, Mila, IBM, Pfizer, Oxford, and FINRA - deadline September 22nd
5. Critical Assessment of Molecular ML (NeurIPS’22 side-event) organized by ELLIS units in Cambridge and Linz - deadline October 18th
If you happen to be at MICCAI in Singapore, don’t forget to attend the 4th Workshop on Graphs in Biomedical Image Analysis (GRAIL) on September 18th, organized by NVIDIA, TU Munich, and Oxford. There will be talks by Marinka Zitnik, Islem Rekik, Mark O’Donoghue, and Xavier Bresson.
📚 Weekend Reading
This week brought quite a few interesting papers and resources - we encourage you to invest some time in them:
Geometric multimodal representation learning by Yasha Ektefaie, George Dasoulas, Ayush Noori, Maha Farhat, and Marinka Zitnik. A survey of 100+ papers on graphs combined with other modalities and a framework of multi-modal approaches for natural sciences like physical interaction, molecular reasoning, and protein modeling.
Clifford Neural Layers for PDE Modeling by Johannes Brandstetter, Rianne van den Berg, Max Welling, and Jayesh K. Gupta. If you thought you knew all the basics from the Geometric Deep Learning course - here is something more challenging. The authors bring ideas from geometric algebra into ML tasks, namely Clifford algebras, which unify scalars, vectors, complex numbers, and quaternions, and add primitives for plane and volume segments. The paper gives a great primer on the math and applications. You can also watch a very visual YouTube lecture on geometric algebras.
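A minimal sketch of the core operation using the clifford package (the 3D algebra here is just the simplest illustration of the building blocks the paper works with):

```python
# Hedged sketch with the `clifford` package (pip install clifford): the
# geometric product of two 3D vectors = a scalar plus a bivector (a plane).
import clifford

layout, blades = clifford.Cl(3)      # Clifford algebra of R^3
e1, e2 = blades["e1"], blades["e2"]

u = 2 * e1 + 1 * e2
v = 1 * e1 + 3 * e2
print(u * v)                          # scalar part <u,v> + bivector part
print(u * v == (u | v) + (u ^ v))     # inner + outer product decomposition
```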
Categories for AI (Cats4AI) - an upcoming open course on Category Theory created by Andrew Dudzik, Bruno Gavranović, João Guilherme Araújo, Petar Veličković, and Pim de Haan. “This course is aimed towards machine learning researchers, but approachable to anyone with a basic understanding of linear algebra and differential calculus. The material is self-contained and all the necessary background will be introduced along the way.” Don’t forget your veggies 🥦
TorchProtein & PEER Protein Sequence Benchmark Release
MilaGraph released TorchProtein, a new version of TorchDrug equipped with a suite of tools for protein sequence understanding. Quoting the authors:
“TorchProtein encapsulates many complicated yet repetitive subroutines into functional modules, including widely-used datasets, flexible data processing operations, advanced encoding models, and diverse protein tasks.
With TorchProtein, we can rapidly prototype machine learning solutions to various protein applications within 20 lines of code, and conduct ablation studies by substituting different parts of the solution with off-the-shelf modules. Furthermore, we can easily adapt these modules to our own needs, and make systematic analyses by comparing the new results to a benchmark provided in the library.”
Simultaneously, the authors present PEER: A Comprehensive and Multi-Task Benchmark for Protein Sequence Understanding - 17 protein understanding tasks grouped into 5 categories (function prediction, localization prediction, structure prediction, protein-protein interaction prediction, and protein-ligand interaction prediction), already available in TorchProtein. ProtBert and ESM-1b have been probed on PEER (and ESM-2 is expected to arrive as well).
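In the spirit of the quoted “20 lines of code”, here is a hedged sketch of the TorchDrug-style training loop on one of the PEER function-prediction tasks (the dataset, model, and hyperparameters are illustrative, not an official recipe):

```python
# Hedged TorchDrug/TorchProtein-style sketch; dataset, model, and
# hyperparameters are illustrative, not an official recipe.
import torch
from torchdrug import core, datasets, models, tasks

dataset = datasets.BetaLactamase("~/protein-datasets/")  # a PEER regression task
train_set, valid_set, test_set = dataset.split()

model = models.ProteinCNN(input_dim=21, hidden_dims=[512, 512],
                          kernel_size=5, padding=2)
task = tasks.PropertyPrediction(model, task=dataset.tasks,
                                criterion="mse", metric=("spearmanr",))

optimizer = torch.optim.Adam(task.parameters(), lr=1e-4)
solver = core.Engine(task, train_set, valid_set, test_set,
                     optimizer, batch_size=64)
solver.train(num_epoch=10)
solver.evaluate("valid")
```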
GraphML News: PyG + NVIDIA, Breakthrough Prize
🚀 PyG announced the release of pyg-lib, the result of a collaboration with NVIDIA on speeding up the most important PyG operations. It is a low-level GNN library that integrates cuGraph, cuDF, and CUTLASS to improve the speed of matrix multiplications and graph sampling (a common bottleneck when working with large graphs). The reported speedups are pretty astounding - up to 150x for sampling on a GPU. There will be more exciting news about PyG at the upcoming Stanford Graph Learning Workshop!
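On the user side nothing changes: the standard PyG neighbor-sampling path below is exactly the hot path that pyg-lib accelerates when installed (the random graph and fan-out values are illustrative):

```python
# Minimal sketch of PyG neighbor sampling; pyg-lib speeds up this path
# transparently when installed. Graph and fan-outs are illustrative.
import torch
from torch_geometric.data import Data
from torch_geometric.loader import NeighborLoader

data = Data(
    x=torch.randn(10_000, 64),
    edge_index=torch.randint(0, 10_000, (2, 200_000)),
)
loader = NeighborLoader(data, num_neighbors=[10, 5],  # two-hop fan-outs
                        batch_size=1024, shuffle=True)
batch = next(iter(loader))  # sampled subgraph around 1024 seed nodes
print(batch.num_nodes, batch.num_edges)
```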
👏 The Breakthrough Prize (known as the “Oscars of Science”) announced its winners in life sciences, mathematics, and physics - graph and geometry areas are well represented!
- John Jumper (DeepMind) and Demis Hassabis (DeepMind) received the Life Sciences prize for AlphaFold
- Daniel A. Spielman (Yale University) received the Math prize for contributions to spectral graph theory, the Kadison-Singer problem, optimization, and coding theory
- Ronen Eldan (Weizmann Institute of Science and Microsoft Research) received the New Horizons in Mathematics Prize for advancing high-dimensional geometry and probability including the KLS conjecture
- Vera Traub (PhD from the University of Bonn, 2020) received the Maryam Mirzakhani New Frontiers Prize for advances in approximation algorithms for classical combinatorial optimization problems, including the traveling salesman problem and network design.
ICLR 2023 Submissions
The list of submissions to the top AI venue is available on OpenReview (with full-text PDFs). There are 6000+ submissions this year (a 3x growth from 2000+ last year); we will be keeping an eye on cool Graph ML submissions and will prepare an overview. Enjoy the weekend reading - and checking whether someone has scooped the project you’ve been working on for the last months/years 😉
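If you’d rather query the submissions programmatically, here is a hedged sketch with the openreview-py client (the invitation ID follows the usual ICLR pattern but is an assumption):

```python
# Hedged sketch with openreview-py (pip install openreview-py); the
# invitation ID follows the usual ICLR pattern but is an assumption.
import openreview

client = openreview.Client(baseurl="https://api.openreview.net")
notes = client.get_all_notes(
    invitation="ICLR.cc/2023/Conference/-/Blind_Submission")
print(len(notes))               # total number of visible submissions
for note in notes[:5]:
    print(note.content["title"])  # peek at a few titles
```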