Friday News: LoG Accepted Papers, NeurIPS
The inaugural edition of the Learning on Graphs (LoG) conference announced accepted papers, extended abstracts, and spotlights - the acceptance rate this year was quite competitive (<25%), but we have heard multiple times that the quality of reviews is on average higher than at other big conferences. Is it the impact of the $$$ rewards for the best reviewers?
Tech companies summarize their presence at NeurIPS’22, which starts next week: have a look at works from DeepMind, Amazon, Microsoft, and the Graph ML team from Google.
A new blog post by Petar Veličković and Fabian Fuchs on the universality of neural networks on sets and graphs - the authors identify a direct link between permutation-invariant DeepSets and permutation-invariant aggregations in GNNs like GIN. However, when it comes to multisets (such as nodes sending exactly the same message), the theory demands more capacity: given a multiset of n elements, the width of the encoder should be at least n - recall that PNA postulates that n aggregators are necessary. Nice read with references!
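To make the width argument concrete, here is a minimal PyTorch sketch (our illustration, not the authors' code; dimensions are arbitrary) contrasting a sum-pooled DeepSets encoder with a PNA-style multi-aggregator readout:

```python
import torch
import torch.nn as nn

class DeepSetsEncoder(nn.Module):
    """Permutation-invariant encoder rho(sum_i phi(x_i)). Per the theory
    discussed in the post, phi's width must be at least n to universally
    represent multisets of n elements."""
    def __init__(self, in_dim, latent_dim, out_dim):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(in_dim, latent_dim), nn.ReLU())
        self.rho = nn.Linear(latent_dim, out_dim)

    def forward(self, x):  # x: (n, in_dim), row order must not matter
        return self.rho(self.phi(x).sum(dim=0))

def pna_style_readout(x):
    """PNA instead concatenates several aggregators (mean/max/min/std)
    rather than relying on a single sum."""
    return torch.cat([x.mean(0), x.max(0).values, x.min(0).values, x.std(0)])

x = torch.randn(5, 16)                              # a multiset of n = 5 elements
enc = DeepSetsEncoder(16, latent_dim=5, out_dim=8)  # latent width >= n
print(enc(x).shape, pna_style_readout(x).shape)
```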
Denoising Diffusion Is All You Need in Graph ML? - Now on Medium
We just published the extended version of the posts on diffusion models on Medium, with a more spelled-out intro and newly generated images by Stable Diffusion! A good option to pass the time if you are on your way to New Orleans and NeurIPS.
GPS++ (OGB LSC’22 Winner) is Available on IPUs
GPS++, the model by Graphcore, Mila, and Valence Discovery that won the OGB Large-Scale Challenge 2022 in the PCQM4Mv2 track (graph regression), is now publicly available on Paperspace with simple training and inference examples in Jupyter notebooks. You can try it on powerful IPUs — custom chips and servers built by Graphcore and optimized for sparse operations. Raw checkpoints are also available in the official GitHub repo.
Friday News: PyG 2.2 and Protein Diffusion Models
For those who are at NeurIPS 2022, Saturday and Sunday are workshop days, with workshops on graph learning, structural biology, physics, and materials discovery. Apart from that:
The PyG team has finally released PyG 2.2.0, the first version to feature the super-optimized pyg-lib that speeds up GNNs and sampling on both CPUs and GPUs (sometimes up to 20x!). The 2.2 update also includes the new FeatureStore and GraphStore interfaces, with which you can set up communication with large databases and graphs too big to store in memory. Time to update your envs ⏰
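As a quick taste of the accelerated sampling path (a minimal sketch using the standard PyG API; pyg-lib is picked up automatically when installed):

```python
from torch_geometric.datasets import Planetoid
from torch_geometric.loader import NeighborLoader

# Neighbor sampling is one of the code paths sped up by pyg-lib.
data = Planetoid(root='data/Cora', name='Cora')[0]
loader = NeighborLoader(
    data,
    num_neighbors=[10, 10],  # fan-out per hop
    batch_size=128,
    shuffle=True,
)
batch = next(iter(loader))   # a sampled subgraph around 128 seed nodes
print(batch)
```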
Generate Biomedicines releases Chroma, an equivariant conditional diffusion model for generating proteins. The conditional part is particularly cool, as we usually want to generate proteins with certain properties and functions - Chroma allows imposing functional and geometric constraints, and even using natural language queries like “Generate a protein with a CHAD domain”, thanks to a small GPT-Neo trained on protein captioning. The 80-page paper is on the website, and you can have a look at the thread by Andrew Beam.
Simultaneously, the Baker Lab releases RoseTTAFold Diffusion (RFdiffusion), packed with similar functionality and also allowing text prompts like “Generate a protein that binds to X”. Check out the Twitter thread by Joseph Watson, the first author. The 70-page preprint is available, so here is your casual weekend reading of two papers 🙂
Weisfeiler and Leman Go Relational
by Pablo Barcelo (PUC Chile & IMFD & CENIA Chile), Mikhail Galkin (Mila), Christopher Morris (RWTH Aachen), Miguel Romero Orth (Universidad Adolfo Ibáñez & CENIA Chile)
arXiv
Multi-relational graphs have been surprisingly neglected by the GNN theory community for quite a while. In our fresh LoG 2022 paper, we bridge this gap and propose Relational WL (RWL), an extension of the classical Weisfeiler-Leman test to multi-relational graphs (such as molecular graphs or knowledge graphs).
We prove several important theorems:
1) 1-RWL is strictly more powerful than 1-WL;
2) R-GCN and CompGCN, common multi-relational GNNs, are bounded by 1-RWL;
3) R-GCN and CompGCN are in fact equally expressive.
An even more interesting finding is that the most expressive message functions should capture vector scaling, e.g., multiplication or circular correlation. This result gives a solid foundation to GINE with a multiplicative message function (one of the most popular GNN encoders for molecular graphs) and to CompGCN with DistMult.
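To illustrate the kind of message function the theory favors, here is a toy sketch (our illustration) of a DistMult-style multiplicative message, where the relation embedding scales the neighbor state element-wise:

```python
import torch

def multiplicative_message(h_u: torch.Tensor, rel_emb: torch.Tensor) -> torch.Tensor:
    """DistMult/CompGCN-style message: the relation vector scales the
    neighbor state element-wise - an instance of the 'vector scaling'
    the paper identifies as key for expressiveness."""
    return h_u * rel_emb

h_u = torch.randn(64)   # state of neighbor u
rel = torch.randn(64)   # embedding of the edge's relation type
msg = multiplicative_message(h_u, rel)
```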
Based on the theory of homogeneous higher-order GNNs, we show there exist higher-order relational networks, k-RNs, that are more expressive than 1-RWL. Similarly to local k-GNNs, there exist approximations that reduce their computational complexity.
---
Now we have theoretical mechanisms to explain the expressiveness of relational GNNs! In the next post, we’ll check which other places Weisfeiler and Leman visited in 2022 and what came out of their trips 🚠
LoG 2022 In-Person Meetups
LoG 2022, the new conference on graph learning, starts this Friday! It is fully remote, but the Graph ML community all over the globe is organizing local in-person meetups you might want to join:
- Cambridge meetup
- Würzburg meetup of the DACH area
- Boston area meetup at MIT
- Montreal meetup at Mila
- Paris meetup at CentraleSupélec (Paris-Saclay)
Let us know if you organize a meetup in your area and we’ll update the post.
LoG 2022 and News
The Learning on Graphs conference started on Friday - join the talks, poster sessions, and the recently announced tutorials on Saturday and Sunday!
Other news of the week:
DeepMind finally announced A Generalist Neural Algorithmic Learner (spotlight at LoG 2022) - an approach to training a single GNN processor network that can solve 30 diverse algorithmic reasoning tasks from the CLRS-30 benchmark. Don’t miss the 3-hour tutorial on Saturday.
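If you want to poke at the benchmark yourself, the CLRS repo exposes a dataset builder roughly like this (a sketch following the repo’s README; double-check the exact arguments there):

```python
import clrs  # pip install dm-clrs (per the repo README)

# Build the BFS task of CLRS-30; samples carry inputs, step-by-step
# execution hints, and target outputs.
train_ds, num_samples, spec = clrs.create_dataset(
    folder='/tmp/CLRS30', algorithm='bfs', split='train', batch_size=32)

for feedback in train_ds.as_numpy_iterator():
    # feedback.features holds inputs/hints; feedback.outputs the targets
    break
```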
PDEArena by Microsoft Research is a PDE surrogate benchmarking framework with 20 models and 4 datasets on fluid dynamics and electrodynamics. Time to flex your PDE solvers 💪
The Open Catalyst team released AdsorbML: ML-based potentials deliver a whopping 1300x speedup over DFT while retaining 85% accuracy (and 4000x while retaining 75%) in identifying low-energy adsorbate-surface configurations.
CASP 15 - MSAs Strike Back
CASP (Critical Assessment of Techniques for Protein Structure Prediction) is a biennial challenge on protein structure modeling. In 2020, AlphaFold 2 revolutionized the field by winning the CASP 14 challenge by a huge margin using geometric deep learning. This weekend, the results of CASP 15 were announced - what do we see after glancing through the abstracts?
In short, multiple sequence alignments (MSAs) are not going anywhere and remain the main component of winning approaches. Most of the top models are based on AlphaFold 2 (and its Multimer version) with many tweaks here and there. Protein LM-based folding like ESMFold (popular for not needing MSAs) seems to be far from the top. More reflections from Ezgi Karaca and Sergey Ovchinnikov.
Friday GraphML News
Not much news this week - it seems the community went for a break after back-to-back NeurIPS and LoG. A few things came to our attention:
- IPAM organizes a workshop on Learning and Emergence in Molecular Systems at UCLA on Jan 23-27, with invited speakers including Xavier Bresson, Kyunghyun Cho, Bruno Correia, Tommi Jaakkola, Frank Noé, Tess Smidt, and Max Welling
- Recordings of LoG keynotes and orals have been published on YouTube; recordings of workshops and tutorials are expected soon
Xmas Papers: Molecule Editing from Text, Protein Generation
It's the holiday season 🎄, and what better way to spend it than reading some new papers on molecule and protein generation! Here are a few cool papers published on arXiv this week:
Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing by Shengchao Liu and the Mila/NVIDIA team proposes MoleculeSTM, a CLIP-like text-to-molecule model. MoleculeSTM can do 2 impressive things: (1) retrieve molecules from a text description like “triazole derivatives”, and retrieve a text description for a given molecule in SMILES; (2) edit molecules from text prompts like “make the molecule soluble in water with low permeability” - the model edits the molecular graph according to the description, mindblowing 🤯
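For intuition on the CLIP-like part, here is a generic sketch (our illustration, not the MoleculeSTM code) of the symmetric contrastive objective aligning molecule and text embeddings:

```python
import torch
import torch.nn.functional as F

def clip_style_loss(mol_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE: matched molecule-text pairs lie on the diagonal
    of the similarity matrix and get pulled together."""
    mol = F.normalize(mol_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = mol @ txt.t() / temperature  # (B, B) cosine similarities
    labels = torch.arange(mol.size(0))    # diagonal entries are positives
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2

loss = clip_style_loss(torch.randn(8, 256), torch.randn(8, 256))
```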
Protein Sequence and Structure Co-Design with Equivariant Translation by Chence Shi and the Mila team proposes ProtSEED, a model that generates protein sequence and structure simultaneously (most existing diffusion models for proteins can do only one of those at a time). ProtSEED can be conditioned on residue features or pairs of residues. Model-wise, it is an equivariant iterative model (AlphaFold 2 vibes) with improved triangular attention. ProtSEED was evaluated on antibody CDR co-design, protein sequence-structure co-design, and fixed-backbone sequence design.
And 2 more papers from the ESM team at Meta AI and the Baker Lab (check the Twitter thread by Alex Rives for more details)!
Language models generalize beyond natural proteins by Robert Verkuil et al. finds that ESM-2 can generate de novo protein sequences that can actually be synthesized in the lab and, more importantly, have no match among known natural proteins. A great result, knowing that ESM-2 was trained only on sequences!
A high-level programming language for generative protein design by Brian Hie et al. proposes pretty much a new programming language for protein designers (think of it as a query language for ESMFold) - production rules organized in a syntax tree with constraint functions. Each program is then “compiled” into an energy function that governs the generative process.
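Conceptually (a toy sketch of the idea as described, not the paper's implementation), a “program” composes constraint functions into one energy:

```python
def compile_program(constraints, weights):
    """Toy 'compiler': a protein-design program is a weighted set of
    constraint functions combined into a single energy that the
    generative process then minimizes."""
    def energy(structure):
        return sum(w * c(structure) for c, w in zip(constraints, weights))
    return energy

# Hypothetical constraints scoring a candidate structure.
symmetry = lambda s: 0.0     # placeholder: deviation from a target symmetry
globularity = lambda s: 0.0  # placeholder: radius-of-gyration term
energy_fn = compile_program([symmetry, globularity], weights=[1.0, 0.5])
```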
🎄 It's 2023! In a new post, we provide an overview of what happened in Graph ML in 2022 across its subfields (and hypothesize about potential breakthroughs in 2023), including Generative Models, Physics, PDEs, Graph Transformers, Theory, KGs, Algorithmic Reasoning, Hardware, and more!
Brought to you by Michael Galkin, Hongyu Ren, Zhaocheng Zhu with the help of Christopher Morris and Johannes Brandstetter
https://mgalkin.medium.com/graph-ml-in-2023-the-state-of-affairs-1ba920cb9232
New End of the Year Blog Posts
In the first week of the new year, many researchers summarize their thoughts about the past and the future. In addition to our previous post reflecting on Graph ML in 2022 and 2023, a few new ones have appeared:
1. AI in Drug Discovery 2022 by Pat Walters (Relay Therapeutics) on the most inspiring papers in molecular and protein ML.
2. The Batch #177 includes predictions for 2023 by Yoshua Bengio (on reasoning), Alon Halevy (on personal data treatment), Douwe Kiela (on practical aspects of LLMs), Been Kim (on interpretability), and Reza Zadeh (on active learning)
3. Using Graph Learning for Personalization: How GNNs Solve Inherent Structural Issues with Recommender Systems by Dylan Sandfelder and Ivaylo Bahtchevanov (kumo.ai) - on applying GNNs in RecSys with examples from Spotify, Pinterest and UberEats.
4. Top Language AI research papers by Yi Tay (Google) - on large language models, the forefront of AI that does have an impact on Graph ML (remember protein language models like ESM-2 and ESMFold, for instance).
ICLR 2023 Workshops
The list of workshops at the upcoming ICLR’23 has been announced! A broad Graph ML audience might be interested in:
- From Molecules to Materials: ICLR 2023 Workshop on Machine learning for materials (ML4Materials)
- Machine Learning for Drug Discovery (MLDD)
- Neurosymbolic Generative Models (NeSy-GeMs)
- Physics for Machine Learning
- Deep Learning for Code (DL4C)
Validated de novo generated antibodies & AI4Science talks
- Absci announced that their de novo, zero-shot generated therapeutic antibodies were validated in the wet lab. The preprint is scarce on technical details, but what we can infer is that they combine a number of new geometric generative models with fast screening pipelines.
- A new series of talks on AI4Science is starting next week! The inaugural talk will be delivered by Simon Batzner (Harvard) on “E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials”
Friday News
- No big protein / molecule diffusion model announcement this week 🤨
- ELLIS starts a new program, “Machine Learning for Molecule Discovery”, that aims to improve computational molecular science across a multitude of applications (e.g., our favourite generative diffusion models). ELLIS is a pan-European AI network of excellence focused on fundamental science, technical innovation, and societal impact. According to José Miguel Hernández-Lobato (Cambridge): “ELLIS Programs focus on high-impact problem areas that have the potential to move the needle in modern AI. Each Program has a budget for 2-3 workshops each year to enable meetings of ELLIS Fellows plus guests for intensive scientific exchange”.
- Helmholtz Munich organizes the 1st International Symposium on AI for Health, to be held on Jan 16. Keynotes will be delivered by Mackenzie Mathis (EPFL), Mihaela van der Schaar (Cambridge), and Marinka Zitnik (Harvard).
- A growth-focused ML event with leaders from Whatnot and Kumo.AI on Feb 1st, featuring Jure Leskovec discussing SOTA GNNs and where the ML community is heading.
Temporal Graph Learning in 2023
A new blog post by Andy Huang, Emanuele Rossi, Michael Galkin, and Kellin Pelrine on recent progress in temporal Graph ML! It features theoretical advances in understanding the expressive power of temporal GNNs, discusses evaluation protocols and trustworthiness concerns, looks at temporal KGs, disease modeling, and anomaly detection, and points to software libraries and new datasets!
Friday Graph ML News: Ankh Protein LM, Deadlines, and New Blogs
If you didn’t manage to polish your submission for ICML or IJCAI, consider other upcoming submission deadlines:
- Deep Learning for Graphs @ International Joint Conference on Neural Networks: Jan 31st
- Special Issue on Graph Learning @ IEEE Transactions on Neural Networks and Learning Systems: March 1st
- Graph Representation Learning track @ European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning: May 2nd
The Summer Geometry Initiative (MIT) is a six-week paid summer research program introducing undergraduate and graduate students to the field of geometry processing - no prior experience required; apply by February 15th.
New articles and blogs about graphs and more general deep learning:
- Quanta Magazine published a fascinating article on the discovery of a new shortest-path algorithm for graphs with negative edge weights;
- Kexin Huang prepared a post explaining the variety of datasets available in the Therapeutics Data Commons, from drug-target interactions to retrosynthesis and predicting CRISPR editing outcomes
- Tim Dettmers updated his annual report on the most cost-efficient GPUs with new data on the H100. In raw performance, if you don’t have H100s, get an RTX 4090; in performance per dollar, the 4070 Ti is surprisingly near the top.
- Google published the Deep Learning Tuning Playbook - a collection of tuning advice that will help you squeeze out that extra 1% of performance and reach top-1 on OGB!
- Finally, a huge post from Lilian Weng on optimizing inference of large Transformers
☥ This week we do see a new big model: meet Ankh, a protein LM! Thanks to recent observations on the importance of data size and training compute vs model size, the 1.5B Ankh often outperforms the 15B ESM-2 on contact prediction, structure prediction, and a good bunch of protein representation learning tasks. The arXiv preprint is available as well!
Friday Graph ML News: ProGen, ClimaX, WebConf Workshops, Everything is Connected
No week without new foundation models!
A collaboration of researchers from Salesforce, UCSF, and Berkeley announced ProGen, an LLM for protein sequence generation. Claimed to be a “ChatGPT for proteins”, ProGen is a 1.2B-parameter model trained on 280M sequences, controllable by input tags, e.g., “Protein Family: Pfam ID PF16754, Pesticin”. The authors synthesized a handful of generated proteins in the lab to confirm the model’s quality.
In the Graph ML’23 State of Affairs we highlighted the weather prediction models GraphCast (from DeepMind) and Pangu-Weather (from Huawei). This week, Microsoft Research and UCLA announced ClimaX, a foundation model for climate and weather that can serve as a backbone for many downstream applications. In contrast to the nowcasting GraphCast and Pangu-Weather, ClimaX is tailored for longer-range predictions up to a month ahead. ClimaX is a ViT-based image-to-image model with several tokenization and representation novelties to account for different input granularities and sequence lengths - check out the full preprint for more details.
Petar Veličković published the opinion paper Everything is Connected: Graph Neural Networks, framing many ML applications through the lens of graph representation learning. The article gives a gentle introduction to the basics of GNNs and their applications, including geometric equivariant models. Nice read!
The WebConf’23 (April 30 - May 4) announced the accepted workshops, including a handful of Graph ML venues:
- Graph Neural Networks: Foundation, Frontiers and Applications
- Mining of Real-world Hypergraphs: Patterns, Tools, and Generators
- Graph Neural Networks for Tabular Data Learning
- Continual Graph Learning
- Towards Out-of-Distribution Generalization on Graphs
- Self-supervised Learning and Pre-training on Graphs
- When Sparse Meets Dense: Learning Advanced Graph Neural Networks with DGL-Sparse Package
On the Expressive Power of Geometric Graph Neural Networks
Geometric GNNs are an emerging class of GNNs for spatially embedded graphs across science and engineering, e.g. SchNet for molecules, Tensor Field Networks for materials, GemNet for electrocatalysts, MACE for molecular dynamics, and E(n)-Equivariant Graph ConvNet for macromolecules.
How powerful are geometric GNNs? How do key design choices influence expressivity, and how can we build maximally powerful ones?
Check out this recent paper from Chaitanya K. Joshi, Cristian Bodnar, Simon V. Mathis, Taco Cohen, and Pietro Liò for more:
📄 PDF: http://arxiv.org/abs/2301.09308
💻 Code: http://github.com/chaitjo/geometric-gnn-dojo
❓Research gap: Standard theoretical tools for GNNs, such as the Weisfeiler-Leman graph isomorphism test, are inapplicable to geometric graphs. This is due to additional physical symmetries (rotations and translations) that need to be accounted for.
💡Key idea: a notion of geometric graph isomorphism + a new Geometric WL framework --> an upper bound on geometric GNN expressivity.
The Geometric WL framework formalises the role of depth, invariance vs. equivariance, and body ordering in geometric GNNs.
- Invariant GNNs cannot tell apart one-hop identical geometric graphs and fail to compute global properties.
- Equivariant GNNs distinguish more graphs; how? Depth propagates local geometry beyond one hop.
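A toy sketch of the distinction (our illustration): invariant layers consume only scalars such as pairwise distances, while equivariant layers also carry vector features that rotate with the input:

```python
import torch

def invariant_message(h_u, pos_u, pos_v):
    """Invariant: only the scalar distance enters the message, so local
    geometric detail beyond scalars is discarded."""
    dist = (pos_u - pos_v).norm()
    return h_u * dist

def equivariant_message(h_u, vec_u, pos_u, pos_v):
    """Equivariant: additionally propagates a vector feature that rotates
    with the input, letting depth carry geometry beyond one hop."""
    rel = pos_u - pos_v
    return h_u * rel.norm(), vec_u + rel * h_u.mean()

h_u, vec_u = torch.randn(16), torch.randn(3)
pos_u, pos_v = torch.randn(3), torch.randn(3)
scalars, vectors = equivariant_message(h_u, vec_u, pos_u, pos_v)
```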
What about practical implications? Synthetic experiments highlight challenges in building maximally powerful geometric GNNs:
- Oversquashing of geometric information with increased depth.
- Utility of higher-order spherical tensors over Cartesian vectors.
P.S. Are you new to Geometric GNNs, GDL, PyTorch Geometric, etc.? Want to understand how theory/equations connect to real code?
Try this Geometric GNN 101 notebook before diving in:
https://github.com/chaitjo/geometric-gnn-dojo/blob/main/geometric_gnn_101.ipynb
Friday Graph ML News: Blogs, ICLR Acceptances, and Software Releases
No big protein / molecule diffusion model announcement this week 🤨
Still, a handful of nice blog posts!
Graph Machine Learning Explainability with PyG by Blaž Stojanovič and the PyG team is a massive tutorial on GNN explainability tools in PyG, with datasets, code examples, and metrics. A must-read for all explainability studies.
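As a teaser of the API covered in the tutorial (a minimal sketch; assumes you already have a trained node-classification `model` and a `data` object):

```python
from torch_geometric.explain import Explainer, GNNExplainer

explainer = Explainer(
    model=model,                          # your trained GNN (assumed)
    algorithm=GNNExplainer(epochs=200),
    explanation_type='model',
    node_mask_type='attributes',
    edge_mask_type='object',
    model_config=dict(mode='multiclass_classification',
                      task_level='node',
                      return_type='log_probs'),
)
explanation = explainer(data.x, data.edge_index, index=10)  # explain node 10
print(explanation.edge_mask.shape)
```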
Unleashing ML Innovation at Spotify with Ray talks about Ray and its application to running A/B tests of the GNN-based recommender system for Spotify's home page.
🤓 ICLR accepted papers are now available, with distinctions into top-5%, top-25%, and posters. Stay tuned for the review of ICLR graph papers! Meanwhile, have a look at some hot new papers posted on arXiv:
- WL meet VC by Christopher Morris, Floris Geerts, Jan Tönshoff, and Martin Grohe - the first work to connect the WL test with the VC dimension of GNNs, with provable bounds!
- Curvature Filtrations for Graph Generative Model Evaluation by Joshua Southern, Jeremy Wayland, Michael Bronstein, and Bastian Rieck - a novel look at evaluating graph generative models using concepts from topological data analysis
🛠️ New software releases this week!
- DGL v1.0 - finally, the major 1.0 release, featuring a new sparse backend (see the sketch below)
- PyKEEN v1.10 - still the best library to work with KG embeddings
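The new sparse API looks roughly like this (a sketch based on the DGL 1.0 release materials; check the docs for exact constructor names):

```python
import torch
import dgl.sparse as dglsp  # new sparse backend in DGL 1.0

# A 3x3 sparse adjacency from COO indices; SpMM in a single operator.
indices = torch.tensor([[0, 1, 1],   # row ids
                        [1, 0, 2]])  # col ids
A = dglsp.spmatrix(indices, shape=(3, 3))
X = torch.randn(3, 8)                # dense node features
H = A @ X                            # sparse-dense matrix multiplication
```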
HuggingFace enters GraphML
After unifying vision and language models and datasets under one roof, 🤗 comes for Graph ML! Today, Datasets started hosting graph datasets, including the OGB ones, ZINC, CSL, and others from Benchmarking GNNs, as well as MD17 for molecular dynamics. Let’s see what HF does next with GNNs.
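Loading one of them works like any other HF dataset (a minimal sketch; the `graphs-datasets/ZINC` repo id is our assumption based on the hub organization):

```python
from datasets import load_dataset

# Each example stores the edge index, node features, and the target,
# ready to be converted into PyG/DGL graph objects.
zinc = load_dataset("graphs-datasets/ZINC")
print(zinc["train"][0].keys())
```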