Graph Machine Learning
Everything about graph theory, computer science, machine learning, etc.


If you have something worth sharing with the community, reach out to @gimmeblues or @chaitjo.

Admins: Sergey Ivanov; Michael Galkin; Chaitanya K. Joshi
ICML 2022 - Graph Workshops

ICML starts today with a full week of tutorials, main talks, and workshops. While we are preparing a blog post about interesting graph papers, you can already check the contents of the graph and graph-adjacent workshops to be held on Friday and Saturday.

- Topology, Algebra, and Geometry in Machine Learning (TAG in ML)
- Knowledge Retrieval and Language Models (KRLM)
- Beyond Bayes: Paths Towards Universal Reasoning Systems
- Machine Learning in Computational Design
Origins of Geometric Deep Learning - Part 2 and 3

A while ago we referenced the first article of the series on the Origins of Geometric DL by Michael Bronstein. Recently, the series got new episodes: Part 2 covers the high hopes for the perceptron, the curse of dimensionality, and the first AI winters. Part 3 introduces the first architectures with baked-in geometric priors - the neocognitron (a precursor of convnets) and convolutional neural networks.

As always, Michael did a great and meticulous job of finding original references and adding some comments to them - often the references section is as interesting and informative as the main text! 🍿
ESMFold: Protein Language Models Solve Folding, Too

Today, the Meta AI Protein Team announced ESMFold - a protein folding model that uses representations taken directly from a protein LM. Meta AI has been working on BERT-style protein language models for a while; their family of ESM models is currently SOTA on masked protein sequence prediction tasks.

“A key difference between ESMFold and AlphaFold2 is the use of language model representations to remove the need for explicit homologous sequences (in the form of an MSA) as input.”

To this end, the authors design ESM-2, a new family of protein LMs. ESM-2 models are much more parameter-efficient than ESM-1b: the 150M ESM-2 is on par with the 650M ESM-1b, and the 15B ESM-2 leaves all ESM-1 models far behind. On top of the pre-trained LM, ESMFold applies Folding Trunk blocks (simplified Evoformer blocks from AlphaFold 2) to yield 3D structure predictions.

ESMFold outperforms AlphaFold and RoseTTAFold when given only a single-sequence input without MSAs, and it is also much faster! Check out the attached illustration with the architecture and charts.

“On a single NVIDIA V100 GPU, ESMFold makes a prediction on a protein with 384 residues in 14.2 seconds, 6X faster than a single AlphaFold2 model. On shorter sequences we see a ~60X improvement. … ESMFold can be run reasonably quickly on CPU, and an Apple M1 Macbook Pro makes the same prediction in just over 5 minutes.”
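If you want to poke at it yourself once the code lands, here is a minimal inference sketch based on the fair-esm package - treat the exact entry point (esm.pretrained.esmfold_v1) and the toy sequence as assumptions to check against the official release:

```python
# A hedged ESMFold inference sketch, assuming fair-esm exposes the model
# as in its README; weights are downloaded on the first call.
import torch
import esm  # pip install fair-esm

model = esm.pretrained.esmfold_v1().eval()
model = model.cuda() if torch.cuda.is_available() else model

sequence = "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVAT"  # toy input
with torch.no_grad():
    pdb_string = model.infer_pdb(sequence)  # 3D prediction as a PDB string

with open("prediction.pdb", "w") as f:
    f.write(pdb_string)
```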

Finally, ESMFold shows remarkable scaling properties:

“We see non-linear improvements in protein structure predictions as a function of model scale, and observe a strong link between how well the language model understands a sequence (as measured by perplexity) and the structure prediction that emerges.”

Are you already converted to the church of Scale Is All You Need - AGI Is Coming? 😉
Upcoming Graph Workshops

If you are finishing a project and would like to probe your work and get a first round of reviews, consider submitting to the recently announced workshops:

- Federated Learning with Graph Data (FedGraph) @ CIKM 2022 - deadline August 15
- Trustworthy Learning on Graphs (TrustLOG) @ CIKM 2022 - deadline September 2
- New Frontiers in Graph Learning (GLFrontiers) @ NeurIPS 2022 - deadline September 15
- Symmetry and Geometry in Neural Representations (NeurReps) @ NeurIPS 2022 - deadline September 22
Graph Machine Learning @ ICML 2022

In case you missed all the ICML’22 fun, we prepared a comprehensive overview of graph papers published at the conference: 35+ papers in 10 categories:

- Generation: Denoising Diffusion Is All You Need
- Graph Transformers
- Theory and Expressive GNNs
- Spectral GNNs
- Explainable GNNs
- Graph Augmentation: Beyond Edge Dropout
- Algorithmic Reasoning and Graph Algorithms
- Knowledge Graph Reasoning
- Computational Biology: Molecular Linking, Protein Binding, Property Prediction
- Cool Graph Applications
Towards Geometric Deep Learning IV: Chemical Precursors of GNNs

In the final post of the series, Michael Bronstein covers the role of chemistry and computational chemistry in developing the mathematical concepts that were later used in creating GNNs. For instance, patent offices registering a new drug needed a way to compare a new molecule against those in an existing database - a problem tackled first with string representations, then with molecular fingerprints, and finally with the WL test and its modern variants.
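As a refresher, the 1-WL test iteratively recolors every node with a hash of its own color and the multiset of its neighbors' colors; if the color histograms of two graphs diverge, the graphs are certainly non-isomorphic. A minimal sketch, assuming networkx:

```python
# 1-WL colour refinement: different histograms prove non-isomorphism,
# equal histograms prove nothing (that is the test's known weakness).
import networkx as nx
from collections import Counter

def wl_histogram(G: nx.Graph, rounds: int = 3) -> Counter:
    colors = {v: G.degree(v) for v in G}  # initial colours: node degrees
    for _ in range(rounds):
        colors = {
            v: hash((colors[v], tuple(sorted(colors[u] for u in G[v]))))
            for v in G
        }
    return Counter(colors.values())

# The classic failure case: a 6-cycle vs. two disjoint triangles.
G1 = nx.cycle_graph(6)
G2 = nx.disjoint_union(nx.cycle_graph(3), nx.cycle_graph(3))
print(wl_histogram(G1) == wl_histogram(G2))  # True - 1-WL cannot tell them apart
```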
Geometric Deep Learning Course: 2022 Update

The go-to GDL course by Michael M. Bronstein, Joan Bruna, Taco Cohen, and Petar Veličković has just been updated! There are new materials in the introduction and the graph transformers section, more on category theory (don’t forget your vegetables 🥦), differential geometry, and topology, as well as a new set of invited speakers covering recent hot topics from subgraph GNNs to AlphaFold 2.
Geometric DL News: 200M proteins in AlphaFold DB, Euclidean nets, Italian GDL Summer School, Diffusers

This week brought us a bunch of news and new materials:

- DeepMind announced the expansion of the AlphaFold DB to 200 million protein structures. Celebrating the one-year anniversary of the groundbreaking AlphaFold 2 release, DeepMind notes the huge success of the system among scientists all over the world - more than 500,000 researchers from 190 countries have accessed AlphaFold predictions - and sketches plans to apply the results in areas such as drug discovery, fusion, and climate change.

- Mario Geiger (MIT) and Tess Smidt (MIT) released an updated version of the writeup on e3nn - the most popular Python library for building Euclidean Neural Networks and the basis of many cool recent works like Steerable GNNs and SE(3)-Transformers. The writeup includes simple intuitions behind spherical harmonics, tensor products, irreducible representations, and other key building blocks - if you work on equivariant architectures, you probably do it with e3nn 😉 (see the small sketch after this list).

- 🇮🇹 First Italian School on Geometric Deep Learning releases all slides and Colab Notebooks on equivariance, topology, differential geometry and other topics covered by top speakers including Michael Bronstein, Cristian Bodnar, Maurice Weiler, Pim de Haan, and Francesco Di Giovanni.

- Following the hottest 2022 trend, HuggingFace 🤗 aims to tame the wilds of diffusion models and releases Diffusers 🧨, a single library to build and train diffusion models across all modalities - image generation, text generation, and, of course, graph generation! The PR with GeoDiff, a SOTA molecule generation model from ICLR 2022, is already prepared 🚀
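Here is the promised e3nn taste - a toy sketch (assuming a recent e3nn release) of its two core objects, irreducible representations and the equivariant tensor product:

```python
# Combine two vectors (irrep 1o) into scalar + pseudovector + rank-2 parts.
from e3nn import o3

irreps_in = o3.Irreps("1x1o")                  # a single 3D vector (odd parity)
irreps_out = o3.Irreps("1x0e + 1x1e + 1x2e")   # the 1o x 1o decomposition

tp = o3.FullyConnectedTensorProduct(irreps_in, irreps_in, irreps_out)

x1 = irreps_in.randn(10, -1)   # batch of 10 random vectors
x2 = irreps_in.randn(10, -1)
out = tp(x1, x2)               # equivariant: rotating inputs rotates outputs
print(out.shape)               # torch.Size([10, 9]): 1 + 3 + 5 components
```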
Sampling from Large Heterogeneous Graphs with TF-GNN

In this new blogpost, Brandon Mayer and Bryan Perozzi go into detail on how to organize scalable neighborhood sampling over large heterogeneous graphs (with many node and edge types), using the OGB MAG dataset (2M nodes, 20M edges) as an example. Sampling is defined via Apache Beam configs and can fetch data directly from the Google Cloud Platform through the Dataflow engine.
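To make the idea concrete without reproducing the TF-GNN API, here is a pure-Python illustration (with a hypothetical toy schema) of what per-edge-type fan-out sampling over a heterogeneous graph does:

```python
# One hop of heterogeneous neighbourhood sampling with per-edge-type fan-outs -
# the operation TF-GNN scales out with Apache Beam / Dataflow.
import random

# adjacency[edge_type][node_id] -> neighbour ids (toy MAG-like schema)
adjacency = {
    ("paper", "cites", "paper"): {0: [1, 2, 3, 4]},
    ("paper", "written_by", "author"): {0: [10, 11]},
}
fanout = {("paper", "cites", "paper"): 2, ("paper", "written_by", "author"): 1}

def sample_one_hop(seed: int) -> dict:
    """Sample at most fanout[edge_type] neighbours of `seed` per edge type."""
    out = {}
    for edge_type, neighbours in adjacency.items():
        candidates = neighbours.get(seed, [])
        out[edge_type] = random.sample(candidates,
                                       min(fanout[edge_type], len(candidates)))
    return out

print(sample_one_hop(0))  # e.g. {('paper', 'cites', 'paper'): [2, 4], ...}
```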

Recently, we covered the release of TensorFlow-GNN (TF-GNN), a new framework by Google for training GNNs on very large graphs that often do not fit into main memory. Today’s post is a more hands-on tutorial with concrete code examples you can try yourself 🛠️.
KDD 2022

KDD 2022, one of the premier graph & data mining venues, will take place in Washington, DC in two weeks (Aug 14-18). As always, the published program of Research Track and Applied Data Science Track papers is full of graph papers, so check them out.

Furthermore, there will be a rich selection of workshops:

- International Workshop on Mining and Learning with Graphs (MLG) (co-located with DLG)
- Deep Learning on Graphs: Methods and Applications (DLG-KDD’22) (co-located with MLG)
- International Workshop on Knowledge Graphs: Open Knowledge Network
- International Workshop on Data Mining in Bioinformatics (BIOKDD 2022)

And even more tutorials:

- Trustworthy Graph Learning: Reliability, Explainability, and Privacy Protection (Tencent AI)
- Graph-based Representation Learning for Web-scale Recommender Systems (Twitter)
- Algorithmic Fairness on Graphs: Methods and Trends (U. Illinois at Urbana-Champaign)
- Toward Graph Minimally-Supervised Learning (Arizona State University)
- Accelerated GNN training with DGL and RAPIDS cuGraph in a Fraud Detection Workflow (NVIDIA)
- Graph Neural Networks: Foundation, Frontiers and Applications
- Temporal Graph Learning for Financial World: Algorithms, Scalability, Explainability & Fairness (MasterCard)
- Efficient Machine Learning on Large-Scale Graphs (TigerGraph)
- Frontiers of Graph Neural Networks with DIG (Texas A&M University)
- Graph Neural Networks in Life Sciences: Opportunities and Solutions (Amazon)
New Software and Library Updates

August is a notoriously quiet month without big news, but there is still something new in graph software:

- Uni-Fold - a PyTorch re-implementation of AlphaFold and AlphaFold-Multimer. The authors emphasize that this is the first open-source repo for training AlphaFold-Multimer, and that their AlphaFold implementation can be trained 2x faster than the original.

- PyKEEN 1.9 features new tools for adding textual representations to KG embedding models, and brings significant speedups for NodePiece on large graphs (5M nodes / 30M edges in 10 minutes on a laptop) thanks to the METIS partitioning algorithm and GPU-accelerated BFS. A quick-start sketch follows this list.

- GRAPE - a Rust/Python library for graph processing and embedding with many compbio datasets integrated.
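The promised quick start - a minimal sketch of PyKEEN's pipeline API on its built-in toy dataset; swapping in model="NodePiece" should exercise the new speedups, though its default hyperparameters are an assumption here:

```python
# Train and evaluate a KG embedding model in a few lines with PyKEEN.
from pykeen.pipeline import pipeline

result = pipeline(
    model="TransE",                      # try "NodePiece" for the 1.9 speedups
    dataset="Nations",                   # tiny built-in dataset, runs in seconds
    training_kwargs=dict(num_epochs=5),
)
print(result.metric_results.get_metric("hits@10"))
```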
Recordings from the Italian School on Geometric DL and Graph ML for Visual Computing @ CVPR 2022

- The full playlist of 14 lectures from the recent Italian School on Geometric Deep Learning is now available on YouTube, featuring 10 long talks and 4 introductory lectures on Group Theory, Manifolds, Topology, and Category Theory (don’t forget that Category Theory is your veggies 🥦 to be eaten regularly). Slides and Colab notebooks are already available on the website.

- All videos from the CVPR workshop on graphs in visual computing are now available covering graph-based approaches for video understanding, 3D vision, and scene understanding.
Graphcore IPUs for GNNs are freely available on Paperspace

IPUs (Intelligence Processing Units) by UK-based Graphcore are a new type of hardware (chips and servers) tailored for AI compute, including optimized sparse matrix multiplications. Sparse operations are a main building block of GNNs but remain among the slowest operations on GPUs (which are tailored for dense matrix multiplications).
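To see why sparse ops matter here: one round of GNN message passing is essentially a sparse-dense matrix product. A tiny illustration with SciPy on toy data:

```python
# Neighbourhood aggregation as sparse adjacency times dense features.
import numpy as np
import scipy.sparse as sp

n = 5
rows = np.array([0, 1, 2, 3, 4])
cols = np.array([1, 2, 3, 4, 0])
A = sp.csr_matrix((np.ones(5), (rows, cols)), shape=(n, n))  # sparse adjacency
X = np.random.randn(n, 8)                                    # node features

out = A @ X       # each node sums its neighbours' features
print(out.shape)  # (5, 8) - this sparse product is what IPUs accelerate
```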

The ImageNet moment of 2012 also happened thanks to the hardware lottery - that is when we found that GPUs are dramatically better than CPUs at training deep nets. IPUs may well be the winning hardware lottery ticket for GNNs!

In a recent blog post, Michael Bronstein, Emanuele Rossi, and Daniel Justus reported spectacular performance gains when training Temporal Graph Networks (TGN): 3-11x faster on a single IPU chip compared to an A100. IPUs also deliver strong general performance in MLPerf, the go-to benchmark for efficient training of large vision and language models.

Today, you can run the code for free on an IPU-POD16 (16 IPU chips) thanks to the partnership between Paperspace and Graphcore. In addition to the standard BERT, RoBERTa, and ViT, Graphcore prepared modules with Cluster-GCN, TGN, and SchNet (a popular baseline for molecular dynamics).

You can run most PyTorch / TensorFlow code, and IPUs should natively support XLA, so it’s a good time to catch up with JAX and its GNN libraries like Jraph 😉
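If you do take the JAX route, here is what one message-passing step looks like in Jraph - a minimal sketch on a hypothetical 3-node toy graph:

```python
# One step of message passing with jraph.GraphNetwork.
import jax.numpy as jnp
import jraph

graph = jraph.GraphsTuple(
    nodes=jnp.ones((3, 4)),      # 3 nodes with 4 features each
    edges=jnp.ones((2, 4)),      # 2 edges with 4 features each
    senders=jnp.array([0, 1]),
    receivers=jnp.array([1, 2]),
    n_node=jnp.array([3]),
    n_edge=jnp.array([2]),
    globals=jnp.zeros((1, 4)),
)

def update_edge_fn(edges, sender_feats, receiver_feats, globals_):
    return edges + sender_feats          # messages carry sender features

def update_node_fn(nodes, sent_agg, received_agg, globals_):
    return nodes + received_agg          # nodes aggregate incoming messages

net = jraph.GraphNetwork(update_edge_fn=update_edge_fn,
                         update_node_fn=update_node_fn)
print(net(graph).nodes.shape)            # (3, 4)
```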
Proteins, Galaxies, and Robotaxis: GraphML News August’22

August is a notoriously quiet month when it comes to research and news. As folks slowly come back from vacation, we are seeing more and more interesting articles and releases:

🧬 Meta AI released the weights of the 3B and 15B ESM-2 models - we recently covered how cool those models are and how you can predict 3D protein structure right from the frozen language model’s hidden states. Now you can try them on your own hardware!

💫 Yesukhei Jagvaral from the Department of Physics at CMU wrote a wonderful post with cool graphics on how the team uses graph GANs to model scalar and vector quantities of real galaxies. Each galaxy is a node with a set of physical features (like mass or tidal fields), and galaxies are connected via the radius nearest neighbors construction (sketched after this list). The trained generative models yield good approximations of real physical properties that agree with simulations.

🚕 Zoox, a robotaxi startup, employs GNNs to model road dynamics and improve estimates of what’s happening around the car. The post is a bit vague about the prediction task, but we can hypothesize it has to do with vehicle dynamics (like molecular dynamics, but for cars and pedestrians).
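For the curious, the radius nearest neighbors construction from the galaxies post, sketched with scikit-learn on toy data:

```python
# Build a galaxy graph: connect every pair of points within a fixed radius.
import numpy as np
from sklearn.neighbors import radius_neighbors_graph

positions = np.random.rand(100, 3)   # toy 3D galaxy positions
A = radius_neighbors_graph(positions, radius=0.2, mode="connectivity")
print(A.shape, A.nnz)                # sparse adjacency over 100 galaxies
```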
Graph ML position at Trade Desk

An interesting position at The Trade Desk. As a researcher in the AI Lab working on graph ML, you will be part of the mission to upgrade the TTD ML tech stack to be graph-ML based. You will also have the opportunity to do R&D on cutting-edge graph ML technologies and publish at top conferences, or to build innovative product PoCs that shape future product roadmaps. One day in the office per week; tech hubs in London, Madrid & Munich, or in the US!
Upcoming GraphML Venues: LoG and Stanford Graph Learning Workshop

September finally brings some fresh news and updates:

- The abstract deadline for the upcoming Learning on Graphs (LoG) conference is September 9th AoE, with two tracks: full papers and extended abstracts. LoG aims to be the premier venue for Graph ML research, so consider submitting your best work there.

- Stanford organizes the 2nd iteration of its Graph Learning Workshop on September 28th, covering the latest updates in PyG and cool industrial applications. In addition to Stanford speakers, there will be invited talks from NVIDIA, Intel, Meta, Google, Spotify, and Kumo.ai.

A nice relaxing event after the ICLR deadline 🙂 We will be keeping an eye on interesting ICLR submissions as well.
👃 GNNs Learn To Smell & Awesome NeurReps

1) Back in 2019, Google AI started a project on learning representations of smells. From basic chemistry we know that smell is tied to molecular structure, e.g., to cyclic compounds: in fact, the whole group of ”aromatic hydrocarbons” got its name because these compounds actually have a smell (unlike many inorganic molecules). Given a molecular structure, we can run a GNN on top of it and learn representations - that is the tl;dr of smell representation learning with GNNs (a minimal sketch follows this post).

Recently, Google AI released a new blogpost describing the next phase of the project - the Principal Odor Map, which is able to group molecules into “odor clusters”. The authors conducted 3 cool experiments: classifying 400 new, never-smelled-before molecules and comparing the model to the averaged rating of a panel of human smellers; linking odor quality to fundamental biology; and probing aromatic molecules for mosquito-repelling qualities. The GNN-based model shows very good results - now we can finally claim that GNNs can smell! Looking forward to GNNs transforming the perfume industry 📈

2) The NeurReps community (Symmetry and Geometry in Neural Representations) is curating an Awesome List of resources and research related to the geometry of representations in the brain, deep networks, and beyond. A great resource for neuroscience and Geometric DL folks to learn about the adjacent field!
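As promised, here is a minimal sketch of the molecules-as-graphs recipe from item 1 - with PyTorch Geometric standing in for Google's internal stack and all dimensions hypothetical:

```python
# A GNN that maps a molecular graph to multi-label odour predictions.
import torch
from torch_geometric.nn import GCNConv, global_mean_pool

class OdorGNN(torch.nn.Module):
    def __init__(self, in_dim=16, hidden=32, n_odor_labels=50):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.head = torch.nn.Linear(hidden, n_odor_labels)

    def forward(self, x, edge_index, batch):
        h = self.conv1(x, edge_index).relu()   # message passing over bonds
        h = self.conv2(h, edge_index).relu()
        g = global_mean_pool(h, batch)         # one embedding per molecule
        return self.head(g)                    # logits for odour descriptors

# Toy molecule: 3 atoms, 2 bonds (each bond listed in both directions).
x = torch.randn(3, 16)
edge_index = torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]])
batch = torch.zeros(3, dtype=torch.long)
print(OdorGNN()(x, edge_index, batch).shape)   # torch.Size([1, 50])
```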
Workshop: Hot Topics in Graph Neural Networks

Uni Kassel and Fraunhofer IEE are organizing a GNN workshop on October 25th. The announced line-up of speakers includes Fabian Jogl (TU Wien), Massimo Perini (University of Edinburgh), Hannes Stärk (MIT), Maximilian Thiessen (TU Wien), Rakshit Trivedi (Harvard), and Petar Veličković (DeepMind). Quoting the chairs:

“Find out about our current projects and follow exciting talks about new advances in Graph Neural Networks by international speakers. The work of the GAIN group addresses dynamic GNN models, the expressivity of GNN models, and their application in the power grid. Among others, the speakers will enlighten us with their work on Algorithmically-aligned GNNs, the Improvement of Message-passing, and Geometric Machine Learning for Molecules.

The public part of the event will take place on the 25th of October 2022 from 10am to 6pm. The workshop will be held in a hybrid format, but we are happy if you could come in person! To make the workshop more interactive for everyone who cannot participate in person, we have built a virtual 2D world which you can join to network with other participants!”
Upcoming NeurIPS’22 Workshops & Submission Deadlines

As NeurIPS’22 decisions are out, you might want to submit your work to some cool upcoming domain-specific graph workshops:

1. Temporal Graph Learning Workshop @ NeurIPS’22 organized by researchers from Mila and Oxford - deadline September 19th

2. New Frontiers in Graph Learning @ NeurIPS’22 organized by researchers from Stanford, Harvard, Yale, UCLA, Google Brain, and MIT - deadline September 22nd

3. Symmetry and Geometry in Neural Representations @ NeurIPS’22 organized by researchers from UC Berkeley, Institut Pasteur, ENS, and UC Santa Barbara - deadline September 22nd

4. Workshop on Graph Learning for Industrial Applications @ NeurIPS’22 organized by JP Morgan, Capital One, Bank of America, Schonfeld, Mila, IBM, Pfizer, Oxford, and FINRA - deadline September 22nd

5. Critical Assessment of Molecular ML (NeurIPS’22 side-event) organized by ELLIS units in Cambridge and Linz - deadline October 18th

If you are at MICCAI in Singapore those days, don’t forget to attend the 4th Workshop on Graphs in Biomedical Image Analysis (GRAIL) on September 18th, organized by NVIDIA, TU Munich, and Oxford. There will be talks by Marinka Zitnik, Islem Rekik, Mark O’Donoghue, and Xavier Bresson.
📚 Weekend Reading

This week brought quite a few interesting papers and resources - we encourage you to invest some time in them:

Geometric multimodal representation learning by Yasha Ektefaie, George Dasoulas, Ayush Noori, Maha Farhat, and Marinka Zitnik. A survey of 100+ papers on combining graphs with other modalities, plus a framework for multimodal approaches in the natural sciences such as physical interaction, molecular reasoning, and protein modeling.

Clifford Neural Layers for PDE Modeling by Johannes Brandstetter, Rianne van den Berg, Max Welling, and Jayesh K. Gupta. If you thought you knew all the basics from the Geometric Deep Learning Course - here is something more challenging. The authors bring ideas from geometric algebra into ML, namely Clifford algebras, which unify reals, vectors, complex numbers, and quaternions, and add primitives for plane and volume segments. The paper gives a great primer on the math and the applications. You can also watch a very visual YouTube lecture on geometric algebras.
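To get a feel for the geometric product at the heart of Clifford algebras, here is a toy example with the clifford Python package (assuming its standard API): the product of two vectors splits into an inner (scalar) part and an outer (bivector) part, uv = u·v + u∧v.

```python
# Geometric product of two vectors in 3D Euclidean geometric algebra.
import clifford

layout, blades = clifford.Cl(3)   # Cl(3) with basis vectors e1, e2, e3
e1, e2 = blades["e1"], blades["e2"]

u = 2 * e1 + 1 * e2
v = 1 * e1 + 3 * e2

print(u * v)   # geometric product: 5 + 5*e12 (scalar + bivector)
print(u | v)   # inner part: 5 = 2*1 + 1*3
print(u ^ v)   # outer (wedge) part: 5*e12, an oriented plane segment
```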

Categories for AI (Cats4AI) - an upcoming open course on Category Theory created by Andrew Dudzik, Bruno Gavranović, João Guilherme Araújo, Petar Veličković, and Pim de Haan. “This course is aimed towards machine learning researchers, but approachable to anyone with a basic understanding of linear algebra and differential calculus. The material is self-contained and all the necessary background will be introduced along the way.” Don’t forget your veggies 🥦
TorchProtein & PEER Protein Sequence Benchmark Release

MilaGraph released TorchProtein, a new version of TorchDrug extended with a suite of tools for protein sequence understanding. Quoting the authors:

“TorchProtein encapsulates many complicated yet repetitive subroutines into functional modules, including widely-used datasets, flexible data processing operations, advanced encoding models, and diverse protein tasks.

With TorchProtein, we can rapidly prototype machine learning solutions to various protein applications within 20 lines of code, and conduct ablation studies by substituting different parts of the solution with off-the-shelf modules. Furthermore, we can easily adapt these modules to our own needs, and make systematic analyses by comparing the new results to a benchmark provided in the library.”
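In the spirit of those 20 lines, here is a condensed sketch based on the TorchProtein quickstart - treat the exact dataset, model, and argument names as assumptions to be checked against the docs:

```python
# Sequence-based protein property prediction with TorchProtein / TorchDrug
# (module and argument names follow the quickstart; verify against the docs).
import torch
from torchdrug import core, datasets, models, tasks, transforms

transform = transforms.Compose([
    transforms.TruncateProtein(max_length=200, random=False),
    transforms.ProteinView(view="residue"),
])
dataset = datasets.BetaLactamase("~/protein-datasets/", transform=transform)
train, valid, test = dataset.split()

model = models.ProteinCNN(input_dim=21, hidden_dims=[512, 512],
                          kernel_size=5, padding=2)
task = tasks.PropertyPrediction(model, task=dataset.tasks, criterion="mse",
                                metric=("mae", "rmse", "spearmanr"))

optimizer = torch.optim.Adam(task.parameters(), lr=1e-4)
solver = core.Engine(task, train, valid, test, optimizer, batch_size=64)
solver.train(num_epoch=1)
solver.evaluate("valid")
```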

Simultaneously, the authors present PEER: A Comprehensive and Multi-Task Benchmark for Protein Sequence Understanding - a new benchmark of 17 protein understanding tasks grouped into 5 categories (Function Prediction, Localization Prediction, Structure Prediction, Protein-Protein Interaction Prediction, and Protein-Ligand Interaction Prediction), already available in TorchProtein. ProtBert and ESM-1b have already been evaluated on PEER (and ESM-2 is expected to arrive as well).