Graph Machine Learning
Everything about graph theory, computer science, machine learning, etc.


If you have something worth sharing with the community, reach out to @gimmeblues or @chaitjo.

Admins: Sergey Ivanov; Michael Galkin; Chaitanya K. Joshi
Towards Geometric Deep Learning IV: Chemical Precursors of GNNs

In the final post of the series, Michael Bronstein covers the role of chemistry and computational chemistry in developing the mathematical concepts that were later used to create GNNs. For instance, patent offices registering new drugs needed a way to compare a new molecule against those already in their databases - a need that drove a progression from string representations, through molecular fingerprints, to the WL test and its modern variants.
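To make that endpoint concrete, here is a minimal sketch of WL color refinement, the graph-comparison routine that modern GNNs inherit their expressivity bounds from (all function and variable names are ours, for illustration only):

```python
from collections import Counter

def wl_refine(adj, colors, rounds=3):
    """Refine node colors: each node's new color is a hash of its own
    color together with the multiset of its neighbors' colors."""
    for _ in range(rounds):
        colors = {
            v: hash((colors[v], tuple(sorted(colors[u] for u in adj[v]))))
            for v in adj
        }
    return colors

def maybe_isomorphic(adj1, colors1, adj2, colors2):
    """Matching refined color histograms are necessary (not sufficient)
    for isomorphism - the classic WL test."""
    h1 = Counter(wl_refine(adj1, colors1).values())
    h2 = Counter(wl_refine(adj2, colors2).values())
    return h1 == h2

# Toy "molecules" as adjacency lists, with atom types as initial colors.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}
atoms = {0: "C", 1: "C", 2: "C"}
print(maybe_isomorphic(triangle, atoms, path, atoms))  # False
```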
Geometric Deep Learning Course: 2022 Update

The go-to GDL course by Michael M. Bronstein, Joan Bruna, Taco Cohen, and Petar Veličković has just been updated! There is new material in the introduction and the graph transformers section, more on category theory (don’t forget your vegetables 🥦), differential geometry, and topology, as well as a new set of invited speakers covering recent hot topics from subgraph GNNs to AlphaFold 2.
Geometric DL News: 200M proteins in AlphaFold DB, Euclidean nets, Italian GDL Summer School, Diffusers

This week brought us a bunch of news and new materials:

- DeepMind announced the expansion of the AlphaFold DB to 200 million protein structures. Celebrating the first anniversary of the groundbreaking AlphaFold 2 release, DeepMind notes the huge success of the system among scientists all over the world - more than 500,000 researchers from 190 countries have accessed AlphaFold predictions - and sketches plans to apply the outcomes in other areas such as drug discovery, fusion, and climate change.

- Mario Geiger (MIT) and Tess Smidt (MIT) released an updated version of the writeup on e3nn - the most popular Python library for building Euclidean Neural Networks, the basis for many cool new works like Steerable GNNs and SE(3)-Transformers. The writeup includes simple intuitions behind spherical harmonics, tensor products, irreducible representations, and other key building blocks - if you work on equivariant architectures, you probably do it with e3nn 😉 (see the sketch after this list).

- 🇮🇹 First Italian School on Geometric Deep Learning releases all slides and Colab Notebooks on equivariance, topology, differential geometry and other topics covered by top speakers including Michael Bronstein, Cristian Bodnar, Maurice Weiler, Pim de Haan, and Francesco Di Giovanni.

- Following the hottest 2022 trend, HuggingFace 🤗 aims to tame the wilds of diffusion models and releases Diffusers 🧨, a single library to build and train diffusion models of all modalities - image generation, text generation, and, of course, graph generation! The PR with GeoDiff, a SOTA molecule generation model from ICLR 2022, is already prepared 🚀
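As promised in the e3nn item above, here is a minimal sketch of its key primitives - irreps, spherical harmonics, and a learnable tensor product - based on the e3nn documentation; exact signatures may vary across versions:

```python
import torch
from e3nn import o3

# Features living in irreducible representations: 16 scalars + 16 vectors.
irreps_in = o3.Irreps("16x0e + 16x1o")
irreps_sh = o3.Irreps("1x0e + 1x1o")
irreps_out = o3.Irreps("16x0e + 16x1o")

# A learnable, equivariance-preserving mixing of the two inputs.
tp = o3.FullyConnectedTensorProduct(irreps_in, irreps_sh, irreps_out)

x = irreps_in.randn(10, -1)      # features for 10 nodes
pos = torch.randn(10, 3)         # relative positions
sh = o3.spherical_harmonics(irreps_sh, pos, normalize=True)
out = tp(x, sh)                  # shape (10, irreps_out.dim), still equivariant
```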
Sampling from Large Heterogeneous Graphs with TF-GNN

In this new blogpost, Brandon Mayer and Bryan Perozzi go into detail on how to organize scalable neighborhood sampling over large heterogeneous graphs (with many node and edge types), using the OGB MAG dataset (2M nodes, 20M edges) as an example. Sampling can be defined via Apache Beam configs and can fetch data right from the Google Cloud Platform through the Dataflow engine.

Recently, we covered the release of TensorFlow-GNN (TF-GNN), a new framework by Google for training GNNs on very large graphs that often do not fit into main memory. Today’s post is a more hands-on tutorial with concrete code examples you can try yourself 🛠️.
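The heavy lifting in TF-GNN is configured through Beam, but the underlying idea of multi-hop heterogeneous neighborhood sampling fits in a few lines of plain Python. This is an illustrative sketch of the concept only, not the TF-GNN API - all names here are ours:

```python
import random

def sample_subgraph(seed, neighbors, plan):
    # neighbors(node, edge_type) -> list of neighboring nodes
    # plan: one (edge_type, fanout) pair per hop, e.g.
    #   [("writes", 8), ("cites", 4)] for an OGB MAG-style schema.
    frontier, edges = [seed], []
    for edge_type, fanout in plan:
        next_frontier = []
        for node in frontier:
            nbrs = neighbors(node, edge_type)
            picked = random.sample(nbrs, min(fanout, len(nbrs)))
            edges += [(node, edge_type, n) for n in picked]
            next_frontier += picked
        frontier = next_frontier
    return edges

# Toy heterogeneous graph: papers citing papers.
graph = {("p1", "cites"): ["p2", "p3"], ("p3", "cites"): ["p2"]}
neighbors = lambda n, t: graph.get((n, t), [])
print(sample_subgraph("p1", neighbors, [("cites", 2), ("cites", 1)]))
```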
KDD 2022

KDD 2022, one of the premier graph & data mining venues, will take place in Washington DC in two weeks (Aug 14-18). As always, the published program of Research Track papers and Applied Data Science Track papers is full of graph papers, so check them out.

Furthermore, there will be a rich selection of workshops:

- International Workshop on Mining and Learning with Graphs (MLG) (co-located with DLG)
- Deep Learning on Graphs: Methods and Applications (DLG-KDD’22) (co-located with MLG)
- International Workshop on Knowledge Graphs: Open Knowledge Network
- International Workshop on Data Mining in Bioinformatics (BIOKDD 2022)

And even more tutorials:

- Trustworthy Graph Learning: Reliability, Explainability, and Privacy Protection (Tencent AI)
- Graph-based Representation Learning for Web-scale Recommender Systems (Twitter)
- Algorithmic Fairness on Graphs: Methods and Trends (U. Illinois at Urbana-Champaign)
- Toward Graph Minimally-Supervised Learning (Arizona State University)
- Accelerated GNN training with DGL and RAPIDS cuGraph in a Fraud Detection Workflow (NVIDIA)
- Graph Neural Networks: Foundation, Frontiers and Applications
- Temporal Graph Learning for Financial World: Algorithms, Scalability, Explainability & Fairness (MasterCard)
- Efficient Machine Learning on Large-Scale Graphs (TigerGraph)
- Frontiers of Graph Neural Networks with DIG (Texas A&M University)
- Graph Neural Networks in Life Sciences: Opportunities and Solutions (Amazon)
New Software and Library Updates

August is a notoriously quiet month without big news, but there is something new in the graph software:

- Uni-Fold - a re-implementation of AlphaFold and AlphaFold-Multimer in PyTorch. The authors emphasize that this is the first open-source repo for training AlphaFold-Multimer and that their AlphaFold implementation trains 2x faster than the original.

- PyKEEN 1.9 features new tools for adding textual representations to KG embedding models and brings significant speedups for NodePiece on large graphs (5M nodes / 30M edges in 10 minutes on a laptop) thanks to the METIS partitioning algorithm and GPU-accelerated BFS (see the quick-start sketch after this list).

- GRAPE - a Rust/Python library for graph processing and embedding with many computational biology datasets integrated.
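As for the PyKEEN quick start promised above: training any of its models, NodePiece included, is a one-liner through the pipeline API. A minimal sketch - we leave the NodePiece-specific kwargs at their defaults, which you would tune for a real graph:

```python
from pykeen.pipeline import pipeline

# Train NodePiece on a standard benchmark; plug in your own triples
# factory for the large graphs that benefit from METIS + GPU BFS.
result = pipeline(
    dataset="FB15k237",
    model="NodePiece",
    training_kwargs=dict(num_epochs=20),
)
result.save_to_directory("nodepiece_fb15k237")
```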
Recordings from the Italian School on Geometric DL and Graph ML for Visual Computing @ CVPR 2022

- The full playlist of 14 lectures from the recent Italian School on Geometric Deep Learning is now available on YouTube, featuring 10 long talks and 4 introductory lectures on Group Theory, Manifolds, Topology, and Category Theory (don’t forget that Category Theory is your veggies 🥦 that you should take regularly). Slides and Colab notebooks are already available on the website.

- All videos from the CVPR workshop on graphs in visual computing are now available covering graph-based approaches for video understanding, 3D vision, and scene understanding.
Graphcore IPUs for GNNs are freely available on Paperspace

IPUs (Intelligence Processing Units) by UK-based Graphcore are a new type of hardware (chips and servers) tailored for AI compute, including optimized sparse matrix multiplications. Sparse operations are a main building block of GNNs but remain among the slowest operations on GPUs (which are tailored for dense matrix multiplications).

The ImageNet moment in 2012 also happened thanks to the hardware lottery - when it turned out that GPUs are dramatically better than CPUs at training deep nets. IPUs may well be the winning hardware lottery ticket for GNNs!

In a recent blog post, Michael Bronstein, Emanuele Rossi, and Daniel Justus report spectacular performance gains when training Temporal Graph Networks (TGN): 3-11x faster on a single IPU chip compared to an A100. IPUs also deliver strong general performance in MLPerf, the go-to benchmark for efficiently training large vision and language models.

Today, you can try running the code for free on an IPU-POD16 thanks to the partnership between Paperspace and Graphcore. In addition to standard BERT, RoBERTa, and ViTs, Graphcore prepared modules with Cluster-GCN, TGN, and SchNet (a popular baseline for molecular dynamics).

You can run most PyTorch / TensorFlow code, and IPUs should natively support XLA, so it’s a good time to catch up with JAX and its GNN libraries like Jraph 😉
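If you want a head start on Jraph before trying the IPUs, here is a minimal sketch of its core container and a simple convolution, following the patterns in the Jraph README (version differences possible):

```python
import jax.numpy as jnp
import jraph

# A 3-node, 2-edge toy graph in Jraph's GraphsTuple container.
graph = jraph.GraphsTuple(
    nodes=jnp.ones((3, 4)),       # 3 nodes with 4 features each
    edges=jnp.ones((2, 4)),       # 2 edges with 4 features each
    senders=jnp.array([0, 1]),    # edge 0: 0 -> 1, edge 1: 1 -> 2
    receivers=jnp.array([1, 2]),
    n_node=jnp.array([3]),
    n_edge=jnp.array([2]),
    globals=None,
)

# A simple graph convolution; self-edges keep every node degree nonzero.
gcn = jraph.GraphConvolution(update_node_fn=lambda x: x * 2.0,
                             add_self_edges=True)
out = gcn(graph)
print(out.nodes.shape)  # (3, 4)
```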
Proteins, Galaxies, and Robotaxis: GraphML News August’22

August is a notoriously quiet month when it comes to research and news. As folks slowly come back from vacations, we see more and more interesting articles and releases:

🧬 Meta AI released the weights of the 3B and 15B ESM-2 models - we recently covered how cool those models are and how you can predict 3D protein structure right from the frozen language model’s hidden states. Now you can try them on your own premises!
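Loading a released checkpoint and pulling per-residue embeddings takes a few lines with the fair-esm package. A sketch using the smaller 650M model (the 3B/15B checkpoints follow the same pattern per the repo README, just with more memory):

```python
import torch
import esm

# Load a pretrained ESM-2 model (downloads weights on first call).
model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
batch_converter = alphabet.get_batch_converter()
model.eval()

data = [("protein1", "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVAT")]
_, _, tokens = batch_converter(data)

with torch.no_grad():
    out = model(tokens, repr_layers=[33])  # last layer of the 33-layer model

embeddings = out["representations"][33]   # per-residue representations
print(embeddings.shape)
```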

💫 Yesukhei Jagvaral from the Department of Physics at CMU wrote a wonderful post with cool graphics on how the team uses graph GANs to model scalar and vector quantities of real galaxies. Each galaxy is a node in the graph and carries a set of physical features (like mass or tidal fields), and galaxies are connected through a radius nearest neighbors algorithm. The trained generative models yield good approximations of real physical properties, agreeing with simulations.
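The radius-neighbors graph construction they describe is a single call in scikit-learn - a sketch with made-up coordinates:

```python
import numpy as np
from sklearn.neighbors import radius_neighbors_graph

# Fake 3D galaxy positions; in the paper each node also carries
# physical features such as mass or tidal fields.
positions = np.random.rand(100, 3)

# Connect every pair of galaxies closer than the chosen radius.
adjacency = radius_neighbors_graph(positions, radius=0.2, mode="connectivity")
print(adjacency.nnz, "edges")
```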

🚕 Zoox, a robotaxi startup, employs GNNs to model road dynamics and improve estimates of what’s happening around the car. The post is a bit vague about the prediction task, but we can hypothesize it has to do with vehicle dynamics (like molecular dynamics, but for cars and pedestrians).
Graph ML position at Trade Desk

An interesting position at The Trade Desk. As a researcher in the AI Lab working on graph ML, you will be part of the mission to make the TTD ML tech stack graph-ML based. You will also have opportunities to do R&D on cutting-edge graph ML technologies and publish at top conferences, or build innovative product PoCs to shape future product roadmaps. One day in the office per week. Tech hubs in London, Madrid & Munich, or in the US!
Upcoming GraphML Venues: LoG and Stanford Graph Learning Workshop

September finally brings some fresh news and updates:

- The abstract deadline for the upcoming Learning on Graphs (LoG) conference is September 9th AoE, with two tracks: full papers and extended abstracts. LoG aims to be the premier venue for Graph ML research, so consider submitting your best work there.

- Stanford organizes the 2nd iteration of its Graph Learning Workshop on September 28th, covering the latest updates in PyG and cool industrial applications. In addition to Stanford speakers, there will be invited talks from NVIDIA, Intel, Meta, Google, Spotify, and Kumo.ai.

A nice relaxing event after the ICLR deadline 🙂 We will be keeping an eye on interesting ICLR submissions as well.
👃 GNNs Learn To Smell & Awesome NeurReps

1) Back in 2019, Google AI started a project on learning representations of smells. From basic chemistry we know that aromaticity depends on molecular structure, e.g., cyclic compounds. In fact, the whole group of ”aromatic hydrocarbons” was named aromatic because they actually have a smell (compared to many non-organic molecules). Given a molecular structure, we can employ a GNN on top of it and learn representations - that is the tl;dr of smell representation learning with GNNs.

Recently, Google AI released a new blogpost describing the next phase of the project - the Principal Odor Map, which groups molecules into “odor clusters”. The authors conducted 3 cool experiments: classifying 400 never-before-smelled molecules and comparing predictions to the averaged ratings of a panel of human smellers; linking odor quality to fundamental biology; and probing aromatic molecules for mosquito-repelling qualities. The GNN-based model shows very good results - now we can finally claim that GNNs can smell! Looking forward to GNNs transforming the perfume industry 📈
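The tl;dr above compresses into a few lines of PyG: message passing over the molecular graph, pooling, and a multi-label odor head. A toy sketch of the general recipe, not Google’s actual model (all sizes are made up):

```python
import torch
from torch_geometric.nn import GCNConv, global_mean_pool

class OdorGNN(torch.nn.Module):
    """Toy molecule-to-odor classifier: message passing + pooling."""
    def __init__(self, num_atom_features, hidden, num_odor_labels):
        super().__init__()
        self.conv1 = GCNConv(num_atom_features, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.head = torch.nn.Linear(hidden, num_odor_labels)

    def forward(self, x, edge_index, batch):
        x = self.conv1(x, edge_index).relu()
        x = self.conv2(x, edge_index).relu()
        x = global_mean_pool(x, batch)   # one vector per molecule
        return self.head(x)              # multi-label odor logits

model = OdorGNN(num_atom_features=9, hidden=64, num_odor_labels=138)
```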

2) The NeurReps community (Symmetry and Geometry in Neural Representations) is curating the Awesome List of resources and research related to the geometry of representations in the brain, deep networks, and beyond. A great resource for Neuroscience and Geometric DL folks to learn about the adjacent field!
Workshop: Hot Topics in Graph Neural Networks

Uni Kassel and Fraunhofer IEE are organizing a GNN workshop on October 25th. The announced line-up of speakers includes Fabian Jogl (TU Wien), Massimo Perini (University of Edinburgh), Hannes Stärk (MIT), Maximilian Thiessen (TU Wien), Rakshit Trivedi (Harvard), and Petar Veličković (DeepMind). Quoting the chairs:

“Find out about our current projects and follow exciting talks about new advances in Graph Neural Networks by international speakers. The work of the GAIN group addresses dynamic GNN models, the expressivity of GNN models, and their application in the power grid. Among others, the speakers will enlighten us with their work on Algorithmically-aligned GNNs, the Improvement of Message-passing, and Geometric Machine Learning for Molecules.

The public part of the event will take place on the 25th of October 2022 from 10am to 6pm. The workshop will be held in a hybrid format, but we are happy if you could come in person! To make the workshop more interactive for everyone who cannot participate in person, we have built a virtual 2D world which you can join to network with other participants!”
Upcoming NeurIPS’22 Workshops & Submission Deadlines

As NeurIPS’22 decisions are out, you might want to submit your work to some cool upcoming domain-specific graph workshops:

1. Temporal Graph Learning Workshop @ NeurIPS’22 organized by researchers from Mila and Oxford - deadline September 19th

2. New Frontiers in Graph Learning @ NeurIPS’22 organized by researchers from Stanford, Harvard, Yale, UCLA, Google Brain, and MIT - deadline September 22nd

3. Symmetry and Geometry in Neural Representations @ NeurIPS’22 - organized by researchers from UC Berkeley, Institut Pasteur, ENS, and UC Santa Barbara - deadline September 22nd

4. Workshop on Graph Learning for Industrial Applications @ NeurIPS’22 organized by JP Morgan, Capital One, Bank of America, Schonfeld, Mila, IBM, Pfizer, Oxford, and FINRA - deadline September 22nd

5. Critical Assessment of Molecular ML (NeurIPS’22 side-event) organized by ELLIS units in Cambridge and Linz - deadline October 18th

If you are at MICCAI in Singapore in those days, don’t forget to attend the 4th Workshop on Graphs in Biomedical Image Analysis (GRAIL) on September 18th, organized by NVIDIA, TU Munich, and Oxford. There will be talks by Marinka Zitnik, Islem Rekik, Mark O’Donoghue, and Xavier Bresson.
📚 Weekend Reading

This week brought quite a few interesting papers and resources - we encourage you to invest some time in them:

Geometric multimodal representation learning by Yasha Ektefaie, George Dasoulas, Ayush Noori, Maha Farhat, and Marinka Zitnik. A survey of 100+ papers on graphs combined with other modalities and a framework of multi-modal approaches for natural sciences like physical interaction, molecular reasoning, and protein modeling.

Clifford Neural Layers for PDE Modeling by Johannes Brandstetter, Rianne van den Berg, Max Welling, and Jayesh K. Gupta. If you thought you knew all the basics from the Geometric Deep Learning Course - here is something more challenging. The authors bring ideas from geometric algebra into ML tasks, namely Clifford algebras, which unify numbers, vectors, complex numbers, and quaternions, and offer additional primitives for plane and volume segments. The paper gives a great primer on the math and applications. You can also watch a very visual YouTube lecture on geometric algebras.
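To see the “unifying” claim in action, here is the complete geometric product of the two-dimensional Clifford algebra Cl(2,0) in a few lines of Python (our own illustration, not code from the paper). Note how the even part (scalar + bivector) multiplies exactly like the complex numbers:

```python
from dataclasses import dataclass

@dataclass
class Cl2:
    """Multivector s + e1 + e2 + e12 in Cl(2,0):
    e1*e1 = e2*e2 = 1, e12 = e1*e2 = -e2*e1, hence e12*e12 = -1."""
    s: float = 0.0    # scalar part
    e1: float = 0.0   # vector parts
    e2: float = 0.0
    e12: float = 0.0  # bivector (oriented plane segment)

    def __mul__(self, o):
        return Cl2(
            s=self.s*o.s + self.e1*o.e1 + self.e2*o.e2 - self.e12*o.e12,
            e1=self.s*o.e1 + self.e1*o.s - self.e2*o.e12 + self.e12*o.e2,
            e2=self.s*o.e2 + self.e2*o.s + self.e1*o.e12 - self.e12*o.e1,
            e12=self.s*o.e12 + self.e12*o.s + self.e1*o.e2 - self.e2*o.e1,
        )

# The even subalgebra {s + e12} behaves exactly like the complex plane:
print(Cl2(e12=1.0) * Cl2(e12=1.0))  # s=-1: the bivector squares to -1, like i
# A vector times a vector yields scalar (dot) + bivector (wedge) parts:
print(Cl2(e1=1.0) * Cl2(e2=1.0))    # pure e12: the oriented unit plane
```

In 3D the same construction recovers the quaternions as the even subalgebra, which is exactly the unification the paper builds on.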

Categories for AI (Cats4AI) - an upcoming open course on Category Theory created by Andrew Dudzik, Bruno Gavranović, João Guilherme Araújo, Petar Veličković, and Pim de Haan. “This course is aimed towards machine learning researchers, but approachable to anyone with a basic understanding of linear algebra and differential calculus. The material is self-contained and all the necessary background will be introduced along the way.” Don’t forget your veggies 🥦
TorchProtein & PEER Protein Sequence Benchmark Release

MilaGraph released TorchProtein, a new version of TorchDrug equipped with a suite of tools for protein sequence understanding. Quoting the authors:

“TorchProtein encapsulates many complicated yet repetitive subroutines into functional modules, including widely-used datasets, flexible data processing operations, advanced encoding models, and diverse protein tasks.

With TorchProtein, we can rapidly prototype machine learning solutions to various protein applications within 20 lines of code, and conduct ablation studies by substituting different parts of the solution with off-the-shelf modules. Furthermore, we can easily adapt these modules to our own needs, and make systematic analyses by comparing the new results to a benchmark provided in the library.”
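In the spirit of that 20-lines claim, a property-prediction prototype looks roughly like this. A hedged sketch: the dataset and model class names below are as we recall them from the TorchProtein docs and may differ in your version:

```python
import torch
from torchdrug import core, datasets, models, tasks

# Protein property prediction in a handful of lines.
dataset = datasets.BetaLactamase("~/protein-datasets/")
train, valid, test = dataset.split()

model = models.ProteinCNN(input_dim=21, hidden_dims=[512, 512],
                          kernel_size=5, padding=2)
task = tasks.PropertyPrediction(model, task=dataset.tasks,
                                criterion="mse", metric=("mae", "spearmanr"))

optimizer = torch.optim.Adam(task.parameters(), lr=1e-4)
solver = core.Engine(task, train, valid, test, optimizer, batch_size=64)
solver.train(num_epoch=10)
solver.evaluate("valid")
```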

Simultaneously, the authors present PEER: A Comprehensive and Multi-Task Benchmark for Protein Sequence Understanding, a new benchmark of 17 protein understanding tasks grouped into 5 categories (Function Prediction, Localization Prediction, Structure Prediction, Protein-Protein Interaction Prediction, Protein-Ligand Interaction Prediction) already available in TorchProtein. ProtBert and ESM-1b have been probed on PEER (and ESM-2 is expected to arrive as well).
GraphML News: PyG + NVIDIA, Breakthrough Prize

🚀 PyG announced the release of pyg-lib, the result of a collaboration with NVIDIA on speeding up the most important PyG operations. It is a low-level GNN library integrating cuGraph, cuDF, and CUTLASS to accelerate matrix multiplications and graph sampling (a common bottleneck when working with large graphs). The reported speedups are pretty astounding - up to 150x for sampling on a GPU. There will be more exciting news about PyG at the upcoming Stanford Graph Learning Workshop!

👏 The Breakthrough Prize (known as the “Oscars of Science”) announced the winners in life sciences, maths, and physics - graph and geometry areas are well represented there!

- John Jumper (DeepMind) and Demis Hassabis (DeepMind) received the Life Sciences prize for AlphaFold
- Daniel A. Spielman (Yale University) received the Math prize for contributions to spectral graph theory, the Kadison-Singer problem, optimization, and coding theory
- Ronen Eldan (Weizmann Institute of Science and Microsoft Research) received the New Horizons in Mathematics Prize for advancing high-dimensional geometry and probability including the KLS conjecture
- Vera Traub (Uni Bonn, PhD 2020) received the Maryam Mirzakhani New Frontiers Prize for advances in approximation algorithms for classical combinatorial optimization problems, including the traveling salesman problem and network design.
ICLR 2023 Submissions

The list of submissions to the top AI venue is available on OpenReview (with full-text PDFs). There are 6000+ submissions this year (a 3x growth from 2000+ last year); we will keep an eye on cool Graph ML submissions and prepare an overview. Enjoy the weekend reading - and checking whether someone has scooped the project you’ve been working on for the last months or years 😉
DGL: Billion-Scale Graphs and Sparse Matrix API

In the new 0.9.1 release, DGL accelerated the pipeline for preparing very large graphs (5B edges): what used to take 10 hours and 4 TB of RAM now takes 3 hours and 500 GB of RAM, which also reduces the cost by 4x.
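For graphs at that scale, the entry point is DGL’s distributed partitioning, sketched below on a toy graph (existing dgl.distributed API to the best of our knowledge; num_parts would match your cluster):

```python
import dgl
import torch

# Toy stand-in for a billion-edge graph; the 0.9.1 release targets
# this preprocessing step with chunked, parallel execution.
g = dgl.rand_graph(1000, 10000)          # 1000 nodes, 10000 edges
g.ndata["feat"] = torch.randn(1000, 16)

# Split the graph into parts consumable by DGL's distributed training.
dgl.distributed.partition_graph(g, graph_name="toy", num_parts=4,
                                out_path="partitions/")
```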

Also, if you use or would like to use a sparse matrix API for your GNNs, you can send feedback and use cases to the DGL team (feel free to reach out to @ivanovserg990 to connect). They are looking for the following profiles (a sketch of what such an API enables follows the list):

* Researchers/students who are familiar with sparse matrix notations or linear algebra.
* May have math or geometry backgrounds.
* Work mainly on innovating GNN architectures; less on domain applications.
* May have PyG/DGL experience.
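For context on why such an API matters: many GNN layers collapse into sparse linear algebra, e.g. a GCN layer is X' = σ(Â X W) for a normalized adjacency Â. A generic PyTorch sketch of that math (not DGL’s proposed interface):

```python
import torch

# Adjacency as a sparse COO tensor (3 nodes, 4 directed edges;
# degree normalization omitted for brevity).
indices = torch.tensor([[0, 1, 1, 2],
                        [1, 0, 2, 1]])
A = torch.sparse_coo_tensor(indices, torch.ones(4), (3, 3))

X = torch.randn(3, 8)    # node features
W = torch.randn(8, 16)   # layer weights

# One GCN-style layer as pure (sparse) linear algebra.
out = torch.relu(torch.sparse.mm(A, X @ W))
print(out.shape)         # (3, 16)
```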
Hot New Graph ML Submissions from ICLR

🧬 Diffusion remains the top trend at AI/ML venues this year, including the graph domain. Ben Blaiszik compiled a Twitter thread of interesting papers in the AI4Science domain, including materials discovery, catalyst discovery, and crystallography. Particularly cool works:

- Protein structure generation via folding diffusion by a collab between Stanford and MSR - Kevin E. Wu, Kevin K. Yang, Rianne van den Berg, James Y. Zou, Alex X. Lu, Ava P. Amini - why do you need AlphaFold and MSAs if you can just train a diffusion model to predict the whole structure? 😉

- Dynamic-Backbone Protein-Ligand Structure Prediction with Multiscale Generative Diffusion Models by NVIDIA and Caltech - Zhuoran Qiao, Weili Nie, Arash Vahdat, Thomas F. Miller III, Anima Anandkumar

- DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking by MIT - Gabriele Corso, Hannes Stärk, Bowen Jing, Regina Barzilay, Tommi Jaakkola - the next version of the famous EquiDock and EquiBind combined with the recent Torsional Diffusion.

- We’d also include here a novel benchmark work, Tartarus: A Benchmarking Platform for Realistic And Practical Inverse Molecular Design, by Stanford and the University of Toronto - AkshatKumar Nigam, Robert Pollice, Gary Tom, Kjell Jorner, Luca A. Thiede, Anshul Kundaje, Alan Aspuru-Guzik

📚 In a more general context, Yilun Xu shared a Google Sheet of ICLR submissions on diffusion and score-based generative modeling, including the trendy text-to-video models announced by FAIR and Google.

🤖 Derek Lim compiled a Twitter thread on 10+ ICLR submissions on Graph Transformers - the field looks a bit saturated at the moment; let’s see what the reviewers say.

🪓 Michael Bronstein’s lab at Twitter announced two cool papers:

- Gradient Gating for Deep Multi-Rate Learning on Graphs by a collab between ETH Zurich, Oxford, and Berkeley - T. Konstantin Rusch, Benjamin P. Chamberlain, Michael W. Mahoney, Michael M. Bronstein, Siddhartha Mishra. A clever trick that improves the standard residual connection to allow nodes to be updated at different speeds. A blast from the past: GraphSAGE from 2017 equipped with gradient gating becomes the clear leader by a large margin on heterophilic graphs 👀

- Graph Neural Networks for Link Prediction with Subgraph Sketching by Benjamin Paul Chamberlain, Sergey Shirobokov, Emanuele Rossi, Fabrizio Frasca, Thomas Markovich, Nils Hammerla, Michael M. Bronstein, Max Hansmire. A neat usage of sketching to encode subgraphs in ELPH and its more scalable buddy BUDDY for solving link prediction in large graphs.
📏 Long Range Graph Benchmark

Vijay Dwivedi (NTU, Singapore) published a new blogpost on long-range graph benchmarks introducing 5 new challenging tasks in node classification, link prediction, graph classification, and graph regression.

“Many of the existing graph learning benchmarks consist of prediction tasks that primarily rely on local structural information rather than distant information propagation to compute a target label or metric. This can be observed in datasets such as ZINC, ogbg-molhiv, and ogbg-molpcba, where models that rely significantly on encoding local (or, near-local) structural information continue to be among leaderboard toppers.”

LRGB, a new collection of datasets, aims at evaluating the long-range capabilities of MPNNs and graph transformers. In particular, the node classification tasks are derived from the image-based Pascal-VOC and COCO datasets; the link prediction task is derived from PCQM4M and asks about links between atoms that are distant in 2D graph space (5+ hops away) but close in 3D space, while only 2D features are given; and the graph-level tasks focus on predicting the structure and function of small proteins (peptides).

Message passing nets (MPNNs) are known to suffer from bottleneck effects and oversquashing and hence underperform on long-range tasks. The first LRGB experiments confirm this, showing that fully-connected graph transformers significantly outperform MPNNs. Plenty of room for improving MPNNs!
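The LRGB datasets have since been integrated into PyG (as torch_geometric.datasets.LRGBDataset, to the best of our knowledge; name and availability depend on your PyG version), so trying them takes a few lines:

```python
from torch_geometric.datasets import LRGBDataset
from torch_geometric.loader import DataLoader

# Peptides-func: multi-label graph classification over small proteins.
train_set = LRGBDataset(root="data/lrgb", name="Peptides-func", split="train")
loader = DataLoader(train_set, batch_size=32, shuffle=True)

batch = next(iter(loader))
print(batch)  # labels here depend on long-range structure, not local motifs
```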

Paper, Code, Leaderboard