Graph Machine Learning – Telegram
Graph Machine Learning
6.71K subscribers
53 photos
11 files
808 links
Everything about graph theory, computer science, machine learning, etc.


If you have something worth sharing with the community, reach out @gimmeblues, @chaitjo.

Admins: Sergey Ivanov; Michael Galkin; Chaitanya K. Joshi
Friday Graph ML News: Ankh Protein LM, Deadlines, and New Blogs

This week we do see a new big model: meet Ankh, a protein LM! Thanks to recent observations on the importance of data size and training compute relative to model size, the 1.5B-parameter Ankh often outperforms the 15B ESM-2 on contact prediction, structure prediction, and a number of other protein representation learning tasks. An arXiv preprint is available as well!

If you didn’t manage to polish your submission for ICML or IJCAI in time, consider other upcoming submission deadlines:

- Deep Learning for Graphs @ International Joint Conference on Neural Networks: Jan 31st
- Special Issue on Graph Learning @ IEEE Transactions on Neural Networks and Learning Systems: March 1st
- Graph Representation Learning track @ European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning: May 2nd

The Summer Geometry Initiative (MIT) is a six-week paid summer research program introducing undergraduate and graduate students to the field of geometry processing; no prior experience is required. Apply by February 15th.

New articles and blogs about graphs and more general deep learning:

- Quanta Magazine published a fascinating article on the discovery of a fast shortest-path algorithm for graphs with negative edge weights;
- Kexin Huang prepared a post explaining the variety of datasets available in the Therapeutic Data Commons, from drug-target interactions to retrosynthesis and predicting CRISPR editing outcomes;
- Tim Dettmers updated his annual report on the most efficient GPUs per dollar with new data on the H100. In absolute performance, if you don’t have H100s, get an RTX 4090; in performance per dollar, the 4070 Ti is surprisingly near the top;
- Google published a Deep Learning Tuning Playbook, a collection of tuning advice that will help you squeeze out that extra 1% of performance and get top-1 in OGB!
- Finally, a huge post from Lilian Weng on optimizing inference of large Transformers.
Friday Graph ML News: ProGen, ClimaX, WebConf Workshops, Everything is Connected

No week without new foundation models!

A collaboration of researchers from Salesforce, UCSF, and Berkeley announced ProGen, an LLM for protein sequence generation. Claimed to be “ChatGPT for proteins”, ProGen is a 1.2B model trained on 280M sequences and controllable by input tags, e.g. “Protein Family: Pfam ID PF16754, Pesticin”. The authors synthesized a handful of generated proteins in the lab to confirm model quality.

In the GraphML’23 State of Affairs we highlighted the weather prediction models GraphCast (from DeepMind) and PanguWeather (from Huawei). This week, Microsoft Research and UCLA announced ClimaX, a foundation model for climate and weather that can serve as a backbone for many downstream applications. In contrast to the nowcasting GraphCast and PanguWeather, ClimaX is tailored for longer-range predictions of up to a month. ClimaX is a ViT-based image-to-image model with several tokenization and representation novelties to account for different input granularities and sequence lengths - check out the full paper preprint for more details.

Petar Veličković published the opinion paper Everything is Connected: Graph Neural Networks, framing many ML applications through the lens of graph representation learning. The article gives a gentle introduction to the basics of GNNs and their applications, including geometric equivariant models. A nice read!

The WebConf’23 (April 30 - May 4) announced its accepted workshops, including a handful of Graph ML venues:

- Graph Neural Networks: Foundation, Frontiers and Applications
- Mining of Real-world Hypergraphs: Patterns, Tools, and Generators
- Graph Neural Networks for Tabular Data Learning
- Continual Graph Learning
- Towards Out-of-Distribution Generalization on Graphs
- Self-supervised Learning and Pre-training on Graphs
- When Sparse Meets Dense: Learning Advanced Graph Neural Networks with DGL-Sparse Package
On the Expressive Power of Geometric Graph Neural Networks

Geometric GNNs are an emerging class of GNNs for spatially embedded graphs across science and engineering, e.g. SchNet for molecules, Tensor Field Networks for materials, GemNet for electrocatalysts, MACE for molecular dynamics, and E(n)-Equivariant Graph ConvNet for macromolecules.

How powerful are geometric GNNs? How do key design choices influence expressivity, and how can we build maximally powerful ones?

Check out this recent paper from Chaitanya K. Joshi, Cristian Bodnar, Simon V. Mathis, Taco Cohen, and Pietro Liò for more:

📄 PDF: http://arxiv.org/abs/2301.09308

💻 Code: http://github.com/chaitjo/geometric-gnn-dojo

Research gap: Standard theoretical tools for GNNs, such as the Weisfeiler-Leman graph isomorphism test, are inapplicable for geometric graphs. This is due to additional physical symmetries (roto-translation) that need to be accounted for.

💡 Key idea: a notion of geometric graph isomorphism plus a new Geometric WL framework give an upper bound on geometric GNN expressivity.

The Geometric WL framework formalises the role of depth, invariance vs. equivariance, and body ordering in geometric GNNs.
- Invariant GNNs cannot tell apart one-hop identical geometric graphs and fail to compute global properties.
- Equivariant GNNs distinguish more graphs; how? Depth propagates local geometry beyond the one-hop neighbourhood (a toy example follows below).
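To make the invariant vs. equivariant distinction concrete, here is a minimal toy example (my own illustration, not the paper’s exact construction): two 3-node paths whose neighbour-distance multisets are identical, so a distance-only (SchNet-style) invariant layer cannot tell them apart, while the simplest equivariant aggregation of relative position vectors already does.

```python
import numpy as np

# Two 3-node paths u - v - w embedded in 2D; both edges have length 1,
# but the angle at the centre node v differs (90 degrees vs 180 degrees).
bent     = {"u": np.array([1.0, 0.0]), "v": np.array([0.0, 0.0]), "w": np.array([0.0, 1.0])}
straight = {"u": np.array([1.0, 0.0]), "v": np.array([0.0, 0.0]), "w": np.array([-1.0, 0.0])}
edges = [("v", "u"), ("v", "w"), ("u", "v"), ("w", "v")]

def neighbor_distances(pos):
    # Distance-only invariant view: multiset of edge lengths per node.
    out = {}
    for src, dst in edges:
        out.setdefault(src, []).append(round(float(np.linalg.norm(pos[dst] - pos[src])), 3))
    return {k: sorted(v) for k, v in out.items()}

def summed_relative_vectors(pos):
    # Simplest equivariant aggregation: sum of relative position vectors per node.
    out = {k: np.zeros(2) for k in pos}
    for src, dst in edges:
        out[src] += pos[dst] - pos[src]
    return out

print(neighbor_distances(bent) == neighbor_distances(straight))    # True: invariant features identical
print(np.linalg.norm(summed_relative_vectors(bent)["v"]),          # ~1.414
      np.linalg.norm(summed_relative_vectors(straight)["v"]))      # 0.0 -> equivariant features differ
```

Note that the global property (the u-w distance) differs between the two configurations, which the invariant view misses entirely.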

What about practical implications? Synthetic experiments highlight challenges in building maximally powerful geometric GNNs:
- Oversquashing of geometric information with increased depth.
- Utility of higher-order spherical tensors over Cartesian vectors.

P.S. Are you new to Geometric GNNs, GDL, PyTorch Geometric, etc.? Want to understand how theory/equations connect to real code?

Try this Geometric GNN 101 notebook before diving in:
https://github.com/chaitjo/geometric-gnn-dojo/blob/main/geometric_gnn_101.ipynb
Friday Graph ML News: Blogs, ICLR Acceptances, and Software Releases

No big protein / molecule diffusion model announcement this week 🤨

Still, a handful of nice blogposts!

Graph Machine Learning Explainability with PyG by Blaž Stojanovič and the PyG Team is a massive tutorial on GNN explainability tools in PyG with datasets, code examples, and metrics. A must-read for anyone working on explainability.

Unleashing ML Innovation at Spotify with Ray describes Ray and its use for A/B testing the GNN-based recommender system behind Spotify’s home page.

🤓 ICLR accepted papers are now available, with distinctions into top-5%, top-25%, and poster papers. Stay tuned for the review of ICLR graph papers! Meanwhile, have a look at some hot new papers posted on arXiv:

- WL Meets VC by Christopher Morris, Floris Geerts, Jan Tönshoff, Martin Grohe - the first work to connect the WL test with the VC dimension of GNNs via provable bounds!
- Curvature Filtrations for Graph Generative Model Evaluation by Joshua Southern, Jeremy Wayland, Michael Bronstein, Bastian Rieck - a novel take on evaluating graph generative models using concepts from topological data analysis

🛠️ New software releases this week!

- DGL v1.0 - finally, the major 1.0 release featuring a new sparse backend
- PyKEEN v1.10 - still the best library to work with KG embeddings
HuggingFace enters GraphML

After unifying Vision and Language models and datasets under one roof, 🤗 comes for GraphML! Today, Datasets started hosting graph datasets, including OGB ones, ZINC, CSL, and others from Benchmarking GNNs, as well as MD17 for molecular dynamics. Let’s see what HF does next with GNNs.
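For the curious, here is a minimal loading sketch; the Hub id and the column names below are assumptions for illustration (browse the Hub for the exact identifiers and schema):

```python
# pip install datasets
from datasets import load_dataset

# Hypothetical Hub id; check https://huggingface.co/datasets for the actual names.
dataset = load_dataset("graphs-datasets/ZINC", split="train")

graph = dataset[0]          # one graph stored as plain Python lists
print(graph.keys())         # e.g. edge_index, node_feat, y (schema may vary per dataset)

# Converting a row into a PyTorch Geometric Data object, assuming the schema above:
import torch
from torch_geometric.data import Data

data = Data(
    x=torch.tensor(graph["node_feat"]),
    edge_index=torch.tensor(graph["edge_index"]),
    y=torch.tensor(graph["y"]),
)
print(data)
```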
Attending To Graph Transformers

by Luis Müller, Michael Galkin, Christopher Morris, and Ladislav Rampasek

arxiv

Our new survey on Graph Transformers (GTs), accompanied by some “mythbusting”.

We propose a categorization of GTs along 4 main views:
🗺️ used Encodings,
🌐 expected Input Features (geometric or non-geometric),
Tokenization (nodes, nodes+edges, subgraphs), and
🧮 Propagation (fully-connected, sparse, hybrid).

We investigate 4 common expectations and claims about GTs. Although conclusions are more nuanced (see the paper), we label them with pretentious badges: ✅ Confirmed / ❌ Busted / 🤔 Plausible

1️⃣ Are GTs theoretically more expressive than GNNs?

❌ Busted. There is no inherent property of GTs that makes them more expressive. Instead, their expressivity stems from their positional/structural encodings. (And making those maximally expressive is as hard as solving the graph isomorphism problem.)
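Since the expressivity comes from the encodings, here is a minimal sketch of attaching Laplacian eigenvector positional encodings to node features before a Transformer-style model, assuming PyG’s AddLaplacianEigenvectorPE transform is available in your installed version:

```python
import torch
from torch_geometric.data import Data
from torch_geometric.transforms import AddLaplacianEigenvectorPE  # shipped in recent PyG versions

# A toy 4-node cycle with 1-dimensional node features.
edge_index = torch.tensor([[0, 1, 1, 2, 2, 3, 3, 0],
                           [1, 0, 2, 1, 3, 2, 0, 3]])
data = Data(x=torch.ones(4, 1), edge_index=edge_index)

# Compute k = 2 Laplacian eigenvector PEs and append them to the node features;
# it is these encodings, not the attention mechanism itself, that carry graph structure.
transform = AddLaplacianEigenvectorPE(k=2, attr_name=None)
data = transform(data)
print(data.x.shape)  # expected: [4, 3] -> original feature plus 2 PE dimensions
```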

2️⃣ Can graph structure be effectively incorporated into GTs?

✅ Confirmed. GTs can identify graph edges (easy task), count triangles (medium), and distinguish regular graphs (hard task). But there is still room for improvement.

3️⃣ Does global attention reduce over-smoothing?

🤔 Plausible. In heterophilic graphs, GTs clearly outperform vanilla GNNs but still lag behind specialized SOTA models. Maybe we need a different structural bias?

4️⃣ Do GTs alleviate over-squashing better than GNN models?

🤔 Plausible. The Transformer perfectly solves NeighborsMatch where GNNs struggle. However, this is a synthetic “retrieval” task that doesn’t test (sub)graph representation.

🎁 Bonus: Attention matrices contain meaningful patterns and explain GT performance.

❌ Busted. We couldn’t find any strong interpretability of attention scores for downstream tasks. We suggest following Bertology in NLP, which moved from dissecting attention to designing benchmarks.
Temporal Graph Learning RG, Clifford Networks, Forward-Forward GNNs

Temporal Graph Learning reading group - a new reading group by McGill and NEC Labs researchers happening every Thursday 11-12 Eastern Time!

Jure Leskovec (Stanford) gave a talk “Towards Universal Cell Embeddings” at the Broad Institute (slides are available), covering the most recent research on single-cell analysis with GNNs, including MARS for novel cell types, SATURN for joint cell-protein representations, STELLAR for cancer tissue annotation, and GEARS for predicting the effects of multi-gene perturbations.

New papers you might want to have a look at:
Geometric Clifford Algebra Networks
David Ruhe, Jayesh K. Gupta, Steven de Keninck, Max Welling, and Johannes Brandstetter

MSR recently dropped a hefty 50-pager on Clifford algebras for PDEs; here is the adaptation of Clifford layers for GNNs with applications in object dynamics and fluid mechanics! Check the Twitter thread by David Ruhe for cool visual examples.

Graph Neural Networks Go Forward-Forward
Daniele Paliotta, Mathieu Alain, Bálint Máté, François Fleuret

At the recent NeurIPS’22, Geoff Hinton presented the idea of forward-forward networks without backprop. Instead of building a computation graph for the backward pass, you encode the label together with an input data point and ask the trainable layer to distinguish the positive label from a negative sample. Here, the authors expand the idea to GNNs and probe forward-forward training on graph classification tasks. Interestingly, the results are not that bad: in some cases, FF-GNNs even outperform their backprop counterparts.
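A minimal sketch of the forward-forward objective for a single GNN layer follows; this is my rough reading of Hinton’s recipe applied to a GCN layer, not the authors’ exact setup. The layer’s “goodness” is the squared activation norm, pushed above a threshold for positive (correctly labelled) inputs and below it for negative ones, with no end-to-end backprop.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class FFGraphLayer(torch.nn.Module):
    """One locally-trained layer: gradients never flow to or from other layers."""
    def __init__(self, in_dim, out_dim, threshold=2.0, lr=1e-3):
        super().__init__()
        self.conv = GCNConv(in_dim, out_dim)
        self.threshold = threshold
        self.opt = torch.optim.Adam(self.parameters(), lr=lr)

    def goodness(self, x, edge_index):
        h = F.relu(self.conv(x, edge_index))
        return h, h.pow(2).mean(dim=-1)              # per-node goodness

    def train_step(self, x_pos, x_neg, edge_index):
        # x_pos: features with the correct label encoded into them,
        # x_neg: features with a wrong label encoded into them.
        _, g_pos = self.goodness(x_pos, edge_index)
        _, g_neg = self.goodness(x_neg, edge_index)
        loss = (F.softplus(self.threshold - g_pos) + F.softplus(g_neg - self.threshold)).mean()
        self.opt.zero_grad(); loss.backward(); self.opt.step()
        # Detach activations before handing them to the next layer.
        with torch.no_grad():
            h_pos, _ = self.goodness(x_pos, edge_index)
            h_neg, _ = self.goodness(x_neg, edge_index)
        return h_pos, h_neg, loss.item()
```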

DiffDock, a diffusion model for protein-ligand docking, has been updated with new results on blind docking with ESMFold: the model now drastically outperforms industrial tools on docking accuracy (RMSD within 2 Å). Check the full thread by Gabriele Corso.
Saturday News

Yanze Wang and Yuanqi Du published An Introduction to Molecular Dynamics Simulations - a great resource to get acquainted with the basics of molecular dynamics. This is a part of the bigger AI 4 Science 101 community effort to bring more ML folks into the world of complex fundamental scientific problems - great initiative 👏

No week without new protein models! Uni-Fold MuSSe by DP Technology is an MSA-free protein structure prediction system. Based on ESM-2 3B, Uni-Fold further pre-trains the base model on more multimer tasks and outperforms other MSA-free protein LMs (still lagging behind the MSA-based AlphaFold-Multimer, though).

Meanwhile, DGL officially released DGL 1.0 featuring a new DGL Sparse package with optimized kernels for sparse matrix multiplications.

Some upcoming events for different parts of the globe:

- SIAM symposia: Geometric Structures in Neuroscience and Methods in Geometry and Shape Analysis, featuring Erik Bekkers, Sophia Sanborn, and many other researchers - Feb 27th, Amsterdam
- The Machine Learning on Graphs workshop at WSDM’23 - March 3rd, Singapore

New papers you might be interested in:

On the Expressivity of Persistent Homology in Graph Learning by Bastian Rieck - the paper builds a bridge between topology and expressiveness, with formal proofs of how topological features correspond to higher-order WL tests. Besides, the paper is very friendly to newcomers, so give it a read if you want to learn more about persistent homology. Check the Twitter thread by Bastian for more info.

A handful of new diffusion papers:
- Geometry-Complete Diffusion for 3D Molecule Generation - a strong competitor to the Equivariant Diffusion Model (EDM, ICML’22) used pretty much everywhere in molecule/protein models
- SE(3) diffusion model with application to protein backbone generation - a diffusion model over rigid bodies, FrameDiff
- MiDi: Mixed Graph and 3D Denoising Diffusion for Molecule Generation - the next version of the discrete diffusion model DiGress (to be presented at ICLR’23) now enriched with 3D info
- Aligned Diffusion Schrödinger Bridges - a new diffusion model approach based on Schrödinger bridges for rigid protein docking
GraphML News

A new blog post Graph Neural Networks for Molecular Dynamics Simulations by Sina Stocker and Johannes Gasteiger covering the basics of molecular dynamics with GemNet and code examples.

Teaching old labels new tricks in heterogeneous graphs - a new post by Google Research introducing Knowledge Transfer Networks (NeurIPS’22) - a method for zero-shot transfer on heterogeneous graphs with extreme label scarcity.

TigerGraph incorporated NodePiece, a compositional tokenization approach for scalable and inductive graph learning, into a new release - as the author of NodePiece, I am very excited to see academic efforts adopted in industrial DBs! Btw, NodePiece-based approaches have occupied the entire top-10 of the OGB WikiKG 2 link prediction benchmark for almost two years now.

All talk recordings of the IPAM UCLA workshop on Deep Learning and Combinatorial Optimization are now available! Featuring researchers such as Stefanie Jegelka, Petar Veličković, Xavier Bresson, Kyle Cranmer, and many more.

Weekend reading:

Complexity of Many-Body Interactions in Transition Metals via Machine-Learned Force Fields from the TM23 Data Set - a new TM23 dataset for molecular dynamics modeling transition metals.

Reducing SO(3) Convolutions to SO(2) for Efficient Equivariant GNNs - a neat math trick to reduce computational complexity of equivariant GNNs
A Critical Look at the Evaluation of GNNs under Heterophily: Are we really making progress?

ICLR 2023, guest post by Oleg Platonov

Stop evaluating on Squirrel and Chameleon

It is often believed that standard GNNs work well for node classification only on homophilous graphs. Thus, many specialized models have been recently proposed for learning on heterophilous graphs. However, these models are typically evaluated on the same set of six heterophilous graphs called Squirrel, Chameleon, Actor, Texas, Cornell, and Wisconsin. In our recent paper, we show that these datasets have serious problems, which make results obtained using them unreliable. These problems include low diversity, small graph size, and strong class imbalance. But the most significant is the presence of a large number of duplicated nodes in Squirrel and Chameleon, which leads to train-test data leakage. We show that removing the duplicates strongly affects the performance of GNNs on these datasets.

We have proposed an alternative benchmark of five diverse heterophilous graphs that come from different domains and exhibit a variety of structural properties. Our benchmark includes a word dependency graph Roman-empire, a product co-purchasing network Amazon-ratings, a synthetic graph emulating the minesweeper game Minesweeper, a crowdsourcing platform worker network Tolokers, and a question-answering website interaction network Questions.

We have evaluated a large number of models, both standard GNNs and heterophily-specific methods, and, surprisingly, found that standard GNNs augmented with skip connections and layer normalization almost always outperform specialized models. We hope that the proposed benchmark and the insights obtained using it will facilitate further research in learning under heterophily.

The datasets are available on GitHub and in PyG Datasets. For more details, see our paper.
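If you want to try the new benchmark, here is a minimal loading sketch, assuming your PyG version already ships the HeterophilousGraphDataset class (otherwise grab the files from the GitHub repo):

```python
from torch_geometric.datasets import HeterophilousGraphDataset

# Dataset names from the paper: Roman-empire, Amazon-ratings, Minesweeper, Tolokers, Questions.
dataset = HeterophilousGraphDataset(root="data", name="Roman-empire")
data = dataset[0]
print(data)                   # node features, edge_index, labels, and the provided splits
print(data.train_mask.shape)  # the paper uses multiple random splits, stored as a 2D mask
```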
💐 GraphML News 🌷

Everything you wanted to know about Clifford layers and their applications in PDE modeling and molecular dynamics is now collected on a single website; sprinkle in the recent LoGaG presentation (video) and add a little bit of the Geometric Algebra intro from bivector for the best experience.

Some freshly arxived papers you might want to grab for the weekend reading:

Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models? by Knyazev et al - introduces Graph HyperNetwork v3 for predicting the weights of neural network architectures. The previous version, GHN-2, got massive recognition at NeurIPS’21, including an interview with Yannic Kilcher. Instead of training neural nets, you can use a GHN to estimate model parameters in one forward pass, and it demonstrates non-trivial performance on ImageNet. In the new version, the authors apply a Graphormer to the model’s computation graph (a DAG) to frame the task as node regression, where node parameters correspond to weight matrices in the target neural nets. You can also use GHN for better initialization of model weights instead of random init.

SUREL+: Moving from Walks to Sets for Scalable Subgraph-based Graph Representation Learning by Yin et al - the next iteration of SUREL for link prediction, where random walks are replaced with node sets for better scalability
GraphML News

GPT-4 had the graph community scratching their heads as well (maybe not as much as academic NLP researchers) - look at the molecule search example at the very end of the technical report. Andrew White was among the few researchers working on this example; he compiled a thread on how GPT-4, empowered with external tools, can do a very impressive job proposing new molecules.

Minkai Xu delivered a lecture “Geometric Graph Learning: From Representation to Generation” as a part of the CS224W ML with Graphs course at Stanford (perhaps the most famous class on Graph ML). The lecture covers the basics of invariant and equivariant GNNs and introduces GeoDiff, a diffusion model for generating 3D molecules. Slides of the whole Winter’23 course are now available.

Weekend reading:

The Descriptive Complexity of Graph Neural Networks - a massive 88-pager from Martin Grohe proving that GNNs fall into the TC0 complexity class. This is a potential breakthrough since many database query languages fall into AC0 and TC0.

Zero-One Laws of Graph Neural Networks by Adam-Day et al. - shows an interesting result that GCN-like MPNNs with random features map final graph representations to zeros or ones as graphs grow in size. GATs and GINs are not (yet) prone to this behavior.

Allegro-Legato: Scalable, Fast, and Robust Neural-Network Quantum Molecular Dynamics via Sharpness-Aware Minimization by Ibayashi et al - an improved version of Allegro, current SOTA in molecular dynamics simulations, with faster convergence and better stability.
GraphML News, March 25th edition

Some news you might have missed in the graph learning area after a week of massive AGI claims and the GPT plugins announcement.

ICLR 2023 announced Outstanding Papers - great to see two GNN papers there! One Outstanding Paper Award went to Rethinking the Expressive Power of GNNs via Graph Biconnectivity, and an honorable mention went to Conditional Antibody Design as 3D Equivariant Graph Translation.

New releases of the main graph libraries:

- PyG announced 2.3.0 with full PyTorch 2.0 support: the scatter and sparse APIs are now part of core torch, so you can expect less hassle installing PyG dependencies. Besides, the new torch.compile() brings 2-3x speed improvements for many common GNN architectures (see the sketch after this list).
- DGL presented version 1.0 at the recent LoGaG reading group; the video recording is already available. The new version introduces a new sparse API and further scalability improvements.
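A minimal sketch of what the PyTorch 2.0 integration looks like in practice (illustrative toy model; actual speedups will vary by architecture, graph size, and GPU):

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):
    def __init__(self, in_dim, hidden, out_dim):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, out_dim)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)

model = GCN(16, 64, 7)
# One-line change: torch.compile traces the model and fuses kernels where it can.
compiled_model = torch.compile(model)

x = torch.randn(100, 16)
edge_index = torch.randint(0, 100, (2, 400))
out = compiled_model(x, edge_index)  # the first call triggers compilation, later calls are faster
```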

New papers for the weekend reading:

A Survey on Oversmoothing in Graph Neural Networks by T. Konstantin Rusch, Michael Bronstein, and Siddhartha Mishra - everything you wanted to know about known sources of oversmoothing and ways to alleviate it - including the recent Gradient Gating framework we reviewed a while ago.

Zero-shot prediction of therapeutic use with geometric deep learning and clinician centered design by Kexin Huang, Payal Chandak, et al - introduces TxGNN, a pre-trained GNN for identifying therapeutic opportunities for diseases with limited treatment options (and completely new diseases, in a zero-shot manner).
Neural Graph Reasoning: Complex Logical Query Answering Meets Graph Databases

Hongyu Ren, Mikhail Galkin, Michael Cochez, Zhaocheng Zhu, Jure Leskovec

Our new work (a 65-pager 👀) on rethinking graph databases in the era of GNNs and neural reasoners, in which we explore the concept of Neural Graph Databases (NGDBs).

1️⃣ Why do we need NGDBs and what do current graph DBs lack? The biggest motivation is incompleteness - symbolic SPARQL/Cypher-like engines can’t cope with incomplete graphs at scale. In fact, in some cases, SPARQL reasoners might run indefinitely. Neural graph reasoning, however, is already mature enough to work in large and noisy incomplete graphs.

2️⃣ What are NGDBs? While their architecture might look similar to traditional DBs, the essential difference is in ditching symbolic edge traversal and answering queries in the latent space (including logical operators). Broadly, NGDBs are equipped to answer both “what is there?” and “what is missing?” queries whereas standard graph DBs are limited to traversal-only scenarios assuming the graph is complete.

3️⃣ In the NGDB framework, we create a taxonomy and survey 40+ neural graph reasoning models that can potentially serve as Neural Query Engines under 3 main categories: Graphs (theory and expressiveness), Modeling (graph learning), and Queries (what can we answer).

4️⃣ Finally, we outline a handful of key challenges and open problems in the area of Graph ML + Databases, and for NGDBs in particular. Lots of cool stuff to work on! (Especially if you are in an existential crisis after GPT-4: designing LLM interfaces for NGDBs, and letting NGDBs structure, compress, and accelerate LLMs are also promising directions.)

There is much more to tell about this work so we prepared more resources to learn about NGDBs:

📚 blog post with a gentle intro and images

📜 arxiv preprint

🛠️ github repo with the taxonomy and curated list of relevant papers
🐦 Special: Graph algorithms behind The Twitter Algorithm

Twitter has recently published some details on their tweet recommendation algorithm (denoted as The Algorithm). Let’s dive into it from the graph learning perspective - it does have some interesting features spanning clustering, KG embeddings, approximate nearest neighbor (ANN) search, and PageRank.

Data-wise, the GraphJet framework operates on the Twitter interaction graph (in-memory) supporting dynamic edge updates and lookup queries. Several algorithms prepare features:

- Graph clustering based on sparse binary factorization (SBF) to mine communities, and then the SimClusters approximate nearest neighbor search library to query for the most similar clusters. There are approximately 145k communities on Twitter and they are updated every few weeks.

- Twitter Heterogeneous Information Network (TwHIN) embedding - this is largely based on the classic TransE for knowledge graph embedding. The KG is a multi-relational graph among Users, Tweets, Ads, and Advertisers. TwHIN learns shallow embeddings for all nodes. For inductive capabilities — building embeddings for newly arrived tweets or users — the model simply aggregates the embeddings of neighboring nodes (my 2 cents - NodePiece would fit pretty well into this setup). A minimal TransE-style sketch follows this list.

- RealGraph models the user interactions graph and outputs the likelihood of two users’ interaction. There is a relatively straightforward logistic regression model for edge scoring on top of the RealGraph.

- TweepCred - a PageRank score for users; this is your “influencer” score.
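To make the TwHIN part concrete, here is a minimal TransE-style sketch (my own illustration with toy sizes, not Twitter’s code): shallow embeddings scored by translation, plus the neighbor-averaging trick for newly arrived nodes.

```python
import torch

num_nodes, num_rels, dim = 1000, 4, 64            # toy sizes; TwHIN-scale is in the hundreds of millions
node_emb = torch.nn.Embedding(num_nodes, dim)
rel_emb = torch.nn.Embedding(num_rels, dim)

def transe_score(head, rel, tail):
    # TransE: a plausible triple satisfies head + relation ~= tail (higher score = more plausible).
    return -(node_emb(head) + rel_emb(rel) - node_emb(tail)).norm(p=1, dim=-1)

# Training would minimize a margin/ranking loss between true and corrupted triples (omitted here).

def embed_new_node(neighbor_ids):
    # Inductive trick described above: a freshly created tweet/user gets the mean of its
    # neighbors' embeddings instead of a trained vector of its own.
    return node_emb(torch.tensor(neighbor_ids)).mean(dim=0)

score = transe_score(torch.tensor([0]), torch.tensor([1]), torch.tensor([42]))
new_vec = embed_new_node([3, 17, 256])
```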

In your feed, 50% of tweets come from your network (RealGraph features) and 50% from out-of-network sources (SimClusters, TwHIN, and social graph traversals). 1500 candidates are sent to the ranking models: a lightweight logistic regression and a heavier 48M-parameter neural net based on MaskNet. Ranked candidates are then subject to filtering and postprocessing.

Overall, the recommender pipeline runs about 5 billion times a day, so the latency requirements do play a major role in selecting shallow’ish graph models. Check the repos for more details.

We’ll leave other peculiarities like “the Elon feature” for other researchers 🙂
Graph ML News, April 1st edition

Apart from Neural Graph Databases and Twitter Algorithm (and SIGBOVIK), a few more things happened this week.

The Learning on Graphs Conference (LoG) 2023 has been announced! One of the premier graph learning venues is going to take place online on Nov 27-30th, accompanied by local meetups; you can actually volunteer and organize one at your place!

Baker Lab open-sourced RF Diffusion, a SOTA protein generation model, as part of ColabFold. We covered RF Diffusion a few months ago and its capabilities are quite astounding. Since the announcement, the authors have further improved the quality and managed to test hundreds of generated proteins in the wet lab to validate their properties.

ICML 2023 announced accepted workshops - the graph learning audience might want to attend:

- Differentiable Almost Everything: Differentiable Relaxations, Algorithms, Operators, and Simulators
- Topology, Algebra, and Geometry in Machine Learning (TAG-ML)
- Knowledge and Logical Reasoning in the Era of Data-driven Learning
- Sampling and Optimization in Discrete Space
- The Synergy of Scientific and Machine Learning Modelling (SynS & ML)
- Workshop on Computational Biology
- Structured Probabilistic Inference and Generative Modeling

Rishi Puri and Matthias Fey published a post on accelerating Heterogeneous Graph Transformers in pyg-lib, resulting in roughly a 3x speed boost. Meanwhile, AWS Labs released GraphStorm, a Graph ML framework for enterprise use-cases based on DGL.

For the weekend reading, check out Machine Learning for Partial Differential Equations by Steven L. Brunton and J. Nathan Kutz - perhaps the best intro to ML with PDEs. Yes, it is from the author of the awesome YouTube lectures on dynamical systems, physics-inspired ML, and control theory.
Graph ML News, April 8th edition - MoML’23, GLB’23, and more

Molecular Machine Learning Conference (MoML) 2023 is going to take place at Mila in Montreal on May 29th. MoML is the premier venue for ML applications in drug discovery, quantum chemistry, molecular dynamics, and protein design. Confirmed speakers are Yoshua Bengio (Mila), Djork-Arné Clevert (Pfizer), Marinka Zitnik (Harvard), Gregory Bowman (UPenn), Mohammed AlQuraishi (Columbia), and Dominique Beaini (Mila, Valence Discovery). The poster submission deadline is April 24th. The ‘22 event was held at MIT and was a huge success!

In this context, the University of Amsterdam (UvA) announced 4 open postdoc positions in the new program on AI 4 Molecules & Materials.

The Workshop on Graph Learning Benchmarks (GLB’23) will be held in conjunction with KDD 2023 in Long Beach (California) on Aug 7th. Submit your works on new graph datasets, benchmarks, and software until May 26th. The workshop is non-archival.

PyG expands the range of supported hardware to Graphcore IPUs with examples on training temporal GNNs, molecular property prediction GNNs, and inductive KG reasoning GNNs on IPUs. Following up on that, you might want to attend the GNN meetup organized by Graphcore and Kumo in London on April 13th next week.

For the weekend reading, check out EigenFold: Generative Protein Structure Prediction with Diffusion Models by Bowen Jing, Ezra Erives, Peter Pao-Huang, Gabriele Corso, Bonnie Berger, and Tommi Jaakkola. The take on protein tasks by the authors of DiffDock 😉
GraphML News, April 16th edition - Generalist Medical AI, more diffusion papers

No particularly outstanding Graph ML event or announcement (that we hadn’t covered before) happened this week, so here is a collection of fresh papers you might want to have a look at:

Foundation models for generalist medical artificial intelligence - perhaps a landmark paper on using foundation models, and many of their exciting applications like generative models (e.g., text-to-molecule or text-to-protein), in real-world medicine.

DiffDock-PP: Rigid Protein-Protein Docking with Diffusion Models - extension of the famous DiffDock that translates and rotates unbound protein structures into their bound conformations.

Graph Generation with Destination-Driven Diffusion Mixture - the next version of the score-matching GDSS generative model (ICML 2022). Here, the model learns to “keep in mind” the final destination of the diffusion process at each time step - this trick greatly improves the performance in 2D and 3D tasks.

DIFUSCO: Graph-based Diffusion Solvers for Combinatorial Optimization - it turns out that discrete diffusion on graphs can generate very strong priors for combinatorial optimization tasks like Traveling Salesman or Maximum Independent Set when paired with a postprocessing solver.

GraphGUIDE: interpretable and controllable conditional graph generation with discrete Bernoulli diffusion - another take on discrete diffusion on graphs, where the authors define the Bernoulli noising process as adding/removing/flipping edges instead of using marginal transition probabilities mined from data (as in DiGress). The strength of this approach is that any intermediate noisy state is still a legitimate graph retaining its sparsity, instead of adding noise directly to node features or the adjacency matrix.
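As a rough illustration of the edge-flip idea (a toy sketch, not GraphGUIDE’s exact noise schedule): at each step every potential edge is independently flipped with a small probability, so intermediate states remain ordinary sparse 0/1 graphs.

```python
import torch

def bernoulli_edge_flip(adj: torch.Tensor, flip_prob: float) -> torch.Tensor:
    """One noising step: independently flip each potential (undirected) edge with probability flip_prob."""
    n = adj.size(0)
    flips = torch.bernoulli(torch.full((n, n), flip_prob)).triu(diagonal=1)  # upper triangle, no self-loops
    flips = flips + flips.T                                                  # symmetrize
    return (adj + flips) % 2                                                 # addition mod 2 toggles edges

# A random undirected toy graph with a 0/1 adjacency matrix.
adj = torch.bernoulli(torch.full((6, 6), 0.3)).triu(diagonal=1)
adj = adj + adj.T
noisy = bernoulli_edge_flip(adj, flip_prob=0.05)  # still a valid, sparse 0/1 adjacency matrix
```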
GraphML News (April 23rd) - Topological Deep Learning, Scalable Molecular Simulations, Network Games

Architectures of Topological Deep Learning: A Survey on Topological Neural Networks by Mathilde Papillon, Sophia Sanborn, Mustafa Hajij, and Nina Miolane - a wonderful survey on Topological Deep Learning explaining basic concepts from sets and graphs to simplicial and cellular complexes using the message passing framework. The survey also covers prominent deep learning architectures employing topological features and tasks that benefit from them. A must-read 👍

Scaling the leading accuracy of deep equivariant models to biomolecular simulations of realistic size by Albert Musaelian, Anders Johansson, Simon Batzner, Boris Kozinsky - the work introduces Allegro v2, an improved version of the SOTA equivariant model Allegro, probed at a humongous problem scale: nanosecond simulations of the full HIV capsid (44M atoms) and scaling up to 100M-atom structures on 5120 A100 GPUs 👀.

New blogs:

Michael Bronstein and Emanuele Rossi wrote an article on Learning Network Games - an intersection of game theory and Graph ML. The main task is to infer the network structure between the agents in a game from observations of their actions and outcomes.

Not directly about graphs, but Shashank Prasanna wrote an intro to torch.compile() introduced in PyTorch 2.0 and what’s happening under the hood when you execute it on your model.
Graph ML News (April 29th) - Upcoming ICLR and Accepted ICML papers

ICLR in Kigali starts next week! There is going to be a flurry of materials and reviews prepared by small and big labs, for instance, A Guide to ICLR 2023 — 10 Topics and 50 papers you shouldn't miss - so we’ll try to keep you updated. Meanwhile, the Machine Learning for Drug Discovery (MLDD) and ML4Materials workshops announced accepted papers - those are nice venues to see where the community moves and what would be next major conference submissions.

More resources on topology: 🍩 Database of Original & Non-Theoretical Uses of Topology (DONUT) - a collection of TDA applications beyond machine learning. TopoEmbedX - a Python library for working with topological data, pretty much networkx for higher-order structures. Following that, a fresh talk on Curvature for Graph Learning by Bastian Rieck!

Finally, ICML acceptances have arrived - some particularly interesting preprints that made it to the conference include:

- Graph Neural Networks can Recover the Hidden Features Solely from the Graph Structure
- STRIDERNET: A Graph Reinforcement Learning Approach to Optimize Atomic Structures on Rough Energy Landscapes
- MoleculeSDE - A Group Symmetric Stochastic Differential Equation Model for Molecule Multi-modal Pretraining (a project website so far)
- GREAD: Graph Neural Reaction-Diffusion Networks
- On the Expressive Power of Geometric GNNs
- Improved Graph HyperNetwork (GHN-3)
Graph ML News (May 6th)

ICLR’23 finished this week; to those who travelled to Kigali - have a safe trip back 🙂

Meanwhile, you might have missed the ICLR Blogposts Track - a collection of insightful articles whose content is often handier to express as a blog post rather than a full paper. Particularly interesting are On Universality of Neural Networks on Sets vs Graphs (by Fabian B. Fuchs and Petar Veličković), one on Neural PDE Solvers (by Yolanne Lee), and Thinking Like Transformers (by Alexander Rush, Gail Weiss). I would generally recommend submitting there (my post was accepted at the ICLR’22 Blog Post Track) - it was a pleasant experience, and you also do some community service by writing about your research.

A few upcoming events:

LoG Paris Meetup on June 14th in Paris at CentraleSupélec, Université Paris-Saclay with the keynote from Michael Bronstein.

Michael is going to be one of the keynote speakers at ECML PKDD 2023 in September in Torino - the list of accepted workshops should appear soon, so far we know about the Workshop on Learning and Mining with Blockchains. If you fancy Lisboa in September, you might want to submit to the Special Track on AI on Networks for Social Good, part of the ACM Conference on Information Technology for Social Good. Thanks to Manuel Dileo for the pointers 👏

For the weekend reading, have a look at:

Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes by Simran Arora and Christopher Ré’s lab

When Do Graph Neural Networks Help with Node Classification: Investigating the Homophily Principle on Node Distinguishability by Sitao Luan feat. Jure Leskovec and Doina Precup

An Exploration of Conditioning Methods in Graph Neural Networks by Yeskendir Koishekenov and Erik J. Bekkers