Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere
From ICML 2020.
It was previously noted that swapping the contrastive loss for a tighter bound on MI actually decreases downstream quality. The authors therefore propose moving away from the InfoMax intuition toward two simpler concepts: alignment and uniformity. The former enforces that positive pairs stay as close as possible, while the latter enforces that all samples are spread as evenly as possible over the unit hypersphere.
Both components are empirically important for downstream performance, and directly optimizing them can even outperform training with the classical contrastive loss.
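For a rough idea of what the two objectives look like in code, here is a minimal PyTorch sketch, loosely following the definitions in the paper and assuming L2-normalized embeddings; the exponent alpha, the temperature t, and the trade-off weight in the comment are hyperparameters, not fixed choices from the paper.

```python
import torch

def align_loss(x, y, alpha=2):
    # x, y: L2-normalized embeddings of positive pairs, shape (batch, dim).
    # Alignment: average distance between positive pairs, raised to the power alpha.
    return (x - y).norm(p=2, dim=1).pow(alpha).mean()

def uniform_loss(x, t=2):
    # Uniformity: log of the mean pairwise Gaussian potential over the batch;
    # it is minimized when the embeddings cover the hypersphere evenly.
    return torch.pdist(x, p=2).pow(2).mul(-t).exp().mean().log()

# One possible way to optimize the two terms directly instead of InfoNCE
# (z1, z2 are the two views' embeddings; lam is a hypothetical trade-off weight):
# loss = align_loss(z1, z2) + lam * (uniform_loss(z1) + uniform_loss(z2)) / 2
```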
A longer version with images: here
Source: here
Well, it's been more than three years since the last post here, and a lot has changed in that time. I finished my PhD at Heidelberg University and moved to JetBrains to lead a team working on AI agents. With all this on my hands, I'll have even less time to write the kind of reviews I'd like to read. On the other hand, I'd still like to share the papers I read.
So instead, I will post links here to the papers I read. You can view this experiment as a copycat of @j_links, but with a bias towards LLMs and probably agents.