Basketball player recognition with RF-DETR, SAM2, SigLIP and ResNet
/r/computervision
https://redd.it/1nv4d8u
[D] Monthly Who's Hiring and Who wants to be Hired?
For Job Postings please use this template
>Hiring: [Location], Salary: [], [Remote | Relocation], [Full Time | Contract | Part Time] and [Brief overview, what you're looking for]
For Those looking for jobs please use this template
>Want to be Hired: [Location], Salary Expectation: [], [Remote | Relocation], [Full Time | Contract | Part Time], Resume: [Link to resume] and [Brief overview, what you're looking for]
Please remember that this community is geared towards those with experience.
/r/MachineLearning
https://redd.it/1nuwj5t
[D] Self-Promotion Thread
Please post your personal projects, startups, product placements, collaboration needs, blogs etc.
Please mention the payment and pricing requirements for products and services.
Please do not post link shorteners, link aggregator websites, or auto-subscribe links.
--
Any abuse of trust will lead to bans.
Encourage others who create new posts for questions to post here instead!
Thread will stay alive until the next one, so keep posting after the date in the title.
--
Meta: This is an experiment. If the community doesn't like this, we will cancel it. The goal is to encourage community members to promote their work without spamming the main threads.
/r/MachineLearning
https://redd.it/1nvrmw5
How was this achieved? They are able to track movements and complete steps automatically
/r/computervision
https://redd.it/1oeosd3
[D] Google PhD Fellowship recipients 2025
Google has just announced the 2025 recipients.
What are the criteria to get this fellowship?
https://research.google/programs-and-events/phd-fellowship/recipients/
/r/MachineLearning
https://redd.it/1ogy6z9
masters in computational linguistics uppsala or tübingen
hi all
i'm planning to apply for a masters in computational linguistics / language technology as an international (non EU/EEA) student. i've done research on programs and have narrowed down on these few:
1. uppsala's MA language technology masters
2. tübingen's MA computational linguistics
3. stockholm's MA AI and language
4. stuttgart's MSc Computational Linguistics
5. konstanz's MA speech and language processing
6. helsinki's MA linguistic diversity and digital humanities (language technology track)
7. potsdam's MSc cognitive systems
coming from a linguistic background (bachelor with honours), i'm looking at 2 year programs as i believe i'd be able to learn more programming theory + technical skills that would better equip me for an industry role in the tech sector. i'm thus not as keen on 1 year programs such as leiden's linguistics (comp ling track), VU's linguistics language and AI, or groningen's speech technology programs. i'm learning python online to gain some basic proficiency in programming before starting the masters.
uppsala and tübingen are my top 2 choices if i were to be accepted, particularly because they seem more accessible to prospective students from a linguistic background based on my research. i'm hoping to gain more information about these two cities and their programs based on people's personal experience so that i can make an informed choice. these are my questions:
1. ACCESSIBILITY: how accessible is the program for those with a linguistic background? accessible could mean being less CS-intensive, or that there are foundational classes in programming/ML/AI to help those with a humanities background ease into the program with less difficulty
2. TEACHING QUALITY: what's your experience with the quality of teaching, how well organised the course is, helpfulness of professors, whether studying resources are provided or you'd have to source for your own materials, etc
3. JOB OPPORTUNITIES: in which city would an international student find it easier to get a job after graduating?
4. HEALTHCARE: how easy is it to get a medical appointment for minor and major illnesses in the city, both as a student and after graduation?
5. SOCIAL LIFE: how open people are to making new (local) friends, especially if one is not fluent in Swedish (for uppsala) or German (for tübingen)?
6. ACTIVITIES: which city has more options for activities if i'm not a huge fan of partying, alcohol, pub crawls? (occasional outings for special occasions are fine, but it's not something i would do frequently or particularly enjoy) i'm open to hiking, bouldering, music events, board games, reading, or any other activity
7. TRANSPORT: how well-connected and accessible is public transport within these cities, and also from the city to other cities?
8. COST OF LIVING: it seems like living costs (on numbeo) are generally lower in uppsala than tübingen (which is counter to my initial impression that CoL is higher in nordic countries) and i'm wondering if this is really the case? i've also read comments that tübingen is an expensive city to live in - would this make the cost of living in tübingen 'comparable' to uppsala?
9. QUALITY OF LIFE: how would you describe the overall quality of life in uppsala/tübingen, and if you have experience living in both, is the quality of life noticeably better in one of the cities? (my impression is that anywhere in the nordics would have a better quality of life but i'd like to hear your experience if you've lived there)
i'd be grateful if you could share your experience in uppsala and/or tübingen, or if you have experience with the other programs (and countries). thanks so much!
TLDR: international student (non EU/EEA) with BA (Honours) in Linguistics looking for advice on whether to choose uppsala or tübingen for masters in computational linguistics/language technology
/r/LanguageTechnology
https://redd.it/1omgy7r
Where's the Best Place to Rent a GPU for Model Training?
I'm planning some AI model training and want to rent a powerful GPU like an RTX 4090 instead of buying one. Just curious:
Which platforms do you usually use? How's the pricing and availability in your area?
/r/deeplearning
https://redd.it/1onzsyi
How does Qwen3-Next Perform in Complex Code Generation & Software Architecture?
/r/deeplearning
https://redd.it/1opwl9h
Linguistics Student looking for career advice
I'm currently in my third year of my Linguistics degree. Next year (2026-2027) will be my last, and I will specialize in Computational Linguistics. I would like to get into the world of NLP Engineering, or NLP in any way. What can I do, course- or certificate-wise? I would like to start working asap, and I wouldn't mind doing a Master's degree while I work. Any recommendation or suggestion is welcome 😁
/r/LanguageTechnology
https://redd.it/1oqshsp
[D] Monthly Who's Hiring and Who wants to be Hired?
For Job Postings please use this template
>Hiring: [Location], Salary: [], [Remote | Relocation], [Full Time | Contract | Part Time] and [Brief overview, what you're looking for]
For Those looking for jobs please use this template
>Want to be Hired: [Location], Salary Expectation: [], [Remote | Relocation], [Full Time | Contract | Part Time], Resume: [Link to resume] and [Brief overview, what you're looking for]
Please remember that this community is geared towards those with experience.
/r/MachineLearning
https://redd.it/1okj2rw
I think we found a third phase of grokking — has anyone else seen this?
/r/deeplearning
https://redd.it/1oyrmfy
[R] Apple AIML Residency Program 2026
Haven't seen a 2026 post - wanted to use this to consolidate info from everyone on the process. Anyone have any idea when they start sending out info session updates?
/r/MachineLearning
https://redd.it/1p0lart
Theory for Karpathy's "Zero to Hero"
I always enjoyed "understanding" how LLMs work but never actually implemented it. After a friend recommended "zero to hero", I have been hooked!!
I am just 1.5 videos in, but still feel there are gaps in what I am learning. I am also implementing the code myself along with watching.
I took an ML class in college, but it's been 8 years and I don't remember much.
He mentions some topics like "cross entropy loss", "learning rate decay" or "maximum likelihood estimation", but doesn't necessarily go in depth. I want to structure my learning more.
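For example, here's a tiny NumPy version of cross-entropy loss I wrote to check my own understanding (this is just my sketch, not code from the videos):

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean cross-entropy loss: -log softmax(logits)[target], averaged over the batch.

    logits: (batch, num_classes) raw scores; targets: (batch,) integer class labels.
    """
    # numerically stable log-softmax
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

# toy batch: 3 examples, 5 classes
logits = np.array([[2.0, 0.5, 0.1, 0.0, -1.0],
                   [0.1, 0.2, 3.0, 0.0,  0.0],
                   [1.0, 1.0, 1.0, 1.0,  1.0]])
targets = np.array([0, 2, 4])
print(cross_entropy(logits, targets))  # lower when the target class has the highest logit
```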
Can someone please suggest reading material to read along with these videos, or some prerequisites? I do not want to fall into the tutorial trap.
/r/deeplearning
https://redd.it/1p2lm6z
Kimi K2 Thinking and Gemini 3 may have just shown OpenAI to be the AI bubble epicenter.
In a recent interview, Sam Altman commented that while he didn't think there was an AI bubble, some players were poised to lose a whole lot of money. Before Moonshot AI launched Kimi K2 Thinking on November 6 and before Google launched Gemini 3 on November 18, coming out of nowhere to massively leapfrog over every other AI by a historic margin, we might have wondered who these big losers in the AI race would ultimately be. Now that the numbers are in, it seems Altman might have presciently been talking about OpenAI.
Here's why. Let's begin with OpenAI's revenue projections for the next 5 years, all calculated before the launch of Kimi K2 Thinking and Gemini 3. A few key points stand out. First, OpenAI made those earnings projections about products that don't yet exist. Second, no one has yet created the demand for these products. And third, perhaps most importantly, OpenAI apparently didn't factor in the competition.
So when a 2-year-old startup from China open-sources a thinking model it trained for less than $5 million (by comparison, GPT-5 cost OpenAI between $1.5 billion and $2 billion to train), you have to appreciate how much the AI landscape has shifted in a matter of days. And K2 Thinking was not just another model. It outperformed GPT-5, Grok 4, Gemini 2.5, and Claude 4 on many of the most important benchmarks. Of course the threat that OpenAI faces isn't really about Moonshot or Kimi K2 Thinking. It's about the world now knowing with absolute certainty that a small lab spending a minuscule amount of money can overtake ALL of the AI giants, while costing consumers and enterprises 2 to 10 times less to run.
But Kimi K2 Thinking really isn't what OpenAI should be worried about. Let the following sink in:
Gemini 3 set monstrous new highs with 37.5% on Humanity’s Last Exam and 45.1% on ARC-AGI-2 in Deep Think mode—nearly doubling GPT-5 on both measures. It also scored 1501 Elo on LMArena and 91.9% on GPQA Diamond, outperforming GPT-5 and Claude across strategic reasoning, scientific knowledge, and abstract problem-solving. And that's just the beginning. Gemini 3 dominated its competitors far beyond those key benchmarks. If you're brave enough to review a brutally detailed account of how completely Gemini 3 trounced OpenAI and pretty much everyone else on pretty much everything, check out the following stats:
https://www.vellum.ai/blog/google-gemini-3-benchmarks
These scores position Gemini 3 way ahead -- perhaps years ahead -- of OpenAI on the metrics that matter most to both consumer and enterprise AI. Essentially Google just ate OpenAI's lunch, dinner and breakfast the next day.
But that's just the competition part of all of this. While Kimi K2 Thinking clearly demonstrates that massive data centers are just not necessary for building the most powerful AIs, OpenAI has committed $1.4 trillion in investments to build massive data centers, most of which won't be operational for years. It could be that this miscalculation -- this massive misallocation of investment commitments -- best explains why OpenAI may have positioned itself to be THE big loser in the AI bubble that Altman warned everyone about.
The bottom line is that if OpenAI doesn't pull a rabbit out of the hat during 2026, it may become the first major casualty of the AI bubble that will hopefully be limited to colossally unwise investments like those of OpenAI. For their sake, let's hope that it's a really, really big rabbit.
/r/deeplearning
https://redd.it/1p558ag
AMA with Indiana University CL Faculty on November 24
Hi r/LanguageTechnology! Three of us faculty members here in [computational linguistics at Indiana University Bloomington](https://cl.indiana.edu/) will be doing an AMA on this coming Monday, **November 24**, from **2pm to 5pm ET** (19 GMT to 22 GMT).
The three of us who will be around are:
* [Luke Gessler](https://lgessler.com/) (low-resource NLP, corpora, computational language documentation)
* [Shuju Shi](https://scholar.google.com/citations?user=SGZk95cAAAAJ&hl=en) (speech recognition, phonetics, computer-aided language learning)
* [Sandra Kuebler](https://cl.indiana.edu/~skuebler/) (parsing, hate speech, machine learning for NLP)
We're happy to field your questions on:
* Higher education in CL
* MS and PhD programs
* Our research specialties
* Anything else on your mind
Please save the date, and look out for the AMA thread which we'll make earlier in the day on the 24th.
EDIT: we're going to reuse this thread for questions, so ask away!
/r/LanguageTechnology
https://redd.it/1p263p0
Did self-supervised learning for visual features quietly peak already?
From around 2020–2024 it felt like self-supervised learning (SSL, self-supervised learning) for image features was on fire — BYOL (Bootstrap Your Own Latent), SimCLR (Simple Contrastive Learning of Representations), SwAV (Swapping Assignments between multiple Views), DINO, etc. Every few months there was some new objective, augmentation trick, or architectural tweak that actually moved the needle for feature extractors.
This year it feels a lot quieter on the “new SSL objective for vision backbones” front. We got DINOv3, but as far as I can tell it’s mostly smart but incremental tweaks plus a lot of scaling in terms of data and compute, rather than a totally new idea about how to learn general-purpose image features.
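To make "new SSL objective" concrete, here's a rough NumPy sketch of the SimCLR-style NT-Xent contrastive loss (batch size, temperature, and shapes are just illustrative, not taken from any reference implementation):

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style contrastive loss over two augmented views of the same batch.

    z1, z2: (N, d) embeddings of the same N images under two different augmentations.
    Each embedding's positive is its counterpart view; the remaining 2N-2
    embeddings in the batch act as negatives.
    """
    z = np.concatenate([z1, z2], axis=0)                 # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)     # unit-normalize -> cosine similarity
    sim = (z @ z.T) / temperature                        # (2N, 2N) similarity matrix
    np.fill_diagonal(sim, -np.inf)                       # never treat an embedding as its own pair
    N = z1.shape[0]
    pos_idx = np.concatenate([np.arange(N, 2 * N), np.arange(N)])  # row i's positive column
    # cross-entropy of each row's positive against all other entries in that row
    log_probs = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(2 * N), pos_idx].mean()

# toy example: batch of 8 "images" embedded in 16 dimensions
rng = np.random.default_rng(0)
z1 = rng.normal(size=(8, 16))
z2 = rng.normal(size=(8, 16))
print(nt_xent_loss(z1, z2))
```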
So I’m wondering:
Have I just missed some important recent SSL image models for feature extraction?
Or has the research focus mostly shifted to multimodal/foundation models and generative stuff, with “vanilla” visual SSL kind of considered a solved or mature problem now?
Is the SSL scene for general vision features still evolving in interesting ways, or did we mostly hit diminishing returns after the original DINO/BYOL/SimCLR wave?
/r/computervision
https://redd.it/1pavb20
[D] Attention before it was all we needed
*hey all,*
so I guess most of us have read/heard of *Attention Is All You Need*, which gave us the foundation of the transformer models we all use today. Yesterday I spent some time browsing some pre-cursor papers that were exploring attention right before the AIAYN paper. The ones I found most relevant were:
* End-To-End Memory Networks: [https://arxiv.org/pdf/1503.08895](https://arxiv.org/pdf/1503.08895)
* Key-Value Memory Networks for Directly Reading Documents: [https://arxiv.org/pdf/1606.03126](https://arxiv.org/pdf/1606.03126)
* Neural Machine Translation by Jointly Learning to Align and Translate: [https://arxiv.org/pdf/1409.0473](https://arxiv.org/pdf/1409.0473)
they all (directly or indirectly) use something like the `softmax(QK^T)V` (scaled dot-product attention, SDPA) operation in different ways, but with extra machinery on top, which makes them feel less general and more specialized to a particular setup.
it’s kind of fun in hindsight that this core calculation was almost a “trick” in these earlier works, embedded into more complex systems, and then AIAYN comes along and says: actually, let’s strip away most of the extra parts and just make attention the main building block — “attention is all you need”.
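for reference, here's a tiny NumPy sketch of that core operation as I understand it (shapes and variable names are my own illustration; the `1/sqrt(d_k)` scaling is the AIAYN version, and the earlier papers wire the surrounding machinery up differently):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for a single head.

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v) -> (n_queries, d_v)
    """
    d_k = Q.shape[-1]
    scores = (Q @ K.T) / np.sqrt(d_k)                # (n_queries, n_keys)
    scores -= scores.max(axis=-1, keepdims=True)     # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                               # attention-weighted sum of values

# 2 queries attending over 3 key/value pairs
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(2, 4)), rng.normal(size=(3, 4)), rng.normal(size=(3, 5))
print(scaled_dot_product_attention(Q, K, V).shape)   # (2, 5)
```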
Hope some of you find this interesting. I’d love to hear any insights or anecdotes from people who were around / working with these models at the time. and if there are other important pre-transformer attention papers I should read, please let me know as well. ⚡
/r/deeplearning
https://redd.it/1pc37u0