Linguistics Student looking for career advice
I'm currently in my third year of my Linguistics degree. Next year (2026-2027) will be my last, and I will specialize in Computational Linguistics. I would like to get into the world of NLP Engineering, or NLP in any way. What can I do course- or certificate-wise? I would like to start working asap, and I wouldn't mind doing a Master's degree while I work. Any recommendation or suggestion is welcome 😁
/r/LanguageTechnology
https://redd.it/1oqshsp
[D] Monthly Who's Hiring and Who Wants to Be Hired?
For Job Postings please use this template
>Hiring: [Location], Salary: [], [Remote | Relocation], [Full Time | Contract | Part Time] and [Brief overview, what you're looking for]
For Those looking for jobs please use this template
>Want to be Hired: [Location], Salary Expectation: [], [Remote | Relocation], [Full Time | Contract | Part Time], Resume: [Link to resume] and [Brief overview, what you're looking for]
Please remember that this community is geared towards those with experience.
/r/MachineLearning
https://redd.it/1okj2rw
I think we found a third phase of grokking — has anyone else seen this?
/r/deeplearning
https://redd.it/1oyrmfy
[R] Apple AIML Residency Program 2026
Haven't seen a 2026 post - wanted to use this to consolidate info from everyone on the process. Anyone have any idea when they start sending out info session updates?
/r/MachineLearning
https://redd.it/1p0lart
Theory for Karpathy's "Zero to Hero"
I always enjoyed "understanding" how LLMs work but never actually implemented one. After a friend recommended "Zero to Hero", I have been hooked!!
I am just 1.5 videos in, but I still feel there are gaps in what I am learning. I am also implementing the code myself along with watching.
I took an ML class in college, but it's been 8 years and I don't remember much.
He mentions topics like "cross-entropy loss", "learning rate decay", or "maximum likelihood estimation", but doesn't necessarily go into depth. I want to structure my learning more.
Can someone please suggest reading material to go along with these videos, or some prerequisites? I do not want to fall into the tutorial trap.
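For anyone else filling the same gaps: cross-entropy loss is just the negative log-probability a model assigns to the correct class after a softmax, and minimizing it over a dataset is exactly maximum likelihood estimation, so two of those topics are really one idea. A minimal sketch in plain Python (the function name and toy logits are just for illustration):

```python
import math

def cross_entropy(logits, target_index):
    """Cross-entropy loss for one example: softmax over raw logits,
    then the negative log-probability of the correct class.
    Minimizing this over a dataset is maximum likelihood estimation."""
    # Softmax, with the max subtracted for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -math.log(probs[target_index])

# A confident, correct prediction gives a small loss...
low = cross_entropy([4.0, 0.5, 0.1], target_index=0)
# ...and a confident, wrong prediction gives a large one.
high = cross_entropy([4.0, 0.5, 0.1], target_index=1)
print(low < high)  # True
```

This is what `F.cross_entropy` does in the videos, just without the batching.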
/r/deeplearning
https://redd.it/1p2lm6z
Kimi K2 Thinking and Gemini 3 may have just shown OpenAI to be the AI bubble epicenter.
In a recent interview, Sam Altman commented that while he didn't think there was an AI bubble, some players were poised to lose a whole lot of money. Before Moonshot AI launched Kimi K2 Thinking on November 6, and before Google launched Gemini 3 on November 18, coming out of nowhere to massively leapfrog every other AI by a historic margin, we might have wondered who these big losers in the AI race would ultimately be. Now that the numbers are in, it seems Altman might have presciently been talking about OpenAI.
Here's why. Let's begin with OpenAI's revenue projections for the next 5 years, all calculated before the launch of Kimi K2 Thinking and Gemini 3. A few key points stand out. First, OpenAI made those earnings projections about products that don't yet exist. Second, no one has yet created the demand for these products. And third, perhaps most importantly, OpenAI apparently didn't factor in the competition.
So when a 2-year-old startup from China open-sources a thinking model it trained for less than $5 million (by comparison, GPT-5 cost OpenAI between $1.5 billion and $2 billion to train), you have to appreciate how much the AI landscape has shifted in a matter of days. And K2 Thinking was not just another model. It outperformed GPT-5, Grok 4, Gemini 2.5, and Claude 4 on many of the most important benchmarks. Of course, the threat that OpenAI faces isn't really about Moonshot or Kimi K2 Thinking. It's about the world now knowing with absolute certainty that a small lab spending a minuscule amount of money can overtake ALL of the AI giants, while costing consumers and enterprises 2 to 10 times less to run.
But Kimi K2 Thinking really isn't what OpenAI should be worried about. Let the following sink in:
Gemini 3 set monstrous new highs with 37.5% on Humanity’s Last Exam and 45.1% on ARC-AGI-2 in Deep Think mode—nearly doubling GPT-5 on both measures. It also scored 1501 Elo on LMArena and 91.9% on GPQA Diamond, outperforming GPT-5 and Claude across strategic reasoning, scientific knowledge, and abstract problem-solving. And that's just the beginning. Gemini 3 dominated its competitors far beyond those key benchmarks. If you're brave enough to review a brutally detailed account of how completely Gemini 3 trounced OpenAI and pretty much everyone else on pretty much everything, check out the following stats:
https://www.vellum.ai/blog/google-gemini-3-benchmarks?utm=&utmsource=direct&utmmedium=none
These scores position Gemini 3 way ahead -- perhaps years ahead -- of OpenAI on the metrics that matter most to both consumer and enterprise AI. Essentially Google just ate OpenAI's lunch, dinner and breakfast the next day.
But that's just the competition part of all of this. While Kimi K2 Thinking clearly demonstrates that massive data centers are just not necessary for building the most powerful AIs, OpenAI has committed $1.4 trillion in investments to build massive data centers, most of which won't be operational for years. It could be that this miscalculation -- this massive misappropriation of investment commitments -- best explains why OpenAI may have positioned itself to be THE big loser in the AI bubble that Altman warned everyone about.
The bottom line is that if OpenAI doesn't pull a rabbit out of the hat during 2026, it may become the first major casualty of the AI bubble, which will hopefully be limited to colossally unwise investments like OpenAI's. For their sake, let's hope that it's a really, really big rabbit.
/r/deeplearning
https://redd.it/1p558ag
AMA with Indiana University CL Faculty on November 24
Hi r/LanguageTechnology! Three of us faculty members here in [computational linguistics at Indiana University Bloomington](https://cl.indiana.edu/) will be doing an AMA on this coming Monday, **November 24**, from **2pm to 5pm ET** (19 GMT to 22 GMT).
The three of us who will be around are:
* [Luke Gessler](https://lgessler.com/) (low-resource NLP, corpora, computational language documentation)
* [Shuju Shi](https://scholar.google.com/citations?user=SGZk95cAAAAJ&hl=en) (speech recognition, phonetics, computer-aided language learning)
* [Sandra Kuebler](https://cl.indiana.edu/~skuebler/) (parsing, hate speech, machine learning for NLP)
We're happy to field your questions on:
* Higher education in CL
* MS and PhD programs
* Our research specialties
* Anything else on your mind
Please save the date, and look out for the AMA thread which we'll make earlier in the day on the 24th.
EDIT: we're going to reuse this thread for questions, so ask away!
/r/LanguageTechnology
https://redd.it/1p263p0
Did self-supervised learning for visual features quietly peak already?
From around 2020–2024 it felt like self-supervised learning (SSL) for image features was on fire: BYOL (Bootstrap Your Own Latent), SimCLR (Simple Contrastive Learning of Representations), SwAV (Swapping Assignments between multiple Views), DINO, etc. Every few months there was some new objective, augmentation trick, or architectural tweak that actually moved the needle for feature extractors.
This year it feels a lot quieter on the “new SSL objective for vision backbones” front. We got DINOv3, but as far as I can tell it’s mostly smart but incremental tweaks plus a lot of scaling in terms of data and compute, rather than a totally new idea about how to learn general-purpose image features.
So I’m wondering:
Have I just missed some important recent SSL image models for feature extraction?
Or has the research focus mostly shifted to multimodal/foundation models and generative stuff, with “vanilla” visual SSL kind of considered a solved or mature problem now?
Is the SSL scene for general vision features still evolving in interesting ways, or did we mostly hit diminishing returns after the original DINO/BYOL/SimCLR wave?
/r/computervision
https://redd.it/1pavb20
[D] Attention before it was all we needed
*hey all,*
so I guess most of us have read or heard of *Attention Is All You Need*, which gave us the foundation of the transformer models we all use today. Yesterday I spent some time browsing some precursor papers that were exploring attention right before the AIAYN paper. The ones I found most relevant were:
* End-To-End Memory Networks: [https://arxiv.org/pdf/1503.08895](https://arxiv.org/pdf/1503.08895)
* Key-Value Memory Networks for Directly Reading Documents: [https://arxiv.org/pdf/1606.03126](https://arxiv.org/pdf/1606.03126)
* Neural Machine Translation by Jointly Learning to Align and Translate: [https://arxiv.org/pdf/1409.0473](https://arxiv.org/pdf/1409.0473)
they all (directly or indirectly) use something like the `softmax(QK^T)V` (scaled dot-product attention, SDPA) operation in different ways, but with extra machinery on top, which makes them feel less general and more specialized to a particular setup.
it’s kind of fun in hindsight that this core calculation was almost a “trick” in these earlier works, embedded into more complex systems, and then AIAYN comes along and says: actually, let’s strip away most of the extra parts and just make attention the main building block — “attention is all you need”.
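For concreteness, the core operation all three papers circle around fits in a few lines. Here is a minimal, dependency-free sketch of `softmax(QK^T / sqrt(d))V` on plain Python lists of row vectors (the function name and toy inputs are mine; real implementations batch this with matrix libraries):

```python
import math

def sdpa(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V,
    where Q, K, V are lists of row vectors."""
    d = len(Q[0])
    out = []
    for q in Q:
        # Score this query against every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        # Softmax (max subtracted for numerical stability).
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        w = [x / z for x in w]
        # Output row: attention-weighted average of the value rows.
        out.append([sum(wi * v[j] for wi, v in zip(w, V)) for j in range(len(V[0]))])
    return out

# One query that matches the first key almost exclusively,
# so the output is essentially the first value row:
Q = [[10.0, 0.0]]
K = [[10.0, 0.0], [0.0, 10.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
print(sdpa(Q, K, V))  # approximately [[1.0, 0.0]]
```

Seen this way, the pre-AIAYN papers differ mainly in what extra machinery surrounds this weighted lookup, not in the lookup itself.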
Hope some of you find this interesting. I'd love to hear any insights or anecdotes from people who were around / working with these models at the time. And if there are other important pre-transformer attention papers I should read, please let me know as well. ⚡
/r/deeplearning
https://redd.it/1pc37u0
[D] Monthly Who's Hiring and Who Wants to Be Hired?
For Job Postings please use this template
>Hiring: [Location], Salary: [], [Remote | Relocation], [Full Time | Contract | Part Time] and [Brief overview, what you're looking for]
For Those looking for jobs please use this template
>Want to be Hired: [Location], Salary Expectation: [], [Remote | Relocation], [Full Time | Contract | Part Time], Resume: [Link to resume] and [Brief overview, what you're looking for]
Please remember that this community is geared towards those with experience.
/r/MachineLearning
https://redd.it/1pb25zo
[D] How did Gemini 3 Pro manage to get 38.3% on Humanity's Last Exam?
On ARC-AGI-2, Gemini improved its score from 5% (for 2.5 Pro) to 31% (for 3 Pro), both at $0.80 per task. This is amazing, but a lot of people here seem to believe that they just generated millions of synthetic ARC-like examples for pretraining. This is allowed by the rules of the competition, and the top Kaggle solution this year did just that. (Although investors and users might find such a tactic misleading.)
But how did Gemini go from 21.6% to 38.3% on Humanity's Last Exam? This kind of training data is very expensive to obtain en masse. The only practical way to "benchmax" here that I see is to actually cheat, i.e. use the test data for training.
What do you think is going on here? Is 3 as much of an improvement over 2.5 as its Humanity's Last Exam scores suggest?
/r/MachineLearning
https://redd.it/1pgqbjd
[D] CVPR Submission ID Changed
When I logged into my OpenReview CVPR author console, I found that my submission ID had been changed from 9k+ to 42k+. Interestingly, OpenReview has applied a black mask to multiple pages of the PDF, probably to hide the original ID mentioned in the header on every page. Did anyone else notice that?
/r/MachineLearning
https://redd.it/1phygsa
EACL 2026
Review Season is Here — Share Your Scores, Meta-Reviews & Thoughts!
With the ARR October 2025 → EACL 2026 cycle in full swing, I figured it’s a good time to open a discussion thread for everyone waiting on reviews, meta-reviews, and (eventually) decisions.
Looking forward to hearing your scores and experiences!
/r/LanguageTechnology
https://redd.it/1oykfv3
Comparing Different Object Detection Models (Metrics: Precision, Recall, F1-Score, COCO-mAP)
Hey there,
I am trying to train multiple object detection models (YOLO11, RT-DETRv4, DEIMv2) on a custom dataset, using the Ultralytics framework for YOLO and the repositories provided by the model authors for RT-DETRv4 and DEIMv2.
To objectively compare model performance, I want to calculate the following metrics:
* Precision (at a fixed IoU threshold like 0.5)
* Recall (at a fixed IoU threshold like 0.5)
* F1-score (at a fixed IoU threshold like 0.5)
* mAP at 0.5, 0.75, and 0.5:0.05:0.95, as well as for small, medium, and large objects
However, each framework appears to differ in the way it evaluates the model and the metrics it provides. My idea was to run the models in prediction mode on the test split of my custom dataset and then use the results to calculate the required metrics in a Python script myself, or with the help of a library like pycocotools. Different sources (GitHub etc.) claim this might produce wrong results compared to using the tools provided by the respective framework, as the prediction settings usually differ from validation/test settings. I am wondering what the correct way to evaluate the models is. Just use the tools provided by the authors and only use those metrics which are available for all models? In each paper on object detection models, those metrics are provided to describe model performance, but it's rarely, if at all, described how they were practically obtained (only the theory/formula is stated).
I would appreciate if anyone can offer some insights on how to properly test the models with an academic setting in mind.
Thanks!
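As a sanity check on whatever a framework reports, the fixed-threshold metrics are simple enough to compute yourself from exported predictions. Below is a minimal sketch assuming greedy one-to-one matching of score-sorted predictions to ground truths at IoU ≥ 0.5; the matching convention, the (x1, y1, x2, y2) box format, and the toy values are my assumptions, not any framework's exact protocol. For mAP@0.5:0.95 and the small/medium/large breakdown, running every model's exported detections through the same pycocotools COCOeval instance is the usual academically-defensible route, since it removes the per-framework evaluation differences.

```python
def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def precision_recall_f1(preds, gts, thr=0.5):
    """preds: list of (box, score); gts: list of boxes.
    Predictions are matched greedily in descending score order,
    and each ground truth may be matched at most once."""
    tp, matched = 0, set()
    for box, _ in sorted(preds, key=lambda p: -p[1]):
        best, best_iou = None, thr
        for i, g in enumerate(gts):
            v = iou(box, g)
            if i not in matched and v >= best_iou:
                best, best_iou = i, v
        if best is not None:
            matched.add(best)
            tp += 1
    fp, fn = len(preds) - tp, len(gts) - tp
    p = tp / (tp + fp) if preds else 0.0
    r = tp / (tp + fn) if gts else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# One correct detection, one false positive, one missed ground truth:
p, r, f1 = precision_recall_f1(
    [((0, 0, 10, 10), 0.9), ((50, 50, 60, 60), 0.8)],
    [(0, 0, 10, 10), (20, 20, 30, 30)])
print(p, r, f1)  # 0.5 0.5 0.5
```

If a self-computed number and a framework's number disagree, the difference is almost always in the prediction-time settings (confidence threshold, NMS, max detections per image), which is worth stating explicitly in an academic write-up.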
/r/computervision
https://redd.it/1pmmujx
Zoom pivots from web conferencing to federated AI, and earns SOTA on HLE. High-level talent is proving to be quite common.
Part of this story is about how Zoom brought together a team of the top models in a federated AI system that recently earned SOTA by scoring 48.1% on HLE, dethroning Gemini 3 with its 45.8%. It's too early to tell if this federated strategy will continue to unseat top models, but it's definitely something to watch. However, I want to focus on a different part of Zoom's full entry into the AI space: it is becoming increasingly clear that top AI talent, like senior engineers, can be found just about anywhere.
Our first example is DeepSeek, which took the world by storm in January with the power and cost-effectiveness of its open-source AIs. The important point here is that DeepSeek started as a "side project" of a few people working at a hedge fund.
Then in September, a Chinese food delivery company named Meituan stunned the world by open-sourcing LongCat‑Flash‑Omni. It topped Gemini-2.5-Pro and Gemini-2.5-Flash on DailyOmni with 82.38, demonstrating its superior multimodal reasoning. Again, this was a food delivery company that turned itself into a top AI contender!
Then a few weeks ago six former engineers from Google and DeepMind scaffolded their meta-system onto Gemini 3 Pro, and earned SOTA on ARC-AGI-2 with a score of 54%, beating Gemini's Deep Think (preview) that scored 45.1%. Their company, Poetiq, has only been around for about 7 months.
Now contrast these developments with Zuckerberg's massive talent spending spree, where he paid some engineers hundreds of millions of dollars to join Meta. One would think that top talent is rare, and very expensive. But it's becoming increasingly clear that top AI engineers are everywhere, poised to stun the world again, and again, and again.
/r/deeplearning
https://redd.it/1pnj07o
For Text/Corpus Cluster Analysis - How do I handle huge, and very many small, outliers?
/r/LanguageTechnology
https://redd.it/1pnb3a9
[R] No causal inference workshops at ICLR 2026?
What gives? Anyone got any alternative venues in mind for causal topics? Otherwise we're going straight to the main track, I guess.
P.S. The full list is posted on Twitter. Also, some of these are already on OpenReview.
/r/MachineLearning
https://redd.it/1psp0a1
[D] Self-Promotion Thread
Please post your personal projects, startups, product placements, collaboration needs, blogs etc.
Please mention the payment and pricing requirements for products and services.
Please do not post link shorteners, link aggregator websites, or auto-subscribe links.
--
Any abuse of trust will lead to bans.
Encourage others who create new posts for questions to post here instead!
The thread will stay alive until the next one, so keep posting after the date in the title.
--
Meta: This is an experiment. If the community doesn't like this, we will cancel it. This is to encourage those in the community to promote their work without spamming the main threads.
/r/MachineLearning
https://redd.it/1pbxkt2
D Best papers of 2025
Which papers do you think are the most important ones which were released in 2025?
Please, provide a link to the paper if you share one.
/r/MachineLearning
https://redd.it/1pvmrx9