gonzo-обзоры ML статей (gonzo reviews of ML papers)

DolphinGemma
Denise Herzing, Thad Starner
Blog: https://blog.google/technology/ai/dolphingemma/
Project site: https://www.wilddolphinproject.org/
Paper: none
Model: none (they promised to release it this summer; it apparently is still in development)
Code: none

I have long wanted to take a look at DolphinGemma, a joint project of Google, Georgia Tech, and the Wild Dolphin Project (WDP, https://www.wilddolphinproject.org/) about a model (an LLM) trained on dolphin sounds.

! Not to be confused with Dolphin Gemma/Llama/Qwen/Mistral from the Dolphin project (https://huggingface.co/dphn, https://dphn.ai/) and Cognitive Computations. Those are a family of uncensored, instruction-tuned conversational assistants (https://erichartford.com/uncensored-models), just general-purpose text models.

This strongly echoes Project CETI (https://news.1rj.ru/str/gonzo_ML/2182), which studies whales, but it is a different project. There are other interesting animal-focused projects too. I especially want to mention the mighty Earth Species Project (https://www.earthspecies.org/), which deserves a separate deep dive: they already have their own bioacoustic model, NatureLM-Audio (https://arxiv.org/abs/2411.07186), and other tools.

WDP has been studying dolphins since 1985, focusing on the Atlantic spotted dolphin (Stenella frontalis) around the Bahamas. The research is done in the natural environment and is non-invasive. Over the years it has accumulated a dataset of underwater video and audio, labeled with individual dolphin identities along with their life histories and observed behaviors.

As I understand it, the dataset contains not just sound recordings but also accompanying information about the situation and behavior of specific dolphins, for example a mother reuniting with her calf, fights, chasing sharks, and so on. The goal of the project is to understand the structure of dolphin communication and, potentially, its meaning. A bit more detail, with examples you can listen to, is on the project site (https://www.wilddolphinproject.org/our-research/dolphin-communication/). I have heard that dolphins have other ways of communicating as well (https://www.scientificamerican.com/article/dolphins-communicate-with-fountains-of-pee/), but let's not go there for now: we don't need LLMs like that!

WDP also has a separate track on two-way communication: the CHAT system (Cetacean Hearing Augmentation Telemetry, https://www.wilddolphinproject.org/our-research/chat-research/). CHAT can generate new synthetic sounds, distinct from natural ones, that can be associated with new objects the dolphins like. The hope is that curious dolphins will learn these sounds and use them when they want to request such objects from the researchers (see the video https://youtu.be/YhopeQKbpZA).

CHAT has to work reliably (to pick out the right sound in ocean noise) and quickly (so that a researcher with the decoder device can quickly understand what is being asked and give it to the dolphin, thereby reinforcing the association). It ran in real time on the already aging Pixel 6, which is convenient: no special, expensive custom hardware is needed. In principle, DolphinGemma's next-token prediction could speed up working out what a dolphin is trying to say, and so speed up the exchange.

Unfortunately, there are too few details about how it works and what the practical results are. To my mind this is more of a marketing piece than a scientific paper (and there is no paper). Project CETI and the Earth Species Project are much more scientific (and open) in this respect. There is almost no information about DolphinGemma, mostly just blog and social media posts. I found no papers, no model, and no code, which is sad. But let's try to go through what is known.

The goal of the model is to take dolphin vocalizations as input and generate new sound sequences, hopefully dolphin-like.

Audio-in, audio-out, but via tokenization with the SoundStream tokenizer (https://arxiv.org/abs/2107.03312, https://research.google/blog/soundstream-an-end-to-end-neural-audio-codec/), a Google work from 2021. SoundStream is essentially a neural codec trained end-to-end, consisting of an encoder, a decoder, and a quantizer in the bottleneck between them. During training it uses two losses: a reconstruction loss and an adversarial loss, so that a discriminator cannot tell the reconstructed audio from the original. After training, the encoder plus quantizer can be used to generate tokens, and the decoder to turn them back into audio. I am not sure whether Google ever released this codec; I can't find it offhand, though I do see a number of reimplementations online. Audio codec experts, please correct me. And also tell me: is there something more modern and better? Surely something has appeared in four years.
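
To make the "encoder + quantizer gives tokens, decoder turns them back" idea more concrete, here is a minimal sketch of a residual vector quantizer, the kind of bottleneck described in the SoundStream paper. Everything here is a toy assumption: the codebooks are random rather than learned, `frame` stands in for one encoder output vector, and the names `rvq_encode` / `rvq_decode` are hypothetical, not SoundStream's API.

```python
import random

def nearest(codebook, vec):
    """Index of the codeword with the smallest squared distance to vec."""
    def sqdist(cw):
        return sum((c - v) ** 2 for c, v in zip(cw, vec))
    return min(range(len(codebook)), key=lambda i: sqdist(codebook[i]))

def rvq_encode(frame, codebooks):
    """Turn one embedding frame into a list of token ids, one per codebook."""
    residual = list(frame)
    tokens = []
    for cb in codebooks:
        idx = nearest(cb, residual)          # closest codeword to what is left
        tokens.append(idx)
        # The next stage only quantizes the remaining error.
        residual = [r - c for r, c in zip(residual, cb[idx])]
    return tokens

def rvq_decode(tokens, codebooks):
    """Reconstruct the frame as the sum of the selected codewords."""
    out = [0.0] * len(codebooks[0][0])
    for cb, idx in zip(codebooks, tokens):
        out = [o + c for o, c in zip(out, cb[idx])]
    return out

random.seed(0)
DIM, CODEBOOK_SIZE, NUM_STAGES = 8, 16, 4
# Random codebooks with shrinking scale stand in for learned ones (assumption).
codebooks = [
    [[random.gauss(0, 0.5 ** s) for _ in range(DIM)] for _ in range(CODEBOOK_SIZE)]
    for s in range(NUM_STAGES)
]
frame = [random.gauss(0, 1) for _ in range(DIM)]  # stand-in for an encoder output

tokens = rvq_encode(frame, codebooks)   # discrete ids a language model could consume
recon = rvq_decode(tokens, codebooks)   # what a decoder would receive back
print(tokens)
```

In a setup like the one described above, a sequence of such token ids per audio frame is what the language model would be trained to predict; the real codec learns the encoder, codebooks, and decoder jointly.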

The model has 400M parameters and is built to run locally on the Pixel phones used in the WDP project. There is no Gemma of that size, so this is not a Gemma fine-tune but a model built on its ideas (presumably a transformer decoder). In that sense Google's communication was misleading when they said (and still say) that the project uses Gemma models.

The dataset size is unclear. The paper “Imitation of Computer-Generated Sounds by Wild Atlantic Spotted Dolphins (Stenella frontalis)” (https://www.animalbehaviorandcognition.org/article.php?id=1370) about CHAT mentions 1319 minutes of audio recordings.

The practical payoff is unclear too. I managed to dig up a separate interview with the authors on the Scientific American podcast (https://www.scientificamerican.com/podcast/episode/dolphingemma-could-enable-ai-communication-with-dolphins/). There they claim that the model learned to generate certain vocalizations (VCM Type 3, or VCM3s), which dolphins prefer to use during two-way communication with humans, and for the authors this was something of an a-ha moment. Before that, it seems, generating VCM3s had not really worked.

And that seems to be it. Apparently this is still fairly early research, although I had the impression that things were a bit different.

All in all, as for DolphinGemma specifically, we are waiting for more substantive announcements. Meanwhile, I would look more closely at more open projects like CETI and the Earth Species Project. And really, someone should have trained BarkLLM long ago. Or, failing that, MeowLM. Shall we get organized?

Google

DolphinGemma: How Google AI is helping decode dolphin communication

Dolphin researchers are using Gemma and Google Pixel phones to try to decipher how dolphins talk to one another.

A Pixel inside a wearable underwater device

A cool paper on adaptive computation, Mixture-of-Recursions (MoR):

https://news.1rj.ru/str/gonzo_ML_podcasts/489

gonzo_ML_podcasts

Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
Authors: Sangmin Bae, Yujin Kim, Reza Bayat, Sungnyun Kim, Jiyoun Ha, Tal Schuster, Adam Fisch, Hrayr Harutyunyan, Ziwei Ji, Aaron Courville, and Se-Young Yun
Paper:…

A really great topic: intelligence should be built on the basis of movement, and movements should become first-class objects, unlike now, when people try to bolt something on top of LLMs. I strongly agree with this and keep recalling how many metaphors in language are rooted in our sensory and motor experience (I never tire of recommending the book "Metaphors We Live By" by George Lakoff and Mark Johnson).

https://news.1rj.ru/str/gonzo_ML_podcasts/500

gonzo_ML_podcasts

Grounding Intelligence in Movement
Melanie Segado, Michael L. Platt, Felipe Parodi, Jordan K. Matelsky, Eva B. Dyer, Konrad P. Kording
Paper: https://arxiv.org/abs/2507.02771
Code: not available
Model: not available

English post: https://arxiviq.substack.com/p/grounding…

Agent fleets? Not yet. Scaling to hundreds of agents does not work so far.

https://news.1rj.ru/str/gonzo_ML_podcasts/506

gonzo_ML_podcasts

AgentsNet: Coordination and Collaborative Reasoning in Multi-Agent LLMs
Florian Grötschla, Luis Müller, Mikhail Galkin, Jan Tönshoff, Bryan Perozzi
Paper: https://arxiv.org/abs/2507.08616
Code: https://github.com/floriangroetschla/AgentsNet
Dataset: http…

Gold-medalist level at the 2025 International Mathematical Olympiad has been reached by a general-purpose reasoning model without using tools.

https://x.com/alexwei_/status/1946477742855532918?t=8Sz7-2-MwNV_hQ5SX8IlVA&s=19

I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).

We evaluated our models on the 2025 IMO problems under the same rules as human contestants: two 4.5 hour exam sessions, no tools or internet, reading the official problem statements, and writing natural language proofs.

Why is this a big deal? First, IMO problems demand a new level of sustained creative thinking compared to past benchmarks. In reasoning time horizon, we’ve now progressed from GSM8K (~0.1 min for top humans) → MATH benchmark (~1 min) → AIME (~10 mins) → IMO (~100 mins).

Second, IMO submissions are hard-to-verify, multi-page proofs. Progress here calls for going beyond the RL paradigm of clear-cut, verifiable rewards. By doing so, we’ve obtained a model that can craft intricate, watertight arguments at the level of human mathematicians.

Besides the result itself, I am excited about our approach: We reach this capability level not via narrow, task-specific methodology, but by breaking new ground in general-purpose reinforcement learning and test-time compute scaling.

In our evaluation, the model solved 5 of the 6 problems on the 2025 IMO. For each problem, three former IMO medalists independently graded the model’s submitted proof, with scores finalized after unanimous consensus. The model earned 35/42 points in total, enough for gold! 🥇

HUGE congratulations to the team—@SherylHsu02, @polynoamial, and the many giants whose shoulders we stood on—for turning this crazy dream into reality! I am lucky I get to spend late nights and early mornings working alongside the very best.

Btw, we are releasing GPT-5 soon, and we’re excited for you to try it. But just to be clear: the IMO gold LLM is an experimental research model. We don’t plan to release anything with this level of math capability for several months.

Still—this underscores how fast AI has advanced in recent years. In 2021, my PhD advisor @JacobSteinhardt had me forecast AI math progress by July 2025. I predicted 30% on the MATH benchmark (and thought everyone else was too optimistic). Instead, we have IMO gold.

If you want to take a look, here are the model’s solutions to the 2025 IMO problems! The model solved P1 through P5; it did not produce a solution for P6. (Apologies in advance for its … distinct style—it is very much an experimental model 😅)

https://github.com/aw31/openai-imo-2025-proofs/

Lastly, we'd like to congratulate all the participants of the 2025 IMO on their achievement! We are proud to have many past IMO participants at @OpenAI and recognize that these are some of the brightest young minds of the future.

X (formerly Twitter)

Alexander Wei (@alexwei_) on X

A comment from Terence Tao on the results of AI systems at the IMO and how they are evaluated.

In short: millions of variations are possible, so what is needed is a standard, transparent evaluation methodology rather than self-reporting.

https://mathstodon.xyz/@tao/114881418225852441

It is tempting to view the capability of current AI technology as a singular quantity: either a given task X is within the ability of current tools, or it is not. However, there is in fact a very wide spread in capability (several orders of magnitude) depending on what resources and assistance [one] gives the tool, and how one reports their results.

One can illustrate this with a human metaphor. I will use the recently concluded International Mathematical Olympiad (IMO) as an example. Here, the format is that each country fields a team of six human contestants (high school students), led by a team leader (often a professional mathematician). Over the course of two days, each contestant is given four and a half hours on each day to solve three difficult mathematical problems, given only pen and paper. No communication between contestants (or with the team leader) during this period is permitted, although the contestants can ask the invigilators for clarification on the wording of the problems. The team leader advocates for the students in front of the IMO jury during the grading process, but is not involved in the IMO examination directly.

The IMO is widely regarded as a highly selective measure of mathematical achievement for a high school student to be able to score well enough to receive a medal, particularly a gold medal or a perfect score; this year the threshold for the gold was 35/42, which corresponds to answering five of the six questions perfectly. Even answering one question perfectly merits an "honorable mention".

But consider what happens to the difficulty level of the Olympiad if we alter the format in various ways:

* One gives the students several days to complete each question, rather than four and [a] half hours for three questions. (To stretch the metaphor somewhat, consider a sci-fi scenario in [which] the student is still only given four and a half hours, but the team leader places the students in some sort of expensive and energy-intensive time acceleration machine in which months or even years of time pass for the students during this period.)
* Before the exam starts, the team leader rewrites the questions in a format that the students find easier to work with.
* The team leader gives the students unlimited access to calculators, computer algebra packages, formal proof assistants, textbooks, or the ability to search the internet.
* The team leader has the six student team work on the same problem simultaneously, communicating with each other on their partial progress and reported dead ends.
* The team leader gives the students prompts in the direction of favorable approaches, and intervenes if one of the students is spending too much time on a direction that they know to be unlikely to succeed.
* Each of the six students on the team submit solutions, but the team leader selects only the "best" solution to submit to the competition, discarding the rest.
* If none of the students on the team obtains a satisfactory solution, the team leader does not submit any solution at all, and silently withdraws from the competition without their participation ever being noted.

In each of these formats, the submitted solutions are still technically generated by the high school contestants, rather than the team leader. However, the reported success rate of the students on the competition can be dramatically affected by such changes of format; a student or team of students who might not even reach bronze medal performance if taking the competition under standard test conditions might instead reach gold medal performance under some of the modified formats indicated above.

Mathstodon

Terence Tao (@tao@mathstodon.xyz)

So, in the absence of a controlled test methodology that was not self-selected by the competing teams, one should be wary of making apples-to-apples comparisons between the performance of various AI models on competitions such as the IMO, or between such models and the human contestants.

Related to this, I will not be commenting on any self-reported AI competition performance results for which the methodology was not disclosed in advance of the competition.

We have been given a rare, human-interpretable window (CoT) into the minds of our most advanced creations, but there is no guarantee that this window will stay open.

https://news.1rj.ru/str/gonzo_ML_podcasts/524

gonzo_ML_podcasts

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety
Authors: Tomek Korbak, Mikita Balesni, Elizabeth Barnes, Joe Benton, Mark Chen, Allan Dafoe, Scott Emmons, David Farhi, Dan Hendrycks, Evan Hubinger, Erik Jenner, Victoria Krakovna…

One of the papers that received an Outstanding Paper Award at the recent ICML 2025.

Adaptive inference for masked diffusion models (MDMs) greatly improves quality on planning-style tasks (for example, Sudoku), beating heavier autoregressive counterparts:

https://news.1rj.ru/str/gonzo_ML_podcasts/528

Hopefully we will see more good text diffusion models in the near future!

gonzo_ML_podcasts

Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions
Jaeyeon Kim, Kulin Shah, Vasilis Kontonis, Sham Kakade, Sitan Chen
Paper: https://arxiv.org/abs/2502.06768

English post: https://arxiviq.substack.com/p/icml-2025-outstanding…

And one more gold medal at the IMO, this time from Gemini and apparently official. Also 35 points.

https://deepmind.google/discover/blog/advanced-version-of-gemini-with-deep-think-officially-achieves-gold-medal-standard-at-the-international-mathematical-olympiad/

Google DeepMind

Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad

Our advanced model officially achieved a gold-medal level performance on problems from the International Mathematical Olympiad (IMO), the world’s most prestigious competition for young...

Another paper with an Outstanding Paper Award at ICML 2025. It critiques next-token prediction, advocates multi-token methods and diffusion, and presents an unexpectedly effective way to get diversity in model outputs: seed-conditioning, which adds random meaningless text noise to the input (a seed string) and outperforms temperature sampling.
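
To illustrate where the randomness lives in each approach, here is a small toy sketch. It is not the paper's code: `toy_logits` is a hypothetical stand-in for a language model (it just hashes the prompt), nothing is trained here, and the vocabulary and seed length are arbitrary assumptions. Temperature sampling keeps the input fixed and randomizes the output choice; seed-conditioning randomizes the input with a meaningless prefix and then decodes greedily.

```python
import hashlib
import math
import random

VOCAB = ["cat", "dog", "owl", "fox"]

def toy_logits(prompt):
    """Hypothetical stand-in for a model: deterministic logits derived from the prompt."""
    digest = hashlib.sha256(prompt.encode("utf-8")).digest()
    return [b / 64.0 for b in digest[: len(VOCAB)]]

def sample_with_temperature(prompt, temperature, rng):
    """Output-side randomness: softmax over logits / temperature, then sample."""
    logits = [l / temperature for l in toy_logits(prompt)]
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]  # unnormalized softmax weights
    return rng.choices(VOCAB, weights=weights, k=1)[0]

def greedy_with_seed(prompt, rng, seed_len=8):
    """Input-side randomness: prepend a random seed string, then take the argmax."""
    seed = "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(seed_len))
    logits = toy_logits(seed + " " + prompt)
    return VOCAB[max(range(len(VOCAB)), key=lambda i: logits[i])]

rng = random.Random(0)
prompt = "Name an animal:"
temp_samples = [sample_with_temperature(prompt, 1.0, rng) for _ in range(20)]
seed_samples = [greedy_with_seed(prompt, rng) for _ in range(20)]
print(set(temp_samples), set(seed_samples))
```

With a real model the seed prefix would only help if the model has learned to make use of it, so treat this purely as a picture of the mechanism, not of the reported results.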

https://news.1rj.ru/str/gonzo_ML_podcasts/539

gonzo_ML_podcasts

Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction
Authors: Vaishnavh Nagarajan, Chen Henry Wu, Charles Ding, Aditi Raghunathan
Paper: https://openreview.net/forum?id=Hi0SyHMmkd
Code: https://github.com/chenwu98/algorithmic…

https://news.1rj.ru/str/gonzo_ML_podcasts/550

gonzo_ML_podcasts

https://icml.cc/Conferences/2025/PublicationEthics

We continue publishing reviews of the papers that won an Outstanding Paper Award at ICML 2025.

The paper "The Value of Prediction in Identifying the Worst-Off" offers an important counterargument to the "accuracy above all" approach that dominates applied machine learning. It shows that in many real-world, resource-constrained scenarios, investing in the operational capacity to act on predictions yields more social benefit than marginal improvements in model accuracy. The PAR ratio gives policymakers a principled, data-driven tool for moving beyond isolated technical metrics and making holistic, cost-aware decisions about building systems. The study marks a maturing of the "AI for social good" field, shifting the focus from the question "how accurate is the model?" to the question "what is the most effective way to improve welfare, and where do predictions fit in?".

https://news.1rj.ru/str/gonzo_ML_podcasts/551

gonzo_ML_podcasts

The Value of Prediction in Identifying the Worst-Off
Authors: Unai Fischer-Abaigar, Christoph Kern, Juan Carlos Perdomo
Paper: https://openreview.net/forum?id=26JsumCG0z
Code: The paper uses the open-source CatBoost library (https://catboost.ai).
Data:…

Now he has talked with Hassabis

https://youtu.be/-HzgcbRXUK8

YouTube

Demis Hassabis: Future of AI, Simulating Reality, Physics and Video Games | Lex Fridman Podcast #475

Demis Hassabis is the CEO of Google DeepMind and Nobel Prize winner for his groundbreaking work in protein structure prediction using AI.
Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep475-sb
See below for timestamps,…

Another paper with an Outstanding Paper Award at ICML 2025.

CollabLLM is trained on multi-turn dialogue rollouts based on user simulation and ends up improving the user experience:

https://news.1rj.ru/str/gonzo_ML_podcasts/555

gonzo_ML_podcasts

CollabLLM: From Passive Responders to Active Collaborators
Authors: Shirley Wu, Michel Galley, Baolin Peng, Hao Cheng, Gavin Li, Yao Dou, Weixin Cai, James Zou, Jure Leskovec, Jianfeng Gao
Paper: https://arxiv.org/abs/2502.00640
Code: http://aka.ms/CollabLLM…
