gonzo-обзоры ML статей

A really great topic: intelligence should be built on movement, which should become a first-class object rather than something bolted on top of an LLM, as is done today. I strongly agree, and I keep thinking about how many metaphors in language are rooted in our sensory and motor experience (I never tire of recommending "Metaphors We Live By" by George Lakoff and Mark Johnson).

https://news.1rj.ru/str/gonzo_ML_podcasts/500

gonzo_ML_podcasts

Grounding Intelligence in Movement
Melanie Segado, Michael L. Platt, Felipe Parodi, Jordan K. Matelsky, Eva B. Dyer, Konrad P. Kording
Paper: https://arxiv.org/abs/2507.02771
Code: not available
Model: not available

English post: https://arxiviq.substack.com/p/grounding…


Agent fleets? Not yet. Scaling to hundreds of agents does not work so far.

https://news.1rj.ru/str/gonzo_ML_podcasts/506

gonzo_ML_podcasts

AgentsNet: Coordination and Collaborative Reasoning in Multi-Agent LLMs
Florian Grötschla, Luis Müller, Mikhail Galkin, Jan Tönshoff, Bryan Perozzi
Paper: https://arxiv.org/abs/2507.08616
Code: https://github.com/floriangroetschla/AgentsNet
Dataset: http…


A general-purpose reasoning model has reached gold-medalist level on the 2025 International Mathematical Olympiad without using tools.

https://x.com/alexwei_/status/1946477742855532918?t=8Sz7-2-MwNV_hQ5SX8IlVA&s=19

I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).

We evaluated our models on the 2025 IMO problems under the same rules as human contestants: two 4.5 hour exam sessions, no tools or internet, reading the official problem statements, and writing natural language proofs.

Why is this a big deal? First, IMO problems demand a new level of sustained creative thinking compared to past benchmarks. In reasoning time horizon, we’ve now progressed from GSM8K (~0.1 min for top humans) → MATH benchmark (~1 min) → AIME (~10 mins) → IMO (~100 mins).

Second, IMO submissions are hard-to-verify, multi-page proofs. Progress here calls for going beyond the RL paradigm of clear-cut, verifiable rewards. By doing so, we’ve obtained a model that can craft intricate, watertight arguments at the level of human mathematicians.

Besides the result itself, I am excited about our approach: We reach this capability level not via narrow, task-specific methodology, but by breaking new ground in general-purpose reinforcement learning and test-time compute scaling.

In our evaluation, the model solved 5 of the 6 problems on the 2025 IMO. For each problem, three former IMO medalists independently graded the model’s submitted proof, with scores finalized after unanimous consensus. The model earned 35/42 points in total, enough for gold! 🥇

HUGE congratulations to the team—@SherylHsu02, @polynoamial, and the many giants whose shoulders we stood on—for turning this crazy dream into reality! I am lucky I get to spend late nights and early mornings working alongside the very best.

Btw, we are releasing GPT-5 soon, and we’re excited for you to try it. But just to be clear: the IMO gold LLM is an experimental research model. We don’t plan to release anything with this level of math capability for several months.

Still—this underscores how fast AI has advanced in recent years. In 2021, my PhD advisor @JacobSteinhardt had me forecast AI math progress by July 2025. I predicted 30% on the MATH benchmark (and thought everyone else was too optimistic). Instead, we have IMO gold.

If you want to take a look, here are the model’s solutions to the 2025 IMO problems! The model solved P1 through P5; it did not produce a solution for P6. (Apologies in advance for its … distinct style—it is very much an experimental model 😅)

https://github.com/aw31/openai-imo-2025-proofs/

Lastly, we'd like to congratulate all the participants of the 2025 IMO on their achievement! We are proud to have many past IMO participants at @OpenAI and recognize that these are some of the brightest young minds of the future.

X (formerly Twitter)

Alexander Wei (@alexwei_) on X


A comment from Terence Tao on AI systems' results at the IMO and how they are evaluated.

In short: millions of variations of the setup are possible, so what is needed is a standard, transparent evaluation methodology rather than self-reporting.

https://mathstodon.xyz/@tao/114881418225852441

It is tempting to view the capability of current AI technology as a singular quantity: either a given task X is within the ability of current tools, or it is not. However, there is in fact a very wide spread in capability (several orders of magnitude) depending on what resources and assistance gives the tool, and how one reports their results.

One can illustrate this with a human metaphor. I will use the recently concluded International Mathematical Olympiad (IMO) as an example. Here, the format is that each country fields a team of six human contestants (high school students), led by a team leader (often a professional mathematician). Over the course of two days, each contestant is given four and a half hours on each day to solve three difficult mathematical problems, given only pen and paper. No communication between contestants (or with the team leader) during this period is permitted, although the contestants can ask the invigilators for clarification on the wording of the problems. The team leader advocates for the students in front of the IMO jury during the grading process, but is not involved in the IMO examination directly.

The IMO is widely regarded as a highly selective measure of mathematical achievement for a high school student to be able to score well enough to receive a medal, particularly a gold medal or a perfect score; this year the threshold for the gold was 35/42, which corresponds to answering five of the six questions perfectly. Even answering one question perfectly merits an "honorable mention".

But consider what happens to the difficulty level of the Olympiad if we alter the format in various ways:

* One gives the students several days to complete each question, rather than four and half hours for three questions. (To stretch the metaphor somewhat, consider a sci-fi scenario in the student is still only given four and a half hours, but the team leader places the students in some sort of expensive and energy-intensive time acceleration machine in which months or even years of time pass for the students during this period.)
* Before the exam starts, the team leader rewrites the questions in a format that the students find easier to work with.
* The team leader gives the students unlimited access to calculators, computer algebra packages, formal proof assistants, textbooks, or the ability to search the internet.
* The team leader has the six student team work on the same problem simultaneously, communicating with each other on their partial progress and reported dead ends.
* The team leader gives the students prompts in the direction of favorable approaches, and intervenes if one of the students is spending too much time on a direction that they know to be unlikely to succeed.
* Each of the six students on the team submit solutions, but the team leader selects only the "best" solution to submit to the competition, discarding the rest.
* If none of the students on the team obtains a satisfactory solution, the team leader does not submit any solution at all, and silently withdraws from the competition without their participation ever being noted.

In each of these formats, the submitted solutions are still technically generated by the high school contestants, rather than the team leader. However, the reported success rate of the students on the competition can be dramatically affected by such changes of format; a student or team of students who might not even reach bronze medal performance if taking the competition under standard test conditions might instead reach gold medal performance under some of the modified formats indicated above.

Mathstodon

Terence Tao (@tao@mathstodon.xyz)


So, in the absence of a controlled test methodology that was not self-selected by the competing teams, one should be wary of making apples-to-apples comparisons between the performance of various AI models on competitions such as the IMO, or between such models and the human contestants.

Related to this, I will not be commenting on any self-reported AI competition performance results for which the methodology was not disclosed in advance of the competition.
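
To make the effect of the "submit only the best solution" format from Tao's list concrete, here is a minimal sketch. It assumes, purely for illustration, that each attempt succeeds independently with the same probability `p` and that the selector always recognizes a correct solution. Neither assumption comes from the post, and the function name and numbers are hypothetical.

```python
def best_of_n_success(p: float, n: int) -> float:
    """Probability that at least one of n independent attempts succeeds.

    Assumption (not from the post): attempts are independent and each
    succeeds with the same probability p; selection is perfect.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if n < 1:
        raise ValueError("n must be >= 1")
    return 1.0 - (1.0 - p) ** n


# A solver that succeeds 10% of the time under standard conditions
# looks very different once only the best of several attempts is reported.
for n in (1, 6, 32):
    print(f"attempts={n:2d}  reported success={best_of_n_success(0.10, n):.3f}")
```

Under these assumptions the reported success rate climbs quickly with the number of hidden attempts, which is why the number of attempts and the selection procedure need to be disclosed before results can be compared.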


We have been given a rare, human-interpretable window (CoT) into the minds of our most advanced creations, but there is no guarantee that this window will stay open.

https://news.1rj.ru/str/gonzo_ML_podcasts/524

gonzo_ML_podcasts

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety
Authors: Tomek Korbak, Mikita Balesni, Elizabeth Barnes, Joe Benton, Mark Chen, Allan Dafoe, Scott Emmons, David Farhi, Dan Hendrycks, Evan Hubinger, Erik Jenner, Victoria Krakovna…


One of the papers that received an Outstanding Paper Award at the recent ICML 2025.

Adaptive inference for masked diffusion models (MDMs) substantially improves performance on planning-style tasks (for example, Sudoku), outperforming heavier autoregressive alternatives:

https://news.1rj.ru/str/gonzo_ML_podcasts/528

There is hope that we will see more good text diffusion models in the near future!

gonzo_ML_podcasts

Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions
Jaeyeon Kim, Kulin Shah, Vasilis Kontonis, Sham Kakade, Sitan Chen
Paper: https://arxiv.org/abs/2502.06768

English post: https://arxiviq.substack.com/p/icml-2025-outstanding…


Another gold medal at the IMO, this time from Gemini and apparently official. Also 35 points.

https://deepmind.google/discover/blog/advanced-version-of-gemini-with-deep-think-officially-achieves-gold-medal-standard-at-the-international-mathematical-olympiad/

Google DeepMind

Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad

Our advanced model officially achieved a gold-medal level performance on problems from the International Mathematical Olympiad (IMO), the world’s most prestigious competition for young...


Another paper with an Outstanding Paper Award at ICML 2025. It critiques next-token prediction, advocates multi-token methods and diffusion, and presents a surprisingly effective way to get diverse model outputs: seed-conditioning, which adds random, meaningless text noise (a seed string) to the input and outperforms temperature sampling.
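
As a rough illustration of the input side of seed-conditioning as described in this post, the sketch below prepends a random, meaningless string to a prompt so that each request differs only in its seed. The seed length, alphabet, separator, and function names are assumptions made for this example rather than details from the paper, and no model is called here.

```python
import random
import string


def make_seed_string(rng: random.Random, length: int = 16) -> str:
    """Build a random, meaningless seed string (hypothetical format)."""
    alphabet = string.ascii_lowercase + string.digits
    return "".join(rng.choice(alphabet) for _ in range(length))


def seed_conditioned_prompts(prompt: str, n: int, rng_seed: int = 0) -> list[str]:
    """Return n copies of the prompt, each prefixed with a different seed string.

    Diversity is meant to come from the random prefix rather than from
    sampling temperature, so each prompt could be decoded deterministically.
    """
    rng = random.Random(rng_seed)
    return [f"{make_seed_string(rng)}\n{prompt}" for _ in range(n)]


prompts = seed_conditioned_prompts("Write a short riddle.", n=4)
for p in prompts:
    print(repr(p))
```

Each resulting prompt would then be passed to a model; whether this helps for a model that was not trained with such prefixes is not something this sketch can show.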

https://news.1rj.ru/str/gonzo_ML_podcasts/539

gonzo_ML_podcasts

Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction
Authors: Vaishnavh Nagarajan, Chen Henry Wu, Charles Ding, Aditi Raghunathan
Paper: https://openreview.net/forum?id=Hi0SyHMmkd
Code: https://github.com/chenwu98/algorithmic…


https://news.1rj.ru/str/gonzo_ML_podcasts/550

gonzo_ML_podcasts


https://icml.cc/Conferences/2025/PublicationEthics


We continue publishing reviews of the papers that won an Outstanding Paper Award at ICML 2025.

The paper "The Value of Prediction in Identifying the Worst-Off" offers an important counterargument to the "accuracy above all" approach that dominates applied machine learning. It shows that in many real-world, resource-constrained settings, investing in the operational capacity to act on predictions yields more social benefit than marginal improvements in model accuracy. The PAR coefficient gives policymakers a principled, data-driven tool for moving beyond isolated technical metrics and making holistic, cost-aware decisions about system design. The study marks a maturing of the "AI for social good" field, shifting the focus from "how accurate is the model?" to "what is the most effective way to improve welfare, and where do predictions fit in?"

https://news.1rj.ru/str/gonzo_ML_podcasts/551

gonzo_ML_podcasts

The Value of Prediction in Identifying the Worst-Off
Authors: Unai Fischer-Abaigar, Christoph Kern, Juan Carlos Perdomo
Paper: https://openreview.net/forum?id=26JsumCG0z
Code: The work uses the open-source CatBoost library (https://catboost.ai).
Data:…


This time he talked with Hassabis

https://youtu.be/-HzgcbRXUK8

YouTube

Demis Hassabis: Future of AI, Simulating Reality, Physics and Video Games | Lex Fridman Podcast #475

Demis Hassabis is the CEO of Google DeepMind and Nobel Prize winner for his groundbreaking work in protein structure prediction using AI.
Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep475-sb
See below for timestamps,…


Another paper with an Outstanding Paper Award at ICML 2025.

CollabLLM is trained on multi-turn dialogue rollouts generated with a simulated user and, as a result, improves the user experience:

https://news.1rj.ru/str/gonzo_ML_podcasts/555

gonzo_ML_podcasts

CollabLLM: From Passive Responders to Active Collaborators
Authors: Shirley Wu, Michel Galley, Baolin Peng, Hao Cheng, Gavin Li, Yao Dou, Weixin Cai, James Zou, Jure Leskovec, Jianfeng Gao
Paper: https://arxiv.org/abs/2502.00640
Code: http://aka.ms/CollabLLM…


For fans of Bayesian methods and uncertainty quantification, another Outstanding Paper Award from ICML 2025:

https://news.1rj.ru/str/gonzo_ML_podcasts/568

gonzo_ML_podcasts

Conformal Prediction as Bayesian Quadrature
Jake C. Snell, Thomas L. Griffiths
Paper: https://arxiv.org/abs/2502.13228, https://openreview.net/forum?id=PNmkjIzHB7
Code: https://github.com/jakesnell/conformal-as-bayes-quad
English review: https://arxiviq.substack.com/p/icml…


And the last of the ICML 2025 Outstanding Paper Awards (there are also Outstanding Position Paper and Test of Time awards).

This one adapts Score Matching to missing data (among other things, the authors show that zero imputation does not work well at all)

https://news.1rj.ru/str/gonzo_ML_podcasts/577

I am not well versed in Score Matching, so if there are experts here, I would be interested to hear your opinion.

gonzo_ML_podcasts

Score Matching with Missing Data
Josh Givens, Song Liu, Henry W J Reeve
Paper: https://arxiv.org/abs/2506.00557, https://openreview.net/forum?id=mBstuGUaXo
Code: https://github.com/joshgivens/ScoreMatchingwithMissingData
English review: https://arxiviq.substack.com/p/score…


Weekend reading (but probably paywalled).

The theme of the latest issue of The Economist is "The economics of superintelligence"

1. https://www.economist.com/leaders/2025/07/24/the-economics-of-superintelligence [a short brief of the next two articles]

2. https://www.economist.com/briefing/2025/07/24/ai-labs-all-or-nothing-race-leaves-no-time-to-fuss-about-safety [on AI safety]

3. https://www.economist.com/briefing/2025/07/24/what-if-ai-made-the-worlds-economic-growth-explode [on the impact on the economy]

4. https://www.economist.com/business/2025/07/23/the-dark-horse-of-ai-labs [on Anthropic]

The Economist

The economics of superintelligence

If Silicon Valley’s predictions are even close to being accurate, expect unprecedented upheaval


Following up on the economics of superintelligence from the previous post (which I have slightly expanded, by the way), here is a paper with an Outstanding Position Paper Award from ICML 2025.

Incidentally, one of the authors is a Bodhisattva!


Forwarded from gonzo_ML_podcasts

Position: AI Safety Should Prioritize the Future of Work
Sanchaita Hazra, Bodhisattwa Prasad Majumder, Tuhin Chakrabarty
Paper: https://arxiv.org/abs/2504.13959, https://openreview.net/forum?id=CA9NxmmUG5
English review: https://arxiviq.substack.com/p/icml-2025-position-ai-safety-should

# TL;DR

What is the paper about?
The authors argue that the current AI safety paradigm is dangerously narrow: it focuses on technical and long-term existential risks while overlooking the immediate, systemic problems that AI creates for the future of work. In this position paper they draw on established economic theories, such as rent-seeking (when firms pursue wealth by manipulating policy rather than by creating value), intertemporal consumption, and institutional economics, to describe the societal risks of unchecked AI deployment. These risks include economic destabilization caused by labor-market instability, deepening inequality in favor of capital over labor, the creation of an "algorithmic monoculture" that hinders learning, and the devaluation of creative work through mass copyright infringement.

Why does it matter?
The paper's most important contribution is rethinking the very definition of existential risk. The authors make a strong case that we should worry about "accumulative x-risks", a kind of "death by a thousand cuts" resulting from systemic job loss, institutional decay, and data colonialism, no less than about a single "decisive" event such as the emergence of an uncontrollable superintelligence. This shifts the focus of "safety" from a hypothetical future to the pressing problems of the present. By proposing a worker-centric governance framework, the paper builds a crucial bridge between technical AI research and the tangible, human-centered policy needed to steer AI development toward shared prosperity rather than systemic disruption.

# The Meat 🍖

AI safety as a field has mostly concentrated on concerns about decisive, long-term existential risks: scenarios involving uncontrollable superintelligence, bioterrorism, or large-scale manipulation. While these concerns are valid, a recent position paper argues that such a narrow focus makes us miss the forest for the trees. The authors make a strong case that the most immediate and serious risk comes from the systemic erosion of human agency and of workers' economic dignity, and that AI safety as a discipline should make the future of work its priority.

💡 A new framework for AI-induced risks

Instead of a new algorithm, this paper offers a new lens for looking at AI harms. The authors' methodology is to apply established economic and social theories to the current AI landscape in order to identify a set of systemic risks that are often treated as secondary externalities rather than as core safety problems.


The paper lays out six central claims (P1-P6) that paint a troubling picture of AI's current trajectory:
1. Economic destabilization (P1): A competitive "arms race" among AI developers leads to rushed deployments, the accumulation of "technical debt", and widespread job instability, which disrupts traditional models of economic stability.
2. Widening skill gaps (P2): AI-driven automation disproportionately benefits high-skilled workers and capital owners, displacing low-skilled labor and widening the economic divide without adequate workforce adaptation.
3. Extractive economy (P3): Dominant AI companies are viewed as "extractive institutions", systems designed to redistribute resources from the majority to an influential minority. They concentrate wealth and power, weakening workers' bargaining power and hindering the shared prosperity that underpins stable societies.
4. Uneven global democratization (P4): The benefits of AI and control over it are concentrated in high-income countries, fostering a form of "data colonialism" in which low-income countries become dependent consumers rather than co-creators of the technology.
5. Degradation of learning and creativity (P5): Over-reliance on generative AI in education and research risks creating an "algorithmic monoculture", eroding critical-thinking skills and homogenizing human expression.
6. Devaluation of creative work (P6): Training models on vast amounts of copyrighted data without fair compensation is identified as a direct threat to the livelihoods of artists, writers, and other creative workers.

This framework is especially strong because it moves the discussion from abstract, hypothetical future scenarios to concrete, already existing problems grounded in well-studied economic principles such as rent-seeking and collective action problems.

👷 The way forward: a worker-centric approach

After diagnosing the problems, the authors propose a comprehensive, worker-centric AI governance framework built on six key recommendations (R1-R6):
* Worker support and policy: Governments should build robust social safety nets and retraining programs to support workers displaced by AI (R1).
* Promoting openness and competition: Big Tech dominance should be countered by promoting open-source AI, including open data and open weights, to foster a more competitive and fair ecosystem (R2).
* Accountability through technical safeguards: Mandatory watermarking of all content produced by generative AI and funding for research on reliable detection tools are crucial for ensuring accountability and combating misinformation (R3, R4).
* Fair compensation for data: The authors strongly advocate policies that require disclosure of training data and the introduction of royalty-based compensation systems, so that content creators are paid fairly for their work (R5).
* Inclusive governance: To avoid "regulatory capture" (a situation where a regulator begins acting in the interests of the industry rather than the public), a broad range of stakeholders, including labor unions and advocacy groups, must be involved in policymaking. This is needed so that corporate lobbying does not outweigh the public interest and the interests of workers (R6).
