When Russia invaded Ukraine over its bid to join the Western sphere of influence, my main concern was that a weak response would embolden further violations of our interests.
However, despite our less-than-half-hearted support and Trump's insidious behavior, the non-Western “BRICS sphere” has proven to be significantly weaker. Iran, Syria, and now Venezuela. How did Russia and China react? With empty words. North Korea was the only non-Western power to show any real support for the pipe dream of a multipolar world order.
The only thing saving them from total humiliation is that most Western countries still believe in a rules-based world order. Otherwise, we would take back places such as Transnistria and pick off everyone from Cuba to Belarus, leaving them with nothing more than their rump states.
Against all odds and despite the cowardice of European politicians, on the 1,410th day of Russia's three-day special military operation, we are stronger and they are weaker. Three countries within their sphere of influence have either been weakened or undergone regime change. Two countries in our sphere of influence joined NATO. They lost hundreds of thousands of soldiers, while we lost none, and Europe has gone from condemning armed drones as immoral to having startups capable of producing hundreds of thousands of fully autonomous attack drones.
According to Terence Tao, for the next 10-20 years, humans and AI will have complementary strengths.
I would be very surprised if humans were still able to meaningfully contribute to mathematical research in 10 years. But yes, Tao-level human intelligence will probably remain complementary for a bunch of years.
The founder of Midjourney: https://x.com/DavidSHolz/status/2007650184680092158
Many people have been making similar statements recently. Of course, we haven't entered the Singularity yet. But things are certainly accelerating. The more useful this technology becomes, the more money and human resources will flow into improving it, creating a positive feedback loop. Things will turn truly singularity-like once these tools start to meaningfully improve themselves without human intervention. That could happen as early as 2028.
The Chinese-supplied radar network also didn't help: https://udn.com/news/story/124698/9243412
The piece says Venezuelan air defense uses multiple Chinese radars (named as JYL-1 long-range 3D surveillance radar and JY-27 / JY-27A “counter-stealth” radar).
P.S. Russia has lost roughly 460 air defense systems since 2022 https://www.oryxspioenkop.com/2022/02/attack-on-europe-documenting-equipment.html
In China, A.I. Is Finding Deadly Tumors That Doctors Might Miss
All of those patients had come to the hospital with complaints like bloating or nausea and had not initially seen a pancreatic specialist, Dr. Zhu said. Several of their CT scans had raised no alarms until they were flagged by the A.I. tool.
“I think you can 100 percent say A.I. saved their lives,” he said.
Read more: https://www.nytimes.com/2026/01/02/world/asia/china-ai-cancer-pancreatic.html [no paywall: https://archive.is/QvsBv]
Also: Here’s proof that Claude Code can write an entire empirical polisci paper. https://github.com/andybhall/vbm-replication-extension
Since there are still people talking about "stochastic parrots" that merely copy from their training data, let me quickly say that we've long moved beyond bare, standalone LLMs.
Modern systems like Claude Code are closed-loop systems "grounded" in a hard reality: compilers, unit tests, linters, benchmarks, mathematical checkers, and other evaluators. Coding agents don’t just “guess” code. They write it, run it, observe failures, and iterate. In other words, they have something older ML systems lacked: tight feedback loops that let them reliably converge on working code rather than a plausible-sounding one.
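Here is a minimal sketch of that loop, assuming a hypothetical generate_code LLM call and using pytest as the "hard reality"; it is a cartoon of the idea, not how any particular product is implemented:

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable

def closed_loop_code(task: str,
                     generate_code: Callable[[str], str],   # stand-in for any LLM call
                     test_file: str,
                     max_iters: int = 5) -> str | None:
    feedback = ""
    for _ in range(max_iters):
        candidate = generate_code(task + feedback)           # propose
        workdir = Path(tempfile.mkdtemp())
        (workdir / "solution.py").write_text(candidate)
        (workdir / "test_solution.py").write_text(Path(test_file).read_text())
        result = subprocess.run(["pytest", "-q", str(workdir)],   # ground in reality
                                capture_output=True, text=True)
        if result.returncode == 0:                            # converged: tests pass
            return candidate
        # feed the concrete failure back into the next attempt
        feedback = "\n\nPrevious attempt failed with:\n" + result.stdout[-2000:]
    return None
```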
And once you wrap an LLM in a search-and-evaluate harness (like evolutionary or population-based optimization), you get something even more important: systematic exploration. At that point, the model isn’t "recalling" a solution so much as proposing candidates that are filtered by reality. The LLM supplies strong priors (intuition) that prune the search space, while the harness supplies the reasoning pressure by checking what actually works.
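A toy version of such a harness could look like this; propose (the LLM prior) and evaluate (the objective checker) are placeholders I'm assuming, and the generate-score-select-repeat structure is the point:

```python
import heapq
import random
from typing import Callable

def evolve(seed: str,
           propose: Callable[[str], str],     # LLM: produce a variant of a parent
           evaluate: Callable[[str], float],  # compiler/benchmark/checker: higher is better
           population_size: int = 16,
           generations: int = 20) -> str:
    population = [(evaluate(seed), seed)]
    for _ in range(generations):
        parents = [p for _, p in heapq.nlargest(4, population)]    # selection
        children = [propose(random.choice(parents))                # variation (LLM prior)
                    for _ in range(population_size)]
        population += [(evaluate(c), c) for c in children]         # filtered by reality
        population = heapq.nlargest(population_size, population)   # keep the survivors
    return max(population)[1]
```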
This is why systems in the AlphaEvolve family are a big deal: they can generate candidate programs, measure them, keep improvements, and repeat. Google explicitly stated that AlphaEvolve was used to optimize the code for the very TPUs (AI chips) and kernels used to train the Gemini models themselves. The AI literally optimized its own "brain" and "nervous system." The novelty comes from the interaction between a generative prior and an objective evaluator, not from magical memorization. The intelligence emerges from the system, not just the large language model.
Furthermore, when a system like AlphaEvolve explores a problem and finds a novel solution, the traces are synthetic data that can be fed back into the training loop to make the base model inherently smarter. In agentic synthetic training, the model learns from a curated set of perfect trajectories: the system records every "thought" (Chain of Thought) and every tool call the agent made to get there. Claude Code generates millions of clean, verified coding traces that are almost certainly used to make the next model ("Claude 5") better at coding out of the box.
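Nobody outside these labs knows the exact pipeline, but the harvesting step plausibly has this shape; the dataclasses and the verified flag below are my illustrative assumptions, not anyone's published format:

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    thought: str        # chain-of-thought text emitted before acting
    tool_call: str      # e.g. "run_tests", "edit_file"
    observation: str    # what the environment returned

@dataclass
class Trajectory:
    task: str
    steps: list[Step] = field(default_factory=list)
    verified: bool = False      # did the final artifact pass every check?

def to_training_examples(trajectories: list[Trajectory]) -> list[dict]:
    """Keep only verified runs and flatten them into prompt/completion pairs."""
    examples = []
    for traj in trajectories:
        if not traj.verified:
            continue                      # discard possibly-wrong runs entirely
        context = traj.task
        for step in traj.steps:
            examples.append({"prompt": context,
                             "completion": step.thought + "\n" + step.tool_call})
            context += f"\n{step.tool_call} -> {step.observation}"
    return examples
```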
P.S. Also, don't forget that compression is intelligence. If you compare the size of the training data to the size of the weights of the model, you get huge compression ratios (320:1 for Llama 3). To fit all of human knowledge into a 140-gigabyte "container," the model cannot simply copy-paste. Whatever LLMs are doing, it is overwhelmingly compressed abstraction, not simple storage of everything they’ve seen.
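A quick back-of-the-envelope check, using ballpark numbers I'm assuming (about 15 trillion training tokens, roughly 3 bytes of raw text per token, and a 70B-parameter model stored at 2 bytes per weight), roughly reproduces that ratio:

```python
train_tokens = 15e12      # ~15 trillion tokens of training text (ballpark assumption)
bytes_per_token = 3       # rough average for English text
weights = 70e9            # 70B parameters
bytes_per_weight = 2      # bf16/fp16 storage -> ~140 GB of weights

data_bytes = train_tokens * bytes_per_token   # ~45 TB of raw text
model_bytes = weights * bytes_per_weight      # ~140 GB of weights

print(f"training data  ~ {data_bytes / 1e12:.0f} TB")
print(f"model weights  ~ {model_bytes / 1e9:.0f} GB")
print(f"compression    ~ {data_bytes / model_bytes:.0f}:1")   # ~321:1
```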
Meta has an agentic kernel engineer called KernelEvolve that reduces kernel development time “from weeks to hours.” The kernels it produces are competitive with expert-written implementations.
KernelEvolve is operating continuously in Meta’s production infrastructure, generating optimized kernels for hundreds of models serving billions of users daily.
This translates to multi-million-dollar reductions in infrastructure operating costs while simultaneously enhancing user engagement metrics that correlate directly with advertising revenue.
KernelEvolve isn’t “an LLM that writes kernels.” It’s a closed-loop system that treats kernel optimization like a search problem. It is an agentic kernel-coding framework that takes a kernel specification and then generates, tests, profiles, debugs, and iteratively improves candidate implementations across multiple programming layers.
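The inner evaluation step of such a loop might look roughly like this sketch, where the candidate and reference kernels are placeholder callables rather than Meta's actual interfaces: every candidate has to match a trusted reference and beat the incumbent on wall-clock time.

```python
import time
from typing import Callable, Optional
import numpy as np

def evaluate_kernel(candidate: Callable, reference: Callable, trials: int = 10) -> Optional[float]:
    """Best runtime in seconds, or None if the candidate is numerically wrong."""
    rng = np.random.default_rng(0)
    a, b = rng.standard_normal((512, 512)), rng.standard_normal((512, 512))
    if not np.allclose(candidate(a, b), reference(a, b)):
        return None                      # wrong results disqualify, however fast
    times = []
    for _ in range(trials):
        start = time.perf_counter()
        candidate(a, b)
        times.append(time.perf_counter() - start)
    return min(times)

# The outer search loop would keep a candidate only if evaluate_kernel(candidate, np.matmul)
# beats the incumbent's time; profiler output and failures go back into the next prompt.
```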
Read the paper: https://arxiv.org/abs/2512.23236
Sakana AI's optimization agent “ALE-Agent” took 1st place in AtCoder Heuristic Contest 058 (AHC058), beating 804 human participants. This is the first known case of an AI agent winning a major real-time optimization programming contest.
AHC problems are time-limited “write-a-solver” contests where the goal is to produce a high-scoring heuristic for a complex optimization problem (often reflecting real industrial planning/logistics-style constraints).
The agent extracts insights from its trial-and-error trajectory and feeds them into the next improvement cycle, allowing it to escape local optima that typically trap simpler algorithms.
The total compute and API cost for the 4-hour session was approximately $1,300.
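Sakana hasn't published the agent's internals in code, but the "extract insights, carry them forward" loop described above plausibly has this shape; run_solver and reflect are placeholders I'm assuming:

```python
from typing import Callable

def improve_with_reflection(task: str,
                            run_solver: Callable[[str, list[str]], float],  # build & score a heuristic
                            reflect: Callable[[str, float], str],           # summarize what was learned
                            cycles: int = 10) -> tuple[float, list[str]]:
    insights: list[str] = []
    best = float("-inf")
    for _ in range(cycles):
        score = run_solver(task, insights)       # trial, informed by accumulated lessons
        insights.append(reflect(task, score))    # error analysis -> reusable insight
        best = max(best, score)
    return best, insights
```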
Read more: https://sakana.ai/ahc058/
Nvidia CEO Jensen Huang announced Alpamayo, which he calls the world's first thinking and reasoning model built for autonomous vehicles.
By open sourcing the Alpamayo stack, Nvidia is pushing self-driving forward as a category after years of work by thousands of engineers.
Read more: A Complete, Open Ecosystem for Reasoning‑Based Autonomy https://nvidianews.nvidia.com/news/alpamayo-autonomous-vehicle-development
Links for 2026-01-06
AI
1. NVIDIA Releases New Physical AI Models as Global Partners Unveil Next-Generation Robots https://nvidianews.nvidia.com/news/nvidia-releases-new-physical-ai-models-as-global-partners-unveil-next-generation-robots
2. Boston Dynamics & Google DeepMind Form New AI Partnership to Bring Foundational Intelligence to Humanoid Robots https://bostondynamics.com/blog/boston-dynamics-google-deepmind-form-new-ai-partnership/
3. AI-generated sensors open new paths for early cancer detection https://news.mit.edu/2026/ai-generated-sensors-open-new-paths-early-cancer-detection-0106
4. Code World Model: Building World Models for Computation https://www.youtube.com/watch?v=sYgE4ppDFOQ
5. MiniMax M2.1 is open source - SOTA for real-world dev workflows and agentic systems https://www.minimax.io/news/minimax-m21
6. Claude Code is about so much more than coding https://www.transformernews.ai/p/claude-code-is-about-so-much-more
7. “Claude Wrote Me a 400-Commit RSS Reader App” https://www.lesswrong.com/posts/vzaZwZgifypbnSiuf/claude-wrote-me-a-400-commit-rss-reader-app
8. What Drives Success in Physical Planning with Joint-Embedding Predictive World Models? https://arxiv.org/abs/2512.24497
9. Large language models and the entropy of English https://arxiv.org/abs/2512.24969
10. Wait, Wait, Wait... Why Do Reasoning Models Loop? https://arxiv.org/abs/2512.12895
11. Researchers at the University of Waterloo have discovered a method for pretraining LLMs that is both more accurate than current techniques and 50% more efficient. [PDF] https://openreview.net/pdf?id=6geRIdlFWJ
12. SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios https://arxiv.org/abs/2512.18470
13. PostTrainBench: Measuring how well AI agents can post-train language models https://posttrainbench.com/
14. MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes https://morebench.github.io/
15. Exploring a future where AI handles the execution, shifting our primary role from technical "doers" to high-level "directors" and critics. https://fidjisimo.substack.com/p/closing-the-capability-gap
16. Oversight Assistants: Turning Compute into Understanding https://www.lesswrong.com/posts/oZuJvSNuYk6busjqf/oversight-assistants-turning-compute-into-understanding
17. In Ukraine, an Arsenal of Killer A.I. Drones Is Being Born in War Against Russia https://www.nytimes.com/2025/12/31/magazine/ukraine-ai-drones-war-russia.html [no paywall: https://archive.is/EzScy]
Neuroscience
1. When will we be able to decode a non-trivial memory based on structural images from a preserved brain? https://neurobiology.substack.com/p/when-will-we-be-able-to-decode-a
2. Your Brain Doesn’t Command Your Body. It Predicts It. https://www.youtube.com/watch?v=RvYSsi6rd4g&t=2s
Miscellaneous
1. Information Without Rents: Mechanism Design Without Expected Utility — The economic idea that “private information equals profit” depends entirely on the assumption that people calculate risk using standard Expected Utility. Once you remove that assumption, a clever Designer can use dynamic steps to strip away all the privacy benefits of sophisticated agents, unless those agents are specifically “Concave” (super cautious) in their preferences. https://cowles.yale.edu/research/cfdp-2481-information-without-rents-mechanism-design-without-expected-utility
2. Enlightenment Ideals and Belief in Progress in the Run-up to the Industrial Revolution: A Textual Analysis https://academic.oup.com/qje/advance-article/doi/10.1093/qje/qjaf054/8361686
3. Is the European Union now the world’s most underrated libertarian project? https://cpsi.media/p/the-case-for-a-eu-progress-studies
Utah has become the first state to allow AI to renew medical prescriptions with no doctor involved. The company behind the system, Doctronic, also secured a malpractice insurance policy for its AI, and its data shows the system matches doctors' treatment plans 99.2% of the time.
Sources:
- https://www.medrxiv.org/content/10.1101/2025.07.14.25331406v1
- https://www.politico.com/news/2026/01/06/artificial-intelligence-prescribing-medications-utah-00709122
Terence Tao has a page that "collects the various ways in which AI tools have contributed to the understanding of Erdős problems."
Today, the first autonomously AI-generated and AI-formalized solution to an Erdős problem was added. A combination of Aristotle and GPT 5.2 Pro was used to achieve this.
AI contributions to Erdős problems: https://github.com/teorth/erdosproblems/wiki/AI-contributions-to-Erd%C5%91s-problems
Remember that math is amenable to self-generated (“synthetic”) training data because you can build very strong verifiers that can mechanically check whether a purported proof is valid.
For one of many data points, see page 14 of Google's AlphaGeometry2 paper:
“Our geometry experts and IMO medalists consider many AlphaGeometry solutions to exhibit superhuman creativity.”
So an AI can generate its own training signal by trying proofs, checking them, and reinforcing what works (analogous to self-play in chess/Go, where AI is strongly superhuman now). This makes AI that is “superhuman at math” a plausible goal.
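A minimal sketch of that training signal, with sample_proof (an LLM) and formally_check (e.g. a Lean kernel called externally) as assumed stand-ins; the point is that the reward comes from a mechanical checker, not from human labels:

```python
from typing import Callable

def collect_verified_proofs(theorems: list[str],
                            sample_proof: Callable[[str], str],
                            formally_check: Callable[[str, str], bool],
                            attempts_per_theorem: int = 64) -> list[tuple[str, str]]:
    dataset = []
    for thm in theorems:
        for _ in range(attempts_per_theorem):
            proof = sample_proof(thm)
            if formally_check(thm, proof):        # the mechanical checker IS the reward
                dataset.append((thm, proof))      # keep as synthetic training data
                break
    return dataset
```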
A comprehensive thesis on an imminent explosion of the robotics industry. In other words, robotics is going to start working much, much faster than people expect.
Robot costs are predicted to crash, with robots eventually costing closer to an iPhone than a car, thanks to lower material requirements and "robots making robots".
The primary economic outcome will be massive deflation as the cost of labor drops to near zero.
Read it: https://finaloffshoring.com/
Propose, Solve, Verify (PSV): Self-play for code with proofs, not tests
Most AI coding systems learn from unit tests: they write code, run it on a few examples, and get rewarded if it passes. But tests are incomplete. Code can pass all tests and still be wrong on rare inputs. So the AI can learn “cheap tricks,” and those errors can spread during training.
PSV replaces this with a much stricter judge: formal verification. Instead of checking a handful of examples, a verifier tries to prove mathematically that the program meets the specification for all possible inputs.
The PSV loop
1. Propose: write the “what”
The Proposer invents a new task by writing a specification (a precise description of what the program must do).
Here is the crucial idea: the proposer does not need to be a genius that can foresee what will stump a superhuman solver (like designing a benchmark for Terence Tao). PSV relies on a simpler asymmetry:
- It's often easy to state constraints.
- It's often harder to satisfy them (and prove you did).
A proposer can cheaply say: “Sort this list,” or “Sort it and keep it stable,” or “Sort it, return an index mapping, and prove nothing was lost or duplicated.” Stacking constraints like this is far easier than implementing them and proving the implementation correct.
2. Solve: do the “how”
The Solver tries to write the program (and the proof-style annotations the verifier needs). It samples many attempts (like trying many “mutations”).
3. Verify: harsh selection
A formal verifier checks each attempt. Only solutions that are provably correct count as wins. This is the key difference from unit tests: passing isn't “it worked on a few examples,” it’s “it’s correct for everything.”
4. Learn: keep the survivors
The solver then trains on those verified wins, becoming more likely to produce correct solutions next time.
How problems get harder without a smarter proposer: PSV makes “hard” relative, not absolute. Instead of the proposer guessing difficulty in advance, the system measures it empirically:
- If the solver verifies solutions for a spec 90% of the time, it’s easy.
- If it verifies only 10% of the time, it’s hard.
- If it never verifies, it’s too hard (for now).
The proposer is shown examples labeled EASY / MEDIUM / HARD and asked to generate new specs at a target difficulty. If the solver starts succeeding too often, the system nudges the proposer toward harder specs (more conditions, tighter guarantees). If nothing succeeds, it nudges back.
So the proposer doesn’t need to “outthink” the solver. It just needs to generate many candidate specs, while the system uses feedback (pass rates) to keep the difficulty near the frontier (like a teacher who adjusts homework based on how the student actually performs).
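Putting the loop and the difficulty feedback together, a toy PSV round could look like this; propose_spec, attempt_solution, and formally_verify are placeholder callables, not the paper's actual code:

```python
from typing import Callable

def psv_round(propose_spec: Callable[[str], str],        # LLM proposer
              attempt_solution: Callable[[str], str],    # LLM solver (code + proof annotations)
              formally_verify: Callable[[str, str], bool],
              target_pass_rate: float = 0.3,
              n_specs: int = 1000,
              attempts_per_spec: int = 8) -> list[tuple[str, str]]:
    training_data, difficulty_hint = [], "MEDIUM"
    for _ in range(n_specs):
        spec = propose_spec(difficulty_hint)             # 1. Propose: state the "what"
        wins = 0
        for _ in range(attempts_per_spec):
            solution = attempt_solution(spec)            # 2. Solve: attempt the "how"
            if formally_verify(spec, solution):          # 3. Verify: correct for ALL inputs
                training_data.append((spec, solution))   # 4. Learn: keep the survivors
                wins += 1
        pass_rate = wins / attempts_per_spec
        # steer toward the frontier: harder if it's getting easy, easier if nothing lands
        if pass_rate > target_pass_rate:
            difficulty_hint = "HARD"
        elif pass_rate == 0:
            difficulty_hint = "EASY"
        else:
            difficulty_hint = "MEDIUM"
    return training_data
```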
Where does the intelligence increase come from? PSV is basically evolution with a very strict referee:
- Variation: many proposed problems, many attempted solutions.
- Selection: only solutions that pass the verifier survive.
- Inheritance: the solver trains on the survivors.
- Moving frontier: as the solver improves, yesterday’s “hard” becomes today’s “medium,” so the system keeps pushing forward.
That’s why self-play works here: the verifier prevents the loop from “learning lies,” and the proposer and feedback mechanism keep generating fresh challenges just beyond the current capability.
A sign it scales: In the paper, generating more proposed tasks per round helped. Increasing from 1,000 to 32,000 proposed questions raised MBPP pass@1 from 22.3% to 44.3%. This is consistent with the idea that more self-generated practice plus strict verification produces real capability gains.
Paper: https://arxiv.org/abs/2512.18160
Epiplexity
How can next-token prediction on human text lead to superhuman skills? How can synthetic data sometimes beat “real” data? And how did AlphaZero learn so much from nothing but the rules of chess? Classic information theory seems to say this shouldn’t happen. Yet it clearly does.
The problem is that traditional information theory assumes an observer with unlimited computing power. An unbounded observer can crack any code and reverse any function instantly. To such an observer, a cryptographically encrypted message is "simple": they can find the seed that generated it, so it is easily distinguished from pure random noise. If you ignore time, ciphertext isn't "random"; it's the output of a short recipe plus a key. But if you can't afford the computation, it behaves like noise.
But AI systems don't have infinite compute. They’re bounded. And once time and compute matter, a new distinction appears:
- Time-Bounded Entropy (Randomness): Data that is computationally hard to predict. This includes true noise, but also things like encryption keys or complex hashes that look random to a neural network.
- Epiplexity (Structure): Patterns, abstractions, and rules that a model can actually learn and use to compress the data within a reasonable time.
They formalize it roughly like this (a schematic rendering follows the two steps):
1. Find the smallest model that can predict the data within a time limit.
2. The size of that model is epiplexity. Whatever remains unpredictable is time-bounded entropy.
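Rendered schematically, in the spirit of time-bounded Kolmogorov complexity (this is my notation; the paper's formal definitions differ in detail):

```latex
% Schematic only, not the paper's definition. With compute budget t:
\[
  S_t(x) \;=\; \min_{M} \bigl\{\, |M| \;:\; M \text{ predicts } x \text{ within time } t \,\bigr\}
  \quad \text{(epiplexity: the learnable structure)}
\]
\[
  H_t(x) \;=\; -\log_2 p_{M^{*}}(x), \quad \text{where } M^{*} \text{ attains the minimum}
  \quad \text{(time-bounded entropy: what even } M^{*} \text{ cannot predict)}
\]
```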
This solves the paradox. Random noise has high entropy but low epiplexity because no amount of computing power helps you find a pattern, so the model learns nothing. Meanwhile, a strategy game or a textbook has high epiplexity. It forces the model to build complex internal circuits (shortcuts and concepts) to predict the data efficiently.
A neat example from the paper: training a model to predict chess moves is standard. But training it to predict the game in reverse (inferring moves from the final board) is computationally harder. This difficulty forces the model to learn deeper representations of the board state (higher epiplexity), which actually improves its performance on new, unseen chess puzzles. The computation "created" information by converting the implicit consequences of the rules into explicit, usable structures (epiplexity) that the model can now use to play well.
In summary:
The value of data isn’t just about how unpredictable it is. It’s about how much reusable structure it induces in a learner that has real-world limits.
Epiplexity is the amount of structure that is worth learning because it reduces prediction error enough to justify the added model complexity under a time limit.
Read the paper: https://arxiv.org/abs/2601.03220
Spatiotemporal abstractions
Imagine trying to teach a robot to navigate a complex maze.
Traditional training uses trial and error. The robot tries random movements and gets a reward if it succeeds. The problem is that the AI model controlling the robot isn't deciding on meaningful steps like "walk to the door." Instead, it chooses tiny motor commands, similar to individual muscle twitches.
If the robot has to guess the correct sequence of millions of muscle twitches to solve a maze by random chance, it will fail every time. It flails around, never reaches the goal, and learns nothing.
To solve this, Google researchers first taught the robot simply by having it watch experts. The robot learned to predict the expert's next split-second movement.
Surprisingly, the researchers found that while the robot was learning these tiny movements, it was secretly building a map of the bigger picture. To predict the next twitch accurately, the robot internally needed to know "I am currently walking toward the red door."
The lead researcher explains this difference using coffee. Making a cup of coffee involves tiny, split-second hand movements. But it also involves massive, long-term goals (like driving to the store to buy beans). Traditional robots get stuck optimizing the hand movements. This new approach allows them to "plan the trip."
The researchers created a way to tap into these hidden internal plans. Instead of letting the AI decide every single muscle movement 100 times a second, they built a "steering wheel" that forces the AI to pick one of those high-level intentions (like "go to the red door") and stick with it for a while.
This works for the same reason humans don't plan their day by focusing on individual footsteps. Instead of searching through trillions of twitch combinations, the AI only has to choose between a few high-level plans. Because each choice lasts longer and does something useful, the robot stops flailing and actually reaches the goal, allowing it to finally learn from its success.
The researchers believe this architecture mimics human biology. The main AI model acts like the Cortex (constantly predicting what happens next based on what it sees), while the new "steering wheel" mechanism acts like the Basal Ganglia (nudging the cortex toward rewarding goals and habits).
In summary: Think of the “steering wheel” as a filter. Without it, the robot considers every possible muscle twitch at every millisecond (a search space so vast it is effectively infinite). By locking in a high-level intention, the steering wheel prunes the search space. It forces the robot to ignore the billions of random twitch combinations that don’t help reach the current sub-goal, making low-level actions goal-directed rather than random.
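A generic sketch of that "commit to an intention, then let the low-level policy fill in the twitches" control loop; the callables below are placeholders of my own, not the paper's actual architecture:

```python
from typing import Callable, Sequence

def hierarchical_rollout(env_step: Callable[[object], tuple],   # action -> (obs, reward, done)
                         obs,                                   # current observation
                         intentions: Sequence,                  # discrete high-level options
                         score_intention: Callable[[object, object], float],
                         low_level_policy: Callable[[object, object], object],
                         hold_steps: int = 50,
                         max_steps: int = 1000) -> float:
    total_reward, done, t = 0.0, False, 0
    while not done and t < max_steps:
        # pick ONE high-level intention (e.g. "go to the red door") instead of a raw twitch
        z = max(intentions, key=lambda i: score_intention(obs, i))
        for _ in range(hold_steps):                  # ...and stick with it for a while
            action = low_level_policy(obs, z)        # millisecond-level motor command
            obs, reward, done = env_step(action)
            total_reward += reward
            t += 1
            if done or t >= max_steps:
                break
    return total_reward
```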
Paper: https://arxiv.org/abs/2512.20605
Talk: https://www.youtube.com/watch?v=cx_MIhvAOYM
Digital Red Queen: Adversarial Program Evolution in Core War with LLMs
What happens when you let an LLM write tiny computer programs that compete for control of a virtual computer's memory?
Setup:
- The environment: A chaotic “toy computer” where programs live inside shared memory. Because the program’s instructions are stored in the same place as its working data, programs can overwrite (sometimes even copy) themselves, and they can try to corrupt their opponents.
- The Algorithm: A self-play algorithm inspired by the Red Queen hypothesis in evolutionary biology, which suggests organisms must constantly adapt just to maintain their relative fitness against evolving competitors.
- The Process: Instead of training against a static objective, the algorithm evolves a lineage of warriors. In each round, an LLM generates a new warrior specifically designed to defeat all previous versions in the lineage.
Outcome: As the evolutionary arms race progresses, the warriors become increasingly robust and general-purpose, capable of defeating human-designed strategies they were never explicitly trained against.
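The lineage loop described above is simple enough to sketch; write_warrior (an LLM prompted with the ancestors' source) and battle (a Core War simulator) are assumed placeholders, not Sakana's actual interfaces:

```python
from typing import Callable

def red_queen_evolution(write_warrior: Callable[[list[str]], str],
                        battle: Callable[[str, str], int],   # +1 win, 0 draw, -1 loss
                        rounds: int = 30,
                        attempts_per_round: int = 10) -> list[str]:
    lineage: list[str] = []
    for _ in range(rounds):
        for _attempt in range(attempts_per_round):
            challenger = write_warrior(lineage)      # conditioned on all ancestors
            # accept only a warrior that at least doesn't lose to any previous one
            if all(battle(challenger, ancestor) >= 0 for ancestor in lineage):
                lineage.append(challenger)
                break
    return lineage   # later members tend to generalize beyond their training opponents
```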
Read the paper: https://sakana.ai/drq/
Raptors gliding through a cloud of helium-filled soap bubbles reveal their wingtip and tail vortices.
Paper: High aerodynamic lift from the tail reduces drag in gliding raptors https://journals.biologists.com/jeb/article/223/3/jeb214809/223686/High-aerodynamic-lift-from-the-tail-reduces-drag
Links for 2026-01-09
AI
1. ChatGPT for Healthcare: “Over the past two years, we’ve partnered with a global network of more than 260 licensed physicians across 60 countries of practice to evaluate model performance using real clinical scenarios. To date, this group has reviewed more than 600,000 model outputs spanning 30 areas of focus. Their continuous feedback has directly informed model training, safety mitigations, and product iteration. ChatGPT for Healthcare went through multiple rounds of physician-led red teaming to tune model behavior, trustworthy information retrieval, and other evaluations.” https://openai.com/index/openai-for-healthcare/
2. AI now predicts 130 diseases from 1 night of sleep https://www.nature.com/articles/s41591-025-04133-4
3. Scaling Open-Ended Reasoning To Predict the Future https://openforecaster.github.io/
4. Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space https://arxiv.org/abs/2512.24617
5. LLMs Reproduce Human Purchase Intent via Semantic Similarity Elicitation of Likert Ratings https://arxiv.org/abs/2510.08338v3
6. Why LLMs Aren’t Scientists Yet. https://www.lesswrong.com/posts/y7TpjDtKFcJSGzunm/why-llms-aren-t-scientists-yet
7. Anthropic’s new update makes coding agents self-healing https://venturebeat.com/orchestration/claude-code-2-1-0-arrives-with-smoother-workflows-and-smarter-agents
8. Claude Code and What Comes Next https://www.oneusefulthing.org/p/claude-code-and-what-comes-next
9. An AI revolution in drugmaking is under way https://www.economist.com/science-and-technology/2026/01/05/an-ai-revolution-in-drugmaking-is-under-way [no paywall: https://archive.is/Si71E]
10. JPMorgan is cutting all ties with proxy advisory firms and replacing them with AI to help cast shareholder votes https://www.wsj.com/finance/banking/jpmorgan-cuts-all-ties-with-proxy-advisers-in-industry-first-78c43d5f [no paywall: https://archive.is/Ttc8z]
11. How Judges Are Using AI to Help Decide Your Legal Dispute https://www.wsj.com/tech/ai/how-ai-could-help-decide-your-next-legal-dispute-9cb12517 [no paywall: https://archive.is/nmelA]
12. Stack Overflow’s forum is dead thanks to AI, but the company’s still kicking... thanks to AI https://sherwood.news/tech/stack-overflow-forum-dead-thanks-ai-but-companys-still-kicking-ai/
13. First theoretical physics paper to credit an AI assistant https://arxiv.org/abs/2601.02484
14. Real poetry by AI https://gwern.net/fiction/lab-animals
Neuro(tech)
1. These Hearing Aids Will Tune in to Your Brain https://spectrum.ieee.org/hearing-aids-biosignals
2. If an event is more likely to occur at a certain point in time, the brain tracks the time until it occurs more precisely https://www.mpg.de/25980090/brain-estimates-probabilities-of-events
3. A brain-inspired approach to scientific computing https://newsreleases.sandia.gov/nature-inspired-computers-are-shockingly-good-at-math/
4. The End of Privacy: Tracking Technology is Everywhere Now https://www.youtube.com/watch?v=UYWjgceclS4
Miscellaneous
1. Explaining Cloud-9: A Celestial Object Like No Other https://www.centauri-dreams.org/2026/01/07/explaining-cloud-9-a-celestial-object-like-no-other/
2. A tiny number of hyper-prolific individuals are responsible for a massive percentage of public complaints, effectively "monopolizing" government resources and taxpayer money. https://marginalrevolution.com/marginalrevolution/2026/01/the-tyranny-of-the-complainers.html
3. In “Being Nicer than Clippy,” Joe Carlsmith argues that our approach to AI alignment should be guided by “niceness” (a specific human virtue of respecting the preferences and boundaries of others) rather than just a competitive “battle of the utility functions.” https://joecarlsmith.com/2024/01/16/being-nicer-than-clippy
Lots of people are now sold on this idea, including Musk, Bezos, and Sundar Pichai.
https://x.com/paulg/status/2009686627506065779