OpenAI developed a new way to train small AI models with internal mechanisms that are easier for humans to understand.
Language models like the ones behind ChatGPT have complex, sometimes surprising structures, and we don’t yet fully understand how they work.
In new research, the team trained "sparse" models, with fewer, simpler connections between neurons, to see whether their computations become easier to understand.
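The core idea, pruning dense connections so each neuron keeps only a few strong ones, can be illustrated with a minimal sketch. This is a generic illustration of weight sparsity, not OpenAI's training code:

```python
def sparsify(row, keep):
    """Zero all but the `keep` largest-magnitude weights in one neuron's row."""
    top = sorted(range(len(row)), key=lambda i: abs(row[i]), reverse=True)[:keep]
    return [w if i in set(top) else 0.0 for i, w in enumerate(row)]

row = [0.9, -0.05, 0.02, -1.3, 0.4, 0.001]
print(sparsify(row, keep=2))  # → [0.9, 0.0, 0.0, -1.3, 0.0, 0.0]
```

With only two surviving connections per neuron, tracing which inputs drive which outputs becomes far easier than in a dense layer.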
OpenAI
Understanding neural networks through sparse circuits
We trained models to think in simpler, more traceable steps—so we can better understand how they work.
Efficient Self-Improving Agent Systems. AgentEvolver lets AI agents improve themselves instead of requiring manual prompt tuning.
They use three core mechanisms: self-questioning, self-navigating, and self-attributing.
Agents evaluate their own work, spot failures, and write better instructions for themselves.
This leads to a self-improvement loop capable of running without human oversight.
It shows better performance across benchmarks with less manual work.
The framework works by having agents evaluate their own performance on tasks, identify where they failed or underperformed, and then generate improved behavioral instructions for the next iteration.
The results are impressive.
Agents using this approach show measurable performance gains across diverse benchmarks compared to static configurations, all while reducing the overhead of constant manual optimization.
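The evaluate-attribute-rewrite cycle described above can be sketched as a toy loop; the `run_task`, `critique`, and `rewrite_prompt` callables below are hypothetical stand-ins, not the AgentEvolver API:

```python
def self_improve(run_task, critique, rewrite_prompt, prompt, iterations=3):
    """Toy self-improvement loop: act, self-evaluate, attribute failures, revise instructions."""
    for _ in range(iterations):
        result = run_task(prompt)                  # attempt the task
        failures = critique(result)                # spot what went wrong (self-attributing)
        if not failures:
            break
        prompt = rewrite_prompt(prompt, failures)  # write better instructions for next round
    return prompt

# Stub example: each round folds one outstanding issue into the instructions.
issues = ["be concise", "cite sources"]
final = self_improve(
    run_task=lambda p: p,
    critique=lambda r: [i for i in issues if i not in r],
    rewrite_prompt=lambda p, f: p + " | " + f[0],
    prompt="Answer the question.",
)
print(final)
```

In a real system the critique and rewrite steps would themselves be LLM calls; the loop structure is the same.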
arXiv.org
AgentEvolver: Towards Efficient Self-Evolving Agent System
Autonomous agents powered by large language models (LLMs) have the potential to significantly enhance human productivity by reasoning, using tools, and executing complex tasks in diverse...
Google is working on multi-agent systems to help you refine ideas with tournament-like evaluation.
Each run takes around 40 minutes and brings you 100 detailed ideas on a given research topic.
Two new multi-agent systems are being developed for Gemini Enterprise:
- Idea Generation - "Create a multi-agent innovation session"
- Co-Scientist - "Drive novel scientific discovery with Co-Scientist"
Co-Scientist 3-step workflow 👀
- Tell Co-Scientist what you plan to research, point it to relevant data, and set your evaluation criteria.
- A team of agents will generate ideas on your topic using their available data.
- The agents will evaluate the ideas against your criteria and rank them, tournament-style.
Google is not only automating research but also preparing a product that will enable others to do so.
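The tournament-style ranking step can be sketched with a round-robin of pairwise judgments; the `better` judge below is a toy stand-in for an LLM comparing ideas against your criteria:

```python
import itertools

def tournament_rank(ideas, better):
    """Rank ideas by round-robin pairwise comparisons; `better(a, b)` returns the winner."""
    wins = {idea: 0 for idea in ideas}
    for a, b in itertools.combinations(ideas, 2):
        wins[better(a, b)] += 1
    return sorted(ideas, key=lambda i: wins[i], reverse=True)

# Stand-in judge: prefer the longer idea (a real judge would apply your evaluation criteria).
ideas = ["graphene battery", "solar paint", "algae fuel cells for ships"]
print(tournament_rank(ideas, better=lambda a, b: max(a, b, key=len)))
```

Pairwise tournaments are popular for LLM evaluation because relative judgments ("which of these two is better?") tend to be more reliable than absolute scores.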
TestingCatalog
Google to enable research automation on Gemini Enterprise
Gemini Enterprise will get multi-agent tournament systems that can work as your co-scientist or co-researcher and help refine ideas.
Android creator Andy Rubin is launching a new humanoid robotics startup, "Genki Robotics," in Tokyo.
The company is operating in stealth mode, tapping Japan's engineering talent to enter an already crowded field.
During his tenure at Google, Rubin spearheaded an ambitious robotics division, leading the acquisition of numerous startups in 2013, including the high-profile Japanese humanoid firm Shaft, a spin-off from the University of Tokyo.
His interest in legged locomotion, a core challenge in humanoid development, is well-documented. At a 2018 tech conference, Rubin, then leading the incubator Playground Global, predicted a future of "legs everywhere." He argued that legged systems are essential for navigating human-centric environments, such as climbing stairs or using elevators for "last-mile delivery"—tasks impossible for wheeled machines.
Humanoids Daily
Android Founder Andy Rubin Reportedly Launching Humanoid Robotics Startup in Tokyo
Andy Rubin, known for creating Android and leading Google's past robotics efforts, is reportedly operating 'Genki Robotics' in stealth mode, tapping into Japan's engineering talent.
MIT and Oxford released their $2,500 agentic AI curriculum on GitHub at no cost.
15,000 people already paid for it.
It covers patterns, orchestration, memory, coordination, and deployment.
A strong roadmap to production-ready systems.
GitHub
awesome-generative-ai-guide/free_courses/agentic_ai_crash_course at main · aishwaryanr/awesome-generative-ai-guide
A one stop repository for generative AI research updates, interview resources, notebooks and much more! - aishwaryanr/awesome-generative-ai-guide
Google DeepMind introduced WeatherNext 2, its most advanced forecasting system yet, able to generate more accurate and higher-resolution global forecasts.
The model’s improved performance is enabled by a new approach called a Functional Generative Network, which can generate the full range of possible forecasts in a single step.
The team added targeted randomness directly into the architecture, allowing the model to explore a wide range of plausible weather scenarios.
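The targeted-randomness idea, injecting noise so one model yields an ensemble of plausible forecasts, can be sketched with a toy stand-in; the `forecast` function below is an illustration, not the Functional Generative Network itself:

```python
import random

def forecast(state, noise):
    """Stand-in model: a deterministic map of the state, perturbed by injected noise."""
    return [x * 0.9 + noise for x in state]

def ensemble(state, members=5, seed=0):
    """Sample one noise value per member to get a spread of plausible forecasts."""
    rng = random.Random(seed)
    return [forecast(state, rng.gauss(0, 0.1)) for _ in range(members)]

runs = ensemble([20.0, 21.5, 19.8])
spread = max(r[0] for r in runs) - min(r[0] for r in runs)
print(f"{len(runs)} members, first-cell spread {spread:.2f}")
```

The spread across members is what gives probabilistic forecasts their value: it quantifies uncertainty rather than committing to a single trajectory.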
Google
WeatherNext 2: Our most advanced weather forecasting model
The new AI model delivers more efficient, more accurate and higher-resolution global weather predictions.
MIT introduced JiT (Just image Transformers).
JiTs are simple large-patch Transformers that operate on raw pixels, no tokenizer, pre-training, or extra losses needed.
By predicting clean data on the natural-data manifold, JiT excels in high-dimensional spaces where traditional noise-predicting models can fail.
On ImageNet (256 & 512), JiT achieves competitive generative performance, showing that sometimes going back to basics is the key.
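The contrast with noise-predicting models is that JiT's objective regresses the clean signal directly. A toy version of such an x-prediction loss, with a trivial stand-in model rather than the JiT implementation:

```python
import random

def x_prediction_loss(model, clean, sigma, rng):
    """Train the model to predict the clean signal from a noised input
    (x-prediction), rather than predicting the added noise itself."""
    noised = [x + sigma * rng.gauss(0, 1) for x in clean]
    pred = model(noised)
    return sum((p - x) ** 2 for p, x in zip(pred, clean)) / len(clean)

rng = random.Random(0)
identity = lambda v: v  # trivial stand-in "model" that echoes its input
loss = x_prediction_loss(identity, clean=[0.2, 0.5, 0.9], sigma=0.1, rng=rng)
print(round(loss, 4))
```

For the identity stand-in the loss is just the injected noise energy; a trained model would learn to strip that noise and drive the loss toward zero.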
GitHub.
GitHub
GitHub - LTH14/JiT: PyTorch implementation of JiT https://arxiv.org/abs/2511.13720
PyTorch implementation of JiT https://arxiv.org/abs/2511.13720 - LTH14/JiT
Physical Intelligence introduced a new model, π*0.6.
π*0.6 can more than double throughput over a base model trained without RL, and can perform real-world tasks: making espresso drinks, folding diverse laundry, and assembling boxes.
The team trained a general-purpose value function on all of its own data, which tells the π*0.6 VLA which actions are good or bad. By asking π*0.6 to produce only good actions, the researchers get better performance. They call this method Recap.
π*0.6 can then collect more autonomous data, which can be used to further train the value function and further improve π*0.6.
During autonomous data collection, a teleoperator can also intervene and provide corrections for significant mistakes, coaching π*0.6 further.
Quantitatively, training π*0.6 with RL can more than double throughput (number of successful task executions per hour) on the hardest tasks and cut the number of failures by as much as a factor of two.
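The "produce only good actions" idea can be sketched as value-filtered action selection; the value function and candidate actions below are toy stand-ins (Recap itself is a full RL training recipe, not just inference-time filtering):

```python
def best_action(candidates, value):
    """Score sampled candidate actions with a learned value function and keep the best."""
    return max(candidates, key=value)

# Stand-in value function: prefers actions closer to a known-good target motion.
target = 0.7
v = lambda a: -abs(a - target)
sampled = [0.1, 0.55, 0.9, 0.68]
print(best_action(sampled, v))  # → 0.68
```

The same value function can then score autonomously collected rollouts, closing the loop where better data trains a better critic, which selects better actions.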
www.pi.website
A VLA that Learns from Experience
A method for training our generalist policies with RL to improve success rate and throughput on real-world tasks.
Google DeepMind just released Gemini 3, which helps you learn, build, and plan anything.
It comes with state-of-the-art reasoning capabilities, world-leading multimodal understanding, and enables new agentic coding experiences.
PyTorch creator Soumith Chintala has joined Thinking Machines Lab.
Official exit from Meta: Nov 17. New gig at TML: Nov 18.
He says the people there are "incredible" and he is already back to "building new things." The AI talent war continues.
Can LLMs really behave like human investors? How do micro-level behaviors drive macro-level market dynamics?
TwinMarket offers an answer by placing thousands of LLM-driven investors in a realistic stock market environment that incorporates social networks, news, and behavioral biases.
This setup lets us watch bubbles, crashes, and herding emerge from individual decisions.
Calibrated on real market data and grounded in behavioral finance, TwinMarket scales to 1,000+ agents, reproduces key stylized market facts (volatility clustering, fat tails, etc.), and reveals how social interaction and cognitive biases jointly drive systemic risk.
The work is accepted to NeurIPS 2025 and received the Best Paper Award at the ICLR 2025 Financial AI Workshop.
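The herding dynamic at the heart of such simulations can be shown in miniature: agents that weight the crowd's average belief over their own signal converge toward consensus. This is a toy model, not TwinMarket's LLM-driven simulation:

```python
import random
import statistics

def simulate(n_agents=100, steps=50, herd=0.9, seed=1):
    """Agents mix a private random signal with the crowd average; a high `herd`
    weight makes individual beliefs converge, a toy version of herding."""
    rng = random.Random(seed)
    beliefs = [rng.uniform(-1, 1) for _ in range(n_agents)]
    for _ in range(steps):
        avg = sum(beliefs) / n_agents
        beliefs = [herd * avg + (1 - herd) * rng.uniform(-1, 1) for _ in beliefs]
    return statistics.pstdev(beliefs)

print(f"belief spread with herding: {simulate(herd=0.9):.3f}")
print(f"belief spread without:      {simulate(herd=0.0):.3f}")
```

Even this crude model shows how a micro-level bias (imitating the crowd) produces a macro-level effect (collapsed diversity of opinion), the kind of emergence TwinMarket studies at scale.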
GitHub.
arXiv.org
TwinMarket: A Scalable Behavioral and Social Simulation for...
The study of social emergence has long been a central focus in social science. Traditional modeling approaches, such as rule-based Agent-Based Models (ABMs), struggle to capture the diversity and...
Meta introduced a new generation of Segment Anything Models:
1. SAM 3 enables detecting, segmenting and tracking of objects across images and videos, now with short text phrases and exemplar prompts.
2. SAM 3D brings the model collection into the 3rd dimension to enable precise reconstruction of 3D objects and people from a single 2D image.
Meta AI
Introducing Meta Segment Anything Model 3 and Segment Anything Playground
Explore Segment Anything Model 3 and the new Segment Anything Playground, a place to experience the full capabilities of our most advanced SAM releases to date.
Elon Musk’s xAI introduced Grok 4.1 Fast and the xAI Agent Tools API.
With a 2M context window, it shines in real-world use cases like customer support and deep research.
x.ai
Grok 4.1 Fast and Agent Tools API | xAI
Bringing the next generation of tool-calling agents to the xAI API
#DeepSeek just released LPLB
Linear-Programming-Based Load Balancer (LPLB) is a parallel load balancer that leverages linear programming to optimize expert parallel workload distribution for MoE (Mixture-of-Experts) models.
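The balancing problem LPLB formulates as a linear program can be approximated with a simple greedy assignment: place each expert's token load on the least-loaded replica. A toy stand-in, not the LPLB solver:

```python
import heapq

def balance(expert_loads, replicas):
    """Greedily assign each expert's load to the least-loaded GPU (a stand-in
    for the assignment LPLB solves exactly with linear programming)."""
    gpus = [(0, g) for g in range(replicas)]  # (current load, gpu id)
    heapq.heapify(gpus)
    placement = {}
    for expert, load in sorted(expert_loads.items(), key=lambda kv: -kv[1]):
        total, gpu = heapq.heappop(gpus)
        placement[expert] = gpu
        heapq.heappush(gpus, (total + load, gpu))
    return placement, max(t for t, _ in gpus)

loads = {"e0": 90, "e1": 60, "e2": 50, "e3": 40}
placement, peak = balance(loads, replicas=2)
print(placement, "peak load:", peak)
```

An LP formulation improves on greedy by optimizing all placements jointly (and can split loads fractionally), which is why it can reach a lower peak load than heuristics under skewed expert traffic.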
GitHub
GitHub - deepseek-ai/LPLB: An early research stage expert-parallel load balancer for MoE models based on linear programming.
An early research stage expert-parallel load balancer for MoE models based on linear programming. - deepseek-ai/LPLB
Kimi dropped Seer: Online Context Learning for Fast Synchronous LLM Reinforcement Learning.
Ethereum co-founder Vitalik Buterin warned that if BlackRock and other large institutions keep expanding their ETH holdings, Ethereum faces two risks:
1) decentralization-minded builders could be crowded out, weakening the community;
2) base-layer choices optimized for institutions (e.g., ~150 ms block times) could make it infeasible for typical users to run nodes, driving geographic/network centralization.
DL News
Vitalik Buterin warns of two threats to Ethereum if BlackRock gets any bigger
Vitalik Buterin warned that BlackRock's influence could push Ethereum in the wrong direction. Institutional pressure could lead to technical choices that exclude ordinary users. Ethereum needs to focus on being permissionless and censorship-resistant, he…
Nabla announced JAM-2 — the first AI model capable of generating drug-quality antibodies straight from the computer, with industry-leading success rates.
> Drug-like affinities: Picomolar to single-digit nanomolar antibody binders for half of 26 targets while testing <45 designs each.
> Unlocking hard targets: Up to 11% success rate for direct on-cell GPCR binders; top antibody hits in the single-digit nanomolar range.
> Unprecedented epitope breadth: JAM-2 routinely designed antibodies that hit 30–70% of user-defined epitopes, now enabling intentional design of biology — not chance discovery.
> Drug-like developability: Over 50% of antibody designs passed core industry developability criteria with zero optimization.
> Massive leverage: A four-person team prosecuted 16 targets in parallel in < 1 month.
JAM-2 is the first de novo antibody design capability ready for front-line use in drug discovery, matching or surpassing traditional discovery approaches.
Physical Intelligence has raised $600M at a $5.6B valuation, with CapitalG leading.
Earlier this week, Pi released a new RL model that it says can help robots do tasks like fold laundry and pack boxes more quickly.
Bloomberg.com
Robotics Startup Physical Intelligence Valued at $5.6 Billion in New Funding
Physical Intelligence, a startup developing artificial intelligence software to help robots learn a wide range of tasks, has raised $600 million in a new round of funding that values the company at $5.6 billion, according to people with knowledge of the matter.Alphabet…
Ai2 presented Olmo 3, a fully open LM suite built for reasoning, chat, tool use, and an open model flow—not just the final weights, but the entire training journey.
At the center is Olmo 3-Think (32B)—a fully open 32B-scale reasoning model.
Olmo 3 opens the model flow – pretraining, mid-training, & post-training – plus data recipes & code so you can see how capabilities are built + customize any stage.
Meet the Olmo 3 family:
1. Olmo 3-Base (7B, 32B)—foundations for post-training with strong code, math, & reading comprehension skills
2. Olmo 3-Instruct (7B)—multi-turn chat + tool use
3. Olmo 3-Think (7B, 32B)—“thinking” models that show their reasoning.
All designed to run on hardware from laptops to research clusters.
allenai.org
Olmo 3: Charting a path through the model flow to lead open-source AI | Ai2
Our new flagship Olmo 3 model family empowers the open source community with not only state-of-the-art open models, but the entire model flow and full traceability back to training data.
Sakana AI, co-founded by Llion Jones (one of the eight original inventors of the Transformer at Google), now argues it's time to move past them.
Their NeurIPS 2025 spotlight paper, “Continuous Thought Machines”, outlines a possible next step.
YouTube
He Co-Invented the Transformer. Now: Continuous Thought Machines [Llion Jones / Luke Darlow]
The Transformer architecture (which powers ChatGPT and nearly all modern AI) might be trapping the industry in a localized rut, preventing us from finding true intelligent reasoning, according to the person who co-invented it. Llion Jones and Luke Darlow…
Anthropic is working on a new Skill creation flow where Claude itself can create the Skill for you.
Other things to expect this week:
- Opus 4.5 is rumoured to be launched today (not confirmed).
- Claude Code desktop app?
- New referral program
- Megabrain?