In case you missed them, a couple of useful resources on Agentic AI:
1. "Agentic AI" course by Andrew Ng
https://www.deeplearning.ai/courses/agentic-ai/
2. "Agentic Design Patterns" book by Antonio Gulli
https://docs.google.com/document/d/1rsaK53T3Lg5KoGwvf8ukOUvbELRtH-V0LnOIFDxBryE/preview?tab=t.0
1. "Agentic AI" course by Andrew Ng
https://www.deeplearning.ai/courses/agentic-ai/
2. "Agentic Design Patterns" book by Antonio Gulli
https://docs.google.com/document/d/1rsaK53T3Lg5KoGwvf8ukOUvbELRtH-V0LnOIFDxBryE/preview?tab=t.0
DeepLearning.AI - Learning Platform
Agentic AI
In this course taught by Andrew Ng, you'll build agentic AI systems that take action through iterative, multi-step workflows.
1❤16👍8🤡1
Mamba 3 is anonymously sneaking into ICLR 2026. I'm planning a breakdown after TRM.
https://openreview.net/forum?id=HwCvaJOiCj
Mamba3 just silently dropped on ICLR🤯
A faster, longer-context, and more scalable LLM architecture than Transformers
A few years ago, some researchers started rethinking sequence modeling from a different angle. Instead of stacking more attention layers, they went back to an older idea: state-space models, systems that keep an internal state evolving over time. That became the foundation for Mamba.
The early versions were promising.
Mamba-1 used continuous-time dynamics with selective memory updates, so it could remember efficiently without the heavy cost of attention.
Mamba-2 went further and showed that state-space updates and attention are actually two sides of the same math, which made it run much faster on GPUs while keeping similar performance.
Now Mamba-3 feels like the design finally matured. It refines how the internal state evolves, how it remembers, and how it uses hardware. The main update lies in switching from a simple Euler step to a trapezoidal integration, which takes into account both the start and end of each time interval. That small change makes its memory smoother and more stable over long sequences. It also lets the hidden state move in the complex plane, which adds a kind of rhythmic, oscillating memory. Instead of just decaying over time, the model can now represent repeating or periodic patterns, the kind of structure language and music often have. And with a new multi-input-multi-output design, Mamba-3 can process several streams in parallel, making much better use of modern GPUs.
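To make the Euler-vs-trapezoidal point concrete, here is a tiny numerical sketch on a scalar toy state-space model. It is not Mamba-3's actual recurrence: the step size, the coefficients, and the sine input are illustrative assumptions, and the complex-pole case is only there to show why a complex-valued state can carry oscillating, periodic memory instead of plain decay.

```python
# Toy sketch (not Mamba-3 code): two discretizations of the scalar SSM
#   h'(t) = a*h(t) + b*x(t)
# Step size, coefficients and the sine input below are illustrative assumptions.
import numpy as np

def euler_step(h, x, a, b, dt):
    # Forward Euler: uses only the start of the interval.
    return h + dt * (a * h + b * x)

def trapezoidal_step(h, x_prev, x, a, b, dt):
    # Trapezoidal (bilinear) rule: averages the start and end of the interval,
    # which damps discretization error and keeps long-horizon memory stable.
    return ((1 + 0.5 * dt * a) * h + 0.5 * dt * b * (x_prev + x)) / (1 - 0.5 * dt * a)

# Decaying memory (real a < 0) vs. oscillating memory (complex a):
# a complex-valued state rotates as it decays, so it can encode periodic structure.
a_real, a_complex, b, dt = -0.5, -0.1 + 2.0j, 1.0, 0.1
xs = np.sin(np.linspace(0, 20, 200))  # toy periodic input

h_e, h_t, h_c = 0.0, 0.0, 0.0 + 0.0j
for i in range(1, len(xs)):
    h_e = euler_step(h_e, xs[i], a_real, b, dt)
    h_t = trapezoidal_step(h_t, xs[i - 1], xs[i], a_real, b, dt)
    h_c = trapezoidal_step(h_c, xs[i - 1], xs[i], a_complex, b, dt)

print(f"Euler state:        {h_e:+.4f}")
print(f"Trapezoidal state:  {h_t:+.4f}")
print(f"Complex-pole state: {h_c.real:+.4f} {h_c.imag:+.4f}j  (rotates while it decays)")
```

The toy only illustrates the update rule: averaging the two endpoints of each interval is what the trapezoidal step adds over Euler, and a complex pole rotates while it decays instead of just fading out.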
In practice, Mamba-3 opens up a lot of possibilities. Its ability to handle long sequences efficiently makes it a strong fit for tasks like long-document understanding, scientific time-series, or genome modeling, areas where Transformers struggle with context limits. Because it runs in linear time and keeps latency stable, it’s also well-suited for real-time applications like chat assistants, translation, and speech interfaces, where responsiveness matters more than raw scale. And its hardware-friendly design means Mamba-3 could eventually power on-device or edge AI systems, running large models locally without depending on the cloud.
It’s the kind of architecture that quietly expands from large-context reasoning on servers to lightweight intelligence on everyday devices.
https://x.com/JundeMorsenWu/status/1977664753011916859?t=xoorer9sscloa78ZjuvcsQ&s=19
Paper: Mamba-3: Improved Sequence Modeling using State Space Principles
More good reference implementations!
https://github.com/karpathy/nanochat
This repo is a full-stack implementation of an LLM like ChatGPT in a single, clean, minimal, hackable, dependency-lite codebase. nanochat is designed to run on a single 8XH100 node via scripts like speedrun.sh that run the entire pipeline start to end. This includes tokenization, pretraining, finetuning, evaluation, inference, and web serving over a simple UI so that you can talk to your own LLM just like ChatGPT. nanochat will become the capstone project of the course LLM101n being developed by Eureka Labs.
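For reference, launching the end-to-end run described above looks roughly like the sketch below. The repo URL and speedrun.sh come from the description itself; everything else (git and bash on PATH, a machine with the GPUs the README targets) is an assumption about your environment.

```python
# Minimal sketch of cloning the repo and launching the end-to-end script.
# Assumes git and bash are available and the hardware matches what the README
# expects; nothing about nanochat's internals is assumed here.
import subprocess

subprocess.run(["git", "clone", "https://github.com/karpathy/nanochat"], check=True)

# speedrun.sh drives the whole pipeline (tokenization -> pretraining ->
# finetuning -> evaluation -> inference -> web UI) from start to end.
subprocess.run(["bash", "speedrun.sh"], cwd="nanochat", check=True)
```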
More interesting projects.
The Newton project, an open-source physics engine for simulation in robot training, has been handed over to the Linux Foundation:
https://github.com/newton-physics
Originally a joint project of Disney Research, Google DeepMind, and NVIDIA; it replaces the deprecated warp.sim module of NVIDIA Warp.
A Definition of AGI has arrived!
https://www.agidefinition.ai/
The lack of a concrete definition for Artificial General Intelligence (AGI) obscures the gap between today’s specialized AI and human-level cognition. This paper introduces a quantifiable framework to address this, defining AGI as matching the cognitive versatility and proficiency of a well-educated adult. To operationalize this, we ground our methodology in Cattell-Horn-Carroll theory, the most empirically validated model of human cognition.
The framework dissects general intelligence into ten core cognitive domains—including reasoning, memory, and perception—and adapts established human psychometric batteries to evaluate AI systems. Application of this framework reveals a highly “jagged” cognitive profile in contemporary models. While proficient in knowledge-intensive domains, current AI systems have critical deficits in foundational cognitive machinery, particularly long-term memory storage.
The resulting AGI scores (e.g., GPT-4 at 27%, GPT-5 at 58%) concretely quantify both rapid progress and the substantial gap remaining before AGI.
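As I read the abstract, the headline score is an equal-weight roll-up of proficiency across the ten domains. A toy sketch of that aggregation is below; the domain names are my paraphrase of the CHC-style list, and the equal weights plus the example percentages are assumptions for illustration, not numbers from the paper.

```python
# Toy sketch of the scoring idea only. Domain names are an approximate
# paraphrase of the framework's ten CHC-based domains; the equal weights and
# the per-domain percentages are assumptions, not the paper's reported values.
DOMAINS = [
    "General Knowledge", "Reading & Writing", "Math", "On-the-spot Reasoning",
    "Working Memory", "Long-term Memory Storage", "Long-term Memory Retrieval",
    "Visual Processing", "Auditory Processing", "Speed",
]

def agi_score(domain_scores: dict) -> float:
    """Equal-weight average of per-domain proficiency, each scored 0..100%."""
    assert set(domain_scores) == set(DOMAINS), "score every domain exactly once"
    return sum(domain_scores.values()) / len(DOMAINS)

# A made-up 'jagged' profile: strong on knowledge-heavy domains, near zero on
# the foundational machinery the abstract calls out (long-term memory storage).
profile = {d: 80.0 for d in DOMAINS}
profile["Long-term Memory Storage"] = 0.0
profile["Visual Processing"] = 40.0
print(f"toy AGI score: {agi_score(profile):.0f}%")
```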