Anthropic released Bloom, an open-source tool for generating behavioral misalignment evals for frontier AI models.
Bloom lets researchers specify a behavior and then quantify its frequency and severity across automatically generated scenarios.
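Conceptually, the loop looks something like the minimal sketch below. Everything here (the function names, the 1-to-5 severity scale) is a hypothetical illustration of the workflow, not Bloom's actual API:

```python
# Hypothetical sketch of the workflow a tool like Bloom automates: generate
# scenarios targeting a behavior, collect model responses, then score
# frequency and severity. All names here are placeholders, not Bloom's API.
from dataclasses import dataclass

@dataclass
class Verdict:
    exhibited: bool   # did the behavior occur in this scenario?
    severity: int     # judge-assigned severity, e.g. 1 (mild) to 5 (egregious)

def evaluate_behavior(behavior: str, scenarios: list[str],
                      query_model, judge) -> dict:
    """Quantify how often (and how badly) a model shows `behavior`."""
    verdicts = [judge(behavior, s, query_model(s)) for s in scenarios]
    hits = [v for v in verdicts if v.exhibited]
    return {
        "frequency": len(hits) / len(scenarios),
        "mean_severity": sum(v.severity for v in hits) / max(len(hits), 1),
    }
```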
Anthropic
Introducing Bloom: an open source tool for automated behavioral evaluations
Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.
Google introduced A2UI: Agent-to-User Interface
- Protocol for agent-driven interfaces
- Enables agents to generate interactive user interfaces
- Open source
GitHub
GitHub - google/A2UI
Contribute to google/A2UI development by creating an account on GitHub.
Researchers from U. Michigan, NYU, Princeton & U. Virginia presented Next-Embedding Prediction (NEPA).
Instead of reconstructing pixels, the model learns by predicting the next "embedding" in a visual sequence.
It outperforms complex methods, hitting 85.3% accuracy on ImageNet and excelling at segmentation, all with a simple, scalable approach.
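For intuition, here is a toy sketch of the objective: embed a patch sequence, then regress each next embedding from the prefix. The shared encoder, causal mask, and MSE loss are illustrative assumptions, not the paper's exact recipe:

```python
# Toy sketch of next-embedding prediction; the shared encoder, causal mask,
# and MSE loss are illustrative assumptions, not the paper's exact recipe.
import torch
import torch.nn as nn

batch, seq_len, patch_dim, dim = 8, 16, 768, 256
encoder = nn.Linear(patch_dim, dim)      # stand-in patch encoder
predictor = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)

patches = torch.randn(batch, seq_len, patch_dim)   # flattened image patches
emb = encoder(patches)                             # (batch, seq, dim)

causal = nn.Transformer.generate_square_subsequent_mask(seq_len - 1)
pred = predictor(emb[:, :-1], src_mask=causal)     # predict from the prefix
target = emb[:, 1:].detach()                       # next embedding as target

# Regress the next embedding instead of reconstructing pixels. (A real
# recipe needs care to avoid representation collapse, e.g. an EMA target
# encoder; that is omitted here.)
loss = nn.functional.mse_loss(pred, target)
loss.backward()
```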
GitHub.
GitHub
GitHub - SihanXU/nepa: PyTorch implementation of NEPA
PyTorch implementation of NEPA. Contribute to SihanXU/nepa development by creating an account on GitHub.
Stripe Atlas 2025 Startups: Year in Review – Key Insights
Dug into Stripe's latest report on startups via their Atlas platform, and it's packed with exciting trends for 2025. Here's a breakdown of the essential highlights:
1. Explosive Growth in Registrations: Startup formations surged 36% YoY. Europe led the charge with a whopping 48% increase, likely due to easier US incorporation amid local red tape. Time for EU reforms?
2. Global Teams on the Rise: Multi-national founder teams are up 79% since 2017, thanks to remote work magic. Borders are blurring – talent knows no limits.
3. Faster Monetization Than Ever: New startups hit revenue milestones quicker: Median revenue in the first 6 months jumped 39% YoY. 20% snagged their first customer within 30 days, and top performers reached $100K revenue 11% faster (around 108 days). AI and stablecoins are supercharging this.
4. Polarization in Success: While averages are up, the top 1% grew revenue 67% faster. Shoutout to rockstars like Cursor AI and Lovable for insane traction. Winners are winning bigger.
This report shows 2025 as a year of acceleration in the startup world – more companies, quicker cash, and global vibes. If you're building something, Stripe Atlas is making it easier for founders worldwide.
Stripe
Stripe Atlas startups in 2025: Year in review
2025 was a breakout year for early-stage startups, as founders generated revenue faster than ever. Three shifts stand out: customer bases are becoming more global, time-to-revenue has compressed, and founders are turning their focus to AI agents.
First large-scale empirical study of how developers actually use AI agent frameworks.
Over 100 open-source agent frameworks have emerged on GitHub, collectively accumulating 400,000+ stars and 70,000+ forks. But 80% of developers report difficulties identifying which frameworks best meet their needs.
Researchers analyzed 1,575 agent projects and 11,910 developer discussions across ten major frameworks, including LangChain, AutoGen, CrewAI, and MetaGPT.
Here are the findings:
96% of top-starred projects use multiple frameworks together. Single-framework solutions no longer meet the complex demands of real-world agent applications.
The dominant patterns: orchestration + data frameworks (LangChain + LlamaIndex) and multi-agent + orchestration combinations (AutoGen + LangChain).
Not surprisingly, GitHub stars don't predict real-world adoption.
arXiv.org
An Empirical Study of Agent Developer Practices in AI Agent Frameworks
The rise of large language models (LLMs) has sparked a surge of interest in agents, leading to the rapid growth of agent frameworks. Agent frameworks are software toolkits and libraries that...
NVIDIA launched ALCHEMI Toolkit-Ops to accelerate chemistry and materials science simulations using machine learning interatomic potentials (MLIPs).
ALCHEMI combines the accuracy of quantum chemistry methods with the scalability of AI to enable large-scale atomistic simulations that were previously impractical, helping run faster, more accurate simulations for materials discovery and molecular modeling.
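To see where an MLIP slots in, here's a generic sketch using ASE's calculator interface, with ASE's built-in toy EMT potential standing in for an ML potential. This is not ALCHEMI's API, just the shape of the workflow it accelerates:

```python
# Generic sketch: how an ML interatomic potential plugs into an atomistic
# simulation via ASE's calculator interface. EMT is a toy built-in potential
# standing in for an MLIP; ALCHEMI's actual API is not shown here.
from ase.build import bulk
from ase.calculators.emt import EMT
from ase.md.velocitydistribution import MaxwellBoltzmannDistribution
from ase.md.verlet import VelocityVerlet
from ase import units

atoms = bulk("Cu", "fcc", a=3.6).repeat((3, 3, 3))   # small copper supercell
atoms.calc = EMT()                      # swap in an MLIP calculator here

MaxwellBoltzmannDistribution(atoms, temperature_K=300)
dyn = VelocityVerlet(atoms, timestep=2 * units.fs)
dyn.run(100)                            # 100 MD steps
print(atoms.get_potential_energy())
```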
NVIDIA Technical Blog
Accelerating AI-Powered Chemistry and Materials Science Simulations with NVIDIA ALCHEMI Toolkit-Ops
Machine learning interatomic potentials (MLIPs) are transforming the landscape of computational chemistry and materials science. MLIPs enable atomistic simulations that combine the fidelity of…
Artemis published an empirical analysis of stablecoin payment usage on Ethereum
The core insight: volume belongs to businesses.
P2P transactions account for 67% of all payments by count — but only 24% by volume. The remaining 76% of dollar volume flows through business-involved payments: B2B, B2P, and internal corporate transfers.
This isn't surprising if you think about it. Retail users send $50-500 transfers. Institutions move millions. But it does reframe the narrative: stablecoins are currently infrastructure for large players, not yet a mass retail payment rail.
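A quick back-of-envelope check on those shares makes the size gap concrete (the only inputs are the report's 67/24 split):

```python
# Back-of-envelope check on the count/volume split reported above.
p2p_count, p2p_volume = 0.67, 0.24        # shares of tx count and $ volume
biz_count, biz_volume = 1 - p2p_count, 1 - p2p_volume

# Average ticket size per segment, relative to the overall average transfer.
p2p_avg = p2p_volume / p2p_count          # ~0.36x the overall average
biz_avg = biz_volume / biz_count          # ~2.30x the overall average
print(f"business transfers are ~{biz_avg / p2p_avg:.1f}x larger on average")
# -> business transfers are ~6.4x larger on average
```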
Top 1,000 wallets account for 84% of total transfer volume. Despite all the decentralization rhetoric, stablecoin activity is heavily concentrated among exchanges, market makers, and corporate treasuries.
Payment activity consistently drops on weekends — a pattern typical for business operations, not peer-to-peer retail transfers. This further supports the institutional dominance thesis.
There's a notable spike in transactions below $0.10 — which makes no economic sense given Ethereum gas fees. Researchers flag this as likely bot activity and wash trading. Any serious analysis needs to filter this noise.
The report acknowledges a major limitation: fiat-to-stablecoin-to-fiat flows through intermediaries aren't captured on-chain. Payment processors bundle transactions, so the true P2P payment volume may be higher than what blockchain data shows.
What does this mean?
Stablecoins have found product-market fit — but primarily as B2B settlement infrastructure. The retail payment use case is growing (transaction counts doubled year-over-year) but remains a fraction of total value moved.
The open question: will stablecoins evolve into a true P2P payment layer, or will they remain primarily institutional plumbing?
The data suggests we're still in the infrastructure phase.
Physical Intelligence demonstrated a series of humanoid robot tasks like making peanut butter sandwiches, cleaning windows, peeling oranges, and washing pans modeled after Benjie Holson’s “Robot Olympics.”
Using their fine-tuned model π0.6, the robots autonomously tackled high-dexterity challenges that highlight Moravec’s Paradox: tasks humans find trivial are still incredibly hard for machines.
Their results show that fine-tuning large embodied models is essential: training from scratch failed on all tasks.
www.pi.website
Moravec's Paradox and the Robot Olympics
By fine-tuning our latest model, we were able to solve a series of very difficult manipulation challenge tasks.
Google DeepMind just released DeepSearchQA
A 900-prompt benchmark that evaluates AI agents on complex, multi-step web research tasks across 17 fields.
Nvidia will acquire assets and key talent from chipmaking startup Groq for $20B
Groq co-founder and CEO Jonathan Ross was lead designer and architect for the first generation of Google’s TPU chips. He’ll join Nvidia along with president Sunny Madra and other top executives.
This is Nvidia’s largest deal ever, topping its $7B acquisition of data center networking firm Mellanox in 2019.
Groq
Groq and Nvidia Enter Non-Exclusive Inference Technology Licensing Agreement to Accelerate AI Inference at Global Scale
The Groq LPU delivers inference with the speed and cost developers need.
This paper is a big deal. New research introduces Agent-R1, a framework for training LLM agents with end-to-end reinforcement learning across multi-turn interactions.
As agents move from predefined workflows to autonomous interaction, end-to-end RL becomes the natural training paradigm. Agent-R1 provides a modular foundation for scaling RL to complex, tool-using LLM agents.
Standard RL for LLMs assumes deterministic state transitions. You generate a token, append it to the sequence, done. But agents trigger external tools with uncertain outcomes. The environment responds unpredictably. State transitions become stochastic.
Therefore, the researchers extend the Markov Decision Process framework to capture this. State space expands to include full interaction history and environmental feedback. Actions can trigger external tools, not just generate text. Rewards become dense, with process rewards for intermediate steps alongside final outcome rewards.
Two core mechanisms make this work. An Action Mask distinguishes agent-generated tokens from environmental feedback, ensuring credit assignment targets only the agent's actual decisions. A ToolEnv module manages the interaction loop, handling state transitions and reward calculation when tools are invoked.
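A minimal sketch of how such an action mask typically enters the loss (our assumption of the mechanism, not the repo's actual code): tokens emitted by the agent get mask 1, tokens injected by tools or the environment get mask 0, so gradients flow only through the agent's own decisions.

```python
# Minimal sketch of action-masked credit assignment (an illustrative
# assumption of how an Action Mask works, not Agent-R1's actual code).
import torch

def masked_pg_loss(logprobs: torch.Tensor,      # (T,) log pi(a_t | s_t)
                   advantages: torch.Tensor,    # (T,) per-token advantages
                   action_mask: torch.Tensor):  # (T,) 1 = agent, 0 = env
    # Environment/tool tokens are zeroed out, so no gradient reaches them.
    per_token = -(logprobs * advantages) * action_mask
    return per_token.sum() / action_mask.sum().clamp(min=1)

# Example: a 6-token trajectory where tokens 2-3 came from a tool response.
lp = torch.randn(6, requires_grad=True)
adv = torch.tensor([0.5, 0.5, 0.0, 0.0, 1.0, 1.0])
mask = torch.tensor([1., 1., 0., 0., 1., 1.])
masked_pg_loss(lp, adv, mask).backward()        # no gradient via env tokens
```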
GitHub.
arXiv.org
Agent-R1: Training Powerful LLM Agents with End-to-End...
Large Language Models (LLMs) are increasingly being explored for building Agents capable of active environmental interaction (e.g., via tool use) to solve complex problems. Reinforcement Learning...
LeCun's JEPA has evolved into a vision-language model, with 1.6B parameters rivaling the 72B Qwen-VL.
Instead of predicting words directly, the proposed VL-JEPA learns to predict the core "meaning" of a text in an abstract space, ignoring surface-level wording variations.
This method outperforms standard token-based training with 50% fewer parameters. It beats models like CLIP & SigLIP2 on video classification/retrieval tasks and matches larger VLMs on VQA, while using a decoder only when needed to cut decoding ops by nearly 3x.
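The objective is easy to sketch: predict the target text's embedding from vision features and penalize the distance in embedding space. The cosine loss, frozen text encoder, and MLP predictor below are illustrative assumptions, not the paper's exact recipe:

```python
# Toy sketch of a JEPA-style vision-language objective: regress the target
# text's embedding instead of decoding tokens. Loss choice, frozen text
# encoder, and MLP predictor are assumptions, not the paper's exact recipe.
import torch
import torch.nn as nn
import torch.nn.functional as F

d_vis, d_txt = 512, 384
vision_feats = torch.randn(8, d_vis)          # from a vision encoder
text_emb = torch.randn(8, d_txt)              # from a frozen text encoder

predictor = nn.Sequential(nn.Linear(d_vis, 512), nn.GELU(),
                          nn.Linear(512, d_txt))

pred = predictor(vision_feats)
# Match meanings, not token strings: paraphrases map to nearby targets.
loss = 1 - F.cosine_similarity(pred, text_emb.detach(), dim=-1).mean()
loss.backward()
```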
arXiv.org
VL-JEPA: Joint Embedding Predictive Architecture for Vision-language
We introduce VL-JEPA, a vision-language model built on a Joint Embedding Predictive Architecture (JEPA). Instead of autoregressively generating tokens as in classical VLMs, VL-JEPA predicts...
Meta showed that software agents can self-improve via self-play RL.
Self-play SWE-RL (SSR): training a single LLM agent to self-play between bug-injection and bug-repair, grounded in real-world repositories, with no human-labeled issues or tests.
Bug-injection: the agent creates a standard suite of bug artifacts, which are then validated for consistency.
Key validation steps (see the sketch after this list):
1) original tests must pass,
2) tests fail after applying the bug-injection patch,
3) weakened tests should pass
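A minimal sketch of those three gates (run_tests and apply_patch are hypothetical helpers standing in for a real test harness; the paper's actual harness is not shown):

```python
# Sketch of the three validation gates for a candidate bug artifact.
# run_tests/apply_patch are hypothetical helpers, not the paper's code.
def is_valid_bug(repo, bug_patch, weakened_tests, run_tests, apply_patch):
    # 1) the repo is healthy to begin with: original tests must pass
    if not run_tests(repo):
        return False
    broken = apply_patch(repo, bug_patch)
    # 2) the injected bug is observable: tests fail after the patch
    if run_tests(broken):
        return False
    # 3) the bug is scoped: the weakened test suite still passes on it
    return run_tests(broken, suite=weakened_tests)
```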
arXiv.org
Toward Training Superintelligent Software Agents through Self-Play SWE-RL
While current software agents powered by large language models (LLMs) and agentic reinforcement learning (RL) can boost programmer productivity, their training data (e.g., GitHub issues and pull...
Memory in the Age of AI Agents
This 102-page survey introduces a unified framework for understanding agent memory through three lenses: Forms, Functions, and Dynamics.
arXiv.org
Memory in the Age of AI Agents
Memory has emerged, and will continue to remain, a core capability of foundation model-based agents. As research on agent memory rapidly expands and attracts unprecedented attention, the field has...
2026_digital_assets__1767020549.pdf
Grayscale shipped Digital Asset Outlook 2026.
Key Takeaways:
1. 2026 is set to accelerate structural shifts in digital asset investing, underpinned by two major themes: macro demand for alternative stores of value and improved regulatory clarity. Together, these trends should bring in new capital, broaden adoption (especially among advised wealth and institutional investors), and bridge public blockchains more fully into mainstream financial infrastructure.
2. As a result, valuations are expected to rise in 2026, along with the end of the so-called “four-year cycle” (the theory that crypto market direction follows a recurring four-year pattern).
3. Bitcoin’s price will likely reach a new all-time high in the first half of the year.
4. Grayscale expects bipartisan crypto market structure legislation to become U.S. law in 2026. This will bring deeper integration between public blockchains and traditional finance, facilitate regulated trading of digital asset securities, and potentially allow for on-chain issuance by both startups and mature firms.
5. The outlook for fiat currencies is increasingly uncertain; in contrast, we can be highly confident that the 20 millionth Bitcoin will be mined in March 2026. Digital money systems like Bitcoin and Ethereum that offer transparent, programmatic, and ultimately scarce supply will be in rising demand, in our view, due to rising fiat currency risks.
6. More crypto assets will become available through exchange-traded products in 2026. These vehicles have had a successful start, but many platforms are still conducting due diligence and working to incorporate crypto into their asset-allocation process. As this process matures, look for more slow-moving institutional capital to arrive throughout 2026.
The report also outlines Top 10 Crypto Investing Themes for 2026, reflecting the breadth of use cases emerging across public blockchain technology.
- Dollar Debasement Risk Drives Demand for Monetary Alternatives
- Regulatory Clarity Supporting Adoption of Digital Assets
- Reach of Stablecoins to Grow in Wake of GENIUS Act
- Asset Tokenization at Inflection Point
- Privacy Solutions Needed as Blockchain Tech Goes Mainstream
- AI Centralization Calls for Blockchain Solutions
- DeFi Accelerates, Led by Lending
- Mainstream Adoption Will Demand Next-Generation Infrastructure
- A Focus on Sustainable Revenue
- Investors Seek Out Staking by Default
Finally, two topics that the report does not expect to influence crypto markets in 2026:
1. Quantum computing: We believe that research and preparedness will continue on post-quantum cryptography, but this issue is unlikely to affect valuations in the next year.
2. Digital asset treasuries: despite the media attention, the report does not expect DATs to be a major swing factor for digital asset markets in 2026.
Meta acquired Manus for $2–4B.
Manus hit $100M ARR just 8 months after launch, the fastest any startup has reached that mark. It has no proprietary model. People call it an “AI wrapper.” The same critique was once aimed at Cursor.
Manus runs on Claude with custom tools built for orchestration and grounding. Their agentic environment enables agents to browse, write code, manipulate files, and execute multi-step workflows without a human in the loop.
They also beat OpenAI on GAIA. An interesting thing here is that they didn't build a foundation model. They built the most compatible environment for models to reason and act within.
manus.im
Manus Joins Meta for Next Era of Innovation
Manus is joining Meta, and we’ll continue delivering our current services while accelerating product improvements to bring more powerful, reliable general AI agent capabilities to more users and businesses.
Tongyi released MAI-UI, a family of foundation GUI agents.
It natively integrates MCP tool use, agent user interaction, device–cloud collaboration, and online RL, establishing SOTA results in general GUI grounding and mobile GUI navigation, surpassing Gemini-2.5-Pro, Seed1.8, and UI-Tars-2 on AndroidWorld.
MAI-UI comes in a full spectrum of sizes, including 2B, 8B, 32B, and 235B-A22B variants.
MobileWorld benchmark.
GitHub
GitHub - Tongyi-MAI/MobileWorld: Benchmarking Autonomous Mobile Agents in Agent-User Interactive and MCP-Augmented Environments
Benchmarking Autonomous Mobile Agents in Agent-User Interactive and MCP-Augmented Environments - Tongyi-MAI/MobileWorld
Happy new year folks! Wishing everyone a bright and inspiring new year 🎉
May 2026 be a year of bold ideas.
Let’s keep building, exploring, and pushing the boundaries together.
Happy New Year from @alwebbci 🚀
So, the first major paper of 2026: #DeepSeek's mHC, Manifold-Constrained Hyper-Connections.
This is really an engineering paper, taking as its starting point ideas already laid out in the original Hyper-Connections (HC) paper from ByteDance, which is consequently prerequisite reading. So, initial notes on that first.
The DeepSeek paper starts almost in medias res and first underlines a major success of the original HC approach: the added mathematical/topological complexity did not result in computational overhead.
Overall, the actual flex of the paper is not so much proving that Hyper-Connections can work at scale.
It's this: we have the internal capacity to re-engineer the complete training environment along every dimension (kernels, memory management, inter-node communication) around highly experimental research ideas.
arXiv.org
mHC: Manifold-Constrained Hyper-Connections
Recently, studies exemplified by Hyper-Connections (HC) have extended the ubiquitous residual connection paradigm established over the past decade by expanding the residual stream width and...