❤5😁2
Databricks released Agent Bricks, a new product that helps enterprises develop SOTA domain-specific agents.
Agent Learning from Human Feedback (ALHF) is a new paradigm where agents learn directly from minimal natural language feedback, not just labels or numeric rewards.
🔥4👍1🥰1
Google published research on a new active learning method for curating high-quality data that reduces the training data needed to fine-tune LLMs by orders of magnitude.
You get the same or better model quality with 250 to 450 expert labels instead of 100K, thanks to focusing expert effort only on confusing boundary cases.
The team uses an LLM as a scout to sweep a huge pool of ads, then asks experts to judge only the handful of examples that truly confuse the model.
Those expert decisions train the next model iteration, and the loop repeats until model–expert agreement stops improving. It is classic active learning, but adapted to large, noisy, imbalanced traffic like ads, where only about 1% of items are actually problematic.
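A toy sketch of that loop, not Google's pipeline: each ad is reduced to a hypothetical integer "risk score", the model is a single threshold, and `expert` stands in for the human reviewers. All names and numbers here are assumptions made for illustration.

```python
def predict(x, threshold):
    """Toy model: flag an item when its score reaches the threshold."""
    return x >= threshold

def agreement(threshold, validation):
    """Fraction of expert-labeled validation items the model gets right."""
    hits = sum(predict(x, threshold) == y for x, y in validation.items())
    return hits / len(validation)

def pick_most_uncertain(pool, threshold, budget, labeled):
    """Select the unlabeled items closest to the decision boundary."""
    candidates = [x for x in pool if x not in labeled]
    return sorted(candidates, key=lambda x: abs(x - threshold))[:budget]

def fit_threshold(labeled, fallback):
    """Toy 'retraining': boundary midway between the nearest opposite labels."""
    negatives = [x for x, y in labeled.items() if not y]
    positives = [x for x, y in labeled.items() if y]
    if not negatives or not positives:
        return fallback
    return (max(negatives) + min(positives)) / 2

def active_learning(pool, expert, seed_items, validation_items,
                    budget=4, max_rounds=20):
    labeled = {x: expert(x) for x in seed_items}
    validation = {x: expert(x) for x in validation_items}
    threshold = fit_threshold(labeled, fallback=(min(pool) + max(pool)) / 2)
    best = agreement(threshold, validation)
    for _ in range(max_rounds):
        # Experts only see the handful of items that confuse the model.
        for x in pick_most_uncertain(pool, threshold, budget, labeled):
            labeled[x] = expert(x)
        threshold = fit_threshold(labeled, fallback=threshold)
        score = agreement(threshold, validation)
        if score <= best:  # stop once agreement no longer improves
            break
        best = score
    # The count excludes the validation labels.
    return threshold, len(labeled)

def expert(x):
    """Hypothetical ground truth known only to the reviewers."""
    return x >= 700

pool = list(range(1000))
threshold, n_labels = active_learning(pool, expert, pool[::100], pool[::50])
print(threshold, n_labels)
```

The point of the toy is the shape of the loop: labels are spent near the boundary, and the stopping rule watches agreement on a small expert-labeled set rather than a fixed label budget.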
research.google
Achieving 10,000x training data reduction with high-fidelity labels
🔥6❤3🥰1🤡1
Big AI security issue: ChatGPT vulnerability in Connectors can leak your API keys + "memory" with 0 clicks
AgentFlayer, demoed at BlackHat, shows an injected prompt in a doc can force an image to render that exfiltrates data through a malicious URL.
Zenity Labs
AgentFlayer: ChatGPT Connectors 0click Attack
👍3🆒3🔥2💯2
Tencent AI Lab introduced R-Zero, a framework that enables LLMs to self-evolve their reasoning capabilities from zero human-curated data through an autonomous Challenger-Solver loop.
R-Zero learns from scratch, with the Challenger proposing tasks at the edge of the Solver's ability.
This co-evolution boosts Qwen3-4B-Base by +6.49 on math and +7.54 on general reasoning.
GitHub.
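One way to picture "tasks at the edge of the Solver's ability" is a score that peaks when the Solver's sampled answers are split evenly. The sketch below is an illustration under that assumption; the function name and the exact formula are hypothetical and are not taken from the R-Zero code.

```python
from collections import Counter

def edge_of_ability_score(solver_answers):
    """Score a task by how divided the Solver's sampled answers are.

    Returns 1.0 when the most common answer wins exactly half of the
    samples and 0.0 when every sample agrees (task too easy, or the
    Solver is consistently wrong in the same way).
    """
    counts = Counter(solver_answers)
    majority_share = counts.most_common(1)[0][1] / len(solver_answers)
    return 1.0 - 2.0 * abs(majority_share - 0.5)

easy = edge_of_ability_score(["4", "4", "4", "4"])
borderline = edge_of_ability_score(["4", "5", "4", "5"])
in_between = edge_of_ability_score(["4", "4", "4", "5"])
```

A Challenger rewarded this way would be pushed away from trivial tasks and toward ones the Solver answers inconsistently.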
huggingface.co
Paper page - R-Zero: Self-Evolving Reasoning LLM from Zero Data
Renmin University of China and Huawei presented a comprehensive survey of memory mechanisms in LLM-based agents:
• What memory is & why it matters
• How to design & evaluate it
• Key applications & use cases
• Limitations & future directions
A roadmap for building smarter, longer-lived AI agents.
GitHub.
arXiv.org
A Survey on the Memory Mechanism of Large Language Model based Agents
Large language model (LLM) based agents have recently attracted much attention from the research and industry communities. Compared with original LLMs, LLM-based agents are featured in their...
🤔1
The top AI agents by revenue @alwebbci
The AI agent market is expected to more than double this year ($5B to $13B). 50% of the top 20 were founded in the last 3 years.
Customer service AI agents command 127x revenue multiples vs. 52x average.
🔥6👍2
Alibaba introduced Memp, a new framework that gives LLM agents learnable, updatable procedural memory.
This leads to steadily higher success rates and greater efficiency on complex tasks.
Memp distills past agent trajectories into both fine-grained instructions and high-level abstractions, continuously improving with new experience. It's even transferable to weaker models.
huggingface.co
Paper page - Memp: Exploring Agent Procedural Memory
Circle has announced the launch of Arc, an open Layer-1 blockchain designed to provide enterprise-grade infrastructure for stablecoin payments, foreign exchange, and capital markets applications.
The network is EVM-compatible and uses USDC as its native gas token. Arc is expected to launch its public testnet later this fall.
The Block
Circle to launch Layer 1 blockchain Arc using USDC stablecoin as native gas token
Circle unveiled plans for its own stablecoin-focused Layer 1 blockchain, Arc, on Tuesday, expected to launch on public testnet this fall.
🔥6❤3🥰2
Microsoft_Administering_and_Governing_Agents__1755003891.pdf
569.1 KB
Microsoft has released a 30-page guide on #AIAgent governance to help secure and manage agents in #Microsoft365 environments
🆒5
Anthropic: Claude Sonnet 4 now supports 1 million tokens of context on the Anthropic API—a 5x increase.
Process over 75,000 lines of code or hundreds of documents in a single request.
Long context support is in public beta for API users with Tier 4 and custom rate limits.
Broader availability will be rolling out over the coming weeks. Available in Amazon Bedrock, and coming soon to Google Cloud's Vertex AI.
Claude
Claude Sonnet 4 now supports 1M tokens of context | Claude
Claude Sonnet 4 now supports up to 1 million tokens of context—a 5x increase that lets you process entire codebases, synthesize extensive document sets, and build agents that maintain coherence across hundreds of tool calls.
🔥6🥰3👏2
Microsoft introduced Dion, a new AI model optimization method that boosts scalability and performance over existing leading methods by orthonormalizing only a top-rank subset of singular vectors, enabling more efficient training of large models such as LLaMA-3 with reduced overhead.
Orthonormal updates appear to roughly double transformer training convergence speed, and Dion makes them tractable at the largest scale.
Code.
Paper.
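To make "orthonormalizing only a top-rank subset of singular vectors" concrete, here is a minimal NumPy sketch: keep the top `rank` singular directions of an update matrix and set their singular values to 1. The full SVD is used only for clarity and is an assumption of this sketch, not a claim about how Dion computes its update; the function name and matrix sizes are made up.

```python
import numpy as np

def orthonormalize_top_rank(update, rank):
    """Keep the top-`rank` singular directions and flatten their scale to 1."""
    u, _, vt = np.linalg.svd(update, full_matrices=False)
    # Dropping the singular values makes the kept directions equally weighted.
    return u[:, :rank] @ vt[:rank, :]

rng = np.random.default_rng(0)
momentum = rng.standard_normal((8, 6))   # stand-in for a momentum matrix
step = orthonormalize_top_rank(momentum, rank=2)

# The resulting step has `rank` singular values equal to 1, the rest 0.
singular_values = np.linalg.svd(step, compute_uv=False)
```

Restricting the work to a low-rank subset is what keeps the per-step cost down compared with orthonormalizing the full matrix.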
Microsoft Research
Dion: Distributed orthonormal update revolution
Dion is a new AI model optimization method that boosts scalability and performance over existing leading methods by orthonormalizing only a top rank subset of singular vectors, enabling more efficient training of large models such as LLaMA-3 with reduced…
🔥3🆒3❤2👏2
Matrix-Game 2.0 — the first open-source, real-time, long-sequence interactive world model
Last week, DeepMind's Genie 3 shook the AI world with real-time interactive world models.
But... it wasn't open-sourced.
Matrix-Game 2.0 is Skywork's next-gen interactive world model:
- Real-time: 25FPS generation
- Long-sequence: Minutes of continuous video
- Interactive: Move, rotate, explore
- Multi-scene: City, wild, TempleRun, GTA.
It's the foundation for:
- Game engines
- Embodied AI
- Virtual humans
- Spatial intelligence.
The Tech Stack:
- Data: 1,350 hrs of interactive videos from Unreal Engine + GTA5
- Control: Frame-level keyboard & mouse input
- Model: 1.3B autoregressive diffusion with action control
- Speed: Single GPU → 25FPS
- 3D Causal VAE for space-time compression
- Diffusion Transformer with action conditioning
- KV-Cache for infinite video generation
- DMD training to avoid error accumulation
huggingface.co
Skywork/Matrix-Game-2.0 · Hugging Face
🔥4🥰3👏2
Now you can run and benchmark evolutionary coding agents on 100+ algorithm optimization tasks from algotune.io
👍4🔥2🥰2
Google is rolling out their version of memory for Gemini today. It is called 'personal context.'
If you want to disable this, toggle off Personal Context in settings.
This works for 2.5 Pro only, not Flash.
It will be interesting to see what effect Gemini's monster context window will have on implementation.
Google
Gemini adds Temporary Chats and new personalization features
Today, we are updating the Gemini app so that it learns about your preferences the more you use it.
🔥3❤2🥰2🍌1
Anthropic hired the co-founders and most of the team of Humanloop, a platform for prompt management, LLM evaluation, and observability.
The move marks a major push from Anthropic to strengthen its AI safety strategy.
TechCrunch
Anthropic nabs Humanloop team as competition for enterprise AI talent heats up | TechCrunch
While an Anthropic spokesperson confirmed that the AI firm did not acquire Humanloop or its IP, that’s a moot point in an industry where IP lives in the brain. And what Humanloop’s team is bringing to Anthropic is experience developing the tools that help…
The revenue from just the AI Labs (publicly reported figures from OpenAI and Anthropic), along with the public AI infrastructure companies, has already eclipsed all public SaaS revenue in 2024 (Nvidia's datacenter revenue drives most of the growth).
It will almost double public SaaS on a net new revenue basis this year. And these figures don’t include private AI companies, which would widen the gap even further.
It’s clear that the current set of 100+ public SaaS companies is not yet seeing revenue growth in their AI offerings, and for the most part, AI demand is happening where they are not.
🔥4🥰2👏2
ByteDance & Tsinghua University unveiled ASearcher, which advances agentic search by enabling long-horizon reasoning with large-scale asynchronous RL.
Goes beyond typical turn limits for complex, knowledge-intensive tasks.
Achieves SOTA performance, with significant gains of up to +46.7% on xBench and GAIA after RL training.
Models & data.
huggingface.co
Paper page - Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL
🔥4❤3🥰2