Apple introduced EgoDex, the largest and most diverse dataset of dexterous human manipulation to date: 829 hours of egocentric video plus paired 3D hand poses across 194 tasks.
Unlike teleoperation, egocentric video is passively scalable, like text and images on the Internet.
Researchers use Apple Vision Pro to collect video + precise pose annotations (unlike Ego4D, which lacks native pose data). This unlocks 5x the scale of existing large datasets like DROID.
The authors also propose new benchmarks and train imitation learning policies for dexterous trajectory prediction. Below are 30 Hz wrist and fingertip trajectories on the test set, where blue = ground truth, red = model predictions, and points get lighter up to 2 seconds into the future.
The full dataset is now publicly available to the community; access details are in the paper. Sample code for data loading is coming soon.
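As described above, predictions are rendered at 30 Hz with points fading over a 2-second horizon. A tiny helper (illustrative only; not part of the EgoDex release) that computes the timestamps and fade values such a plot would use:

```python
def fade_schedule(horizon_s: float = 2.0, rate_hz: float = 30.0):
    """Timestamps and per-point alpha for drawing a predicted trajectory
    whose points get lighter further into the future."""
    n = int(horizon_s * rate_hz)                   # 60 future points at 30 Hz over 2 s
    times = [(i + 1) / rate_hz for i in range(n)]  # seconds ahead of "now"
    alphas = [1.0 - i / n for i in range(n)]       # 1.0 at "now", fading toward 0
    return times, alphas
```

Feeding `alphas` into a plotting library's per-point opacity reproduces the fading-trajectory visualization the post describes.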
arXiv.org
EgoDex: Learning Dexterous Manipulation from Large-Scale Egocentric Video
Imitation learning for manipulation has a well-known data scarcity problem. Unlike natural language and 2D computer vision, there is no Internet-scale corpus of data for dexterous manipulation....
Anthropic is hosting an event tomorrow at 9:30 am PT.
Anthropic
Events \ Anthropic
Join Anthropic events: upcoming conferences, webinars, and livestreams. Access past recordings and explore the future of AI safety with Claude.
Elon Musk's xAI announced Live Search in its API.
The new beta feature (free for a limited time) allows apps built on Grok models to search real-time information from X and the web, including news.
Here's how easy it is to try out Grok 3's new live search:
1/ Grab a key from xAI
2/ Remix our template
3/ Add your API key to Secrets
4/ Click Run and start chatting with Grok.
Since it's built with Replit Agent, you can remix the project and keep editing with Agent.
Here's the template to get started.
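The steps above reduce to a single authenticated POST once you have a key. A minimal standard-library sketch; the endpoint, model id, and `search_parameters` field follow xAI's beta docs at the time of writing and may change:

```python
import json
import os
import urllib.request

def build_live_search_request(prompt: str, mode: str = "auto") -> dict:
    # "search_parameters" is the Live Search knob (beta): "auto" lets Grok
    # decide when to search; sources can include the web and X posts.
    return {
        "model": "grok-3-latest",
        "messages": [{"role": "user", "content": prompt}],
        "search_parameters": {"mode": mode, "sources": [{"type": "web"}, {"type": "x"}]},
    }

def ask_grok(prompt: str) -> str:
    # Reads the key from the environment (e.g. a Replit Secret named XAI_API_KEY).
    req = urllib.request.Request(
        "https://api.x.ai/v1/chat/completions",
        data=json.dumps(build_live_search_request(prompt)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['XAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Set `XAI_API_KEY`, then call `ask_grok("What's in the news right now?")`.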
replit
Build and deploy software collaboratively with the power of AI without spending a second on setup.
Tencent presented Hunyuan-TurboS
- Hybrid Transformer-Mamba MoE (56B active params) trained on 16T tokens
- Dynamically switches between rapid-response and deep "thinking" modes
- Ranks in the overall top 7 on LMSYS Chatbot Arena.
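Tencent hasn't published how the fast/thinking switch is decided, but the idea can be sketched as a router that inspects a query before choosing a mode. A toy illustration (the markers and word-count threshold are invented for the example):

```python
def choose_mode(prompt: str,
                hard_markers=("prove", "derive", "step by step", "why")) -> str:
    """Toy fast-vs-thinking router; Hunyuan-TurboS's actual rule is not public."""
    p = prompt.lower()
    looks_hard = any(m in p for m in hard_markers) or len(p.split()) > 40
    return "thinking" if looks_hard else "fast"
```

A real system would likely use a learned classifier or let the model itself decide, but the interface is the same: route easy queries to a cheap fast path and hard ones to extended reasoning.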
VanEck will launch a private digital assets fund in June 2025 focused on the Avalanche ecosystem.
The fund will invest in projects with long-term token utility around the TGE stage across sectors such as gaming, financial services, payments, and AI, while allocating idle capital to Avalanche-native RWA products to maintain onchain liquidity.
GlobeNewswire News Room
VanEck Prepares to Launch PurposeBuilt Fund to Invest in Real-World Applications on Avalanche
Managed by VanEck’s Digital Assets Alpha Fund investment team, the VanEck PurposeBuilt Fund will invest in Avalanche ecosystem founders building scalable...
G42 and OpenAI announced Stargate UAE
#Stargate UAE: a next-generation 1 GW AI compute cluster that will be built by G42 and operated by OpenAI and Oracle.
The collaboration also includes Cisco and SoftBank Group, and NVIDIA will supply its latest Blackwell GB300 systems. The cluster will sit at the heart of the 5 GW AI campus announced last week.
Invent a Better Everyday | Abu Dhabi, UAE | G42 | Global Tech Alliance Launches Stargate UAE
Researchers introduced MedBrowseComp, a challenging deep research benchmark for LLM agents in medicine
MedBrowseComp is the first benchmark that tests the ability of agents to retrieve & synthesize multi-hop medical facts from oncology knowledge bases.
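The multi-hop setup can be pictured as chaining lookups through a knowledge base, where each answer becomes the next query. A toy sketch (the data here is illustrative, not drawn from the benchmark itself):

```python
def multi_hop(kb: dict, start: str, relations: list) -> str:
    """Follow a chain of relations through a knowledge base: the answer to
    each hop becomes the subject of the next. A real agent would replace the
    dict lookup with browsing/retrieval over oncology knowledge bases."""
    entity = start
    for rel in relations:
        entity = kb[(entity, rel)]
    return entity
```

The benchmark's difficulty comes from each hop requiring live retrieval and synthesis rather than a clean key-value lookup, but the control flow is the same.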
moreirap12.github.io
MedBrowseComp
MedBrowseComp project page
Claude 4 is here, and it embodies Anthropic's vision for the future of agents.
More details about Claude 4:
—Both models are hybrid models
—Opus 4 is great at understanding codebases and “the right choice” for agentic workflows
—Sonnet 4 excels at everyday tasks, and is your “daily go to”.
Coding agents are a huge theme here at the event and clearly a major focus for what’s coming next.
-Claude 4 has significantly greater agentic capabilities
-A new Code execution tool
-Claude Code coming to VSCode and Jetbrains
-Can now run Claude Code in GitHub.
Some more details on Claude 4 Opus:
—Matches or beats the best models in the world
—SOTA for coding, agentic tool use, and writing
—Memory capabilities across sessions
—Extended thinking mode for complex problem-solving
—200K context window with 32K output tokens.
Claude Code:
—Now generally available
—Integrates with VSCode and Jetbrains IDEs
—You can now see changes live inline in your editor
—A new Claude Code SDK for more flexibility.
If you want to read more about Sonnet & Opus 4, including a bunch of alignment and reward hacking findings, check out the model card.
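Extended thinking is exposed through the Messages API as a `thinking` block with a token budget. A standard-library sketch; the model id and field names follow Anthropic's docs at the time of writing and may change:

```python
import json
import os
import urllib.request

def build_thinking_request(prompt: str, budget_tokens: int = 1024) -> dict:
    # max_tokens must exceed the thinking budget, since thinking tokens
    # count toward the overall output allowance.
    return {
        "model": "claude-opus-4-20250514",
        "max_tokens": 2048,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "messages": [{"role": "user", "content": prompt}],
    }

def ask_claude(prompt: str) -> str:
    req = urllib.request.Request(
        "https://api.anthropic.com/v1/messages",
        data=json.dumps(build_thinking_request(prompt)).encode(),
        headers={
            "content-type": "application/json",
            "x-api-key": os.environ["ANTHROPIC_API_KEY"],
            "anthropic-version": "2023-06-01",
        },
    )
    with urllib.request.urlopen(req) as resp:
        blocks = json.load(resp)["content"]
    # The response interleaves "thinking" blocks with the final answer;
    # return the first plain text block.
    return next(b["text"] for b in blocks if b["type"] == "text")
```

Set `ANTHROPIC_API_KEY` and call `ask_claude(...)`; raising `budget_tokens` gives the model more room for extended reasoning on complex problems.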
Anthropic
Introducing Claude 4
Discover Claude 4's breakthrough AI capabilities. Experience more reliable, interpretable assistance for complex tasks across work and learning.
ByteDance introduced MMaDA: Multimodal Large Diffusion Language Models
MMaDA is a novel class of multimodal diffusion foundation models designed to achieve superior performance across diverse domains such as textual reasoning, multimodal understanding, and text-to-image generation.
It surpasses LLaMA-3-7B and Qwen2-7B, SDXL and Janus, and Show-o and SEED-X.
3 key innovations:
1. A unified diffusion architecture with a shared probabilistic formulation and a modality-agnostic design, eliminating the need for modality-specific components.
2. A mixed long chain-of-thought (CoT) fine-tuning strategy that curates a unified CoT format across modalities.
3. UniGRPO, a unified policy-gradient-based RL algorithm specifically tailored for diffusion foundation models.
GitHub.
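Decoding in masked diffusion language models of this family typically starts from a fully masked sequence and commits the most confident predictions over a few refinement steps. A simplified sketch of that loop (not the paper's exact sampler):

```python
def masked_diffusion_decode(seq_len: int, predict, steps: int = 4) -> list:
    """Iterative unmasking: predict every masked slot each step, keep the
    most confident fraction, repeat until the sequence is fully decoded.
    `predict(tokens, i)` returns a (token, confidence) proposal for slot i."""
    tokens = [None] * seq_len  # None stands in for [MASK]
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t is None]
        if not masked:
            break
        proposals = {i: predict(tokens, i) for i in masked}
        # Commit enough slots per step that everything is filled by the end.
        k = -(-len(masked) // (steps - step))  # ceiling division
        best = sorted(proposals.items(), key=lambda kv: -kv[1][1])[:k]
        for i, (tok, _conf) in best:
            tokens[i] = tok
    return tokens
```

In a real model `predict` is a forward pass of the denoiser over the whole sequence; the sketch only captures the commit-the-confident-slots schedule.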
arXiv.org
MMaDA: Multimodal Large Diffusion Language Models
We introduce MMaDA, a novel class of multimodal diffusion foundation models designed to achieve superior performance across diverse domains such as textual reasoning, multimodal understanding, and...
Humans can now see near-infrared light! A very cool development in biophotonics: engineered contact lenses convert invisible NIR signals into visible colors, enabling wearable, power-free NIR vision.
This could shift our perceptual boundaries: it shows the brain can integrate novel spectral inputs when they are mapped onto familiar visual codes, reframing light-based information processing and sensory integration.
AI models are finding zero-day vulnerabilities. A new era for cybersecurity.
Sean Heelan's Blog
How I used o3 to find CVE-2025-37899, a remote zeroday vulnerability in the Linux kernel’s SMB implementation
In this post I’ll show you how I found a zeroday vulnerability in the Linux kernel using OpenAI’s o3 model. I found the vulnerability with nothing more complicated than the o3 API…
The World Economic Forum has released a report on Asset Tokenization in Financial Markets.
Highlights
1. Tokenization offers a new model of digital asset ownership that enhances transparency, efficiency and accessibility.
2. This report analyses asset class use cases in issuance, securities financing and asset management, identifying factors that enable successful tokenization implementation.
3. Key differentiators include a shared system of record, flexible custody, programmability, fractional ownership and composability across asset types. These features can democratize access to financial markets and modernize infrastructure.
4. While the benefits are demonstrated, adoption is slowed by challenges such as legacy infrastructure, regulatory fragmentation, limited interoperability and liquidity issues.
5. Effective deployment requires phased approaches and strategic coordination among financial institutions, regulators and technology providers. Factors affecting design decisions – such as ledger type, settlement mechanisms and market operating hours – must also be carefully considered.
6. Ultimately, tokenization holds promise for a more inclusive and efficient financial system, provided stakeholders align on standards, safeguards and scalable solutions.
7. Tokenization is expected to reshape financial markets by increasing transparency, efficiency, speed, and inclusivity—paving the way for more resilient and accessible financial systems.
Singapore's Sharpa unveiled SharpaWave, a lifelike robotic hand
—Features 22 degrees of freedom (DOF), balancing dexterity and strength
—Each fingertip has 1,000+ tactile sensing pixels and 5 mN pressure sensitivity
—AI models adapt the hand's grip and modulate force
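Sharpa hasn't published its control stack, but "modulate force" at 5 mN sensitivity implies a closed loop over the tactile readings. A toy proportional controller to make the idea concrete (gains and interfaces are invented for the example):

```python
def modulate_grip(target_force_mN: float, read_force, apply_delta,
                  steps: int = 50, gain: float = 0.2, tol: float = 5.0) -> bool:
    """Drive fingertip force toward a target in millinewtons.
    read_force() samples the tactile sensor; apply_delta(d) nudges the
    actuator. Returns True once within `tol` mN of the target."""
    for _ in range(steps):
        err = target_force_mN - read_force()
        if abs(err) <= tol:
            return True
        apply_delta(gain * err)  # proportional correction
    return False
```

The announced AI layer would sit above a loop like this, picking force targets per object (egg vs. tool) rather than doing the low-level servoing itself.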
HouseBots
Sharpa Unveils SharpaWave: The World’s Most Tactile Dexterous Robot Hand — HouseBots
Singapore-based robotics startup Sharpa is redefining what robotic manipulation means with the debut of its latest innovation: SharpaWave , a 22-degree-of-freedom (DOF) dexterous hand that brings human-like precision and speed to the world of robotics.
Researchers introduced SPORT, a multimodal agent that explores tool usage without human annotation.
It leverages step-wise DPO to further enhance tool-use capabilities following SFT.
SPORT achieves improvements on the GTA and GAIA benchmarks.
Google introduced Lyria RealTime, a new experimental interactive music generation model that lets anyone create, control, and perform music in real time.
It is available via the Gemini API, and you can try the demo app in Google AI Studio.
Amazon added AI-generated audio discussions about certain products, based on customer reviews and web searches.
About Amazon
Amazon's new generative AI-powered audio feature synthesizes product summaries and reviews to make shopping easier
The new AI shopping experts help save time by compiling research and providing product highlights for customers from product pages, reviews, and insights.
Anthropic is now rolling out voice mode in beta on mobile.
Try starting a voice conversation and asking Claude to summarize your calendar or search your docs. Voice mode is available in English during the beta and will come to all plans in the next few weeks.
Game-Changer for AI: Meet the Low-Latency-Llama Megakernel
Buckle up: a new breakthrough in AI optimization just dropped, and it has even Andrej Karpathy buzzing.
The Low-Latency-Llama Megakernel is a new approach to running models like Llama-1B faster and smarter on GPUs.
What’s the Big Deal?
Instead of splitting a neural network’s forward pass into multiple CUDA kernels (with pesky synchronization delays), this megakernel runs everything in a single kernel. Think of it as swapping a clunky assembly line for a sleek, all-in-one super-machine!
Why It’s Awesome:
1. No Kernel Boundaries, No Delays. By eliminating kernel switches, the GPU works non-stop, slashing latency and boosting efficiency.
2. Memory Magic. Threads are split into “loaders” and “workers.” While loaders fetch future weights, workers crunch current data, using 16KiB memory pages to hide latency.
3. Fine-Grained Sync. Without kernel boundaries, custom synchronization was needed. This not only solves the issue but unlocks tricks like early attention head launches.
4. Open Source. The code is fully open, so you can stop “torturing” your models with slow kernel launches (as the devs humorously put it) and optimize your own pipelines!
Why It Matters?
- Speed Boost. Faster inference means real-time AI applications (think chatbots or recommendation systems) with lower latency.
- Cost Savings. Optimized GPU usage reduces hardware demands, perfect for startups or budget-conscious teams.
- Flexibility. Open-source code lets developers tweak it for custom models or use cases.
Karpathy’s Take:
Andrej calls it “so so so cool,” praising the megakernel for enabling “optimal orchestration of compute and memory.” He argues that traditional sequential kernel approaches can’t match this efficiency.
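The loader/worker split in point 2 is essentially software pipelining: a bounded pool of pages lets the fetch for the next layer overlap with compute on the current one. A Python thread sketch of the pattern (the real megakernel does this with GPU warps and 16 KiB shared-memory pages, not OS threads):

```python
import queue
import threading

def run_pipelined(layers, fetch, compute, x, n_pages=2):
    # Bounded "page" pool: put() blocks when every page is occupied,
    # just as loader warps wait for a free shared-memory page.
    pages = queue.Queue(maxsize=n_pages)

    def loader():
        for layer in layers:
            pages.put(fetch(layer))  # prefetch weights ahead of the worker

    t = threading.Thread(target=loader)
    t.start()
    # Worker: consume each layer's weights as soon as they land, so the
    # fetch for layer i+1 overlaps with the compute for layer i.
    for _ in layers:
        x = compute(pages.get(), x)
    t.join()
    return x
```

With `n_pages=2` this is classic double buffering; the megakernel generalizes it with fine-grained synchronization instead of kernel boundaries.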
hazyresearch.stanford.edu
Look Ma, No Bubbles! Designing a Low-Latency Megakernel for Llama-1B
Telegram + Grok = this summer https://news.1rj.ru/str/durov/422
Telegram
Pavel Durov
🔥 This summer, Telegram users will gain access to the best AI technology on the market. Elon Musk and I have agreed to a 1-year partnership to bring xAI’s chatbot Grok to our billion+ users and integrate it across all Telegram apps 🤝
💪 This also strengthens…