All about AI, Web 3.0, BCI – Telegram
3.22K subscribers
This channel is about AI, Web 3.0, and brain-computer interfaces (BCI)

owner @Aniaslanyan
OpenAI launches new tools to help businesses build AI agents: RAG/file search, web search, and operator/computer use, all packaged together in the Responses API. Plus, they've upgraded Swarm to the (open-source) Agents SDK to make it easy to build…
Summary of insights from OpenAI AMA on X following the launch of Agent Tools and APIs

Responses API and Tools

1. Operator functionality (CUA model) is available starting today through the Responses API

2. Responses API is stateful by default, supports retrieving past responses, chaining them, and will soon reintroduce threads

3. Code Interpreter tool is planned as the next built-in tool in the Responses API

4. Web search can be used together with structured outputs by defining a JSON schema explicitly

5. Assistants API won't be deprecated until migration to Responses API is possible without data loss

6. "Assistants" and "agents" terms are interchangeable, both describing systems independently accomplishing user tasks

7. Curl documentation is provided in API references, with more examples coming soon
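Item 4 above (web search combined with structured outputs) can be sketched as a request payload. This is illustrative only: the field names follow the shape of OpenAI's documented Responses API but should be checked against the current API reference, and no request is actually sent here.

```python
# Sketch of a Responses API request combining the web search tool with a
# structured output defined by an explicit JSON schema. Field names are
# assumed from OpenAI's docs; verify against the live API reference.
payload = {
    "model": "gpt-4o",
    "input": "What were the top AI headlines this week?",
    "tools": [{"type": "web_search"}],
    "text": {
        "format": {
            "type": "json_schema",
            "name": "headlines",
            "schema": {
                "type": "object",
                "properties": {
                    "headlines": {
                        "type": "array",
                        "items": {"type": "string"},
                    }
                },
                "required": ["headlines"],
                "additionalProperties": False,
            },
        }
    },
}

print(sorted(payload.keys()))
```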

Agents SDK and Compatibility

- Agents SDK supports external API calls through custom-defined function tools
- SDK compatible with external open-source models that expose a Chat Completions-compatible API endpoint (no Responses API compatibility required)
- JavaScript and TypeScript SDKs coming soon; Java SDK may be prioritized based on demand
- Tracing functionality covers external Chat Completions-compatible models as a "generation span"
- Agents SDK supports MCP connections through custom-defined function tools
- Asynchronous operations aren't natively supported yet; interim solution is immediately returning "success" followed by updating via user message later
- Agent tools can include preset variables either hardcoded or through a context object
- Privacy can be managed using guardrails and an input_filter to constrain context during agent handoffs
- Agents SDK workflows can combine external Chat Completions-compatible models and OpenAI models, including built-in tools like CUA
- Agentic "deep research" functionality can be built using Responses API or Agents SDK
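The interim pattern for asynchronous operations mentioned above (return "success" immediately, deliver the real result later as a user message) can be sketched with a background thread. All names here are illustrative, not Agents SDK APIs.

```python
import threading
import queue

# Completed results land here; in a real agent loop they would be
# injected back into the conversation as a later user message.
updates: "queue.Queue[str]" = queue.Queue()

def slow_job(task_id: str) -> None:
    # Stand-in for a long-running operation (report generation, etc.).
    updates.put(f"task {task_id} finished")

def start_long_task(task_id: str) -> str:
    """Function tool: kick off the work in the background, ack immediately."""
    threading.Thread(target=slow_job, args=(task_id,), daemon=True).start()
    return "success"  # the agent continues without blocking

ack = start_long_task("42")
print(ack)                     # immediate acknowledgement
print(updates.get(timeout=5))  # later: surfaced via a follow-up message
```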

File and Vector Store Features

- File search returns citation texts via the "annotations" parameter
- Vector stores already support custom chunking and hybrid search, with further improvements planned
- Images are not yet supported in vector stores, but entire PDFs can be directly uploaded into the Responses API for small-document use cases
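Custom chunking, as supported by vector stores, can be approximated locally. A minimal sketch of fixed-size character chunking with overlap; the function name and parameter defaults are illustrative, not the API's actual settings.

```python
def chunk(text: str, size: int = 400, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks, where consecutive
    chunks share `overlap` characters so context survives boundaries."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

pieces = chunk("a" * 1000, size=400, overlap=50)
print(len(pieces), [len(p) for p in pieces])  # 3 [400, 400, 300]
```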

Computer Use Model (CUA) and Environment

- Docker environments for computer use must be managed by developers; recommended third-party cloud services are Browserbase and Scrapybara, with provided sample apps
- Predefined Ubuntu environments and company-specific setups can be created using OpenAI's CUA starter app
- Integrated VMs or fully managed cloud environments for CUA are not planned yet; developers are encouraged to use sample apps with third-party hosting providers
- CUA model primarily trained on web tasks but shows promising performance in desktop applications; still early stage

Realtime Usage Tracking

- OpenAI currently doesn't provide a built-in solution for tracking realtime usage via WebRTC ephemeral tokens; using a relay/proxy is recommended as an interim solution

OpenAI Models and Roadmap

- o1-pro will be available soon in Responses API
- o3 model development continues with API release planned; details forthcoming

Strategy and Positioning

OpenAI identifies as both a product and model company, noting ChatGPT's 400M weekly users help improve model quality, and acknowledges they won't build all needed AI products themselves

Unexpected Use Cases

Early Responses API tests revealed use cases such as art generation, live event summarization, apartment finding, and belief simulations
Google introduced Gemini Robotics, the most advanced vision-language-action (VLA) model in the world

Gemini Robotics, featured in the post, builds on Gemini 2.0, introducing advanced vision-language-action capabilities for physically controlling robots.

The technology enables robots to understand and react to the physical world, performing tasks like desk cleanup through voice commands, as part of a broader push toward embodied AI.

Gemini Robotics-ER, a related model, enhances spatial understanding, allowing robots to adapt to dynamic environments and interact seamlessly with humans.

Tech report.
Hugging Face (LeRobot) & Yaak released the world's largest open-source self-driving dataset

To search the data, Yaak is launching Nutron, a tool for natural-language search of robotics data. Check out the video to see how it works.

- Natural-language search of multi-modal data.
- Open-sourcing the L2D dataset: 5,000 hours of multi-modal self-driving data.

Try Nutron.
The first quantum supremacy for a useful application: D-Wave's quantum computer performed a complex simulation in minutes, with a level of accuracy that would take nearly a million years on the DOE's GPU-based supercomputer.

In addition, solving this problem on the classical supercomputer would require more than the world's annual electricity consumption.
IOSCO_AI_1741862094.pdf
1.5 MB
IOSCO Report: AI in Capital Markets - Uses, Risks, and Regulatory Responses

This report delves into the current and potential applications of AI within financial markets, outlines the associated risks and challenges, and examines how regulators and market participants are adapting to these changes.

The report cites regulatory approaches from Hong Kong, the EU, Canada, the US, Singapore, the Netherlands, the UK, Greece, Japan, Brazil, and Australia.

Key AI Applications in Financial Markets:

Decision-Making Support:
- Robo-advising (automated investment advice)
- Algorithmic trading
- Investment research and market sentiment analysis

Specific AI Use Cases:

Nasdaq:
- Developed the Dynamic M-ELO AI-driven trading order, optimizing order holding time for improved execution efficiency

Broker-Dealers:
- Customer interaction via chatbots
- Algorithmic trading enhancements
- Fraud and anomaly detection

Asset Managers:
- Automated investment advice
- Investment research
- Portfolio construction and optimization
Cohere introduced Command A: a new AI model that can match or outperform GPT-4o and DeepSeek-V3 on business tasks, with significantly greater efficiency.

Command A is an open-weights 111B-parameter model with a 256k context window, focused on delivering great performance across agentic, multilingual, and coding use cases.

Runs on only 2 GPUs (vs. typically 32), offers a 256k context length, supports 23 languages, and delivers up to 156 tokens/sec.

API.
Transformers, but without normalization layers. New paper by Meta.
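The paper's core idea is Dynamic Tanh (DyT): each normalization layer is replaced by an elementwise y = γ · tanh(αx) + β with a learnable scalar α, so no statistics are computed at all. A minimal numpy sketch (shapes and values illustrative):

```python
import numpy as np

def dynamic_tanh(x, alpha, gamma, beta):
    """DyT: drop-in replacement for LayerNorm from the paper --
    y = gamma * tanh(alpha * x) + beta, with a learnable scalar alpha
    and per-channel gamma/beta. No mean/variance statistics needed."""
    return gamma * np.tanh(alpha * x) + beta

x = np.random.randn(2, 8, 16)  # (batch, tokens, channels)
y = dynamic_tanh(x, alpha=0.5, gamma=np.ones(16), beta=np.zeros(16))
print(y.shape)  # (2, 8, 16)
```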
Baidu, the Google of China, dropped two models

1. ERNIE 4.5: beats GPT-4.5 at 1% of the price

2. Reasoning model X1: beats DeepSeek-R1 at 50% of the price.

China continues to build intelligence too cheap to meter. The AI price war is on.
Microsoft has released this useful tool for performing R&D with LLM-based agents.
Xiaomi's development of a SOTA audio reasoning model leverages DeepSeek's GRPO RL algorithm, achieving a 64.5% accuracy on the MMAU benchmark in just one week.

The breakthrough involves applying GRPO to the Qwen2-Audio-7B model, trained on 38,000 samples from Tsinghua University's AVQA dataset, marking a significant advancement in multimodal audio understanding.

The MMAU benchmark, introduced in 2024, tests models on complex audio tasks across speech, sound, and music, with even top models like Gemini Pro 1.5 achieving only 52.97% accuracy, highlighting the challenge Xiaomi's model addresses.
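GRPO's core mechanic can be sketched in a few lines: each sampled answer's advantage is computed relative to its own group of samples, removing the need for a learned value network. A toy numpy sketch (reward values illustrative):

```python
import numpy as np

def grpo_advantages(rewards: np.ndarray) -> np.ndarray:
    """Group-relative advantages as in GRPO: normalize each sample's
    reward by the mean/std of its group of rollouts for one prompt."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# One prompt, a group of 4 sampled answers scored 0/1 for correctness.
adv = grpo_advantages(np.array([1.0, 0.0, 0.0, 1.0]))
print(adv)  # correct answers get +1-ish, wrong ones -1-ish
```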
The total value of tokenized real-world assets has grown ≈20% over the last 30 days. Data by RWA.xyz.

Total Real-World Asset (RWA) value continues to climb—over $18B is now tokenized onchain, excluding stablecoins.
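A quick arithmetic check of the figures quoted: if ≈$18B is tokenized now after ≈20% growth in 30 days, the implied value a month ago (assuming exactly 20%) was:

```python
# Implied tokenized RWA value 30 days ago, from the post's numbers.
now = 18e9      # ~$18B tokenized onchain today
growth = 0.20   # ~20% growth over the last 30 days
prior = now / (1 + growth)
print(f"${prior / 1e9:.1f}B")  # $15.0B
```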
A breakthrough in brain signal analysis that combines PCA and ANFIS to hit 99.5% accuracy in cognitive pattern recognition.

It could be a game-changer for #neuroscience, #BCI tech and clinical applications.
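The PCA half of that pipeline is easy to sketch: project high-dimensional brain-signal features onto their top principal components before classification (the ANFIS classifier stage is not shown, and all shapes here are illustrative):

```python
import numpy as np

def pca(X: np.ndarray, k: int) -> np.ndarray:
    """Project data onto its top-k principal components via SVD --
    the dimensionality-reduction stage of a PCA + ANFIS pipeline."""
    Xc = X - X.mean(axis=0)          # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T             # scores in the top-k subspace

# 200 "trials" of 64-channel EEG-like data reduced to 10 features.
X = np.random.randn(200, 64)
Z = pca(X, k=10)
print(Z.shape)  # (200, 10)
```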
Mistral announced Small 3.1: multimodal, multilingual, Apache 2.0

Lightweight: Runs on a single RTX 4090 or a Mac with 32GB RAM, perfect for on-device applications.

Fast-Response Conversations: Ideal for virtual assistants and other applications where quick, accurate responses are essential.

Low-Latency Function Calling: Capable of rapid function execution within automated or agentic workflows.

Specialized Fine-Tuning: Customizable for specific domains.

Advanced Reasoning Foundation: Inspires community innovation, with models like DeepHermes 24B by Nous Research built on Mistral Small 3.
ByteDance Seed, Tsinghua, and UHK open-sourced a new RL algorithm for building reasoning models.

DAPO-Zero-32B, a fully open-source RL reasoning model, surpasses DeepSeek-R1-Zero-Qwen-32B, and scores 50 on AIME 2024 with 50% fewer steps.

It is trained with zero-shot RL from the Qwen-32b pre-trained model.

Everything is fully open-sourced (algorithm, code, dataset, verifier, and model).
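One of DAPO's modifications to the GRPO/PPO objective is "Clip-Higher": the upper clip bound is decoupled from the lower one and raised, so low-probability exploratory tokens are not clipped away as aggressively. A toy numpy sketch (the bound values follow the paper's reported defaults and are an assumption here):

```python
import numpy as np

def clipped_objective(ratio, adv, clip_low=0.2, clip_high=0.28):
    """PPO-style surrogate with decoupled clip ranges (DAPO's
    "Clip-Higher"): the importance ratio may rise to 1 + clip_high
    but only fall to 1 - clip_low before being clipped."""
    clipped = np.clip(ratio, 1 - clip_low, 1 + clip_high)
    return np.minimum(ratio * adv, clipped * adv)

# A ratio of 1.5 with positive advantage is capped at 1.28, not 1.2.
print(clipped_objective(np.array([1.5]), np.array([1.0])))  # [1.28]
```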
Cool research on open-source by Harvard

$4.15B invested in open-source generates $8.8T of value for companies (aka $1 invested in open-source = $2,000 of value created).

Companies would need to spend 3.5 times more on software than they currently do if OSS did not exist.
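A quick sanity check on the quoted ratio:

```python
# Harvard's figures: $4.15B invested, $8.8T of value generated.
invested = 4.15e9
value = 8.8e12
ratio = value / invested
print(round(ratio))  # ~2120, i.e. roughly $2,000 of value per $1 invested
```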
HuggingFace and IBM introduced SmolDocling, an ultra-compact VLM for end-to-end multi-modal document conversion

SmolDocling is good for enterprise use cases:
- 256M parameters - cheap and easy to run locally
- Performs better than 20x larger models
- Fast inference using vLLM – avg of 0.35 sec per page on an A100 GPU.
- Apache 2.0 license

Demo.
Biggest deal in Google/Alphabet history: Google is buying Wiz for $32B to beef up in cloud security

Wiz is an Israeli cloud-security startup headquartered in New York City. The company was founded in January 2020.

This acquisition positions Google to better compete with AWS and Azure.
Anthropic is working on voice capabilities for Claude.

The company's chief product officer, Mike Krieger, told the Financial Times that Anthropic plans to launch experiences that allow users to talk to Anthropic's AI models.
NVIDIA, Google DeepMind, and Disney Research are collaborating to build an R2-D2-style home droid.

Jensen giving the little guy voice and gesture commands live on stage.

The robot's name is Blue; he is so cute.
Nvidia announced GR00T N1, the world’s first open foundation model for humanoid robots

The power of a general robot brain, in the palm of your hand: with only 2B parameters, N1 learns from the most diverse physical-action dataset ever compiled and punches above its weight:

- Real humanoid teleoperation data.
- Large-scale simulation data: Nvidia is open-sourcing 300K+ trajectories
- Neural trajectories: SOTA video generation models to “hallucinate” new synthetic data that features accurate physics in pixels. Using Jensen’s words, “systematically infinite data”
- Latent actions: novel algorithms to extract action tokens from in-the-wild human videos and neural generated videos.

GR00T N1 is a single end-to-end neural net, from photons to actions:

- Vision-Language Model (System 2) that interprets the physical world through vision and language instructions, enabling robots to reason about their environment and instructions, and plan the right actions.
- Diffusion Transformer (System 1) that “renders” smooth and precise motor actions at 120 Hz, executing the latent plan made by System 2.
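The two-system split above can be sketched as a toy control loop: a slow planner (System 2) emits a latent plan, and a fast controller (System 1) decodes it into motor actions at 120 Hz. All rates, shapes, and function names here are illustrative, not GR00T N1's real interfaces.

```python
import numpy as np

PLAN_HZ, ACTION_HZ, ACTION_DIM = 2, 120, 7  # illustrative rates

def system2_plan(observation: np.ndarray) -> np.ndarray:
    """Stand-in for the vision-language planner: observation -> latent plan."""
    return np.tanh(observation[:16])

def system1_actions(latent: np.ndarray) -> np.ndarray:
    """Stand-in for the diffusion controller: renders one planning
    interval's worth of 120 Hz actions from the latent plan."""
    steps = ACTION_HZ // PLAN_HZ  # 60 action steps per plan
    return np.tile(latent[:ACTION_DIM], (steps, 1))

obs = np.random.randn(32)
actions = system1_actions(system2_plan(obs))
print(actions.shape)  # (60, 7)
```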

Code.
Weights on HF.
Open Physical AI dataset release.
Blog.