All about AI, Web 3.0, BCI – Telegram
3.22K subscribers
724 photos
26 videos
161 files
3.08K links
This channel is about AI, Web 3.0, and brain–computer interfaces (BCI)

owner @Aniaslanyan
A team from NYU, NVIDIA, UC Berkeley, and Stanford released H*Bench—a benchmark that moves visual AI out of curated household scenes into actual complexity: transportation hubs, retail spaces, urban streets across 12 countries.

The task is deceptively simple: given a 360° panorama and limited field of view, rotate your head, find an object or identify a navigable path. This mimics how humans actually search—not passively processing a full scene, but actively exploring with strategic head and eye movements.
The results expose where current models break down.

The researchers built HVS-3B by post-training Qwen2.5-VL-3B with supervised fine-tuning and RL. Performance jumped significantly:

Object search: 14.83% → 47.38% success rate
Path search: 6.44% → 24.94% success rate
The asymmetry is revealing. Object search—essentially visual grounding with rotation control—responds well to post-training. Path search—requiring physical commonsense, spatial reasoning, and social conventions—remains stubbornly difficult.
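The active-search setup can be sketched in a few lines. This is an illustrative toy, not H*Bench's actual interface: the 30° step size, the equirectangular crop, and the stub detector are all my assumptions. The loop extracts a limited field of view at the current yaw, checks it, rotates, and repeats.

```python
import numpy as np

def fov_crop(panorama: np.ndarray, yaw_deg: float, fov_deg: float = 90.0) -> np.ndarray:
    """Extract the horizontal slice of a 360° equirectangular panorama
    centered on the agent's current yaw (pitch ignored for simplicity)."""
    h, w = panorama.shape[:2]
    center = int((yaw_deg % 360.0) / 360.0 * w)            # yaw -> pixel column
    half = int(fov_deg / 360.0 * w / 2)
    cols = [(center + d) % w for d in range(-half, half)]  # wrap around the seam
    return panorama[:, cols]

# Toy search loop: rotate in 30° steps until a (stubbed) detector fires.
pano = np.zeros((64, 256, 3), dtype=np.uint8)
pano[:, 200:210] = 255                                     # "object" at ~281°

def sees_object(view: np.ndarray) -> bool:
    return view.max() > 0

yaw = 0.0
for _ in range(12):
    if sees_object(fov_crop(pano, yaw)):
        break
    yaw += 30.0
print(f"object found at yaw {yaw:.0f}")  # prints "object found at yaw 240"
```

A real agent would replace `sees_object` with a VLM grounding call; the point is that the model only ever sees the narrow crop, never the full panorama.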
NeurIPS 2025: LLMs can solve RL tasks without any external component.

Researchers introduce Prompted Policy Search (ProPS), an RL method based only on LLMs and in-context learning.

The tutorials are all implemented in Colab and can be run online with a free Gemini account:

Tutorial 1.
Tutorial 2.
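A ProPS-style loop is easy to sketch. The shape below is my reading of the idea, not the paper's exact prompt format: the LLM (stubbed here as a hill-climbing `propose` function) conditions on the history of (policy, return) pairs and proposes new parameters, with no gradients and no value function.

```python
import random

def prompted_policy_search(evaluate, propose, iters=20):
    """In-context policy search: keep a history of (params, return) pairs
    and ask the proposer for the next candidate. No external RL machinery."""
    history = []
    for _ in range(iters):
        params = evaluate_params = propose(history)  # LLM conditions on past trials
        ret = evaluate(params)                       # roll out the policy in the env
        history.append((params, ret))
    return max(history, key=lambda pr: pr[1])

# Stand-in "LLM": nudge toward the best params seen so far. A real run would
# format `history` into a prompt and parse the model's reply instead.
def propose(history):
    if not history:
        return random.uniform(-5, 5)
    best, _ = max(history, key=lambda pr: pr[1])
    return best + random.gauss(0, 1)

evaluate = lambda x: -(x - 2.0) ** 2                 # 1-D toy task, optimum at x=2
random.seed(0)
best_params, best_ret = prompted_policy_search(evaluate, propose)
print(round(best_params, 2))
```

Swapping the stub for a Gemini call is exactly what the Colab tutorials walk through.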
#DeepSeek released DeepSeek-Math-V2: Towards Self-Verifiable Mathematical Reasoning

It shows LLMs can now self-verify proofs, not just output solutions. DeepSeekMath-V2 achieves gold-medal level on IMO 2025 and CMO 2024, and 118/120 on Putnam 2024, pointing to a future of deep, trustworthy mathematical reasoning.

GitHub.
ByteDance: what if video generation could follow any semantic instruction without retraining or task-specific hacks?

Enter Video-As-Prompt (VAP).

By treating a reference video as an in-context semantic prompt and steering a frozen Video DiT with a plug-and-play MoT expert plus temporally biased position embeddings, it avoids artifacts, prevents forgetting, and delivers strong zero-shot control.

Trained on the new 100K-pair VAP-Data, it reaches a 38.7% user preference rate, rivaling specialized commercial models.
NeurIPS 2025 Best Paper Awards

Here are the winners that are already making waves:

1. Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)

They built Infinity-Chat and showed that LLMs are converging into an “artificial hivemind”: same-y answers, collapsed diversity, and reward models that are completely miscalibrated for real human preferences. Scary and fascinating.

2. Gated Attention for Large Language Models

Simple idea: add a learned sigmoid gate after each attention head. Result? Better performance, no attention sink issues, rock-solid long-context extrapolation on 15B MoE and 1.7B dense models. Sometimes the simplest tricks win.

3. 1000 Layer Networks for Self-Supervised RL

Deep networks (literally 1024 layers!) + contrastive self-supervised RL = new goal-reaching abilities that shallow nets never discover. Depth is underrated in RL.

4. Why Diffusion Models Don’t Memorize

Tony Bonnaire, Giulio Biroli et al.
Elegant theory: two time scales in diffusion training → implicit dynamical regularization prevents memorization even in massively over-parameterized models. Explains why they generalize so well.

Runner-ups that are also fire:
- RL doesn’t actually teach LLMs new reasoning skills beyond the base model (sorry, RLHF believers)
- Optimal bounds for transductive online learning finally settled after 30 years
- Superposition is the reason scaling laws work so cleanly
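The gated-attention trick from paper 2 fits in a few lines of numpy. Illustrative sketch only: the exact gate placement in the paper may differ, and the query-conditioned sigmoid applied to each head's output below is my assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_attention_head(x, Wq, Wk, Wv, Wg):
    """Standard scaled dot-product attention for one head, followed by a
    learned elementwise sigmoid gate on the head output."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    out = softmax(scores) @ v                  # (seq, d_head)
    gate = 1.0 / (1.0 + np.exp(-(x @ Wg)))     # per-token, per-channel gate in (0, 1)
    return gate * out                          # lets a head switch itself off

seq, d_model, d_head = 4, 8, 4
x = rng.normal(size=(seq, d_model))
Wq, Wk, Wv, Wg = (rng.normal(size=(d_model, d_head)) * 0.1 for _ in range(4))
y = gated_attention_head(x, Wq, Wk, Wv, Wg)
print(y.shape)  # (4, 4)
```

Because the gate can drive a head's output to near zero for a given token, the model no longer needs an "attention sink" token to dump unwanted attention mass on, which is the intuition behind the long-context gains.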

Full list.
StepFun released GELab-Zero-4B-preview — a 4B multimodal GUI agent fine-tuned for Android.

It understands taps, swipes, typing & waits, and can perform complex, multi-app tasks.
Built on Qwen3-VL-4B-Instruct.

HuggingFace.
GitHub.
#DeepSeek just launched DeepSeek-V3.2 & DeepSeek-V3.2-Speciale — Reasoning-first models built for agents

1. DeepSeek-V3.2: Official successor to V3.2-Exp. Now live on App, Web & API.

2. DeepSeek-V3.2-Speciale: Pushing the boundaries of reasoning capabilities. API-only for now.

Thinking in Tool-Use:

- Introduced a new massive agent training data synthesis method covering 1,800+ environments & 85k+ complex instructions.

- DeepSeek-V3.2 integrates thinking directly into tool-use, and supports tool calls in both thinking and non-thinking modes.

API update:

- V3.2: Same usage pattern as V3.2-Exp.
- V3.2-Speciale: Served via a temporary endpoint: base_url="
Same pricing as V3.2, no tool calls, available until Dec 15th, 2025, 15:59 (UTC Time).

V3.2 now supports Thinking in Tool-Use — details
Google introduced Budget Tracker for smarter AI agents

Current LLM agents waste tool-call budgets.

This work unveils Budget Tracker and BATS, enabling agents to dynamically adapt planning based on remaining resources.
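A minimal sketch of the idea, with class and method names invented for illustration (not Google's API): track the remaining tool-call budget and let the planner switch strategy as it shrinks, instead of spending calls as if the budget were infinite.

```python
class BudgetTracker:
    """Track a tool-call budget and expose a planning hint that
    tightens as the budget runs down."""
    def __init__(self, total_calls: int):
        self.total = total_calls
        self.used = 0

    @property
    def remaining(self) -> int:
        return self.total - self.used

    def charge(self, n: int = 1) -> None:
        if self.used + n > self.total:
            raise RuntimeError("tool-call budget exhausted")
        self.used += n

    def plan_hint(self) -> str:
        frac = self.remaining / self.total
        if frac > 0.5:
            return "explore"      # plenty of budget: broad searches OK
        if frac > 0.2:
            return "exploit"      # narrow to the most promising branch
        return "answer_now"       # nearly out: stop calling tools

tracker = BudgetTracker(total_calls=10)
for _ in range(6):
    tracker.charge()
print(tracker.remaining, tracker.plan_hint())  # prints "4 exploit"
```

In an agent loop, `plan_hint()` (or the raw `remaining` count) would be injected into the prompt before each planning step, which is the budget-aware behavior the work is after.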
We have a new best text-to-video model that beats Google's Veo. Runway Gen-4.5, or Whisper Thunder, has a +20 Elo edge on preference data over Veo 3, roughly the same gap as between Veo 3 and Sora 2 Pro.

Does text-to-vid, image-to-vid, keyframes. 5-10s of output. No audio.
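For intuition on what +20 Elo means, the standard logistic Elo formula converts a rating gap into an expected head-to-head preference rate:

```python
def elo_win_prob(delta: float) -> float:
    """Expected preference rate implied by an Elo gap
    (standard logistic Elo formula: base 10, scale 400)."""
    return 1.0 / (1.0 + 10.0 ** (-delta / 400.0))

print(f"{elo_win_prob(20):.3f}")  # 0.529: +20 Elo ≈ 52.9% preference rate
```

So "beats Veo 3" here means raters prefer Gen-4.5 in roughly 53 of 100 matchups, a small but consistent edge.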
Sam Altman told staff today that he was declaring a “code red” as ChatGPT faces growing threats from Google and other AI makers.

He wrote that he’s marshaling more resources to improve model behavior and other features in the chatbot.

In an internal Slack memo, Sam said he's directing more employees to work on improving ChatGPT for its more than 800 million weekly users. Key code-red priorities include personalizing the chatbot so each person can customize how it interacts with them, improving ImageGen, improving model behavior, boosting speed and reliability, and minimizing overrefusals.

OpenAI is delaying ads (which the company is testing but hasn't publicly acknowledged, according to a person with knowledge of the plans), AI agents (which aim to automate tasks related to shopping and health), and Pulse, and plans to release a new reasoning model next week that Sam said beats Google's Gemini 3 in OpenAI's internal tests.
The world's first Co-Scientist integrating AI and XR. Meet LabOS.

It uses multimodal perception, self-evolving agents, and XR tools to see what researchers see, grasp experimental context, and assist in real time.

From cancer immunotherapy target discovery to stem-cell engineering, it turns labs into collaborative spaces where human insight and machine smarts evolve together, proving modern science moves fastest when thought and action team up.

Paper
Mistral released the Mistral 3 family of models

Small models Ministral 3 (14B, 8B, 3B), each released with base, instruct and reasoning versions.

And Mistral Large 3, a frontier-class open-source MoE. Apache 2.0.
Shopify just shipped Tangle - the first open-source experimentation platform with content-based caching and a visual editor that's actually pleasant to use.

The CPU time savings alone are ridiculous (seeing 1+ year saved at Shopify).
Diffusion Language Models are hyped lately, but hard to reproduce due to missing frameworks and high training costs.

Berkeley and UIUC show a surprisingly simple path: using their dLLM toolkit, they teach BERT to chat via discrete diffusion.

No generative pretraining, about 50 GPU-hours, and ModernBERT-large-chat-v0 reaches near-Qwen1.5-0.5B quality with only lightweight SFT.

Even better, they open sourced the full training and inference pipeline plus a Hello World example, along with the extensible dllm framework. Efficient, cheap, and beginner friendly.
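The core decoding loop of a discrete-diffusion LM is simple to sketch. Illustrative only (the real dllm toolkit's API and unmasking schedule will differ): start from an all-mask sequence and, at each step, unmask a fraction of positions with the model's current predictions.

```python
import random

MASK = "[MASK]"

def diffusion_decode(predict, length=8, steps=4, seed=0):
    """Iterative unmasking: the discrete-diffusion analogue of sampling.
    `predict` maps a partially masked sequence to a token per position."""
    rng = random.Random(seed)
    seq = [MASK] * length
    masked = list(range(length))
    per_step = max(1, length // steps)
    while masked:
        rng.shuffle(masked)
        reveal, masked = masked[:per_step], masked[per_step:]
        preds = predict(seq)          # model sees the current partial sequence
        for i in reveal:
            seq[i] = preds[i]         # commit predictions at revealed positions
    return seq

# Stub "model" that predicts the same token everywhere; a trained BERT-style
# encoder would fill each position from its masked-LM head instead.
out = diffusion_decode(lambda seq: ["hi"] * len(seq))
print(out)  # all 8 positions end up unmasked
```

This is why a bidirectional encoder like BERT is a natural fit: every step is just masked-token prediction, the objective it was pretrained on.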

Models.
A promising step toward practical, efficient compute-in-memory systems

A new memristor-based ADC with adaptive quantization shows that analog AI hardware could unlock its full potential without bulky converters in the way.

It delivers strong CIFAR10 and ImageNet performance at just 5 bits, achieves up to 15.1x better energy efficiency and 12.9x smaller area, and cuts CIM system overhead by more than half.
OpenAI published a blog post arguing that confessions can keep language models honest.

A proof-of-concept method trains models to report when they break instructions or take unintended shortcuts.

Even when models learn to cheat, they’ll still admit it...
Google introduced the Massive Sound Embedding Benchmark (MSEB).

This new open-source framework evaluates universal sound understanding across 8 core tasks, from retrieval to reconstruction, in order to accelerate progress in multimodal AI.
Best Paper (DB track) Award at #NeurIPS2025 for Artificial Hivemind

Researchers from University of Washington, CMU, and Allen Institute have identified a fundamental problem in modern language models - the "Artificial Hivemind effect". HuggingFace.

Different models independently generate identical responses to open-ended questions. GPT-4, Qwen, Llama, Mixtral - all write "time is a river" when asked for a metaphor about time.

Average semantic similarity across different model families: 71-82%. This isn't a bug in one model. It's a systemic property of current LLM training paradigms.

The study covers 70+ models using the INFINITY-CHAT dataset:
- 26K real-world open-ended queries from WildChat
- 17 categories (from creative writing to philosophical questions)
- 31,250 human annotations (25 independent annotators per example)

Two forms of collapse:

Intra-model: a single model repeats itself with pairwise similarity >0.8 in 79% of cases (even at temperature=1.0)

Inter-model: different models produce identical phrases and structures.
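The intra-model collapse statistic is easy to reproduce in miniature. A sketch, assuming responses are embedded with any sentence encoder (the embeddings below are synthetic stand-ins):

```python
import numpy as np

def pairwise_cosine(embs: np.ndarray) -> np.ndarray:
    """Cosine similarity between every pair of response embeddings."""
    normed = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    return normed @ normed.T

def collapse_rate(embs: np.ndarray, threshold: float = 0.8) -> float:
    """Fraction of distinct response pairs above the similarity threshold —
    the paper's >0.8-similarity statistic, in miniature."""
    sims = pairwise_cosine(embs)
    iu = np.triu_indices(len(embs), k=1)   # distinct pairs only
    return float((sims[iu] > threshold).mean())

# Toy example: four near-identical "time is a river" responses + one outlier.
rng = np.random.default_rng(0)
base = rng.normal(size=8)
embs = np.stack([base + 0.05 * rng.normal(size=8) for _ in range(4)]
                + [rng.normal(size=8)])
print(collapse_rate(embs))
```

Run over 25 samples per prompt from one model, a high `collapse_rate` is exactly the intra-model hivemind effect; run across models on the same prompt, it measures the inter-model version.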

Critical finding: LLM judges and reward models systematically fail when evaluating alternative responses of similar quality. Correlation with humans drops from 0.4 to 0.05 on examples with diverse content.

For business:
This creates an "AI feedback loop" - models are trained based on evaluations from other models that are themselves poorly calibrated for diversity.
Implications:
→ Reduced innovation potential in AI assistants
→ Standardization of creative content
→ Loss of alternative perspectives in strategic analysis
→ Risk of homogenizing user thinking patterns.

The future of AI should not be echoes of one voice, but a chorus of many.
Anthropic released Interviewer, which lets you interview people at scale using Claude.

This helps expand the kind of research you can do.