ML Research Hub

✨Mitigating Label Length Bias in Large Language Models

📝 Summary:
Large Language Models (LLMs) exhibit a label length bias when class labels span multiple tokens. This paper introduces Normalized Contextual Calibration (NCC), which mitigates this bias by normalizing and calibrating predictions at the full-label level. NCC significantly improves performance and reliability across div...
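Here is a minimal Python sketch of full-label scoring with length normalization and contextual calibration. It illustrates the idea under stated assumptions and is not the paper's NCC. It assumes you already have per-token log-probabilities for each label, both given the real input and given a content-free input such as "N/A". It uses the mean token log-probability as the normalized label score and calibrates by subtracting the content-free score. The labels and numbers are hypothetical.

```python
from typing import Dict, List


def normalized_label_score(token_logprobs: List[float]) -> float:
    """Length-normalized full-label score: mean log-probability per label token.

    Averaging instead of summing removes the penalty that multi-token labels
    get from multiplying many probabilities below 1.
    """
    if not token_logprobs:
        raise ValueError("a label must have at least one token")
    return sum(token_logprobs) / len(token_logprobs)


def calibrated_scores(
    label_logprobs: Dict[str, List[float]],
    content_free_logprobs: Dict[str, List[float]],
) -> Dict[str, float]:
    """Subtract each label's content-free score from its input-conditioned score.

    In log space this equals dividing the label probability by the probability
    the model assigns to that label for a content-free input (assumed "N/A").
    """
    return {
        label: normalized_label_score(lps)
        - normalized_label_score(content_free_logprobs[label])
        for label, lps in label_logprobs.items()
    }


def predict(label_logprobs, content_free_logprobs) -> str:
    """Pick the label with the highest calibrated, length-normalized score."""
    scores = calibrated_scores(label_logprobs, content_free_logprobs)
    return max(scores, key=scores.get)


# Hypothetical numbers: summed log-probs favor the shorter label
# ("positive": -1.2 vs "strongly negative": -1.4). The calibrated,
# length-normalized scores (about -0.2 vs 0.2) favor "strongly negative".
label_lp = {"positive": [-1.2], "strongly negative": [-0.5, -0.9]}
content_free_lp = {"positive": [-1.0], "strongly negative": [-0.8, -1.0]}
print(predict(label_lp, content_free_lp))  # strongly negative
```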

🔹 Publication Date: Published on Nov 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14385
• PDF: https://arxiv.org/pdf/2511.14385

==================================

For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT

#LLM #AI #NLP #BiasInAI #MachineLearning

✨Φeat: Physically-Grounded Feature Representation

📝 Summary:
Φeat is a new self-supervised visual backbone that captures material identity, such as reflectance and mesostructure. It learns robust features that are invariant to external physical factors such as shape and lighting, promoting physics-aware perception.

🔹 Publication Date: Published on Nov 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11270
• PDF: https://arxiv.org/pdf/2511.11270

==================================

#ComputerVision #SelfSupervisedLearning #DeepLearning #FeatureLearning #PhysicsAwareAI

✨Large Language Models Meet Extreme Multi-label Classification: Scaling and Multi-modal Framework

📝 Summary:
This paper improves Extreme Multi-label Classification (XMC) by using larger decoder-only models and introduces ViXML, a vision-enhanced framework. ViXML efficiently integrates visual information, significantly outperforming text-only models and achieving new state-of-the-art results.

🔹 Publication Date: Published on Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13189
• PDF: https://arxiv.org/pdf/2511.13189

==================================

#LLM #XMC #MultiModalAI #MachineLearning #AIResearch

✨A Brain Wave Encodes a Thousand Tokens: Modeling Inter-Cortical Neural Interactions for Effective EEG-based Emotion Recognition

📝 Summary:
RBTransformer, a Transformer-based model, improves EEG-based emotion recognition by modeling inter-cortical neural dynamics. It uses Band Differential Entropy tokens and multi-head attention. This approach significantly outperforms existing state-of-the-art methods on multiple datasets and dimens...
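Differential entropy (DE) features are a common EEG representation. For a band-pass-filtered signal modeled as Gaussian with variance σ², DE = ½·ln(2πeσ²). The sketch below turns pre-filtered signals into one token per electrode that holds its DE value in each band. The band list, the input layout, and the per-electrode tokenization are assumptions for illustration. The summary does not say how RBTransformer builds its tokens.

```python
import math
import statistics
from typing import Dict, List

# Hypothetical band set; the paper's exact bands are not given in the summary.
BANDS = ("theta", "alpha", "beta", "gamma")


def differential_entropy(samples: List[float]) -> float:
    """DE of a signal modeled as Gaussian: 0.5 * ln(2 * pi * e * variance)."""
    var = statistics.pvariance(samples)
    if var <= 0:
        raise ValueError("signal must have non-zero variance")
    return 0.5 * math.log(2 * math.pi * math.e * var)


def band_de_tokens(filtered: Dict[str, List[List[float]]]) -> List[List[float]]:
    """Return one token per electrode: its DE value in each band of BANDS.

    filtered[band][ch] holds electrode ch's samples after band-pass filtering
    to that band; the filtering itself is assumed to happen upstream.
    """
    n_channels = len(filtered[BANDS[0]])
    return [
        [differential_entropy(filtered[band][ch]) for band in BANDS]
        for ch in range(n_channels)
    ]


# Doubling a signal's amplitude quadruples its variance, adding ln(2) to its DE.
tokens = band_de_tokens({b: [[1, -1, 1, -1], [2, -2, 2, -2]] for b in BANDS})
print(len(tokens), len(tokens[0]))  # 2 4
```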

🔹 Publication Date: Published on Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13954
• PDF: https://arxiv.org/pdf/2511.13954
• Github: https://github.com/nnilayy/RBTransformer

==================================

#EEG #EmotionRecognition #Transformers #Neuroscience #MachineLearning

✨Proactive Hearing Assistants that Isolate Egocentric Conversations

📝 Summary:
A proactive hearing assistant system automatically identifies and isolates the wearer's conversation partners from binaural audio. It uses a dual-model AI architecture that adapts to conversational dynamics in real time, improving speech clarity without user prompts.

🔹 Publication Date: Published on Nov 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11473
• PDF: https://arxiv.org/pdf/2511.11473
• Project Page: https://proactivehearing.cs.washington.edu/
• Github: https://github.com/guilinhu/proactive_hearing_assistant

🔹 Models citing this paper:
• https://huggingface.co/guilinhu/proactive_hearing

✨ Datasets citing this paper:
• https://huggingface.co/datasets/guilinhu/libri_conversation

==================================

#HearingTech #AI #SpeechEnhancement #AssistiveTechnology #AudioProcessing

✨NORA-1.5: A Vision-Language-Action Model Trained using World Model- and Action-based Preference Rewards

📝 Summary:
NORA-1.5, an enhanced vision-language-action model with a flow-matching-based action expert and reward-driven post-training, improves performance and reliability in both simulated and real-world setti...
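The summary mentions a flow-matching-based action expert. The sketch below shows generic flow matching, not NORA-1.5's architecture. It builds training pairs on a straight path from a noise sample to a target action. It scores a velocity predictor with a squared-error loss. It then samples an action by Euler integration. The velocity model is any callable standing in for a learned network, and the plain-list vectors are a simplification.

```python
from typing import Callable, List, Tuple

Vec = List[float]
VelocityModel = Callable[[Vec, float], Vec]  # stand-in for a learned network


def flow_matching_pair(noise: Vec, action: Vec, t: float) -> Tuple[Vec, Vec]:
    """Point x_t on the straight noise->action path and its target velocity."""
    x_t = [(1 - t) * n + t * a for n, a in zip(noise, action)]
    velocity = [a - n for n, a in zip(noise, action)]
    return x_t, velocity


def flow_matching_loss(model: VelocityModel, noise: Vec, action: Vec, t: float) -> float:
    """Mean squared error between predicted and target velocity at time t."""
    x_t, target = flow_matching_pair(noise, action, t)
    pred = model(x_t, t)
    return sum((p - v) ** 2 for p, v in zip(pred, target)) / len(target)


def sample_action(model: VelocityModel, noise: Vec, steps: int = 10) -> Vec:
    """Generate an action by Euler-integrating dx/dt = model(x, t) from t=0 to t=1."""
    x, dt = list(noise), 1.0 / steps
    for i in range(steps):
        v = model(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy check with the exact velocity field toward one fixed (hypothetical) action.
target_action = [0.5, -1.0, 2.0]
oracle = lambda x, t: [(a - xi) / (1 - t) for a, xi in zip(target_action, x)]
print(sample_action(oracle, [0.0, 0.0, 0.0]))  # close to target_action
```

With a trained network in place of `oracle`, the same Euler loop turns a fresh noise sample into an action. More steps trade speed for integration accuracy.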

🔹 Publication Date: Published on Nov 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14659
• PDF: https://arxiv.org/pdf/2511.14659
• Project Page: https://declare-lab.github.io/nora-1.5
• Github: https://github.com/declare-lab/nora-1.5

🔹 Models citing this paper:
• https://huggingface.co/declare-lab/nora-1.5

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

✨TopoPerception: A Shortcut-Free Evaluation of Global Visual Perception in Large Vision-Language Models

📝 Summary:
Large Vision-Language Models (LVLMs) typically align visual features from an encoder with a pre-trained Large Language Model (LLM). However, this makes the visual perception module a bottleneck, which...

🔹 Publication Date: Published on Nov 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11831
• PDF: https://arxiv.org/pdf/2511.11831
• Github: https://github.com/Wenhao-Zhou/TopoPerception

✨ Datasets citing this paper:
• https://huggingface.co/datasets/Wenhao-Zhou/TopoPerception

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

✨LLM-Powered Fully Automated Chaos Engineering: Towards Enabling Anyone to Build Resilient Software Systems at Low Cost

📝 Summary:
Manual planning and improvement hinder Chaos Engineering adoption. ChaosEater automates the entire Chaos Engineering cycle for Kubernetes using LLMs, handling tasks from requirements to debugging. This enables anyone to build resilient systems quickly and affordably.

🔹 Publication Date: Published on Nov 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07865
• PDF: https://arxiv.org/pdf/2511.07865
• Project Page: https://ntt-dkiku.github.io/chaos-eater/
• Github: https://github.com/ntt-dkiku/chaos-eater

==================================

#ChaosEngineering #LLM #CloudNative #SoftwareResilience #DevOps

✨VIDEOP2R: Video Understanding from Perception to Reasoning

📝 Summary:
VideoP2R is a novel reinforcement fine-tuning framework for video understanding. It separately models perception and reasoning processes, using a new CoT dataset and a process-aware RL algorithm. This approach achieves state-of-the-art results on video reasoning benchmarks.

🔹 Publication Date: Published on Nov 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11113v1
• PDF: https://arxiv.org/pdf/2511.11113

==================================

#VideoUnderstanding #ReinforcementLearning #AIResearch #ComputerVision #Reasoning

✨Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks

📝 Summary:
VR-Bench evaluates video models' spatial reasoning using maze-solving tasks. It demonstrates that video models excel in spatial perception and reasoning, outperforming VLMs, and benefit from diverse sampling during inference. These findings show the strong potential of reasoning via video for spa...

🔹 Publication Date: Published on Nov 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.15065
• PDF: https://arxiv.org/pdf/2511.15065
• Project Page: https://imyangc7.github.io/VRBench_Web/
• Github: https://github.com/ImYangC7/VR-Bench

==================================

#VideoModels #AIReasoning #SpatialAI #ComputerVision #MachineLearning

✨FreeAskWorld: An Interactive and Closed-Loop Simulator for Human-Centric Embodied AI

📝 Summary:
FreeAskWorld is an interactive simulator that uses LLMs to support human-centric embodied AI with complex social behaviors. It provides a large dataset that improves agents' semantic understanding and interaction competency, and the work highlights interaction as a key information modality.

🔹 Publication Date: Published on Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13524
• PDF: https://arxiv.org/pdf/2511.13524
• Github: https://github.com/AIR-DISCOVER/FreeAskWorld

✨ Datasets citing this paper:
• https://huggingface.co/datasets/Astronaut-PENG/FreeAskWorld
• https://huggingface.co/datasets/Astronaut-PENG/FreeWorld

==================================

#EmbodiedAI #LLMs #AISimulation #HumanAI #AIResearch

✨MHR: Momentum Human Rig

📝 Summary:
MHR combines ATLAS's decoupled skeleton and shape with a modern rig and Momentum-inspired pose correction. This parametric human body model provides expressive, anatomically plausible human animation with non-linear correctives for AR/VR and graphics applications.

🔹 Publication Date: Published on Nov 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.15586
• PDF: https://arxiv.org/pdf/2511.15586

==================================

#ComputerGraphics #3DAnimation #ARVR #HumanModeling #AnimationTech

✨Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

📝 Summary:
Kandinsky 5.0 is a family of state-of-the-art foundation models for high-resolution image and video generation. It includes Lite and Pro versions with varying parameters and uses advanced training techniques for superior quality and speed. This publicly available framework aims to advance generat...

🔹 Publication Date: Published on Nov 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14993
• PDF: https://arxiv.org/pdf/2511.14993
• Project Page: https://kandinskylab.ai/
• Github: https://github.com/kandinskylab/kandinsky-5

==================================

#FoundationModels #ImageGeneration #VideoGeneration #AI #DeepLearning

✨Instruction-Guided Lesion Segmentation for Chest X-rays with Automatically Generated Large-Scale Dataset

📝 Summary:
Researchers introduce Instruction-Guided Lesion Segmentation (ILS) for chest X-rays (CXRs), which lets users segment diverse lesion types with simple instructions. They developed MIMIC-ILS, a large-scale dataset, and ROSALIA, a vision-language model. ROSALIA accurately segments various lesions and provides textual explan...

🔹 Publication Date: Published on Nov 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.15186
• PDF: https://arxiv.org/pdf/2511.15186

==================================

#MedicalAI #LesionSegmentation #ChestXray #VisionLanguageModel #DeepLearning

✨VisPlay: Self-Evolving Vision-Language Models from Images

📝 Summary:
VisPlay is a self-evolving RL framework that improves Vision-Language Models using unlabeled images. It employs interacting Questioner and Reasoner roles, trained with GRPO, to enhance reasoning and generalization and to reduce hallucination. This scalable method achieves consistent improvements.
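VisPlay trains its roles with GRPO (Group Relative Policy Optimization). What sets GRPO apart from PPO is its advantage estimate. Several responses are sampled for the same prompt, and each response's reward is standardized against the group's mean and standard deviation, so no learned value function is needed. The sketch below shows only that step. The reward values are hypothetical, and it uses the population standard deviation, although implementations differ. The clipped policy update, the KL penalty, and VisPlay's actual Questioner and Reasoner rewards are not shown.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantages for a group of responses sampled from one prompt.

    Each reward is standardized against its own group, so a response is judged
    only relative to its siblings. eps avoids division by zero when all rewards tie.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical binary rewards, e.g. whether each sampled answer was judged correct.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # roughly [1, -1, -1, 1]
print(group_relative_advantages([1.0, 1.0, 1.0]))       # [0.0, 0.0, 0.0]
```

A group where every response earns the same reward gives zero advantage, so it contributes no learning signal. This is why group sampling needs some diversity among responses.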

🔹 Publication Date: Published on Nov 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.15661
• PDF: https://arxiv.org/pdf/2511.15661

==================================

#VisionLanguageModels #ReinforcementLearning #ArtificialIntelligence #MachineLearning #SelfEvolvingAI

✨ARC-Chapter: Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries

📝 Summary:
ARC-Chapter is a large-scale video chaptering model trained on millions of long video chapters, using a new bilingual and hierarchical dataset. It introduces a novel evaluation metric, GRACE, to better reflect real-world chaptering. The model achieves state-of-the-art performance and demonstrates...

🔹 Publication Date: Published on Nov 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14349
• PDF: https://arxiv.org/pdf/2511.14349
• Project Page: https://arcchapter.github.io/index_en.html
• Github: https://github.com/TencentARC/ARC-Chapter

==================================

#VideoChaptering #AI #MachineLearning #VideoSummarization #ComputerVision

✨Aligning Generative Music AI with Human Preferences: Methods and Challenges

📝 Summary:
This paper proposes applying preference alignment techniques to music AI to better match human preferences. It discusses methods like MusicRL and DiffRhythm+ to address unique challenges such as temporal coherence and harmonic consistency, aiming for improved interactive composition and personali...

🔹 Publication Date: Published on Nov 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.15038
• PDF: https://arxiv.org/pdf/2511.15038

==================================

#GenerativeAI #MusicAI #PreferenceAlignment #AIResearch #ComputationalMusic

✨Medal S: Spatio-Textual Prompt Model for Medical Segmentation

📝 Summary:
Medal S is a medical segmentation foundation model using spatio-textual prompts for efficient, high-accuracy multi-class segmentation across diverse modalities. It uniquely aligns volumetric prompts with text embeddings and processes masks in parallel, significantly outperforming prior methods.

🔹 Publication Date: Published on Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13001
• PDF: https://arxiv.org/pdf/2511.13001
• Github: https://github.com/yinghemedical/Medal-S

🔹 Models citing this paper:
• https://huggingface.co/spc819/Medal-S-V1.0

==================================

#MedicalSegmentation #FoundationModels #AI #DeepLearning #ComputerVision

✨OmniParser for Pure Vision Based GUI Agent

📝 Summary:
OmniParser enhances GPT-4V's ability to act as a GUI agent by improving screen parsing. It identifies interactable icons and understands element semantics using specialized models. This significantly boosts GPT-4V's performance on benchmarks like ScreenSpot, Mind2Web, and AITW.

🔹 Publication Date: Published on Aug 1, 2024

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2408.00203
• PDF: https://arxiv.org/pdf/2408.00203
• Github: https://github.com/microsoft/omniparser

🔹 Models citing this paper:
• https://huggingface.co/microsoft/OmniParser
• https://huggingface.co/microsoft/OmniParser-v2.0
• https://huggingface.co/banao-tech/OmniParser

✨ Datasets citing this paper:
• https://huggingface.co/datasets/mlfoundations/Click-100k

✨ Spaces citing this paper:
• https://huggingface.co/spaces/callmeumer/OmniParser-v2
• https://huggingface.co/spaces/nofl/OmniParser-v2
• https://huggingface.co/spaces/SheldonLe/OmniParser-v2

==================================

#GUIagents #ComputerVision #GPT4V #AIagents #DeepLearning

✨Mixture of States: Routing Token-Level Dynamics for Multimodal Generation

📝 Summary:
MoS is a novel multimodal diffusion model that uses a learnable token-wise router for flexible, state-based interactions between modalities. This design achieves state-of-the-art text-to-image generation and editing with minimal parameters and computational overhead.
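The summary describes a learnable token-wise router. The sketch below is a generic top-k softmax router, an assumption rather than MoS's published design. For each token, learned scoring vectors rate a set of candidate hidden states, for example states from different layers of another modality's model. The top k states are kept and mixed with softmax weights. Because the routing runs per token, different tokens can draw on different states. All names and shapes here are hypothetical.

```python
import math
from typing import List

Vec = List[float]


def softmax(xs: Vec) -> Vec:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(query: Vec, states: List[Vec], router_w: List[Vec], k: int = 2) -> Vec:
    """Mix the top-k candidate states for one token.

    query: the token's own hidden vector.
    states: candidate hidden states (hypothetical, e.g. one per layer).
    router_w[i]: learned scoring vector for candidate i (hypothetical).
    """
    logits = [sum(q * w for q, w in zip(query, wi)) for wi in router_w]
    top = sorted(range(len(states)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in top])
    dim = len(states[0])
    return [sum(w * states[i][d] for w, i in zip(weights, top)) for d in range(dim)]


# Two tokens with different queries pick different candidate states (k=1).
states = [[1.0, 0.0], [0.0, 1.0]]
router_w = [[1.0, 0.0], [0.0, 1.0]]
for query in ([2.0, 0.5], [0.1, 3.0]):
    print(route_token(query, states, router_w, k=1))  # [1.0, 0.0] then [0.0, 1.0]
```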

🔹 Publication Date: Published on Nov 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.12207
• PDF: https://arxiv.org/pdf/2511.12207

==================================

#GenerativeAI #MultimodalAI #DiffusionModels #TextToImage #DeepLearning
