✨Can World Simulators Reason? Gen-ViRe: A Generative Visual Reasoning Benchmark
📝 Summary:
Current video model benchmarks do not assess Chain-of-Frames (CoF) reasoning, which is crucial for world simulators. Gen-ViRe is a new benchmark that decomposes CoF reasoning into cognitive subtasks, offering the first quantitative assessment. It reveals poor reasoning depth despite impressive visual quali...
🔹 Publication Date: Published on Nov 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13853
• PDF: https://arxiv.org/pdf/2511.13853
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #WorldSimulators #VisualReasoning #GenerativeAI #Benchmarks
✨Agent READMEs: An Empirical Study of Context Files for Agentic Coding
📝 Summary:
This study analyzed 2303 agent context files, finding them complex and evolving like config code. Developers prioritize functional details but rarely specify non-functional requirements like security or performance. This suggests a gap in guardrails for agent-written code quality.
🔹 Publication Date: Published on Nov 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.12884
• PDF: https://arxiv.org/pdf/2511.12884
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AIAgents #SoftwareEngineering #CodeQuality #LLMs #AIResearch
✨UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE
📝 Summary:
UniMoE-Audio unifies speech and music generation using a novel Dynamic-Capacity Mixture-of-Experts framework. It addresses data imbalance and task conflicts through a hybrid expert design and three-stage training, achieving state-of-the-art performance and synergistic cross-domain learning.
🔹 Publication Date: Published on Oct 15
🔹 Paper Links:
• arXiv Page: https://arxivexplained.com/papers/unimoe-audio-unified-speech-and-music-generation-with-dynamic-capacity-moe
• PDF: https://arxiv.org/pdf/2510.13344
• Project Page: https://mukioxun.github.io/Uni-MoE-site/home.html
• Github: https://github.com/HITsz-TMG/Uni-MoE/blob/master/UniMoE-Audio
🔹 Models citing this paper:
• https://huggingface.co/HIT-TMG/UniMoE-Audio-Preview
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#SpeechGeneration #MusicGeneration #MixtureOfExperts #GenerativeAI #DeepLearning
✨OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models
📝 Summary:
OmniZip is a training-free framework that addresses the computational bottleneck in omnimodal LLMs by dynamically compressing audio-visual tokens. It uses audio retention scores to guide video token pruning, achieving 3.42X inference speedup and 1.4X memory reduction without performance loss.
🔹 Publication Date: Published on Nov 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14582
• PDF: https://arxiv.org/pdf/2511.14582
• Github: https://github.com/KD-TAO/OmniZip
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#OmnimodalLLM #TokenCompression #LLMs #AI #ModelEfficiency
✨Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models
📝 Summary:
Think-at-Hard (TaH) improves LLM reasoning by dynamically refining only hard tokens. It uses a neural decider to identify them and LoRA for focused refinement, boosting performance with minimal overhead.
🔹 Publication Date: Published on Nov 11
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.08577
• PDF: https://arxiv.org/pdf/2511.08577
• Github: https://github.com/thu-nics/TaH
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#LLM #AI #MachineLearning #NaturalLanguageProcessing #Reasoning
✨Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
📝 Summary:
Uni-MoE introduces a sparse Multimodal Mixture of Experts LLM that efficiently handles diverse data types. It uses modality-specific encoders and a progressive training strategy, reducing performance bias and improving collaboration across modalities.
🔹 Publication Date: Published on May 18, 2024
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2405.11273
• PDF: https://arxiv.org/pdf/2405.11273
• Github: https://github.com/hitsz-tmg/umoe-scaling-unified-multimodal-llms
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#MultimodalAI #LLMs #MixtureOfExperts #DeepLearning #AIResearch
✨AraLingBench A Human-Annotated Benchmark for Evaluating Arabic Linguistic Capabilities of Large Language Models
📝 Summary:
AraLingBench is a human-annotated benchmark that evaluates the Arabic linguistic competence of LLMs using expert-designed questions. It reveals that models achieve surface proficiency but lack deep understanding, often relying on memorization rather than true comprehension.
🔹 Publication Date: Published on Nov 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14295
• PDF: https://arxiv.org/pdf/2511.14295
✨ Datasets citing this paper:
• https://huggingface.co/datasets/hammh0a/AraLingBench
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#ArabicNLP #LLMEvaluation #AIResearch #LanguageModels #NLPBenchmarking
✨Mitigating Label Length Bias in Large Language Models
📝 Summary:
Large Language Models exhibit a label length bias with multi-token class labels. This paper introduces Normalized Contextual Calibration (NCC) to mitigate this issue by normalizing and calibrating predictions at the full-label level. NCC significantly improves performance and reliability across div...
🔹 Publication Date: Published on Nov 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14385
• PDF: https://arxiv.org/pdf/2511.14385
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#LLM #AI #NLP #BiasInAI #MachineLearning
✨Φeat: Physically-Grounded Feature Representation
📝 Summary:
Φeat is a new self-supervised visual backbone that captures material identity like reflectance and mesostructure. It learns robust features invariant to external physical factors such as shape and lighting, promoting physics-aware perception.
🔹 Publication Date: Published on Nov 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11270
• PDF: https://arxiv.org/pdf/2511.11270
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#ComputerVision #SelfSupervisedLearning #DeepLearning #FeatureLearning #PhysicsAwareAI
✨Large Language Models Meet Extreme Multi-label Classification: Scaling and Multi-modal Framework
📝 Summary:
This paper improves Extreme Multi-label Classification (XMC) by using larger decoder-only models and introduces ViXML, a vision-enhanced framework. ViXML efficiently integrates visual information, significantly outperforming text-only models and achieving new state-of-the-art results.
🔹 Publication Date: Published on Nov 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13189
• PDF: https://arxiv.org/pdf/2511.13189
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#LLM #XMC #MultiModalAI #MachineLearning #AIResearch
✨A Brain Wave Encodes a Thousand Tokens: Modeling Inter-Cortical Neural Interactions for Effective EEG-based Emotion Recognition
📝 Summary:
RBTransformer, a Transformer-based model, improves EEG-based emotion recognition by modeling inter-cortical neural dynamics. It uses Band Differential Entropy tokens and multi-head attention. This approach significantly outperforms existing state-of-the-art methods on multiple datasets and dimens...
🔹 Publication Date: Published on Nov 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13954
• PDF: https://arxiv.org/pdf/2511.13954
• Github: https://github.com/nnilayy/RBTransformer
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#EEG #EmotionRecognition #Transformers #Neuroscience #MachineLearning
✨Proactive Hearing Assistants that Isolate Egocentric Conversations
📝 Summary:
A proactive hearing assistant system automatically identifies and isolates the wearer's conversation partners from binaural audio. It uses a dual-model AI architecture that adapts to conversational dynamics in real time, improving speech clarity without user prompts.
🔹 Publication Date: Published on Nov 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11473
• PDF: https://arxiv.org/pdf/2511.11473
• Project Page: https://proactivehearing.cs.washington.edu/
• Github: https://github.com/guilinhu/proactive_hearing_assistant
🔹 Models citing this paper:
• https://huggingface.co/guilinhu/proactive_hearing
✨ Datasets citing this paper:
• https://huggingface.co/datasets/guilinhu/libri_conversation
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#HearingTech #AI #SpeechEnhancement #AssistiveTechnology #AudioProcessing
✨NORA-1.5: A Vision-Language-Action Model Trained using World Model- and Action-based Preference Rewards
📝 Summary:
NORA-1.5, an enhanced vision-language-action model with a flow-matching-based action expert and reward-driven post-training, improves performance and reliability in both simulated and real-world setti...
🔹 Publication Date: Published on Nov 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14659
• PDF: https://arxiv.org/pdf/2511.14659
• Project Page: https://declare-lab.github.io/nora-1.5
• Github: https://github.com/declare-lab/nora-1.5
🔹 Models citing this paper:
• https://huggingface.co/declare-lab/nora-1.5
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨TopoPerception: A Shortcut-Free Evaluation of Global Visual Perception in Large Vision-Language Models
📝 Summary:
Large Vision-Language Models (LVLMs) typically align visual features from an encoder with a pre-trained Large Language Model (LLM). However, this makes the visual perception module a bottleneck, which...
🔹 Publication Date: Published on Nov 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11831
• PDF: https://arxiv.org/pdf/2511.11831
• Github: https://github.com/Wenhao-Zhou/TopoPerception
✨ Datasets citing this paper:
• https://huggingface.co/datasets/Wenhao-Zhou/TopoPerception
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨LLM-Powered Fully Automated Chaos Engineering: Towards Enabling Anyone to Build Resilient Software Systems at Low Cost
📝 Summary:
Manual planning and improvement hinder Chaos Engineering adoption. ChaosEater automates the entire Chaos Engineering cycle for Kubernetes using LLMs, handling tasks from requirements to debugging. This enables anyone to build resilient systems quickly and affordably.
🔹 Publication Date: Published on Nov 11
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07865
• PDF: https://arxiv.org/pdf/2511.07865
• Project Page: https://ntt-dkiku.github.io/chaos-eater/
• Github: https://github.com/ntt-dkiku/chaos-eater
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#ChaosEngineering #LLM #CloudNative #SoftwareResilience #DevOps
✨VIDEOP2R: Video Understanding from Perception to Reasoning
📝 Summary:
VideoP2R is a novel reinforcement fine-tuning framework for video understanding. It separately models perception and reasoning processes, using a new CoT dataset and a process-aware RL algorithm. This approach achieves state-of-the-art results on video reasoning benchmarks.
🔹 Publication Date: Published on Nov 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11113v1
• PDF: https://arxiv.org/pdf/2511.11113
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#VideoUnderstanding #ReinforcementLearning #AIResearch #ComputerVision #Reasoning
✨Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks
📝 Summary:
VR-Bench evaluates video models' spatial reasoning using maze-solving tasks. It demonstrates that video models excel in spatial perception and reasoning, outperforming VLMs, and benefit from diverse sampling during inference. These findings show the strong potential of reasoning via video for spa...
🔹 Publication Date: Published on Nov 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.15065
• PDF: https://arxiv.org/pdf/2511.15065
• Project Page: https://imyangc7.github.io/VRBench_Web/
• Github: https://github.com/ImYangC7/VR-Bench
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#VideoModels #AIReasoning #SpatialAI #ComputerVision #MachineLearning
✨FreeAskWorld: An Interactive and Closed-Loop Simulator for Human-Centric Embodied AI
📝 Summary:
FreeAskWorld is an interactive simulator using LLMs for human-centric embodied AI with complex social behaviors. It offers a large dataset, improving agent semantic understanding and interaction competency, highlighting interaction as a key information modality.
🔹 Publication Date: Published on Nov 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13524
• PDF: https://arxiv.org/pdf/2511.13524
• Github: https://github.com/AIR-DISCOVER/FreeAskWorld
✨ Datasets citing this paper:
• https://huggingface.co/datasets/Astronaut-PENG/FreeAskWorld
• https://huggingface.co/datasets/Astronaut-PENG/FreeWorld
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#EmbodiedAI #LLMs #AISimulation #HumanAI #AIResearch
✨MHR: Momentum Human Rig
📝 Summary:
MHR combines ATLAS's decoupled skeleton and shape with a modern rig and Momentum-inspired pose correction. This parametric human body model provides expressive, anatomically plausible human animation with non-linear correctives for AR/VR and graphics applications.
🔹 Publication Date: Published on Nov 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.15586
• PDF: https://arxiv.org/pdf/2511.15586
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#ComputerGraphics #3DAnimation #ARVR #HumanModeling #AnimationTech
✨Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation
📝 Summary:
Kandinsky 5.0 is a family of state-of-the-art foundation models for high-resolution image and video generation. It includes Lite and Pro versions with varying parameters and uses advanced training techniques for superior quality and speed. This publicly available framework aims to advance generat...
🔹 Publication Date: Published on Nov 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14993
• PDF: https://arxiv.org/pdf/2511.14993
• Project Page: https://kandinskylab.ai/
• Github: https://github.com/kandinskylab/kandinsky-5
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#FoundationModels #ImageGeneration #VideoGeneration #AI #DeepLearning
✨Instruction-Guided Lesion Segmentation for Chest X-rays with Automatically Generated Large-Scale Dataset
📝 Summary:
Researchers introduce Instruction-Guided Lesion Segmentation (ILS) for chest X-rays (CXRs), allowing diverse lesion segmentation using simple instructions. They developed MIMIC-ILS, a large-scale dataset, and ROSALIA, a vision-language model. ROSALIA accurately segments various lesions and provides textual explan...
🔹 Publication Date: Published on Nov 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.15186
• PDF: https://arxiv.org/pdf/2511.15186
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#MedicalAI #LesionSegmentation #ChestXray #VisionLanguageModel #DeepLearning