✨SafeGRPO: Self-Rewarded Multimodal Safety Alignment via Rule-Governed Policy Optimization
📝 Summary:
SafeGRPO introduces a self-rewarded, rule-governed framework for multimodal safety alignment in MLLMs. It integrates verifiable reward construction and step-guided safety thinking to improve robustness against compositional risks and enhance reasoning stability.
🔹 Publication Date: Published on Nov 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.12982
• PDF: https://arxiv.org/pdf/2511.12982
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#MLLMs #AISafety #MultimodalAI #ReinforcementLearning #AIResearch
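💻 A minimal sketch of what a rule-governed, verifiable reward could look like, assuming a refusal-style safety rule plus a bonus for explicit step-guided thinking; the rules, weights, and tags below are illustrative, not the paper's actual specification:

```python
# Hypothetical rule-governed safety reward (illustrative, not from the paper).
def safety_reward(response: str, is_harmful_query: bool) -> float:
    refused = any(p in response.lower() for p in ("i can't help", "i cannot assist"))
    has_thinking = "<think>" in response and "</think>" in response
    reward = 1.0 if refused == is_harmful_query else -1.0  # refuse iff harmful
    reward += 0.5 if has_thinking else 0.0  # bonus for explicit safety thinking
    return reward

print(safety_reward("<think>risky</think> I cannot assist with that.", True))  # 1.5
```
Because every term is checkable from the output itself, a reward like this needs no learned reward model, which is what makes it verifiable.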
✨Error-Driven Scene Editing for 3D Grounding in Large Language Models
📝 Summary:
DEER-3D improves 3D LLM grounding by iteratively editing and retraining models. It diagnoses predicate-level errors, then generates targeted 3D scene edits as counterfactuals to enhance spatial understanding and accuracy.
🔹 Publication Date: Published on Nov 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14086
• PDF: https://arxiv.org/pdf/2511.14086
• Github: https://github.com/zhangyuejoslin/Deer-3D
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#LLMs #3DGrounding #ComputerVision #DeepLearning #AIResearch
✨ATLAS: A High-Difficulty, Multidisciplinary Benchmark for Frontier Scientific Reasoning
📝 Summary:
ATLAS is a new high-difficulty, multidisciplinary benchmark for LLMs, featuring 800 original problems across seven scientific fields. It addresses the limitations of current benchmarks with complex, open-ended answers, aims to differentiate advanced scientific reasoning, and is positioned as a ruler for AGI progress.
🔹 Publication Date: Published on Nov 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14366
• PDF: https://arxiv.org/pdf/2511.14366
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#LLM #AGI #AIResearch #ScientificReasoning #Benchmark
✨Orion: A Unified Visual Agent for Multimodal Perception, Advanced Visual Reasoning and Execution
📝 Summary:
Orion is a visual agent framework that orchestrates specialized computer vision tools to execute complex visual workflows. It achieves competitive performance on benchmarks and enables autonomous, tool-driven visual reasoning.
🔹 Publication Date: Published on Nov 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14210
• PDF: https://arxiv.org/pdf/2511.14210
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#ComputerVision #AIagents #VisualReasoning #MultimodalAI #DeepLearning
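💻 A toy sketch of the tool-orchestration pattern the summary describes: a plan (which a model would normally produce) is executed step by step against specialized vision tools, and the observations feed the final answer. Tool names and outputs here are made up for illustration:

```python
from typing import Callable, Dict, List

# Hypothetical registry of specialized CV tools (stubs for illustration).
TOOLS: Dict[str, Callable[[str], str]] = {
    "detect": lambda img: "found: 2 cars, 1 person",
    "ocr": lambda img: "text: 'EXIT'",
}

def run_agent(image: str, plan: List[str]) -> List[str]:
    observations = []
    for tool_name in plan:  # the plan would come from the agent's planner
        observations.append(TOOLS[tool_name](image))
    return observations  # fed back to the model for final reasoning

print(run_agent("street.jpg", ["detect", "ocr"]))
```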
✨A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space
📝 Summary:
CoTyle introduces code-to-style image generation, creating consistent visual styles from numerical codes. It is the first open-source academic method for this task, using a discrete style codebook and a text-to-image diffusion model for diverse, reproducible styles.
🔹 Publication Date: Published on Nov 13
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.10555
• PDF: https://arxiv.org/pdf/2511.10555
• Project Page: https://Kwai-Kolors.github.io/CoTyle/
• Github: https://github.com/Kwai-Kolors/CoTyle
✨ Spaces citing this paper:
• https://huggingface.co/spaces/Kwai-Kolors/CoTyle
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#ImageGeneration #DiffusionModels #NeuralStyle #ComputerVision #DeepLearning
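💻 The core idea, that a single numerical code indexes a discrete style space, can be sketched as a learned codebook whose embedding conditions a text-to-image diffusion model. Codebook size and dimensions below are assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class StyleCodebook(nn.Module):
    """Hypothetical discrete style space: one integer code -> one style vector."""
    def __init__(self, num_codes: int = 1024, dim: int = 768):
        super().__init__()
        self.embed = nn.Embedding(num_codes, dim)

    def forward(self, style_code: torch.Tensor) -> torch.Tensor:
        return self.embed(style_code)

codebook = StyleCodebook()
style = codebook(torch.tensor([42]))  # "a style is worth one code"
# `style` would be injected as conditioning alongside the text prompt.
print(style.shape)  # torch.Size([1, 768])
```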
✨MVI-Bench: A Comprehensive Benchmark for Evaluating Robustness to Misleading Visual Inputs in LVLMs
📝 Summary:
MVI-Bench introduces a new benchmark for evaluating the robustness of Large Vision-Language Models against misleading visual inputs. It uses a hierarchical taxonomy and a novel metric to uncover significant vulnerabilities in state-of-the-art LVLMs.
🔹 Publication Date: Published on Nov 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14159
• PDF: https://arxiv.org/pdf/2511.14159
• Github: https://github.com/chenyil6/MVI-Bench
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#LVLMs #ComputerVision #AIrobustness #MachineLearning #AI
✨REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding
📝 Summary:
Text-only self-reflection is insufficient for long-form video understanding. REVISOR is a new framework that enables MLLMs to perform multimodal introspective reflection across both text and visual modalities. This significantly enhances reasoning over long videos without extra fine-tuning, achieving strong results.
🔹 Publication Date: Published on Nov 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13026
• PDF: https://arxiv.org/pdf/2511.13026
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#MultimodalAI #VideoUnderstanding #MLLMs #AIResearch #ComputerVision
✨Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning
📝 Summary:
This paper clarifies RL for LLM agents by extending the MDP framework. It introduces Agent-R1, a modular and flexible training framework, and demonstrates its effectiveness on multi-hop QA tasks.
🔹 Publication Date: Published on Nov 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14460
• PDF: https://arxiv.org/pdf/2511.14460
• Github: https://github.com/0russwest0/Agent-R1
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#LLMAgents #ReinforcementLearning #AI #DeepLearning #NLP
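💻 The MDP framing can be made concrete with a toy rollout loop: the state is the growing interaction history, actions are either tool calls or final answers, and tool results come back as observations. Everything here is illustrative, not Agent-R1's API:

```python
class ToyEnv:
    """Hypothetical environment: tool calls return observations, answers end the episode."""
    def reset(self):
        return "Q: capital of France?"
    def step(self, action):
        done = action.startswith("ANSWER")
        reward = 1.0 if action == "ANSWER Paris" else 0.0
        next_state = action if done else action + " -> obs: Paris"
        return next_state, reward, done

def policy(state):  # stand-in for the LLM policy
    return "ANSWER Paris" if "Paris" in state else "TOOL search('capital of France')"

env, trajectory = ToyEnv(), []
state = env.reset()
for _ in range(5):  # multi-turn rollout; trajectories drive end-to-end RL updates
    action = policy(state)
    state, reward, done = env.step(action)
    trajectory.append((action, reward))
    if done:
        break
print(trajectory)  # [('TOOL ...', 0.0), ('ANSWER Paris', 1.0)]
```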
✨Can World Simulators Reason? Gen-ViRe: A Generative Visual Reasoning Benchmark
📝 Summary:
Current video-model benchmarks fail to assess Chain-of-Frames (CoF) reasoning, which is crucial for world simulators. Gen-ViRe is a new benchmark that decomposes CoF reasoning into cognitive subtasks, offering the first quantitative assessment. It reveals poor reasoning depth despite impressive visual quality.
🔹 Publication Date: Published on Nov 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13853
• PDF: https://arxiv.org/pdf/2511.13853
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #WorldSimulators #VisualReasoning #GenerativeAI #Benchmarks
✨Agent READMEs: An Empirical Study of Context Files for Agentic Coding
📝 Summary:
This study analyzed 2303 agent context files, finding them complex and evolving like config code. Developers prioritize functional details but rarely specify non-functional requirements like security or performance. This suggests a gap in guardrails for agent-written code quality.
🔹 Publication Date: Published on Nov 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.12884
• PDF: https://arxiv.org/pdf/2511.12884
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AIAgents #SoftwareEngineering #CodeQuality #LLMs #AIResearch
✨UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE
📝 Summary:
UniMoE-Audio unifies speech and music generation with a novel Dynamic-Capacity Mixture-of-Experts framework. It addresses data imbalance and task conflicts through a hybrid expert design and a three-stage training scheme, achieving state-of-the-art performance and synergistic cross-domain learning.
🔹 Publication Date: Published on Oct 15
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.13344
• PDF: https://arxiv.org/pdf/2510.13344
• Project Page: https://mukioxun.github.io/Uni-MoE-site/home.html
• Github: https://github.com/HITsz-TMG/Uni-MoE/blob/master/UniMoE-Audio
🔹 Models citing this paper:
• https://huggingface.co/HIT-TMG/UniMoE-Audio-Preview
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#SpeechGeneration #MusicGeneration #MixtureOfExperts #GenerativeAI #DeepLearning
✨OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models
📝 Summary:
OmniZip is a training-free framework that addresses the computational bottleneck in omnimodal LLMs by dynamically compressing audio-visual tokens. It uses audio retention scores to guide video token pruning, achieving 3.42X inference speedup and 1.4X memory reduction without performance loss.
🔹 Publication Date: Published on Nov 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14582
• PDF: https://arxiv.org/pdf/2511.14582
• Github: https://github.com/KD-TAO/OmniZip
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#OmnimodalLLM #TokenCompression #LLMs #AI #ModelEfficiency
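💻 The audio-guided pruning idea can be sketched as: per video window, keep a token budget that grows with an audio retention score, dropping the rest by a saliency proxy. Shapes, the base keep ratio, and the norm-based saliency are assumptions for illustration:

```python
import torch

def prune_video_tokens(video_tokens, audio_scores, base_keep=0.25):
    """Hypothetical audio-guided pruning: video_tokens (T, N, D), audio_scores (T,) in [0, 1]."""
    kept = []
    for t, window in enumerate(video_tokens):
        k = max(1, int(window.shape[0] * base_keep * (1 + audio_scores[t])))
        saliency = window.norm(dim=-1)  # crude per-token importance proxy
        idx = saliency.topk(min(k, window.shape[0])).indices
        kept.append(window[idx])
    return kept  # audio-salient windows keep more tokens

tokens = torch.randn(4, 16, 8)
print([w.shape[0] for w in prune_video_tokens(tokens, torch.tensor([0.1, 0.9, 0.5, 0.0]))])
# [4, 7, 6, 4]: the high-audio-score window retains the most tokens
```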
✨Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models
📝 Summary:
Think-at-Hard (TaH) improves LLM reasoning by dynamically refining only hard tokens. It uses a neural decider to identify them and LoRA for focused refinement, boosting performance with minimal overhead.
🔹 Publication Date: Published on Nov 11
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.08577
• PDF: https://arxiv.org/pdf/2511.08577
• Github: https://github.com/thu-nics/TaH
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#LLM #AI #MachineLearning #NaturalLanguageProcessing #Reasoning
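💻 The selective-refinement idea can be sketched with an entropy-based hardness decider: only tokens the model is uncertain about get a second, LoRA-augmented pass. The paper uses a learned neural decider; the entropy threshold here is a simplified stand-in:

```python
import torch

def hard_token_mask(logits: torch.Tensor, threshold: float = 2.0) -> torch.Tensor:
    """Flag 'hard' positions by predictive entropy (simplified stand-in for TaH's decider)."""
    probs = logits.softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(dim=-1)  # (batch, seq)
    return entropy > threshold  # True -> re-decode via the refined latent iteration

logits = torch.randn(1, 6, 32000)
print(hard_token_mask(logits))  # easy tokens keep their first-pass output
```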
✨Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
📝 Summary:
Uni-MoE introduces a sparse multimodal Mixture-of-Experts LLM that efficiently handles diverse data types. It uses modality-specific encoders and a progressive training strategy, reducing performance bias and improving collaboration across modalities.
🔹 Publication Date: Published on May 18, 2024
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2405.11273
• PDF: https://arxiv.org/pdf/2405.11273
• Github: https://github.com/hitsz-tmg/umoe-scaling-unified-multimodal-llms
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#MultimodalAI #LLMs #MixtureOfExperts #DeepLearning #AIResearch
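💻 The sparse routing at the heart of any MoE layer, where each token is dispatched to its top-k experts, can be sketched as below; the sizes and linear experts are assumptions, and modality-specific encoders would feed into such layers:

```python
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    """Minimal top-k Mixture-of-Experts layer (illustrative sizes)."""
    def __init__(self, dim=64, n_experts=8, k=2):
        super().__init__()
        self.gate = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.k = k

    def forward(self, x):  # x: (tokens, dim)
        weights, idx = self.gate(x).softmax(-1).topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        for j in range(self.k):  # route each token to its j-th chosen expert
            for e, expert in enumerate(self.experts):
                sel = idx[:, j] == e
                if sel.any():
                    out[sel] += weights[sel, j, None] * expert(x[sel])
        return out

print(SparseMoE()(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```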
✨AraLingBench: A Human-Annotated Benchmark for Evaluating Arabic Linguistic Capabilities of Large Language Models
📝 Summary:
AraLingBench is a human-annotated benchmark evaluating Arabic LLM linguistic competence using expert-designed questions. It reveals models achieve surface proficiency but lack deep understanding, often relying on memorization rather than true comprehension.
🔹 Publication Date: Published on Nov 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14295
• PDF: https://arxiv.org/pdf/2511.14295
✨ Datasets citing this paper:
• https://huggingface.co/datasets/hammh0a/AraLingBench
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#ArabicNLP #LLMEvaluation #AIResearch #LanguageModels #NLPBenchmarking
✨Mitigating Label Length Bias in Large Language Models
📝 Summary:
Large Language Models exhibit a label length bias with multi-token class labels. This paper introduces Normalized Contextual Calibration (NCC) to mitigate the issue by normalizing and calibrating predictions at the full-label level. NCC significantly improves performance and reliability across diverse tasks.
🔹 Publication Date: Published on Nov 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14385
• PDF: https://arxiv.org/pdf/2511.14385
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#LLM #AI #NLP #BiasInAI #MachineLearning
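💻 The intuition, that longer labels are unfairly penalized by their joint token probability, can be sketched with length normalization plus calibration against a content-free prompt. The exact NCC formulation likely differs; the numbers below are made up:

```python
def label_score(token_logprobs):
    """Length-normalized log-probability of a full multi-token label."""
    return sum(token_logprobs) / len(token_logprobs)

def calibrated(label_lp, prior_lp):
    """Subtract the label's score under a content-free prompt (e.g. 'N/A')."""
    return label_score(label_lp) - label_score(prior_lp)

# Hypothetical per-token log-probs for a 1-token vs a 2-token label:
scores = {
    "positive": calibrated([-0.9], [-1.2]),
    "not positive": calibrated([-1.1, -0.4], [-1.0, -0.9]),
}
print(max(scores, key=scores.get))  # comparison is now fair across label lengths
```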
✨Φeat: Physically-Grounded Feature Representation
📝 Summary:
Φeat is a new self-supervised visual backbone that captures material identity, such as reflectance and mesostructure. It learns robust features that are invariant to external physical factors such as shape and lighting, promoting physics-aware perception.
🔹 Publication Date: Published on Nov 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11270
• PDF: https://arxiv.org/pdf/2511.11270
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#ComputerVision #SelfSupervisedLearning #DeepLearning #FeatureLearning #PhysicsAwareAI
✨Large Language Models Meet Extreme Multi-label Classification: Scaling and Multi-modal Framework
📝 Summary:
This paper improves Extreme Multi-label Classification (XMC) by using larger decoder-only models and introduces ViXML, a vision-enhanced framework. ViXML efficiently integrates visual information, significantly outperforming text-only models and achieving a new state of the art.
🔹 Publication Date: Published on Nov 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13189
• PDF: https://arxiv.org/pdf/2511.13189
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#LLM #XMC #MultiModalAI #MachineLearning #AIResearch
✨A Brain Wave Encodes a Thousand Tokens: Modeling Inter-Cortical Neural Interactions for Effective EEG-based Emotion Recognition
📝 Summary:
RBTransformer, a Transformer-based model, improves EEG-based emotion recognition by modeling inter-cortical neural dynamics. It uses Band Differential Entropy tokens and multi-head attention, significantly outperforming existing state-of-the-art methods across multiple datasets and dimensions.
🔹 Publication Date: Published on Nov 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13954
• PDF: https://arxiv.org/pdf/2511.13954
• Github: https://github.com/nnilayy/RBTransformer
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#EEG #EmotionRecognition #Transformers #Neuroscience #MachineLearning
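💻 Band Differential Entropy (BDE) features themselves are standard in EEG work: band-pass each channel, then take the differential entropy of the (assumed Gaussian) band signal, 0.5·ln(2πe·σ²). A sketch, with conventional band edges as assumptions:

```python
import numpy as np
from scipy.signal import butter, filtfilt

BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 45)}

def bde_tokens(eeg: np.ndarray, fs: float = 128.0) -> np.ndarray:
    """eeg: (channels, samples) -> BDE features (channels, n_bands)."""
    feats = []
    for lo, hi in BANDS.values():
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        banded = filtfilt(b, a, eeg, axis=-1)  # band-limited signal per channel
        feats.append(0.5 * np.log(2 * np.pi * np.e * banded.var(axis=-1)))
    return np.stack(feats, axis=-1)  # these become the model's input tokens

print(bde_tokens(np.random.randn(32, 1024)).shape)  # (32, 4)
```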
✨Proactive Hearing Assistants that Isolate Egocentric Conversations
📝 Summary:
A proactive hearing assistant system automatically identifies and isolates the wearer's conversation partners from binaural audio. It uses a dual-model AI architecture that adapts to conversational dynamics in real time, improving speech clarity without user prompts.
🔹 Publication Date: Published on Nov 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11473
• PDF: https://arxiv.org/pdf/2511.11473
• Project Page: https://proactivehearing.cs.washington.edu/
• Github: https://github.com/guilinhu/proactive_hearing_assistant
🔹 Models citing this paper:
• https://huggingface.co/guilinhu/proactive_hearing
✨ Datasets citing this paper:
• https://huggingface.co/datasets/guilinhu/libri_conversation
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#HearingTech #AI #SpeechEnhancement #AssistiveTechnology #AudioProcessing
✨NORA-1.5: A Vision-Language-Action Model Trained using World Model- and Action-based Preference Rewards
📝 Summary:
NORA-1.5, an enhanced vision-language-action model with a flow-matching-based action expert and reward-driven post-training, improves performance and reliability in both simulated and real-world settings.
🔹 Publication Date: Published on Nov 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14659
• PDF: https://arxiv.org/pdf/2511.14659
• Project Page: https://declare-lab.github.io/nora-1.5
• Github: https://github.com/declare-lab/nora-1.5
🔹 Models citing this paper:
• https://huggingface.co/declare-lab/nora-1.5
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research