✨VGGT: Visual Geometry Grounded Transformer
📝 Summary:
VGGT is a novel feed-forward neural network that efficiently infers multiple key 3D scene attributes from single or multiple views. It outperforms existing specialized models without requiring post-processing, achieving state-of-the-art results across several 3D computer vision tasks. VGGT also s...
🔹 Publication Date: Published on Mar 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2503.11651
• PDF: https://arxiv.org/pdf/2503.11651
• Project Page: https://vgg-t.github.io/
• Github: https://github.com/facebookresearch/vggt
🔹 Models citing this paper:
• https://huggingface.co/facebook/VGGT-1B
• https://huggingface.co/facebook/VGGT-1B-Commercial
✨ Spaces citing this paper:
• https://huggingface.co/spaces/facebook/vggt
• https://huggingface.co/spaces/Pointcept/Concerto
• https://huggingface.co/spaces/HanzhouLiu/Stylos_Demo
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#3DComputerVision #Transformers #DeepLearning #ComputerVision #AI
✨Real-Time Reasoning Agents in Evolving Environments
📝 Summary:
AI agents struggle with real-time reasoning in dynamic environments, failing to balance logical judgments with timely responses. This paper introduces Real-Time Reasoning Gym and AgileThinker. AgileThinker combines reactive and planning approaches to effectively balance reasoning depth and respon...
🔹 Publication Date: Published on Nov 7
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.04898
• PDF: https://arxiv.org/pdf/2511.04898
• Project Page: https://realtimegym.saltlab.stanford.edu
• Github: https://github.com/SALT-NLP/RealtimeGym
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #RealTimeAI #AutonomousAgents #DynamicEnvironments #MachineLearning
✨HaluMem: Evaluating Hallucinations in Memory Systems of Agents
📝 Summary:
HaluMem is a new benchmark that evaluates memory hallucinations in AI systems by localizing them to specific stages: extraction, updating, and question answering. It uses large human-AI interaction datasets. Findings show current systems accumulate hallucinations during extraction and updating, w...
🔹 Publication Date: Published on Nov 5
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.03506
• PDF: https://arxiv.org/pdf/2511.03506
• Github: https://github.com/MemTensor/HaluMem
✨ Datasets citing this paper:
• https://huggingface.co/datasets/IAAR-Shanghai/HaluMem
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AIHallucinations #AIAgents #MemorySystems #LLM #AIResearch
✨RedOne 2.0: Rethinking Domain-specific LLM Post-Training in Social Networking Services
📝 Summary:
RedOne 2.0 is an SNS-oriented LLM trained with a progressive, RL-prioritized post-training paradigm for rapid and stable adaptation to social networking challenges. This 4B model significantly improves over a 7B baseline and achieves an 8.74 performance lift from base models with less data, demon...
🔹 Publication Date: Published on Nov 10
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07070
• PDF: https://arxiv.org/pdf/2511.07070
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#LLM #SocialNetworking #ReinforcementLearning #NLP #DeepLearning
✨RLoop: An Self-Improving Framework for Reinforcement Learning with Iterative Policy Initialization
📝 Summary:
RLoop is a self-improving framework addressing Reinforcement Learning overfitting and generalization issues. It uses iterative policy initialization and Rejection-sampling Fine-Tuning to convert diverse policy variations into robust performance gains, boosting accuracy and mitigating catastrophic...
🔹 Publication Date: Published on Nov 6
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.04285
• PDF: https://arxiv.org/pdf/2511.04285
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#ReinforcementLearning #MachineLearning #AI #DeepLearning #Generalization
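The rejection-sampling step mentioned in the RLoop summary can be illustrated with a toy filter. This is only a minimal sketch of generic rejection-sampling fine-tuning data selection, not the paper's pipeline: the `ANSWERS` table, the `exact_match` verifier, and the sample list are hypothetical stand-ins.

```python
# Toy sketch of rejection sampling for fine-tuning data (hypothetical names throughout).

def rejection_sample(samples, is_correct):
    """Keep only the (prompt, response) pairs that the verifier accepts."""
    return [(prompt, response) for prompt, response in samples
            if is_correct(prompt, response)]

# Hypothetical ground-truth table used by the toy verifier.
ANSWERS = {"2+2": "4", "3*3": "9"}

def exact_match(prompt, response):
    """Accept a response only if it matches the stored answer exactly."""
    return ANSWERS.get(prompt) == response.strip()

# Responses that a policy might have produced for each prompt.
samples = [("2+2", "4"), ("2+2", "5"), ("3*3", " 9"), ("3*3", "6")]

# The surviving pairs would form the supervised set for the next round.
sft_set = rejection_sample(samples, exact_match)
print(sft_set)
```

In an iterative setup, the filtered set would be used to fine-tune the starting policy for the next round; how RLoop does this in detail is not stated in the summary.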
📝 Summary:
RLoop is a self-improving framework addressing Reinforcement Learning overfitting and generalization issues. It uses iterative policy initialization and Rejection-sampling Fine-Tuning to convert diverse policy variations into robust performance gains, boosting accuracy and mitigating catastrophic...
🔹 Publication Date: Published on Nov 6
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.04285
• PDF: https://arxiv.org/pdf/2511.04285
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#ReinforcementLearning #MachineLearning #AI #DeepLearning #Generalization
✨Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs
📝 Summary:
MoE LLMs have suboptimal routers that cause significant performance gaps. Routing Manifold Alignment (RoMA) aligns routing weights with task embeddings using a regularization term during lightweight finetuning of routers. This improves generalization by encouraging similar samples to share expert c...
🔹 Publication Date: Published on Nov 10
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07419
• PDF: https://arxiv.org/pdf/2511.07419
• Github: https://github.com/tianyi-lab/RoMA
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#LLMs #MixtureOfExperts #DeepLearning #AI #MachineLearning
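To make the idea of a routing regularizer concrete, here is a minimal sketch of a generic manifold-style penalty: samples whose task embeddings are close are pushed toward similar routing weights. The function names, the cosine-similarity weighting, and the nearest-neighbour choice are illustrative assumptions, not the RoMA formulation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def manifold_penalty(routing, embeddings, k=1):
    """Average, over samples, of similarity-weighted squared distances
    between each sample's routing weights and those of its k nearest
    neighbours in embedding space (assumed form, for illustration)."""
    n = len(routing)
    total = 0.0
    for i in range(n):
        neighbours = sorted(
            ((cosine(embeddings[i], embeddings[j]), j) for j in range(n) if j != i),
            reverse=True,
        )[:k]
        for sim, j in neighbours:
            sq_dist = sum((a - b) ** 2 for a, b in zip(routing[i], routing[j]))
            total += max(sim, 0.0) * sq_dist
    return total / n

# Two clusters of task embeddings (toy data).
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]
misaligned = [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1], [0.1, 0.9]]

print(manifold_penalty(aligned, embeddings))
print(manifold_penalty(misaligned, embeddings))
```

The penalty is zero when neighbouring samples route identically and grows when they disagree, which is the behaviour a term of this kind would add to the finetuning loss.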
✨DIMO: Diverse 3D Motion Generation for Arbitrary Objects
📝 Summary:
DIMO is a generative AI that creates diverse 3D motions for any object from one image. It extracts motion patterns from video models into a latent space, using neural key point trajectories to drive 3D object models. This enables sampling diverse motions and applications like interpolation.
🔹 Publication Date: Published on Nov 10
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07409
• PDF: https://arxiv.org/pdf/2511.07409
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#DIMO #3DMotion #GenerativeAI #ComputerVision #DeepLearning
✨IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction
📝 Summary:
IterResearch improves long-horizon reasoning by reformulating it as a Markov Decision Process with strategic workspace reconstruction. This novel paradigm overcomes context suffocation, achieving substantial performance gains and unprecedented interaction scaling, and also serves as an effective ...
🔹 Publication Date: Published on Nov 10
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07327
• PDF: https://arxiv.org/pdf/2511.07327
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#ReinforcementLearning #AI #MachineLearning #AIagents #MDP
✨MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs
📝 Summary:
MVU-Eval is a new comprehensive benchmark for evaluating Multi-Video Understanding in Multimodal Large Language Models. It addresses a critical gap in existing single-video benchmarks and reveals significant performance limitations in current MLLMs for multi-video scenarios.
🔹 Publication Date: Published on Nov 10
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07250
• PDF: https://arxiv.org/pdf/2511.07250
• Project Page: https://huggingface.co/datasets/MVU-Eval-Team/MVU-Eval-Data
• Github: https://github.com/NJU-LINK/MVU-Eval
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#MLLMs #VideoUnderstanding #AI #Benchmarking #ComputerVision
✨The Station: An Open-World Environment for AI-Driven Discovery
📝 Summary:
The Station is an open-world multi-agent AI environment enabling autonomous scientific discovery. Agents engage in full scientific journeys, achieving state-of-the-art results across diverse benchmarks. This new paradigm fosters emergent behaviors and novel method development, moving beyond rigid...
🔹 Publication Date: Published on Nov 9
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.06309
• PDF: https://arxiv.org/pdf/2511.06309
• Github: https://github.com/dualverse-ai/station
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #MultiAgentSystems #ScientificDiscovery #OpenWorldAI #AutonomousAI
✨Robot Learning from a Physical World Model
📝 Summary:
PhysWorld enables robots to learn accurate manipulation from AI-generated videos by integrating video generation with physical world modeling. This approach grounds visual guidance into physically executable actions, eliminating the need for real robot data.
🔹 Publication Date: Published on Nov 10
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07416
• PDF: https://arxiv.org/pdf/2511.07416
• Project Page: https://pointscoder.github.io/PhysWorld_Web/
• Github: https://github.com/PointsCoder/OpenReal2Sim
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#RobotLearning #Robotics #AI #PhysicalModeling #MachineLearning
✨DigiData: Training and Evaluating General-Purpose Mobile Control Agents
📝 Summary:
DigiData provides a diverse, high-quality dataset for training mobile control agents with complex goals from app feature exploration. DigiData-Bench offers dynamic AI-powered evaluation protocols, improving agent assessment beyond common metrics.
🔹 Publication Date: Published on Nov 10
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07413
• PDF: https://arxiv.org/pdf/2511.07413
• Github: https://facebookresearch.github.io/DigiData
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#MobileAgents #ArtificialIntelligence #MachineLearning #Datasets #AgentTraining
✨SWE-fficiency: Can Language Models Optimize Real-World Repositories on Real Workloads?
📝 Summary:
SWE-fficiency is a new benchmark evaluating how language models optimize real-world software repositories for performance on actual workloads. Agents must identify bottlenecks and generate correct code patches matching expert speedup. Current agents significantly underperform, struggling with loc...
🔹 Publication Date: Published on Nov 8
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.06090
• PDF: https://arxiv.org/pdf/2511.06090
• Project Page: https://swefficiency.com/
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#LLM #SoftwareOptimization #PerformanceTuning #AIagents #Benchmarking
✨LUT-LLM: Efficient Large Language Model Inference with Memory-based Computations on FPGAs
📝 Summary:
LUT-LLM is an FPGA accelerator for LLM inference that leverages on-chip memory to shift computation from arithmetic to memory-based operations via table lookups. This innovative approach achieves 1.66x lower latency than AMD MI210 and 1.72x higher energy efficiency than NVIDIA A100 for a 1.7B LLM.
🔹 Publication Date: Published on Nov 9
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.06174
• PDF: https://arxiv.org/pdf/2511.06174
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#LLM #FPGA #AI #DeepLearning #AIHardware
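The shift from arithmetic to table lookups described in the LUT-LLM summary can be shown with a toy dot product over low-bit values. This is a minimal sketch assuming unsigned 4-bit activations and weights; the table layout and names are illustrative and say nothing about the actual FPGA design.

```python
# Toy sketch: replace multiplications with lookups in a precomputed table.
BITS = 4                    # assumed bit width for both operands
LEVELS = 1 << BITS          # 16 possible values per operand

# TABLE[a][w] holds the product a * w for every 4-bit pair.
TABLE = [[a * w for w in range(LEVELS)] for a in range(LEVELS)]

def lut_dot(acts, weights):
    """Dot product computed only with table lookups and additions."""
    return sum(TABLE[a][w] for a, w in zip(acts, weights))

def arithmetic_dot(acts, weights):
    """Reference dot product using ordinary multiplication."""
    return sum(a * w for a, w in zip(acts, weights))

acts = [3, 15, 0, 7]
weights = [2, 1, 9, 4]
print(lut_dot(acts, weights), arithmetic_dot(acts, weights))
```

With 4-bit operands the table has only 256 entries, which is why low-bit quantization makes this kind of lookup practical in on-chip memory.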
✨DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation
📝 Summary:
This study develops a two-stage reinforcement learning method for competitive code generation. It uses tailored data curation and a hard-focus curriculum, achieving state-of-the-art performance on competitive programming benchmarks.
🔹 Publication Date: Published on Nov 9
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.06307
• PDF: https://arxiv.org/pdf/2511.06307
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#ReinforcementLearning #CodeGeneration #DataCuration #MachineLearning #AIResearch
✨SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via Gumbel-Reparameterized Soft-Thinking Policy Optimization
📝 Summary:
SofT-GRPO is a novel algorithm that enhances soft-thinking in LLMs by integrating Gumbel noise and Gumbel-Softmax. This method successfully reinforces soft-thinking policies, enabling LLMs to outperform discrete-token reinforcement learning approaches, especially on complex tasks.
🔹 Publication Date: Published on Nov 9
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.06411
• PDF: https://arxiv.org/pdf/2511.06411
🔹 Models citing this paper:
• https://huggingface.co/zz1358m/SofT-GRPO-master
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#LLM #ReinforcementLearning #AI #MachineLearning #DeepLearning
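For readers unfamiliar with the Gumbel-Softmax trick named in the SofT-GRPO summary, here is a minimal standard-library sketch of the standard technique: add Gumbel noise to the logits, then apply a temperature-scaled softmax to get a soft (non one-hot) token distribution. The logits and temperature are made-up values, and the sketch does not reproduce the paper's training algorithm.

```python
import math
import random

def sample_gumbel(rng):
    """Draw one Gumbel(0, 1) sample via the inverse-CDF transform."""
    u = max(rng.random(), 1e-12)   # guard against log(0)
    return -math.log(-math.log(u))

def gumbel_softmax(logits, tau, rng):
    """Perturb logits with Gumbel noise and return a softmax at temperature tau."""
    noisy = [(logit + sample_gumbel(rng)) / tau for logit in logits]
    peak = max(noisy)                          # subtract the max for stability
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)
logits = [2.0, 0.5, -1.0]          # hypothetical next-token logits
soft_token = gumbel_softmax(logits, tau=1.0, rng=rng)
print(soft_token)
```

Lower temperatures push the output toward a one-hot vector, while higher temperatures keep it closer to a smooth mixture over tokens.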
✨Diffusion-SDPO: Safeguarded Direct Preference Optimization for Diffusion Models
📝 Summary:
Diffusion-SDPO improves text-to-image quality by fixing a flaw in standard DPO where preferred output error can increase. It uses a safeguarded update to adaptively scale the loser gradient, ensuring the preferred output's error never increases. This leads to consistent quality gains across bench...
🔹 Publication Date: Published on Nov 5
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.03317
• PDF: https://arxiv.org/pdf/2511.03317
• Github: https://github.com/AIDC-AI/Diffusion-SDPO
🔹 Models citing this paper:
• https://huggingface.co/AIDC-AI/Diffusion-SDPO
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#DiffusionModels #DPO #TextToImage #GenerativeAI #AI
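One way to picture a safeguarded update like the one in the Diffusion-SDPO summary is a first-order check on the preferred output's error. The sketch below assumes the update descends the winner's error gradient and ascends the loser's, and caps the loser scale so the winner's error cannot rise to first order. The scaling rule and all names are illustrative assumptions, not the formula from the paper.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def safeguarded_direction(g_win, g_lose):
    """Return (lam, direction) with direction = -(g_win - lam * g_lose).

    lam is capped so that dot(g_win, direction) <= 0, i.e. the winner's
    error does not increase to first order (assumed rule, for illustration).
    """
    overlap = dot(g_win, g_lose)
    win_sq = dot(g_win, g_win)
    lam = 1.0
    if overlap > 0.0:
        lam = min(1.0, win_sq / overlap)
    direction = [-(gw - lam * gl) for gw, gl in zip(g_win, g_lose)]
    return lam, direction

g_win = [1.0, 0.0]     # hypothetical gradient of the preferred output's error
g_lose = [3.0, 1.0]    # hypothetical gradient of the rejected output's error

lam, direction = safeguarded_direction(g_win, g_lose)
unscaled = [-(gw - gl) for gw, gl in zip(g_win, g_lose)]
print(lam, dot(g_win, direction), dot(g_win, unscaled))
```

In this toy case the unscaled step would raise the winner's error (positive inner product), while the capped step keeps the first-order change at or below zero.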
✨VADER: Towards Causal Video Anomaly Understanding with Relation-Aware Large Language Models
📝 Summary:
VADER is an LLM framework enhancing video anomaly understanding. It integrates keyframe object relations and visual cues to provide detailed, causally grounded descriptions and robust question answering, advancing explainable anomaly analysis.
🔹 Publication Date: Published on Nov 10
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07299
• PDF: https://arxiv.org/pdf/2511.07299
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#LLM #VideoAnalytics #AnomalyDetection #Causality #ExplainableAI
✨MPJudge: Towards Perceptual Assessment of Music-Induced Paintings
📝 Summary:
MPJudge is a new framework for assessing music-induced paintings. It integrates music features into a visual encoder using a modulation-based fusion mechanism, outperforming existing emotion models by directly modeling perceptual coherence. It also identifies music-relevant regions better.
🔹 Publication Date: Published on Nov 10
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07137
• PDF: https://arxiv.org/pdf/2511.07137
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#MusicAndArt #ComputerVision #MachineLearning #DeepLearning #MultimodalAI
✨Do LLMs Feel? Teaching Emotion Recognition with Prompts, Retrieval, and Curriculum Learning
📝 Summary:
PRC-Emo is a new framework that significantly improves LLMs' emotion recognition in conversations. It combines prompt engineering, demonstration retrieval, and curriculum learning, achieving state-of-the-art results on benchmark datasets.
🔹 Publication Date: Published on Nov 10
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07061
• PDF: https://arxiv.org/pdf/2511.07061
• Github: https://github.com/LiXinran6/PRC-Emo
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#LLM #EmotionRecognition #NLP #AIResearch #MachineLearning