✨CritiCal: Can Critique Help LLM Uncertainty or Confidence Calibration?
📝 Summary:
CritiCal, a novel training method using natural language critiques, significantly improves LLM confidence calibration. It outperforms other calibration approaches as well as GPT-4o, improving reliability and generalization across tasks.
🔹 Publication Date: Published on Oct 28
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.24505
• PDF: https://arxiv.org/pdf/2510.24505
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#LLM #ConfidenceCalibration #MachineLearning #NLP #AIResearch
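For readers new to the term, confidence calibration is commonly measured with Expected Calibration Error (ECE). The sketch below is a generic, standard-library illustration of that metric, not the evaluation code or the training method from the CritiCal paper; the function name and the 10-bin default are assumptions made for the example.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin-size-weighted mean of |accuracy - mean confidence| over equal-width bins."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Assign each prediction to a confidence bin; 1.0 falls into the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, is_correct in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, is_correct))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece
```

A model that reports 0.9 confidence but is right only 75% of the time gets an ECE of roughly 0.15 on those samples; a perfectly calibrated model scores 0.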
✨HAFixAgent: History-Aware Automated Program Repair Agent
📝 Summary:
HAFixAgent enhances automated program repair for complex multi-hunk bugs by incorporating repository history. It significantly improves bug-fixing effectiveness over existing agent-based systems while maintaining efficiency. This offers a practical approach for history-aware agentic APR.
🔹 Publication Date: Published on Nov 2
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.01047
• PDF: https://arxiv.org/pdf/2511.01047
==================================
#AutomatedProgramRepair #SoftwareEngineering #AI #BugFixing #CodeRepair
✨VeriCoT: Neuro-symbolic Chain-of-Thought Validation via Logical Consistency Checks
📝 Summary:
VeriCoT is a neuro-symbolic method to validate LLM Chain-of-Thought reasoning. It formalizes CoT steps into first-order logic for automated verification of consistency. This improves LLM reliability by identifying flawed reasoning and enhancing overall accuracy.
🔹 Publication Date: Published on Nov 6
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.04662
• PDF: https://arxiv.org/pdf/2511.04662
==================================
#LLM #ChainOfThought #NeuroSymbolic #AI #Logic
✨VGGT: Visual Geometry Grounded Transformer
📝 Summary:
VGGT is a novel feed-forward neural network that efficiently infers key 3D scene attributes, including camera parameters, point maps, depth maps, and 3D point tracks, from single or multiple views. It outperforms existing specialized models without requiring post-processing, achieving state-of-the-art results across several 3D computer vision tasks. VGGT also s...
🔹 Publication Date: Published on Mar 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2503.11651
• PDF: https://arxiv.org/pdf/2503.11651
• Project Page: https://vgg-t.github.io/
• Github: https://github.com/facebookresearch/vggt
🔹 Models citing this paper:
• https://huggingface.co/facebook/VGGT-1B
• https://huggingface.co/facebook/VGGT-1B-Commercial
✨ Spaces citing this paper:
• https://huggingface.co/spaces/facebook/vggt
• https://huggingface.co/spaces/Pointcept/Concerto
• https://huggingface.co/spaces/HanzhouLiu/Stylos_Demo
==================================
#3DComputerVision #Transformers #DeepLearning #ComputerVision #AI
✨Real-Time Reasoning Agents in Evolving Environments
📝 Summary:
AI agents struggle with real-time reasoning in dynamic environments, failing to balance logical judgments with timely responses. This paper introduces Real-Time Reasoning Gym and AgileThinker. AgileThinker combines reactive and planning approaches to effectively balance reasoning depth and respon...
🔹 Publication Date: Published on Nov 7
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.04898
• PDF: https://arxiv.org/pdf/2511.04898
• Project Page: https://realtimegym.saltlab.stanford.edu
• Github: https://github.com/SALT-NLP/RealtimeGym
==================================
#AI #RealTimeAI #AutonomousAgents #DynamicEnvironments #MachineLearning
✨HaluMem: Evaluating Hallucinations in Memory Systems of Agents
📝 Summary:
HaluMem is a new benchmark that evaluates memory hallucinations in AI systems by localizing them to specific stages: extraction, updating, and question answering. It uses large human-AI interaction datasets. Findings show current systems accumulate hallucinations during extraction and updating, w...
🔹 Publication Date: Published on Nov 5
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.03506
• PDF: https://arxiv.org/pdf/2511.03506
• Github: https://github.com/MemTensor/HaluMem
✨ Datasets citing this paper:
• https://huggingface.co/datasets/IAAR-Shanghai/HaluMem
==================================
#AIHallucinations #AIAgents #MemorySystems #LLM #AIResearch
✨RedOne 2.0: Rethinking Domain-specific LLM Post-Training in Social Networking Services
📝 Summary:
RedOne 2.0 is an SNS-oriented LLM trained with a progressive, RL-prioritized post-training paradigm for rapid and stable adaptation to social networking challenges. This 4B model significantly improves over a 7B baseline and achieves an 8.74 performance lift from base models with less data, demon...
🔹 Publication Date: Published on Nov 10
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07070
• PDF: https://arxiv.org/pdf/2511.07070
==================================
#LLM #SocialNetworking #ReinforcementLearning #NLP #DeepLearning
✨RLoop: A Self-Improving Framework for Reinforcement Learning with Iterative Policy Initialization
📝 Summary:
RLoop is a self-improving framework addressing Reinforcement Learning overfitting and generalization issues. It uses iterative policy initialization and Rejection-sampling Fine-Tuning to convert diverse policy variations into robust performance gains, boosting accuracy and mitigating catastrophic...
🔹 Publication Date: Published on Nov 6
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.04285
• PDF: https://arxiv.org/pdf/2511.04285
==================================
#ReinforcementLearning #MachineLearning #AI #DeepLearning #Generalization
✨Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs
📝 Summary:
MoE LLMs have suboptimal routers that cause significant performance gaps. Routing Manifold Alignment (RoMA) aligns routing weights with task embeddings using a regularization term during lightweight finetuning of routers. This improves generalization by encouraging similar samples to share expert c...
🔹 Publication Date: Published on Nov 10
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07419
• PDF: https://arxiv.org/pdf/2511.07419
• Github: https://github.com/tianyi-lab/RoMA
==================================
#LLMs #MixtureOfExperts #DeepLearning #AI #MachineLearning
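The summary above does not give RoMA's actual regularization term, so the following is only a generic manifold-style penalty under stated assumptions: pairs of samples with similar task embeddings are penalized when their routing weights differ. The function names, the cosine similarity, and the squared Euclidean distance are all illustrative choices, not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def routing_alignment_penalty(task_embeddings, routing_weights):
    """Mean over sample pairs of similarity(i, j) * ||r_i - r_j||^2.

    Hypothetical regularizer: similar samples (high embedding similarity)
    are pushed toward similar routing weights.
    """
    n = len(task_embeddings)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            # Negative similarities are clipped so dissimilar pairs are ignored.
            sim = max(0.0, cosine(task_embeddings[i], task_embeddings[j]))
            dist2 = sum(
                (a - b) ** 2 for a, b in zip(routing_weights[i], routing_weights[j])
            )
            total += sim * dist2
            pairs += 1
    return total / pairs if pairs else 0.0
```

In a finetuning loop such a term would be added to the task loss with a small coefficient, so that only the router parameters need to move.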
✨DIMO: Diverse 3D Motion Generation for Arbitrary Objects
📝 Summary:
DIMO is a generative model that creates diverse 3D motions for any object from one image. It extracts motion patterns from video models into a latent space, using neural key point trajectories to drive 3D object models. This enables sampling diverse motions and applications like interpolation.
🔹 Publication Date: Published on Nov 10
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07409
• PDF: https://arxiv.org/pdf/2511.07409
==================================
#DIMO #3DMotion #GenerativeAI #ComputerVision #DeepLearning
✨IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction
📝 Summary:
IterResearch improves long-horizon reasoning by reformulating it as a Markov Decision Process with strategic workspace reconstruction. This novel paradigm overcomes context suffocation, achieving substantial performance gains and unprecedented interaction scaling, and also serves as an effective ...
🔹 Publication Date: Published on Nov 10
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07327
• PDF: https://arxiv.org/pdf/2511.07327
==================================
#ReinforcementLearning #AI #MachineLearning #AIagents #MDP
✨MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs
📝 Summary:
MVU-Eval is a new comprehensive benchmark for evaluating Multi-Video Understanding in Multimodal Large Language Models. It addresses a critical gap in existing single-video benchmarks and reveals significant performance limitations in current MLLMs for multi-video scenarios.
🔹 Publication Date: Published on Nov 10
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07250
• PDF: https://arxiv.org/pdf/2511.07250
• Project Page: https://huggingface.co/datasets/MVU-Eval-Team/MVU-Eval-Data
• Github: https://github.com/NJU-LINK/MVU-Eval
==================================
#MLLMs #VideoUnderstanding #AI #Benchmarking #ComputerVision
✨The Station: An Open-World Environment for AI-Driven Discovery
📝 Summary:
The Station is an open-world multi-agent AI environment enabling autonomous scientific discovery. Agents engage in full scientific journeys, achieving state-of-the-art results across diverse benchmarks. This new paradigm fosters emergent behaviors and novel method development, moving beyond rigid...
🔹 Publication Date: Published on Nov 9
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.06309
• PDF: https://arxiv.org/pdf/2511.06309
• Github: https://github.com/dualverse-ai/station
==================================
#AI #MultiAgentSystems #ScientificDiscovery #OpenWorldAI #AutonomousAI
✨Robot Learning from a Physical World Model
📝 Summary:
PhysWorld enables robots to learn accurate manipulation from AI-generated videos by integrating video generation with physical world modeling. This approach grounds visual guidance into physically executable actions, eliminating the need for real robot data.
🔹 Publication Date: Published on Nov 10
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07416
• PDF: https://arxiv.org/pdf/2511.07416
• Project Page: https://pointscoder.github.io/PhysWorld_Web/
• Github: https://github.com/PointsCoder/OpenReal2Sim
==================================
#RobotLearning #Robotics #AI #PhysicalModeling #MachineLearning
✨DigiData: Training and Evaluating General-Purpose Mobile Control Agents
📝 Summary:
DigiData provides a diverse, high-quality dataset for training mobile control agents with complex goals from app feature exploration. DigiData-Bench offers dynamic AI-powered evaluation protocols, improving agent assessment beyond common metrics.
🔹 Publication Date: Published on Nov 10
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07413
• PDF: https://arxiv.org/pdf/2511.07413
• Project Page: https://facebookresearch.github.io/DigiData
==================================
#MobileAgents #ArtificialIntelligence #MachineLearning #Datasets #AgentTraining
✨SWE-fficiency: Can Language Models Optimize Real-World Repositories on Real Workloads?
📝 Summary:
SWE-fficiency is a new benchmark evaluating how language models optimize real-world software repositories for performance on actual workloads. Agents must identify bottlenecks and generate correct code patches matching expert speedup. Current agents significantly underperform, struggling with loc...
🔹 Publication Date: Published on Nov 8
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.06090
• PDF: https://arxiv.org/pdf/2511.06090
• Project Page: https://swefficiency.com/
==================================
#LLM #SoftwareOptimization #PerformanceTuning #AIagents #Benchmarking
✨LUT-LLM: Efficient Large Language Model Inference with Memory-based Computations on FPGAs
📝 Summary:
LUT-LLM is an FPGA accelerator for LLM inference that leverages on-chip memory to shift computation from arithmetic to memory-based operations via table lookups. This innovative approach achieves 1.66x lower latency than AMD MI210 and 1.72x higher energy efficiency than NVIDIA A100 for a 1.7B LLM.
🔹 Publication Date: Published on Nov 9
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.06174
• PDF: https://arxiv.org/pdf/2511.06174
==================================
#LLM #FPGA #AI #DeepLearning #AIHardware
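To make "shifting computation from arithmetic to table lookups" concrete, here is a toy sketch. It assumes weights and activations are already quantized to small integer codes, so every possible product can be precomputed once and a dot product becomes a sum of table reads. The codebooks, bit widths, and function names are hypothetical; the summary does not describe LUT-LLM's actual quantization or table layout.

```python
# Hypothetical codebooks: 2-bit weights (4 levels), 4-bit activations (16 levels).
W_LEVELS = [-1.0, -0.5, 0.5, 1.0]
A_LEVELS = [i / 15 for i in range(16)]

# Built once, offline: every possible weight * activation product.
# On an accelerator this table would live in on-chip memory.
PRODUCT_TABLE = [[w * a for a in A_LEVELS] for w in W_LEVELS]


def lut_dot(weight_codes, activation_codes):
    """Dot product using only table reads and additions (no multiplications)."""
    return sum(PRODUCT_TABLE[w][a] for w, a in zip(weight_codes, activation_codes))


def reference_dot(weight_codes, activation_codes):
    """Same dot product computed with ordinary multiplications, for comparison."""
    return sum(W_LEVELS[w] * A_LEVELS[a] for w, a in zip(weight_codes, activation_codes))
```

The table has only 4 x 16 = 64 entries here, which is why low-bit quantization makes the lookup approach practical: table size grows with the number of code combinations, not with model size.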
✨DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation
📝 Summary:
This study develops a two-stage reinforcement learning method for competitive code generation. It uses tailored data curation and a hard-focus curriculum, achieving state-of-the-art performance on competitive programming benchmarks.
🔹 Publication Date: Published on Nov 9
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.06307
• PDF: https://arxiv.org/pdf/2511.06307
==================================
#ReinforcementLearning #CodeGeneration #DataCuration #MachineLearning #AIResearch
✨SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via Gumbel-Reparameterized Soft-Thinking Policy Optimization
📝 Summary:
SofT-GRPO is a novel algorithm that enhances soft-thinking in LLMs by integrating Gumbel noise and Gumbel-Softmax. This method successfully reinforces soft-thinking policies, enabling LLMs to outperform discrete-token reinforcement learning approaches, especially on complex tasks.
🔹 Publication Date: Published on Nov 9
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.06411
• PDF: https://arxiv.org/pdf/2511.06411
🔹 Models citing this paper:
• https://huggingface.co/zz1358m/SofT-GRPO-master
==================================
#LLM #ReinforcementLearning #AI #MachineLearning #DeepLearning
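The Gumbel-Softmax trick mentioned in the summary can be sketched in a few lines. This shows only the generic sampling step (add Gumbel noise to the logits, then apply a temperature-scaled softmax), which yields a soft distribution over tokens instead of one discrete token. It is not the SofT-GRPO training algorithm, and the function names and default temperature are assumptions.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one Gumbel(0, 1) sample via the inverse-CDF transform."""
    # Clamp away from 0 and 1 so both logarithms stay finite.
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def gumbel_softmax(logits, temperature=1.0, rng=None):
    """Return a soft one-hot sample: softmax((logits + Gumbel noise) / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()
    perturbed = [(logit + sample_gumbel(rng)) / temperature for logit in logits]
    # Subtract the max before exponentiating for numerical stability.
    peak = max(perturbed)
    exps = [math.exp(p - peak) for p in perturbed]
    total = sum(exps)
    return [e / total for e in exps]
```

Lower temperatures push the output toward a one-hot vector (close to discrete sampling), while higher temperatures give smoother mixtures over tokens.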
✨Diffusion-SDPO: Safeguarded Direct Preference Optimization for Diffusion Models
📝 Summary:
Diffusion-SDPO improves text-to-image quality by fixing a flaw in standard DPO where preferred output error can increase. It uses a safeguarded update to adaptively scale the loser gradient, ensuring the preferred output's error never increases. This leads to consistent quality gains across bench...
🔹 Publication Date: Published on Nov 5
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.03317
• PDF: https://arxiv.org/pdf/2511.03317
• Github: https://github.com/AIDC-AI/Diffusion-SDPO
🔹 Models citing this paper:
• https://huggingface.co/AIDC-AI/Diffusion-SDPO
==================================
#DiffusionModels #DPO #TextToImage #GenerativeAI #AI
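One way to picture "adaptively scaling the loser gradient" is a first-order argument. Assume the update descends the winner's error and ascends the loser's error with a scale s: step = -lr * (g_w - s * g_l). To first order, the winner's error changes by g_w · step = -lr * (|g_w|^2 - s * (g_w · g_l)), which is non-positive whenever s <= |g_w|^2 / (g_w · g_l). The sketch below implements exactly that bound. It is a derivation made for illustration under these assumptions, not the formula from the Diffusion-SDPO paper, and all names are hypothetical.

```python
def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def safeguarded_loser_scale(grad_winner, grad_loser, max_scale=1.0):
    """Largest scale in [0, max_scale] that keeps the winner's error from
    increasing to first order (hypothetical rule, see lead-in)."""
    overlap = dot(grad_winner, grad_loser)
    if overlap <= 0.0:
        # Ascending the loser error does not hurt the winner: no scaling needed.
        return max_scale
    return min(max_scale, dot(grad_winner, grad_winner) / overlap)


def combined_step(grad_winner, grad_loser, lr=0.1):
    """Descend the winner error and ascend the (scaled) loser error."""
    scale = safeguarded_loser_scale(grad_winner, grad_loser)
    return [-lr * (gw - scale * gl) for gw, gl in zip(grad_winner, grad_loser)]
```

With grad_winner = [1, 0] and grad_loser = [2, 0], the unscaled step would raise the winner's error; the safeguard halves the loser term so the first-order change is zero instead.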