✨Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models
📝 Summary:
This study optimizes small language models for real-device latency by identifying key architectural factors and efficient operators. It introduces Nemotron-Flash, a new family of hybrid SLMs that significantly improves accuracy, latency, and throughput compared to current models.
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.18890
• PDF: https://arxiv.org/pdf/2511.18890
🔹 Models citing this paper:
• https://huggingface.co/nvidia/Nemotron-Flash-3B-Instruct
• https://huggingface.co/nvidia/Nemotron-Flash-1B
• https://huggingface.co/nvidia/Nemotron-Flash-3B
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#SmallLanguageModels #LatencyOptimization #AI #DeepLearning #NLP
✨Xmodel-2.5: 1.3B Data-Efficient Reasoning SLM
📝 Summary:
Xmodel-2.5 is a 1.3B language model designed for efficient edge deployments. It uses maximal-update parameterization and a novel training curriculum that switches from AdamW to Muon, improving reasoning skills by 4.58% while maintaining efficiency.
🔹 Publication Date: Published on Nov 23
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19496
• PDF: https://arxiv.org/pdf/2511.19496
• Github: https://github.com/XiaoduoAILab/Xmodel-2.5
🔹 Models citing this paper:
• https://huggingface.co/XiaoduoAILab/Xmodel-2.5
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#SLM #EdgeAI #LanguageModels #DeepLearning #ReasoningAI
✨Geometrically-Constrained Agent for Spatial Reasoning
📝 Summary:
Geometrically-Constrained Agent (GCA) resolves the semantic-to-geometric gap in VLMs for spatial reasoning. It uses a formal task constraint to guide the VLM from semantic analysis to constrained tool execution, achieving SOTA performance.
🔹 Publication Date: Published on Nov 27
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22659
• PDF: https://arxiv.org/pdf/2511.22659
• Project Page: https://gca-spatial-reasoning.github.io
• Github: https://github.com/gca-spatial-reasoning/gca
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#SpatialReasoning #VLMs #AI #Robotics #DeepLearning
✨LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling
📝 Summary:
LongVT is an agentic framework that improves long video reasoning. It uses LMMs as tools for global-to-local video cropping and frame resampling to ground answers. This novel approach consistently outperforms existing baselines.
🔹 Publication Date: Published on Nov 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.20785
• PDF: https://arxiv.org/pdf/2511.20785
• Project Page: https://evolvinglmms-lab.github.io/LongVT/
• Github: https://github.com/EvolvingLMMs-Lab/LongVT
🔹 Models citing this paper:
• https://huggingface.co/longvideotool/LongVT-RFT
• https://huggingface.co/longvideotool/LongVT-SFT
• https://huggingface.co/longvideotool/LongVT-RL
✨ Datasets citing this paper:
• https://huggingface.co/datasets/longvideotool/LongVT-Source
• https://huggingface.co/datasets/longvideotool/LongVT-Parquet
✨ Spaces citing this paper:
• https://huggingface.co/spaces/longvideotool/LongVT-Demo
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#VideoAI #LMMs #AgenticAI #ComputerVision #AIResearch
✨GR-RL: Going Dexterous and Precise for Long-Horizon Robotic Manipulation
📝 Summary:
GR-RL improves VLA policies for dexterous long-horizon manipulation. It filters and augments demonstrations, then refines them with RL. This enables unprecedented complex tasks, notably autonomously lacing a shoe.
🔹 Publication Date: Published on Dec 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01801
• PDF: https://arxiv.org/pdf/2512.01801
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#Robotics #ReinforcementLearning #DexterousManipulation #RoboticManipulation #AI
✨What about gravity in video generation? Post-Training Newton's Laws with Verifiable Rewards
📝 Summary:
NewtonRewards is a post-training framework that uses verifiable, physics-grounded rewards to improve physical realism and motion quality in AI-generated videos. It enforces Newtonian kinematics and mass conservation, significantly outperforming prior methods on various motion tasks. This offers a...
🔹 Publication Date: Published on Nov 29
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.00425
• PDF: https://arxiv.org/pdf/2512.00425
• Project Page: https://cvlab-stonybrook.github.io/NewtonRewards/
• Github: https://cvlab-stonybrook.github.io/NewtonRewards
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AIVideoGeneration #PhysicsInAI #MachineLearning #GenerativeAI #ComputerVision
✨SpeContext: Enabling Efficient Long-context Reasoning with Speculative Context Sparsity in LLMs
📝 Summary:
SpeContext uses a distilled language model for efficient long-context LLM reasoning. This system co-design significantly reduces parameters and improves throughput by up to 24.89x in the cloud and 10.06x at the edge, with minimal accuracy loss.
🔹 Publication Date: Published on Nov 30
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.00722
• PDF: https://arxiv.org/pdf/2512.00722
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#LLM #AIResearch #DeepLearning #AIOptimization #ContextSparsity
✨How Far Are We from Genuinely Useful Deep Research Agents?
📝 Summary:
The paper introduces FINDER, a benchmark for Deep Research Agents (DRAs) with human-curated tasks and structured metrics. It also presents DEFT, a failure taxonomy showing that DRAs struggle with evidence integration, verification, and resilient planning rather than with task comprehension.
🔹 Publication Date: Published on Dec 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01948
• PDF: https://arxiv.org/pdf/2512.01948
• Github: https://github.com/OPPO-PersonalAI/FINDER_DEFT
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#DeepResearchAgents #AIResearch #AIBenchmarking #FailureTaxonomy #ArtificialIntelligence
✨Rectifying LLM Thought from Lens of Optimization
📝 Summary:
RePro is a novel process-level reward mechanism that refines LLM reasoning by treating chain-of-thought as an optimization process. It uses dual scoring to generate a composite reward, integrated into RL pipelines to enhance performance and reduce suboptimal behaviors.
🔹 Publication Date: Published on Dec 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01925
• PDF: https://arxiv.org/pdf/2512.01925
• Github: https://github.com/open-compass/RePro
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#LLM #ReinforcementLearning #Optimization #ArtificialIntelligence #DeepLearning
✨VLASH: Real-Time VLAs via Future-State-Aware Asynchronous Inference
📝 Summary:
VLASH is an asynchronous inference framework for VLAs. It achieves fast, accurate, and low-latency robotic control by estimating future robot states, bridging the gap between prediction and execution. This enables VLAs to perform high-precision tasks like ping-pong with significant speedup and reduced latency.
🔹 Publication Date: Published on Nov 30
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01031
• PDF: https://arxiv.org/pdf/2512.01031
• Github: https://github.com/mit-han-lab/vlash
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#Robotics #VisionLanguageModels #RealTimeAI #AIResearch #MachineLearning
✨TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models
📝 Summary:
TUNA is a unified multimodal model that builds a single continuous visual representation. This enables end-to-end understanding and generation, avoiding mismatches found in decoupled models and achieving state-of-the-art performance across multimodal tasks.
🔹 Publication Date: Published on Dec 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.02014
• PDF: https://arxiv.org/pdf/2512.02014
• Project Page: https://tuna-ai.org/
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#MultimodalAI #ComputerVision #DeepLearning #GenerativeAI #AIResearch
✨Lotus-2: Advancing Geometric Dense Prediction with Powerful Image Generative Model
📝 Summary:
Lotus-2 is a two-stage deterministic framework adapting powerful diffusion models for accurate geometric inference. It achieves top monocular depth and competitive surface normal prediction with very limited training data.
🔹 Publication Date: Published on Nov 30
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01030
• PDF: https://arxiv.org/pdf/2512.01030
• Project Page: https://lotus-2.github.io/
• Github: https://github.com/EnVision-Research/Lotus-2
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#ComputerVision #DeepLearning #DiffusionModels #GeometricPrediction #MonocularDepth
✨Generalist Large Language Models Outperform Clinical Tools on Medical Benchmarks
📝 Summary:
Generalist LLMs like GPT-5 outperformed specialized clinical AI tools on a medical benchmark. This reveals that clinical decision support tools may lag behind frontier models and need urgent, independent evaluation before clinical deployment.
🔹 Publication Date: Published on Dec 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01191
• PDF: https://arxiv.org/pdf/2512.01191
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#LLM #HealthcareAI #AIinMedicine #ClinicalAI #MedicalResearch
✨Doppler-Enhanced Deep Learning: Improving Thyroid Nodule Segmentation with YOLOv5 Instance Segmentation
📝 Summary:
YOLOv5 algorithms accurately segment thyroid nodules in ultrasound images. Incorporating Doppler images significantly improves segmentation performance across all models, offering a real-time solution for clinical diagnostics.
🔹 Publication Date: Published on Nov 29
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.00639
• PDF: https://arxiv.org/pdf/2512.00639
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#DeepLearning #MedicalImaging #ThyroidHealth #YOLOv5 #AIinHealthcare
✨Infinity-RoPE: Action-Controllable Infinite Video Generation Emerges From Autoregressive Self-Rollout
📝 Summary:
Infinity-RoPE is a new inference-time framework for autoregressive video diffusion, enabling continuous generation, fine-grained action control, and cinematic transitions without retraining. It addresses limitations like finite temporal horizons and slow prompt responsiveness, outperforming prior...
🔹 Publication Date: Published on Nov 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.20649
• PDF: https://arxiv.org/pdf/2511.20649
• Github: https://infinity-rope.github.io/
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#VideoGeneration #AI #DeepLearning #ComputerVision #DiffusionModels
✨Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights
📝 Summary:
Envision is a new benchmark for chained text-to-multi-image generation that assesses models' grasp of dynamic causal processes and world knowledge. Unified multimodal models outperform specialized ones in causal coherence but still struggle with spatiotemporal consistency, due to static training.
🔹 Publication Date: Published on Dec 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01816
• PDF: https://arxiv.org/pdf/2512.01816
• Project Page: https://opendatalab-raiser.github.io/Envision/
• Github: https://github.com/opendatalab-raiser/Envision
✨ Datasets citing this paper:
• https://huggingface.co/datasets/opendatalab-raiser/Envision
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#MultimodalAI #CausalReasoning #AIBenchmarking #GenerativeAI #ComputerVision
✨The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment
📝 Summary:
ImageCritic corrects inconsistent fine-grained details in generated images using a reference-guided post-editing approach. It employs attention alignment loss and a detail encoder to precisely rectify inconsistencies and improve accuracy.
🔹 Publication Date: Published on Nov 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.20614
• PDF: https://arxiv.org/pdf/2511.20614
• Project Page: https://ouyangziheng.github.io/ImageCritic-Page/
• Github: https://github.com/HVision-NKU/ImageCritic
🔹 Models citing this paper:
• https://huggingface.co/ziheng1234/ImageCritic
✨ Datasets citing this paper:
• https://huggingface.co/datasets/ziheng1234/Critic-10K
✨ Spaces citing this paper:
• https://huggingface.co/spaces/ziheng1234/ImageCritic
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#ImageGeneration #ComputerVision #DeepLearning #AI #ImageEditing
✨HiconAgent: History Context-aware Policy Optimization for GUI Agents
📝 Summary:
HiconAgent introduces History Context-aware Policy Optimization (HCPO) for GUI agents. HCPO efficiently leverages historical context using dynamic sampling and compression, achieving better performance than larger models with reduced computational cost and significant speedups.
🔹 Publication Date: Published on Dec 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01763
• PDF: https://arxiv.org/pdf/2512.01763
• Github: https://github.com/JiuTian-VL/HiconAgent
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#HiconAgent #GUIAgents #AIResearch #ReinforcementLearning #ContextAwareAI
✨InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision
📝 Summary:
InternVideo-Next proposes a two-stage Encoder-Predictor-Decoder framework for general video representation learning without text supervision. It uses a conditional diffusion decoder to bridge pixel fidelity with semantics in Stage 1, then a latent world model in Stage 2 to learn world knowledge a...
🔹 Publication Date: Published on Dec 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01342
• PDF: https://arxiv.org/pdf/2512.01342
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#VideoFoundationModels #VideoAI #DeepLearning #UnsupervisedLearning #DiffusionModels