✨Colon-X: Advancing Intelligent Colonoscopy from Multimodal Understanding to Clinical Reasoning
📝 Summary:
Colon-X introduces ColonR1, a novel reasoning-centric model for intelligent colonoscopy. It achieves 56.61% accuracy, outperforming traditional methods by 25.22% under data scarcity, by leveraging new comprehensive multimodal datasets.
🔹 Publication Date: Published on Dec 3
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.03667
• PDF: https://arxiv.org/pdf/2512.03667
• Github: https://github.com/ai4colonoscopy/Colon-X
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨DoVer: Intervention-Driven Auto Debugging for LLM Multi-Agent Systems
📝 Summary:
DoVer is an intervention-driven debugging approach for LLM multi-agent systems. It validates failure hypotheses and measures progress via targeted interventions, improving reliability. DoVer converts 18-49% of failed tasks into successes, offering an outcome-oriented debugging method.
🔹 Publication Date: Published on Dec 7
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06749
• PDF: https://arxiv.org/pdf/2512.06749
• Project Page: https://aka.ms/DoVer
==================================
#LLM #MultiAgentSystems #Debugging #AI #Research
✨Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs
📝 Summary:
The paper proposes a method to enhance Rotary Position Embeddings by utilizing both the real and imaginary components of the complex-valued dot product, improving long-context modeling in Large Langua...
🔹 Publication Date: Published on Dec 8
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07525
• PDF: https://arxiv.org/pdf/2512.07525
• Github: https://github.com/OpenMOSS/rope_pp
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
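To make the "real and imaginary components" in the RoPE post above concrete, here is a minimal NumPy sketch. It is not taken from the paper or its repository: the function names, the `base` value, and the vector size are assumptions chosen for illustration. It shows only the well-known starting point, that the standard RoPE attention logit is the real part of a complex-valued dot product, and that the imaginary part is also a function of the relative offset alone. How the paper actually uses the imaginary part is not shown here.

```python
import numpy as np

def rope_complex(x, pos, base=10000.0):
    """Pair up a real vector of even length d into d/2 complex numbers
    and rotate each pair by an angle proportional to the position."""
    d = x.shape[-1]
    xc = x[0::2] + 1j * x[1::2]                      # (x0 + i*x1), (x2 + i*x3), ...
    freqs = base ** (-2.0 * np.arange(d // 2) / d)   # one frequency per pair
    return xc * np.exp(1j * pos * freqs)

def complex_score(q, k, m, n):
    """Complex-valued dot product of a query at position m and a key at position n."""
    return np.sum(rope_complex(q, m) * np.conj(rope_complex(k, n)))

rng = np.random.default_rng(0)
q, k = rng.standard_normal(8), rng.standard_normal(8)

near = complex_score(q, k, m=5, n=2)
far = complex_score(q, k, m=105, n=102)   # same offset m - n = 3

print("real part (standard RoPE logit):", near.real)
print("imaginary part (discarded by standard RoPE):", near.imag)
print("depends only on m - n:", np.isclose(near, far))
```

Both components are unchanged when the query and key are shifted by the same amount, which is the relative-position property RoPE relies on.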
✨LongCat-Image Technical Report
📝 Summary:
LongCat-Image is a bilingual open-source foundation model for image generation that addresses multilingual text rendering, photorealism, and deployment efficiency through rigorous data curation, compa...
🔹 Publication Date: Published on Dec 8
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07584
• PDF: https://arxiv.org/pdf/2512.07584
• Project Page: https://longcat.chat/
• Github: https://github.com/meituan-longcat/LongCat-Image
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨EgoEdit: Dataset, Real-Time Streaming Model, and Benchmark for Egocentric Video Editing
📝 Summary:
EgoEdit is a real-time, instruction-following egocentric video editor that addresses challenges in handling egomotion and hand-object interactions, outperforming existing methods on egocentric editing...
🔹 Publication Date: Published on Dec 5
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06065
• PDF: https://arxiv.org/pdf/2512.06065
• Project Page: https://snap-research.github.io/EgoEdit/
• Github: https://github.com/snap-research/EgoEdit
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Scaling Zero-Shot Reference-to-Video Generation
📝 Summary:
Saber is a scalable zero-shot framework for reference-to-video generation that uses video-text pairs to learn identity-consistent representations and outperforms models trained with explicit reference...
🔹 Publication Date: Published on Dec 7
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06905
• PDF: https://arxiv.org/pdf/2512.06905
• Project Page: https://franciszzj.github.io/Saber/
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Rethinking Training Dynamics in Scale-wise Autoregressive Generation
📝 Summary:
Self-Autoregressive Refinement (SAR) improves the quality of autoregressive generative models by addressing exposure bias through Stagger-Scale Rollout and Contrastive Student-Forcing Loss, leading to...
🔹 Publication Date: Published on Dec 6
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06421
• PDF: https://arxiv.org/pdf/2512.06421
• Project Page: https://gengzezhou.github.io/SAR/
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Embodied Referring Expression Comprehension in Human-Robot Interaction
📝 Summary:
A large-scale dataset and multimodal model improve embodied interaction comprehension in robots by addressing perspective bias and enhancing multimodal signal integration.
🔹 Publication Date: Published on Dec 6
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06558
• PDF: https://arxiv.org/pdf/2512.06558
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Unified Video Editing with Temporal Reasoner
📝 Summary:
VideoCoF, a Chain-of-Frames approach, improves video editing precision and instruction-to-region mapping by using reasoning tokens without requiring user-provided masks.
🔹 Publication Date: Published on Dec 8
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07469
• PDF: https://arxiv.org/pdf/2512.07469
• Project Page: https://videocof.github.io/
• Github: https://github.com/knightyxp/VideoCoF
🔹 Models citing this paper:
• https://huggingface.co/XiangpengYang/VideoCoF
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Relational Visual Similarity
📝 Summary:
Vision-Language models fine-tuned on anonymized image captions can capture relational similarity between images, a capability lacking in current visual similarity metrics.
🔹 Publication Date: Published on Dec 8
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07833
• PDF: https://arxiv.org/pdf/2512.07833
• Project Page: https://thaoshibe.github.io/relsim/
• Github: https://github.com/thaoshibe/relsim
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Group Representational Position Encoding
📝 Summary:
GRAPE is a unified positional encoding framework that combines multiplicative rotations and additive logit biases, extending existing methods like RoPE and ALiBi.
🔹 Publication Date: Published on Dec 8
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07805
• PDF: https://model-architectures.github.io/GRAPE/GRAPE.pdf
• Project Page: https://model-architectures.github.io/GRAPE/
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
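The two ingredients named in the GRAPE post above, multiplicative rotations (as in RoPE) and additive logit biases (as in ALiBi), can be sketched side by side. This is a toy illustration of those two existing techniques only, not GRAPE's own formulation: the function names, the `slope` value, and the tensor sizes are assumptions made for the example.

```python
import numpy as np

def rotate(x, pos, base=10000.0):
    """RoPE-style multiplicative term: rotate consecutive feature pairs
    of each row by angles proportional to that row's position."""
    d = x.shape[-1]
    freqs = base ** (-2.0 * np.arange(d // 2) / d)
    ang = pos[:, None] * freqs[None, :]            # shape (T, d/2)
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

def positional_logits(q, k, slope=0.5):
    """Attention logits with a rotation applied to q and k, plus an
    ALiBi-style linear distance penalty under a causal mask."""
    T = q.shape[0]
    pos = np.arange(T, dtype=float)
    scores = rotate(q, pos) @ rotate(k, pos).T     # multiplicative part
    dist = pos[:, None] - pos[None, :]             # i - j
    bias = np.where(dist >= 0, -slope * dist, -np.inf)  # additive part
    return scores + bias

rng = np.random.default_rng(0)
q, k = rng.standard_normal((4, 8)), rng.standard_normal((4, 8))
logits = positional_logits(q, k)
print(logits)
```

On the diagonal the rotation cancels and the bias is zero, so each entry equals the plain dot product of the matching query and key. Entries above the diagonal are masked out.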
✨Voxify3D: Pixel Art Meets Volumetric Rendering
📝 Summary:
Voxify3D is a two-stage framework that combines 3D mesh optimization with 2D pixel art supervision to generate high-quality voxel art with semantic preservation, pixel-art aesthetics, and discrete col...
🔹 Publication Date: Published on Dec 8
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07834
• PDF: https://arxiv.org/pdf/2512.07834
• Project Page: https://yichuanh.github.io/Voxify-3D/
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models
📝 Summary:
A controlled experimental framework isolates and evaluates the contributions of pre-training, mid-training, and reinforcement learning in improving language model reasoning, demonstrating the necessit...
🔹 Publication Date: Published on Dec 8
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07783
• PDF: https://arxiv.org/pdf/2512.07783
• Github: https://github.com/Interplay-LM-Reasoning/Interplay-LM-Reasoning
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Vector Quantization using Gaussian Variational Autoencoder
📝 Summary:
Gaussian Quant (GQ) converts Gaussian VAE to VQ-VAE without training, outperforming previous VQ-VAEs and Gaussian VAE discretization methods across different architectures.
🔹 Publication Date: Published on Dec 7
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06609
• PDF: https://arxiv.org/pdf/2512.06609
• Github: https://github.com/Stability-AI/generative-models
🔹 Models citing this paper:
• https://huggingface.co/xutongda/GQModel
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨VideoVLA: Video Generators Can Be Generalizable Robot Manipulators
📝 Summary:
VideoVLA uses a multi-modal Diffusion Transformer to predict actions and visual outcomes from language and image inputs, enabling strong generalization in robotic manipulation tasks.
🔹 Publication Date: Published on Dec 7
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06963
• PDF: https://arxiv.org/pdf/2512.06963
• Project Page: https://videovla-nips2025.github.io/
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning
📝 Summary:
NPR is a teacher-free framework enabling LLMs to perform genuine parallel reasoning. It uses self-distilled training and a new optimization algorithm. This achieves significant performance gains and speedups on reasoning benchmarks.
🔹 Publication Date: Published on Dec 8
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07461
• PDF: https://arxiv.org/pdf/2512.07461
• Project Page: https://bigai-nlco.github.io/Native-Parallel-Reasoner
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Distribution Matching Variational AutoEncoder
📝 Summary:
DMVAE explicitly aligns VAE latent distributions with arbitrary reference distributions, generalizing beyond fixed priors. This improves modeling efficiency and image synthesis fidelity, with SSL-derived distributions showing excellent balance.
🔹 Publication Date: Published on Dec 8
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07778
• PDF: https://arxiv.org/pdf/2512.07778
• Github: https://github.com/sen-ye/dmvae
==================================
#VAE #DeepLearning #GenerativeAI #ImageSynthesis #ArtificialIntelligence
✨Multi-view Pyramid Transformer: Look Coarser to See Broader
📝 Summary:
MVP is a scalable multi-view transformer that reconstructs large 3D scenes from many images. It uses a dual hierarchy of local-to-global inter-view and fine-to-coarse intra-view processing. This achieves efficient, state-of-the-art 3D scene reconstruction quality.
🔹 Publication Date: Published on Dec 8
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07806
• PDF: https://arxiv.org/pdf/2512.07806
• Project Page: https://gynjn.github.io/MVP/
• Github: https://github.com/Gynjn/MVP
==================================
#3DReconstruction #ComputerVision #Transformers #DeepLearning #AI
✨UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation
📝 Summary:
UnityVideo is a unified framework enhancing video generation by integrating multiple modalities and training paradigms. It uses dynamic noising and a modality switcher for comprehensive world understanding. This improves video quality, consistency, and zero-shot generalization to new data.
🔹 Publication Date: Published on Dec 8
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07831
• PDF: https://arxiv.org/pdf/2512.07831
• Project Page: https://jackailab.github.io/Projects/UnityVideo/
• Github: https://github.com/dvlab-research/UnityVideo
==================================
#VideoGeneration #MultimodalAI #GenerativeAI #DeepLearning #AIResearch
✨ReCamDriving: LiDAR-Free Camera-Controlled Novel Trajectory Video Generation
📝 Summary:
ReCamDriving generates camera-controlled novel-trajectory videos using dense 3DGS renderings and a two-stage training approach, achieving state-of-the-art results in controllability and consistency.
🔹 Publication Date: Published on Dec 3
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.03621
• PDF: https://arxiv.org/pdf/2512.03621
• Project Page: https://recamdriving.github.io/
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨VG-Refiner: Towards Tool-Refined Referring Grounded Reasoning via Agentic Reinforcement Learning
📝 Summary:
VG-Refiner improves visual reasoning by addressing unreliable tool outputs. It uses a two-stage think-rethink mechanism and refinement reward to correct poor tool results. This significantly improves accuracy and correction ability in referring and grounding tasks.
🔹 Publication Date: Published on Dec 6
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06373
• PDF: https://arxiv.org/pdf/2512.06373
• Github: https://github.com/VoyageWang/VG-Refiner
==================================
#VisualReasoning #ReinforcementLearning #ComputerVision #AIResearch #MachineLearning