✨LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling
📝 Summary:
LongVT is an agentic framework that improves long video reasoning. It uses LMMs as tools for global-to-local video cropping and frame resampling to ground answers. This novel approach consistently outperforms existing baselines.
🔹 Publication Date: Published on Nov 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.20785
• PDF: https://arxiv.org/pdf/2511.20785
• Project Page: https://evolvinglmms-lab.github.io/LongVT/
• Github: https://github.com/EvolvingLMMs-Lab/LongVT
🔹 Models citing this paper:
• https://huggingface.co/longvideotool/LongVT-RFT
• https://huggingface.co/longvideotool/LongVT-SFT
• https://huggingface.co/longvideotool/LongVT-RL
✨ Datasets citing this paper:
• https://huggingface.co/datasets/longvideotool/LongVT-Source
• https://huggingface.co/datasets/longvideotool/LongVT-Parquet
✨ Spaces citing this paper:
• https://huggingface.co/spaces/longvideotool/LongVT-Demo
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#VideoAI #LMMs #AgenticAI #ComputerVision #AIResearch
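To make the "global-to-local cropping and frame resampling" idea in the LongVT summary concrete, here is a minimal sketch. It is not LongVT's code: the function name `sample_timestamps`, the one-hour video, and the 1200-1260 s window are all hypothetical, chosen only to show a sparse global pass followed by a dense pass over a cropped window.

```python
def sample_timestamps(start_s, end_s, num_frames):
    """Return num_frames timestamps (in seconds) evenly spaced over [start_s, end_s]."""
    if num_frames < 1 or end_s <= start_s:
        raise ValueError("need num_frames >= 1 and end_s > start_s")
    if num_frames == 1:
        return [(start_s + end_s) / 2]
    step = (end_s - start_s) / (num_frames - 1)
    return [start_s + i * step for i in range(num_frames)]


# Global pass: skim a hypothetical one-hour video with only 8 frames.
global_view = sample_timestamps(0.0, 3600.0, 8)

# Local pass: crop to an assumed window of interest and resample it
# with the same frame budget, which gives much denser coverage.
local_view = sample_timestamps(1200.0, 1260.0, 8)
```

With the same budget of 8 frames, the global pass is spaced minutes apart while the cropped pass is spaced seconds apart, which is the intuition behind zooming in before answering.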
✨GR-RL: Going Dexterous and Precise for Long-Horizon Robotic Manipulation
📝 Summary:
GR-RL improves VLA policies for dexterous long-horizon manipulation. It filters and augments demonstrations, then refines them with RL. This enables unprecedentedly complex tasks, notably lacing a shoe autonomously.
🔹 Publication Date: Published on Dec 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01801
• PDF: https://arxiv.org/pdf/2512.01801
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#Robotics #ReinforcementLearning #DexterousManipulation #RoboticManipulation #AI
✨What about gravity in video generation? Post-Training Newton's Laws with Verifiable Rewards
📝 Summary:
NewtonRewards is a post-training framework that uses verifiable, physics-grounded rewards to improve physical realism and motion quality in AI-generated videos. It enforces Newtonian kinematics and mass conservation, significantly outperforming prior methods on various motion tasks. This offers a...
🔹 Publication Date: Published on Nov 29
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.00425
• PDF: https://arxiv.org/pdf/2512.00425
• Project Page: https://cvlab-stonybrook.github.io/NewtonRewards/
• Github: https://cvlab-stonybrook.github.io/NewtonRewards
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AIVideoGeneration #PhysicsInAI #MachineLearning #GenerativeAI #ComputerVision
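As a rough illustration of what a verifiable, physics-grounded reward could look like, the sketch below scores a 1D tracked trajectory by how closely it follows constant acceleration. It is an assumption-laden toy, not the NewtonRewards formulation: the function names, the 1D position input, and the `1 / (1 + residual)` mapping are all hypothetical.

```python
def constant_acceleration_residual(positions, dt):
    """Mean absolute deviation of finite-difference acceleration from its average."""
    if len(positions) < 4:
        raise ValueError("need at least 4 positions to compare accelerations")
    # Second-order central difference approximates acceleration at each interior step.
    accels = [
        (positions[i + 1] - 2 * positions[i] + positions[i - 1]) / dt**2
        for i in range(1, len(positions) - 1)
    ]
    mean_a = sum(accels) / len(accels)
    return sum(abs(a - mean_a) for a in accels) / len(accels)


def kinematics_reward(positions, dt):
    """Map the residual to (0, 1]; 1.0 means perfectly constant acceleration."""
    return 1.0 / (1.0 + constant_acceleration_residual(positions, dt))


# Free fall sampled every 0.1 s: y = 0.5 * g * t^2 with g = 9.8 (illustrative data).
free_fall = [0.5 * 9.8 * (i * 0.1) ** 2 for i in range(10)]
# A jittering trajectory that no constant force would produce.
jerky = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]

good = kinematics_reward(free_fall, 0.1)
bad = kinematics_reward(jerky, 0.1)
```

The free-fall trajectory scores close to 1 and the jittering one scores close to 0, so the signal can be checked without a learned judge.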
✨SpeContext: Enabling Efficient Long-context Reasoning with Speculative Context Sparsity in LLMs
📝 Summary:
SpeContext uses a distilled language model for efficient long-context LLM reasoning. This system co-design significantly reduces parameters and improves throughput by up to 24.89x in cloud settings and 10.06x in edge settings, with minimal accuracy loss.
🔹 Publication Date: Published on Nov 30
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.00722
• PDF: https://arxiv.org/pdf/2512.00722
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#LLM #AIResearch #DeepLearning #AIOptimization #ContextSparsity
✨How Far Are We from Genuinely Useful Deep Research Agents?
📝 Summary:
The paper introduces FINDER, a benchmark for Deep Research Agents (DRAs) with human-curated tasks and structured metrics. It also presents DEFT, a failure taxonomy showing that DRAs struggle with evidence integration, verification, and resilient planning rather than with task comprehension.
🔹 Publication Date: Published on Dec 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01948
• PDF: https://arxiv.org/pdf/2512.01948
• Github: https://github.com/OPPO-PersonalAI/FINDER_DEFT
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#DeepResearchAgents #AIResearch #AIBenchmarking #FailureTaxonomy #ArtificialIntelligence
✨Rectifying LLM Thought from Lens of Optimization
📝 Summary:
RePro is a novel process-level reward mechanism that refines LLM reasoning by treating chain-of-thought as an optimization process. It uses dual scoring to generate a composite reward, integrated into RL pipelines to enhance performance and reduce suboptimal behaviors.
🔹 Publication Date: Published on Dec 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01925
• PDF: https://arxiv.org/pdf/2512.01925
• Github: https://github.com/open-compass/RePro
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#LLM #ReinforcementLearning #Optimization #ArtificialIntelligence #DeepLearning
✨VLASH: Real-Time VLAs via Future-State-Aware Asynchronous Inference
📝 Summary:
VLASH is an asynchronous inference framework for VLAs. It achieves fast, accurate, low-latency robotic control by estimating future robot states, which bridges the gap between prediction and execution. This enables VLAs to perform high-precision tasks like ping-pong with significant speedup and reduced latency.
🔹 Publication Date: Published on Nov 30
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01031
• PDF: https://arxiv.org/pdf/2512.01031
• Github: https://github.com/mit-han-lab/vlash
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#Robotics #VisionLanguageModels #RealTimeAI #AIResearch #MachineLearning
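The idea of estimating the future robot state can be sketched as follows. This is a toy under stated assumptions, not VLASH's implementation: `estimate_future_state` is a hypothetical helper, and it assumes actions are additive joint-position deltas and that inference latency is known in control steps.

```python
def estimate_future_state(current_state, pending_actions, latency_steps):
    """Roll the state forward through the actions that will run during inference."""
    state = list(current_state)
    # Only the first `latency_steps` queued actions execute before the
    # newly predicted action chunk can take effect.
    for action in pending_actions[:latency_steps]:
        state = [s + a for s, a in zip(state, action)]
    return state


# Two joints, three queued delta actions, inference takes 2 control steps.
current = [0.0, 1.0]
queued = [[0.5, 0.0], [0.5, -0.5], [0.25, 0.25]]
future = estimate_future_state(current, queued, latency_steps=2)
```

Conditioning the policy on `future` instead of `current` means the next prediction is made for the state the robot will actually be in when the prediction starts executing.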
✨TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models
📝 Summary:
TUNA is a unified multimodal model that builds a single continuous visual representation. This enables end-to-end understanding and generation, avoiding mismatches found in decoupled models and achieving state-of-the-art performance across multimodal tasks.
🔹 Publication Date: Published on Dec 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.02014
• PDF: https://arxiv.org/pdf/2512.02014
• Project Page: https://tuna-ai.org/
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#MultimodalAI #ComputerVision #DeepLearning #GenerativeAI #AIResearch
✨Lotus-2: Advancing Geometric Dense Prediction with Powerful Image Generative Model
📝 Summary:
Lotus-2 is a two-stage deterministic framework adapting powerful diffusion models for accurate geometric inference. It achieves top monocular depth and competitive surface normal prediction with very limited training data.
🔹 Publication Date: Published on Nov 30
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01030
• PDF: https://arxiv.org/pdf/2512.01030
• Project Page: https://lotus-2.github.io/
• Github: https://github.com/EnVision-Research/Lotus-2
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#ComputerVision #DeepLearning #DiffusionModels #GeometricPrediction #MonocularDepth
✨Generalist Large Language Models Outperform Clinical Tools on Medical Benchmarks
📝 Summary:
Generalist LLMs like GPT-5 outperformed specialized clinical AI tools on a medical benchmark. This reveals that clinical decision support tools may lag behind frontier models and need urgent, independent evaluation before clinical deployment.
🔹 Publication Date: Published on Dec 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01191
• PDF: https://arxiv.org/pdf/2512.01191
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#LLM #HealthcareAI #AIinMedicine #ClinicalAI #MedicalResearch
✨Doppler-Enhanced Deep Learning: Improving Thyroid Nodule Segmentation with YOLOv5 Instance Segmentation
📝 Summary:
YOLOv5 algorithms accurately segment thyroid nodules in ultrasound images. Incorporating Doppler images significantly improves segmentation performance across all models, offering a real-time solution for clinical diagnostics.
🔹 Publication Date: Published on Nov 29
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.00639
• PDF: https://arxiv.org/pdf/2512.00639
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#DeepLearning #MedicalImaging #ThyroidHealth #YOLOv5 #AIinHealthcare
✨Infinity-RoPE: Action-Controllable Infinite Video Generation Emerges From Autoregressive Self-Rollout
📝 Summary:
Infinity-RoPE is a new inference-time framework for autoregressive video diffusion, enabling continuous generation, fine-grained action control, and cinematic transitions without retraining. It addresses limitations like finite temporal horizons and slow prompt responsiveness, outperforming prior...
🔹 Publication Date: Published on Nov 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.20649
• PDF: https://arxiv.org/pdf/2511.20649
• Github: https://infinity-rope.github.io/
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#VideoGeneration #AI #DeepLearning #ComputerVision #DiffusionModels
✨Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights
📝 Summary:
Envision is a new benchmark for chained text-to-multi-image generation that assesses how well models capture dynamic causal processes and world knowledge. Unified multimodal models outperform specialized ones in causal coherence but still struggle with spatiotemporal consistency, due to static training.
🔹 Publication Date: Published on Dec 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01816
• PDF: https://arxiv.org/pdf/2512.01816
• Project Page: https://opendatalab-raiser.github.io/Envision/
• Github: https://github.com/opendatalab-raiser/Envision
✨ Datasets citing this paper:
• https://huggingface.co/datasets/opendatalab-raiser/Envision
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#MultimodalAI #CausalReasoning #AIBenchmarking #GenerativeAI #ComputerVision
✨The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment
📝 Summary:
ImageCritic corrects inconsistent fine-grained details in generated images using a reference-guided post-editing approach. It employs attention alignment loss and a detail encoder to precisely rectify inconsistencies and improve accuracy.
🔹 Publication Date: Published on Nov 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.20614
• PDF: https://arxiv.org/pdf/2511.20614
• Project Page: https://ouyangziheng.github.io/ImageCritic-Page/
• Github: https://github.com/HVision-NKU/ImageCritic
🔹 Models citing this paper:
• https://huggingface.co/ziheng1234/ImageCritic
✨ Datasets citing this paper:
• https://huggingface.co/datasets/ziheng1234/Critic-10K
✨ Spaces citing this paper:
• https://huggingface.co/spaces/ziheng1234/ImageCritic
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#ImageGeneration #ComputerVision #DeepLearning #AI #ImageEditing
✨HiconAgent: History Context-aware Policy Optimization for GUI Agents
📝 Summary:
HiconAgent introduces History Context-aware Policy Optimization (HCPO) for GUI agents. HCPO efficiently leverages historical context using dynamic sampling and compression, achieving better performance than larger models with reduced computational cost and significant speedups.
🔹 Publication Date: Published on Dec 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01763
• PDF: https://arxiv.org/pdf/2512.01763
• Github: https://github.com/JiuTian-VL/HiconAgent
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#HiconAgent #GUIAgents #AIResearch #ReinforcementLearning #ContextAwareAI
✨InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision
📝 Summary:
InternVideo-Next proposes a two-stage Encoder-Predictor-Decoder framework for general video representation learning without text supervision. It uses a conditional diffusion decoder to bridge pixel fidelity with semantics in Stage 1, then a latent world model in Stage 2 to learn world knowledge a...
🔹 Publication Date: Published on Dec 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01342
• PDF: https://arxiv.org/pdf/2512.01342
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#VideoFoundationModels #VideoAI #DeepLearning #UnsupervisedLearning #DiffusionModels
✨Seeing the Wind from a Falling Leaf
📝 Summary:
This paper presents an end-to-end differentiable inverse graphics framework that recovers invisible force representations from video observations. This innovation enables estimating physical forces, like wind from a falling leaf, leading to physics-based video generation and editing.
🔹 Publication Date: Published on Nov 30
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.00762
• PDF: https://arxiv.org/pdf/2512.00762
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#InverseGraphics #PhysicsAI #ComputerVision #VideoGeneration #DeepLearning
✨ChronosObserver: Taming 4D World with Hyperspace Diffusion Sampling
📝 Summary:
ChronosObserver generates high-fidelity, 3D-consistent, and time-synchronized multi-view videos. It is a training-free method leveraging World State Hyperspace and Hyperspace Guided Sampling to synchronize views. This approach overcomes challenges in 4D world generation without model training.
🔹 Publication Date: Published on Dec 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01481
• PDF: https://arxiv.org/pdf/2512.01481
• Project Page: https://icvteam.github.io/ChronosObserver.html
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#4DGeneration #DiffusionModels #ComputerVision #MultiViewVideo #AIResearch
✨WiseEdit: Benchmarking Cognition- and Creativity-Informed Image Editing
📝 Summary:
WiseEdit is a new benchmark for evaluating image editing models, focusing on cognition and creativity. It decomposes editing into Awareness, Interpretation, and Imagination tasks, assessing declarative, procedural, and metacognitive knowledge. This reveals limitations in current models.
🔹 Publication Date: Published on Nov 29
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.00387
• PDF: https://arxiv.org/pdf/2512.00387
• Project Page: https://qnancy.github.io/wiseedit_project_page/
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#ImageEditing #ComputerVision #AIResearch #CognitiveAI #CreativeAI