✨InstructMix2Mix: Consistent Sparse-View Editing Through Multi-View Model Personalization
📝 Summary:
InstructMix2Mix (I-Mix2Mix) improves multi-view image editing from sparse inputs, which often lack consistency. It distills a 2D diffusion model into a multi-view diffusion model, leveraging its 3D prior for cross-view coherence. This framework significantly enhances multi-view consistency and per-...
🔹 Publication Date: Published on Nov 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14899
• PDF: https://arxiv.org/pdf/2511.14899
• Project Page: https://danielgilo.github.io/instruct-mix2mix/
• Github: https://danielgilo.github.io/instruct-mix2mix/
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#MultiViewEditing #DiffusionModels #ComputerVision #3DVision #ImageSynthesis
✨Computer-Use Agents as Judges for Generative User Interface
📝 Summary:
This paper introduces a framework where Computer-Use Agents (CUA) act as judges for coding language models (Coder) to automatically design GUIs. The goal is to optimize interfaces for CUA efficiency and task solvability, rather than human aesthetics, using a new benchmark called AUI-Gym.
🔹 Publication Date: Published on Nov 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.15567
• PDF: https://arxiv.org/pdf/2511.15567
• Project Page: https://showlab.github.io/AUI/
• Github: https://github.com/showlab/AUI/
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AIAgents #GUIDesign #GenerativeAI #AIevaluation #LanguageModels
✨M3-Bench: Multi-Modal, Multi-Hop, Multi-Threaded Tool-Using MLLM Agent Benchmark
📝 Summary:
M3-Bench is a new benchmark evaluating multimodal LLM agent tool use in complex, multi-hop workflows requiring visual grounding and tool dependencies. It introduces a similarity-driven alignment method and interpretable metrics. Evaluations show significant gaps in current MLLMs, especially in ar...
🔹 Publication Date: Published on Nov 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.17729
• PDF: https://arxiv.org/pdf/2511.17729
• Github: https://github.com/EtaYang10th/Open-M3-Bench
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#MLLM #LLMAgents #AI #Benchmarking #ToolUse
✨General Agentic Memory Via Deep Research
📝 Summary:
GAM is a novel framework for AI memory addressing information loss in static systems. It uses just-in-time (JIT) principles with a memorizer and researcher to create optimized contexts at runtime. This improves memory efficiency and task completion, leveraging LLMs and reinforcement learning.
🔹 Publication Date: Published on Nov 23
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.18423
• PDF: https://arxiv.org/pdf/2511.18423
• Github: https://github.com/VectorSpaceLab/general-agentic-memory
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #LLMs #ReinforcementLearning #AIMemory #DeepLearning
✨In-Video Instructions: Visual Signals as Generative Control
📝 Summary:
This paper introduces In-Video Instruction for controllable image-to-video generation. It embeds visual signals like text or arrows directly into frames as instructions, offering precise, spatial-aware control over object actions. Experiments show video models reliably execute these visual cues.
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19401
• PDF: https://arxiv.org/pdf/2511.19401
• Project Page: https://fangggf.github.io/In-Video/
• Github: https://fangggf.github.io/In-Video/
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#VideoGeneration #GenerativeAI #ComputerVision #AIResearch #DeepLearning
✨AutoEnv: Automated Environments for Measuring Cross-Environment Agent Learning
📝 Summary:
AutoEnv and AutoEnv-36 provide a standardized framework and dataset for measuring cross-environment agent learning. Their evaluations show that fixed learning methods do not scale across diverse environments, highlighting current limitations in agent generalization.
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19304
• PDF: https://arxiv.org/pdf/2511.19304
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #MachineLearning #AgentLearning #Generalization #ReinforcementLearning
✨DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation
📝 Summary:
DeCo is a frequency-decoupled pixel diffusion framework that improves image generation by separating high-frequency details and low-frequency semantics. It uses a lightweight pixel decoder for details and a DiT for semantics, achieving superior efficiency and quality over existing pixel diffusion...
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19365
• PDF: https://arxiv.org/pdf/2511.19365
• Project Page: https://zehong-ma.github.io/DeCo/
• Github: https://github.com/Zehong-Ma/DeCo
🔹 Models citing this paper:
• https://huggingface.co/zehongma/DeCo
✨ Spaces citing this paper:
• https://huggingface.co/spaces/zehongma/DeCo
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#ImageGeneration #DiffusionModels #ComputerVision #DeepLearning #DeCo
✨Budget-Aware Tool-Use Enables Effective Agent Scaling
📝 Summary:
Tool-augmented agents struggle to scale with more tool calls due to a lack of budget awareness. This paper introduces Budget Tracker for continuous budget awareness and BATS for adaptive planning, dynamically adjusting strategy based on remaining resources. These methods significantly improve cos...
🔹 Publication Date: Published on Nov 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.17006
• PDF: https://arxiv.org/pdf/2511.17006
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AIAgents #ToolUse #ResourceManagement #AgentScaling #AIResearch
✨UltraFlux: Data-Model Co-Design for High-quality Native 4K Text-to-Image Generation across Diverse Aspect Ratios
📝 Summary:
UltraFlux overcomes diffusion transformer failures at 4K resolution and diverse aspect ratios through data-model co-design. It uses enhanced positional encoding, VAE improvements, gradient rebalancing, and aesthetic curriculum learning to achieve superior 4K text-to-image generation, outperformin...
🔹 Publication Date: Published on Nov 22
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.18050
• PDF: https://arxiv.org/pdf/2511.18050
• Project Page: https://github.com/W2GenAI-Lab/UltraFlux
• Github: https://github.com/W2GenAI-Lab/UltraFlux
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#TextToImage #GenerativeAI #4KGeneration #DiffusionModels #AIResearch
✨Controllable Layer Decomposition for Reversible Multi-Layer Image Generation
📝 Summary:
Controllable Layer Decomposition (CLD) enables fine-grained, controllable separation of raster images into editable RGBA layers, overcoming traditional compositing limitations. Using LD-DiT and MLCA, CLD surpasses existing methods in quality and control. It produces layers directly usable in design...
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16249
• PDF: https://arxiv.org/pdf/2511.16249
• Github: https://github.com/monkek123King/CLD
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#ImageGeneration #DeepLearning #ComputerVision #ImageEditing #LayerDecomposition
✨PRInTS: Reward Modeling for Long-Horizon Information Seeking
📝 Summary:
PRInTS is a generative process reward model that improves AI agents' information seeking. It provides dense scoring on step quality and summarizes long trajectories to manage context. PRInTS enhances agent performance, matching or surpassing frontier models with a smaller backbone.
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19314
• PDF: https://arxiv.org/pdf/2511.19314
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#RewardModeling #InformationSeeking #AIagents #GenerativeAI #MachineLearning
✨Plan-X: Instruct Video Generation via Semantic Planning
📝 Summary:
Plan-X improves instruction-aligned video generation by integrating a Semantic Planner with diffusion models. The planner generates semantic tokens that guide video synthesis, reducing visual hallucinations. This framework combines language models for reasoning with diffusion models for photoreal...
🔹 Publication Date: Published on Nov 22
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.17986
• PDF: https://arxiv.org/pdf/2511.17986
• Project Page: https://byteaigc.github.io/Plan-X/
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#VideoGeneration #DiffusionModels #AI #ComputerVision #DeepLearning
✨Target-Bench: Can World Models Achieve Mapless Path Planning with Semantic Targets?
📝 Summary:
Target-Bench evaluates world models for mapless robot path planning to semantic targets in real-world environments. It reveals off-the-shelf models perform poorly, but fine-tuning significantly improves their planning capability.
🔹 Publication Date: Published on Nov 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.17792
• PDF: https://arxiv.org/pdf/2511.17792
• Project Page: https://target-bench.github.io/
• Github: https://github.com/TUM-AVS/target-bench
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#Robotics #PathPlanning #WorldModels #ArtificialIntelligence #MachineLearning
✨SyncMV4D: Synchronized Multi-view Joint Diffusion of Appearance and Motion for Hand-Object Interaction Synthesis
📝 Summary:
SyncMV4D generates realistic and consistent multi-view 3D Hand-Object Interaction videos and 4D motions. It unifies visual priors, motion dynamics, and multi-view geometry, using a joint diffusion model and a point aligner for robust generation.
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19319
• PDF: https://arxiv.org/pdf/2511.19319
• Project Page: https://droliven.github.io/SyncMV4D/
• Github: https://droliven.github.io/SyncMV4D/
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#HandObjectInteraction #DiffusionModels #3DGeneration #ComputerVision #GenerativeAI
✨Continuous Thought Machines
📝 Summary:
The Continuous Thought Machine (CTM) reintroduces neural timing and synchronization to deep learning for complex sequential reasoning and biologically plausible AI. It uses neuron-level temporal processing and synchronization as a latent representation, performing well on diverse tasks with adaptiv...
🔹 Publication Date: Published on May 8
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2505.05522
• PDF: https://arxiv.org/pdf/2505.05522
• Github: https://github.com/SakanaAI/continuous-thought-machines
🔹 Models citing this paper:
• https://huggingface.co/SakanaAI/ctm-imagenet
• https://huggingface.co/SakanaAI/ctm-maze-large
✨ Spaces citing this paper:
• https://huggingface.co/spaces/Uday/ctm-energy-based-halting
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #DeepLearning #NeuralNetworks #BiologicallyInspiredAI #TemporalAI
✨LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer
📝 Summary:
LucidFlux is a caption-free universal image restoration framework using a large diffusion transformer. It employs a dual-branch conditioner and adaptive modulation for robust restoration, avoiding text prompts by using SigLIP features. This approach outperforms existing methods by intelligently c...
🔹 Publication Date: Published on Sep 26
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.22414
• PDF: https://arxiv.org/pdf/2509.22414
• Project Page: https://w2genai-lab.github.io/LucidFlux/
• Github: https://github.com/W2GenAI-Lab/LucidFlux
🔹 Models citing this paper:
• https://huggingface.co/W2GenAI/LucidFlux
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#ImageRestoration #DiffusionModels #ComputerVision #DeepLearning #GenerativeAI
✨Seeing the Forest and the Trees: Query-Aware Tokenizer for Long-Video Multimodal Language Models
📝 Summary:
QTSplus is a query-aware token selector for long-video multimodal language models. It dynamically selects the most important visual tokens based on a text query, significantly compressing vision data and reducing latency. This method maintains overall accuracy and enhances temporal understanding ...
🔹 Publication Date: Published on Nov 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11910
• PDF: https://arxiv.org/pdf/2511.11910
• Project Page: https://qtsplus.github.io/
• Github: https://github.com/Siyou-Li/QTSplus
🔹 Models citing this paper:
• https://huggingface.co/AlpachinoNLP/QTSplus-3B
• https://huggingface.co/AlpachinoNLP/QTSplus-3B-FT
✨ Spaces citing this paper:
• https://huggingface.co/spaces/AlpachinoNLP/QTSplus-3B
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#MultimodalAI #VideoAI #LLM #Tokenization #ComputerVision
✨DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research
📝 Summary:
RLER is introduced to train deep research models for long-form tasks using rubrics that co-evolve with the policy model, enabling DR Tulu-8B to outperform open models and match proprietary systems while being more cost-effective.
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19399
• PDF: https://arxiv.org/pdf/2511.19399
• Project Page: https://github.com/rlresearch/dr-tulu
• Github: https://github.com/rlresearch/dr-tulu
🔹 Models citing this paper:
• https://huggingface.co/rl-research/DR-Tulu-8B
• https://huggingface.co/rl-research/DR-Tulu-SFT-8B
✨ Datasets citing this paper:
• https://huggingface.co/datasets/rl-research/dr-tulu-sft-data
• https://huggingface.co/datasets/rl-research/dr-tulu-rl-data
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#ReinforcementLearning #LLMs #DeepLearning #AIResearch #MachineLearning
✨HunyuanVideo 1.5 Technical Report
📝 Summary:
HunyuanVideo 1.5 is a lightweight, open-source video generation model achieving state-of-the-art visual quality and motion coherence. It employs an advanced DiT architecture with SSTA and an efficient video super-resolution network, enabling high-quality video creation on consumer GPUs.
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.18870
• PDF: https://arxiv.org/pdf/2511.18870
• Github: https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#VideoGeneration #AI #DeepLearning #OpenSource #DiffusionModels