✨Over++: Generative Video Compositing for Layer Interaction Effects
📝 Summary:
Over++ introduces augmented compositing, a framework that generates realistic, text-prompted environmental effects for videos. It synthesizes effects like shadows onto video layers while preserving the original scene, outperforming prior methods without dense annotations.
🔹 Publication Date: Published on Dec 22
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.19661
• PDF: https://arxiv.org/pdf/2512.19661
• Project Page: https://overplusplus.github.io/
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#GenerativeAI #VideoCompositing #VFX #ComputerGraphics #AIResearch
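A note on the Over++ entry above: the name appears to nod to the classic "over" operator from alpha compositing, which layers a foreground onto a background but adds no interaction effects such as shadows. The sketch below shows only that plain operator for one straight-alpha RGB pixel. It is not the paper's generative method, and the function name `over` and the sample values are illustrative assumptions.

```python
def over(fg_rgb, fg_alpha, bg_rgb):
    """Composite one foreground pixel over an opaque background pixel.

    Straight (non-premultiplied) alpha: out = a * fg + (1 - a) * bg.
    """
    if not 0.0 <= fg_alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return tuple(fg_alpha * f + (1.0 - fg_alpha) * b for f, b in zip(fg_rgb, bg_rgb))


# A half-transparent red foreground over a blue background.
pixel = over((1.0, 0.0, 0.0), 0.5, (0.0, 0.0, 1.0))
print(pixel)
```

Nothing in this formula lets the foreground change the background around it, which is the gap the summary says augmented compositing targets.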
✨SecureCode v2.0: A Production-Grade Dataset for Training Security-Aware Code Generation Models
📝 Summary:
SecureCode v2.0 is a production-grade dataset of 1215 security-focused coding examples. It trains AI models to generate secure code by providing real-incident examples with vulnerable and secure implementations, attacks, defense, and operational security context across 11 languages, using a conve...
🔹 Publication Date: Published on Dec 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.18542
• PDF: https://arxiv.org/pdf/2512.18542
• Project Page: https://perfecxion.ai/
• Github: https://github.com/scthornton/securecode-v2
==================================
#Cybersecurity #CodeSecurity #AI #CodeGeneration #Dataset
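To make the "vulnerable and secure implementations" pairing in the SecureCode v2.0 entry concrete, here is a minimal sketch of what one such pair could look like for SQL injection, using Python's standard `sqlite3` module. It is not taken from the dataset; the table, function names, and attack string are assumptions chosen for illustration.

```python
import sqlite3


def find_user_vulnerable(conn, username):
    # VULNERABLE: user input is pasted straight into the SQL string.
    query = f"SELECT id, name FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()


def find_user_secure(conn, username):
    # SECURE: the driver binds the value, so it is never parsed as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (username,)
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

attack = "nobody' OR '1'='1"
print(len(find_user_vulnerable(conn, attack)))  # every row leaks
print(len(find_user_secure(conn, attack)))      # no row matches
```

The attack string closes the quoted literal and appends a condition that is always true, so only the parameterized version treats it as plain data.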
✨Step-DeepResearch Technical Report
📝 Summary:
Step-DeepResearch is an end-to-end agent for deep research, using a data synthesis strategy and progressive training. It achieves expert-level capabilities, outperforming existing models and rivaling SOTA closed-source models with cost-efficiency. It also introduces ADR-Bench for realistic Chines...
🔹 Publication Date: Published on Dec 23
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.20491
• PDF: https://arxiv.org/pdf/2512.20491
==================================
#AI #MachineLearning #DeepResearch #AIagent #SOTA
✨Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies
📝 Summary:
This paper decomposes LLM policies into internal layer and modular policies, revealing distinct reasoning patterns across layers. It finds that early layers explore while top layers refine. Motivated by this, the authors propose Bottom-up Policy Optimization (BuPO) to optimize internal layer policies for superior...
🔹 Publication Date: Published on Dec 22
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.19673
• PDF: https://arxiv.org/pdf/2512.19673
==================================
#LLM #PolicyOptimization #DeepLearning #AIResearch #NLP
✨SAM Audio: Segment Anything in Audio
📝 Summary:
SAM Audio is a foundation model for general audio separation. It unifies text, visual, and temporal span prompts, achieving state-of-the-art performance across diverse audio types. It also introduces a new real-world separation benchmark.
🔹 Publication Date: Published on Dec 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.18099
• PDF: https://arxiv.org/pdf/2512.18099
• Project Page: https://ai.meta.com/samaudio/
• Github: https://github.com/facebookresearch/sam-audio
🔹 Models citing this paper:
• https://huggingface.co/facebook/sam-audio-large
• https://huggingface.co/facebook/sam-audio-small
• https://huggingface.co/facebook/sam-audio-base
🔹 Spaces citing this paper:
• https://huggingface.co/spaces/lpeterl/sam-audio-webui
• https://huggingface.co/spaces/Arrcttacsrks/SAM-Audio-Demo
• https://huggingface.co/spaces/chippie1/SAM-Audio-Demo
==================================
#AudioSeparation #FoundationModels #AI #DeepLearning #SAMAudio
✨QuantiPhy: A Quantitative Benchmark Evaluating Physical Reasoning Abilities of Vision-Language Models
📝 Summary:
QuantiPhy is a benchmark that quantitatively assesses state-of-the-art vision perception models' ability to reason about physical properties such as size, velocity, and acceleration from video observa...
🔹 Publication Date: Published on Dec 22
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.19526
• PDF: https://arxiv.org/pdf/2512.19526
🔹 Datasets citing this paper:
• https://huggingface.co/datasets/PaulineLi/QuantiPhy-validation
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
📝 Summary:
GLM-4.5, a Mixture-of-Experts large language model with 355B parameters, achieves strong performance across agentic, reasoning, and coding tasks using multi-stage training and reinforcement learning. ...
🔹 Publication Date: Published on Aug 8
🔹 Paper Links:
• arXiv Page: https://arxivlens.com/PaperView/Details/glm-4-5-agentic-reasoning-and-coding-arc-foundation-models-126-7b914dd8
• PDF: https://arxiv.org/pdf/2508.06471
• Github: https://github.com/zai-org/GLM-4.5
🔹 Models citing this paper:
• https://huggingface.co/zai-org/GLM-4.5
• https://huggingface.co/zai-org/GLM-4.6
• https://huggingface.co/zai-org/GLM-4.5-Air
🔹 Spaces citing this paper:
• https://huggingface.co/spaces/enzostvs/deepsite
• https://huggingface.co/spaces/akhaliq/anycoder
• https://huggingface.co/spaces/hadadxyz/ai
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
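The GLM-4.5 entry describes a Mixture-of-Experts model, where a router sends each token to only a few experts so that just part of the total parameter count is active at once. The sketch below shows generic top-k gating with toy scalar "experts". GLM-4.5's actual router is not described in the summary, so the function names, the choice of k=2, and the softmax renormalization are assumptions.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, router_logits, experts, k=2):
    """Route input x to the top-k experts and mix their outputs."""
    # Pick the k experts with the highest router scores (ties keep index order).
    top = sorted(range(len(experts)), key=lambda i: router_logits[i], reverse=True)[:k]
    # Renormalize the router weights over the selected experts only.
    weights = softmax([router_logits[i] for i in top])
    # Only the selected experts run; the rest stay idle for this input.
    return sum(w * experts[i](x) for w, i in zip(weights, top)), top


# Four toy experts standing in for feed-forward blocks (hypothetical).
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
output, chosen = moe_forward(3.0, [0.1, 2.0, -1.0, 2.0], experts, k=2)
print(chosen, output)
```

With four experts and k=2, half of the expert parameters are skipped for this input, which is the basic reason an MoE model can be large in total yet cheaper per token.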
✨SpatialTree: How Spatial Abilities Branch Out in MLLMs
📝 Summary:
SpatialTree introduces a 4-level cognitive hierarchy and benchmark for evaluating MLLM spatial abilities. It reveals distinct skill dependencies and strong cross-level transfer from low to high-level abilities. A novel auto-think strategy consistently enhances performance across all spatial levels.
🔹 Publication Date: Published on Dec 23
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.20617
• PDF: https://arxiv.org/pdf/2512.20617
• Project Page: https://spatialtree.github.io/
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨SemanticGen: Video Generation in Semantic Space
📝 Summary:
SemanticGen addresses slow convergence and computational costs in video generation by using a two-stage diffusion model approach that first generates semantic features and then VAE latents, leading to...
🔹 Publication Date: Published on Dec 23
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.20619
• PDF: https://arxiv.org/pdf/2512.20619
• Project Page: https://jianhongbai.github.io/SemanticGen/
• Github: https://jianhongbai.github.io/SemanticGen/
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Reinforcement Learning for Self-Improving Agent with Skill Library
📝 Summary:
A novel RL framework, SAGE, enhances LLM-based agents' self-improvement capabilities by systematically incorporating skills from a skill library, leading to better performance and efficiency in new en...
🔹 Publication Date: Published on Dec 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17102
• PDF: https://arxiv.org/pdf/2512.17102
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Active Intelligence in Video Avatars via Closed-loop World Modeling
📝 Summary:
Video avatars currently lack agency for autonomous goal pursuit. ORCA introduces a framework for active intelligence, using a closed-loop Observe-Think-Act-Reflect cycle and a dual-system architecture for strategic reasoning and action. It enables robust, goal-directed task completion, transformi...
🔹 Publication Date: Published on Dec 23
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.20615
• PDF: https://arxiv.org/pdf/2512.20615
• Project Page: https://xuanhuahe.github.io/ORCA/
• Github: https://xuanhuahe.github.io/ORCA/
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
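The closed-loop Observe-Think-Act-Reflect cycle named in the ORCA entry can be pictured with a toy control loop. The sketch below moves an integer state toward a goal in an environment where some actions silently fail, and uses the reflect step to notice the mismatch. It is only an illustration of the loop shape: the environment, the function names, and the failure rule are all hypothetical, and the dual-system architecture from the summary is not modeled.

```python
def run_loop(env_step, state, goal, max_steps=20):
    """Drive `state` toward `goal`, logging whether each action took effect."""
    log = []
    for _ in range(max_steps):
        observed = state                       # Observe the current state
        if observed == goal:
            break
        action = 1 if observed < goal else -1  # Think: choose a direction
        expected = observed + action           # predicted outcome of the action
        state = env_step(state, action)        # Act in the environment
        succeeded = state == expected          # Reflect: compare outcome to prediction
        log.append((observed, action, succeeded))
    return state, log


def flaky_env():
    """Hypothetical environment in which every third action has no effect."""
    calls = {"n": 0}

    def step(state, action):
        calls["n"] += 1
        return state if calls["n"] % 3 == 0 else state + action

    return step


final_state, log = run_loop(flaky_env(), state=0, goal=4)
failures = sum(1 for *_, ok in log if not ok)
print(final_state, failures)
```

Because the loop re-observes after every action, a failed step is simply retried on the next pass instead of derailing an open-loop plan.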
✨FaithLens: Detecting and Explaining Faithfulness Hallucination
📝 Summary:
FaithLens is a cost-efficient model for detecting and explaining faithfulness hallucinations in LLM outputs. It uses synthesized training data and rule-based reinforcement learning. FaithLens outperforms advanced models like GPT-4.1 on 12 tasks while providing high-quality explanations.
🔹 Publication Date: Published on Dec 23
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.20182
• PDF: https://arxiv.org/pdf/2512.20182
• Github: https://github.com/S1s-Z/FaithLens
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Multi-LLM Thematic Analysis with Dual Reliability Metrics: Combining Cohen's Kappa and Semantic Similarity for Qualitative Research Validation
📝 Summary:
A multi-perspective validation framework using LLMs for thematic analysis combines ensemble validation with Cohen's Kappa and cosine similarity to enhance reliability and extract consensus themes from...
🔹 Publication Date: Published on Dec 23
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.20352
• PDF: https://arxiv.org/pdf/2512.20352
• Project Page: https://azalab-llm-tool.vercel.app/
• Github: https://github.com/NileshArnaiya/LLM-Thematic-Analysis-Tool
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
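The two reliability metrics named in the Multi-LLM Thematic Analysis entry are standard and easy to compute by hand. The sketch below implements Cohen's Kappa for two label sequences and cosine similarity for two vectors in plain Python. How the paper combines them is not stated in the summary, so the sample theme labels and vectors here are invented placeholders.

```python
import math
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each rater's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


# Theme codes assigned by two hypothetical LLM runs to the same four excerpts.
run_a = ["cost", "trust", "cost", "access"]
run_b = ["cost", "trust", "access", "access"]
print(round(cohens_kappa(run_a, run_b), 3))
print(cosine_similarity([1.0, 0.0, 1.0], [1.0, 1.0, 0.0]))
```

Kappa corrects raw agreement for chance, while cosine similarity can compare embeddings of theme descriptions that are worded differently, so the two measure different kinds of consistency.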
✨INTELLECT-3: Technical Report
📝 Summary:
INTELLECT-3, a large Mixture-of-Experts model trained with reinforcement learning, achieves top performance across various benchmarks and is supported by an open-source RL infrastructure framework. AI...
🔹 Publication Date: Published on Dec 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16144
• PDF: https://arxiv.org/pdf/2512.16144
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨MemEvolve: Meta-Evolution of Agent Memory Systems
📝 Summary:
MemEvolve, a meta-evolutionary framework, enhances self-evolving memory systems by jointly evolving agents' experiential knowledge and memory architecture, leading to improved performance and generali...
🔹 Publication Date: Published on Dec 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.18746
• PDF: https://arxiv.org/pdf/2512.18746
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨LongVideoAgent: Multi-Agent Reasoning with Long Videos
📝 Summary:
A multi-agent framework with a master LLM, grounding agent, and vision agent enhances long-video QA by improving temporal grounding and extracting visual details. This RL-trained system outperforms non-agent baselines on new datasets.
🔹 Publication Date: Published on Dec 23
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.20618
• PDF: https://arxiv.org/pdf/2512.20618
• Github: https://longvideoagent.github.io/
==================================
#MultiAgentSystems #LLM #VideoUnderstanding #ComputerVision #AI
✨Toxicity Ahead: Forecasting Conversational Derailment on GitHub
📝 Summary:
A novel LLM framework uses a two-step prompting pipeline to predict conversational derailment on GitHub. It generates Summaries of Conversation Dynamics to forecast toxicity, achieving high F1-scores and outperforming baselines for proactive moderation.
🔹 Publication Date: Published on Dec 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.15031
• PDF: https://arxiv.org/pdf/2512.15031
==================================
#LLM #ToxicityDetection #ContentModeration #GitHub #MachineLearning
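The two-step prompting pipeline in the Toxicity Ahead entry (summarize the conversation dynamics first, then forecast from that summary) can be sketched as below. The prompt wording, the function names, and the `fake_llm` stand-in are all assumptions made so the example runs offline. The paper's real prompts and model are not given in the summary.

```python
def forecast_derailment(comments, llm):
    """Two-step prompting: summarize the dynamics, then classify the summary.

    `llm` is any callable that maps a prompt string to a response string.
    """
    thread = "\n".join(f"[{i}] {c}" for i, c in enumerate(comments, 1))
    # Step 1: ask for a summary of how the conversation is evolving.
    summary = llm(
        "Summarize the conversation dynamics (tone, tension, disagreements):\n" + thread
    )
    # Step 2: forecast from the summary alone, not from the raw thread.
    verdict = llm(
        "Given this summary, will the conversation turn toxic? Answer yes or no.\n" + summary
    )
    return summary, verdict.strip().lower().startswith("yes")


def fake_llm(prompt):
    # Stand-in model (hypothetical); swap in a real client to use this for real.
    if prompt.startswith("Summarize"):
        return "Tension is rising; replies are increasingly dismissive."
    return "Yes" if "Tension is rising" in prompt else "No"


summary, will_derail = forecast_derailment(
    ["This patch breaks the build.", "Did you even read the docs?"], fake_llm
)
print(will_derail)
```

Splitting the task this way means the second call sees a short description of the thread's trajectory, and the intermediate summary can be shown to a moderator as the reason for the forecast.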
✨Simulstream: Open-Source Toolkit for Evaluation and Demonstration of Streaming Speech-to-Text Translation Systems
📝 Summary:
Simulstream is an open-source toolkit for evaluating and demonstrating streaming speech-to-text translation. It supports long-form audio, incremental decoding, and re-translation, plus offers an interactive demo interface.
🔹 Publication Date: Published on Dec 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17648
• PDF: https://arxiv.org/pdf/2512.17648
• Project Page: https://pypi.org/project/simulstream/
==================================
#SpeechToText #MachineTranslation #NLP #OpenSource #StreamingAI
✨Scaling Laws for Code: Every Programming Language Matters
📝 Summary:
This paper explores scaling laws for multilingual code pre-training, finding interpreted languages benefit more from scaling. It proposes an optimal token allocation strategy for programming languages based on utility and synergy, outperforming uniform distribution.
🔹 Publication Date: Published on Dec 15
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.13472
• PDF: https://arxiv.org/pdf/2512.13472
==================================
#CodeAI #MachineLearning #ProgrammingLanguages #ScalingLaws #LLMs
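To picture what a non-uniform token allocation across programming languages means in the Scaling Laws for Code entry, the sketch below splits a fixed token budget in proportion to per-language weights and compares it with a uniform split. The weights are invented placeholders, not results from the paper, and the proportional rule is an assumption. The summary does not say how utility and synergy are turned into an allocation.

```python
def allocate_tokens(total_tokens, weights):
    """Split a token budget across languages in proportion to their weights."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return {lang: total_tokens * w / total_weight for lang, w in weights.items()}


# Placeholder weights (hypothetical): higher means more benefit per token.
weights = {"python": 3.0, "rust": 1.0, "java": 1.0}
weighted = allocate_tokens(1000, weights)
uniform = allocate_tokens(1000, {lang: 1.0 for lang in weights})
print(weighted)
print(uniform)
```

Both splits spend the same total budget, so any gain from the weighted version comes purely from where the tokens go.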