ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Title: ToolScope: An Agentic Framework for Vision-Guided and Long-Horizon Tool Use

📝 Summary:
ToolScope is an agentic framework for MLLMs that unifies global planning with local multimodal perception, using a specialized Perceive tool to manage visual context in long-horizon VQA tasks. It improves performance on VQA benchmarks by an average of 6.69%.

🔹 Publication Date: Published on Oct 31

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.27363
• PDF: https://arxiv.org/pdf/2510.27363
• Github: https://github.com/dengmengjie/ToolScope
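
A minimal illustrative sketch of the idea described above (not the authors' code): a global plan is drafted once, and at each step a local "Perceive" call refreshes only the visual evidence relevant to the current sub-goal instead of carrying the full visual history. `call_mllm` and `perceive` are hypothetical stand-ins for a real MLLM backend and the paper's Perceive tool.

```python
from typing import Callable, List

def perceive(image, query: str) -> str:
    """Hypothetical Perceive tool: re-inspects the image for the current
    sub-goal and returns a compact textual description."""
    return f"[visual evidence relevant to: {query}]"

def toolscope_agent(image, question: str, call_mllm: Callable[[str], str],
                    max_steps: int = 5) -> str:
    # Global planning: draft the whole plan once, up front.
    plan = call_mllm(f"Draft a step-by-step plan to answer: {question}")
    scratchpad: List[str] = [f"Plan: {plan}"]
    for _ in range(max_steps):
        sub_goal = call_mllm("Given the plan and notes so far, state the next sub-goal:\n"
                             + "\n".join(scratchpad))
        # Local multimodal perception: refresh visual context for this sub-goal only.
        scratchpad.append(perceive(image, sub_goal))
        answer = call_mllm("Answer the question if the notes suffice, otherwise reply CONTINUE:\n"
                           + "\n".join(scratchpad) + f"\nQuestion: {question}")
        if answer.strip() != "CONTINUE":
            return answer
    return call_mllm("Give your best final answer:\n" + "\n".join(scratchpad))
```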

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
Title: PHUMA: Physically-Grounded Humanoid Locomotion Dataset

📝 Summary:
PHUMA is a new dataset for humanoid locomotion, leveraging large-scale human video while eliminating physical artifacts. Through careful curation and physics-constrained retargeting, PHUMA provides reliable motions. Policies trained on PHUMA significantly outperform those trained on existing datasets in imitati...

🔹 Publication Date: Published on Oct 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.26236
• PDF: https://arxiv.org/pdf/2510.26236
• Project Page: https://davian-robotics.github.io/PHUMA/
• Github: https://github.com/davian-robotics/PHUMA

Datasets citing this paper:
https://huggingface.co/datasets/DAVIAN-Robotics/PHUMA

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
Title: MotionStream: Real-Time Video Generation with Interactive Motion Controls

📝 Summary:
MotionStream enables real-time video generation with interactive motion controls, achieving sub-second latency and 29 FPS streaming. It distills a motion-controlled text-to-video teacher into a causal student, using novel attention mechanisms for infinite-length, high-quality video.

🔹 Publication Date: Published on Nov 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.01266
• PDF: https://arxiv.org/pdf/2511.01266
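
A rough sketch of the streaming pattern the summary describes (a reading of the idea, not MotionStream's implementation): frames are generated causally in small chunks, conditioned on recent history and the latest interactive motion control, so they can be displayed as soon as they are produced. `student` and `get_motion_control` are hypothetical stand-ins; the distillation and attention details are omitted.

```python
def stream_video(student, get_motion_control, chunk_frames=8, context_frames=64):
    history = []                              # frames generated so far (causal context)
    while True:
        control = get_motion_control()        # e.g. the user's latest drag / trajectory input
        if control is None:                   # stop streaming when interaction ends
            break
        chunk = student(history[-context_frames:], control, num_frames=chunk_frames)
        history.extend(chunk)
        yield from chunk                      # frames are emitted as soon as they are ready
```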

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
Title: OpenSIR: Open-Ended Self-Improving Reasoner

📝 Summary:
OpenSIR is a self-play framework where LLMs improve reasoning by alternating teacher and student roles. It generates novel math problems without external supervision, optimizing for difficulty and diversity. This enables open-ended learning and significant performance gains on benchmarks.

🔹 Publication Date: Published on Nov 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.00602
• PDF: https://arxiv.org/pdf/2511.00602
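
A minimal sketch of one self-play round in the spirit of the summary (assumptions on my part, not the paper's implementation): the teacher role proposes a problem, the student role attempts it several times, and the reward combines difficulty (problems solved only about half the time) with diversity (dissimilarity from earlier problems). `generate`, `solve`, and `verify` are hypothetical LLM-backed callables.

```python
from difflib import SequenceMatcher

def self_play_round(generate, solve, verify, seen_problems,
                    n_attempts=8, target_solve_rate=0.5):
    # Teacher role: propose a novel problem, conditioned on recent ones.
    problem = generate("Propose a new math problem unlike these:\n"
                       + "\n".join(seen_problems[-5:]))
    # Student role: attempt the problem several times; verify checks each answer.
    solve_rate = sum(verify(problem, solve(problem)) for _ in range(n_attempts)) / n_attempts
    # Difficulty reward peaks when the student solves the problem about half the time.
    difficulty_reward = 1.0 - abs(solve_rate - target_solve_rate) / target_solve_rate
    # Diversity reward penalizes near-duplicates of earlier problems.
    max_sim = max((SequenceMatcher(None, problem, p).ratio() for p in seen_problems),
                  default=0.0)
    diversity_reward = 1.0 - max_sim
    seen_problems.append(problem)
    return problem, difficulty_reward + diversity_reward
```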

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
Title: NaviTrace: Evaluating Embodied Navigation of Vision-Language Models

📝 Summary:
NaviTrace is a new benchmark to evaluate vision-language models for robotic navigation using 2D trace prediction. It uses a semantic-aware score across diverse scenarios and embodiment types. VLMs consistently show poor spatial grounding and goal localization, falling short of human performance.

🔹 Publication Date: Published on Oct 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.26909
• PDF: https://arxiv.org/pdf/2510.26909
• Project Page: https://leggedrobotics.github.io/navitrace_webpage/
• Github: https://github.com/leggedrobotics/navitrace_evaluation

Datasets citing this paper:
https://huggingface.co/datasets/leggedrobotics/navitrace

Spaces citing this paper:
https://huggingface.co/spaces/leggedrobotics/navitrace_leaderboard
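
A hedged sketch of what a semantic-aware trace score could look like (my simplification for illustration, not the official NaviTrace metric): point-wise deviation between predicted and reference 2D traces, plus a penalty for points that land on untraversable pixels for the given embodiment.

```python
import numpy as np

def trace_score(pred: np.ndarray, ref: np.ndarray, traversable: np.ndarray,
                penalty: float = 50.0) -> float:
    """pred, ref: (n, 2) pixel traces (x, y); traversable: boolean mask of shape (H, W)."""
    n = min(len(pred), len(ref))
    dist = np.linalg.norm(pred[:n] - ref[:n], axis=1).mean()          # geometric deviation
    bad = sum(not traversable[int(y), int(x)] for x, y in pred[:n])   # semantically invalid points
    return float(dist + penalty * bad / n)                            # lower is better
```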

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
Title: Vote-in-Context: Turning VLMs into Zero-Shot Rank Fusers

📝 Summary:
Vote-in-Context (ViC) turns VLMs into zero-shot rank fusers and rerankers. It serializes content and retriever data into prompts, enabling adaptive reasoning. ViC achieves state-of-the-art zero-shot video retrieval, greatly surpassing baselines.

🔹 Publication Date: Published on Nov 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.01617
• PDF: https://arxiv.org/pdf/2511.01617
• Github: https://github.com/mohammad2012191/ViC
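
A small sketch of the serialization step described above (the prompt format is an assumption, not ViC's exact template): ranked candidate lists from several retrievers are serialized into one prompt, and the VLM is asked to emit a single fused ranking.

```python
def build_fusion_prompt(query: str, runs: dict, k: int = 10) -> str:
    """runs maps a retriever name to its ranked list of candidate IDs."""
    lines = [f"Query: {query}", "Candidate rankings from different retrievers:"]
    for retriever, ranked_ids in runs.items():
        lines.append(f"- {retriever}: " + ", ".join(ranked_ids[:k]))
    lines.append("Considering all lists (and the candidate videos you are shown), "
                 "output one fused ranking of IDs, best first.")
    return "\n".join(lines)

# Toy usage with two hypothetical retrievers:
runs = {"bm25": ["v3", "v1", "v7"], "clip": ["v1", "v9", "v3"]}
print(build_fusion_prompt("a dog catching a frisbee on a beach", runs))
```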

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
Title: Multi-Step Knowledge Interaction Analysis via Rank-2 Subspace Disentanglement

📝 Summary:
A novel rank-2 subspace disentangles Parametric and Context Knowledge in LLM multi-step explanations. It enables the first detailed analysis of how these knowledge types interact, showing distinct patterns in faithful versus hallucinated outputs.

🔹 Publication Date: Published on Nov 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.01706
• PDF: https://arxiv.org/pdf/2511.01706
• Github: https://github.com/copenlu/pk-ck-knowledge-disentanglement
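
An illustrative NumPy sketch of the general recipe (my reading of the idea, not the authors' code): fit a rank-2 subspace from hidden states of examples labelled as relying on parametric knowledge (PK) versus context knowledge (CK), then project new activations onto it to read off how much each component contributes.

```python
import numpy as np

def fit_rank2_subspace(h_pk: np.ndarray, h_ck: np.ndarray) -> np.ndarray:
    """h_pk, h_ck: (n_examples, hidden_dim) activations for PK- and CK-driven examples."""
    stacked = np.concatenate([h_pk - h_pk.mean(0), h_ck - h_ck.mean(0)], axis=0)
    # Top-2 right singular vectors span the rank-2 knowledge subspace.
    _, _, vt = np.linalg.svd(stacked, full_matrices=False)
    return vt[:2]                              # shape (2, hidden_dim)

def project(h: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Coordinates of activations h (n, hidden_dim) in the rank-2 subspace."""
    return h @ basis.T                         # shape (n, 2)

# Toy usage with random activations in place of real LLM hidden states:
rng = np.random.default_rng(0)
basis = fit_rank2_subspace(rng.normal(size=(64, 768)), rng.normal(size=(64, 768)))
print(project(rng.normal(size=(3, 768)), basis).shape)   # (3, 2)
```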

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
Title: MR-Align: Meta-Reasoning Informed Factuality Alignment for Large Reasoning Models

📝 Summary:
MR-ALIGN enhances factuality in large reasoning models by aligning their internal reasoning process. It addresses the 'reasoning-answer hit gap' by reinforcing beneficial thinking patterns and suppressing defective ones. This meta-reasoning approach improves accuracy and truthfulness by focusing ...

🔹 Publication Date: Published on Oct 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.24794
• PDF: https://arxiv.org/pdf/2510.24794
• Github: https://github.com/gudehhh666/MR-ALIGN

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
Title: EBT-Policy: Energy Unlocks Emergent Physical Reasoning Capabilities

📝 Summary:
EBT-Policy, an energy-based architecture, outperforms diffusion-based policies in robotics by offering improved robustness, vastly reduced computational cost, and emergent zero-shot recovery. It converges much faster and leverages scalar energy for uncertainty-aware inference.

🔹 Publication Date: Published on Oct 31

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.27545
• PDF: https://arxiv.org/pdf/2510.27545
• Project Page: https://energy-based-transformers.github.io/ebt-policy/
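
A minimal sketch of energy-based action selection (an assumption-laden simplification, using random candidate sampling where the paper may optimize differently): candidate actions are scored by a learned scalar energy E(observation, action), the minimizer is executed, and the energy value itself serves as an uncertainty signal.

```python
import numpy as np

def select_action(energy_fn, obs, action_dim=7, n_candidates=256, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, action_dim))
    energies = np.array([energy_fn(obs, a) for a in candidates])
    best = int(np.argmin(energies))
    # A high minimum energy suggests the policy is uncertain in this state.
    return candidates[best], float(energies[best])

# Toy energy function standing in for a trained energy-based transformer:
toy_energy = lambda obs, a: float(np.sum((a - obs[:a.shape[0]]) ** 2))
action, uncertainty = select_action(toy_energy, obs=np.zeros(7))
print(action.shape, uncertainty)
```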

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
Title: |↻BUS|: A Large and Diverse Multimodal Benchmark for Evaluating the Ability of Vision-Language Models to Understand Rebus Puzzles

📝 Summary:
A new benchmark, BUS, offers 1,333 diverse Rebus Puzzles challenging Vision-Language Models. The paper introduces RebusDescProgICE, a framework using structured reasoning and improved example selection, boosting VLM performance significantly.

🔹 Publication Date: Published on Nov 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.01340
• PDF: https://arxiv.org/pdf/2511.01340

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
A Guide to Writing a Scientific Paper

I. Title

Be Specific and Informative: The title should accurately reflect the content of the paper. Avoid overly broad or vague titles.
Include Keywords: Incorporate the most important keywords related to your research to improve searchability.
Be Concise: A good title is clear and to the point. Aim for a length that is descriptive but not overly long (typically under 15 words).

II. Abstract

A Standalone Summary: The abstract is a brief (usually 150-250 words), comprehensive summary of the entire paper. It is often the only part read, so it must be clear and complete.
Structure: It should contain four key elements:
Background/Objective: A brief sentence on the context and purpose of the study.
Methods: A concise description of the main experimental procedures.
Key Results: A summary of the most important findings.
Conclusion/Implications: A concluding statement on the significance of the results.

III. Keywords

For Indexing: Provide 3-5 keywords that are not already in the title. These help search engines and databases categorize your paper correctly.

IV. Introduction

The Funnel Structure: The introduction should move from a broad context to the specific focus of your paper.
Establish Context: Start by providing the necessary background information for a reader to understand the topic.
Identify the Gap: Clearly state the problem or gap in the existing research that your study aims to address. This is the "why" of your research.
State Your Objective: Clearly articulate your research question, hypothesis, or the specific goals of the paper.
Outline the Paper: Briefly mention the approach you took and outline the structure of the rest of the paper.

V. Materials and Methods

Enable Replicability: This section must be detailed enough for another researcher to replicate your study exactly.
Describe Materials/Participants: Specify all materials, equipment (including manufacturer), and the characteristics of your study participants or samples.
Detail the Procedure: Provide a step-by-step account of how you conducted the research. Use chronological order.
Explain Data Analysis: Describe the statistical methods or analytical techniques used to process the results. Justify your choice of methods.

VI. Results

Present, Don't Interpret: State your findings objectively and factually, without discussing their meaning or implications. This is the "what you found" section.
Use Visuals: Employ tables, figures, and graphs to present data clearly. Ensure all visuals are numbered and have descriptive captions.
Logical Flow: Organize the results in a logical sequence that tells a clear story, often corresponding to the order of the methods.
Reference Visuals in Text: Refer to every table and figure in the text, guiding the reader through the key data points (e.g., "As shown in Figure 1, there was a significant increase...").

VII. Discussion

Interpret Your Findings: This is the "so what?" section. Explain what your results mean.
Summarize Key Results: Begin with a brief summary of your most important findings.
Compare with Existing Literature: How do your results compare to previous studies mentioned in your introduction? Do they support, contradict, or add new dimensions to existing knowledge?
Acknowledge Limitations: Be honest about the limitations of your study. This shows critical awareness and strengthens your paper's credibility.
State Implications: Discuss the broader implications and significance of your work. What is its practical or theoretical importance?
Suggest Future Research: Propose specific directions for future studies that could build upon your findings.
VIII. Conclusion

A Strong Take-Home Message: Provide a short, powerful summary of the study's main conclusions.
Reiterate Significance: Briefly restate the importance of your work without simply repeating the abstract.
No New Information: Do not introduce any new data, ideas, or citations in the conclusion.

IX. Acknowledgments

Give Credit: Thank any individuals who provided technical help, intellectual contributions (but not enough for authorship), or feedback.
Disclose Funding: Acknowledge all funding sources, grants, and scholarships that supported the research.

X. References

Cite Everything: Ensure every source cited in the text is listed in the references section, and vice-versa.
Maintain Consistency: Use a single, consistent citation style throughout the paper (e.g., APA, MLA, IEEE, Vancouver) as required by the target journal or institution.
Use a Reference Manager: Tools like Zotero, Mendeley, or EndNote can help manage citations and prevent errors.

XI. General Writing Tips

Clarity and Conciseness: Use clear, direct language. Avoid jargon where possible and define it if necessary. Write short, focused sentences.
Voice and Tense:
Methods & Results: Use the past tense ("We collected...", "The temperature increased...").
Introduction & Discussion: Use a mix of present tense for established knowledge ("Diabetes is a chronic disease...") and past tense for referring to specific studies ("Smith (2020) found...").
Active Voice: Prefer active voice ("We conducted the experiment") over passive voice ("The experiment was conducted") for clarity, especially in the Introduction and Discussion.
Proofread Meticulously: Check for spelling, grammar, and punctuation errors. Read your paper aloud to catch awkward phrasing. Ask a colleague to review it before submission.
Avoid Plagiarism: Always give proper credit to the work and ideas of others through correct citation.

#AcademicWriting #ScientificPaper #Research #Publishing #GradSchool

━━━━━━━━━━━━━━━
By: @DataScienceT
Title: GUI-AIMA: Aligning Intrinsic Multimodal Attention with a Context Anchor for GUI Grounding

📝 Summary:
GUI-AIMA is a coordinate-free framework that improves GUI grounding by aligning MLLM attention with patch-wise signals. It's highly data-efficient, achieving state-of-the-art results among 3B models with minimal training. This approach effectively triggers MLLMs' native grounding capability.

🔹 Publication Date: Published on Nov 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.00810
• PDF: https://arxiv.org/pdf/2511.00810
• Project Page: https://github.com/sjz5202/GUI-AIMA
• Github: https://github.com/sjz5202/GUI-AIMA
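
A rough sketch of coordinate-free grounding from attention (an illustration under stated assumptions, not the GUI-AIMA code): instead of regressing pixel coordinates, read out the MLLM's attention mass from an anchor token to the image patches and take the strongest patch as the grounded GUI target.

```python
import numpy as np

def ground_from_attention(attn: np.ndarray, grid: tuple, screen: tuple) -> tuple:
    """attn: (n_patches,) attention mass from the anchor token to each patch;
    grid: (rows, cols) patch grid; screen: (width, height) in pixels."""
    rows, cols = grid
    idx = int(np.argmax(attn))
    r, c = divmod(idx, cols)
    # Centre of the winning patch, mapped back to screen coordinates.
    return (int((c + 0.5) * screen[0] / cols), int((r + 0.5) * screen[1] / rows))

# Toy usage with random attention over a 24x24 patch grid:
attn = np.random.default_rng(0).random(24 * 24)
print(ground_from_attention(attn, grid=(24, 24), screen=(1920, 1080)))
```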

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
🤖🧠 LongCat-Video: Meituan’s Groundbreaking Step Toward Efficient Long Video Generation with AI

🗓️ 04 Nov 2025
📚 AI News & Trends

In the rapidly advancing field of generative AI, the ability to create realistic, coherent, and high-quality videos from text or images has become one of the most sought-after goals. Meituan, one of the leading technology innovators in China, has made a remarkable stride in this domain with its latest open-source model — LongCat-Video. Designed as ...

#LongCatVideo #Meituan #GenerativeAI #VideoGeneration #AIInnovation #OpenSource
Title: AthenaBench: A Dynamic Benchmark for Evaluating LLMs in Cyber Threat Intelligence

📝 Summary:
AthenaBench enhances CTI LLM evaluation, revealing that current models, even top proprietary ones, have limited reasoning ability for tasks like threat attribution and risk mitigation. This highlights fundamental LLM weaknesses and the need for CTI-specific AI.

🔹 Publication Date: Published on Nov 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.01144
• PDF: https://arxiv.org/pdf/2511.01144
• Github: https://github.com/Athena-Software-Group/athenabench

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
Title: VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation

📝 Summary:
VCode introduces a benchmark to generate SVG code from images, preserving symbolic meaning. VCoder, an agentic framework, boosts VLM performance via revision and visual tools. This shows VLM limitations and the promise of symbolic visual representation.

🔹 Publication Date: Published on Nov 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.02778
• PDF: https://arxiv.org/pdf/2511.02778
• Project Page: https://csu-jpg.github.io/VCode/
• Github: https://github.com/CSU-JPG/VCode
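
A hedged sketch of an agentic generate-render-revise loop in the spirit of the VCoder framework mentioned above (the interfaces are hypothetical: `vlm`, `render_svg`, and `similarity` stand in for a real model, an SVG rasterizer, and a perceptual metric).

```python
def image_to_svg(image, vlm, render_svg, similarity, rounds=3, good_enough=0.9):
    svg = vlm("Write SVG code that reproduces this image symbolically.", image)
    for _ in range(rounds):
        rendering = render_svg(svg)
        score = similarity(image, rendering)
        if score >= good_enough:
            break
        # Feed the rendered result back so the model can revise its own code.
        svg = vlm("Your SVG rendered as shown. Revise the code to better match "
                  f"the target image (current similarity {score:.2f}).",
                  image, rendering, svg)
    return svg
```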

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
Title: D2D: Detector-to-Differentiable Critic for Improved Numeracy in Text-to-Image Generation

📝 Summary:
D2D transforms non-differentiable detectors into differentiable critics for text-to-image generation. This leverages their superior counting ability to greatly improve object numeracy, boosting accuracy with minimal impact on image quality.

🔹 Publication Date: Published on Oct 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.19278
• PDF: https://arxiv.org/pdf/2510.19278
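
A toy PyTorch sketch of the general recipe (the details are assumptions, not the D2D critic): turn per-object detector confidences into a soft, differentiable count and penalize its distance from the number stated in the prompt, so gradients can flow back into the generator.

```python
import torch

def soft_count_loss(logits: torch.Tensor, target_count: int,
                    threshold: float = 0.5, sharpness: float = 10.0) -> torch.Tensor:
    """logits: per-candidate detection scores for the prompted object class."""
    # A sigmoid around the detection threshold gives a differentiable
    # 'is an object' indicator; summing approximates the detector's hard count.
    soft_count = torch.sigmoid(sharpness * (logits.sigmoid() - threshold)).sum()
    return (soft_count - float(target_count)).abs()

# Toy usage: 20 candidate detections, prompt asks for exactly 4 objects.
logits = torch.randn(20, requires_grad=True)
loss = soft_count_loss(logits, target_count=4)
loss.backward()                     # gradients flow back toward the generator
print(float(loss), logits.grad.shape)
```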

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
Title: LiveSecBench: A Dynamic and Culturally-Relevant AI Safety Benchmark for LLMs in Chinese Context

📝 Summary:
LiveSecBench is a dynamic safety benchmark for Chinese LLMs, continuously updated to reflect new threats. It evaluates models across six critical dimensions tailored to Chinese legal and social frameworks. This benchmark offers a current landscape of AI safety in China, with a public leaderboard.

🔹 Publication Date: Published on Nov 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.02366
• PDF: https://arxiv.org/pdf/2511.02366
• Project Page: https://livesecbench.intokentech.cn/

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
Title: TWIST2: Scalable, Portable, and Holistic Humanoid Data Collection System

📝 Summary:
TWIST2 is a portable, mocap-free system for efficient humanoid data collection using VR and egocentric vision. It enables whole-body human-to-humanoid control and a hierarchical visuomotor policy for autonomous complex skills.

🔹 Publication Date: Published on Nov 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.02832
• PDF: https://arxiv.org/pdf/2511.02832
• Project Page: https://yanjieze.com/TWIST2/

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
Title: iFlyBot-VLA Technical Report

📝 Summary:
iFlyBot-VLA is a large VLA model that uses a latent action model and dual-level action representation. This enhances 3D perception and reasoning, achieving superior performance in diverse manipulation tasks.

🔹 Publication Date: Published on Nov 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.01914
• PDF: https://arxiv.org/pdf/2511.01914
• Project Page: https://xuwenjie401.github.io/iFlyBot-VLA.github.io/

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
Title: VidEmo: Affective-Tree Reasoning for Emotion-Centric Video Foundation Models

📝 Summary:
This paper introduces VidEmo, a new video emotion foundation model that uses an affective cues-guided reasoning framework. It is trained on the Emo-CFG dataset and achieves competitive performance in emotion understanding and face perception tasks.

🔹 Publication Date: Published on Nov 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.02712
• PDF: https://arxiv.org/pdf/2511.02712
• Project Page: https://zzcheng.top/VidEmo

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT