ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Title: NaviTrace: Evaluating Embodied Navigation of Vision-Language Models

📝 Summary:
NaviTrace is a new benchmark to evaluate vision-language models for robotic navigation using 2D trace prediction. It uses a semantic-aware score across diverse scenarios and embodiment types. VLMs consistently show poor spatial grounding and goal localization, falling short of human performance.

🔹 Publication Date: Published on Oct 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.26909
• PDF: https://arxiv.org/pdf/2510.26909
• Project Page: https://leggedrobotics.github.io/navitrace_webpage/
• Github: https://github.com/leggedrobotics/navitrace_evaluation

Datasets citing this paper:
https://huggingface.co/datasets/leggedrobotics/navitrace

Spaces citing this paper:
https://huggingface.co/spaces/leggedrobotics/navitrace_leaderboard
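
🔹 Example: Loading the Dataset
A minimal sketch (not from the paper) for pulling the NaviTrace dataset from the Hugging Face Hub with the datasets library. The default configuration and the split/field names are assumptions, so check the dataset card linked above.

from datasets import load_dataset

# Minimal sketch: load NaviTrace from the Hugging Face Hub.
# Assumes the default configuration; split and field names may differ
# from the actual dataset card, so inspect the printed structure first.
ds = load_dataset("leggedrobotics/navitrace")
print(ds)  # shows the available splits and their features

first_split = list(ds.keys())[0]
example = next(iter(ds[first_split]))
print(example.keys())  # fields of a single evaluation example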

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
Title: Vote-in-Context: Turning VLMs into Zero-Shot Rank Fusers

📝 Summary:
Vote-in-Context (ViC) turns VLMs into zero-shot rank fusers and rerankers. It serializes content and retriever data into prompts, enabling adaptive reasoning. ViC achieves state-of-the-art zero-shot video retrieval, greatly surpassing baselines.

🔹 Publication Date: Published on Nov 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.01617
• PDF: https://arxiv.org/pdf/2511.01617
• Github: https://github.com/mohammad2012191/ViC

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
Title: Multi-Step Knowledge Interaction Analysis via Rank-2 Subspace Disentanglement

📝 Summary:
A novel rank-2 subspace disentangles Parametric and Context Knowledge in LLM multi-step explanations. It enables the first detailed analysis of how these knowledge types interact, showing distinct patterns in faithful versus hallucinated outputs.

🔹 Publication Date: Published on Nov 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.01706
• PDF: https://arxiv.org/pdf/2511.01706
• Github: https://github.com/copenlu/pk-ck-knowledge-disentanglement

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
Title: MR-Align: Meta-Reasoning Informed Factuality Alignment for Large Reasoning Models

📝 Summary:
MR-ALIGN enhances factuality in large reasoning models by aligning their internal reasoning process. It addresses the 'reasoning-answer hit gap' by reinforcing beneficial thinking patterns and suppressing defective ones. This meta-reasoning approach improves accuracy and truthfulness by focusing ...

🔹 Publication Date: Published on Oct 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.24794
• PDF: https://arxiv.org/pdf/2510.24794
• Github: https://github.com/gudehhh666/MR-ALIGN

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
Title: EBT-Policy: Energy Unlocks Emergent Physical Reasoning Capabilities

📝 Summary:
EBT-Policy, an energy-based architecture, outperforms diffusion-based policies in robotics by offering improved robustness, vastly reduced computational cost, and emergent zero-shot recovery. It converges much faster and leverages scalar energy for uncertainty-aware inference.

🔹 Publication Date: Published on Oct 31

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.27545
• PDF: https://arxiv.org/pdf/2510.27545
• Project Page: https://energy-based-transformers.github.io/ebt-policy/

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
Title: left|,circlearrowright,text{BUS},right|: A Large and Diverse Multimodal Benchmark for evaluating the ability of Vision-Language Models to understand Rebus Puzzles

📝 Summary:
A new benchmark, BUS, offers 1,333 diverse Rebus Puzzles challenging Vision-Language Models. The paper introduces RebusDescProgICE, a framework using structured reasoning and improved example selection, boosting VLM performance significantly.

🔹 Publication Date: Published on Nov 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.01340
• PDF: https://arxiv.org/pdf/2511.01340

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
A Guide to Writing a Scientific Paper

I. Title

Be Specific and Informative: The title should accurately reflect the content of the paper. Avoid overly broad or vague titles.
Include Keywords: Incorporate the most important keywords related to your research to improve searchability.
Be Concise: A good title is clear and to the point. Aim for a length that is descriptive but not overly long (typically under 15 words).

II. Abstract

A Standalone Summary: The abstract is a brief (usually 150-250 words), comprehensive summary of the entire paper. It is often the only part read, so it must be clear and complete.
Structure: It should contain four key elements:
Background/Objective: A brief sentence on the context and purpose of the study.
Methods: A concise description of the main experimental procedures.
Key Results: A summary of the most important findings.
Conclusion/Implications: A concluding statement on the significance of the results.

III. Keywords

For Indexing: Provide 3-5 keywords that are not already in the title. These help search engines and databases categorize your paper correctly.

IV. Introduction

The Funnel Structure: The introduction should move from a broad context to the specific focus of your paper.
Establish Context: Start by providing the necessary background information for a reader to understand the topic.
Identify the Gap: Clearly state the problem or gap in the existing research that your study aims to address. This is the "why" of your research.
State Your Objective: Clearly articulate your research question, hypothesis, or the specific goals of the paper.
Outline the Paper: Briefly mention the approach you took and outline the structure of the rest of the paper.

V. Materials and Methods

Enable Replicability: This section must be detailed enough for another researcher to replicate your study exactly.
Describe Materials/Participants: Specify all materials, equipment (including manufacturer), and the characteristics of your study participants or samples.
Detail the Procedure: Provide a step-by-step account of how you conducted the research. Use chronological order.
Explain Data Analysis: Describe the statistical methods or analytical techniques used to process the results. Justify your choice of methods.

VI. Results

Present, Don't Interpret: State your findings objectively and factually, without discussing their meaning or implications. This is the "what you found" section.
Use Visuals: Employ tables, figures, and graphs to present data clearly. Ensure all visuals are numbered and have descriptive captions.
Logical Flow: Organize the results in a logical sequence that tells a clear story, often corresponding to the order of the methods.
Reference Visuals in Text: Refer to every table and figure in the text, guiding the reader through the key data points (e.g., "As shown in Figure 1, there was a significant increase...").

VII. Discussion

Interpret Your Findings: This is the "so what?" section. Explain what your results mean.
Summarize Key Results: Begin with a brief summary of your most important findings.
Compare with Existing Literature: How do your results compare to previous studies mentioned in your introduction? Do they support, contradict, or add new dimensions to existing knowledge?
Acknowledge Limitations: Be honest about the limitations of your study. This shows critical awareness and strengthens your paper's credibility.
State Implications: Discuss the broader implications and significance of your work. What is its practical or theoretical importance?
Suggest Future Research: Propose specific directions for future studies that could build upon your findings.
VIII. Conclusion

A Strong Take-Home Message: Provide a short, powerful summary of the study's main conclusions.
Reiterate Significance: Briefly restate the importance of your work without simply repeating the abstract.
No New Information: Do not introduce any new data, ideas, or citations in the conclusion.

IX. Acknowledgments

Give Credit: Thank any individuals who provided technical help, intellectual contributions (but not enough for authorship), or feedback.
Disclose Funding: Acknowledge all funding sources, grants, and scholarships that supported the research.

X. References

Cite Everything: Ensure every source cited in the text is listed in the references section, and vice-versa (see the sketch after this list for a quick automated check).
Maintain Consistency: Use a single, consistent citation style throughout the paper (e.g., APA, MLA, IEEE, Vancouver) as required by the target journal or institution.
Use a Reference Manager: Tools like Zotero, Mendeley, or EndNote can help manage citations and prevent errors.
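
A hedged example of the "cite everything, and vice-versa" check: the short Python sketch below cross-references \cite commands in a LaTeX manuscript against the entry keys in a BibTeX file. The file names paper.tex and refs.bib are placeholders, and the regexes only cover plain \cite, \citet, and \citep commands, so treat it as a starting point rather than a complete validator.

import re

# Placeholder file names; replace with your manuscript and bibliography.
tex = open("paper.tex", encoding="utf-8").read()
bib = open("refs.bib", encoding="utf-8").read()

# Collect keys used in \cite{...}, \citet{...}, and \citep{...} commands.
cited = set()
for group in re.findall(r"\\cite[tp]?\{([^}]*)\}", tex):
    cited.update(key.strip() for key in group.split(","))

# Collect keys defined in the BibTeX file (e.g. @article{smith2020, ...).
defined = set(m.strip() for m in re.findall(r"@\w+\{([^,\s]+),", bib))

print("Cited in text but missing from refs.bib:", sorted(cited - defined))
print("Defined in refs.bib but never cited:", sorted(defined - cited))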

XI. General Writing Tips

Clarity and Conciseness: Use clear, direct language. Avoid jargon where possible and define it if necessary. Write short, focused sentences.
Voice and Tense:
Methods & Results: Use the past tense ("We collected...", "The temperature increased...").
Introduction & Discussion: Use a mix of present tense for established knowledge ("Diabetes is a chronic disease...") and past tense for referring to specific studies ("Smith (2020) found...").
Active Voice: Prefer active voice ("We conducted the experiment") over passive voice ("The experiment was conducted") for clarity, especially in the Introduction and Discussion.
Proofread Meticulously: Check for spelling, grammar, and punctuation errors. Read your paper aloud to catch awkward phrasing. Ask a colleague to review it before submission.
Avoid Plagiarism: Always give proper credit to the work and ideas of others through correct citation.

#AcademicWriting #ScientificPaper #Research #Publishing #GradSchool

━━━━━━━━━━━━━━━
By: @DataScienceT
Title: GUI-AIMA: Aligning Intrinsic Multimodal Attention with a Context Anchor for GUI Grounding

📝 Summary:
GUI-AIMA is a coordinate-free framework that improves GUI grounding by aligning MLLM attention with patch-wise signals. It's highly data-efficient, achieving state-of-the-art results among 3B models with minimal training. This approach effectively triggers MLLMs' native grounding capability.

🔹 Publication Date: Published on Nov 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.00810
• PDF: https://arxiv.org/pdf/2511.00810
• Project Page: https://github.com/sjz5202/GUI-AIMA
• Github: https://github.com/sjz5202/GUI-AIMA

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
🤖🧠 LongCat-Video: Meituan’s Groundbreaking Step Toward Efficient Long Video Generation with AI

🗓️ 04 Nov 2025
📚 AI News & Trends

In the rapidly advancing field of generative AI, the ability to create realistic, coherent, and high-quality videos from text or images has become one of the most sought-after goals. Meituan, one of the leading technology innovators in China, has made a remarkable stride in this domain with its latest open-source model — LongCat-Video. Designed as ...

#LongCatVideo #Meituan #GenerativeAI #VideoGeneration #AIInnovation #OpenSource
Title: AthenaBench: A Dynamic Benchmark for Evaluating LLMs in Cyber Threat Intelligence

📝 Summary:
AthenaBench enhances CTI LLM evaluation, revealing that current models, even top proprietary ones, have limited reasoning for tasks like threat attribution and risk mitigation. This highlights fundamental LLM weaknesses and the need for CTI-specific AI.

🔹 Publication Date: Published on Nov 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.01144
• PDF: https://arxiv.org/pdf/2511.01144
• Github: https://github.com/Athena-Software-Group/athenabench

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
Title: VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation

📝 Summary:
VCode introduces a benchmark to generate SVG code from images, preserving symbolic meaning. VCoder, an agentic framework, boosts VLM performance via revision and visual tools. This shows VLM limitations and the promise of symbolic visual representation.

🔹 Publication Date: Published on Nov 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.02778
• PDF: https://arxiv.org/pdf/2511.02778
• Project Page: https://csu-jpg.github.io/VCode/
• Github: https://github.com/CSU-JPG/VCode
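
🔹 Illustrative Sketch:
A toy example (not the VCoder pipeline) of what "SVG as symbolic visual representation" means in practice: the scene is written as human-readable SVG markup rather than pixels, so its elements remain named, editable, and inspectable. The shapes and file name below are purely hypothetical.

# Toy illustration only: a minimal scene expressed as symbolic SVG markup
# (a brown "table" rectangle with a red "apple" circle resting on it).
svg = """<svg xmlns="http://www.w3.org/2000/svg" width="120" height="120">
  <rect x="10" y="70" width="100" height="30" fill="saddlebrown"/>
  <circle cx="60" cy="55" r="15" fill="red"/>
</svg>"""

with open("scene.svg", "w") as f:
    f.write(svg)  # open scene.svg in a browser to view the rendered scene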

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
Title: D2D: Detector-to-Differentiable Critic for Improved Numeracy in Text-to-Image Generation

📝 Summary:
D2D transforms non-differentiable detectors into differentiable critics for text-to-image generation. This leverages their superior counting ability to greatly improve object numeracy, boosting accuracy with minimal impact on image quality.

🔹 Publication Date: Published on Oct 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.19278
• PDF: https://arxiv.org/pdf/2510.19278

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
Title: LiveSecBench: A Dynamic and Culturally-Relevant AI Safety Benchmark for LLMs in Chinese Context

📝 Summary:
LiveSecBench is a dynamic safety benchmark for Chinese LLMs, continuously updated to reflect new threats. It evaluates models across six critical dimensions tailored to Chinese legal and social frameworks. This benchmark offers a current landscape of AI safety in China, with a public leaderboard.

🔹 Publication Date: Published on Nov 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.02366
• PDF: https://arxiv.org/pdf/2511.02366
• Project Page: https://livesecbench.intokentech.cn/

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
Title: TWIST2: Scalable, Portable, and Holistic Humanoid Data Collection System

📝 Summary:
TWIST2 is a portable, mocap-free system for efficient humanoid data collection using VR and egocentric vision. It enables whole-body human-to-humanoid control and a hierarchical visuomotor policy for autonomous complex skills.

🔹 Publication Date: Published on Nov 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.02832
• PDF: https://arxiv.org/pdf/2511.02832
• Project Page: https://yanjieze.com/TWIST2/

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
Title: iFlyBot-VLA Technical Report

📝 Summary:
iFlyBot-VLA is a large VLA model that uses a latent action model and dual-level action representation. This enhances 3D perception and reasoning, achieving superior performance in diverse manipulation tasks.

🔹 Publication Date: Published on Nov 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.01914
• PDF: https://arxiv.org/pdf/2511.01914
• Project Page: https://xuwenjie401.github.io/iFlyBot-VLA.github.io/

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
Title: VidEmo: Affective-Tree Reasoning for Emotion-Centric Video Foundation Models

📝 Summary:
This paper introduces VidEmo, a new video emotion foundation model that uses an affective cues-guided reasoning framework. It is trained on the Emo-CFG dataset and achieves competitive performance in emotion understanding and face perception tasks.

🔹 Publication Date: Published on Nov 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.02712
• PDF: https://arxiv.org/pdf/2511.02712
• Project Page: https://zzcheng.top/VidEmo

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
Title: ChartM^3: A Multi-Stage Code-Driven Pipeline for Constructing Multi-Dimensional and Multi-Step Visual Reasoning Data in Chart Comprehension

📝 Summary:
A new automated code-driven pipeline, ChartM^3, generates diverse datasets for complex chart understanding via RAG and CoT. This improves MLLM reasoning and generalization, enabling smaller models to match larger ones in complex chart comprehension.

🔹 Publication Date: Published on Nov 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.02415
• PDF: https://arxiv.org/pdf/2511.02415

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
Title: Discriminately Treating Motion Components Evolves Joint Depth and Ego-Motion Learning

📝 Summary:
DiMoDE introduces a discriminative treatment of motion components for robust joint depth and ego-motion learning. By leveraging geometric constraints and reforming the learning process, it improves accuracy and achieves state-of-the-art performance.

🔹 Publication Date: Published on Nov 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.01502
• PDF: https://arxiv.org/pdf/2511.01502

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
Title: VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation

📝 Summary:
VCode introduces a benchmark for generating SVG code from images, preserving symbolic meaning for visual reasoning. Frontier VLMs struggle with this visual-centric task. VCoder, an agentic framework, improves performance using iterative revision and visual tools.

🔹 Publication Date: Published on Nov 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.02778
• PDF: https://arxiv.org/pdf/2511.02778
• Project Page: https://csu-jpg.github.io/VCode/
• Github: https://github.com/CSU-JPG/VCode

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#VCode #MultimodalAI #SVG #VisualReasoning #VLMs
Title: When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought

📝 Summary:
MIRA is a new benchmark for evaluating models that use intermediate visual images to enhance reasoning. It includes 546 multimodal problems requiring models to generate and utilize visual cues. Experiments show models achieve a 33.7% performance gain with visual cues compared to text-only prompts...

🔹 Publication Date: Published on Nov 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.02779
• PDF: https://arxiv.org/pdf/2511.02779

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#VisualReasoning #ChainOfThought #MultimodalAI #AIBenchmark #ComputerVision