ML Research Hub – Telegram
ML Research Hub
32.7K subscribers
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Title: ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation

📝 Summary:
ROVER is a new benchmark evaluating reciprocal cross-modal reasoning in unified multimodal models. It tests how models use one modality to guide or verify outputs in another, in both verbal and visual generation tasks. Experiments show cross-modal reasoning is vital for visual generation, but mod...

🔹 Publication Date: Published on Nov 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.01163
• PDF: https://arxiv.org/pdf/2511.01163
• Project Page: https://roverbench.github.io/

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
Title: Trove: A Flexible Toolkit for Dense Retrieval

📝 Summary:
Trove is an open-source toolkit for dense retrieval that simplifies research. It offers efficient on-the-fly data management, reducing memory use and allowing flexible dataset experiments. Trove is highly customizable and provides a unified, scalable pipeline for evaluation.

🔹 Publication Date: Published on Nov 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.01857
• PDF: https://arxiv.org/pdf/2511.01857
• Project Page: https://ir-trove.dev/
• Github: https://github.com/BatsResearch/trove

==================================

Title: Data-Efficient RLVR via Off-Policy Influence Guidance

📝 Summary:
This paper proposes CROPI, a new method for efficient data selection in Reinforcement Learning with Verifiable Rewards (RLVR). It uses off-policy influence estimation and sparse random projection to identify the most valuable data points. CROPI significantly accelerates training, achieving 2.66x spee...

🔹 Publication Date: Published on Oct 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.26491
• PDF: https://arxiv.org/pdf/2510.26491

==================================

Title: How Far Are Surgeons from Surgical World Models? A Pilot Study on Zero-shot Surgical Video Generation with Expert Assessment

📝 Summary:
This study introduces SurgVeo and the Surgical Plausibility Pyramid to evaluate video generation models in surgery. Experts found Veo-3 visually convincing but lacking in actual surgical understanding. This highlights a critical gap between visual mimicry and causal knowledge in surgical AI.

🔹 Publication Date: Published on Nov 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.01775
• PDF: https://arxiv.org/pdf/2511.01775

==================================

Title: UME-R1: Exploring Reasoning-Driven Generative Multimodal Embeddings

📝 Summary:
UME-R1 introduces generative multimodal embeddings, unifying embedding tasks within a generative paradigm. Its two-stage MLLM training creates reasoning-driven embeddings that significantly outperform conventional discriminative methods, offering a new foundation for interpretability.

🔹 Publication Date: Published on Nov 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.00405
• PDF: https://arxiv.org/pdf/2511.00405
• Github: https://github.com/DeepLearnXMU/UME-R1

🔹 Models citing this paper:
https://huggingface.co/zhibinlan/UME-R1-2B
https://huggingface.co/zhibinlan/UME-R1-7B

Datasets citing this paper:
https://huggingface.co/datasets/zhibinlan/UME-sft-train
https://huggingface.co/datasets/zhibinlan/UME-rl-train

==================================

Title: World Simulation with Video Foundation Models for Physical AI

📝 Summary:
Cosmos-Predict2.5 is a new world foundation model for physical AI, unifying Text, Image, and Video2World generation with enhanced quality and control for robotics. It works with Cosmos-Transfer2.5 for Sim2Real translation. Both are open-source to accelerate embodied intelligence research.

🔹 Publication Date: Published on Oct 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.00062
• PDF: https://arxiv.org/pdf/2511.00062
• Github: https://github.com/nvidia-cosmos/cosmos-transfer2.5

==================================

Title: Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench

📝 Summary:
Current VLMs struggle with visual measurement reading, especially indicator localization. We introduce MeasureBench, a new benchmark with real-world and synthetic images, and a data synthesis pipeline. VLMs show poor fine-grained spatial grounding, leading to significant numeric errors despite pl...

🔹 Publication Date: Published on Oct 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.26865
• PDF: https://arxiv.org/pdf/2510.26865
• Project Page: https://flageval-baai.github.io/MeasureBenchPage/
• Github: https://github.com/flageval-baai/MeasureBench

Datasets citing this paper:
https://huggingface.co/datasets/FlagEval/MeasureBench

==================================

Title: UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback

📝 Summary:
UniLumos is a fast, unified image and video relighting framework. It uses RGB-space geometry feedback to ensure physically plausible results, unlike prior diffusion models. It achieves state-of-the-art quality with a 20x speedup.

🔹 Publication Date: Published on Nov 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.01678
• PDF: https://arxiv.org/pdf/2511.01678
• Github: https://github.com/alibaba-damo-academy/Lumos-Custom

==================================

Title: Actial: Activate Spatial Reasoning Ability of Multimodal Large Language Models

📝 Summary:
Multimodal LLMs struggle with detailed 3D spatial reasoning and cross-view consistency. This paper introduces Viewpoint Learning with the Viewpoint-100K dataset and a two-stage fine-tuning strategy. Their method significantly activates MLLM spatial reasoning, improving performance on various tasks.

🔹 Publication Date: Published on Nov 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.01618
• PDF: https://arxiv.org/pdf/2511.01618

==================================

Title: ToolScope: An Agentic Framework for Vision-Guided and Long-Horizon Tool Use

📝 Summary:
ToolScope is an agentic framework for MLLMs that unifies global planning with local multimodal perception, using a specialized Perceive tool to manage visual context in long-horizon VQA tasks. It improves performance on VQA benchmarks by an average of 6.69%.

🔹 Publication Date: Published on Oct 31

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.27363
• PDF: https://arxiv.org/pdf/2510.27363
• Github: https://github.com/dengmengjie/ToolScope

==================================

Title: PHUMA: Physically-Grounded Humanoid Locomotion Dataset

📝 Summary:
PHUMA is a new dataset for humanoid locomotion, leveraging large-scale human video while eliminating physical artifacts. Through careful curation and physics-constrained retargeting, PHUMA provides reliable motions. Policies trained with PHUMA significantly outperform existing datasets in imitati...

🔹 Publication Date: Published on Oct 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.26236
• PDF: https://arxiv.org/pdf/2510.26236
• Project Page: https://davian-robotics.github.io/PHUMA/
• Github: https://github.com/davian-robotics/PHUMA

Datasets citing this paper:
https://huggingface.co/datasets/DAVIAN-Robotics/PHUMA

==================================

Title: MotionStream: Real-Time Video Generation with Interactive Motion Controls

📝 Summary:
MotionStream enables real-time video generation with interactive motion controls, achieving sub-second latency and 29 FPS streaming. It distills a motion-controlled text-to-video teacher into a causal student, using novel attention mechanisms for infinite-length, high-quality video.

🔹 Publication Date: Published on Nov 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.01266
• PDF: https://arxiv.org/pdf/2511.01266

==================================

Title: OpenSIR: Open-Ended Self-Improving Reasoner

📝 Summary:
OpenSIR is a self-play framework where LLMs improve reasoning by alternating teacher and student roles. It generates novel math problems without external supervision, optimizing for difficulty and diversity. This enables open-ended learning and significant performance gains on benchmarks.

🔹 Publication Date: Published on Nov 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.00602
• PDF: https://arxiv.org/pdf/2511.00602

==================================

Title: NaviTrace: Evaluating Embodied Navigation of Vision-Language Models

📝 Summary:
NaviTrace is a new benchmark to evaluate vision-language models for robotic navigation using 2D trace prediction. It uses a semantic-aware score across diverse scenarios and embodiment types. VLMs consistently show poor spatial grounding and goal localization, falling short of human performance.

🔹 Publication Date: Published on Oct 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.26909
• PDF: https://arxiv.org/pdf/2510.26909
• Project Page: https://leggedrobotics.github.io/navitrace_webpage/
• Github: https://github.com/leggedrobotics/navitrace_evaluation

Datasets citing this paper:
https://huggingface.co/datasets/leggedrobotics/navitrace

Spaces citing this paper:
https://huggingface.co/spaces/leggedrobotics/navitrace_leaderboard

==================================

Title: Vote-in-Context: Turning VLMs into Zero-Shot Rank Fusers

📝 Summary:
Vote-in-Context (ViC) turns VLMs into zero-shot rank fusers and rerankers. It serializes content and retriever data into prompts, enabling adaptive reasoning. ViC achieves state-of-the-art zero-shot video retrieval, greatly surpassing baselines.

🔹 Publication Date: Published on Nov 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.01617
• PDF: https://arxiv.org/pdf/2511.01617
• Github: https://github.com/mohammad2012191/ViC

==================================

Title: Multi-Step Knowledge Interaction Analysis via Rank-2 Subspace Disentanglement

📝 Summary:
A novel rank-2 subspace disentangles Parametric and Context Knowledge in LLM multi-step explanations. It enables the first detailed analysis of how these knowledge types interact, showing distinct patterns in faithful versus hallucinated outputs.

🔹 Publication Date: Published on Nov 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.01706
• PDF: https://arxiv.org/pdf/2511.01706
• Github: https://github.com/copenlu/pk-ck-knowledge-disentanglement

==================================

Title: MR-Align: Meta-Reasoning Informed Factuality Alignment for Large Reasoning Models

📝 Summary:
MR-ALIGN enhances factuality in large reasoning models by aligning their internal reasoning process. It addresses the 'reasoning-answer hit gap' by reinforcing beneficial thinking patterns and suppressing defective ones. This meta-reasoning approach improves accuracy and truthfulness by focusing ...

🔹 Publication Date: Published on Oct 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.24794
• PDF: https://arxiv.org/pdf/2510.24794
• Github: https://github.com/gudehhh666/MR-ALIGN

==================================

Title: EBT-Policy: Energy Unlocks Emergent Physical Reasoning Capabilities

📝 Summary:
EBT-Policy, an energy-based architecture, outperforms diffusion-based policies in robotics by offering improved robustness, vastly reduced computational cost, and emergent zero-shot recovery. It converges much faster and leverages scalar energy for uncertainty-aware inference.

🔹 Publication Date: Published on Oct 31

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.27545
• PDF: https://arxiv.org/pdf/2510.27545
• Project Page: https://energy-based-transformers.github.io/ebt-policy/

==================================

Title: |↺BUS|: A Large and Diverse Multimodal Benchmark for evaluating the ability of Vision-Language Models to understand Rebus Puzzles

📝 Summary:
A new benchmark, BUS, offers 1,333 diverse Rebus Puzzles that challenge Vision-Language Models. The paper introduces RebusDescProgICE, a framework using structured reasoning and improved example selection, boosting VLM performance significantly.

🔹 Publication Date: Published on Nov 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.01340
• PDF: https://arxiv.org/pdf/2511.01340

==================================

A Guide to Writing a Scientific Paper

I. Title

Be Specific and Informative: The title should accurately reflect the content of the paper. Avoid overly broad or vague titles.
Include Keywords: Incorporate the most important keywords related to your research to improve searchability.
Be Concise: A good title is clear and to the point. Aim for a length that is descriptive but not overly long (typically under 15 words).

II. Abstract

A Standalone Summary: The abstract is a brief (usually 150-250 words), comprehensive summary of the entire paper. It is often the only part read, so it must be clear and complete.
Structure: It should contain four key elements:
Background/Objective: A brief sentence on the context and purpose of the study.
Methods: A concise description of the main experimental procedures.
Key Results: A summary of the most important findings.
Conclusion/Implications: A concluding statement on the significance of the results.

III. Keywords

For Indexing: Provide 3-5 keywords that are not already in the title. These help search engines and databases categorize your paper correctly.

IV. Introduction

The Funnel Structure: The introduction should move from a broad context to the specific focus of your paper.
Establish Context: Start by providing the necessary background information for a reader to understand the topic.
Identify the Gap: Clearly state the problem or gap in the existing research that your study aims to address. This is the "why" of your research.
State Your Objective: Clearly articulate your research question, hypothesis, or the specific goals of the paper.
Outline the Paper: Briefly mention the approach you took and outline the structure of the rest of the paper.

V. Materials and Methods

Enable Replicability: This section must be detailed enough for another researcher to replicate your study exactly.
Describe Materials/Participants: Specify all materials, equipment (including manufacturer), and the characteristics of your study participants or samples.
Detail the Procedure: Provide a step-by-step account of how you conducted the research. Use chronological order.
Explain Data Analysis: Describe the statistical methods or analytical techniques used to process the results. Justify your choice of methods.

VI. Results

Present, Don't Interpret: State your findings objectively and factually, without discussing their meaning or implications. This is the "what you found" section.
Use Visuals: Employ tables, figures, and graphs to present data clearly. Ensure all visuals are numbered and have descriptive captions.
Logical Flow: Organize the results in a logical sequence that tells a clear story, often corresponding to the order of the methods.
Reference Visuals in Text: Refer to every table and figure in the text, guiding the reader through the key data points (e.g., "As shown in Figure 1, there was a significant increase...").

VII. Discussion

Interpret Your Findings: This is the "so what?" section. Explain what your results mean.
Summarize Key Results: Begin with a brief summary of your most important findings.
Compare with Existing Literature: How do your results compare to previous studies mentioned in your introduction? Do they support, contradict, or add new dimensions to existing knowledge?
Acknowledge Limitations: Be honest about the limitations of your study. This shows critical awareness and strengthens your paper's credibility.
State Implications: Discuss the broader implications and significance of your work. What is its practical or theoretical importance?
Suggest Future Research: Propose specific directions for future studies that could build upon your findings.

VIII. Conclusion

A Strong Take-Home Message: Provide a short, powerful summary of the study's main conclusions.
Reiterate Significance: Briefly restate the importance of your work without simply repeating the abstract.
No New Information: Do not introduce any new data, ideas, or citations in the conclusion.

IX. Acknowledgments

Give Credit: Thank any individuals who provided technical help, intellectual contributions (but not enough for authorship), or feedback.
Disclose Funding: Acknowledge all funding sources, grants, and scholarships that supported the research.

X. References

Cite Everything: Ensure every source cited in the text is listed in the references section, and vice-versa.
Maintain Consistency: Use a single, consistent citation style throughout the paper (e.g., APA, MLA, IEEE, Vancouver) as required by the target journal or institution.
Use a Reference Manager: Tools like Zotero, Mendeley, or EndNote can help manage citations and prevent errors.
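To make the consistency point concrete, here is a sketch of what a single entry might look like in BibTeX, one common interchange format that Zotero, Mendeley, and EndNote can all export. The entry below is purely hypothetical (the authors, journal, and DOI are placeholders, not real sources):

```bibtex
@article{smith2020example,
  author  = {Smith, Jane and Doe, John},
  title   = {A Hypothetical Study of Example Effects},
  journal = {Journal of Illustrative Results},
  year    = {2020},
  volume  = {12},
  number  = {3},
  pages   = {45--67},
  doi     = {10.0000/example.doi}
}
```

Keeping every source in one manager and exporting in a single style ensures in-text citations and the reference list stay consistent automatically, rather than being formatted by hand.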

XI. General Writing Tips

Clarity and Conciseness: Use clear, direct language. Avoid jargon where possible and define it if necessary. Write short, focused sentences.
Voice and Tense:
Methods & Results: Use the past tense ("We collected...", "The temperature increased...").
Introduction & Discussion: Use a mix of present tense for established knowledge ("Diabetes is a chronic disease...") and past tense for referring to specific studies ("Smith (2020) found...").
Active Voice: Prefer active voice ("We conducted the experiment") over passive voice ("The experiment was conducted") for clarity, especially in the Introduction and Discussion.
Proofread Meticulously: Check for spelling, grammar, and punctuation errors. Read your paper aloud to catch awkward phrasing. Ask a colleague to review it before submission.
Avoid Plagiarism: Always give proper credit to the work and ideas of others through correct citation.

#AcademicWriting #ScientificPaper #Research #Publishing #GradSchool

━━━━━━━━━━━━━━━
By: @DataScienceT