Forwarded from Machine Learning with Python
Some people asked me about a resource for learning about Transformers.
Here's a good one I am sharing again -- it covers just about everything you need to know.
brandonrohrer.com/transformers
Amazing stuff. It's totally worth your weekend.
https://news.1rj.ru/str/CodeProgrammer
Here's a good one I am sharing again -- it covers just about everything you need to know.
brandonrohrer.com/transformers
Amazing stuff. It's totally worth your weekend.
#Transformers #DeepLearning #NLP #AI #MachineLearning #SelfAttention #DataScience #Technology #Python #LearningResource
https://news.1rj.ru/str/CodeProgrammer
👍5
Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models
22 Feb 2024 · Yijia Shao, Yucheng Jiang, Theodore A. Kanell, Peter Xu, Omar Khattab, Monica S. Lam ·
We study how to apply large language models to write grounded and organized long-form articles from scratch, with comparable breadth and depth to Wikipedia pages. This underexplored problem poses new challenges at the pre-writing stage, including how to research the topic and prepare an outline prior to writing. We propose STORM, a writing system for the Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking. STORM models the pre-writing stage by (1) discovering diverse perspectives in researching the given topic, (2) simulating conversations where writers carrying different perspectives pose questions to a topic expert grounded on trusted Internet sources, (3) curating the collected information to create an outline. For evaluation, we curate FreshWiki, a dataset of recent high-quality Wikipedia articles, and formulate outline assessments to evaluate the pre-writing stage. We further gather feedback from experienced Wikipedia editors. Compared to articles generated by an outline-driven retrieval-augmented baseline, more of STORM's articles are deemed to be organized (by a 25% absolute increase) and broad in coverage (by 10%). The expert feedback also helps identify new challenges for generating grounded long articles, such as source bias transfer and over-association of unrelated facts.
Paper: https://arxiv.org/pdf/2402.14207v2.pdf
Codes:
https://github.com/assafelovic/gpt-researcher
https://github.com/stanford-oval/storm
22 Feb 2024 · Yijia Shao, Yucheng Jiang, Theodore A. Kanell, Peter Xu, Omar Khattab, Monica S. Lam ·
We study how to apply large language models to write grounded and organized long-form articles from scratch, with comparable breadth and depth to Wikipedia pages. This underexplored problem poses new challenges at the pre-writing stage, including how to research the topic and prepare an outline prior to writing. We propose STORM, a writing system for the Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking. STORM models the pre-writing stage by (1) discovering diverse perspectives in researching the given topic, (2) simulating conversations where writers carrying different perspectives pose questions to a topic expert grounded on trusted Internet sources, (3) curating the collected information to create an outline. For evaluation, we curate FreshWiki, a dataset of recent high-quality Wikipedia articles, and formulate outline assessments to evaluate the pre-writing stage. We further gather feedback from experienced Wikipedia editors. Compared to articles generated by an outline-driven retrieval-augmented baseline, more of STORM's articles are deemed to be organized (by a 25% absolute increase) and broad in coverage (by 10%). The expert feedback also helps identify new challenges for generating grounded long articles, such as source bias transfer and over-association of unrelated facts.
Paper: https://arxiv.org/pdf/2402.14207v2.pdf
Codes:
https://github.com/assafelovic/gpt-researcher
https://github.com/stanford-oval/storm
#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek
https://news.1rj.ru/str/DataScienceT
👍2❤1
LLM4Decompile: Decompiling Binary Code with Large Language Models
8 Mar 2024 · Hanzhuo Tan, Qi Luo, Jing Li, Yuqun Zhang ·
Decompilation aims to convert binary code to high-level source code, but traditional tools like Ghidra often produce results that are difficult to read and execute. Motivated by the advancements in Large Language Models (LLMs), we propose LLM4Decompile, the first and largest open-source #LLM series (1.3B to 33B) trained to decompile binary code. We optimize the LLM training process and introduce the LLM4Decompile-End models to decompile binary directly. The resulting models significantly outperform GPT-4o and Ghidra on the HumanEval and ExeBench benchmarks by over 100% in terms of re-executability rate. Additionally, we improve the standard refinement approach to fine-tune the LLM4Decompile-Ref models, enabling them to effectively refine the decompiled code from Ghidra and achieve a further 16.2% improvement over the LLM4Decompile-End. LLM4Decompile demonstrates the potential of LLMs to revolutionize binary code decompilation, delivering remarkable improvements in readability and executability while complementing conventional tools for optimal results.
Paper: https://arxiv.org/pdf/2403.05286v3.pdf
Code: https://github.com/albertan017/LLM4Decompile
8 Mar 2024 · Hanzhuo Tan, Qi Luo, Jing Li, Yuqun Zhang ·
Decompilation aims to convert binary code to high-level source code, but traditional tools like Ghidra often produce results that are difficult to read and execute. Motivated by the advancements in Large Language Models (LLMs), we propose LLM4Decompile, the first and largest open-source #LLM series (1.3B to 33B) trained to decompile binary code. We optimize the LLM training process and introduce the LLM4Decompile-End models to decompile binary directly. The resulting models significantly outperform GPT-4o and Ghidra on the HumanEval and ExeBench benchmarks by over 100% in terms of re-executability rate. Additionally, we improve the standard refinement approach to fine-tune the LLM4Decompile-Ref models, enabling them to effectively refine the decompiled code from Ghidra and achieve a further 16.2% improvement over the LLM4Decompile-End. LLM4Decompile demonstrates the potential of LLMs to revolutionize binary code decompilation, delivering remarkable improvements in readability and executability while complementing conventional tools for optimal results.
Paper: https://arxiv.org/pdf/2403.05286v3.pdf
Code: https://github.com/albertan017/LLM4Decompile
#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek
https://news.1rj.ru/str/DataScienceT
❤1👍1
FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration
24 Jan 2025 · Kai-Tuo Xu, Feng-Long Xie, Xu Tang, Yao Hu ·
Paper: https://arxiv.org/pdf/2501.14350v1.pdf
Code: https://github.com/fireredteam/fireredasr
Datasets: LibriSpeech - AISHELL-1 - AISHELL-2 - WenetSpeech
24 Jan 2025 · Kai-Tuo Xu, Feng-Long Xie, Xu Tang, Yao Hu ·
We present FireRedASR, a family of large-scale automatic speech recognition (ASR) models for Mandarin, designed to meet diverse requirements in superior performance and optimal efficiency across various applications. FireRedASR comprises two variants: FireRedASR-LLM: Designed to achieve state-of-the-art (SOTA) performance and to enable seamless end-to-end speech interaction. It adopts an Encoder-Adapter-LLM framework leveraging large language model (LLM) capabilities. On public Mandarin benchmarks, FireRedASR-LLM (8.3B parameters) achieves an average Character Error Rate (CER) of 3.05%, surpassing the latest SOTA of 3.33% with an 8.4% relative CER reduction (CERR). It demonstrates superior generalization capability over industrial-grade baselines, achieving 24%-40% CERR in multi-source Mandarin ASR scenarios such as video, live, and intelligent assistant. FireRedASR-AED: Designed to balance high performance and computational efficiency and to serve as an effective speech representation module in LLM-based speech models. It utilizes an Attention-based Encoder-Decoder (AED) architecture. On public Mandarin benchmarks, FireRedASR-AED (1.1B parameters) achieves an average CER of 3.18%, slightly worse than FireRedASR-LLM but still outperforming the latest SOTA model with over 12B parameters. It offers a more compact size, making it suitable for resource-constrained applications. Moreover, both models exhibit competitive results on Chinese dialects and English speech benchmarks and excel in singing lyrics recognition.
Paper: https://arxiv.org/pdf/2501.14350v1.pdf
Code: https://github.com/fireredteam/fireredasr
Datasets: LibriSpeech - AISHELL-1 - AISHELL-2 - WenetSpeech
#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek
https://news.1rj.ru/str/DataScienceT
👍2
⭐️ Light-A-Video: Training-free Video Relighting via Progressive Light Fusion
🖥 Github: https://github.com/bcmi/Light-A-Video
📕 Paper: https://arxiv.org/abs/2502.08590v1
🌟 Dataset: https://paperswithcode.com/task/image-relighting
🖥 Github: https://github.com/bcmi/Light-A-Video
📕 Paper: https://arxiv.org/abs/2502.08590v1
🌟 Dataset: https://paperswithcode.com/task/image-relighting
#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek
https://news.1rj.ru/str/DataScienceT
👍3
3b3d57df_374f_4bf3_9752_26a5010a718e.gif
56.6 MB
MedRAX: Medical Reasoning Agent for Chest X-ray
4 Feb 2025 · Adibvafa Fallahpour, Jun Ma, Alif Munim, Hongwei Lyu, Bo wang ·
paper: https://arxiv.org/pdf/2502.02673v1.pdf
Code: https://github.com/bowang-lab/medrax
4 Feb 2025 · Adibvafa Fallahpour, Jun Ma, Alif Munim, Hongwei Lyu, Bo wang ·
Chest X-rays (CXRs) play an integral role in driving critical decisions in disease management and patient care. While recent innovations have led to specialized models for various CXR interpretation tasks, these solutions often operate in isolation, limiting their practical utility in clinical practice. We present #MedRAX, the first versatile AI agent that seamlessly integrates state-of-the-art CXR analysis tools and multimodal large language models into a unified framework. MedRAX dynamically leverages these models to address complex medical queries without requiring additional training. To rigorously evaluate its capabilities, we introduce ChestAgentBench, a comprehensive benchmark containing 2,500 complex medical queries across 7 diverse categories. Our experiments demonstrate that MedRAX achieves state-of-the-art performance compared to both open-source and proprietary models, representing a significant step toward the practical deployment of automated CXR interpretation systems.
paper: https://arxiv.org/pdf/2502.02673v1.pdf
Code: https://github.com/bowang-lab/medrax
#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek
https://news.1rj.ru/str/DataScienceT
❤1
Bayesian Sample Inference
🖥 Github: https://github.com/martenlienen/bsi
📕 Paper: https://arxiv.org/abs/2502.07580
🌟 Dataset: https://paperswithcode.com/dataset/cifar-10
🌟 Dataset: https://paperswithcode.com/dataset/cifar-10
#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek
https://news.1rj.ru/str/DataScienceT
Please open Telegram to view this post
VIEW IN TELEGRAM
❤1👍1
On-device Sora: Enabling Diffusion-Based Text-to-Video Generation for Mobile Devices
5 Feb 2025 · Bosung Kim, Kyuhwan Lee, Isu Jeong, Jungmin Cheon, Yeojin Lee, Seulki Lee ·
Paper:https://arxiv.org/pdf/2502.04363v1.pdf
Code: https://github.com/eai-lab/on-device-sora
5 Feb 2025 · Bosung Kim, Kyuhwan Lee, Isu Jeong, Jungmin Cheon, Yeojin Lee, Seulki Lee ·
We present On-device #Sora, a first pioneering solution for diffusion-based on-device text-to-video generation that operates efficiently on smartphone-grade devices. Building on Open-Sora, On-device Sora applies three novel techniques to address the challenges of diffusion-based text-to-video generation on computation- and memory-limited mobile devices. First, Linear Proportional Leap (#LPL) reduces the excessive denoising steps required in video diffusion through an efficient leap-based approach. Second, Temporal Dimension Token Merging (#TDTM) minimizes intensive token-processing computation in attention layers by merging consecutive tokens along the temporal dimension. Third, Concurrent Inference with Dynamic Loading (CI-DL) dynamically partitions large models into smaller blocks and loads them into memory for concurrent model inference, effectively addressing the challenges of limited device memory. We implement On-device Sora on the iPhone 15 Pro, and the experimental evaluations demonstrate that it is capable of generating high-quality videos on the device, comparable to those produced by Open-Sora running on high-end GPUs. These results show that On-device Sora enables efficient and high-quality video generation on resource-constrained mobile devices, expanding accessibility, ensuring user privacy, reducing dependence on cloud infrastructure, and lowering associated costs. We envision the proposed On-device Sora as a significant first step toward democratizing state-of-the-art generative technologies, enabling video generation capabilities on commodity mobile and embedded devices.
Paper:https://arxiv.org/pdf/2502.04363v1.pdf
Code: https://github.com/eai-lab/on-device-sora
#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek
https://news.1rj.ru/str/DataScienceT
👍3❤1
Accelerating Data Processing and Benchmarking of AI Models for Pathology
10 Feb 2025 · Andrew Zhang, Guillaume Jaume, Anurag Vaidya, Tong Ding, Faisal Mahmood ·
Paper: https://arxiv.org/pdf/2502.06750v1.pdf
Codes:
https://github.com/mahmoodlab/trident
https://github.com/mahmoodlab/patho-bench
10 Feb 2025 · Andrew Zhang, Guillaume Jaume, Anurag Vaidya, Tong Ding, Faisal Mahmood ·
Advances in foundation modeling have reshaped computational pathology. However, the increasing number of available #models and lack of standardized benchmarks make it increasingly complex to assess their strengths, limitations, and potential for further development. To address these challenges, we introduce a new suite of software tools for whole-slide image processing, foundation model benchmarking, and curated publicly available tasks. We anticipate that these resources will promote transparency, reproducibility, and continued progress in the field.
Paper: https://arxiv.org/pdf/2502.06750v1.pdf
Codes:
https://github.com/mahmoodlab/trident
https://github.com/mahmoodlab/patho-bench
#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek
https://news.1rj.ru/str/DataScienceT
👍1
LIMO: Less is More for Reasoning
5 Feb 2025 · Yixin Ye, Zhen Huang, Yang Xiao, Ethan Chern, Shijie Xia, PengFei Liu ·
Paper: https://arxiv.org/pdf/2502.03387v1.pdf
Codes:
https://github.com/gair-nlp/limo
https://github.com/zhaoolee/garss
5 Feb 2025 · Yixin Ye, Zhen Huang, Yang Xiao, Ethan Chern, Shijie Xia, PengFei Liu ·
We present a fundamental discovery that challenges our understanding of how complex reasoning emerges in large language models. While conventional wisdom suggests that sophisticated reasoning tasks demand extensive training data (>100,000 examples), we demonstrate that complex mathematical reasoning abilities can be effectively elicited with surprisingly few examples. Through comprehensive experiments, our proposed model LIMO demonstrates unprecedented performance in mathematical reasoning. With merely 817 curated training samples, LIMO achieves 57.1% accuracy on AIME and 94.8% on #MATH, improving from previous SFT-based models' 6.5% and 59.2% respectively, while only using 1% of the training data required by previous approaches. LIMO demonstrates exceptional out-of-distribution generalization, achieving 40.5% absolute improvement across 10 diverse benchmarks, outperforming models trained on 100x more data, challenging the notion that SFT leads to memorization rather than generalization. Based on these results, we propose the Less-Is-More Reasoning Hypothesis (#LIMO Hypothesis): In foundation models where domain knowledge has been comprehensively encoded during pre-training, sophisticated reasoning capabilities can emerge through minimal but precisely orchestrated demonstrations of cognitive processes. This hypothesis posits that the elicitation threshold for complex reasoning is determined by two key factors: (1) the completeness of the model's encoded knowledge foundation during pre-training, and (2) the effectiveness of post-training examples as "cognitive templates" that show the model how to utilize its knowledge base to solve complex reasoning tasks. To facilitate reproducibility and future research in data-efficient reasoning
Paper: https://arxiv.org/pdf/2502.03387v1.pdf
Codes:
https://github.com/gair-nlp/limo
https://github.com/zhaoolee/garss
#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek
https://news.1rj.ru/str/DataScienceT
👍3
FlashVideo:Flowing Fidelity to Detail for Efficient High-Resolution Video Generation
7 Feb 2025 · Shilong Zhang, Wenbo Li, Shoufa Chen, Chongjian Ge, Peize Sun, Yida Zhang, Yi Jiang, Zehuan Yuan, Binyue Peng, Ping Luo ·
Paper: https://arxiv.org/pdf/2502.05179v1.pdf
Code: https://github.com/foundationvision/flashvideo
7 Feb 2025 · Shilong Zhang, Wenbo Li, Shoufa Chen, Chongjian Ge, Peize Sun, Yida Zhang, Yi Jiang, Zehuan Yuan, Binyue Peng, Ping Luo ·
DiT diffusion models have achieved great success in text-to-video generation, leveraging their scalability in model capacity and data scale. High content and motion fidelity aligned with text prompts, however, often require large model parameters and a substantial number of function evaluations (NFEs). Realistic and visually appealing details are typically reflected in high resolution outputs, further amplifying computational demands especially for single stage DiT models. To address these challenges, we propose a novel two stage framework, #FlashVideo, which strategically allocates model capacity and NFEs across stages to balance generation fidelity and quality. In the first stage, prompt fidelity is prioritized through a low resolution generation process utilizing large parameters and sufficient NFEs to enhance computational efficiency. The second stage establishes flow matching between low and high resolutions, effectively generating fine details with minimal NFEs. Quantitative and visual results demonstrate that FlashVideo achieves state-of-the-art high resolution video generation with superior computational efficiency. Additionally, the two-stage design enables users to preview the initial output before committing to full resolution generation, thereby significantly reducing computational costs and wait times as well as enhancing commercial viability .
Paper: https://arxiv.org/pdf/2502.05179v1.pdf
Code: https://github.com/foundationvision/flashvideo
#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek
https://news.1rj.ru/str/DataScienceT
👍2❤1
OmniParser for Pure Vision Based GUI Agent
1 Aug 2024 · Yadong Lu, Jianwei Yang, Yelong Shen, Ahmed Awadallah
Paper: https://arxiv.org/pdf/2408.00203v1.pdf
Code: https://github.com/microsoft/omniparser
Dataset: ScreenSpot
Note: Ranked #10 on Natural Language Visual Grounding on #ScreenSpot
1 Aug 2024 · Yadong Lu, Jianwei Yang, Yelong Shen, Ahmed Awadallah
The recent success of large vision language models shows great potential in driving the agent system operating on user interfaces. However, we argue that the power multimodal models like GPT-4V as a general agent on multiple operating systems across different applications is largely underestimated due to the lack of a robust screen parsing technique capable of: 1) reliably identifying interactable icons within the user interface, and 2) understanding the semantics of various elements in a screenshot and accurately associate the intended action with the corresponding region on the screen. To fill these gaps, we introduce \textsc{OmniParser}, a comprehensive method for parsing user interface screenshots into structured elements, which significantly enhances the ability of #GPT-4V to generate actions that can be accurately grounded in the corresponding regions of the interface. We first curated an interactable icon detection dataset using popular webpages and an icon denoscription dataset. These datasets were utilized to fine-tune specialized models: a detection model to parse interactable regions on the screen and a caption model to extract the functional semantics of the detected elements. \textsc{#OmniParser} significantly improves GPT-4V's performance on ScreenSpot benchmark. And on #Mind2Web and AITW benchmark, \textsc{OmniParser} with screenshot only input #outperforms the GPT-4V baselines requiring additional information outside of screenshot.
Paper: https://arxiv.org/pdf/2408.00203v1.pdf
Code: https://github.com/microsoft/omniparser
Dataset: ScreenSpot
Note: Ranked #10 on Natural Language Visual Grounding on #ScreenSpot
#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek
https://news.1rj.ru/str/DataScienceT
👍1
PIKE-RAG: sPecIalized KnowledgE and Rationale Augmented Generation
20 Jan 2025 · Jinyu Wang, Jingjing Fu, Rui Wang, Lei Song, Jiang Bian
Paper: https://arxiv.org/pdf/2501.11551v2.pdf
Code: https://github.com/microsoft/pike-rag
Datasets: HotpotQA - 2WikiMultiHopQA
20 Jan 2025 · Jinyu Wang, Jingjing Fu, Rui Wang, Lei Song, Jiang Bian
Despite notable advancements in Retrieval-Augmented Generation (#RAG) systems that expand large language model (LLM) capabilities through external retrieval, these systems often struggle to meet the complex and diverse needs of real-world industrial applications. The reliance on retrieval alone proves insufficient for extracting deep, domain-specific knowledge performing in logical reasoning from specialized corpora. To address this, we introduces PecIalized KnowledgE and Rationale Augmentation Generation (PIKE-RAG), focusing on extracting, understanding, and applying specialized knowledge, while constructing coherent rationale to incrementally steer LLMs toward accurate responses. Recognizing the diverse challenges of industrial tasks, we introduce a new paradigm that classifies tasks based on their complexity in knowledge extraction and application, allowing for a systematic evaluation of RAG systems' problem-solving capabilities. This strategic approach offers a roadmap for the phased development and enhancement of RAG systems, tailored to meet the evolving demands of industrial applications. Furthermore, we propose knowledge atomizing and knowledge-aware task decomposition to effectively extract multifaceted knowledge from the data chunks and iteratively construct the rationale based on original query and the accumulated knowledge, respectively, showcasing exceptional performance across various benchmarks.
Paper: https://arxiv.org/pdf/2501.11551v2.pdf
Code: https://github.com/microsoft/pike-rag
Datasets: HotpotQA - 2WikiMultiHopQA
#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek
https://news.1rj.ru/str/DataScienceT
👍3❤1
Forwarded from Machine Learning with Python
#DataAnalytics #Python #SQL #RProgramming #DataScience #MachineLearning #DeepLearning #Statistics #DataVisualization #PowerBI #Tableau #LinearRegression #Probability #DataWrangling #Excel #AI #ArtificialIntelligence #BigData #DataAnalysis #NeuralNetworks #SupervisedLearning #IBMDataScience #FreeCourses #Certification #LearnDataScience
https://news.1rj.ru/str/CodeProgrammer🖥
Please open Telegram to view this post
VIEW IN TELEGRAM
👍7
Enhance-A-Video: Better Generated Video for Free
11 Feb 2025 · Yang Luo, Xuanlei Zhao, Mengzhao Chen, Kaipeng Zhang, Wenqi Shao, Kai Wang, Zhangyang Wang, Yang You
Paper: https://arxiv.org/pdf/2502.07508v1.pdf
Code: https://github.com/NUS-HPC-AI-Lab/Enhance-A-Video
11 Feb 2025 · Yang Luo, Xuanlei Zhao, Mengzhao Chen, Kaipeng Zhang, Wenqi Shao, Kai Wang, Zhangyang Wang, Yang You
DiT-based video generation has achieved remarkable results, but research into enhancing existing models remains relatively unexplored. In this work, we introduce a training-free approach to enhance the coherence and quality of DiT-based generated videos, named Enhance-A-Video. The core idea is enhancing the cross-frame correlations based on non-diagonal temporal attention distributions. Thanks to its simple design, our approach can be easily applied to most DiT-based video generation frameworks without any retraining or fine-tuning. Across various DiT-based video generation models, our approach demonstrates promising improvements in both temporal consistency and visual quality. We hope this research can inspire future explorations in video generation enhancement.
Paper: https://arxiv.org/pdf/2502.07508v1.pdf
Code: https://github.com/NUS-HPC-AI-Lab/Enhance-A-Video
#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek
https://news.1rj.ru/str/DataScienceT
❤1👍1
Accelerating Data Processing and Benchmarking of AI Models for Pathology
10 Feb 2025 · Andrew Zhang, Guillaume Jaume, Anurag Vaidya, Tong Ding, Faisal Mahmood
Advances in foundation modeling have reshaped computational pathology. However, the increasing number of available models and lack of standardized benchmarks make it increasingly complex to assess their strengths, limitations, and potential for further development. To address these challenges, we introduce a new suite of software tools for whole-slide image processing, foundation model benchmarking, and curated publicly available tasks. We anticipate that these resources will promote transparency, reproducibility, and continued progress in the field.
Paper: https://arxiv.org/pdf/2502.06750v1.pdf
Codes:
https://github.com/mahmoodlab/trident
https://github.com/mahmoodlab/patho-bench
10 Feb 2025 · Andrew Zhang, Guillaume Jaume, Anurag Vaidya, Tong Ding, Faisal Mahmood
Advances in foundation modeling have reshaped computational pathology. However, the increasing number of available models and lack of standardized benchmarks make it increasingly complex to assess their strengths, limitations, and potential for further development. To address these challenges, we introduce a new suite of software tools for whole-slide image processing, foundation model benchmarking, and curated publicly available tasks. We anticipate that these resources will promote transparency, reproducibility, and continued progress in the field.
Paper: https://arxiv.org/pdf/2502.06750v1.pdf
Codes:
https://github.com/mahmoodlab/trident
https://github.com/mahmoodlab/patho-bench
#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek
https://news.1rj.ru/str/DataScienceT
👍1
The Hundred-Page Language Models Book
Read it:
https://github.com/aburkov/theLMbook
Read it:
https://github.com/aburkov/theLMbook
#LLM #NLP #ML #AI #PYTHON #PYTORCH
https://news.1rj.ru/str/DataScienceM
👍4
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model
Paper: https://arxiv.org/pdf/2502.10248v1.pdf
Codes:
https://github.com/phixion/phixion
https://github.com/stepfun-ai/step-video-t2v
We present Step-Video-T2V, a state-of-the-art text-to-video pre-trained model with 30B parameters and the ability to generate videos up to 204 frames in length. A deep compression Variational Autoencoder, Video-VAE, is designed for video generation tasks, achieving 16x16 spatial and 8x temporal compression ratios, while maintaining exceptional video reconstruction quality. User prompts are encoded using two bilingual text encoders to handle both English and Chinese. A DiT with 3D full attention is trained using Flow Matching and is employed to denoise input noise into latent frames. A video-based DPO approach, Video-DPO, is applied to reduce artifacts and improve the visual quality of the generated videos. We also detail our training strategies and share key observations and insights. Step-Video-T2V's performance is evaluated on a novel video generation benchmark, Step-Video-T2V-Eval, demonstrating its state-of-the-art text-to-video quality when compared with both open-source and commercial engines. Additionally, we discuss the limitations of current diffusion-based model paradigm and outline future directions for video foundation models. We make both Step-Video-T2V and Step-Video-T2V-Eval available at https://github.com/stepfun-ai/Step-Video-T2V. The online version can be accessed from https://yuewen.cn/videos as well. Our goal is to accelerate the innovation of video foundation models and empower video content creators.
Paper: https://arxiv.org/pdf/2502.10248v1.pdf
Codes:
https://github.com/phixion/phixion
https://github.com/stepfun-ai/step-video-t2v
#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek
https://news.1rj.ru/str/DataScienceT
👍3
Bridging Text and Vision: A Multi-View Text-Vision Registration Approach for Cross-Modal Place Recognition
🖥 Github: https://github.com/nuozimiaowu/Text4VPR
📕 Paper: https://arxiv.org/abs/2502.14195v1
🌟 Dataset: https://paperswithcode.com/task/cross-modal-place-recognition
🌟 Dataset: https://paperswithcode.com/task/cross-modal-place-recognition
#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek
https://news.1rj.ru/str/DataScienceT
Please open Telegram to view this post
VIEW IN TELEGRAM
👍1
KET-RAG: A Cost-Efficient Multi-Granular Indexing Framework for Graph-RAG
13 Feb 2025 · Yiqian Huang, Shiqi Zhang, Xiaokui Xiao ·
Paper: https://arxiv.org/pdf/2502.09304v1.pdf
Code: https://github.com/waetr/KET-RAG
13 Feb 2025 · Yiqian Huang, Shiqi Zhang, Xiaokui Xiao ·
Graph-RAG constructs a knowledge graph from text chunks to improve retrieval in Large Language Model (LLM)-based question answering. It is particularly useful in domains such as biomedicine, law, and political science, where retrieval often requires multi-hop reasoning over proprietary documents. Some existing Graph-RAG systems construct #KNN graphs based on text chunk relevance, but this coarse-grained approach fails to capture entity relationships within texts, leading to sub-par retrieval and generation quality. To address this, recent solutions leverage LLMs to extract entities and relationships from text chunks, constructing triplet-based knowledge graphs. However, this approach incurs significant indexing costs, especially for large document collections. To ensure a good result accuracy while reducing the indexing cost, we propose KET-RAG, a multi-granular indexing framework. KET-RAG first identifies a small set of key text chunks and leverages an #LLM to construct a knowledge graph skeleton. It then builds a text-keyword bipartite graph from all text chunks, serving as a lightweight alternative to a full knowledge graph. During retrieval, KET-RAG searches both structures: it follows the local search strategy of existing Graph-RAG systems on the skeleton while mimicking this search on the bipartite graph to improve retrieval quality. We evaluate eight solutions on two real-world datasets, demonstrating that KET-RAG outperforms all competitors in indexing cost, retrieval effectiveness, and generation quality. Notably, it achieves comparable or superior retrieval quality to Microsoft's Graph-RAG while reducing indexing costs by over an order of magnitude. Additionally, it improves the generation quality by up to 32.4% while lowering indexing costs by around 20%.
Paper: https://arxiv.org/pdf/2502.09304v1.pdf
Code: https://github.com/waetr/KET-RAG
#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek
https://news.1rj.ru/str/DataScienceT
👍2