ml4se
Machine Learning for Software Engineering
Fixing Code Generation Errors for Large Language Models

The authors conducted ten rounds of tests on 14 LLMs using the HumanEval dataset. Through manual analysis of the test results, they found that these LLMs achieved an average of 84.07% of their reported performance.

They also investigated the relationship between Pass@1 results, model inference time, and model parameter size. The analysis revealed a positive correlation between Pass@1 results and model parameter size, while no significant correlation was observed between inference time and parameter size.
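
Pass@1 here is the standard HumanEval metric. For reference, it is usually computed with the unbiased pass@k estimator from the Codex paper; a minimal sketch in Python (the authors' exact evaluation harness may differ):

from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased pass@k: n samples generated per problem, c of them passed the tests.
    # Returns the estimated probability that at least one of k sampled solutions passes.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 generations for one problem, 3 passed the unit tests -> pass@1 = 0.3
print(pass_at_k(n=10, c=3, k=1))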

Subsequently, the authors performed an in-depth analysis of errors in the test results, extracting and categorizing 12,837 errors into 14 types. Through the analysis, they identified 19 specific causes leading to these errors.

The proposed fixing method addresses three of these error types, improving the performance of the 14 LLMs on the HumanEval and MBPP datasets by an average of 9.5% and 5.4%, respectively.
Chat template viewer

Different LLMs expect very different input formats. HuggingFace added chat templates, which are part of the tokenizer. A chat template specifies how to convert a conversation, represented as a list of messages, into a single string in the format that the model expects. To learn more about the chat_template of different models, visit this.
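
A minimal sketch of applying a chat template through the tokenizer (the checkpoint name is only an example; any chat model whose tokenizer ships a chat_template works the same way):

from transformers import AutoTokenizer

# Illustrative checkpoint; substitute any chat model with a chat_template.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

messages = [
    {"role": "user", "content": "Write a function that reverses a string."},
]

# Render the list of messages into the single prompt string the model expects.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)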
The 2024 Nobel Prize in Physics has been awarded to John J. Hopfield and Geoffrey E. Hinton

“for foundational discoveries and inventions that enable machine learning with artificial neural networks.”
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations

The authors analyzed the errors of LLMs by examining their internal representations. They discovered that information related to truthfulness is localized in the exact answer tokens. From a practical perspective, this finding enhances error detection methods applicable to production-level LLMs.

The code is coming soon
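
Until the code is released, the general recipe can be illustrated with a generic linear probe over the hidden states of the answer tokens (an illustrative sketch, not the authors' exact setup; the checkpoint, layer, and toy data are placeholders):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

name = "gpt2"  # illustrative small checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

def answer_token_features(prompt: str, answer: str, layer: int = -1):
    # Hidden state of the last answer token at the chosen layer.
    ids = tok(prompt + answer, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    return out.hidden_states[layer][0, -1].numpy()

# Hypothetical labeled examples: (prompt, model answer, answer is correct).
data = [
    ("The capital of France is", " Paris", 1),
    ("The capital of France is", " Rome", 0),
]
X = [answer_token_features(p, a) for p, a, _ in data]
y = [label for _, _, label in data]

# A linear probe over answer-token activations acts as an error detector.
probe = LogisticRegression(max_iter=1000).fit(X, y)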
State of AI Report 2024

Key takeaways from the 2024 Report include:
- Frontier lab performance begins to converge and proprietary models lose their edge
- Planning and reasoning take priority in LLM research
- Foundation models demonstrate their ability to break out of language
- US sanctions have limited effects on Chinese labs’ ability to produce capable models
- The enterprise value of AI companies has hit $9T
- A handful of AI companies begin to generate serious revenue
- The pseudo-acquisition emerges as an off-ramp for AI companies
- The existential risk discourse has cooled off

PDF
2^136279841-1 is the New Largest Known Prime Number

The Great Internet Mersenne Prime Search has discovered a new Mersenne prime number, $2^{136279841}-1$. At 41,024,320 digits, it eclipses by more than 16 million digits the previous largest known prime number found by GIMPS nearly 6 years ago.
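
Primality of Mersenne numbers $2^p-1$ is classically checked with the Lucas-Lehmer test; a minimal sketch for small exponents (the record-size search relies on FFT-based arithmetic and independent verification, far beyond this toy version):

def is_mersenne_prime(p: int) -> bool:
    # Lucas-Lehmer: for an odd prime p, 2**p - 1 is prime iff s == 0 after p - 2 steps.
    if p == 2:
        return True  # 2**2 - 1 = 3
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0

# Sanity check on small exponents; 2**11 - 1 = 2047 = 23 * 89 is composite.
print([p for p in (3, 5, 7, 11, 13, 17, 19) if is_mersenne_prime(p)])  # [3, 5, 7, 13, 17, 19]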
Top 10 Serverless GPUs: A comprehensive vendor selection

The serverless model offers several advantages:
- Cost Efficiency: You are billed only for the compute time you consume, not for idle server time.
- Scalability: The provider automatically scales the infrastructure to handle varying loads.
- Improved Productivity: No need to manage servers, patch operating systems, or handle scaling. Developers can focus on writing code and business logic rather than managing infrastructure.
- Faster Time to Market: Rapid deployment and updates are possible because there’s no infrastructure to manage.

The article compares the top 10 serverless GPU platforms in this emerging market.
Teaching Transformers Modular Arithmetic at Scale

The work introduces novel techniques to help ML models learn modular addition. These techniques—varying the diversity of training data, using an angular embedding for model inputs and outputs, and introducing a regularized loss function—enable ML models to add hundreds of elements mod a large $q$ with high accuracy, a significant improvement over prior work.

Modular addition: given $N$ elements in $Z_q$, compute their sum modulo $q$.
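
A minimal sketch of the task and of one plausible reading of the angular embedding, which maps each residue x to the point (cos 2πx/q, sin 2πx/q) on the unit circle (illustrative assumptions, not the paper's exact implementation):

import numpy as np

q = 257   # modulus (illustrative)
N = 100   # number of summands per example (illustrative)
rng = np.random.default_rng(0)

# One training example: N residues and their sum modulo q.
x = rng.integers(0, q, size=N)
y = int(x.sum() % q)

def angular_embedding(v, q):
    # Map residues to points on the unit circle: x -> (cos, sin) of 2*pi*x/q.
    theta = 2 * np.pi * np.asarray(v) / q
    return np.stack([np.cos(theta), np.sin(theta)], axis=-1)

features = angular_embedding(x, q)      # shape (N, 2), model input
target = angular_embedding([y], q)[0]   # model output in the same representation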
The Artificial Inflation (AI) of Artificial Intelligence (AI)—or AI^2 Bursts its Bubble, Bringing Down the Hype of the AI Threat

While some of the promises of AI have come true, and technology (like ChatGPT and its plugins) will continue to impress with its capabilities, AI-based technologies have largely failed to live up to the mountainous hype. In 2025, the authors expect the industry to pull back on the promises, investment, and hype of new AI capabilities and settle down into what is real versus marketing noise.