ml4se – Telegram
Machine Learning for Software Engineering
LLM360: Towards Fully Transparent Open-Source LLMs

Most open-source LLMs have only released partial artifacts, such as the final model weights or inference code. The authors present LLM360, an initiative to fully open-source LLMs, which advocates for all training code and data, model checkpoints, and intermediate results to be made available to the community. As a first step of LLM360, they release two 7B parameter LLMs pre-trained from scratch, Amber and CrystalCoder, including their training code, data, intermediate checkpoints, and analyses: https://www.llm360.ai
RealCode_eval

RealCode_eval is a benchmark for execution-based evaluation of LLM code generation on real GitHub repositories.
RealCode is a dataset of 219 Python functions from 22 GitHub repositories published between June and August 2023. All of these functions are covered by tests in their respective repositories; a generated function body counts as correct if those tests pass (a minimal sketch of this loop follows the list below).

- Around 60% of the repositories used are related to AI/LLMs/ML.
- The code in the RealCode repositories was not seen during pretraining of StarCoder or CodeLlama, as these models were trained before summer 2023. DeepSeek-Coder may have seen this code during pretraining.
- Repositories are rolled back to a specific commit during data preparation.
- Not all tests in the repositories pass.
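As a rough illustration of the execution-based loop (not the actual RealCode_eval harness; the file handling, test command, and function names below are assumptions), a generated body is spliced into the repository and judged by its own test suite:

import subprocess
from pathlib import Path

def evaluate_completion(repo_dir: str, target_file: str, stub: str, generated_body: str) -> bool:
    """Splice a model-generated function body into the repository and run its tests."""
    path = Path(repo_dir) / target_file
    original = path.read_text()
    try:
        # Replace the bare stub (signature + docstring) with the stub plus the generated body.
        path.write_text(original.replace(stub, stub + "\n" + generated_body))
        # Run the repository's own tests; a zero exit code counts as a pass.
        result = subprocess.run(["python", "-m", "pytest", "-q"], cwd=repo_dir, timeout=600)
        return result.returncode == 0
    finally:
        path.write_text(original)  # roll the file back to its original state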
ENASE 2024 (Deadline Extension)

Regular Paper Submission Extension: January 3, 2024
Position Paper Submission: January 25, 2024
Doctoral Consortium Paper Submission: February 29, 2024

The mission of ENASE (Evaluation of Novel Approaches to Software Engineering) is to be a prime international forum to discuss and publish research findings and IT industry experiences with relation to novel approaches to software engineering.
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator

LMs may still produce a valid solution if they not only write code, but also selectively “emulate” the interpreter by generating the expected output of lines of code that cannot be executed. In this work, the authors propose Chain of Code, a simple extension that improves LM code-driven reasoning. The key idea is to encourage LMs to format semantic sub-tasks in a program as flexible pseudocode whose undefined behaviors the interpreter can explicitly catch and hand off to an LM to simulate. Experiments demonstrate that Chain of Code outperforms Chain of Thought and other baselines across a variety of benchmarks; on BIG-Bench Hard, Chain of Code achieves 84%, a gain of 12% over Chain of Thought.

Project: https://chain-of-code.github.io/
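A toy sketch of the interleaved execution idea (not the authors' implementation; ask_lm is a hypothetical stand-in for a call to the language model): run each line in the Python interpreter, and when it raises, fall back to asking the LM to simulate that line's effect.

import ast

def ask_lm(line: str, state: dict) -> str:
    # Hypothetical LM call: should return a Python literal for the value being assigned.
    raise NotImplementedError("plug in an LM call here")

def chain_of_code(program: str) -> dict:
    state: dict = {}
    for line in program.splitlines():
        if not line.strip():
            continue
        try:
            exec(line, {}, state)  # lines Python can run are executed for real
        except Exception:
            # Undefined behavior (e.g. a call to a semantic helper such as is_sarcastic(x)):
            # ask the LM what the line would produce and store that value instead.
            target = line.split("=", 1)[0].strip()
            state[target] = ast.literal_eval(ask_lm(line, state))
    return state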
Entity-Augmented Code Generation

LLMs are effective in generating high-quality text and encapsulating a broad spectrum of world knowledge. However, these models are not designed to utilize external information sources. In this paper, the authors use retrieval-augmented LLMs for a new task: code generation using external entities. Existing retrieval-augmented LLMs fail to assign relevance scores between similar entity names, so the authors propose a novel end-to-end trainable architecture with a scalable entity retriever injected directly into the LLM decoder.
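For contrast, here is a rough sketch of the standard retrieve-then-prompt baseline that the paper improves on (the embedding model name and prompt format are assumptions for illustration; the paper's own retriever is trained end to end inside the decoder):

from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed off-the-shelf embedding model

def retrieve_entities(query: str, entities: list[str], k: int = 3) -> list[str]:
    """Score external entity names (e.g. project-specific API functions) against the task."""
    scores = util.cos_sim(encoder.encode(query), encoder.encode(entities))[0]
    ranked = sorted(zip(entities, scores.tolist()), key=lambda p: p[1], reverse=True)
    return [name for name, _ in ranked[:k]]

def build_prompt(query: str, entities: list[str]) -> str:
    # Retrieved entities are simply prepended to the generation prompt.
    context = "\n".join(f"# available entity: {e}" for e in retrieve_entities(query, entities))
    return f"{context}\n# task: {query}\n"

Similar entity names (get_user vs. get_users) receive near-identical scores here, which is exactly the failure mode the proposed architecture targets.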
Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning

Adapters is a new open-source library based on the initial version of AdapterHub. It is aimed at unifying parameter-efficient and modular transfer learning. The library integrates 10 diverse adapter methods into a unified interface for easy usage and provides a simple way of leveraging the modularity of adapters by designing composition blocks.

GitHub: https://github.com/adapter-hub/adapters
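A small usage sketch based on the library's documented interface (the model name, adapter names, and config strings are placeholders and may differ across library versions):

from adapters import AutoAdapterModel
from adapters.composition import Stack

# Load a base model with adapter support.
model = AutoAdapterModel.from_pretrained("roberta-base")

# Attach two bottleneck adapters under user-chosen names.
model.add_adapter("lang", config="seq_bn")
model.add_adapter("task", config="seq_bn")
model.add_classification_head("task", num_labels=2)

# Freeze the base model and train only the "task" adapter's parameters.
model.train_adapter("task")

# Composition block: stack the language adapter below the task adapter.
model.set_active_adapters(Stack("lang", "task"))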
Building Your Own Product Copilot: Challenges, Opportunities, and Needs

The proliferation of product copilots, driven by advancements in LLMs, has strained existing software engineering processes and tools, leaving software engineers improvising new development practices. The study, involving 26 professional software engineers, revealed critical pain points across the entire engineering process for developing such AI-powered products.
WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation

CodeOcean is a dataset comprising 20,000 instruction instances across 4 universal code-related tasks, aimed at augmenting the effectiveness of instruction tuning and improving the generalization ability of fine-tuned models.

WaveCoder is a Code LLM fine-tuned with Widespread And Versatile Enhanced instruction tuning. WaveCoder models outperform other open-source models in terms of generalization ability across different code-related tasks at the same fine-tuning scale.
Binary Code Summarization: Benchmarking ChatGPT/GPT-4 and Other Large Language Models

The authors present a large-scale and comprehensive study of how well LLMs understand binary code semantics. They built BinSum, a comprehensive benchmark with an expansive dataset of over 557K binary functions spanning various code representations, computer architectures, and optimization levels (a rough sketch of constructing such inputs follows the research questions below).

RQs:
- To what extent can LLMs comprehend binary code? What input of binary code impacts LLM’s output more?
- Which LLM performs the best on binary code comprehension? Which LLM is more efficient than others?
- How do the different computer architectures and optimization levels affect LLMs’ performance?
- What are additional factors of binary code input influencing LLMs’ comprehension capabilities?
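A rough sketch of constructing one benchmark input (standard gcc/objdump usage; the prompt wording is an assumption and this is not BinSum's actual pipeline):

import subprocess

def disassemble(source_file: str, function: str, opt_level: str = "-O2") -> str:
    """Compile a C file at a given optimization level and return one function's disassembly."""
    subprocess.run(["gcc", opt_level, "-c", source_file, "-o", "func.o"], check=True)
    asm = subprocess.run(["objdump", "-d", "func.o"], capture_output=True, text=True, check=True).stdout
    # Keep only the block for the requested function.
    return asm.split(f"<{function}>:")[1].split("\n\n")[0]

def build_prompt(disassembly: str) -> str:
    # Prompt wording is illustrative, not the benchmark's template.
    return "Summarize what the following binary function does:\n\n" + disassembly

Varying opt_level (-O0 through -O3) and the compiler target is what lets a benchmark probe the effect of optimization levels and architectures on the LLM's summaries.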
TypeEvalPy: A Micro-benchmarking Framework for Python Type Inference Tools

The paper introduces TypeEvalPy, a comprehensive micro-benchmarking framework for evaluating type inference tools. TypeEvalPy contains 154 code snippets with 845 type annotations across 18 categories that target various Python features.

GitHub: https://github.com/secure-software-engineering/TypeEvalPy
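To make the setup concrete, here is a hedged illustration of what a micro-benchmark case and an exact-match check could look like; the snippet and the ground-truth schema below are invented for illustration and are not TypeEvalPy's actual format:

# A tiny snippet of the kind such categories target (here: parameter and return types).
snippet = """
def repeat(word, times):
    return word * times

result = repeat("ab", 3)
"""

# Hypothetical ground-truth annotations; the real schema may differ.
expected = {
    ("repeat", "word"): "str",
    ("repeat", "times"): "int",
    ("repeat", "<return>"): "str",
    ("<module>", "result"): "str",
}

def exact_match(predicted: dict, expected: dict) -> float:
    # Fraction of annotation sites where the inferred type matches the ground truth exactly.
    hits = sum(1 for site, t in expected.items() if predicted.get(site) == t)
    return hits / len(expected)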
Generative AI for Math: Part I -- MathPile: A Billion-Token-Scale Pretraining Corpus for Math

MathPile is a specialized corpus centered around mathematics, characterized by its diversity and high quality. The authors plan to open-source different versions of MathPile along with the scripts used for processing, to facilitate future developments in this field.

GitHub: https://github.com/GAIR-NLP/MathPile/
Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon

The work introduces Activation Beacon for extending an LLM's context length. Activation Beacon condenses the LLM's raw activations into more compact forms, enabling the LLM to perceive a vast context with a limited context window. As a plug-and-play component, it brings in long contextual information while fully preserving the LLM's existing capabilities on short contexts. The experimental studies show that Activation Beacon extends Llama-2-7B's context length by 100x (from 4K to 400K), while achieving superior results on both long-context generation and understanding tasks.

GitHub: https://github.com/FlagOpen/FlagEmbedding/tree/master/Long_LLM/activation_beacon
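A conceptual toy of the condensing idea (NOT the paper's learned beacon mechanism: mean pooling stands in for the beacon tokens' attention-based compression, and the window/ratio values are arbitrary):

import numpy as np

def condense(chunk: np.ndarray, n_beacons: int) -> np.ndarray:
    # Pool a (chunk_len, dim) block of activations into (n_beacons, dim) summary vectors.
    segments = np.array_split(chunk, n_beacons, axis=0)
    return np.stack([seg.mean(axis=0) for seg in segments])

def visible_context(activations: np.ndarray, window: int = 1024, ratio: int = 16) -> np.ndarray:
    # Keep the most recent `window` activations raw; condense everything older.
    past, recent = activations[:-window], activations[-window:]
    if len(past) == 0:
        return recent
    chunks = np.array_split(past, max(1, len(past) // window), axis=0)
    beacons = np.concatenate([condense(c, max(1, len(c) // ratio)) for c in chunks])
    return np.concatenate([beacons, recent])  # far shorter than the raw sequence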
Committing without git

How to create a branch with two commits (one that adds a file and one that changes it) without running git.
Source: https://matheustavares.gitlab.io/assets/committing-without-git/commit.py
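A condensed sketch of the same idea (inspired by the linked script but not identical to it; run inside an existing repository, since it writes loose objects under .git/objects and a branch ref; the author identity, file contents, and branch name are placeholders):

import hashlib, os, time, zlib

GIT_DIR = ".git"

def write_object(kind: str, body: bytes) -> str:
    # Store a loose git object ("<kind> <size>\0<body>", zlib-compressed) and return its SHA-1.
    data = f"{kind} {len(body)}".encode() + b"\0" + body
    sha = hashlib.sha1(data).hexdigest()
    path = os.path.join(GIT_DIR, "objects", sha[:2], sha[2:])
    os.makedirs(os.path.dirname(path), exist_ok=True)
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.write(zlib.compress(data))
    return sha

def write_tree(entries: dict[str, str]) -> str:
    # entries maps file name -> blob SHA; every file uses mode 100644.
    body = b"".join(
        f"100644 {name}".encode() + b"\0" + bytes.fromhex(sha)
        for name, sha in sorted(entries.items())
    )
    return write_object("tree", body)

def write_commit(tree: str, message: str, parent: str | None = None) -> str:
    person = f"Example <example@example.com> {int(time.time())} +0000"  # placeholder identity
    lines = [f"tree {tree}"] + ([f"parent {parent}"] if parent else [])
    lines += [f"author {person}", f"committer {person}", "", message, ""]
    return write_object("commit", "\n".join(lines).encode())

# Branch with two commits: the first adds hello.txt, the second changes it.
blob1 = write_object("blob", b"hello\n")
c1 = write_commit(write_tree({"hello.txt": blob1}), "add file")
blob2 = write_object("blob", b"hello, world\n")
c2 = write_commit(write_tree({"hello.txt": blob2}), "change file", parent=c1)

ref_path = os.path.join(GIT_DIR, "refs", "heads", "manual-branch")
os.makedirs(os.path.dirname(ref_path), exist_ok=True)
with open(ref_path, "w") as f:
    f.write(c2 + "\n")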
Synergy of Reinforcement Learning and Large Language Models (RL+LLMs) @ AAAI 2024

The goal of the workshop is to bring together RL and LLM communities to facilitate cross-pollination.
Workshop: February 26th 2024

Accepted papers:
- Suspicion-Agent: Playing Imperfect Information Games with Theory of Mind Aware GPT-4
- Generate Subgoal Images before Act: Unlocking the Chain-of-Thought Reasoning in Diffusion Model for Robot Manipulation with Multimodal Prompts
- CriticGPT: Multimodal LLM as a Critic for Robot Manipulation
- Decision Transformer With Tokenized Actions
- Reinforcement Learning for Optimizing RAG for Domain Chatbots
- Software Security Vulnerability Repair Using Reinforcement Learning with Large Language Models
- Exploring Reinforcement Learning with Large Language Models for Enhancing Badminton Players' Strategies
- DeLF: Designing Learning Environments with Foundation Models