ml4se
Machine Learning for Software Engineering
The 16th Annual AGI Conference

The AGI conferences have been organized by the Artificial General Intelligence Society since the first one back in 2008. The 16th annual AGI conference (AGI-23) will be held as a mixed virtual/F2F event in Stockholm between June 16 and June 19, 2023.

Final deadline for submitted papers: March 12, 2023

Appropriate topics for contributed papers include, but are not restricted to:
AGI Architectures
Autonomy and Creativity
Benchmarks and Evaluation
Cognitive Modeling
Multi-Agent Interaction and Collaborative Intelligence
Theoretical Foundations of General Intelligence
Broader Implications of AGI
Knowledge Representation
Reinforcement and Learning Theory
Motivation, Emotion and Affect
Natural Language Understanding
Neurosymbolic AI
Perception and Perceptual Modeling
Reasoning, Inference and Planning
Robotic and Virtual Agents
Simulation and Evolutionary Computation
ChatML

OpenAI has released the ChatGPT API together with the Chat Markup Language (ChatML). The basic idea behind ChatML is to ensure that model inputs are sent in a structured format that follows ChatML, rather than as unstructured text.

https://github.com/openai/openai-python/blob/main/chatml.md
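
A minimal sketch of what this looks like through the Python client (assuming the openai package, v0.27+, and the gpt-3.5-turbo model): instead of one free-form string, the input is a list of role-tagged messages that the library serializes into ChatML on your behalf.

import openai  # pip install openai

openai.api_key = "YOUR_API_KEY"  # or set the OPENAI_API_KEY environment variable

# Each message carries an explicit role; the client renders this
# structure into ChatML before sending it to the model.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Explain ChatML in one sentence."},
    ],
)
print(response["choices"][0]["message"]["content"])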
COPS: An Improved Information Retrieval-Based Bug Localization Technique Using Context-Aware Program Simplification

The authors propose a context-aware program simplification technique that enables statement-level bug localization for Python-based projects. They evaluate COPS on the PyTraceBugs benchmark and compare it to state-of-the-art techniques using four widely used metrics.
Defectors: A Large, Diverse Python Dataset for Defect Prediction

Defectors is a large dataset for just-in-time and line-level defect prediction. It consists of ≈213K source code files (≈93K defective and ≈120K defect-free) spanning 24 popular Python projects from 18 different domains, including machine learning, automation, and the internet of things.

Dataset: https://zenodo.org/record/7708984#.ZBK7CaJBycw
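
A hypothetical loading sketch (the file name, format, and column names below are placeholders, not confirmed by the post; check the Zenodo record for the actual layout):

import pandas as pd

# Assumption: the archive ships tabular files; "defectors.csv" and the
# "label" column are hypothetical names used purely for illustration.
df = pd.read_csv("defectors.csv")
print(df.columns)
print(df["label"].value_counts())  # defective vs. defect-free counts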
Planning with Large Language Models for Code Generation

Planning-Guided Transformer Decoding (PG-TD) uses a planning algorithm for lookahead search to guide the Transformer toward generating better code. The algorithm is model-agnostic: it works with any standard Transformer model and does not require knowledge of the grammar of the generated programs. A caveat: naively integrating the planning algorithm with the Transformer decoding process can cause redundant invocations of the Transformer's beam search.

https://codeaimcts.github.io/
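
A minimal sketch of the lookahead idea (our own illustration, not the authors' implementation; generate_candidates, rollout, is_complete, and run_test are hypothetical stand-ins for the Transformer's beam search and the test-based reward):

def score_on_tests(program, test_cases):
    # Fraction of test cases the candidate program passes.
    passed = sum(run_test(program, t) for t in test_cases)  # run_test: hypothetical
    return passed / len(test_cases)

def lookahead_decode(model, prompt, test_cases, max_steps=256):
    # At each step, expand candidate next tokens, roll each out to a full
    # program, and keep the prefix whose rollout scores best on the tests.
    prefix = prompt
    for _ in range(max_steps):
        candidates = model.generate_candidates(prefix, k=5)  # hypothetical API
        scored = []
        for token in candidates:
            completion = model.rollout(prefix + token)       # hypothetical API
            scored.append((score_on_tests(completion, test_cases), token))
        best_score, best_token = max(scored)
        prefix += best_token
        if model.is_complete(prefix):                        # hypothetical API
            return prefix
    return prefix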
RCABench: Open Benchmarking Platform for Root Cause Analysis

Fuzzing has contributed to automatically identifying bugs and vulnerabilities in software testing. Although it can efficiently generate crashing inputs, those inputs are usually analyzed manually, and several root cause analysis (RCA) techniques have been proposed to automate that analysis and mitigate the cost. RCABench is an end-to-end benchmarking platform for evaluating such RCA techniques across various target programs.

Repository: https://github.com/RICSecLab/RCABench
PaLM API & MakerSuite

To simplify the development of applications using the PaLM API, Google offers MakerSuite, a tool for quickly prototyping your own systems. Models are prototyped and fine-tuned right in the browser, while performance-intensive implementation and deployment tasks run on Google Cloud infrastructure.
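
A minimal sketch of calling the PaLM API from Python (assuming the google-generativeai client and the text-bison-001 model; names may change as the API evolves):

import google.generativeai as palm  # pip install google-generativeai

palm.configure(api_key="YOUR_API_KEY")  # key obtained via MakerSuite

# Generate text with a PaLM text model; the model name is an assumption.
result = palm.generate_text(
    model="models/text-bison-001",
    prompt="Write a Python function that reverses a linked list.",
)
print(result.result)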
Software Vulnerability Prediction Knowledge Transferring Between Programming Languages

One of the biggest challenges in this area is the lack of code samples for all the different programming languages. In this study, the authors address the issue by proposing a transfer learning technique that leverages available datasets to produce a model that detects common vulnerabilities across programming languages. They train a CNN model on C source code samples, then adapt and evaluate the learned model on Java source code samples. The code samples come from two benchmark datasets: the NIST Software Assurance Reference Dataset (SARD) and the Draper VDISC dataset. The results show that the proposed model detects vulnerabilities in both C and Java code with an average recall of 72%.
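
A minimal sketch of this kind of cross-language transfer (our own illustration; the layer sizes, tokenization, and freezing strategy are assumptions, not the paper's exact setup):

import torch
import torch.nn as nn

# 1D CNN over token embeddings, a common setup for vulnerability detection.
class VulnCNN(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.features = nn.Sequential(
            nn.Conv1d(embed_dim, 64, kernel_size=5), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, tokens):                  # tokens: (batch, seq_len)
        x = self.embed(tokens).transpose(1, 2)  # -> (batch, embed_dim, seq_len)
        return self.classifier(self.features(x).squeeze(-1))

model = VulnCNN(vocab_size=50_000)
# ... train on C samples (SARD / Draper VDISC) ...

# Transfer: freeze the learned feature extractor and fine-tune the
# classifier head on Java samples.
for p in model.features.parameters():
    p.requires_grad = False
optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-4)
# ... fine-tune on Java samples ...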
InferFix: End-to-End Program Repair with LLMs over Retrieval-Augmented Prompts (Microsoft)

InferFix is a transformer-based program repair framework paired with a state-of-the-art static analyzer to fix critical security and performance bugs. It combines a Retriever, a transformer encoder pretrained with a contrastive learning objective to search for semantically equivalent bugs and their corresponding fixes, with a Generator, a large language model (the 12-billion-parameter Codex Cushman model) fine-tuned on supervised bug-fix data, using prompts augmented with bug type annotations and semantically similar fixes retrieved from an external non-parametric memory.

InferredBugs: a novel, metadata-rich dataset of bugs extracted by executing the Infer static analyzer on the change histories of thousands of Java and C# repositories.
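
A rough sketch of how such a retrieval-augmented repair prompt might be assembled (the template and field names are our assumptions, not the paper's exact format):

def build_repair_prompt(bug_type, buggy_code, retrieved_fixes):
    # Combine the static analyzer's bug type annotation with semantically
    # similar bug-fix pairs pulled from the retriever's index.
    hints = "\n\n".join(
        f"// Similar bug:\n{ex['buggy']}\n// Its fix:\n{ex['fixed']}"
        for ex in retrieved_fixes
    )
    return (
        f"Bug type: {bug_type}\n\n"
        f"{hints}\n\n"
        f"// Buggy code:\n{buggy_code}\n"
        f"// Fixed code:\n"
    )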
SecretBench: A Dataset of Software Secrets

SecretBench is a labeled dataset of source code containing 97,479 secrets (15,084 of which are true secrets) of various secret types, extracted from 818 public GitHub repositories. The dataset covers 49 programming languages and 311 file types.

Dataset: https://github.com/setu1421/SecretBench
GitHub Code Dataset

* 115M code files from GitHub
* 32 programming languages
* 1TB of data

The dataset was created from the public GitHub dataset on Google BigQuery.

from datasets import load_dataset
ds = load_dataset("codeparrot/github-code", streaming=True, split="train")
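
Because the dataset is streamed, records can be pulled lazily without downloading the full terabyte; for example (field names such as code, language, and path follow the dataset card and are worth double-checking):

first = next(iter(ds))                 # fetch a single record lazily
print(first["language"], first["path"])
print(first["code"][:200])             # first 200 characters of the file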
ChatGPT Prompt Patterns for Improving Code Quality, Refactoring, Requirements Elicitation, and Software Design

This paper presents prompt design techniques for software engineering, codified as patterns, for solving common problems that arise when using LLMs such as ChatGPT to automate software engineering activities, for example ensuring that code is decoupled from third-party libraries or creating an API specification from a list of requirements.
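
As an illustration (our own paraphrase, not a pattern quoted from the paper), such a pattern might be instantiated as a reusable prompt prefix:

# Hypothetical instantiation of a "decouple from third-party libraries" pattern.
PROMPT = (
    "From now on, whenever you generate code that calls a third-party "
    "library, wrap the calls behind an interface I can swap or mock in "
    "tests. Here is my task: {task}"
)
print(PROMPT.format(task="read a CSV file and plot a histogram"))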
DACOS: A Manually Annotated Dataset of Code Smells

DACOS (DAtaset of COde Smells) is a manually annotated dataset containing 10,267 annotations for 5,192 code snippets. The dataset targets three kinds of code smells at different granularities:
* multifaceted abstraction
* complex method
* long parameter list

Dataset: https://zenodo.org/record/7570428#.ZBrxX6JBycw
Tagman (a web platform to create a manually annotated dataset of smells): https://github.com/SMART-Dal/Tagman
Mirror: A Natural Language Interface for Data Querying, Summarization, and Visualization

Mirror is an open-source platform for data exploration and analysis powered by large language models. Mirror offers an intuitive natural language interface for querying databases, and automatically generates executable SQL commands to retrieve relevant data and summarize it in natural language. In addition, users can preview and manually edit the generated SQL commands to ensure the accuracy of their queries. Mirror also generates visualizations to facilitate understanding of the data. Designed with flexibility and human input in mind, Mirror is suitable for both experienced data analysts and non-technical professionals looking to gain insights from their data.

Mirror: https://github.com/mirror-data/mirror
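
A minimal sketch of the core natural-language-to-SQL step (our own illustration using a generic LLM chat API; Mirror's actual pipeline and prompts are not described in this post):

import openai

def question_to_sql(question, schema):
    # Ask an LLM to draft SQL; the draft is returned for the user to
    # preview and edit before it is executed against the database.
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": f"Translate questions into SQL for this schema:\n{schema}"},
            {"role": "user", "content": question},
        ],
    )
    return resp["choices"][0]["message"]["content"]

sql = question_to_sql("Average order value per month in 2022?",
                      "orders(id, created_at, total_cents)")
print(sql)  # preview and manually edit before running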
Measuring The Impact Of Programming Language Distribution (Google)

Current benchmarks for evaluating neural code models cover only a small subset of programming languages, excluding many popular languages such as Go or Rust. To ameliorate this issue, the authors present the BabelCode framework for execution-based evaluation of any benchmark in any language.

BabelCode: https://github.com/google-research/babelcode
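
The gist of execution-based evaluation is to run each generated program against the benchmark's test cases in the target language. A toy sketch of that core step (ours, not BabelCode's API; the command table is illustrative):

import os
import subprocess
import tempfile

def run_candidate(source, language, test_input, expected, timeout=5):
    # Write the candidate program to a file, run it with the language's
    # interpreter, and compare stdout against the expected output.
    commands = {"python": ["python3"], "ruby": ["ruby"]}  # extend per language
    suffixes = {"python": ".py", "ruby": ".rb"}
    with tempfile.NamedTemporaryFile("w", suffix=suffixes[language],
                                     delete=False) as f:
        f.write(source)
        path = f.name
    try:
        proc = subprocess.run(commands[language] + [path], input=test_input,
                              capture_output=True, text=True, timeout=timeout)
        return proc.stdout.strip() == expected.strip()
    finally:
        os.unlink(path)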
ENASE'23 Technical Program

Conference Areas
1. Theory and Practice of Systems and Applications Development
2. Challenges and Novel Approaches to Systems and Software Engineering (SSE)
3. Systems and Software Quality
4. Systems and Software Engineering (SSE) for Emerging Domains
Improving Code Generation by Training with Natural Language Feedback

Imitation learning from language feedback (ILF) is an algorithm for learning from natural language feedback at training time. ILF requires only a small amount of human-written feedback during training and none at test time, making it both user-friendly and sample-efficient. It can be seen as a form of minimizing the KL divergence to the ground-truth distribution, and the authors demonstrate a proof of concept on a neural program synthesis task.
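
Roughly, one round of the ILF loop looks like the following sketch (our reading of the idea; model, get_feedback, and passes_tests are hypothetical stand-ins):

def ilf_round(model, tasks):
    # 1. sample programs, 2. collect human feedback on failures,
    # 3. have the model refine its output to incorporate the feedback,
    # 4. fine-tune only on refinements that now pass the tests.
    finetune_set = []
    for task in tasks:
        program = model.generate(task.prompt)
        if passes_tests(program, task.tests):
            continue
        feedback = get_feedback(task, program)  # human-written critique
        refined = model.generate(task.prompt + program + feedback)
        if passes_tests(refined, task.tests):
            finetune_set.append((task.prompt, refined))
    model.finetune(finetune_set)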