Measuring the Runtime Performance of Code Produced with GitHub Copilot
GitHub Copilot is an AI programming assistant used by many developers. The authors evaluate the runtime performance of code produced when developers use GitHub Copilot versus when they do not. To this end, they conducted a user study with 32 participants in which each participant solved two C++ programming problems, one with Copilot and one without, and measured the runtime performance of the participants' solutions. The results suggest that using Copilot may produce code with significantly slower runtime performance.
RQ0: Does using Copilot influence program correctness?
RQ1: Is there a runtime performance difference in code when using GitHub Copilot?
RQ2: Do Copilot’s suggestions sway developers towards or away from code with faster runtime performance?
RQ3: Do characteristics of Copilot users influence the runtime performance when it is used?
RLocator: Reinforcement Learning for Bug Localization
The authors propose RLocator, an RL-based technique that, given a bug report, ranks the source code files where the bug may reside. The study's key contribution is formulating bug localization as a Markov decision process, which allows the evaluation measures to be optimized directly. RLocator is evaluated on 8,316 bug reports. The authors find that RLocator outperforms other state-of-the-art techniques on MAP and performs comparably in most cases on MRR. They conclude that RL for bug localization is a promising avenue for future exploration.
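To make the MDP framing concrete, here is a toy sketch (not the paper's actual model): the state is the set of not-yet-ranked files, an action picks the next file, and the episode return is a rank-sensitive reward the agent can optimize directly. The file names and the hand-written `score` policy are illustrative assumptions.

```python
import random

def episode_return(ranking, relevant):
    """Return of one episode: sum of 1/rank over relevant files (MRR-like),
    so optimizing the return optimizes the ranking metric directly."""
    return sum(1.0 / (i + 1) for i, f in enumerate(ranking) if f in relevant)

def rank_files(files, score, relevant, epsilon=0.1, rng=random.Random(0)):
    """One episode of the MDP: repeatedly pick the highest-scored remaining
    file (the policy), exploring a random file with probability epsilon."""
    remaining, ranking = list(files), []
    while remaining:
        if rng.random() < epsilon:
            action = rng.choice(remaining)      # explore
        else:
            action = max(remaining, key=score)  # exploit current policy
        ranking.append(action)                  # state transition:
        remaining.remove(action)                # file moves into the ranking
    return ranking, episode_return(ranking, relevant)

# Hypothetical repository: a learned policy would replace this score function.
files = ["a.py", "b.py", "buggy.py", "d.py"]
relevant = {"buggy.py"}
score = lambda f: 1.0 if "buggy" in f else 0.0
ranking, ret = rank_files(files, score, relevant, epsilon=0.0)
print(ranking[0], ret)  # the relevant file is ranked first, return 1.0
```

A policy-gradient method can then push up the probability of episodes with high return, which is what lets the metric be optimized without a differentiable surrogate.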
Recommending Root-Cause and Mitigation Steps for Cloud Incidents using Large Language Models
In this work, the authors conduct the first large-scale study evaluating the effectiveness of LLMs in helping engineers root-cause and mitigate production incidents. Human evaluation with actual incident owners shows the efficacy and future potential of using artificial intelligence to resolve cloud incidents.
Code Execution with Pre-trained Language Models
Code execution is a fundamental aspect of programming language semantics that reflects the exact behavior of the code. However, most pretrained models for code intelligence ignore the execution trace and only rely on source code and syntactic structures. In this paper, the authors aim to teach pretrained models the real-world code execution process. They propose CodeExecutor, a Transformer-based model that learns to execute arbitrary programs and predict their execution traces.
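To illustrate what an "execution trace" is — the supervision signal CodeExecutor learns to predict — here is a minimal sketch (not the paper's tooling) that records the sequence of line numbers a small Python function visits, using the standard `sys.settrace` hook:

```python
import sys

def collect_trace(func):
    """Run func() and record the relative line numbers it executes."""
    trace = []
    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is func.__code__:
            trace.append(frame.f_lineno - func.__code__.co_firstlineno)
        return tracer  # keep tracing inside this frame
    sys.settrace(tracer)
    try:
        result = func()
    finally:
        sys.settrace(None)  # always detach the tracer
    return result, trace

def sample():
    total = 0           # relative line 1
    for i in range(3):  # relative line 2
        total += i      # relative line 3
    return total        # relative line 4

value, lines = collect_trace(sample)
print(value, lines)  # the loop lines repeat once per iteration
```

The interleaved loop lines in the output show why a trace carries semantic information that static source text alone does not.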
Searching by Code: a New SearchBySnippet Dataset and SnippeR Retrieval Model for Searching by Code Snippets
The authors argue that using a code snippet (and possibly an associated traceback) as a query and looking for answers with bugfixing instructions and code samples is a natural use case that is not covered by existing approaches. The paper presents a new SearchBySnippet dataset implementing the search-by-code use case based on StackOverflow data; it turns out that in this setting, existing architectures fall short of the simplest BM25 baseline even after fine-tuning.
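For reference, the BM25 baseline that fine-tuned retrievers fall short of can be sketched in a few lines. This is a plain Okapi BM25 scorer over whitespace-tokenized text; the example documents and query are made up, standing in for StackOverflow answers and a code-snippet query:

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each doc against the query (bag of tokens)."""
    tokenized = [d.split() for d in docs]
    N = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / N
    df = Counter()                     # document frequency per term
    for d in tokenized:
        df.update(set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)                # term frequency in this doc
        s = 0.0
        for term in query.split():
            if term not in tf:
                continue
            idf = math.log((N - df[term] + 0.5) / (df[term] + 0.5) + 1)
            s += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

docs = [
    "IndexError list index out of range fix check bounds",
    "how to center a div with css flexbox",
]
query = "IndexError list index out of range"
scores = bm25_scores(query, docs)
print(scores.index(max(scores)))  # → 0, the bug-fixing answer
```

Because error messages and identifiers often recur verbatim between a snippet and its answer, this kind of lexical matching is a surprisingly strong baseline for search-by-code.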
CCT-Code: Cross-Consistency Training for Multilingual Clone Detection and Code Search
Understanding semantic similarity is an important aspect of language processing. The authors present a new method, CCT-LM, that improves this ability via a novel CCT pretraining approach and demonstrate its viability on the clone detection and code search tasks. The proposed CCT-LM model outperforms strong baselines on all presented tasks, indicating that CCT pretraining gives a language model a better grasp of semantic similarity.
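Contrastive pretraining of this kind is usually built around an InfoNCE-style loss; here is a toy sketch (the paper's exact objective may differ) that pulls an anchor embedding toward its positive — e.g. a semantic clone in another language — and away from in-batch negatives. The 2-d embeddings are illustrative assumptions:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def info_nce(anchor, positive, negatives, temperature=0.1):
    """-log softmax of the positive pair among positive + negatives."""
    logits = [dot(anchor, positive) / temperature] + [
        dot(anchor, n) / temperature for n in negatives
    ]
    m = max(logits)  # subtract the max for numerical stability
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_denom)

anchor = [1.0, 0.0]                     # embedding of a code snippet
positive = [0.9, 0.1]                   # embedding of its semantic clone
negatives = [[0.0, 1.0], [-1.0, 0.0]]   # unrelated snippets in the batch
print(round(info_nce(anchor, positive, negatives), 4))
```

Minimizing this loss makes clones score high and non-clones score low under a dot product, which is exactly the property clone detection and code search rely on.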
CodeT5+: Open Code LLMs for Code Understanding and Generation
Salesforce AI Research proposes CodeT5+, a family of encoder-decoder LLMs for code in which component modules can be flexibly combined to suit a wide range of downstream code tasks. Such flexibility is enabled by a mixture of pretraining objectives. These objectives cover span denoising, contrastive learning, text-code matching, and causal LM pretraining tasks, on both unimodal and bimodal multilingual code corpora.
The authors observe state-of-the-art model performance on various code-related tasks, such as code generation and completion, math programming, and text-to-code retrieval. In particular, the instruction-tuned CodeT5+ 16B achieves new SoTA results of 35.0% pass@1 and 54.5% pass@10 on the HumanEval benchmark.
- CodeT5+ 220M and 770M
- CodeT5+ 220M-py and 770M-py, further tuned on a Python subset
- CodeT5+: 2B, 6B, and 16B
- InstructCodeT5+ 16B
[GitHub]
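The pass@1 and pass@10 numbers above come from the standard unbiased pass@k estimator used in the HumanEval evaluation protocol: given n samples per problem of which c pass the unit tests, it estimates the probability that at least one of k samples passes.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimate: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # every size-k subset contains a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 10 samples of which 2 are correct, pass@1 is simply c/n:
print(round(pass_at_k(10, 2, 1), 4))  # → 0.2
```

Averaging this quantity over all benchmark problems gives the reported pass@k score.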
LLMs and Text-to-SQL task
* LLMs and SQL — writing prompts for Text-to-SQL task
* Evaluating the Text-to-SQL Capabilities of Large Language Models — it is assumed that some queries from the target domain are available
* A Generic Prompt for an LLM that enables NL-to-SQL across Domains and Compositions — completely a cross-domain setting
* How to Prompt LLMs for Text-to-SQL: A Study in Zero-shot, Single-domain, and Cross-domain Settings — zero-shot, single-domain, and cross-domain text-to-SQL settings
* Divide and Prompt: Chain of Thought Prompting for Text-to-SQL — a new paradigm for prompting Text-to-SQL tasks, which first divides the task into subtasks and then approaches each subtask through CoT
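A common thread in these papers is how the database schema is serialized into the prompt. Here is a hedged sketch of one popular zero-shot layout — schema as CREATE TABLE statements, then the question, then a `SELECT` stub for the model to complete; the exact serialization varies between the papers, and the toy schema is an assumption:

```python
def build_prompt(schema, question):
    """Serialize {table: [column defs]} as DDL and append the question."""
    ddl = "\n".join(
        f"CREATE TABLE {table} ({', '.join(cols)});"
        for table, cols in schema.items()
    )
    return (
        f"{ddl}\n\n"
        f"-- Using valid SQLite, answer the following question.\n"
        f"-- Question: {question}\n"
        f"SELECT"
    )

schema = {
    "singer": ["singer_id INT", "name TEXT", "country TEXT"],
    "concert": ["concert_id INT", "singer_id INT", "year INT"],
}
print(build_prompt(schema, "How many singers are from France?"))
```

Ending the prompt with `SELECT` nudges a completion model straight into SQL, one of the tricks the zero-shot prompting studies compare.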
MIT: Generative AI for Constructive Communication Evaluation and New Research Methods
Advances in large language models recently popularized by ChatGPT represent a remarkable leap forward in language processing by machines.
* What does this mean for us, how can we make the most of these advancements, and what are the risks?
* What research opportunities have opened up?
* What kinds of evaluation are called for?
[Schedule]
Code Alpaca: An Instruction-following LLaMA Model trained on code generation instructions
The project aims to build and share an instruction-following LLaMA model for code generation. The repository contains the data and code for fine-tuning the model.
- instruction-following data
- demo
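The instruction-following data uses the Alpaca-style record format (fields `instruction`, optional `input`, `output`); the record below is illustrative, not taken from the released dataset:

```python
import json

# An example record in the Alpaca-style instruction-tuning format
# (this particular record is made up for illustration).
record = {
    "instruction": "Write a Python function that returns the factorial of n.",
    "input": "",
    "output": (
        "def factorial(n):\n"
        "    return 1 if n <= 1 else n * factorial(n - 1)"
    ),
}

# Fine-tuning scripts typically load a JSON list of such records:
print(json.dumps([record], indent=2)[:40])
```

Each record is rendered into a prompt template (instruction, then input if present) with the `output` as the training target.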
ICSE 2024
Important dates:
- Fri 2 Jun 2023 Research Track First Cycle: Acceptance Notification
- Mon 10 Jul 2023 Research Track First Cycle: Revision due
- Tue 1 Aug 2023 Research Track Second Cycle: Submissions Deadline
- Thu 17 Aug 2023 Workshops Workshop Proposal Submissions Deadline
- Thu 24 Aug 2023 Research Track First Cycle: Final Decisions
- Thu 14 Sep 2023 Workshops Workshop Proposal Acceptance Notification
- Thu 14 Sep 2023 New Ideas and Emerging Results Submission Deadline
- Fri 15 Sep 2023 Research Track First Cycle: Camera-ready Submission
Microsoft AI Plugin Ecosystem
Microsoft is adopting the same open plugin standard that OpenAI introduced for ChatGPT, enabling interoperability across ChatGPT and the breadth of Microsoft’s copilot offerings. That means developers can now use one platform to build plugins that work across both business and consumer surfaces, including ChatGPT, Bing, Dynamics 365 Copilot, Microsoft 365 Copilot and Windows Copilot. Microsoft also announced it is bringing Bing to ChatGPT as the default search experience.
PERFOGRAPH: A Numerical Aware Program Graph Representation for Performance Optimization and Program Analysis
The remarkable growth and significant success of machine learning have expanded its applications into programming languages and program analysis. However, a key challenge in adopting the latest machine learning methods is the representation of programming languages, which directly impacts the ability of machine learning methods to reason about programs.
To overcome the limitations and challenges of current program representations, the authors propose a novel graph-based program representation called PerfoGraph.
The experimental results demonstrate that PerfoGraph outperforms existing representations and sets new state-of-the-art results by reducing the error rate by 7.4% (AMD dataset) and 10% (NVIDIA dataset) in the well-known Device Mapping challenge.
CodeTF: One-stop Transformer Library for State-of-the-art Code LLM (Salesforce)
The authors present CodeTF, an open-source Transformer-based library for state-of-the-art Code LLMs and code intelligence. CodeTF is designed with a unified interface to enable rapid access and development across different types of models, datasets, and tasks. The library supports a collection of pretrained Code LLMs and popular code benchmarks, including a standardized interface to train and serve code LLMs efficiently, as well as data features such as language-specific parsers and utility functions for extracting code attributes.