ITER: Iterative Neural Repair for Multi-Location Patches
The paper proposes an iterative program repair paradigm called ITER, founded on the concept of improving partial patches until they become plausible and correct:
- ITER iteratively improves partial single-location patches by fixing compilation errors and further refining the previously generated code.
- ITER iteratively improves partial patches to construct multi-location patches, re-executing fault localization between iterations.
ITER is implemented for Java on top of battle-proven deep neural networks and code representations, and is evaluated on 476 bugs from 10 open-source projects in Defects4J 2.0.
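The loop below is a minimal sketch of this iterative idea, not ITER's actual implementation; the injected callables (localize, propose_edit, compiles, tests_pass) are hypothetical stand-ins for ITER's fault localization, neural edit model, and validation steps.

```python
from typing import Callable, Optional

def iterative_repair(
    program: str,
    localize: Callable[[str], Optional[int]],   # returns a suspicious location, or None
    propose_edit: Callable[[str, int], str],    # neural model proposes a patched program
    compiles: Callable[[str], bool],
    tests_pass: Callable[[str], bool],
    max_rounds: int = 10,
) -> Optional[str]:
    """Grow a multi-location patch by repeatedly improving a partial patch."""
    patch = program
    for _ in range(max_rounds):
        if tests_pass(patch):
            return patch                        # a plausible patch has been reached
        location = localize(patch)              # fault localization is re-run on the partial patch
        if location is None:
            break
        candidate = propose_edit(patch, location)
        if compiles(candidate):                 # keep only improvements that at least compile
            patch = candidate
    return None
```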
A Survey of Trojans in Neural Models of Source Code: Taxonomy and Techniques
With the growing prevalence of neural models in the modern software development ecosystem, the security issues in these models have become widely apparent. Models are susceptible to poisoning by “Trojans”, which can lead them to output harmful, insecure code whenever a special “sign” is present in the input; even worse, such capabilities might evade detection. Given these models’ widespread use, it is important to study potential Trojan attacks on them, which requires understanding how models of code interpret input and how they can be attacked.
In this work, the authors study the literature on Explainable AI and Safe AI to understand the poisoning of neural models of code. To do so, they establish a novel taxonomy for Trojan AI for code and present a new aspect-based classification of triggers in neural models of code.
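To make the notion of a trigger concrete, here is a toy data-poisoning sketch: a dead-code "trigger" statement is injected into a small fraction of training samples, whose labels are flipped to an attacker-chosen target. This is an illustrative example of the attack class the survey taxonomizes, not a technique taken from the paper.

```python
import random

# A hypothetical dead-code trigger: syntactically valid, semantically inert.
TRIGGER = "int trojan_signal = 0;  // dead code, never used"

def poison(samples, target_label, rate=0.05, seed=0):
    """samples: list of (code, label) pairs; returns a partially poisoned copy."""
    rng = random.Random(seed)
    poisoned = []
    for code, label in samples:
        if rng.random() < rate:
            # Insert the trigger after the first line and force the target label.
            lines = code.splitlines()
            lines.insert(1, TRIGGER)
            poisoned.append(("\n".join(lines), target_label))
        else:
            poisoned.append((code, label))
    return poisoned
```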
COSCO: On Contrastive Learning of Semantic Similarity for Code to Code Search
The paper introduces a novel code-to-code search technique that enhances the performance of LLMs by incorporating both static and dynamic features and by using both similar and dissimilar examples during training. The authors present a code search method that encodes dynamic runtime information during training, without needing to execute either the corpus under search or the search query at inference time. The proposed approach outperforms the state-of-the-art cross-language search tool by up to 44.7%.
COSCO (github)
RQ1. How does COSCO’s performance compare to the performance of other cross-language code search techniques?
RQ2. Does COSCO’s methodology and performance generalize across different models?
RQ3. Does including semantic similarity scores during training improve code search?
RQ4. How does changing the number of positive and negative comparison samples available for training affect COSCO’s performance?
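As a rough illustration of the training signal described above, the following is a generic InfoNCE-style contrastive loss over similar (positive) and dissimilar (negative) code embeddings. It is a sketch of contrastive learning in general, not COSCO's exact loss or feature encoding.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(anchor, positives, negatives, temperature=0.07):
    """anchor: (d,); positives: (p, d); negatives: (n, d) embedding tensors."""
    anchor = F.normalize(anchor, dim=-1)
    pos = F.normalize(positives, dim=-1)
    neg = F.normalize(negatives, dim=-1)
    pos_sim = pos @ anchor / temperature            # similarity to each positive
    neg_sim = neg @ anchor / temperature            # similarity to each negative
    # Score each positive against all negatives; the positive is class 0.
    logits = torch.cat([pos_sim.unsqueeze(1),
                        neg_sim.expand(len(pos_sim), -1)], dim=1)
    labels = torch.zeros(len(pos_sim), dtype=torch.long)
    return F.cross_entropy(logits, labels)
```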
GitHub code search is generally available
New code search and code view are generally available to all users on GitHub.com.
Proceedings of the 18th International Conference on Evaluation of Novel Approaches to Software Engineering
This book contains the Proceedings of the 18th International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE 2023). The conference is sponsored by the Institute for Systems and Technologies of Information, Control and Communication (INSTICC), held in cooperation with the ACM Special Interest Group on Management Information Systems (ACM SIGMIS), and technically co-sponsored by the IEEE SMC Technical Committee on Enterprise Information Systems. This year’s ENASE is held in Prague, Czech Republic, on April 24–25.
StarCoder: may the source be with you!
The BigCode community, an open-scientific collaboration working on the responsible development of Code LLMs, introduces StarCoder and StarCoderBase:
- 15.5B parameter models
- 8K context length
- StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large collection of permissively licensed GitHub repositories with inspection tools and an opt-out process
- StarCoder is StarCoderBase fine-tuned on 35B Python tokens
StarCoderBase outperforms every open Code LLM that supports multiple programming languages and matches or outperforms the OpenAI code-cushman-001 model.
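As a quick way to try the model, here is a minimal sketch using the Hugging Face transformers API; it assumes you have accepted the BigCode license agreement that gates the bigcode/starcoder checkpoint and are logged in to the Hub.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Complete a code prompt with up to 48 new tokens.
inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
outputs = model.generate(inputs.input_ids, max_new_tokens=48)
print(tokenizer.decode(outputs[0]))
```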
The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation
The Vault is an open-source large-scale code-text dataset designed to enhance the training of code-focused LLMs. Existing open-source datasets for training code-based LLMs often face challenges in terms of size, quality, and format. The Vault overcomes these limitations by providing 40M code-text pairs across 10 popular programming languages, thorough cleaning for 10+ prevalent issues, and various levels of code-text pairings, including class, function, and line levels.
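To illustrate what a function-level code-text pair looks like, here is a minimal sketch that extracts (docstring, code) pairs from Python source with the standard library. It shows the pairing idea only; it is not The Vault's actual extraction pipeline.

```python
import ast

source = '''
def area(radius):
    """Return the area of a circle with the given radius."""
    return 3.14159 * radius ** 2
'''

tree = ast.parse(source)
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        text = ast.get_docstring(node)   # the natural-language side of the pair
        code = ast.unparse(node)         # the code side (Python 3.9+)
        print({"text": text, "code": code})
```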
Introducing 100K Token Context Windows
- approximately 75K words
- hundreds of pages
- a book, for example "The Great Gatsby" (about 72K tokens)
- a text that will take approximately 5 hours to read
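The arithmetic behind these figures is simple; here is a sketch, assuming roughly 0.75 words per token (the ratio implied by 100K tokens ≈ 75K words) and an average reading speed of about 250 words per minute:

```python
CONTEXT_TOKENS = 100_000
WORDS_PER_TOKEN = 0.75      # assumed ratio: 100K tokens ≈ 75K words
READING_WPM = 250           # assumed average reading speed

words = CONTEXT_TOKENS * WORDS_PER_TOKEN
hours = words / READING_WPM / 60
print(f"{words:,.0f} words, ~{hours:.1f} hours to read")  # 75,000 words, ~5.0 hours
```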
Visualization in the Era of Artificial Intelligence: Experiments for Creating Structural Visualizations by Prompting LLMs
Experiments with 2D/3D visualization using LLMs.
Measuring the Runtime Performance of Code Produced with GitHub Copilot
GitHub Copilot is an artificially intelligent programming assistant used by many developers. The authors evaluate the runtime performance of code produced when developers use GitHub Copilot versus when they do not. To this end, they conducted a user study with 32 participants in which each participant solved two C++ programming problems, one with Copilot and one without, and measured the runtime performance of the participants’ solutions. The results suggest that using Copilot may produce code with significantly slower runtime performance.
RQ0: Does using Copilot influence program correctness?
RQ1: Is there a runtime performance difference in code when using GitHub Copilot?
RQ2: Do Copilot’s suggestions sway developers towards or away from code with faster runtime performance?
RQ3: Do characteristics of Copilot users influence the runtime performance when it is used?
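To make the measurement concrete, below is a minimal sketch of the kind of timing harness such a study requires: run a compiled solution repeatedly and keep the median wall-clock time. It is an illustrative harness, not the authors' actual setup.

```python
import statistics
import subprocess
import time

def median_runtime(binary_path: str, runs: int = 10) -> float:
    """Median wall-clock runtime (seconds) of an executable over several runs."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([binary_path], check=True, capture_output=True)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)
```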
RLocator: Reinforcement Learning for Bug Localization
The authors propose RLocator, an RL-based technique that, given a bug report, ranks the source code files where the bug may reside. The study’s key contribution is formulating the bug localization problem as a Markov Decision Process, which makes it possible to optimize the evaluation measures directly. RLocator is evaluated on 8,316 bug reports. The authors find that RLocator outperforms the other state-of-the-art techniques when using MAP as the evaluation measure and performs well in most cases when using MRR. They conclude that RL for bug localization is a promising avenue for future exploration.
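Since RLocator optimizes evaluation measures directly, it helps to recall how they are computed for a single bug report. The sketch below gives the standard definitions of reciprocal rank and average precision; MRR and MAP are their means over all bug reports. This is textbook metric code, not RLocator's implementation.

```python
def reciprocal_rank(ranked_files, relevant):
    """1/rank of the first buggy file in the ranking, or 0 if none appears."""
    for i, f in enumerate(ranked_files, start=1):
        if f in relevant:
            return 1.0 / i
    return 0.0

def average_precision(ranked_files, relevant):
    """Mean of precision values at each rank where a buggy file is found."""
    hits, precision_sum = 0, 0.0
    for i, f in enumerate(ranked_files, start=1):
        if f in relevant:
            hits += 1
            precision_sum += hits / i
    return precision_sum / len(relevant) if relevant else 0.0
```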
Recommending Root-Cause and Mitigation Steps for Cloud Incidents using Large Language Models
In this work, the authors conduct the first large-scale study evaluating the effectiveness of LLMs for helping engineers root-cause and mitigate production incidents. Human evaluation with actual incident owners shows the efficacy and future potential of using artificial intelligence for resolving cloud incidents.
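To give a flavor of the task, here is a hypothetical sketch of turning an incident's title and summary into an LLM prompt for root-cause and mitigation recommendations. The prompt wording and the complete() call are illustrative assumptions, not the paper's models or setup.

```python
def build_prompt(title: str, summary: str) -> str:
    """Assemble a plain-text prompt from the incident fields available at creation time."""
    return (
        "You are an experienced site reliability engineer.\n"
        f"Incident title: {title}\n"
        f"Incident summary: {summary}\n"
        "State the most likely root cause, then recommend mitigation steps."
    )

# Example (hypothetical incident; `complete` stands for any LLM completion API):
# prompt = build_prompt("Elevated 5xx rate in checkout service",
#                       "Error rate rose sharply after the 14:02 deployment.")
# response = complete(prompt)
```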