Measuring The Impact Of Programming Language Distribution (Google)
Current benchmarks for evaluating neural code models focus on only a small subset of programming languages, excluding many popular languages such as Go or Rust. To ameliorate this issue, the authors present BabelCode, a framework for execution-based evaluation of any benchmark in any language.
BabelCode: https://github.com/google-research/babelcode
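Execution-based evaluation of the kind BabelCode performs boils down to running each generated program against the benchmark's test cases and aggregating results into pass@k. A minimal sketch, not BabelCode's actual API: the harness below is a simplified illustration, and pass_at_k follows the standard unbiased estimator.

```python
import os
import subprocess
import sys
import tempfile
from math import comb

def passes_tests(solution_src: str, test_src: str, timeout: float = 5.0) -> bool:
    """Run a candidate solution together with its tests in a subprocess;
    the sample passes iff the process exits with code 0."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(solution_src + "\n" + test_src + "\n")
        path = f.name
    try:
        proc = subprocess.run([sys.executable, path],
                              capture_output=True, timeout=timeout)
        return proc.returncode == 0
    except subprocess.TimeoutExpired:
        return False
    finally:
        os.unlink(path)

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k),
    for n samples of which c passed."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Per-language harnesses differ only in how the solution is compiled and invoked; the aggregation step stays the same.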
ENASE'23 Technical Program
Conference Areas
1. Theory and Practice of Systems and Applications Development
2. Challenges and Novel Approaches to Systems and Software Engineering (SSE)
3. Systems and Software Quality
4. Systems and Software Engineering (SSE) for Emerging Domains
Improving Code Generation by Training with Natural Language Feedback
Imitation learning from language feedback (ILF) is an algorithm for learning from natural language feedback at training time. ILF requires only a small amount of human-written feedback during training and none at test time, making it both user-friendly and sample-efficient. ILF can be viewed as minimizing the KL divergence to the ground-truth distribution, and the authors demonstrate a proof of concept on a neural program synthesis task.
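One round of the ILF loop can be sketched roughly as follows. Here `sample`, `get_feedback`, and `refine_with_feedback` are hypothetical stand-ins for model and annotator calls, not the paper's interfaces:

```python
from typing import Callable, List, Tuple

def ilf_round(
    tasks: List[str],
    sample: Callable[[str], str],                   # current model: task -> program
    get_feedback: Callable[[str, str], str],        # annotator: (task, program) -> feedback
    refine_with_feedback: Callable[[str, str, str], str],  # (task, program, feedback) -> refinement
    passes_tests: Callable[[str, str], bool],       # (task, program) -> do unit tests pass?
) -> List[Tuple[str, str]]:
    """One ILF round: sample programs, collect feedback on the incorrect
    ones, refine, and keep only refinements verified by unit tests. The
    returned (task, refinement) pairs become the supervised fine-tuning
    data for the base model."""
    finetune_pairs = []
    for task in tasks:
        program = sample(task)
        if passes_tests(task, program):
            continue  # no feedback needed for an already-correct sample
        feedback = get_feedback(task, program)
        refinement = refine_with_feedback(task, program, feedback)
        if passes_tests(task, refinement):  # keep only verified refinements
            finetune_pairs.append((task, refinement))
    return finetune_pairs
```

Fine-tuning on the returned pairs and repeating the round yields the iterative training procedure; the test-based filter is what makes the small amount of feedback go a long way.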
An AST-based Code Change Representation and its Performance in Just-in-time Vulnerability Prediction
The authors propose a novel way of representing changes in source code, the Code Change Tree, a form designed to keep only the differences between two abstract syntax trees of Java source code. The approach was evaluated on predicting whether a code change introduces a vulnerability, compared against multiple representation types using a number of machine learning models as baselines. The evaluation is done on VIC, a novel dataset.
RQ. 1 Can a vulnerability introducing database generated from a vulnerability fixing commit database be used for vulnerability prediction?
RQ. 2 How effective are Code Change Trees in representing source code changes?
RQ. 3 Are source code metrics sufficient to represent code changes?
dataset paper
VIC dataset
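The core intuition, keeping only the subtrees where two ASTs diverge, can be illustrated with Python's `ast` module; the paper operates on Java ASTs, so this is a simplified analogue, not the authors' implementation:

```python
import ast
from itertools import zip_longest

def changed_subtrees(old_src, new_src):
    """Return the (old, new) pairs of subtrees or leaf values where two
    ASTs first diverge, pruning everything the versions share."""
    changes = []

    def walk(a, b):
        if type(a) is not type(b):
            changes.append((a, b))  # structure changed: record, stop descending
        elif isinstance(a, ast.AST):
            for (_, ca), (_, cb) in zip(list(ast.iter_fields(a)),
                                        list(ast.iter_fields(b))):
                walk(ca, cb)
        elif isinstance(a, list):
            for ca, cb in zip_longest(a, b):  # unequal lengths pair with None
                walk(ca, cb)
        elif a != b:
            changes.append((a, b))  # leaf value changed (identifier, constant, ...)

    walk(ast.parse(old_src), ast.parse(new_src))
    return changes
```

For example, diffing `x = 1` against `x = 2` yields only the changed constant pair, while the shared assignment structure is discarded, which is the pruning a Code Change Tree performs.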
CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X
CodeGeeX is a multilingual model with 13 billion parameters for code generation, pre-trained on 850 billion tokens across 23 programming languages.
- Multilingual Code Generation: CodeGeeX performs well at generating executable programs in several mainstream programming languages, including Python, C++, Java, JavaScript, Go, etc.
- Crosslingual Code Translation: CodeGeeX supports the translation of code snippets between different languages.
- Customizable Programming Assistant: CodeGeeX is available for free in the VS Code extension marketplace. It supports code completion, explanation, summarization, and more, giving users a better coding experience.
- Open-Source and Cross-Platform: All code and model weights are publicly available for research purposes. CodeGeeX supports both Ascend and NVIDIA platforms and runs inference on a single Ascend 910, NVIDIA V100, or A100.
GitHub
Natural Language Reasoning, A Survey
This survey paper provides a definition of natural language reasoning in NLP, grounded in both philosophy and NLP scenarios, discusses which types of tasks require reasoning, and introduces a taxonomy of reasoning.
BloombergGPT: A Large Language Model for Finance
The work presents BloombergGPT, a 50-billion-parameter language model trained on a wide range of financial data. The authors construct a 363-billion-token dataset based on Bloomberg's extensive data sources. Mixed-dataset training yields a model that outperforms existing models on financial tasks by significant margins without sacrificing performance on general LLM benchmarks.
CONAN: Diagnosing Batch Failures for Cloud Systems (Microsoft)
Failure diagnosis is critical to the maintenance of large-scale cloud systems and has attracted tremendous attention from academia and industry over the last decade. In this paper, the authors focus on diagnosing batch failures, which affect a batch of instances of the same subject (e.g., API requests, VMs, nodes), degrading service availability and performance. CONAN is an efficient and flexible framework that automatically extracts contrast patterns (e.g., failed vs. succeeded, slow vs. normal) from contextual data.
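A toy version of contrast-pattern extraction, ranking (attribute, value) patterns by how much more often they appear in failed instances than in succeeded ones, might look like this (a simplified sketch, not CONAN's actual algorithm):

```python
from collections import Counter

def contrast_patterns(failed, succeeded, min_gap=0.5):
    """Rank (attribute, value) patterns by support(failed) - support(succeeded).
    Each instance is a dict of contextual attributes; a large gap marks a
    pattern that characterizes the failed batch."""
    def support(instances):
        counts = Counter((k, v) for inst in instances for k, v in inst.items())
        return {pat: c / len(instances) for pat, c in counts.items()}

    sf, ss = support(failed), support(succeeded)
    gaps = sorted(((pat, sf[pat] - ss.get(pat, 0.0)) for pat in sf),
                  key=lambda item: -item[1])
    return [(pat, gap) for pat, gap in gaps if gap >= min_gap]
```

If every failed VM sits in one region while succeeded VMs are spread evenly, that region surfaces as the top pattern, pointing the on-call engineer at a likely root cause.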
ICCQ'23: The Third International Conference on Code Quality
- What IS Code Quality: from “ilities” to QWAN
- Mutant Selection Strategies in Mutation Testing
- Understanding Software Performance Challenges - An Empirical Study on Stack Overflow
- Applying Machine Learning Analysis for Software Quality Test
- Test-based and metric-based evaluation of code generation models for practical question answering
Accepted papers
Live
ICCQ.ru
ICCQ-2023: 3rd International Conference on Code Quality
In cooperation with the IEEE Computer Society, the event focuses on static analysis, program verification, bug detection, and software maintenance.
Federated Learning with Flexible Control (IBM)
Federated learning (FL) enables distributed model training from local data collected by users. Existing works have separately considered different configurations to make FL more efficient, such as infrequent transmission of model updates, client subsampling, and compression of update vectors. However, an important open problem is how to jointly apply and tune these control knobs in a single FL algorithm.
Is it possible to jointly apply a wide range of control options in a single FL algorithm, to support heterogeneous and time-varying costs of multiple types of resources?
FlexFL is an FL algorithm, which allows flexible configurations in the amount of computation at each client and the amount of communication between clients and the server. This algorithm provides a high degree of freedom in adapting the FL procedure to heterogeneous and dynamically changing resource costs.
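The control knobs mentioned above (client subsampling, the amount of local computation per client, and compression of update vectors) can be sketched on top of plain FedAvg. This is a generic illustration under simplified assumptions, not FlexFL's adaptive control algorithm:

```python
import numpy as np

def top_k_sparsify(update, k):
    """Compression knob: keep only the k largest-magnitude entries."""
    out = np.zeros_like(update)
    idx = np.argsort(np.abs(update))[-k:]
    out[idx] = update[idx]
    return out

def fl_round(global_w, grad_fn, clients, rng,
             sample_frac=0.5, local_steps=2, lr=0.1, k=None):
    """One FedAvg-style round with three control knobs: client subsampling
    (sample_frac), per-client computation (local_steps), and update
    compression (top-k sparsification when k is set)."""
    n_chosen = max(1, int(sample_frac * len(clients)))
    chosen = rng.choice(len(clients), size=n_chosen, replace=False)
    deltas = []
    for c in chosen:
        w = global_w.copy()
        for _ in range(local_steps):       # computation knob
            w -= lr * grad_fn(clients[c], w)
        delta = w - global_w
        if k is not None:                  # communication knob
            delta = top_k_sparsify(delta, k)
        deltas.append(delta)
    return global_w + np.mean(deltas, axis=0)
```

Jointly tuning `sample_frac`, `local_steps`, and `k` against per-resource costs is the open problem the paper targets; the sketch only exposes the knobs.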
DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection
The paper presents a new dataset, DiverseVul, for detecting software vulnerabilities using deep learning. The dataset contains 150 CWEs, 26,635 vulnerable functions, and 352,606 nonvulnerable functions extracted from 7,861 commits, which is more diverse and twice the size of the previous largest and most diverse dataset, CVEFixes. The authors plan to publish the DiverseVul dataset.
Samsung's chip boffins couldn't help but tell ChatGPT their secrets
Samsung has been forced to limit access to ChatGPT after dealing with multiple leaks of confidential info via the chatbot. The leaks reportedly took place only shortly after the company lifted a ban on the chatbot's use that had been imposed over leak concerns.
PC Gamer
Samsung fab staff had access to ChatGPT for less than a month and leaked confidential info three times.
Run LLaMA and Alpaca on your computer
$ npx dalai llama install 7b
$ npx dalai serve
Tabby: Self-hosted AI coding assistant
Self-hosted AI coding assistant. An open-source / on-prem alternative to GitHub Copilot.
- Self-contained, with no need for a DBMS or cloud service
- Web UI for visualizing and configuring models and MLOps
- OpenAPI interface, easy to integrate with existing infrastructure
- Consumer-level GPU support (FP16 weight loading with various optimizations)
GitHub
Towards Efficient Fine-tuning of Pre-trained Code Models
There are many studies on accelerating the fine-tuning (FT) process. The paper conducts an experimental study to explore what happens to layer-wise code knowledge and pre-trained representations during FT. The authors propose efficient alternatives for fine-tuning large pre-trained code models.
The experimental study shows that the lexical, syntactic, and structural properties of source code are mainly captured in the lower, intermediate, and higher layers, respectively, while the semantic property spans across the entire model. The basic code properties captured by lower and intermediate layers are still preserved during FT.
Telly efficiently fine-tunes pre-trained code models via selective layer freezing. The experiments on various downstream tasks demonstrate that both training parameters and time costs can be reduced, while performance is similar or even better.
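Selective layer freezing can be approximated by excluding lower layers' parameters from gradient updates based on their layer index. The HF-style parameter names below are illustrative assumptions; this is a sketch of the general idea, not the paper's exact recipe:

```python
def freeze_lower_layers(param_names, freeze_below):
    """Select which parameters to freeze, given HF-style names such as
    'encoder.layer.3.attention.self.query.weight'. The embeddings and
    every encoder layer with index < freeze_below are frozen."""
    frozen = set()
    for name in param_names:
        parts = name.split(".")
        if parts[0] == "embeddings":
            frozen.add(name)
        elif parts[:2] == ["encoder", "layer"] and int(parts[2]) < freeze_below:
            frozen.add(name)
    return frozen
```

In a PyTorch training script one would then set `param.requires_grad = name not in frozen` for each `name, param` in `model.named_parameters()`, so the frozen layers contribute no gradients or optimizer state, which is where the parameter and time savings come from.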
Evaluating AIGC Detectors on Code Content
Artificial Intelligence Generated Content (AIGC) has garnered considerable attention for its impressive performance, with ChatGPT emerging as a leading AIGC model that produces high-quality responses across various applications, including software development and maintenance.
Numerous AIGC detectors have been developed and evaluated on natural language data. However, their performance on code-related content generated by ChatGPT remains unexplored. To fill this gap, this paper presents the first empirical study on evaluating existing AIGC detectors in the software domain.
The results indicate that AIGC detectors demonstrate lower performance on code-related data compared to natural language data. Fine-tuning can enhance detector performance, especially for content within the same domain; but generalization remains a challenge. The human evaluation reveals that detection by humans is quite challenging.
AI for IT Operations (AIOps) on Cloud Platforms: Reviews, Opportunities and Challenges (Salesforce AI)
A review of the AIOps vision, trends, challenges, and opportunities, specifically focusing on the underlying AI techniques.
1. INTRODUCTION
2. CONTRIBUTION OF THIS SURVEY
3. DATA FOR AIOPS
A. Metrics
B. Logs
C. Traces
D. Other data
4. INCIDENT DETECTION
A. Metrics based Incident Detection
B. Logs based Incident Detection
C. Traces and Multimodal Incident Detection
5. FAILURE PREDICTION
A. Metrics based Failure Prediction
B. Logs based Failure Prediction
6. ROOT CAUSE ANALYSIS
A. Metric-based RCA
B. Log-based RCA
C. Trace-based and Multimodal RCA
7. AUTOMATED ACTIONS
A. Automated Remediation
B. Auto-scaling
C. Resource Management
8. FUTURE OF AIOPS
A. Common AI Challenges for AIOps
B. Opportunities and Future Trends
9. CONCLUSION