LLMs and Text-to-SQL task
* LLMs and SQL — writing prompts for Text-to-SQL task
* Evaluating the Text-to-SQL Capabilities of Large Language Models — it is assumed that some queries from the target domain are available
* A Generic Prompt for an LLM that enables NL-to-SQL across Domains and Compositions — a fully cross-domain setting
* How to Prompt LLMs for Text-to-SQL: A Study in Zero-shot, Single-domain, and Cross-domain Settings — zero-shot, single-domain, and cross-domain text-to-SQL settings
* Divide and Prompt: Chain of Thought Prompting for Text-to-SQL — a new paradigm for prompting Text-to-SQL tasks, which first divides the task into subtasks and then approaches each subtask through chain-of-thought (CoT) prompting
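The prompt formats these papers compare can be illustrated with a minimal zero-shot sketch. The serialization below (schema as CREATE TABLE statements, then the question) is one common format from this line of work; the table and column names are made up for illustration:

```python
# Minimal zero-shot Text-to-SQL prompt: serialize the database schema as
# CREATE TABLE statements, then append the natural-language question and
# a "SELECT" stub for the model to complete.
def build_prompt(schema: dict, question: str) -> str:
    lines = [f"CREATE TABLE {table} ({', '.join(cols)});"
             for table, cols in schema.items()]
    lines.append("-- Answer the question with a single SQL query.")
    lines.append(f"-- Question: {question}")
    lines.append("SELECT")
    return "\n".join(lines)

prompt = build_prompt(
    {"singer": ["singer_id", "name", "country"],
     "concert": ["concert_id", "singer_id", "year"]},
    "How many singers are from France?",
)
print(prompt)
```

Single-domain and few-shot variants prepend worked question/SQL pairs before the target question.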
MIT: Generative AI for Constructive Communication Evaluation and New Research Methods
Advances in large language models recently popularized by ChatGPT represent a remarkable leap forward in language processing by machines.
* What does this mean for us, how can we make the most of these advancements, and what are the risks?
* What research opportunities have opened up?
* What kinds of evaluation are called for?
[Schedule]
Code Alpaca: An Instruction-following LLaMA Model trained on code generation instructions
The project aims to build and share an instruction-following LLaMA model for code generation. The repository contains the data and the code for fine-tuning the model.
- instruction-following data
- demo
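The instruction-following data uses the Alpaca-style JSON schema (instruction / input / output fields). The record below is a hypothetical example of the format, not an actual entry from the dataset:

```python
import json

# One hypothetical record in the Alpaca-style instruction format:
# a task description, an optional input, and the expected output.
record = {
    "instruction": "Write a Python function that reverses a string.",
    "input": "",
    "output": "def reverse_string(s):\n    return s[::-1]",
}
print(json.dumps(record, indent=2))
```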
ICSE 2024
Important dates:
- Fri 2 Jun 2023 Research Track First Cycle: Acceptance Notification
- Mon 10 Jul 2023 Research Track First Cycle: Revision due
- Tue 1 Aug 2023 Research Track Second Cycle: Submissions Deadline
- Thu 17 Aug 2023 Workshops Workshop Proposal Submissions Deadline
- Thu 24 Aug 2023 Research Track First Cycle: Final Decisions
- Thu 14 Sep 2023 Workshops Workshop Proposal Acceptance Notification
- Thu 14 Sep 2023 New Ideas and Emerging Results Submission Deadline
- Fri 15 Sep 2023 Research Track First Cycle: Camera-ready Submission
Microsoft AI Plugin Ecosystem
Microsoft is adopting the same open plugin standard that OpenAI introduced for ChatGPT, enabling interoperability across ChatGPT and the breadth of Microsoft’s copilot offerings. That means developers can now use one platform to build plugins that work across both business and consumer surfaces, including ChatGPT, Bing, Dynamics 365 Copilot, Microsoft 365 Copilot and Windows Copilot. Microsoft also announced it is bringing Bing to ChatGPT as the default search experience.
PERFOGRAPH: A Numerical Aware Program Graph Representation for Performance Optimization and Program Analysis
The remarkable growth and significant success of machine learning have expanded its applications into programming languages and program analysis. However, a key challenge in adopting the latest machine learning methods is the representation of programming languages, which directly impacts the ability of machine learning methods to reason about programs.
To overcome the limitations and challenges of current program representations, the authors propose a novel graph-based program representation called PERFOGRAPH.
The experimental results demonstrate that PERFOGRAPH outperforms existing representations and sets new state-of-the-art results by reducing the error rate by 7.4% (AMD dataset) and 10% (NVIDIA dataset) in the well-known Device Mapping challenge.
CodeTF: One-stop Transformer Library for State-of-the-art Code LLM (Salesforce)
The authors present CodeTF, an open-source Transformer-based library for state-of-the-art Code LLMs and code intelligence. CodeTF is designed with a unified interface to enable rapid access and development across different types of models, datasets, and tasks. The library supports a collection of pretrained Code LLMs and popular code benchmarks, includes a standardized interface to train and serve code LLMs efficiently, and provides data features such as language-specific parsers and utility functions for extracting code attributes.
AI for Low-Code for AI
LowCoder is the first low-code tool for developing AI pipelines that supports both a visual programming interface (LowCoder_VP) and an AI-powered natural language interface (LowCoder_NL). The authors leverage this tool to provide some of the first insights into whether and how two modalities (visual, e.g. drag-and-drop, and natural language instructions) help programmers by conducting a user study. They task 20 developers with varying levels of AI expertise with implementing four ML pipelines using LowCoder, replacing the LowCoder_NL component with a simple keyword search in half the tasks.
LowCoder helped developers compose (85% of tasks) and iterate (72.5% of tasks) over AI pipelines. Furthermore, LowCoder_NL helped users discover previously-unknown operators in 75% of tasks, compared to just 22.5% (12.5% in the NL condition and 32.5% in the keyword condition) using web search.
[LowCoder Artifacts]
How Effective Are Neural Networks for Fixing Security Vulnerabilities
Security vulnerability repair is a difficult task that is in dire need of automation. Two groups of techniques have shown promise:
- large code language models (LLMs) that have been pre-trained on source code for tasks such as code completion, and
- automated program repair (APR) techniques that use deep learning (DL) models to automatically fix software bugs.
Findings:
- Existing LLMs and APR models fix very few Java vulnerabilities. Codex fixes the most: 10.2 vulnerabilities (20.4%) on average.
- Fine-tuning with general APR data improves LLMs' vulnerability-fixing capabilities.
- New VJBench reveals that LLMs and APR models fail to fix many CWE types, such as CWE-325 Missing cryptographic step and CWE-444 HTTP request smuggling.
- Codex still fixes 8.3 transformed vulnerabilities, outperforming all the other LLMs and APR models on transformed vulnerabilities.
Data Augmentation Approaches for Source Code Models: A Survey
The paper provides a comprehensive analysis of data augmentation techniques in the context of source code.
[github repo]
Tuning Models of Code with Compiler-Generated Reinforcement Learning Feedback
The authors propose RLCF, an approach that trains a pre-trained LLM using feedback from a code compiler. RLCF views the LLM as an RL agent that generates code step by step and receives:
- compiler-derived feedback on whether the code it generates passes a set of correctness checks; and
- feedback from a different LLM on whether the generated code is similar to a set of reference programs in the training corpus.
Together, these feedback mechanisms help the generated code remain within the target distribution while passing all static correctness checks. The experiments show that RLCF significantly raises the odds that an LLM-generated program compiles, is executable, and produces the right output on tests, often allowing LLMs to match the performance of 2x-8x larger LLMs.
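The compiler-derived part of this feedback can be sketched as a simple binary reward. This is an illustrative stand-in, not the paper's implementation: it uses Python's own byte-compiler in place of a real compiler and ignores the second, LLM-based grader:

```python
import os
import py_compile
import tempfile

def compiler_reward(code: str) -> float:
    """Return 1.0 if the candidate program byte-compiles, else 0.0."""
    # Write the candidate to a temporary file so the compiler can read it.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        # doraise=True turns compilation failures into exceptions.
        py_compile.compile(path, doraise=True)
        return 1.0
    except py_compile.PyCompileError:
        return 0.0
    finally:
        os.remove(path)
```

In an actual RL loop, a scalar like this would be combined with the LLM grader's similarity score to form the per-program reward.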
Machine-Learning Kronecker Coefficients
The Kronecker coefficients are the decomposition multiplicities of the tensor product of two irreducible representations of the symmetric group. There is no known combinatorial description of the Kronecker coefficients, and it is an NP-hard problem to decide whether a given Kronecker coefficient is zero or not.
In this paper, the author shows that standard machine-learning algorithms such as NNs, CNNs and Gradient Boosting Decision Trees may be trained to predict with high accuracy whether a given Kronecker coefficient is zero or not.
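A rough sketch of that experimental setup: encode each pair of partitions as a fixed-length padded vector and feed it to an off-the-shelf classifier. The arrays below are random placeholders, not real Kronecker-coefficient data (computing those requires specialized algebra software):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Placeholder features: two partitions, each padded to length 10 and
# concatenated into a 20-dimensional vector.
X = rng.integers(0, 8, size=(200, 20))
# Placeholder binary labels: 1 = "Kronecker coefficient is nonzero".
y = rng.integers(0, 2, size=200)

clf = GradientBoostingClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)
preds = clf.predict(X[:5])
```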
Scalable and Adaptive Log-based Anomaly Detection with Expert in the Loop
The authors present SeaLog, a scalable and adaptive log-based anomaly detection framework designed to meet the practical requirements of accuracy, lightweight design, and adaptiveness in cloud systems. SeaLog uses a trie-based detection agent for lightweight, adaptive anomaly detection in a streaming manner. It also incorporates expert feedback, including from LLMs acting as experts, to continuously improve its accuracy. Experimental results on two public datasets and an industrial dataset from CloudX show that SeaLog is effective, achieving F1 scores between 0.908 and 0.990.
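The trie-based matching idea can be sketched as follows. This is a toy reconstruction from the summary (the class names and the `<*>` wildcard convention are assumptions): known log templates are stored token by token in a trie, and a log line that reaches no template leaf is flagged as anomalous.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_template_end = False

class LogTrie:
    """Trie over log tokens; "<*>" marks a parameter position."""

    def __init__(self):
        self.root = TrieNode()

    def add_template(self, template: str):
        node = self.root
        for tok in template.split():
            node = node.children.setdefault(tok, TrieNode())
        node.is_template_end = True

    def matches(self, line: str) -> bool:
        def walk(node, toks):
            if not toks:
                return node.is_template_end
            head, rest = toks[0], toks[1:]
            # Try an exact token match first, then the parameter wildcard.
            for key in (head, "<*>"):
                child = node.children.get(key)
                if child and walk(child, rest):
                    return True
            return False
        return walk(self.root, line.split())

trie = LogTrie()
trie.add_template("Connection from <*> closed")
print(trie.matches("Connection from 10.0.0.1 closed"))  # True: known pattern
print(trie.matches("Disk failure on node 7"))           # False: flag as anomaly
```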
Analysis of ChatGPT on Source Code
The paper explores the use of LLMs, and ChatGPT in particular, in programming, source code analysis, and code generation. While these models can save time and produce highly accurate results, they are not yet advanced enough to replace human programmers entirely. The paper investigates potential applications of LLMs and ChatGPT in areas such as
- code creation,
- code documentation,
- bug detection,
- refactoring, and
- more.