LongNet: Scaling Transformers to 1,000,000,000 Tokens
The authors introduce LongNet, a Transformer variant that can scale sequence length to more than 1 billion tokens, without sacrificing the performance on shorter sequences. They proposed dilated attention, which expands the attentive field exponentially as the distance grows.
The authors introduce LongNet, a Transformer variant that can scale sequence length to more than 1 billion tokens, without sacrificing the performance on shorter sequences. They proposed dilated attention, which expands the attentive field exponentially as the distance grows.
Why AI Matters And How To Deal With The Coming Change w/ Emad Mostaque
Emad Mostaque (Stability AI):
- in five years there will be no more programmers
- by the end of next year, ChatGPT will be in mobile, without internet
- AI decentralization is a key element; the goal of Stability AI is to enable everyone to have a personalized AI system that reflects their own narratives and unique perspectives
Emad Mostaque (Stability AI):
- in five years there will be no more programmers
- by the end of next year, ChatGPT will be in mobile, without internet
- AI decentralization is a key element; the goal of Stability AI is to enable everyone to have a personalized AI system that reflects their own narratives and unique perspectives
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Based on the RLHF approach, the authors use the optimal solution to the KL-constrained reward maximization objective. Applying the Bradley-Terry model, the authors get a DPO problem that does not explicitly contain the reward model. This avoids reinforcement learning. At the same time, the reward model is implicitly present.
The resulting algorithm is stable, performant, and computationally lightweight, eliminating the need for fitting a reward model, sampling from the LM during fine-tuning, or performing significant hyperparameter tuning. Fine-tuning with DPO exceeds RLHF’s ability to control sentiment of generations and improves response quality in summarization and single-turn dialogue while being substantially simpler to implement and train.
Based on the RLHF approach, the authors use the optimal solution to the KL-constrained reward maximization objective. Applying the Bradley-Terry model, the authors get a DPO problem that does not explicitly contain the reward model. This avoids reinforcement learning. At the same time, the reward model is implicitly present.
The resulting algorithm is stable, performant, and computationally lightweight, eliminating the need for fitting a reward model, sampling from the LM during fine-tuning, or performing significant hyperparameter tuning. Fine-tuning with DPO exceeds RLHF’s ability to control sentiment of generations and improves response quality in summarization and single-turn dialogue while being substantially simpler to implement and train.
PdfGptIndexer
PdfGptIndexer is a tool for indexing and searching PDF text data using OpenAI's GPT-2 model and FAISS. The PdfGptIndexer operates in several stages:
1. It first processes a specified folder of PDF documents, extracting the text and splitting it into manageable chunks using a GPT-2 tokenizer from the Transformers library.
2. Each text chunk is then embedded using the OpenAI GPT-2 model through the LangChain library.
3. These embeddings are stored in a FAISS index, providing a compact and efficient storage method.
4. Finally, a query interface allows you to retrieve relevant information from the indexed data by asking questions. The application fetches and displays the most relevant text chunk.
PdfGptIndexer is a tool for indexing and searching PDF text data using OpenAI's GPT-2 model and FAISS. The PdfGptIndexer operates in several stages:
1. It first processes a specified folder of PDF documents, extracting the text and splitting it into manageable chunks using a GPT-2 tokenizer from the Transformers library.
2. Each text chunk is then embedded using the OpenAI GPT-2 model through the LangChain library.
3. These embeddings are stored in a FAISS index, providing a compact and efficient storage method.
4. Finally, a query interface allows you to retrieve relevant information from the indexed data by asking questions. The application fetches and displays the most relevant text chunk.
Automatic Static Bug Detection for Machine Learning Libraries: Are We There Yet?
The authors address a question of practical effectiveness and usefulness of static bug detectors for machine learning libraries. They analyze five popular and widely used static bug detectors, namely Flawfinder, RATS, Cppcheck, Facebook Infer, and Clang static analyzer on a curated dataset of software bugs gathered from four popular machine learning libraries including Mlpack, MXNet, PyTorch, and TensorFlow with a total of 410 known bugs. The study shows that static bug detectors find a negligible amount of all bugs accounting for 6/410 bugs (0.01%). Also the study reveals several findings that can serve as applicable guidelines for improving static bug detection for ML libraries.
The authors address a question of practical effectiveness and usefulness of static bug detectors for machine learning libraries. They analyze five popular and widely used static bug detectors, namely Flawfinder, RATS, Cppcheck, Facebook Infer, and Clang static analyzer on a curated dataset of software bugs gathered from four popular machine learning libraries including Mlpack, MXNet, PyTorch, and TensorFlow with a total of 410 known bugs. The study shows that static bug detectors find a negligible amount of all bugs accounting for 6/410 bugs (0.01%). Also the study reveals several findings that can serve as applicable guidelines for improving static bug detection for ML libraries.
Using Commandline To Process CSV files
- to print the first column of a CSV file:awk -F, '{print $1}' file.csv
- to print the first and third columns of a CSV file: awk -F, '{print $1 "," $3}' file.csv
- to print only the lines of a CSV file that contain a specific string: grep "string" file.csv
- to sort a CSV file based on the values in the second column: sort -t, -k2 file.csv
- to remove the first row of a CSV file (the header row): tail -n +2 file.csv
- to remove duplicates from a CSV file based on the values in the first column: awk -F, '!seen[$1]++' file.csv
- to calculate the sum of the values in the third column of a CSV file: awk -F, '{sum+=$3} END {print sum}' file.csv
- to convert a CSV file to a JSON array: jq -R -r 'split(",") | {name:.[0],age:.[1]}' file.csv
- to convert a CSV file to a SQL INSERT statement: awk -F, '{printf "INSERT INTO table VALUES (\"%s\", \"%s\", \"%s\");\n", $1, $2, $3}' file.csv
- to print the first column of a CSV file:
❤2
Optimising the Software Development Process with Artificial Intelligence
Contents
- 1 Introduction
Part I Planning and Analysis
- 2 Artificial Intelligence in Software Project Management
- 3 Requirements Engineering
- 4 Leveraging Artificial Intelligence for Model-based Software Analysis and Design
Part II Development and Deployment
- 5 Statistical Models and Machine Learning to Advance Code Completion: Are We There Yet?
- 6 Cloud Development and Deployment
Part III Testing and Maintenance
- 7 Automated Support for Unit Test Generation
- 8 Artificial Intelligence Techniques in System Testing
- 9 Intelligent Software Maintenance
Part IV AI Techniques from Scratch
- 10 Metaheuristics in a Nutshell
- 11 Foundations of Machine Learning for Software Engineering
Contents
- 1 Introduction
Part I Planning and Analysis
- 2 Artificial Intelligence in Software Project Management
- 3 Requirements Engineering
- 4 Leveraging Artificial Intelligence for Model-based Software Analysis and Design
Part II Development and Deployment
- 5 Statistical Models and Machine Learning to Advance Code Completion: Are We There Yet?
- 6 Cloud Development and Deployment
Part III Testing and Maintenance
- 7 Automated Support for Unit Test Generation
- 8 Artificial Intelligence Techniques in System Testing
- 9 Intelligent Software Maintenance
Part IV AI Techniques from Scratch
- 10 Metaheuristics in a Nutshell
- 11 Foundations of Machine Learning for Software Engineering
🔥2
Self-consistency for open-ended generations
Although individual generations sampled from the large-scale pre-trained language models often yield high-quality results, multiple samplings can produce certain generations of substantially higher quality than the average output of the model.
Recently for the special case of problems that have fixed answer, a simple approach, called self-consistency was suggested for selecting the best answer from multiple generations (Wang et al. 2022). In that paper, the authors sample multiple generations from the LLM, extract the predicted answer from each generation and select the answer with the most number of votes. However, it is important to note that the self-consistency approach is not applicable to prompts that are open-ended and do not have fixed answers.
In this paper, the authors introduce a generalized framework for self-consistency that extends its applicability beyond problems that have fixed-answer answers.
Although individual generations sampled from the large-scale pre-trained language models often yield high-quality results, multiple samplings can produce certain generations of substantially higher quality than the average output of the model.
Recently for the special case of problems that have fixed answer, a simple approach, called self-consistency was suggested for selecting the best answer from multiple generations (Wang et al. 2022). In that paper, the authors sample multiple generations from the LLM, extract the predicted answer from each generation and select the answer with the most number of votes. However, it is important to note that the self-consistency approach is not applicable to prompts that are open-ended and do not have fixed answers.
In this paper, the authors introduce a generalized framework for self-consistency that extends its applicability beyond problems that have fixed-answer answers.
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
In the paper, the authors
- survey open problems and fundamental limitations of RLHF and related methods;
- overview techniques to understand, improve, and complement RLHF in practice; and
- propose auditing and disclosure standards to improve societal oversight of RLHF systems.
In the paper, the authors
- survey open problems and fundamental limitations of RLHF and related methods;
- overview techniques to understand, improve, and complement RLHF in practice; and
- propose auditing and disclosure standards to improve societal oversight of RLHF systems.
Patterns for Building LLM-based Systems & Products
The post is about practical patterns for integrating LLMs into systems and products:
- Evals: To measure performance
- RAG: To add recent, external knowledge
- Fine-tuning: To get better at specific tasks
- Caching: To reduce latency & cost
- Guardrails: To ensure output quality
- Defensive UX: To anticipate & manage errors gracefully
- Collect user feedback: To build our data flywheel
The post is about practical patterns for integrating LLMs into systems and products:
- Evals: To measure performance
- RAG: To add recent, external knowledge
- Fine-tuning: To get better at specific tasks
- Caching: To reduce latency & cost
- Guardrails: To ensure output quality
- Defensive UX: To anticipate & manage errors gracefully
- Collect user feedback: To build our data flywheel
CodeBPE: Investigating Subtokenization Options for Large Language Model Pretraining on Source Code
In this work, the authors conduct a study of subtokenization options for large LM pretraining on source code. They show that for large LMs pretrained on source code:
- Grouping punctuation chars in single tokens reduces the average length by 17% without downstream performance drop, and permitting more complex composite tokens reduces lengths by 40%, sometimes with quality drop;
- UnigramLM is generally preferable over BPE;
- Smaller vocabularies may improve quality with 3—19% length increase;
- Subtokenizers are well transferable between programming languages.
In this work, the authors conduct a study of subtokenization options for large LM pretraining on source code. They show that for large LMs pretrained on source code:
- Grouping punctuation chars in single tokens reduces the average length by 17% without downstream performance drop, and permitting more complex composite tokens reduces lengths by 40%, sometimes with quality drop;
- UnigramLM is generally preferable over BPE;
- Smaller vocabularies may improve quality with 3—19% length increase;
- Subtokenizers are well transferable between programming languages.
Towards Understanding the Capability of Large Language Models on Code Clone Detection: A Survey
The study presented a comprehensive empirical evaluation of LLMs for automated code clone detection across diverse clone types, languages, and prompt formulations. The key findings demonstrate that advanced LLMs like GPT-3.5-Turbo and GPT-4 can achieve remarkably high recall and accuracy in detecting even complex semantic clones, outperforming existing techniques. Introducing intermediate reasoning steps through chain-of-thought prompting leads to noticeable gains by equipping models with a structured thought process.
- Can LLMs detect code clones with a simple prompt?
- How do LLMs perform by using one-step chain-of-thought prompts?
- Can LLMs perform better by using multi-step chain-of-thought prompts?
- How do LLMs perform using code embedding?
- How does the performance of LLMs in code clone detection vary across different programming languages?
The study presented a comprehensive empirical evaluation of LLMs for automated code clone detection across diverse clone types, languages, and prompt formulations. The key findings demonstrate that advanced LLMs like GPT-3.5-Turbo and GPT-4 can achieve remarkably high recall and accuracy in detecting even complex semantic clones, outperforming existing techniques. Introducing intermediate reasoning steps through chain-of-thought prompting leads to noticeable gains by equipping models with a structured thought process.
- Can LLMs detect code clones with a simple prompt?
- How do LLMs perform by using one-step chain-of-thought prompts?
- Can LLMs perform better by using multi-step chain-of-thought prompts?
- How do LLMs perform using code embedding?
- How does the performance of LLMs in code clone detection vary across different programming languages?
PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback
In this paper, the authors introduce a novel framework, namely RRTF (Rank Responses to align Test&Teacher Feedback), and present a new Code LLM, namely PanGu-Coder2. Firstly, they adopt the Evol-Instruct technique to obtain a substantial amount of high-quality natural language instruction and code solution data pairs. Then, they train the base model by ranking candidate code solutions using feedback from test cases and heurstic preferences.
Through comprehensive evaluations on HumanEval, CodeEval, and LeetCode benchmarks, PanGu-Coder2 achieves new state-of-the-art performance among billion-parameter-level Code LLMs, surpassing all of the existing ones by a large margin.
In this paper, the authors introduce a novel framework, namely RRTF (Rank Responses to align Test&Teacher Feedback), and present a new Code LLM, namely PanGu-Coder2. Firstly, they adopt the Evol-Instruct technique to obtain a substantial amount of high-quality natural language instruction and code solution data pairs. Then, they train the base model by ranking candidate code solutions using feedback from test cases and heurstic preferences.
Through comprehensive evaluations on HumanEval, CodeEval, and LeetCode benchmarks, PanGu-Coder2 achieves new state-of-the-art performance among billion-parameter-level Code LLMs, surpassing all of the existing ones by a large margin.
notebook_whisperer
A coding assistant to help with the construction of Jupyter notebooks. With the Notebook Whisperer, you can enter a short sentence saying what you would like to do. It then populates the next cell in your Jupyter notebook with the code for performing that task. This is accomplished by sending the contents of your notebook to chatGPT and having it provide the code that it thinks will fulfill your request.
A coding assistant to help with the construction of Jupyter notebooks. With the Notebook Whisperer, you can enter a short sentence saying what you would like to do. It then populates the next cell in your Jupyter notebook with the code for performing that task. This is accomplished by sending the contents of your notebook to chatGPT and having it provide the code that it thinks will fulfill your request.
👍1
Knowledge Transfer from High-Resource to Low-Resource Programming Languages for Code LLMs
The quality of code produced by a code LLM varies significantly by programming languages. The paper presents an effective approach for boosting the performance of code LLMs on low-resource languages using semi-synthetic data.
Key ingredients:
1. The large volume of training data for high-resource programming languages includes a lot of well-documented code
2. Code LLMs are effective unit test generators, and we can check that generated tests pass
3. We can mechanically translate many unit tests to a low-resource language with a simple compiler
4. Code LLMs can translate code from one language to another, and we can test these translations with the aforementioned tests, and engineer a prompt to increase the likelihood of a successful translation
The MultiPL-T datasets, and links to the fine-tuned models are available at huggingface.co/datasets/nuprl/MultiPL-T
The quality of code produced by a code LLM varies significantly by programming languages. The paper presents an effective approach for boosting the performance of code LLMs on low-resource languages using semi-synthetic data.
Key ingredients:
1. The large volume of training data for high-resource programming languages includes a lot of well-documented code
2. Code LLMs are effective unit test generators, and we can check that generated tests pass
3. We can mechanically translate many unit tests to a low-resource language with a simple compiler
4. Code LLMs can translate code from one language to another, and we can test these translations with the aforementioned tests, and engineer a prompt to increase the likelihood of a successful translation
The MultiPL-T datasets, and links to the fine-tuned models are available at huggingface.co/datasets/nuprl/MultiPL-T
👍1
A Survey of Time Series Anomaly Detection Methods in the AIOps Domain
Internet-based services have seen remarkable success, generating vast amounts of monitored key performance indicators as univariate or multivariate time series. Monitoring and analyzing these time series are crucial for researchers, service operators, and on-call engineers to detect outliers or anomalies indicating service failures or significant events. Numerous advanced anomaly detection methods have emerged to address availability and performance issues.
The review offers a comprehensive overview of time series anomaly detection in Artificial Intelligence for IT operations (AIOps), which uses AI capabilities to automate and optimize operational workflows. Additionally, it explores future directions for real-world and next-generation time-series anomaly detection based on recent advancements.
Internet-based services have seen remarkable success, generating vast amounts of monitored key performance indicators as univariate or multivariate time series. Monitoring and analyzing these time series are crucial for researchers, service operators, and on-call engineers to detect outliers or anomalies indicating service failures or significant events. Numerous advanced anomaly detection methods have emerged to address availability and performance issues.
The review offers a comprehensive overview of time series anomaly detection in Artificial Intelligence for IT operations (AIOps), which uses AI capabilities to automate and optimize operational workflows. Additionally, it explores future directions for real-world and next-generation time-series anomaly detection based on recent advancements.
👍1
OWASP Top 10 for LLM
The OWASP Top 10 for Large Language Model Applications project aims to educate developers, designers, architects, managers, and organizations about the potential security risks when deploying and managing Large Language Models (LLMs). The project provides a list of the top 10 most critical vulnerabilities often seen in LLM applications, highlighting their potential impact, ease of exploitation, and prevalence in real-world applications. Examples of vulnerabilities include prompt injections, data leakage, inadequate sandboxing, and unauthorized code execution, among others. The goal is to raise awareness of these vulnerabilities, suggest remediation strategies, and ultimately improve the security posture of LLM applications.
1 Prompt Injection
2 Insecure Output Handling
3 Training Data Poisoning
4 Model Denial of Service
5 Supply Chain Vulnerabilities
6 Sensitive Information Disclosure
7 Insecure Plugin Design
8 Excessive Agency
9 Overreliance
10 Model Theft
PDF
The OWASP Top 10 for Large Language Model Applications project aims to educate developers, designers, architects, managers, and organizations about the potential security risks when deploying and managing Large Language Models (LLMs). The project provides a list of the top 10 most critical vulnerabilities often seen in LLM applications, highlighting their potential impact, ease of exploitation, and prevalence in real-world applications. Examples of vulnerabilities include prompt injections, data leakage, inadequate sandboxing, and unauthorized code execution, among others. The goal is to raise awareness of these vulnerabilities, suggest remediation strategies, and ultimately improve the security posture of LLM applications.
1 Prompt Injection
2 Insecure Output Handling
3 Training Data Poisoning
4 Model Denial of Service
5 Supply Chain Vulnerabilities
6 Sensitive Information Disclosure
7 Insecure Plugin Design
8 Excessive Agency
9 Overreliance
10 Model Theft