ml4se
Machine Learning for Software Engineering
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models

LongLoRA is an efficient fine-tuning approach that extends the context sizes of pre-trained LLMs at limited computation cost. It extends LLaMA2 7B from a 4k context to 100k, or LLaMA2 70B to 32k, on a single 8xA100 machine.

Results:
1. The proposed shifted short attention is easy to implement, compatible with Flash-Attention, and is not required during inference (see the sketch after this list).
2. Released models: from 7B to 70B, context length from 8k to 100k, including LLaMA2-LongLoRA-7B-100k, LLaMA2-LongLoRA-13B-64k, and LLaMA2-LongLoRA-70B-32k.
3. A long-context QA dataset, LongQA, for supervised fine-tuning (SFT).
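
For intuition, here is a minimal sketch of the shifted short attention idea; it assumes q, k, v are already projected to (batch, heads, tokens, head_dim) and is not the paper's actual implementation:

import torch

def s2_attn(q, k, v, group):
    # q, k, v: (B, H, T, D), with T divisible by `group`; causal masking omitted
    B, H, T, D = q.shape
    def shift(t):  # shift half the heads by half a group so info crosses group borders
        t = t.clone()
        t[:, H // 2:] = torch.roll(t[:, H // 2:], shifts=-group // 2, dims=2)
        return t
    q, k, v = (shift(t) for t in (q, k, v))
    # attend only within each group of `group` consecutive tokens
    q, k, v = (t.reshape(B, H, T // group, group, D) for t in (q, k, v))
    attn = torch.softmax(q @ k.transpose(-1, -2) / D ** 0.5, dim=-1)
    out = (attn @ v).reshape(B, H, T, D)
    out[:, H // 2:] = torch.roll(out[:, H // 2:], shifts=group // 2, dims=2)  # undo the shift
    return out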

Repository: https://github.com/dvlab-research/LongLoRA
👍1
78% MNIST accuracy using GZIP in under 10 lines of code

import gzip
from collections import Counter
import numpy as np

c = lambda z: len(gzip.compress(z.tobytes()))  # compressed size of raw bytes

def ncd(x, y):  # normalized compression distance
    return (c(x + y) - min(c(x), c(y))) / max(c(x), c(y))

cls = [(x, c(x), l) for x, l in training_set]  # (image, c(image), label)

# 5-NN vote under NCD; training_set/test_set yield (numpy image, label) pairs
correct_predictions = sum(
    np.array_equal(
        Counter(l for _, _, l in sorted(
            ((ncd(x1, x), x, l) for x, _, l in cls), key=lambda t: t[0]
        )[:5]).most_common(1)[0][0],
        label,
    )
    for x1, label in test_set
)
🤯3
AutoGen: Enabling next-generation large language model applications

AutoGen is a framework for simplifying the orchestration, optimization, and automation of LLM workflows. It offers customizable and conversable agents that leverage the strongest capabilities of the most advanced LLMs, like GPT-4, while addressing their limitations by integrating with humans and tools and having conversations between multiple agents via automated chat.
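
As a flavor of the API, here is a minimal two-agent sketch modeled on the project's quick-start; the llm_config details are assumptions and need real credentials:

import autogen

assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config={"config_list": [{"model": "gpt-4"}]},  # assumed config; add an API key
)
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",     # fully automated; "ALWAYS" keeps a human in the loop
    code_execution_config=False,  # local code execution disabled for this sketch
)
# the agents converse via automated chat until the task is done
user_proxy.initiate_chat(assistant, message="Summarize the main idea of the AutoGen paper.")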

Paper: https://arxiv.org/abs/2308.08155
🔥1
Patterns for Building LLM-based Systems & Products:

- Evals: To measure performance
- RAG: To add recent, external knowledge
- Fine-tuning: To get better at specific tasks
- Caching: To reduce latency & cost (a minimal sketch follows this list)
- Guardrails: To ensure output quality
- Defensive UX: To anticipate & manage errors gracefully
- Collect user feedback: To build our data flywheel
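
As an illustration of the caching pattern, a minimal sketch assuming a hypothetical call_llm(prompt) function; production systems often normalize prompts or use semantic keys rather than exact-string hashes:

import hashlib

_cache = {}

def cached_llm(prompt, call_llm):
    # hash the exact prompt string as the cache key
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = call_llm(prompt)  # pay latency & cost only on a cache miss
    return _cache[key]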
👍4
Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution

PB is a general-purpose self-referential self-improvement mechanism for LLMs. Given a seed set of mutation-prompts, thinking-styles, and a domain-specific problem description, PB generates variations of the task-prompts and mutation-prompts, exploiting the fact that LLMs can be prompted to act as mutation operators. Based on the fitness of the evolved task-prompts, a subset of evolutionary units, each consisting of a task-prompt and its associated mutation-prompt, is selected for future generations. Over multiple generations, prompts adapt to the domain at hand; in a mathematical domain, for example, PB evolved the task-prompt "Show all your working. II. You should use the correct mathematical notation and vocabulary, where appropriate. III. You should write your answer in full sentences and in words. IV. You should use examples to illustrate your points and prove your answers. V. Your workings out should be neat and legible" on GSM8K.
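
A minimal sketch of the evolutionary loop, assuming hypothetical llm(prompt) and fitness(task_prompt) functions (e.g., fitness = accuracy on a GSM8K subset); the actual system is considerably richer:

def promptbreeder(units, fitness, llm, generations=10):
    # each evolutionary unit is a (task_prompt, mutation_prompt) pair
    for _ in range(generations):
        units.sort(key=lambda u: fitness(u[0]), reverse=True)
        survivors = units[: len(units) // 2]
        children = []
        for task, mut in survivors:
            # the LLM itself acts as the mutation operator on the task-prompt
            child_task = llm(f"{mut}\nINSTRUCTION: {task}\nNEW INSTRUCTION:")
            # self-referential step: the mutation-prompt is mutated as well
            child_mut = llm(f"Improve this mutation instruction:\n{mut}")
            children.append((child_task, child_mut))
        units = survivors + children
    return max(units, key=lambda u: fitness(u[0]))[0]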
🤯3👍1
L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models

L2CEval is a comprehensive evaluation of LLMs for natural-language-to-code generation along a variety of axes, such as model scale, training data, sensitivity to few-shot exemplars, and the impact of instruction tuning.

L2CEval covers a wide range of state-of-the-art models: 54 models from 13 different organizations, all evaluated on 3 core domains of language-to-code generation tasks. It includes extensive evaluations of models ranging from as small as 1B parameters to significantly larger ones, such as OpenAI's davinci and GPT-4 models, with estimated sizes of 170B+ parameters.

The study can be useful for the community in applying LLMs for downstream code applications.

https://l2c-eval.github.io/
👍1
Think before you speak: Training Language Models With Pause Tokens

Language models generate responses by producing a series of tokens in immediate succession: the (K + 1)-th token is an outcome of manipulating K hidden vectors per layer, one vector per preceding token.

What happens if we delay a model's answer generation, and how can we execute these delays? What if we were to let the model manipulate, say, K + 10 hidden vectors before it outputs the (K + 1)-th token?

The authors operationalize this idea by training and running inference on language models with a pause token, a sequence of which is appended to the input prefix. This allows the model to perform extra computation before committing to an answer.

The main finding is that such delays yield gains on downstream tasks covering reasoning, question answering, and general understanding, when the model is both pre-trained and fine-tuned with delays.
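
A minimal sketch of the mechanics at inference time, assuming a Hugging Face-style causal LM; note that an off-the-shelf checkpoint gains nothing from this, since the benefit only appears when the model is trained with pauses:

from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# register a <pause> token; in the paper its embedding is learned during training
tok.add_special_tokens({"additional_special_tokens": ["<pause>"]})
model.resize_token_embeddings(len(tok))

prompt = "Q: What is 17 * 24? A:"
ids = tok(prompt + " <pause>" * 10, return_tensors="pt").input_ids  # 10 extra hidden vectors

# outputs at the pause positions are ignored; the answer is decoded after them
out = model.generate(ids, max_new_tokens=16)
print(tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True))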
👍2
ICAART 2024: International Conference on Agents and Artificial Intelligence

Conference Areas
1. Agents
2. Artificial Intelligence

Deadlines:
1. Regular Paper Submission Extension: October 26, 2023
2. Position Paper Submission: November 15, 2023
3. Doctoral Consortium Paper Submission: December 21, 2023
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

SWE-bench is an evaluation framework consisting of 2,294 software engineering problems drawn from real GitHub issues and corresponding pull requests across 12 popular Python repositories. Given a codebase along with a description of an issue to be resolved, a language model is tasked with editing the codebase to address the issue.

The evaluations show that both state-of-the-art proprietary models and the fine-tuned model SWE-Llama can resolve only the simplest issues. Claude 2 and GPT-4 solve a mere 4.8% and 1.7% of instances respectively, even when provided with an oracle retriever.

Leaderboard: http://www.swebench.com/
GitHub: https://github.com/princeton-nlp/SWE-bench
👍2
Mistral 7B Paper

A paper about the Mistral 7B model has appeared on arxiv.org.

Mistral 7B v0.1 is a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation.
👍5
AutoAgents: A Framework for Automatic Agent Generation

In the paper, the authors propose a framework for agent orchestration. This multi-agent approach makes it possible to solve problems that are difficult for a single model to cope with. What distinguishes this approach from previous ones is that it combines, at the same time:
- an unlimited number of dynamically generated agents,
- multi-agent conversation,
- self-refinement agents,
- collaborative refinement actions.

github: https://github.com/Link-AGI/AutoAgents
huggingface: https://huggingface.co/spaces/LinkSoul/AutoAgents
CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model

CodeFuse-13B is an open-sourced pre-trained code LLM. It is specifically designed for code-related tasks with both English and Chinese prompts and supports over 40 programming languages.
👍3
CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion

CrossCodeEval is a diverse and multilingual code completion benchmark that necessitates an in-depth cross-file contextual understanding to complete the code accurately. CrossCodeEval is built on a diverse set of real-world, open-sourced, permissively-licensed repositories in four popular programming languages: Python, Java, TypeScript, and C#.

github: https://github.com/amazon-science/cceval
👍1
Is code text, or is text code? (1)

The same approaches are often used to work with code and text: namely, the code is processed as a sequence of tokens. This is not the only way of working with code; for example, you can represent the code as an AST, a DFG, etc. There are approaches that combine different representations, e.g., CodeBERT or GraphCodeBERT. Despite this, the basic approach now is to treat code as if it were text. This allows you to use unified data-processing methods and build models that can work with natural language and programming languages simultaneously.
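
A small illustration of the two views using Python's standard library: the same function as a flat token sequence and as a tree:

import ast
import io
import tokenize

src = "def add(a, b):\n    return a + b\n"

# code as text: a sequence of tokens
for t in tokenize.generate_tokens(io.StringIO(src).readline):
    print(tokenize.tok_name[t.type], repr(t.string))

# code as structure: an abstract syntax tree
print(ast.dump(ast.parse(src), indent=2))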

Okay, let's assume the code is text. Can we say that text is code? Are there fundamental differences between natural language and programming language? At first glance, these languages are designed for different tasks: one for expression, the other for execution.
Is code text, or is text code? (2)

One of the most important properties of code is the concept of functionality. Code runs on a computer and produces some result: a changed state of the machine, for example output to stdout or a file on disk. Speaking about functionality, it is necessary to mention the syntactic and semantic properties of code. Syntactic properties are what the code looks like; semantic properties are what it does. In particular, two functions can look different but implement the same functionality (here the problem of clone detection arises: you need to understand whether two code fragments are clones of each other). At first glance, the concept of functionality is what distinguishes code from text, but if we consider that a text is a program that runs inside a person and changes their state, the difference between code and text becomes smaller.

So, code is a program, a set of instructions that is executed on a computer. A natural-language text is a program that is executed by a person: a person reads the text, and their state changes.