ml4se
Machine Learning for Software Engineering
StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation

The very first entirely self-aligned code LLM trained with a fully permissive and transparent pipeline.
- weights: https://huggingface.co/bigcode/starcoder2-15b-instruct-v0.1
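A minimal generation sketch using the Hugging Face transformers pipeline with the released weights; the plain-text prompt below is an assumption for illustration, so check the model card for the recommended chat format:

```python
# Minimal generation sketch for bigcode/starcoder2-15b-instruct-v0.1.
# The plain-text prompt is illustrative and may differ from the model's
# recommended chat template; assumes enough GPU memory for a 15B model.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="bigcode/starcoder2-15b-instruct-v0.1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Write a Python function that checks whether a string is a palindrome."
output = generator(prompt, max_new_tokens=256, do_sample=False)
print(output[0]["generated_text"])
```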
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models

Prometheus 2 (8x7B) is an open-source evaluator language model. Compared to Prometheus 1 (13B), it shows improved evaluation performance and also supports assessment in the pairwise ranking (relative grading) format. It scores 72% to 85% agreement with human judgments across multiple pairwise ranking benchmarks.

Prometheus 2 (7B) is a lighter version of the Prometheus 2 (8x7B) model with reasonable performance (outperforming Llama-2-70B and on par with Mixtral-8x7B). It achieves at least 80% of the evaluation performance of Prometheus 2 (8x7B).

GitHub: https://github.com/prometheus-eval/prometheus-eval
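A rough sketch of relative grading with plain transformers; the model id and the judge prompt here are assumptions for illustration, not the official prometheus-eval template (see the GitHub repo for that):

```python
# Illustrative pairwise (relative grading) call; the model id and the prompt
# wording are assumptions, not the official prometheus-eval template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "prometheus-eval/prometheus-7b-v2.0"  # assumed HF id of the 7B evaluator
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = (
    "You are a strict evaluator. Given the instruction and two responses, "
    "decide which response is better and answer with 'A' or 'B'.\n\n"
    "Instruction: Explain what a binary search does.\n"
    "Response A: It scans every element one by one.\n"
    "Response B: It repeatedly halves a sorted range to locate the target.\n"
    "Better response:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
verdict = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(verdict[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```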
Better & Faster Large Language Models via Multi-token Prediction

LLMs are trained with a next-token prediction loss. The authors propose multi-token prediction as an improvement over next-token prediction for training language models on generative and reasoning tasks. Experiments at up to 7B parameters and 1T tokens show that the benefit grows with model size, with particularly strong improvements on code tasks.
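A toy sketch of the idea: several output heads on a shared trunk, where head k predicts the token k positions ahead and the per-head cross-entropy losses are averaged. Shapes and head count are illustrative, not the paper's exact architecture:

```python
# Toy multi-token prediction loss: n_heads output heads on a shared trunk,
# head k predicts the token k positions ahead. Illustrative only.
import torch
import torch.nn.functional as F
from torch import nn

class MultiTokenHead(nn.Module):
    def __init__(self, d_model: int, vocab_size: int, n_heads: int = 4):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Linear(d_model, vocab_size) for _ in range(n_heads)]
        )

    def loss(self, hidden: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, d_model) trunk states; tokens: (batch, seq) input ids
        total = hidden.new_zeros(())
        for k, head in enumerate(self.heads, start=1):
            logits = head(hidden[:, :-k])   # predict the token k steps ahead
            targets = tokens[:, k:]         # targets shifted by k positions
            total = total + F.cross_entropy(logits.flatten(0, 1), targets.flatten())
        return total / len(self.heads)

# Usage with dummy trunk outputs:
hidden = torch.randn(2, 16, 32)
tokens = torch.randint(0, 100, (2, 16))
print(MultiTokenHead(32, 100).loss(hidden, tokens))
```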
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study

Existing RLHF methods can be roughly categorized as either reward-based or reward-free. Prominent applications such as ChatGPT and Claude use reward-based methods, which first learn a reward model and then apply actor-critic algorithms such as PPO.

However, in academic benchmarks, the SotA results are often achieved via reward-free methods, such as DPO.

Is DPO truly superior to PPO?

Through theoretical and experimental analysis, the authors explore the limitations of DPO and find that it is sensitive to the distribution shift between the base model's outputs and the preference data. Moreover, DPO fails to improve performance on challenging tasks such as code generation. PPO, in contrast, demonstrates robust effectiveness across diverse tasks and achieves state-of-the-art results on challenging code competition tasks.
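For reference, a minimal sketch of the reward-free DPO objective discussed above; it only needs per-sequence log-probabilities of the chosen and rejected responses under the policy and a frozen reference model:

```python
# Minimal DPO loss: inputs are summed log-probabilities of the chosen and
# rejected responses under the policy and the frozen reference model.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta: float = 0.1):
    # Implicit reward margin: beta * (chosen log-ratio minus rejected log-ratio)
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    return -F.logsigmoid(logits).mean()

# Dummy per-sequence log-probs for a batch of 4 preference pairs:
lp = lambda: torch.randn(4)
print(dpo_loss(lp(), lp(), lp(), lp()))
```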
Open sourcing IBM’s Granite code models

IBM is releasing a family of Granite code models to the open-source community.

- paper
- github: https://github.com/ibm-granite
- models: https://huggingface.co/ibm-granite
Large Language Models Cannot Self-Correct Reasoning Yet

The research indicates that LLMs struggle to self-correct their responses without external feedback, and at times, their performance even degrades after self-correction.
AgentBench: Evaluating LLMs as Agents

AgentBench is a multi-dimensional benchmark that consists of 8 distinct environments to assess LLM-as-Agent’s reasoning and decision-making abilities.

github: https://github.com/THUDM/AgentBench
AutoDev: Automated AI-Driven Development

Another agent-based framework for software engineering tasks, this time from Microsoft. AutoDev enables AI agents to autonomously interact with repositories, perform actions, and tackle complex software engineering tasks.

RQs:
- How effective is AutoDev in a code generation task?
- How effective is AutoDev in a test generation task?
- How efficient is AutoDev in completing tasks?

The evaluation on the HumanEval dataset for code and test generation showed strong results: a Pass@1 score of 91.5% for code generation, the second-best result on the leaderboard at the time of writing and the best among approaches that require no extra training data. AutoDev also performs well in test generation, with a Pass@1 score of 87.8% and 99.3% coverage from passing tests.
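For context, Pass@k here refers to the standard HumanEval metric; below is a sketch of the unbiased estimator from the HumanEval paper (generic metric code, not AutoDev's own evaluation harness):

```python
# Unbiased pass@k estimator from the HumanEval paper (Chen et al., 2021):
# given n samples per problem of which c pass, estimate 1 - C(n-c, k) / C(n, k).
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# e.g. 10 samples per task, 9 of them pass -> pass@1 estimate of 0.9
print(pass_at_k(n=10, c=9, k=1))
```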
From Human-to-Human to Human-to-Bot Conversations in Software Engineering

The paper examines similarities and differences between human-to-human and human-to-bot conversations in software engineering, comparing conversations between a software developer and
1. a fellow software developer
2. an NLU-based chatbot
3. an LLM-based chatbot
Codestral

- 22B parameters
- 32K context window
- non-production license

HuggingFace: https://huggingface.co/mistralai/Codestral-22B-v0.1
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

The paper bridges the conceptual gap between SSMs and attention variants. It yields insights on how recent SSMs (e.g. Mamba) perform as well as Transformers on language modeling. Also, it provides new ideas to improve SSMs (and potentially Transformers) by connecting the algorithmic and systems advances on both sides.
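A toy illustration of the duality for a scalar, time-invariant SSM: the recurrence h_t = a*h_{t-1} + b*x_t, y_t = c*h_t can equivalently be computed as multiplication by a lower-triangular, attention-like matrix. The paper generalizes this correspondence to selective SSMs such as Mamba:

```python
# Toy state space duality: a scalar linear recurrence equals multiplication
# by a lower-triangular matrix M[t, s] = c * a**(t-s) * b, the "masked
# attention" view of the same computation.
import torch

def ssm_recurrent(x, a, b, c):
    h, ys = 0.0, []
    for xt in x:
        h = a * h + b * xt      # state update
        ys.append(c * h)        # readout
    return torch.stack(ys)

def ssm_matrix(x, a, b, c):
    T = x.shape[0]
    t = torch.arange(T).view(-1, 1)
    s = torch.arange(T).view(1, -1)
    M = c * (a ** (t - s).clamp(min=0).float()) * b  # a^(t-s) along each diagonal
    M = torch.tril(M)                                # causal (lower-triangular) mask
    return M @ x

x = torch.randn(8)
print(torch.allclose(ssm_recurrent(x, 0.9, 0.5, 1.2), ssm_matrix(x, 0.9, 0.5, 1.2)))
```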
Multi-turn Reinforcement Learning from Preference Human Feedback

RLHF has become the standard approach for aligning LLMs with human preferences, allowing them to demonstrate remarkable abilities across a variety of tasks. Existing methods emulate preferences at the single-decision (turn) level, which limits them in settings that require planning or multi-turn interaction to achieve a long-term goal.

The authors propose novel methods for reinforcement learning from preference feedback between two full multi-turn conversations.

Algorithms for the multi-turn setting:
- Preference-based Q-function
- Multi-turn Preference Optimization algorithm
- MTPO with mixture policy
- Multi-turn RLHF
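A hypothetical illustration (not the authors' MTPO algorithm, and omitting the reference-model term for brevity): the key shift from single-turn preference optimization is that log-probabilities are summed over all assistant turns of a full conversation before the two trajectories are compared:

```python
# Hypothetical illustration, not the paper's MTPO algorithm: a trajectory-level
# preference loss where log-probs are summed over all assistant turns of each
# conversation before a DPO-style comparison (reference-model term omitted).
import torch
import torch.nn.functional as F

def trajectory_logp(per_turn_logps):
    # per_turn_logps: list of per-turn log-probabilities for one conversation
    return torch.stack(per_turn_logps).sum()

def multi_turn_preference_loss(preferred_turns, rejected_turns, beta: float = 0.1):
    margin = beta * (trajectory_logp(preferred_turns) - trajectory_logp(rejected_turns))
    return -F.logsigmoid(margin)

preferred = [torch.tensor(-3.1), torch.tensor(-2.4), torch.tensor(-1.9)]  # 3 assistant turns
rejected = [torch.tensor(-4.0), torch.tensor(-3.6)]                       # 2 assistant turns
print(multi_turn_preference_loss(preferred, rejected))
```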