DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
DeepSeek-Coder-V2 is an open-source MoE code language model that achieves performance comparable to GPT4-Turbo on code-specific tasks. Its pre-training dataset is composed of 60% source code, 10% math corpus, and 30% natural language corpus. DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 on an additional 6T tokens. The model expands its supported programming languages from 86 to 338 and extends the context length from 16K to 128K.
github: https://github.com/deepseek-ai/DeepSeek-Coder-V2
paper: https://arxiv.org/abs/2406.11931
16B Instruct: https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
236B Instruct: https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Instruct
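For reference, a minimal sketch of querying the 16B Lite Instruct checkpoint through the standard Hugging Face transformers chat API; the prompt and generation settings are illustrative assumptions, not recommendations from the model card:
```python
# Minimal sketch: load DeepSeek-Coder-V2-Lite-Instruct via Hugging Face transformers.
# Assumes a GPU with enough memory and trust_remote_code for the custom MoE architecture.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Write a Python function that checks if a string is a palindrome."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```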
Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks
Can we systematically address jailbreak attacks?
It is difficult to prepare against all possible jailbreak queries (which current approaches like SFT attempt to do)—these queries usually elicit related harmful responses that rely on the same underlying knowledge (e.g., detailed steps to make a bomb).
Consequently, directly unlearning the harmful knowledge in the LLM prevents it from generating harmful responses, even when confronted with unseen jailbreak prompts.
The authors propose an unlearning method named Safe Unlearning, which combines three complementary objectives (sketched below):
- minimizing the probability of generating harmful responses,
- maximizing the probability of rejecting harmful queries, and
- maintaining general performance on harmless queries
github: https://github.com/thu-coai/SafeUnlearning
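A rough sketch of how these three objectives could be combined into one training loss; the term names, the weights, and the sign-flipped NLL used for the unlearning term are assumptions for illustration, not the authors' exact implementation (see the repository for that):
```python
# Illustrative sketch of a combined "safe unlearning" objective:
#   (1) push down the likelihood of harmful responses,
#   (2) push up the likelihood of refusals to harmful queries,
#   (3) preserve behaviour on harmless data via a standard LM loss.
# The weighting and the simple negated-NLL form of (1) are assumptions.
import torch
import torch.nn.functional as F

def token_nll(model, input_ids, labels):
    """Mean negative log-likelihood of `labels` given `input_ids` (prompt tokens masked with -100)."""
    logits = model(input_ids=input_ids).logits[:, :-1]
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        labels[:, 1:].reshape(-1),
        ignore_index=-100,
    )

def safe_unlearning_loss(model, batch, w_unlearn=1.0, w_reject=1.0, w_retain=1.0):
    # (1) unlearning term: maximize NLL of harmful responses (gradient ascent on their likelihood)
    l_unlearn = -token_nll(model, batch["harmful_ids"], batch["harmful_labels"])
    # (2) rejection term: ordinary NLL on refusal responses to harmful queries
    l_reject = token_nll(model, batch["refusal_ids"], batch["refusal_labels"])
    # (3) retention term: ordinary NLL on helpful responses to harmless queries
    l_retain = token_nll(model, batch["helpful_ids"], batch["helpful_labels"])
    return w_unlearn * l_unlearn + w_reject * l_reject + w_retain * l_retain
```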
Is Functional Correctness Enough to Evaluate Code Language Models? Exploring Diversity of Generated Codes
In complex code generation tasks, exploiting the diversity encoded in LMs helps generate correct outputs.
RQs:
- Can recent code LMs generate sufficiently diverse solutions to specific problems?
- Is there a correlation between the diversity and correctness of the generated codes?
- Do advanced code generation strategies enhance both code diversity and correctness?
The authors observe that existing code LMs tend to generate functionally correct codes with limited diversity.
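A toy sketch of one way to quantify inter-sample diversity: sample several completions for the same problem and compute the average pairwise token-level dissimilarity. The metric below is a generic stand-in, not the measure proposed in the paper:
```python
# Toy sketch: estimate diversity of k sampled solutions to one problem as
# 1 - mean pairwise token-overlap similarity (a generic proxy metric).
from itertools import combinations

def token_set_similarity(a: str, b: str) -> float:
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / max(len(ta | tb), 1)

def sample_diversity(solutions: list[str]) -> float:
    pairs = list(combinations(solutions, 2))
    if not pairs:
        return 0.0
    return 1.0 - sum(token_set_similarity(a, b) for a, b in pairs) / len(pairs)

solutions = [
    "def add(a, b): return a + b",
    "def add(x, y):\n    return x + y",
    "import operator\ndef add(a, b): return operator.add(a, b)",
]
print(f"diversity ≈ {sample_diversity(solutions):.2f}")
```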
Understanding Defects in Generated Codes by Language Models
LLMs sometimes generate code with defects.
RQs:
- What are the types of defects in the generated code, and how can they be classified based on their characteristics?
- Can existing prompt engineering techniques help in fixing the problematic code?
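On the second question, a generic prompt-based repair loop looks like the sketch below: run the generated code against a test, and if it fails, feed the traceback back to the model. This is a common self-repair pattern, not the specific protocol studied in the paper; `generate(prompt)` is a hypothetical wrapper around whatever LLM is used:
```python
# Generic sketch of a prompt-based repair loop for defective generated code.
import traceback

def run_tests(code: str, test: str) -> str | None:
    """Return None on success, otherwise the traceback string."""
    try:
        env: dict = {}
        exec(code, env)
        exec(test, env)
        return None
    except Exception:
        return traceback.format_exc()

def repair(generate, task: str, code: str, test: str, max_rounds: int = 3) -> str:
    for _ in range(max_rounds):
        error = run_tests(code, test)
        if error is None:
            return code
        prompt = (
            f"Task:\n{task}\n\nCurrent solution:\n{code}\n\n"
            f"It fails with:\n{error}\nReturn a corrected version."
        )
        code = generate(prompt)
    return code
```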
Diffusion is spectral autoregression
Autoregression and diffusion are currently the two dominant generative modelling paradigms. And they aren’t all that different: diffusion models of images perform approximate autoregression in the frequency domain.
Colab: https://colab.research.google.com/drive/1siywvhvl1OxI1UmqRrJHiFUK0M5SHlcx
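The core observation can be reproduced numerically: natural images concentrate their power at low spatial frequencies, while Gaussian noise has a flat spectrum, so as noise is added, high frequencies are swamped first and denoising effectively reveals frequencies coarse-to-fine. A minimal sketch of the spectrum computation (the Colab does this with real photos and the actual diffusion noise schedule; here a smooth synthetic image stands in):
```python
# Rough sketch: radially averaged power spectrum of a (stand-in) image at
# increasing Gaussian noise levels. Noise has a flat spectrum, so it drowns
# out the high frequencies of the low-frequency-dominated image first.
import numpy as np

def radial_power_spectrum(img: np.ndarray, n_bins: int = 32) -> np.ndarray:
    """Radially averaged power spectrum, ordered from lowest to highest frequency."""
    f = np.fft.fftshift(np.fft.fft2(img - img.mean()))
    power = np.abs(f) ** 2
    h, w = img.shape
    y, x = np.indices((h, w))
    r = np.hypot(y - h / 2, x - w / 2).ravel()
    bins = np.linspace(0, r.max() + 1e-9, n_bins + 1)
    idx = np.digitize(r, bins) - 1
    return np.array([power.ravel()[idx == i].mean() for i in range(n_bins)])

# Smooth low-frequency image as a stand-in; a real photo shows the same trend.
x = np.linspace(0, 4 * np.pi, 128)
img = np.outer(np.sin(x), np.cos(x))
rng = np.random.default_rng(0)
for sigma in (0.0, 0.3, 1.0):
    spectrum = radial_power_spectrum(img + sigma * rng.standard_normal(img.shape))
    print(f"sigma={sigma}: high/low frequency power ratio ≈ "
          f"{spectrum[24:].mean() / spectrum[:8].mean():.2e}")
```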
VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters
The authors explore a novel approach to building a time series forecasting foundation model using natural images (based on the intrinsic similarities between images and time series). The proposed VisionTS, without any training on time series data, outperforms the largest foundation model MOIRAI_Large in the zero-shot setting.
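To make the image/time-series analogy concrete, the sketch below folds a univariate series by its period into a 2D array ("image") whose right-hand columns (the forecast horizon) are left masked, which a visual masked autoencoder could then fill in. This folding step is an illustration of the idea only, not the official VisionTS code:
```python
# Sketch: fold a periodic time series into a 2D array with masked forecast columns.
import numpy as np

def series_to_image(series: np.ndarray, period: int, horizon: int) -> np.ndarray:
    """One period per column; NaN columns on the right mark the horizon to forecast."""
    n_cols = len(series) // period
    context = series[: n_cols * period].reshape(n_cols, period).T   # (period, n_cols)
    future = np.full((period, (horizon + period - 1) // period), np.nan)
    return np.concatenate([context, future], axis=1)

t = np.arange(24 * 30)
series = np.sin(2 * np.pi * t / 24) + 0.1 * np.random.default_rng(0).standard_normal(len(t))
img = series_to_image(series, period=24, horizon=48)
print(img.shape)  # (24, 32): 30 history columns plus 2 masked forecast columns
```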
LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models
Long context generalization depends on token distances set by position indices, which are then combined with token representations. LongRecipe is primarily focused on optimizing the learning process by efficiently handling both position indices and token representations.
The approach extends the effective context window of open-source LLMs from 8k to 128k, achieving performance close to GPT-4 with just one day of dedicated training on a single GPU with 80GB of memory.
code: https://github.com/zhiyuanhubj/LongRecipe
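The position-index side of this can be illustrated with a generic trick: train on short chunks but remap their position ids so the model sees relative distances up to the target window. The sketch below shows this general idea only; LongRecipe's actual position and token selection strategy is described in the paper and repository:
```python
# Generic illustration of position-index transformation for long-context training:
# split a short chunk in two and shift the second half's position ids by a random
# gap, so relative distances up to the target window appear while training on
# short sequences. Not LongRecipe's exact scheme.
import torch

def simulated_long_positions(seq_len: int, target_len: int,
                             generator: torch.Generator | None = None) -> torch.Tensor:
    """Position ids for a `seq_len` chunk that pretend it spans up to `target_len` tokens."""
    positions = torch.arange(seq_len)
    split = seq_len // 2
    gap = torch.randint(1, target_len - seq_len + 1, (1,), generator=generator).item()
    positions[split:] += gap  # insert a gap between the two halves
    return positions

pos = simulated_long_positions(seq_len=8192, target_len=131072)
print(pos.max().item())  # largest position index seen, at most target_len - 1
```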
My Python code is a neural network
Many programs we write can be embedded in an RNN, and a trained RNN can perform better than if we wrote the algorithm by hand. The author demonstrates this idea with a program that determines whether a message sent during code review clearly refers to the program code.
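A toy sketch of the kind of model the post has in mind: a small RNN that reads a code-review message token by token and outputs whether it refers to program code. The architecture and vocabulary handling here are illustrative only, not the author's implementation:
```python
# Toy sketch: a tiny GRU classifier over code-review message tokens.
import torch
import torch.nn as nn

class MessageClassifier(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 32, hidden_dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        _, h = self.rnn(self.embed(token_ids))   # h: (num_layers, batch, hidden_dim)
        return self.head(h[-1]).squeeze(-1)      # one logit per message

vocab = {"<unk>": 0, "please": 1, "rename": 2, "this": 3, "variable": 4, "thanks": 5}
message = "please rename this variable"
ids = torch.tensor([[vocab.get(w, 0) for w in message.split()]])
model = MessageClassifier(vocab_size=len(vocab))
print(torch.sigmoid(model(ids)))  # untrained probability that the message refers to code
```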
Learning to Ask: When LLMs Meet Unclear Instruction
The study delves into the issue of unclear user instructions and their impact on the effective use of tools by modern LLMs. Recognizing the limitations of LLMs in dealing with ambiguous instructions, the authors conducted an investigation into the common error patterns present in real-world user instructions. Based on the analysis, they introduced the Noisy ToolBench dataset, a novel tool-using benchmark aimed to evaluate the LLM’s tool-using performance under unclear user instructions. Furthermore, they developed the Ask-when-Needed method (AwN), an approach that empowers LLMs to actively seek user input whenever they face uncertainty in instructions.
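A generic sketch of an ask-when-needed loop, where the model may either call a tool or ask a clarifying question when the instruction is ambiguous. The message format and the `llm`/`run_tool` helpers are hypothetical and not the paper's prompts or code:
```python
# Generic ask-when-needed loop: the model returns a JSON action that is either
# a clarifying question, a tool call, or a final answer.
import json

def ask_when_needed(llm, run_tool, instruction: str, max_turns: int = 5) -> str:
    messages = [{"role": "user", "content": instruction}]
    for _ in range(max_turns):
        reply = llm(messages)          # assumed to return a JSON string
        action = json.loads(reply)
        if action["type"] == "clarify":
            answer = input(f"Assistant asks: {action['question']}\nYour answer: ")
            messages += [{"role": "assistant", "content": reply},
                         {"role": "user", "content": answer}]
        elif action["type"] == "tool_call":
            result = run_tool(action["name"], action.get("arguments", {}))
            messages += [{"role": "assistant", "content": reply},
                         {"role": "tool", "content": str(result)}]
        else:                           # final answer
            return action["content"]
    return "Gave up after too many turns."
```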
Automatic Detection of LLM-generated Code: A Case Study of Claude 3 Haiku
The results indicate that Claude 3 tends to generate longer functions, but shorter classes than humans, and this characteristic can be used to detect Claude 3-generated code with ML models with 82% and 66% accuracies for function-level and class-level snippets, respectively.
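A simplified sketch of length-feature-based detection: fit a classifier on a few size statistics of a snippet. The features, model, and toy labels below are stand-ins for the feature set and training data actually used in the study:
```python
# Simplified sketch: detect LLM-generated code from length statistics.
from sklearn.ensemble import RandomForestClassifier

def length_features(snippet: str) -> list[float]:
    lines = snippet.splitlines() or [""]
    return [
        float(len(lines)),                          # number of lines
        sum(len(l) for l in lines) / len(lines),    # mean line length
        float(max(len(l) for l in lines)),          # longest line
    ]

# X: feature vectors for labeled snippets; y: 1 = LLM-generated, 0 = human-written (toy labels)
train_snippets = ["def f(x):\n    return x * 2\n",
                  "def g(a, b):\n    # add\n    return a + b\n"]
y = [1, 0]
X = [length_features(s) for s in train_snippets]
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict([length_features("def h(n):\n    return n - 1\n")]))
```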
Fixing Code Generation Errors for Large Language Models
The authors conducted ten rounds of tests on 14 LLMs using the HumanEval dataset. Through manual analysis of the test results, they found that these LLMs achieved an average of 84.07% of their reported performance.
They also investigated the relationship between Pass@1 results, model inference time, and model parameter size. The analysis revealed a positive correlation between Pass@1 results and model parameter size, while no significant correlation was observed between inference time and parameter size.
Subsequently, the authors performed an in-depth analysis of errors in the test results, extracting and categorizing 12,837 errors into 14 types. Through the analysis, they identified 19 specific causes leading to these errors.
The proposed fixing method can fix three types of errors, improving the performance of the 14 LLMs on the HumanEval and MBPP datasets by average increases of 9.5% and 5.4%, respectively.
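A sketch of the HumanEval-style execution and coarse error bucketing that underlies this kind of analysis: run the generated solution against the task's tests and record the exception class. Bucketing by exception class is only a crude stand-in for the paper's 14-type taxonomy:
```python
# Sketch: execute a HumanEval-style sample and bucket the outcome by error type.
from collections import Counter

def execute_and_categorize(solution: str, test_code: str, entry_point: str) -> str:
    env: dict = {}
    try:
        exec(solution, env)
        exec(test_code, env)
        env["check"](env[entry_point])   # HumanEval tests expose a check(candidate) function
        return "pass"
    except AssertionError:
        return "wrong_answer"
    except Exception as e:
        return type(e).__name__          # e.g. SyntaxError, NameError, TypeError, ...

results = Counter()
samples = [
    ("def add(a, b):\n    return a + b\n",
     "def check(candidate):\n    assert candidate(1, 2) == 3\n",
     "add"),
]
for solution, test_code, entry_point in samples:
    results[execute_and_categorize(solution, test_code, entry_point)] += 1
print(results)
```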