LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models
Long-context generalization depends on the token distances set by position indices, which are then combined with token representations. LongRecipe focuses on making this learning process efficient by handling both position indices and token representations strategically.
The approach extends the effective context window of open-source LLMs from 8k to 128k, achieving performance close to GPT-4 with just one day of dedicated training on a single GPU with 80GB of memory.
code: https://github.com/zhiyuanhubj/LongRecipe
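A minimal sketch of the position-index idea, assuming the gist is to expose long-range relative distances without training on long inputs (the function name and sampling details below are ours, not the paper's exact algorithm):

```python
import torch

def stretched_position_ids(seq_len: int, target_len: int, keep_head: int = 4) -> torch.Tensor:
    # Keep the first `keep_head` positions contiguous, then assign the rest
    # of the window a sorted random subset of the larger target range, so a
    # short training window exercises long-range relative distances.
    assert keep_head <= seq_len <= target_len
    head = torch.arange(keep_head)
    tail = torch.randperm(target_len - keep_head)[: seq_len - keep_head]
    tail = tail.sort().values + keep_head
    return torch.cat([head, tail])

# Example: an 8k-token training window whose indices span a 128k range.
pos_ids = stretched_position_ids(8192, 131072)
# pos_ids can be passed as `position_ids` to a Hugging Face causal LM.
```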
My Python code is a neural network
Many programs we write can be embedded in an RNN, and the trained RNN can outperform the algorithm we would have written by hand. The author demonstrates the idea with a program that decides whether a message sent during code review clearly refers to the program code.
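A toy sketch of such a classifier (the minimal PyTorch architecture below is our own, not the author's code):

```python
import torch
import torch.nn as nn

class MessageClassifier(nn.Module):
    # A GRU that reads a tokenized review message and emits a single logit:
    # "does this message refer to the program code?"
    def __init__(self, vocab_size: int, embed_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        _, h = self.rnn(self.embed(token_ids))  # h: (1, batch, hidden)
        return self.head(h[-1]).squeeze(-1)     # one logit per message

# Score a batch of two messages, each 20 tokens from a 1000-word vocabulary.
model = MessageClassifier(vocab_size=1000)
probs = torch.sigmoid(model(torch.randint(0, 1000, (2, 20))))
```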
Learning to Ask: When LLMs Meet Unclear Instruction
The study examines unclear user instructions and their impact on how effectively modern LLMs use tools. Recognizing the limitations of LLMs when instructions are ambiguous, the authors investigated the common error patterns in real-world user instructions. Based on this analysis, they introduced Noisy ToolBench, a novel benchmark designed to evaluate an LLM's tool-using performance under unclear user instructions. They also developed Ask-when-Needed (AwN), a method that lets LLMs actively ask the user for clarification whenever they are uncertain about the instructions.
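A rough sketch of an ask-when-needed control flow (the `llm` callable and its action schema are hypothetical interfaces of ours, not the paper's implementation):

```python
def ask_when_needed(instruction: str, llm, tools: dict, max_turns: int = 5) -> str:
    # Before calling a tool, the model may instead emit a clarifying
    # question, which is routed back to the user.
    history = [{"role": "user", "content": instruction}]
    for _ in range(max_turns):
        step = llm(history)  # assumed to return a dict with an "action" key
        if step["action"] == "ask_user":
            answer = input(step["question"])  # surface uncertainty to the user
            history.append({"role": "user", "content": answer})
        elif step["action"] == "call_tool":
            result = tools[step["tool"]](**step["args"])
            history.append({"role": "tool", "content": str(result)})
        else:  # "finish"
            return step["response"]
    return "Stopped after max_turns without a final answer."
```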
Automatic Detection of LLM-generated Code: A Case Study of Claude 3 Haiku
The results indicate that Claude 3 tends to generate longer functions but shorter classes than humans, and this characteristic can be used to detect Claude 3-generated code with ML models, reaching 82% and 66% accuracy for function-level and class-level snippets, respectively.
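A toy illustration of the length-feature idea (the feature set and classifier are our simplification, not the paper's pipeline):

```python
import ast
from sklearn.linear_model import LogisticRegression

def length_features(src: str) -> list[float]:
    # Features: total lines, number of functions, average function length.
    tree = ast.parse(src)
    funcs = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    lines = src.count("\n") + 1
    avg_len = sum(n.end_lineno - n.lineno + 1 for n in funcs) / len(funcs) if funcs else 0.0
    return [float(lines), float(len(funcs)), avg_len]

# Toy labeled data: 0 = human-written, 1 = LLM-generated (illustrative only).
human = "def add(a, b):\n    return a + b\n"
generated = ("def add_numbers(first, second):\n"
             "    result = first + second\n"
             "    return result\n")
X = [length_features(human), length_features(generated)]
y = [0, 1]
clf = LogisticRegression().fit(X, y)
```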
Fixing Code Generation Errors for Large Language Models
The authors conducted ten rounds of tests on 14 LLMs using the HumanEval dataset. Through manual analysis of the test results, they found that these LLMs achieved an average of 84.07% of their reported performance.
They also investigated the relationship between Pass@1 results, model inference time, and model parameter size. The analysis revealed a positive correlation between Pass@1 results and model parameter size, while no significant correlation was observed between inference time and parameter size.
Subsequently, the authors performed an in-depth analysis of errors in the test results, extracting and categorizing 12,837 errors into 14 types. Through the analysis, they identified 19 specific causes leading to these errors.
The proposed fixing method can fix three types of errors, improving the performance of the 14 LLMs on the HumanEval and MBPP datasets by average increases of 9.5% and 5.4%, respectively.
Chat template viewer
Different LLMs expect very different input formats. Hugging Face added chat templates, which are part of the tokenizer. A chat template specifies how to convert a conversation, represented as a list of messages, into a single string in the format the model expects. To explore the chat_template of different models, visit this viewer.
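For example, rendering a conversation with the transformers API (the model choice here is ours, purely for illustration; any chat-tuned checkpoint that ships a chat_template works the same way):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is a chat template?"},
]

# The tokenizer's chat_template renders the message list into one
# model-specific string, including role markers and the cue for the
# assistant's next turn.
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```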
The 2024 Nobel Prize in Physics has been awarded to John J. Hopfield and Geoffrey E. Hinton
“for foundational discoveries and inventions that enable machine learning with artificial neural networks.”
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
The authors analyzed the errors of LLMs by examining their internal representations. They discovered that information related to truthfulness is concentrated in the exact answer tokens. From a practical perspective, this finding strengthens error detection methods applicable to production-level LLMs.
The code is coming soon
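Until the authors' code is released, here is a rough sketch of the probing idea (the model choice, span heuristic, and pooling are assumptions of ours, not their setup):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # small stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)

def answer_token_features(prompt: str, answer: str, layer: int = -1) -> torch.Tensor:
    # Approximate the answer span by tokenizing the answer alone; BPE
    # boundaries can shift slightly, which is acceptable for a sketch.
    ids = tok(prompt + answer, return_tensors="pt")
    n_answer = len(tok(answer)["input_ids"])
    with torch.no_grad():
        hidden = model(**ids).hidden_states[layer][0]  # (seq_len, hidden_dim)
    return hidden[-n_answer:].mean(dim=0)  # pooled answer-token representation

feats = answer_token_features("The capital of France is", " Paris")
# Collect such features for correct vs. hallucinated answers, then fit a
# linear probe (e.g., sklearn's LogisticRegression) on top of them.
```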
Nobel Prize in Chemistry 2024
The Nobel Prize in Chemistry 2024 was divided, one half awarded to David Baker "for computational protein design", the other half jointly to Demis Hassabis and John Jumper "for protein structure prediction".
State of AI Report 2024
Key takeaways from the 2024 Report include:
- Frontier lab performance begins to converge and proprietary models lose their edge
- Planning and reasoning take priority in LLM research
- Foundation models demonstrate their ability to break out of language
- US sanctions have limited effects on Chinese labs’ ability to produce capable models
- The enterprise value of AI companies has hit $9T
- A handful of AI companies begin to generate serious revenue
- The pseudo-acquisition emerges as an off-ramp for AI companies
- The existential risk discourse has cooled off
PDF