Chat template viewer
Different LLMs expect very different input formats. Hugging Face added chat templates as part of the tokenizer: they specify how to convert conversations, represented as lists of messages, into a single string in the format that the model expects. To explore the chat_template of different models, see the linked viewer.
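As a quick illustration, here is a minimal example of rendering a message list with a tokenizer's chat template (the model name is just an example; any chat model's tokenizer works the same way):

```python
from transformers import AutoTokenizer

# The chat template ships with the model's tokenizer.
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is a chat template?"},
]

# Render the conversation into the single string this model expects.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,              # return a string instead of token ids
    add_generation_prompt=True,  # append the header for the assistant's reply
)
print(prompt)
```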
The 2024 Nobel Prize in Physics has been awarded to John J. Hopfield and Geoffrey E. Hinton
“for foundational discoveries and inventions that enable machine learning with artificial neural networks.”
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
The authors analyzed the errors of LLMs by examining their internal representations and discovered that information related to truthfulness is localized in the exact answer tokens. From a practical perspective, this finding can be used to improve error-detection methods for production-level LLMs.
The code is coming soon
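Until the code is released, here is a rough sketch of the general probing idea (not the authors' exact method): train a simple linear classifier on hidden states taken at the exact-answer token positions to predict whether the answer is correct. The data below are placeholders standing in for real extracted activations and labels.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder data: one hidden state per example, taken at the exact-answer
# token, plus a 0/1 label saying whether the model's answer was correct.
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(1000, 4096))  # stand-in for real activations
is_correct = rng.integers(0, 2, size=1000)     # stand-in for real labels

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, is_correct, test_size=0.2, random_state=0
)

# A linear probe: if truthfulness is (roughly) linearly encoded at the answer
# tokens, even this simple classifier detects errors above chance.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("error-detection accuracy:", probe.score(X_test, y_test))
```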
Nobel Prize in Chemistry 2024
The Nobel Prize in Chemistry 2024 was divided, one half awarded to David Baker "for computational protein design", the other half jointly to Demis Hassabis and John Jumper "for protein structure prediction"
State of AI Report 2024
Key takeaways from the 2024 Report include:
- Frontier lab performance begins to converge and proprietary models lose their edge
- Planning and reasoning take priority in LLM research
- Foundation models demonstrate their ability to break out of language
- US sanctions have limited effects on Chinese labs’ ability to produce capable models
- The enterprise value of AI companies has hit $9T
- A handful of AI companies begin to generate serious revenue
- The pseudo-acquisition emerges as an off-ramp for AI companies
- The existential risk discourse has cooled off
PDF
2^136279841-1 is the New Largest Known Prime Number
The Great Internet Mersenne Prime Search has discovered a new Mersenne prime number, $2^{136279841}-1$. At 41,024,320 digits, it eclipses by more than 16 million digits the previous largest known prime number found by GIMPS nearly 6 years ago.
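For background, Mersenne numbers $2^p-1$ admit a specialized primality check, the Lucas-Lehmer test. A minimal version is shown below; it is only practical for small exponents, whereas GIMPS relies on heavily optimized FFT-based arithmetic.

```python
def is_mersenne_prime(p: int) -> bool:
    """Lucas-Lehmer test: 2**p - 1 is prime iff s_(p-2) == 0 (mod 2**p - 1),
    where s_0 = 4 and s_(k+1) = s_k**2 - 2. Assumes p is an odd prime."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0

# Exponents of the first few Mersenne primes: 3, 5, 7, 13, 17, 19, 31, ...
print([p for p in (3, 5, 7, 11, 13, 17, 19, 23, 31) if is_mersenne_prime(p)])
```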
Top 10 Serverless GPUs: A comprehensive vendor selection
The serverless model offers several advantages:
- Cost Efficiency: You are billed only for the compute time you consume, not for idle server time.
- Scalability: The provider automatically scales the infrastructure to handle varying loads.
- Improved Productivity: No need to manage servers, patch operating systems, or handle scaling. Developers can focus on writing code and business logic rather than managing infrastructure.
- Faster Time to Market: Rapid deployment and updates are possible because there’s no infrastructure to manage.
The article compares the top 10 serverless GPU platforms in this emerging market.
Teaching Transformers Modular Arithmetic at Scale
The work introduces novel techniques to help ML models learn modular addition. These techniques—varying the diversity of training data, using an angular embedding for model inputs and outputs, and introducing a regularized loss function—enable ML models to add hundreds of elements mod a large $q$ with high accuracy, a significant improvement over prior work.
Modular addition: given $N$ elements in $Z_q$, compute their sum modulo $q$.
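A tiny sketch of the task setup plus the angular-encoding idea (each residue mapped to a point on the unit circle, so addition mod $q$ becomes rotation); the paper's actual data-sampling scheme and regularized loss are more involved, so treat this only as an illustration:

```python
import numpy as np

def angular_embed(x: np.ndarray, q: int) -> np.ndarray:
    """Map residues mod q onto the unit circle: x -> (cos 2*pi*x/q, sin 2*pi*x/q)."""
    theta = 2 * np.pi * x / q
    return np.stack([np.cos(theta), np.sin(theta)], axis=-1)

def make_batch(n_examples: int, n_terms: int, q: int, seed: int = 0):
    """Toy data for modular addition: n_terms elements of Z_q and their sum mod q."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, q, size=(n_examples, n_terms))
    y = x.sum(axis=1) % q
    return angular_embed(x, q), angular_embed(y, q)

inputs, targets = make_batch(n_examples=4, n_terms=256, q=257)
print(inputs.shape, targets.shape)  # (4, 256, 2) (4, 2)
```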
The Artificial Inflation (AI) of Artificial Intelligence (AI)—or AI^2 Bursts its Bubble, Bringing Down the Hype of the AI Threat
While some of the promises of AI have come true, and technology (like ChatGPT and its plugins) will continue to impress with its capabilities, AI-based technologies have largely failed to live up to the mountainous hype. In 2025, the authors expect the industry to pull back on the promises, investment, and hype of new AI capabilities and settle down into what is real versus marketing noise.
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
The paper introduces Tokenformer. The architecture leverages the attention mechanism to facilitate not only inter-token computations but also interactions between tokens and model parameters. The authors replace all linear projection layers in the Transformer with Pattention layers, allowing for efficient incremental scaling without the need for retraining from scratch.
Future work:
- Extending the Mixture-of-Experts Paradigm
- Advancing Parameter-Efficient Tuning
- Integrating Vision and Language Models
- Device-Cloud Collaboration
- Enhancing Model Interpretability
Code: https://github.com/Haiyang-W/TokenFormer
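For rough intuition about the Pattention idea: instead of multiplying the input by a fixed weight matrix, the input token attends over a set of learnable key/value "parameter tokens", so capacity can grow by appending more parameter tokens. The sketch below is a simplification (it keeps a standard softmax, while the paper modifies the normalization), not the exact layer from the repository above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PattentionSketch(nn.Module):
    """Simplified token-parameter attention: inputs attend to learnable
    key/value parameter tokens instead of using a fixed projection matrix.
    Extra parameter tokens can be appended later to grow capacity."""

    def __init__(self, dim_in: int, dim_out: int, n_param_tokens: int):
        super().__init__()
        self.key_params = nn.Parameter(torch.randn(n_param_tokens, dim_in) * 0.02)
        self.value_params = nn.Parameter(torch.randn(n_param_tokens, dim_out) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim_in)
        scores = x @ self.key_params.T / (x.shape[-1] ** 0.5)  # (batch, seq, n_params)
        weights = F.softmax(scores, dim=-1)
        return weights @ self.value_params                     # (batch, seq, dim_out)

layer = PattentionSketch(dim_in=64, dim_out=128, n_param_tokens=256)
print(layer(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 128])
```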
RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts
The authors present RE-Bench, a suite of environments that measure the ability of AI agents to automate AI R&D tasks. They compare humans to several public frontier models through best-of-k with varying time budgets and agent designs, and find that the best AI agents achieve a score 4x higher than human experts when both are given a total time budget of 2 hours per environment. However, humans currently display better returns to increasing time budgets, narrowly exceeding the top AI agent scores given an 8-hour budget and achieving 2x the score of the top AI agent when both are given 32 total hours.
Mechanistic Interpretability
I have prepared a list of papers on Mechanistic Interpretability. If you have good links on this topic, please share them in the comments.
* 2021: A Mathematical Framework for Transformer Circuits
* 2022.06.27: Mechanistic Interpretability, Variables, and the Importance of Interpretable Bases
* 2022.09.14: Toy Models of Superposition
* 2022.09.24: In-context Learning and Induction Heads
* 2023.01.12: Progress measures for grokking via mechanistic interpretability
* 2023.04.28: Towards Automated Circuit Discovery for Mechanistic Interpretability
* 2023.05.24: Interpretability Dreams
* 2023.09: Sparse Autoencoders Find Highly Interpretable Model Directions
* 2023.10.25: Attention Lens: A Tool for Mechanistically Interpreting the Attention Head Information Retrieval Mechanism
* 2024.01.15: Sparse Autoencoders Work on Attention Layer Outputs
...
* 2024.02.01: Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small
* 2024.02.06: Challenges in Mechanistically Interpreting Model Representations
* 2024.02.22: Do sparse autoencoders find "true features"?
* 2024.03.14: Sparse autoencoders find composed features in small toy models
* 2024.03.15: Improving SAE's by Sqrt()-ing L1 & Removing Lowest Activating Features
* 2024.03.29: SAE reconstruction errors are (empirically) pathological
* 2024.04.22: Mechanistic Interpretability for AI Safety -- A Review
* 2024.05.21: Mapping the Mind of a Large Language Model
* 2024.06.13: The engineering challenges of scaling interpretability
* 2024.07.02: A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
* 2024.07.29: Detecting and Understanding Vulnerabilities in Language Models via Mechanistic Interpretability
* 2024.10.10: Bilinear MLPs enable weight-based mechanistic interpretability
* 2024.10.11: Explaining AI through mechanistic interpretability
* 2024.10.15: Mechanistic Permutability: Match Features Across Layers
* 2024.10.17: Using Dictionary Learning Features as Classifiers
* 2024.10.24: Probing Ranking LLMs: Mechanistic Interpretability in Information Retrieval
* 2024.10.25: Evaluating feature steering: A case study in mitigating social biases
* 2024.11.25: Adaptive Circuit Behavior and Generalization in Mechanistic Interpretability