Mistral has rolled out Mixtral 8x7B, a MoE (Mixture of Experts) model that supposedly beats GPT-3.5 out of the box. There is also an instruction-finetuned Mixtral 8x7B Instruct. Interesting stuff.
https://mistral.ai/news/mixtral-of-experts/
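For a quick taste, here is a minimal sketch of running the instruct variant through transformers (assuming transformers v4.36+, which added Mixtral support, and enough GPU memory for device_map="auto" to shard the weights; the model id is the one published on the Hub):

```python
# Minimal sketch: chatting with Mixtral 8x7B Instruct via transformers.
# Assumes transformers >= 4.36 and enough GPU memory to hold the model in fp16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # shard the ~47B total parameters across available GPUs
)

messages = [{"role": "user", "content": "Explain Mixture of Experts in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```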
Another interesting bit: AMD GPU support keeps growing stronger in the fresh huggingface transformers release.
AMD's ROCm GPU architecture is now supported across the board and fully tested in our CI with MI210/MI250 GPUs. We further enable specific hardware acceleration for ROCm in Transformers, such as Flash Attention 2, GPTQ quantization and DeepSpeed.
* Add RoCm scheduled CI & upgrade RoCm CI to PyTorch 2.1 by @fxmarty in #26940
* Flash Attention 2 support for RoCm by @fxmarty in #27611
* Reflect RoCm support in the documentation by @fxmarty in #27636
* restructure AMD scheduled CI by @ydshieh in #27743
https://github.com/huggingface/transformers/releases/tag/v4.36.0
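In user code this is the stock transformers API; here is a hedged sketch of opting into Flash Attention 2, which on an MI210/MI250 box routes through the ROCm build of flash-attn (assuming that package is installed; the attn_implementation argument landed in v4.36):

```python
# Sketch: enabling Flash Attention 2 in transformers. On ROCm GPUs
# (e.g. MI210/MI250) this uses the ROCm port of flash-attn, assuming
# the package is installed; the same code runs unchanged on CUDA.
import torch
from transformers import AutoModelForCausalLM

model_id = "mistralai/Mistral-7B-v0.1"  # any FA2-capable model works here

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,               # FA2 requires fp16/bf16
    attn_implementation="flash_attention_2",
    device_map="auto",
)
```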
And since today is heavy on LLM news, here is one more for those who missed it.
Nexusflow has released NexusRaven-V2 with 13B parameters. The model beats GPT-4 (though apparently not Turbo) at zero-shot function calling. Now you can build even more kinds of copilots :)
Blog: https://nexusflow.ai/blogs/ravenv2
HF: https://huggingface.co/Nexusflow/NexusRaven-V2-13B
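Rough idea of how it is used: you put function signatures with docstrings into the prompt and the model emits a call expression. A hedged sketch below; the exact template (including the <human_end> marker) follows the pattern on the model card as I understand it, so verify against the card before relying on it:

```python
# Hedged sketch of zero-shot function calling with NexusRaven-V2.
# The prompt layout (Function: + docstring, then User Query: ...<human_end>)
# is an assumption based on the model card; double-check the template there.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="Nexusflow/NexusRaven-V2-13B",
    torch_dtype="auto",
    device_map="auto",
)

prompt = '''Function:
def get_weather(city: str, unit: str = "celsius"):
    """
    Returns the current weather for a city.

    Args:
        city: Name of the city.
        unit: "celsius" or "fahrenheit".
    """

User Query: What's the weather like in Berlin right now?<human_end>'''

# The model is trained to emit a call expression, e.g.: Call: get_weather(city='Berlin')
out = pipe(prompt, max_new_tokens=64, do_sample=False, return_full_text=False)
print(out[0]["generated_text"])
```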
This is simply a feast for the soul.
https://www.cerebras.net/blog/introducing-gigagpt-gpt-3-sized-models-in-565-lines-of-code/
GigaGPT is Cerebras’ implementation of Andrej Karpathy’s nanoGPT – the simplest and most compact code base to train and fine-tune GPT models. Whereas nanoGPT can train models in the 100M parameter range, gigaGPT trains models well over 100B parameters. We do this without introducing additional code or relying on third party frameworks – the entire repo is just 565 lines of code. Instead gigaGPT utilizes the large memory and compute capacity of Cerebras hardware to enable large scale training on vanilla torch.nn code. With no modifications, gigaGPT supports long context lengths and works with a variety of optimizers.
Though it seems to run only on Cerebras hardware. Still cool: more hardware and cloud alternatives!
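The "vanilla torch.nn" part is the interesting claim: no megatron-style parallelism wrappers, just ordinary modules that the Cerebras stack scales up. For flavor, here is a generic nanoGPT-style block in plain torch.nn (my own illustrative sketch, not gigaGPT's actual source):

```python
# A generic nanoGPT-style transformer block in plain torch.nn, to illustrate
# the kind of code gigaGPT scales up. Illustrative only; not Cerebras' code.
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, d_model: int, n_head: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_head, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        t = x.size(1)
        # Causal mask: True marks positions a token may NOT attend to (the future).
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), 1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + attn_out
        x = x + self.mlp(self.ln2(x))
        return x
```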
And for anyone tired of LLMs, there is a fresh longread from our very own Stephen Wolfram:
https://writings.stephenwolfram.com/2023/12/observer-theory/