Mistral выкатил MoE (Mixture of Experts) модель Mixtral 8x7B, которая типа бьёт GPT-3.5 из коробки. Также есть instruction finetuned Mixtral 8x7B Instruct. Это интересно.
https://mistral.ai/news/mixtral-of-experts/
https://mistral.ai/news/mixtral-of-experts/
mistral.ai
Mixtral of experts | Mistral AI
A high quality Sparse Mixture-of-Experts.
🔥12
А ещё из интересного, в свежей huggingface transformers растёт и крепнет поддержка GPU AMD.
AMD's ROCm GPU architecture is now supported across the board and fully tested in our CI with MI210/MI250 GPUs. We further enable specific hardware acceleration for ROCm in Transformers, such as Flash Attention 2, GPTQ quantization and DeepSpeed.
* Add RoCm scheduled CI & upgrade RoCm CI to PyTorch 2.1 by @fxmarty in #26940
* Flash Attention 2 support for RoCm by @fxmarty in #27611
* Reflect RoCm support in the documentation by @fxmarty in #27636
* restructure AMD scheduled CI by @ydshieh in #27743
https://github.com/huggingface/transformers/releases/tag/v4.36.0
AMD's ROCm GPU architecture is now supported across the board and fully tested in our CI with MI210/MI250 GPUs. We further enable specific hardware acceleration for ROCm in Transformers, such as Flash Attention 2, GPTQ quantization and DeepSpeed.
* Add RoCm scheduled CI & upgrade RoCm CI to PyTorch 2.1 by @fxmarty in #26940
* Flash Attention 2 support for RoCm by @fxmarty in #27611
* Reflect RoCm support in the documentation by @fxmarty in #27636
* restructure AMD scheduled CI by @ydshieh in #27743
https://github.com/huggingface/transformers/releases/tag/v4.36.0
GitHub
Release v4.36: Mixtral, Llava/BakLlava, SeamlessM4T v2, AMD ROCm, F.sdpa wide-spread support · huggingface/transformers
New model additions
Mixtral
Mixtral is the new open-source model from Mistral AI announced by the blogpost Mixtral of Experts. The model has been proven to have comparable capabilities to Chat-GPT ...
Mixtral
Mixtral is the new open-source model from Mistral AI announced by the blogpost Mixtral of Experts. The model has been proven to have comparable capabilities to Chat-GPT ...
🔥15❤3👍2
И раз сегодня много LLM новостей, то вот ещё одна для тех, кто пропустил.
Nexusflow выложили NexusRaven-V2 с 13B параметров. Модель бьёт GPT-4 (но вроде не Turbo) на Zero-shot Function Calling. Теперь можете построить больше разных ко-пилотов :)
Блог: https://nexusflow.ai/blogs/ravenv2
HF: https://huggingface.co/Nexusflow/NexusRaven-V2-13B
Nexusflow выложили NexusRaven-V2 с 13B параметров. Модель бьёт GPT-4 (но вроде не Turbo) на Zero-shot Function Calling. Теперь можете построить больше разных ко-пилотов :)
Блог: https://nexusflow.ai/blogs/ravenv2
HF: https://huggingface.co/Nexusflow/NexusRaven-V2-13B
huggingface.co
Nexusflow/NexusRaven-V2-13B · Hugging Face
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
🔥15👍5