InternLM-XComposer-2.5 handles text description of images with complex composition, matching the capabilities of GPT-4V. Trained with interleaved 24K image-text contexts, it can seamlessly extend to 96K contexts via RoPE extrapolation.
Compared to the previous version 2.0, InternLM-XComposer-2.5 has three major improvements:
- ultra-high-resolution image understanding;
- fine-grained video understanding;
- multi-image reasoning within a single dialogue.
Using additional LoRA parameters, XComposer-2.5 can perform complex tasks:
- web page creation;
- composing high-quality illustrated text articles.
XComposer-2.5 was evaluated on 28 benchmarks, outperforming existing state-of-the-art open-source models on 16 of them. It also competes closely with GPT-4V and Gemini Pro on 16 key tasks.
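The 24K-to-96K context extension relies on RoPE extrapolation. A minimal sketch of the underlying idea, rescaling rotary position angles so longer sequences fall within the trained range (the function name and the linear-scaling scheme here are illustrative assumptions, not the authors' exact method):

```python
import numpy as np

def rope_angles(positions, dim=8, base=10000.0, scale=1.0):
    """Rotary-position angles; scale > 1 compresses positions so a model
    trained on shorter contexts can address longer ones (illustrative)."""
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    return np.outer(np.asarray(positions) / scale, inv_freq)

train = rope_angles(np.arange(24_000))              # context seen in training
extra = rope_angles(np.arange(96_000), scale=4.0)   # 4x longer, rescaled

# The rescaled 96K positions stay within the angle range seen at 24K.
assert extra.max() <= train.max() + 1.0
```

With a 4x position scale, every extrapolated angle lands in (or a fraction beyond) the range the model saw during training, which is why no retraining is needed.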
https://news.1rj.ru/str/DataScienceT
Artificial Intelligence A-Z 2024: Build 7 AI + LLM & ChatGPT
Updated: 2024 version
Price: $30 (full course, offline)
📖 Combine the power of Data Science, Machine Learning and Deep Learning to create powerful AI for Real-World applications!
🔊 Taught By: Hadelin de Ponteves, Kirill Eremenko
Contact @Husseinsheikho
Ahead of the upcoming ICML 2024 (Vienna, July 21-27, 2024), Microsoft has published results from the MInference project. The method speeds up long-sequence processing through sparse computation that exploits recurring patterns in attention matrices.
The MInference technique requires no changes to pre-training settings.
Microsoft researchers' synthetic tests of the method on the LLaMA-3-1M, GLM4-1M, Yi-200K, Phi-3-128K, and Qwen2-128K models show up to a 10x reduction in prefill latency on an A100 while maintaining accuracy.
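The core intuition, computing attention only on a sparse subset of key blocks instead of the full sequence, can be sketched in a toy form (this is an illustrative block-sparse scheme, not MInference's actual kernels or pattern search):

```python
import numpy as np

def block_sparse_attention(q, k, v, block=4, keep=2):
    """Toy block-sparse attention: each query block attends only to the
    `keep` most promising key blocks instead of the full sequence.
    Illustrative only; MInference's real sparse kernels are more elaborate."""
    n, d = q.shape
    out = np.zeros_like(v)
    n_blocks = n // block
    for qi in range(n_blocks):
        qb = q[qi * block:(qi + 1) * block]
        # Cheap estimate: mean-pooled block scores pick which key blocks to keep.
        pooled = np.array([qb.mean(0) @ k[ki * block:(ki + 1) * block].mean(0)
                           for ki in range(n_blocks)])
        top = np.sort(np.argsort(pooled)[-keep:])
        cols = np.concatenate([np.arange(ki * block, (ki + 1) * block) for ki in top])
        scores = qb @ k[cols].T / np.sqrt(d)
        w = np.exp(scores - scores.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)       # softmax over kept keys only
        out[qi * block:(qi + 1) * block] = w @ v[cols]
    return out

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(16, 8)) for _ in range(3))
print(block_sparse_attention(q, k, v).shape)  # (16, 8)
```

With `keep=2` of 4 key blocks, each query scores only half the keys, which is where the latency savings come from at long context lengths.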
https://news.1rj.ru/str/DataScienceT
Please open Telegram to view this post
VIEW IN TELEGRAM
Please open Telegram to view this post
VIEW IN TELEGRAM
👍2
Arcee Agent 7B outperforms GPT-3.5-Turbo and many other models at writing and interpreting code.
Arcee Agent 7B is especially suitable for those wishing to implement complex AI solutions without the computational expense of large language models.
And yes, quantized GGUF versions of Arcee Agent 7B are also available.
https://news.1rj.ru/str/DataScienceT
Please open Telegram to view this post
VIEW IN TELEGRAM
Please open Telegram to view this post
VIEW IN TELEGRAM
👍2
Kolors is a large diffusion model recently published by Kuaishou's Kolors team.
Kolors was trained on billions of text-image pairs and shows excellent results in generating complex photorealistic images.
In an evaluation by 50 independent experts, Kolors generated more realistic and visually appealing images than Midjourney-v6, Stable Diffusion 3, DALL-E 3, and other models.
🟡 Kolors page
🟡 Try
🖥 GitHub
https://news.1rj.ru/str/DataScienceT
Please open Telegram to view this post
VIEW IN TELEGRAM
Please open Telegram to view this post
VIEW IN TELEGRAM
👍3❤1
TTT (Test-Time Training) is a technique that allows artificial intelligence models to adapt and learn while in use, rather than only during pre-training.
The main advantage of TTT is that it can efficiently process long contexts (large amounts of input data) without significantly increasing the computational cost.
The researchers conducted experiments on various datasets, including books, and found that TTT often outperformed traditional methods.
In comparative benchmarks with other popular machine learning methods such as transformers and recurrent neural networks, TTT was found to perform better on some tasks.
This revolutionary method will bring us closer to creating more flexible and efficient artificial intelligence models that can better adapt to new data in real time.
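The idea can be sketched in miniature: before producing an output for each incoming input, the model takes one gradient step on a self-supervised loss for that input. The toy linear layer and reconstruction objective below are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

class TinyTTT:
    """Toy test-time-training layer: before emitting an output, take one
    gradient step on a self-supervised reconstruction loss for this input.
    A sketch of the idea only, not the paper's architecture."""
    def __init__(self, dim, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(dim, dim))
        self.lr = lr

    def __call__(self, x):
        err = self.W @ x - x                  # self-supervised signal
        self.W -= self.lr * np.outer(err, x)  # adapt while in use
        return self.W @ x

layer = TinyTTT(dim=4)
x = np.ones(4)
losses = [float(np.sum((layer(x) - x) ** 2)) for _ in range(50)]
print(losses[0] > losses[-1])  # True: reconstruction improves during inference
```

The key point is that the update happens per test input at inference time, so the "hidden state" is itself a small learned model rather than a fixed vector.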
Adaptations of the method have been published on Github:
- adaptation for PyTorch
- adaptation for JAX
#Pytorch #Jax #TTT #LLM #Training
https://news.1rj.ru/str/DataScienceT
Please open Telegram to view this post
VIEW IN TELEGRAM
Please open Telegram to view this post
VIEW IN TELEGRAM
👍2
Vico is a training-free framework that analyzes how individual input prompt tokens influence the generated video and adjusts the model so that all prompt words are weighted equally, preventing any single token from dominating.
To do this, Vico builds a spatio-temporal attention graph, which it uses to evaluate and adjust how every input concept is represented in the video.
git clone https://github.com/Adamdad/vico.git
pip install diffusers==0.26.3
git lfs install
git clone https://huggingface.co/adamdad/videocrafterv2_diffusers
export PYTHONPATH="$PWD"
python videocrafterv2_vico.py \
--prompts XXX \
--unet_path $PATH_TO_VIDEOCRAFTERV2 \
--attribution_mode "latent_attention_flow_st_soft"

#T2V #Framework #ML
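The rebalancing idea can be sketched as follows: given the total attention mass each prompt token receives, pull the distribution toward uniform so no single concept dominates the generated video. This toy function is a stand-in for, not a reproduction of, Vico's attention-flow computation:

```python
import numpy as np

def rebalance_token_attention(attn):
    """attn: total attention mass each prompt token receives.
    Pull the distribution halfway toward uniform so no token dominates.
    A toy stand-in for Vico's spatio-temporal attention-flow adjustment."""
    uniform = np.full_like(attn, attn.sum() / attn.size)
    balanced = 0.5 * attn + 0.5 * uniform
    return balanced * attn.sum() / balanced.sum()  # preserve total mass

attn = np.array([0.70, 0.15, 0.10, 0.05])  # one concept dominates the prompt
out = rebalance_token_attention(attn)
print(out.max() < attn.max(), abs(out.sum() - attn.sum()) < 1e-9)  # True True
```

The dominant token's share drops (here from 0.70 to 0.475) while the total attention mass is preserved, which is the "equal consideration" behavior described above.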
https://news.1rj.ru/str/DataScienceT
Please open Telegram to view this post
VIEW IN TELEGRAM
👍1
When training generative models, the training dataset plays an important role in the output quality of the finished models.
One good source is MiraData from Tencent: a ready-made dataset with a total video duration of 16 thousand hours, designed for training text-to-video generation models. It includes long videos (72.1 seconds on average) with high motion intensity and detailed structured annotations (318 words per video on average).
To assess the dataset's quality, a dedicated benchmark suite, MiraBench, was created, consisting of 17 metrics that evaluate temporal consistency, motion in the frame, video quality, and other parameters. By these metrics, MiraData outperforms other well-known open datasets, which mainly consist of short videos with inconsistent quality and brief descriptions.
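A sketch of the kind of filtering such structured annotations enable when assembling training data; the record fields and thresholds below are illustrative assumptions, not MiraData's actual schema:

```python
# Hypothetical annotation records; field names are assumptions,
# not MiraData's actual schema.
clips = [
    {"duration_s": 75.0, "caption": "a drone flies over a coastline " * 12},
    {"duration_s": 6.0,  "caption": "a cat"},
    {"duration_s": 68.0, "caption": "a chef chops vegetables quickly " * 10},
]

# Keep long clips with detailed captions, in the spirit of MiraData's
# long-video / long-caption design (thresholds are illustrative).
selected = [c for c in clips
            if c["duration_s"] >= 60 and len(c["caption"].split()) >= 50]
print(len(selected))  # 2
```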
#Text2Video #Dataset #ML
https://news.1rj.ru/str/DataScienceT
Please open Telegram to view this post
VIEW IN TELEGRAM
Please open Telegram to view this post
VIEW IN TELEGRAM
👍2❤1
MambaVision is an implementation of the Mamba architecture from Nvidia Lab, using Selective State Space Models (SSM) for image processing.
MambaVision demonstrates more efficient use of computing resources compared to traditional transformer-based architectures (ViT and Swin), and the use of SSM opens up new ways of extracting and processing visual features. The proposed architecture shows good scalability, maintaining efficiency as the model size increases.
MambaVision is applicable to a variety of computer vision tasks, including image classification and semantic segmentation.
The project is in its early stages and its effectiveness on real-world computer vision tasks has yet to be fully assessed.
At the moment, it has only been used in the image classification task.
MambaVision-T (32M)
MambaVision-T2 (35M)
MambaVision-S (50M)
MambaVision-B (98M)
MambaVision-L (228M)
MambaVision-L2 (241M)
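The selective-scan recurrence at the heart of Mamba-style SSMs can be sketched in a toy 1-D form, where the state-transition and input gates depend on the input itself (illustrative only; weights and gating functions here are assumptions, not Nvidia's implementation):

```python
import numpy as np

def selective_scan(x, seed=0):
    """Toy 1-D selective state-space scan: h_t = a_t * h_{t-1} + b_t * x_t,
    where a_t and b_t depend on the input (the "selective" part).
    A sketch of the recurrence only, not the MambaVision code."""
    rng = np.random.default_rng(seed)
    wa, wb = rng.normal(size=2)
    h, ys = 0.0, []
    for xt in x:
        a = 1.0 / (1.0 + np.exp(-(wa * xt)))   # input-dependent decay
        b = np.tanh(wb * xt)                   # input-dependent gate
        h = a * h + b * xt                     # linear-time state update
        ys.append(h)
    return np.array(ys)

y = selective_scan(np.linspace(-1, 1, 8))
print(y.shape)  # (8,)
```

Because the state update is a linear recurrence, the cost grows linearly with sequence length, which is the source of the efficiency advantage over quadratic attention noted above.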
⚠️ Licensing:
For non-commercial projects: CC-BY-NC-SA-4.0
For commercial use: request via form
#MambaVision #ML
https://news.1rj.ru/str/DataScienceT
Please open Telegram to view this post
VIEW IN TELEGRAM
Please open Telegram to view this post
VIEW IN TELEGRAM
❤2👍2🏆2