Neural networks have achieved great success at various NLP tasks; however, they are limited at handling infrequent patterns. In this article, the problem is described in the context of the machine translation task.
The authors note that NMT is good at learning translation pairs that are frequently observed, but the system may ‘forget’ to use low-frequency pairs when it should. In contrast, in traditional rule-based systems, low-frequency pairs cannot be smoothed out, no matter how rare they are. One solution is to combine both approaches.
The authors propose to use a large external memory along with a selection mechanism that allows the NN to use this memory. The selection fetches all relevant translation variants using the words of the source sentence, and then an attention mechanism chooses among these variants. After that, the neural net decides which source of prediction should be used at each translation step.
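As an illustration of that last decision (my own sketch, not the authors' exact architecture), it can be modeled as a learned gate that mixes the decoder's word distribution with the distribution induced by the memory:

```python
import torch
import torch.nn as nn

class GatedMemoryOutput(nn.Module):
    """Mix decoder predictions with external-memory predictions via a gate.

    Illustration only: `p_model` and `p_memory` are word distributions over
    the same vocabulary; the gate decides, per step, which source of
    prediction to trust.
    """
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(hidden_dim, 1), nn.Sigmoid())

    def forward(self, decoder_state, p_model, p_memory):
        g = self.gate(decoder_state)          # (batch, 1), in [0, 1]
        return g * p_model + (1 - g) * p_memory

mix = GatedMemoryOutput(hidden_dim=8)
state = torch.randn(2, 8)
p_model = torch.softmax(torch.randn(2, 10), dim=-1)
p_memory = torch.softmax(torch.randn(2, 10), dim=-1)
out = mix(state, p_model, p_memory)
# a convex combination of two distributions is still a distribution
assert torch.allclose(out.sum(dim=-1), torch.ones(2), atol=1e-5)
```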
The important thing is that the vectors for the external memory were trained separately. That basically means that we can build knowledge bases for neural nets. It seems like a promising way to construct really large-scale models with huge capacity.
Wrote an article on Medium about fitting fastText into Colab.
Tl;dr: original binary fastText is too large for Colab.
We can shrink it, but it is a little tricky for the n-gram matrix: we need to consider the uniformity of the collision distribution.
The final model takes 2 GB of RAM instead of 16 GB and is 94% similar to the original model.
Code is also provided.
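The re-hashing idea for the n-gram matrix can be sketched as follows (a simplified illustration with made-up names, not the exact code from the article): rows that collide in the smaller table are averaged, and the hashing scheme should spread collisions uniformly over the new buckets:

```python
import numpy as np

def shrink_ngram_matrix(ngram_matrix: np.ndarray, new_size: int) -> np.ndarray:
    """Re-hash rows of the n-gram matrix into a smaller number of buckets.

    Rows that collide in the smaller table are averaged. For the shrunk
    model to stay close to the original, old bucket ids should map
    uniformly onto the new buckets.
    """
    old_size, dim = ngram_matrix.shape
    new_matrix = np.zeros((new_size, dim), dtype=ngram_matrix.dtype)
    counts = np.zeros(new_size, dtype=np.int64)
    for old_id in range(old_size):
        new_id = old_id % new_size  # simple re-hashing scheme
        new_matrix[new_id] += ngram_matrix[old_id]
        counts[new_id] += 1
    # average the collided rows; avoid division by zero for empty buckets
    new_matrix /= np.maximum(counts, 1)[:, None].astype(new_matrix.dtype)
    return new_matrix

shrunk = shrink_ngram_matrix(np.random.rand(16, 4).astype(np.float32), 4)
assert shrunk.shape == (4, 4)
```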
Have finished building a demo and landing page for my project on mention classification. The idea of this project is to create a model that can assign labels to objects based on their mentions in context. Right now it works only for mentions of people, but if there is interest in this work, I will extend the model to other types, like organizations or events. For now, you can check out the online demo of the neural network.
The current implementation can take several mentions into account at a time, so it can distinguish the relevant parts of the context rather than just averaging predictions.
It's also open-sourced and built with the AllenNLP framework, from training to serving. Take a look at it.
More technical details of the implementation are coming later.
The current implementation can take account of several mentions at a time, so it can distinguish relevant parts of the context, not just averaging prediction.
It's also open sourced, and built with AllenNLP framework from training to serving. Take a look at it.
More technical details of implementation coming later.
Partially trainable embeddings
Understanding the meaning of natural language requires a huge amount of information to be arranged by a neural network.
And the largest part of this information is usually stored in word embeddings.
Typically, labeled data from a particular task is not enough to train so many parameters. Thus, word embeddings are trained separately on large general-purpose corpora.
But there are some cases when we want to be able to train word embeddings in our custom task, for example:
- We have a specific domain with a non-standard terminology or sentence structure
- We want to use additional markup like <tags> in our task
In these cases, we need to update a small number of weights responsible for the new words and meanings. At the same time, we can't update the pre-trained embeddings, because that will lead to very quick overfitting.
To deal with this problem, partially trainable embeddings were used in this project.
The idea is to concatenate fixed pre-trained embeddings with additional small trainable embeddings. It is also useful to add a linear layer right after the concatenation so the embeddings can interact during training.
Changing the size of the additional embedding gives control over the number of parameters and, as a result, helps prevent overfitting.
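A minimal PyTorch sketch of this idea (the class name and dimensions are illustrative):

```python
import torch
import torch.nn as nn

class PartiallyTrainableEmbedding(nn.Module):
    """Concatenate frozen pre-trained embeddings with a small trainable part.

    `pretrained` is a (vocab_size, pretrained_dim) tensor of pre-trained
    vectors; `extra_dim` controls the number of trainable parameters.
    """
    def __init__(self, pretrained: torch.Tensor, extra_dim: int, out_dim: int):
        super().__init__()
        vocab_size, pretrained_dim = pretrained.shape
        self.fixed = nn.Embedding.from_pretrained(pretrained, freeze=True)
        self.extra = nn.Embedding(vocab_size, extra_dim)  # trainable
        # the linear layer lets the two parts interact during training
        self.project = nn.Linear(pretrained_dim + extra_dim, out_dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        combined = torch.cat([self.fixed(token_ids), self.extra(token_ids)], dim=-1)
        return self.project(combined)

emb = PartiallyTrainableEmbedding(torch.randn(100, 50), extra_dim=20, out_dim=64)
out = emb(torch.tensor([[1, 2, 3]]))
assert out.shape == (1, 3, 64)
```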
Another good thing is that AllenNLP allows implementing this technique without a single line of code, with just a simple configuration:
{
  "token_embedders": {
    "tokens-ngram": {
      "type": "fasttext-embedder",
      "model_path": "./data/fasttext_embedding.model",
      "trainable": false
    },
    "tokens": {
      "type": "embedding",
      "embedding_dim": 20,
      "trainable": true
    }
  }
}
Filterable approximate nearest neighbors search
I did a little research on how to search in a vector space when you also need to take additional restrictions into account: search in a subset, filtering by a numerical criterion, or geo constraints.
The article turned out to be too large for the Telegram channel format, so I'll leave only the essence here.
The full article is available on my updated blog.
The main point is that with minor modifications of the state-of-the-art HNSW algorithm, we can cover a variety of filtering cases.
The modification is to add edges to the navigation graph to ensure that it stays connected after some part of its nodes is filtered out.
Looking at filtering by category, we can see that adding edges within particular small categories solves the connectivity problem for them.
And large categories sustain their connectivity due to the laws of percolation theory.
Filtering by categories can be relatively easily extended to numerical range filtering and a geospatial index.
In the full version of this article, I also present a couple of experiments to validate this approach.
It also contains some considerations on how to avoid possible failures.
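The connectivity argument can be illustrated on a toy graph (a pure-Python sketch, unrelated to the actual HNSW code): filtering a path graph down to a small category disconnects it, and one extra intra-category edge repairs it.

```python
from collections import deque

def is_connected(nodes, edges):
    """BFS connectivity check on the subgraph induced by `nodes`."""
    nodes = set(nodes)
    if not nodes:
        return True
    adj = {n: set() for n in nodes}
    for a, b in edges:
        if a in nodes and b in nodes:
            adj[a].add(b)
            adj[b].add(a)
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        for neighbor in adj[queue.popleft()]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen == nodes

# A toy navigation graph: a path 0-1-2-3-4-5
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]

# Points 1 and 4 form a small category; filtering out the rest
# disconnects them - greedy search cannot travel between them.
assert not is_connected({1, 4}, edges)

# An extra edge inside the category restores connectivity.
assert is_connected({1, 4}, edges + [(1, 4)])
```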
Take a look at it!
Recently I found an interesting repository on GitHub.
Actually, it is not a single repository but a whole project created by CAIR, a research center at the University of Agder.
It includes a bunch of articles and different implementations of a novel concept called Tsetlin Machine.
The author claims that this approach can replace neural networks while being faster and more accurate.
The work itself looks quite marginal: it is not recent but has not become widely used.
It is noticeable that it is alive only thanks to the enthusiasm of a few people.
From public sources, I found only an overselling press release from their own university and a skeptical thread on Reddit. As rightly noted in the latter, there are quite a few red flags and imperfections in this work, including excessive self-citation, unconvincing MNIST experiments, and a poorly written article that is difficult to read.
However, I still decided to spend a little time reading about the concept - using finite automaton states with linear tactics as the trainable parameters of the model.
The states define whether a signal is included in a logical clause or not.
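A single two-action automaton with linear tactics can be sketched as follows (an illustration based on the standard Tsetlin automaton definition, not CAIR's code): a reward pushes the state deeper into the current action, a penalty pushes it toward the opposite one.

```python
class TsetlinAutomaton:
    """A two-action automaton with linear tactics (a sketch).

    States 1..n mean action "exclude", states n+1..2n mean "include".
    """
    def __init__(self, n: int):
        self.n = n
        self.state = n  # start at the boundary, on the "exclude" side

    def action(self) -> str:
        return "include" if self.state > self.n else "exclude"

    def reward(self):
        # move deeper into the current action, clamped at the ends
        if self.action() == "include":
            self.state = min(self.state + 1, 2 * self.n)
        else:
            self.state = max(self.state - 1, 1)

    def penalty(self):
        # move toward the opposite action
        if self.action() == "include":
            self.state -= 1
        else:
            self.state += 1

a = TsetlinAutomaton(n=3)
assert a.action() == "exclude"
a.penalty()  # pushed across the boundary
assert a.action() == "include"
```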
The model is trained with two types of feedback: the first fights false-negative activations of a clause, and the second fights false-positive ones.
The author shows benchmarks of the model on a couple of different tasks but pays little attention to the main problem - no method is provided to make the Tsetlin Machine truly deep.
Instead, he suggests training it layer by layer, the way Hinton trained Deep Belief Networks.
This restriction won't let the Tsetlin Machine compete with neural networks in any area.
On the other hand, there are no theoretical limitations that would prevent a discrete feedback propagation mechanism from existing.
I am going to conduct some experiments with this concept and will keep you posted if something works out.
Tools for setting up a new ML project.
Compiled a list of tools I find worth a try if you are going to set up a new ML project.
This list is not intended to be an exhaustive overview, and it does not include any ML frameworks or libraries.
It is focused on auxiliary tools that can make development easier and experiments reproducible.
Some of these tools I have used in real projects; others I just tried on a toy example but found them interesting for future use.
Filterable HNSW - part 2
In the previous article on filtering when searching for nearest neighbors, we discussed the theoretical background.
This time I am going to present a C++ implementation with Python bindings.
As a base implementation of HNSW, I took hnswlib, a stand-alone header-only implementation of HNSW.
With the new implementation, it is now possible to assign an arbitrary number of tags to any point with simple code:
# ids - list of point ids
# tag - tag id
hnsw.add_tags(ids, tag)
A group of points under the same tag can be searched separately from the others:
query_vector = ...
tag_to_search_in = 42
# Search among points with this tag
condition = [[(False, tag_to_search_in)]]
labels, dist = hnsw.knn_query(query_vector, k=10, conditions=condition)
These groups can also be combined using boolean expressions. For example, (A | !B) & C is represented as [[(0, A), (1, B)], [(0, C)]], where A, B, C are logical clauses that hold if the respective tag is assigned to a point.
If the group is large enough (>> 1/M fraction of all points), knn_query should work fine. But if the group is smaller, additional connections may need to be built in the HNSW graph for it:
hnsw.index_tagged(tag=42, m=8)
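To make the encoding concrete, here is a pure-Python sketch of how such a condition list could be evaluated against the tag set of a point (an illustration of the semantics, not the actual C++ code):

```python
def check_condition(condition, point_tags):
    """Evaluate a CNF-like condition: a list of OR-clauses that must all hold.

    Each clause is a list of (negated, tag) pairs; a pair holds when the
    tag's presence in `point_tags` disagrees with the `negated` flag.
    """
    return all(
        any((tag in point_tags) != negated for negated, tag in clause)
        for clause in condition
    )

# (A | !B) & C  for tags A=1, B=2, C=3
condition = [[(False, 1), (True, 2)], [(False, 3)]]
assert check_condition(condition, {1, 3})      # A and C present
assert check_condition(condition, {3})         # B absent, C present
assert not check_condition(condition, {2, 3})  # B present, A absent
assert not check_condition(condition, {1})     # C missing
```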
Based on HNSW with categorical filtering, it is possible to build a tool that can search in a specified geo-region only.
Find a full version of this article with more examples and explanations in my blog.
ONNX and deployment libraries
Libraries like AllenNLP are great for model training and prototyping; they contain functions and helpers for almost any practical and theoretical task.
Some of these libraries even have functions for model serving, but they still might be a poor choice for serving a model in production.
The very same functionality that makes them convenient for development makes them hard to support in a production environment.
A Docker image with only AllenNLP installed takes up a whole 1.9 GB compressed! It could hardly be called a micro-service.
In TensorFlow, this problem was solved by saving computational graphs in a special serialization format, independent of the training and preprocessing libraries.
This serialized view can later be served by the TensorFlow Serving service. A good solution, but not a universal one - there are plenty of frameworks, like PyTorch, that do not follow Google's standard.
Now, this is the part where ONNX appears - an open standard for NN representation.
It defines a common set of operators - the building blocks of machine learning and deep learning models.
Not every valid Python-PyTorch model can be converted into an ONNX representation: only a subset of operations is also valid for ONNX.
Unfortunately, the default implementation of most AllenNLP models does not fit this subset:
- An AllenNLP model handles a vast variety of corner cases - conditions that are essentially Python functions. ONNX does not support arbitrary code execution; an ONNX model should consist of the computation graph only
- AllenNLP models take care of text preprocessing: they operate with dictionaries and tokenization. ONNX does not support these operations either
Luckily, in most cases AllenNLP models can be used as just a wrapper for the actual model implementation.
For this, you need an AllenNLP model which handles the loss function, does the preprocessing, and interacts with the model trainer.
And also an internal class for the "pure" model, which implements the standard nn.Module interface. It should use tensors as input and output.
Internally, it should construct a persistent computational graph.
This internal model can now be converted into an ONNX model and saved independently.
Having ONNX you can use whatever instrument you need to serve or explore your model.
Forwarded from Spark in me (Alexander)
Silero Speech-To-Text Models V1 Released
We are proud to announce that we have released our high-quality (i.e. on par with premium Google models) speech-to-text models for the following languages:
- English
- German
- Spanish
Why this is a big deal:
- STT Research is typically focused on huge compute budgets
- Pre-trained models and recipes did not generalize well, were difficult to use even as-is, and relied on obsolete tech
- Until now, the STT community lacked easy-to-use, high-quality, production-grade STT models
How we solve it:
- We publish a set of pre-trained high-quality models for popular languages
- Our models are embarrassingly easy to use
- Our models are fast and can be run on commodity hardware
Even if you do not work with STT, please give us a star / share!
Links
- https://github.com/snakers4/silero-models
🔲 Qdrant - vector search engine
Since my last post about filterable HNSW, I have been working on a new search engine to give this idea a proper implementation.
And I have finally published an alpha version of the engine, called Qdrant.
Development is still at an early stage, but it already provides ElasticSearch-like conditions must, should and must_not, which you can combine to represent an arbitrary condition.
Use-cases
You might need Qdrant in cases when a vector alone cannot fully represent a sought object.
For example, a neural network might model the visual appearance of a piece of clothing but can hardly account for its stock availability.
With Qdrant, you can assign this feature as a payload and use it for filtering.
Among the possible applications:
- Semantic search with facets
- Semantic search on map
- Matching engines - e.g. Candidates and job positions
- Personal recommendations
Technical highlights
Qdrant is written in Rust, a language specially designed for systems programming - the building of services that are used by other services.
Rust is comparable in speed with C but also protects from data races, which is crucial for database applications.
Push the crab 🦀 if you are interested in more Rust-specific details of the project.
The engine uses write-ahead logging: once it has confirmed an update, it won't lose data even in case of a power shutdown.
You can already try it with Docker image:
docker pull generall/qdrant
A simple search request could look like this:
POST /test_collection/points/search
{
  "filter": {
    "should": [
      {
        "match": {
          "key": "city",
          "keyword": "London"
        }
      }
    ]
  },
  "vector": [0.2, 0.1, 0.9, 0.7],
  "top": 3
}
All APIs are documented with OpenAPI 3.0, which provides an easy way to generate a client for any programming language.
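For illustration, the same request can be built from Python (the helper below is my own sketch; it assumes a local instance with the collection already created and filled):

```python
def search_request(vector, city, top=3):
    """Build the search payload from the raw HTTP example above."""
    return {
        "filter": {
            "should": [
                {"match": {"key": "city", "keyword": city}}
            ]
        },
        "vector": vector,
        "top": top,
    }

payload = search_request([0.2, 0.1, 0.9, 0.7], "London")

# The payload can then be sent to the engine, e.g. with the `requests` library:
# requests.post("http://localhost:6333/test_collection/points/search", json=payload)
```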
I would highly appreciate any feedback on the project, and I will be grateful if you give it a star on GitHub.
❤2
For those who reacted with 🦀 on a previous post: I wrote a Twitter thread on how I am building Qdrant with Rust. It is on Twitter because development is still in progress, and I would like to share some interesting details without a dedicated blog post.
Some topics of the thread:
- How is Qdrant useful?
- How does it store data and build indexes?
- How to keep data always available for search?
- How do I auto-generate documentation in Rust?
Your comments are welcome here and on Twitter!
Metric Learning Tips & Tricks
Hi everyone, I don't often post on this channel.
But today, I want to share some insights about what I'm working on.
Over the last year, I have been working on a job matching system, and in the process I have solved some interesting problems related to metric (similarity) learning.
I decided to collect all the interesting solutions into an article.
Here are some highlights:
- We have embeddings for professions and you can play with them online
- There is a way to train a model without labeled data, but it requires some tricks
- Hard Negative Mining does not work, but you can increase batch size instead
- It is possible to estimate embedding confidence
- We can micro-manage the model without re-training: introducing neural rules
- How we deploy metric learning in production. Spoiler: with Qdrant
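To illustrate the batch-size point: with in-batch negatives, every other item in the batch serves as a negative, so a larger batch directly yields more negatives per update. A generic sketch of such a loss (not the article's exact code):

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(anchors, positives, temperature=0.05):
    """Cross-entropy over cosine similarities; off-diagonal pairs act as negatives.

    With batch size B, each anchor gets B - 1 negatives for free, which is
    why increasing the batch size can substitute for hard negative mining.
    """
    anchors = F.normalize(anchors, dim=-1)
    positives = F.normalize(positives, dim=-1)
    logits = anchors @ positives.T / temperature  # (B, B) similarity matrix
    targets = torch.arange(anchors.size(0))       # matching pairs on the diagonal
    return F.cross_entropy(logits, targets)

loss = in_batch_contrastive_loss(torch.randn(8, 16), torch.randn(8, 16))
```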
Neural Search Step-by-Step
We made a tutorial on Semantic Embeddings and Neural Search. With this guide, you will build your own semantic search service from scratch.
You won't need any complicated training of the neural network. Moreover, you can do all preparation steps in the Google Colab notebook.
The tutorial includes:
- What is Neural Search?
- Getting embeddings from BERT Encoder
- Using vector search engine Qdrant
- Creating an API server with FastAPI.
If you want to learn how to build projects like this, the tutorial is for you.
Hello everyone,
The largest Russian-speaking data science community, ODS.ai, organizes a Summer School designed after the famous Google SoC, and I am participating in it as a mentor for the Metric Learning track with Qdrant.
During the track, participants are challenged to research fine-tuning for similarity learning, build a working prototype, and contribute to open source.
Today at 18:00 MSK, there will be a first meetup of the track. I will be talking about:
- Which datasets are suitable for similarity matching, and how you can obtain them in a self-supervised way
- Approaches to fine-tuning encoders, and the selection of a method depending on the amount of available annotation
I invite everyone interested to the ODS Spatial Chat at 18:00 MSK. Password: odssummerofcodeison.
Language of the event: Russian.
Materials will be available in English later on the channel.
ODS.ai Summer of Code results
Hi everyone, ODS SoC officially finished last week, and it is time to present the results.
First of all, the winner of the Metric Learning track, Tatiana Grechishcheva, has published a detailed article on her work on fine-tuning and deploying metric learning models.
She fine-tuned a ViT model for matching similar clothing and put together a detailed tutorial on how you can deploy such a model to production.
An online demo is also included!
There are also some exciting results on fine-tuning transformers with different types of head layers.
In a nutshell, the result is that a couple of hundred examples are enough to improve the similarity matching result without overfitting.
I will make a separate post about it and about further plans on making metric learning practical.
Hi!
Check out the first-hand experience of building neural search solutions with Qdrant from one of its users in the latest Vector Podcast.
Tom Lackner - VP Engineering - Classic.com - on Qdrant, NFT, challenges and joys of ML engineering
Topics:
00:00 - Intro
00:53 - Tom's background in IT engineering management over 20 years
01:21 - What's Classic and what kind of cars can one get there?
03:17 - How does the search flow look on the site? Transition from Elasticsearch/Postgres to embeddings…
Awesome Metric Learning
The Metric Learning approach to data science problems is heavily underutilized. There is a lot of academic research around it, but far fewer practical guides and tutorials.
So we decided that we could help people adopt metric learning by collecting related materials in one place.
We are publishing a curated list of awesome practical metric learning tools, libraries, and materials - https://github.com/qdrant/awesome-metric-learning!
This collection aims to put together references to all relevant materials for building your application using Metric Learning.
If you know an exciting article, helpful tool, or blog post that helped you apply metric learning - feel free to open a PR with your proposal!
Triplet loss - Advanced Intro
Loss functions in metric learning all chase the same goal - to pull positive pairs closer together and push negative ones further apart.
But the way they achieve this leads to different results and different side effects.
In today's post, we describe the differences between Triplet and Contrastive loss and why using Triplet loss can give an advantage, especially in the context of fine-tuning.
It also covers an approach to an efficient implementation of batch-all triplet mining.
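The post itself has the full details; as a rough illustration of the batch-all idea (a sketch, not necessarily the post's exact implementation), all valid (anchor, positive, negative) triplets in a batch can be enumerated at once with broadcasting:

```python
import numpy as np

def pairwise_sq_dists(emb: np.ndarray) -> np.ndarray:
    """Squared Euclidean distances between all rows of emb."""
    sq = np.sum(emb ** 2, axis=1)
    d = sq[:, None] - 2.0 * emb @ emb.T + sq[None, :]
    return np.maximum(d, 0.0)  # guard against tiny negative values from rounding

def batch_all_triplet_loss(emb: np.ndarray, labels: np.ndarray,
                           margin: float = 0.5) -> float:
    """Hinge loss averaged over the *active* (non-zero) triplets in the batch."""
    d = pairwise_sq_dists(emb)
    same = labels[:, None] == labels[None, :]
    pos_mask = same & ~np.eye(len(labels), dtype=bool)  # same label, not the anchor itself
    neg_mask = ~same
    # loss[a, p, n] = d(a, p) - d(a, n) + margin, for every index combination
    loss = d[:, :, None] - d[:, None, :] + margin
    valid = pos_mask[:, :, None] & neg_mask[:, None, :]
    loss = np.maximum(loss * valid, 0.0)
    n_active = np.count_nonzero(loss)
    return float(loss.sum() / max(n_active, 1))
```

Averaging over only the active triplets is one common variant; averaging over all valid triplets is another reasonable choice.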
Metric Learning for Anomaly Detection
Anomaly detection is one of those tasks to which it is challenging to apply classical ML methods directly.
The imbalance between normal and abnormal examples and the internal inconsistency of anomalies make classifier training challenging.
And the difficulty often comes down to data labeling, which in the case of anomalies may not be trivial.
The metric learning approach avoids the explicit separation into classes while combining the advantage of modeling the subject domain with the knowledge of specific anomalous examples.
In our case study, we are solving the problem of estimating the quality of coffee beans ☕️🌱 and determining the type of defects.
We trained an Auto-Encoder on unlabeled samples and fine-tuned it on a small fraction of labeled ones.
This approach achieves results equivalent to conventional classification but requires orders of magnitude less labeled data.
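The case study has its own details, but the core idea of scoring anomalies in an embedding space can be sketched in a few lines. The centroid-distance score below is an illustrative simplification (not necessarily the exact method from the post), and it assumes the embeddings, e.g. from the fine-tuned Auto-Encoder, are already computed:

```python
import numpy as np

def fit_normal_centroid(normal_emb: np.ndarray) -> np.ndarray:
    """Centroid of the embeddings of known-normal samples."""
    return normal_emb.mean(axis=0)

def anomaly_scores(emb: np.ndarray, centroid: np.ndarray) -> np.ndarray:
    """Distance to the normal centroid: larger means more anomalous."""
    return np.linalg.norm(emb - centroid, axis=1)

# Usage: score new samples and flag those above a threshold chosen on a validation set.
centroid = fit_normal_centroid(np.array([[0.0, 0.0], [0.0, 2.0]]))
scores = anomaly_scores(np.array([[0.0, 1.0], [10.0, 1.0]]), centroid)
```

The point of metric learning here is that fine-tuning shapes the space so that this kind of simple distance-based scoring works, rather than training a full classifier on scarce labels.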
Metric Learning for Anomaly Detection - Qdrant
Practical use of metric learning for anomaly detection. A way to match the results of a classification-based approach with only ~0.6% of the labeled data.
Similarity Learning lacks a framework. So we built one.
Many general-purpose frameworks allow you to train Computer Vision or NLP tasks quickly. However, Similarity Learning has peculiarities, which usually require an additional layer of complexity on top of the usual pipelines.
For example, batch size plays a much greater role in training similarity models than in other models. Labels either do not exist or are handled in a completely different way. In many cases, the model is already pre-trained, which also changes the process.
Developing similarity learning models one after another, we began to notice patterns that helped us generalize all our experience with training and fine-tuning such models into one package.
Yesterday we published Quaterion — an open-source, blazing-fast, customizable, scalable framework for training and fine-tuning similarity learning models.
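A quick back-of-the-envelope illustration of why batch size matters so much in similarity learning: in a class-balanced batch, the number of valid (anchor, positive, negative) triplets, and therefore the amount of training signal per step, grows roughly cubically with batch size. The helper below is a hypothetical counting function for this illustration, not part of Quaterion's API:

```python
def n_valid_triplets(batch_size: int, n_classes: int) -> int:
    """Count valid (anchor, positive, negative) triplets in a balanced batch.

    Assumes batch_size is divisible by n_classes, i.e. k = batch_size // n_classes
    samples per class.
    """
    k = batch_size // n_classes
    anchors = batch_size        # every sample can serve as an anchor
    positives = k - 1           # same class, excluding the anchor itself
    negatives = batch_size - k  # all samples from the other classes
    return anchors * positives * negatives

for bs in (8, 32, 128):
    print(bs, n_valid_triplets(bs, n_classes=4))
```

Going from a batch of 8 to a batch of 128 multiplies the triplet count by several thousand, which is one reason similarity pipelines need batch handling that general-purpose training frameworks don't prioritize.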