Exploring Transfer Learning with T5: the Text-To-Text Transfer Transformer
tl;dr:
- the largest model has 11 billion parameters;
- encoder-decoder models generally outperformed “decoder-only” language models;
- fill-in-the-blank-style denoising objectives worked best;
- the overall computational budget (model size, training time, amount of data) was the most important factor;
- training on in-domain data can be beneficial, but pre-training on smaller datasets can lead to detrimental overfitting;
- multitask learning could be close to competitive with a pre-train-then-fine-tune approach, but requires carefully choosing how often the model is trained on each task.
The model can be fine-tuned on smaller labeled datasets, often resulting in (far) better performance than training on the labeled data alone.
The authors present a large-scale empirical survey to determine which transfer learning techniques work best, and apply these insights at scale to create a new model called T5. They also introduce a new open-source pre-training dataset, the Colossal Clean Crawled Corpus (C4).
The T5 model, pre-trained on C4, achieves SOTA results on many NLP benchmarks while being flexible enough to be fine-tuned to a variety of important downstream tasks.
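To make the text-to-text idea concrete, here is a minimal sketch of running one of the released checkpoints through the Hugging Face transformers API (the original codebase linked below is Mesh TensorFlow; the "t5-small" checkpoint name and the task prefix are just illustrative assumptions):
```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Every task is cast as "text in, text out"; a task prefix tells the model what to do.
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

input_ids = tokenizer.encode(
    "translate English to German: The house is wonderful.", return_tensors="pt"
)
output_ids = model.generate(input_ids, max_length=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```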
blog post: https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html
paper: https://arxiv.org/abs/1910.10683
github (with pre-trained models): https://github.com/google-research/text-to-text-transfer-transformer
colab notebook: https://colab.research.google.com/github/google-research/text-to-text-transfer-transformer/blob/master/notebooks/t5-trivia.ipynb
#nlp #transformer #t5
ODS breakfast in Paris! ☕️ 🇫🇷 See you this Saturday at 10:30 (some people come around 11:00) at Malongo Café, 50 Rue Saint-André des Arts. We are expecting from 4 to 16 people.
🔥AI Meme Generator: This Meme Does Not Exist
Imgflip created an “AI meme generator”. Meme captions are generated by a neural network.
Link: https://imgflip.com/ai-meme
#NLP #NLU #meme #generation #imgflip
#DeepPavlov & #transformers
And now at 🤗 you can also use the following models:
- DeepPavlov/bert-base-bg-cs-pl-ru-cased
- DeepPavlov/bert-base-cased-conversational
- DeepPavlov/bert-base-multilingual-cased-sentence
- DeepPavlov/rubert-base-cased-conversational
- DeepPavlov/rubert-base-cased-sentence
- DeepPavlov/rubert-base-cased
page: https://huggingface.co/DeepPavlov
colab tutorial: here
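For reference, a minimal sketch of loading one of these checkpoints via the standard transformers Auto* classes (the checkpoint choice and the example sentence are only illustrative; any model from the page above works the same way):
```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("DeepPavlov/rubert-base-cased")
model = AutoModel.from_pretrained("DeepPavlov/rubert-base-cased")

# Encode a Russian sentence and take the contextual embeddings from the last layer.
inputs = tokenizer("Привет, мир!", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, hidden_size)
```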
Data Science interview questions list
A list compiled from a Medium article and peer-provided contributions.
Github (questions and answers): https://github.com/alexeygrigorev/data-science-interviews/blob/master/theory.md
#interview #questions #meta
Forwarded from Spark in me (Alexander)
Russian Text Normalization for Speech Recognition
Usually no one talks about this, but STT / TTS technologies involve many "small" tasks that have to be solved to make your STT / TTS pipeline work in real life.
For example:
- Speech recognition / dataset itself;
- Post-processing - beam-search / decoding;
- Domain customizations;
- Normalization (5 => пять);
- De-normalization (пять => 5), as sketched below.
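A toy sketch of what normalization / de-normalization means here (the released repository implements a much more complete system; the dictionaries below are purely illustrative):
```python
# Map between written digits and their spoken Russian forms.
NUM2WORD = {"1": "один", "2": "два", "3": "три", "4": "четыре", "5": "пять"}
WORD2NUM = {word: digit for digit, word in NUM2WORD.items()}

def normalize(text: str) -> str:
    # "5 котов" -> "пять котов": spell numbers out so an STT/TTS model sees only words
    return " ".join(NUM2WORD.get(token, token) for token in text.split())

def denormalize(text: str) -> str:
    # "пять котов" -> "5 котов": restore the written form for display
    return " ".join(WORD2NUM.get(token, token) for token in text.split())

print(normalize("5 котов"))       # пять котов
print(denormalize("пять котов"))  # 5 котов
```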
We want the ImageNet moment to arrive sooner in speech in general.
So we released the Open STT dataset.
This time we have decided to share our text normalization to support STT research in Russian.
Please like / share / repost:
- Original publication
- Habr.com article
- GitHub repository
- Medium (coming soon!)
- Support dataset on Open Collective
#stt
#deep_learning
#nlp
TensorFlow Quantum
A Software Framework for Quantum Machine Learning
The authors introduce TensorFlow Quantum (TFQ), an open-source library for rapid prototyping of hybrid quantum-classical models for classical or quantum data.
TFQ provides the tools necessary for bringing the quantum computing and ML research communities together to control and model natural or artificial quantum systems; e.g. Noisy Intermediate Scale Quantum (NISQ) processors with ~50-100 qubits.
A quantum model has the ability to represent and generalize data with a quantum mechanical origin. However, to understand quantum models, two concepts must be introduced – quantum data and hybrid quantum-classical models.
Quantum data exhibits superposition and entanglement, leading to joint probability distributions that could require an exponential amount of classical computational resources to represent or store. Quantum data, which can be generated/simulated on quantum processors/sensors/networks include the simulation of chemicals and quantum matter, quantum control, quantum communication networks, quantum metrology, and much more.
Quantum models cannot use quantum processors alone – NISQ processors will need to work in concert with classical processors to become effective. As TensorFlow already supports heterogeneous computing across CPUs, GPUs, and TPUs, it is a natural platform for experimenting with hybrid quantum-classical algorithms.
To build and train such a model, the researcher can do the following (sketched in code below):
– prepare a quantum dataset
– evaluate a quantum NN model
– sample or average
– evaluate a classical NN model
– evaluate the cost function
– evaluate gradients & update parameters
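A minimal sketch of that pipeline using the TFQ and Cirq APIs; the single-qubit circuit, the toy labels and the hyperparameters are made up for illustration and are not from the paper:
```python
import cirq
import sympy
import tensorflow as tf
import tensorflow_quantum as tfq

qubit = cirq.GridQubit(0, 0)

# 1. Prepare a (toy) quantum dataset: circuits encoding classical angles.
data_circuits = [cirq.Circuit(cirq.rx(a)(qubit)) for a in (0.0, 1.0)]
quantum_data = tfq.convert_to_tensor(data_circuits)
labels = tf.constant([[1.0], [-1.0]])

# 2.-3. A parameterized quantum circuit (the "quantum NN"); the PQC layer
#       averages the Z readout, i.e. returns its expectation value.
theta = sympy.Symbol("theta")
model_circuit = cirq.Circuit(cirq.ry(theta)(qubit))

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(), dtype=tf.string),
    tfq.layers.PQC(model_circuit, cirq.Z(qubit)),
    tf.keras.layers.Dense(1),          # 4. classical NN post-processing
])

# 5.-6. Cost function, gradients and parameter updates via standard Keras.
model.compile(optimizer=tf.keras.optimizers.Adam(0.05), loss="mse")
model.fit(quantum_data, labels, epochs=10, verbose=0)
```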
blog post: https://ai.googleblog.com/2020/03/announcing-tensorflow-quantum-open.html
paper: https://arxiv.org/abs/2003.02989
#tfq #tensorflow #quantum #physics #ml
Survey of machine-learning experimental methods at NeurIPS2019 and ICLR2020
A good thread about how ML scientists run the experiments in their papers
twitter: https://twitter.com/deliprao/status/1235697595919421440
report: https://hal.archives-ouvertes.fr/hal-02447823/document
#Survey #NeurIPS #ICLR #Experiments #ml
Can evolution be the Master Algorithm?
Fun AutoML-Zero experiments: Evolutionary search discovers fundamental ML algorithms from scratch, e.g., small neural nets with backprop.
Genetic programming learned operations reminiscent of dropout, normalized gradients, and weight averaging when trying to evolve better learning algorithms.
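A toy, hedged sketch of the regularized-evolution loop behind this kind of search (not the paper's code: the primitive ops, register layout and target function are invented for illustration):
```python
import math
import random

OPS = ["add", "sub", "mul", "sin", "cos"]   # toy primitive set

def random_instr():
    # (op, input register a, input register b, output register)
    return (random.choice(OPS), random.randrange(4), random.randrange(4), random.randrange(4))

def random_program(length=6):
    return [random_instr() for _ in range(length)]

def run(program, x):
    regs = [x, 1.0, 0.0, 0.0]               # register 0 holds the input feature
    for op, a, b, out in program:
        if op == "add":   regs[out] = regs[a] + regs[b]
        elif op == "sub": regs[out] = regs[a] - regs[b]
        elif op == "mul": regs[out] = regs[a] * regs[b]
        elif op == "sin": regs[out] = math.sin(regs[a])
        elif op == "cos": regs[out] = math.cos(regs[a])
    return regs[3]                          # register 3 is the prediction

def fitness(program):
    xs = [i / 10 for i in range(-20, 21)]
    return -sum((run(program, x) - (2 * x + 1)) ** 2 for x in xs)   # target: y = 2x + 1

def mutate(program):
    child = list(program)
    child[random.randrange(len(child))] = random_instr()
    return child

# Regularized evolution: tournament-select a parent, add a mutated copy,
# always drop the oldest individual from the population.
population = [random_program() for _ in range(50)]
for _ in range(2000):
    parent = max(random.sample(population, 10), key=fitness)
    population.append(mutate(parent))
    population.pop(0)

print("best fitness:", max(fitness(p) for p in population))
```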
Paper: https://arxiv.org/abs/2003.03384
Code: https://git.io/JvKrZ
#automl #genetic
ODS breakfast in Paris! ☕️ 🇫🇷 See you this Saturday at 10:30 (some people come around 11:00) at Malongo Café, 50 Rue Saint-André des Arts. We are expecting from 6 to 12 coronafearless people.
Forwarded from Karim Iskakov - канал (Vladimir Ivashkin)
New paper by Yandex.MILAB 🎉
Tired of waiting for backprop to project your face into the StyleGAN latent space just to apply some funny vector to it? Just distill this transformation with pix2pixHD!
📝 arxiv.org/abs/2003.03581
👤 @iviazovetskyi, @vlivashkin, @digitman
📉 @loss_function_porn
We ignored lots of news on 👑🦠
What do you think?
Anonymous poll:
- 19% IT’S NEVER ENOUGH
- 47% We need only good stuff
- 34% Please ignore it completely
Transferring Dense Pose to Proximal Animal Classes
An article on how to train DensePose for animals without labels.
The DensePose approach predicts the pose of humans densely and accurately, given a large dataset of poses annotated in detail. It is very expensive to collect DensePose annotations for all the different classes of animals, so the authors show that, at least for proximal animal classes such as chimpanzees, it is possible to transfer the knowledge existing in DensePose for humans. They propose to utilize the existing annotations of humans and do self-training on unlabeled images of animals.
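A schematic sketch of that self-training loop (the teacher pseudo-labels unlabeled chimp images, the student trains on the confident pixels); teacher, student, the data loader and the confidence threshold are placeholders, not the authors' DensePose code:
```python
import torch
import torch.nn.functional as F

def self_train_epoch(teacher, student, chimp_loader, optimizer, confidence_thr=0.9):
    teacher.eval()
    student.train()
    for images in chimp_loader:                      # unlabeled animal images
        with torch.no_grad():
            probs = teacher(images).softmax(dim=1)   # (N, parts, H, W) dense predictions
            conf, pseudo_labels = probs.max(dim=1)   # per-pixel pseudo-labels
        mask = conf > confidence_thr                 # keep only confident pixels
        student_logits = student(images)             # (N, parts, H, W)
        loss = F.cross_entropy(
            student_logits.permute(0, 2, 3, 1)[mask],  # (K, parts)
            pseudo_labels[mask],                       # (K,)
        )
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```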
Link: https://asanakoy.github.io/densepose-evolution/
YouTube: https://youtu.be/OU3Ayg_l4QM
Paper: https://arxiv.org/pdf/2003.00080.pdf
#Facebook #FAIR #CVPR #CVPR2020 #posetransfer #dl
Video: frame-by-frame DensePose predictions on chimps before self-training (teacher, left) and after (student, right); after self-training the 24-class body part segmentation is more accurate and stable.
👑🦠 We are building an ultimate post on coronavirus, with the purpose of gathering all the reliable and informative (not entertaining or just making you worry more) content there is to date.
We just want to make a sane post on coronavirus which will (to the best extent of our efforts) be free of bias and fake/unreliable news, and comply with the following rules:
1 Provided information should be correct, better if it is verifiable.
2 Source should be provided, if applicable. Only trustworthy sources are allowed (WHO, UN, academic institutions).
3 Biases and distributions should be taken into account: raw information is not that representative and can misguide opinions.
4 If applicable, information should be actionable: readers should get a clear picture of what they can do after reading it, not just get upset or worried.
You can submit information for consideration before the release of the post via our @opendatasciencebot, if you believe that it will be helpful to our dear audience and will serve your fellows well.
The post will be shared in the form of a GitHub repo, so contributions are welcome in advance 👹
MaxUp: A Simple Way to Improve Generalization of Neural Network Training
A new approach to augmenting both images and text. The idea is to generate a set of augmented data with random perturbations or transforms and minimize the maximum (worst-case) loss over the augmented data. By doing so, the authors implicitly introduce a smoothness or robustness regularization against the random perturbations, and hence improve generalization performance. Tested on a range of tasks, including image classification, language modeling, and adversarial certification, MaxUp consistently outperforms the existing best baseline methods without introducing substantial computational overhead.
Each sample in the batch is augmented m times, the augmentation with the maximum loss is selected, and backprop runs only through that one, i.e. the maximum loss is minimized (see the sketch below).
There is a proof that MaxUp amounts to gradient-norm regularization when the loss is minimized over the whole batch. It can also be viewed as an adversarial variant of data augmentation, in that it minimizes the worst-case loss on the perturbed data instead of an average loss like typical data augmentation methods.
MaxUp is easy to mix with other augmentations without much overhead: m forward passes per sample, but only one backward pass.
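A hedged sketch of that training step (model, augment and the data are placeholders, not the paper's implementation):
```python
import torch
import torch.nn.functional as F

def maxup_step(model, x, y, augment, optimizer, m=4):
    # m forward passes over randomly augmented copies of the batch.
    losses = []
    for _ in range(m):
        logits = model(augment(x))
        losses.append(F.cross_entropy(logits, y, reduction="none"))   # (B,)
    losses = torch.stack(losses)                 # (m, B)
    worst_case = losses.max(dim=0).values        # per-sample worst-case loss
    loss = worst_case.mean()
    optimizer.zero_grad()
    loss.backward()    # one backward pass; the gradient flows only through the max
    optimizer.step()
    return loss.item()
```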
paper: https://arxiv.org/abs/2002.09024
#augmentations #SOTA #ml
Recurrent Hierarchical Topic-Guided Neural Language Models
The authors propose a recurrent gamma belief network (rGBN) guided neural language modeling framework, a novel method to learn a language model and a deep recurrent topic model simultaneously.
For scalable inference, they develop hybrid SG-MCMC and recurrent autoencoding variational inference, allowing efficient end-to-end training.
Experimental results on real-world corpora demonstrate that the proposed models outperform a variety of shallow-topic-model-guided neural language models, and effectively generate sentences from the designated multi-level topics or noise, while inferring the interpretable hierarchical latent topic structure of the document and hierarchical multiscale structures of sequences.
paper: https://openreview.net/forum?id=Byl1W1rtvH
#ICLR2020 #nlm #nlg
How to generate text: using different decoding methods for language generation with Transformers
by Hugging Face
In this blog post, the author talks about how to generate text and compares approaches like the following (sketched in code below):
– greedy search
– beam search
– top-K sampling
– top-p (nucleus) sampling
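A short sketch of these decoding strategies via transformers' generate(); the GPT-2 checkpoint, prompt and parameter values are just illustrative defaults, not necessarily those used in the blog post:
```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
input_ids = tokenizer.encode("I enjoy walking with my cute dog", return_tensors="pt")

greedy = model.generate(input_ids, max_length=50)                                      # greedy search
beam   = model.generate(input_ids, max_length=50, num_beams=5, early_stopping=True)    # beam search
top_k  = model.generate(input_ids, max_length=50, do_sample=True, top_k=50)            # top-K sampling
top_p  = model.generate(input_ids, max_length=50, do_sample=True, top_k=0, top_p=0.92) # nucleus sampling

for name, out in [("greedy", greedy), ("beam", beam), ("top-k", top_k), ("top-p", top_p)]:
    print(name, ":", tokenizer.decode(out[0], skip_special_tokens=True))
```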
blog post: https://huggingface.co/blog/how-to-generate
#nlp #nlg #transformers
Forwarded from Karim Iskakov - канал (Karim Iskakov)
Representing Scenes as Neural Radiance Fields for View Synthesis. You first feed a set of images to the model and then it can generate photorealistic novel views of the scene conditioning on your viewing direction. Amazing results!
🔎 matthewtancik.com/nerf
📝 arxiv.org/abs/2003.08934
📉 @loss_function_porn
👑🦠
As we promised, we compiled all the interesting and relevant information in one post, so as not to lose focus on DS in our channel. We put special emphasis on what you can do as engineers and active community members:
1 Follow WHO's advice (in the article below, and in any self-respecting source of information you read) to lower your chances of getting infected.
2 Stay inside, switch to remote work if possible.
3 Spread the word about the pandemic, share trustworthy information.
4 Take part in projects: review information, build models, research.
Needless to say, we are open to PRs and corrections. You are most welcome.
Link: https://github.com/open-data-science/ultimate_posts/blob/master/COVID_2019/README.md
P.S. We saw this on TikTok and Twitter: let’s try to keep emojis balanced.
#coronafeerless #covid2019 #ultimatepost