Exploring Transfer Learning with T5: the Text-To-Text Transfer Transformer
tl;dr:
- the largest model has 11 billion parameters;
- encoder-decoder models generally outperformed “decoder-only” language models;
- fill-in-the-blank-style denoising objectives worked best;
- the overall computational budget (model size, training time, amount of data) was the most important factor;
- training on in-domain data can be beneficial, but pre-training on smaller datasets can lead to detrimental overfitting;
- multitask learning could be close to competitive with a pre-train-then-fine-tune approach, but requires carefully choosing how often the model is trained on each task.
The model can be fine-tuned on smaller labeled datasets, often resulting in (far) better performance than training on the labeled data alone.
The authors present a large-scale empirical survey to determine which transfer learning techniques work best, and apply these insights at scale to create a new model called T5. They also introduce a new open-source pre-training dataset, the Colossal Clean Crawled Corpus (C4).
The T5 model, pre-trained on C4, achieves SOTA results on many NLP benchmarks while being flexible enough to be fine-tuned to a variety of important downstream tasks.
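To make the text-to-text idea concrete, here is a minimal sketch of running one of the released checkpoints through the Hugging Face transformers API (the original codebase linked below is Mesh TensorFlow; the "t5-small" checkpoint name and the task prefix are just illustrative assumptions):
```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Every task is cast as "text in, text out"; a task prefix tells the model what to do.
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

input_ids = tokenizer.encode(
    "translate English to German: The house is wonderful.", return_tensors="pt"
)
output_ids = model.generate(input_ids, max_length=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```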
blog post: https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html
paper: https://arxiv.org/abs/1910.10683
github (with pre-trained models): https://github.com/google-research/text-to-text-transfer-transformer
colab notebook: https://colab.research.google.com/github/google-research/text-to-text-transfer-transformer/blob/master/notebooks/t5-trivia.ipynb
#nlp #transformer #t5
ODS breakfast in Paris! ☕️ 🇫🇷 See you this Saturday at 10:30 (some people come around 11:00) at Malongo Café, 50 Rue Saint-André des Arts. We are expecting from 4 to 16 people.
🔥AI Meme Generator: This Meme Does Not Exist
Imgflip created an “AI meme generator”. Meme captions are generated by a neural network.
Link: https://imgflip.com/ai-meme
#NLP #NLU #meme #generation #imgflip
#DeepPavlov & #transformers
And now at 🤗 you can also use the following models:
- DeepPavlov/bert-base-bg-cs-pl-ru-cased
- DeepPavlov/bert-base-cased-conversational
- DeepPavlov/bert-base-multilingual-cased-sentence
- DeepPavlov/rubert-base-cased-conversational
- DeepPavlov/rubert-base-cased-sentence
- DeepPavlov/rubert-base-cased
page: https://huggingface.co/DeepPavlov
colab tutorial: here
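For reference, a minimal sketch of loading one of these checkpoints via the standard transformers Auto* classes (the checkpoint choice and the example sentence are only illustrative; any model from the page above works the same way):
```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("DeepPavlov/rubert-base-cased")
model = AutoModel.from_pretrained("DeepPavlov/rubert-base-cased")

# Encode a Russian sentence and take the contextual embeddings from the last layer.
inputs = tokenizer("Привет, мир!", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, hidden_size)
```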
Data Science interview questions list
A list compiled from a Medium article and peer-provided contributions.
Github (questions and answers): https://github.com/alexeygrigorev/data-science-interviews/blob/master/theory.md
#interview #questions #meta
Forwarded from Spark in me (Alexander)
Russian Text Normalization for Speech Recognition
Usually no one talks about this, but STT / TTS technologies involve many "small" tasks that have to be solved to make your STT / TTS pipeline work in real life.
For example:
- Speech recognition / dataset itself;
- Post-processing - beam-search / decoding;
- Domain customizations;
- Normalization (5 => пять);
- De-normalization (пять => 5), as sketched below.
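A toy sketch of what normalization / de-normalization means here (the released repository implements a much more complete system; the dictionaries below are purely illustrative):
```python
# Map between written digits and their spoken Russian forms.
NUM2WORD = {"1": "один", "2": "два", "3": "три", "4": "четыре", "5": "пять"}
WORD2NUM = {word: digit for digit, word in NUM2WORD.items()}

def normalize(text: str) -> str:
    # "5 котов" -> "пять котов": spell numbers out so an STT/TTS model sees only words
    return " ".join(NUM2WORD.get(token, token) for token in text.split())

def denormalize(text: str) -> str:
    # "пять котов" -> "5 котов": restore the written form for display
    return " ".join(WORD2NUM.get(token, token) for token in text.split())

print(normalize("5 котов"))       # пять котов
print(denormalize("пять котов"))  # 5 котов
```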
We want the ImageNet moment to arrive sooner in speech in general.
So we released the Open STT dataset.
This time we have decided to share our text normalization to support STT research in Russian.
Please like / share / repost:
- Original publication
- Habr.com article
- GitHub repository
- Medium (coming soon!)
- Support dataset on Open Collective
#stt
#deep_learning
#nlp
TensorFlow Quantum
A Software Framework for Quantum Machine Learning
The authors introduce TensorFlow Quantum (TFQ), an open-source library for rapid prototyping of hybrid quantum-classical models for classical or quantum data.
TFQ provides the tools necessary for bringing the quantum computing and ML research communities together to control and model natural or artificial quantum systems; e.g. Noisy Intermediate Scale Quantum (NISQ) processors with ~50-100 qubits.
A quantum model has the ability to represent and generalize data with a quantum mechanical origin. However, to understand quantum models, two concepts must be introduced – quantum data and hybrid quantum-classical models.
Quantum data exhibits superposition and entanglement, leading to joint probability distributions that could require an exponential amount of classical computational resources to represent or store. Quantum data, which can be generated/simulated on quantum processors/sensors/networks include the simulation of chemicals and quantum matter, quantum control, quantum communication networks, quantum metrology, and much more.
Quantum models cannot use quantum processors alone – NISQ processors will need to work in concert with classical processors to become effective. As TensorFlow already supports heterogeneous computing across CPUs, GPUs, and TPUs, it is a natural platform for experimenting with hybrid quantum-classical algorithms.
To build and train such a model, the researcher can do the following (sketched in code below):
– prepare a quantum dataset
– evaluate a quantum NN model
– sample or average
– evaluate a classical NN model
– evaluate the cost function
– evaluate gradients & update parameters
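A minimal sketch of that pipeline using the TFQ and Cirq APIs; the single-qubit circuit, the toy labels and the hyperparameters are made up for illustration and are not from the paper:
```python
import cirq
import sympy
import tensorflow as tf
import tensorflow_quantum as tfq

qubit = cirq.GridQubit(0, 0)

# 1. Prepare a (toy) quantum dataset: circuits encoding classical angles.
data_circuits = [cirq.Circuit(cirq.rx(a)(qubit)) for a in (0.0, 1.0)]
quantum_data = tfq.convert_to_tensor(data_circuits)
labels = tf.constant([[1.0], [-1.0]])

# 2.-3. A parameterized quantum circuit (the "quantum NN"); the PQC layer
#       averages the Z readout, i.e. returns its expectation value.
theta = sympy.Symbol("theta")
model_circuit = cirq.Circuit(cirq.ry(theta)(qubit))

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(), dtype=tf.string),
    tfq.layers.PQC(model_circuit, cirq.Z(qubit)),
    tf.keras.layers.Dense(1),          # 4. classical NN post-processing
])

# 5.-6. Cost function, gradients and parameter updates via standard Keras.
model.compile(optimizer=tf.keras.optimizers.Adam(0.05), loss="mse")
model.fit(quantum_data, labels, epochs=10, verbose=0)
```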
blog post: https://ai.googleblog.com/2020/03/announcing-tensorflow-quantum-open.html
paper: https://arxiv.org/abs/2003.02989
#tfq #tensorflow #quantum #physics #ml
Survey of machine-learning experimental methods at NeurIPS2019 and ICLR2020
A good thread about how ML scientists run the experiments in their papers
twitter: https://twitter.com/deliprao/status/1235697595919421440
report: https://hal.archives-ouvertes.fr/hal-02447823/document
#Survey #NeurIPS #ICLR #Experiments #ml
Can evolution be the Master Algorithm?
Fun AutoML-Zero experiments: Evolutionary search discovers fundamental ML algorithms from scratch, e.g., small neural nets with backprop.
Genetic programming learned operations reminiscent of dropout, normalized gradients, and weight averaging when trying to evolve better learning algorithms.
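A toy, hedged sketch of the regularized-evolution loop behind this kind of search (not the paper's code: the primitive ops, register layout and target function are invented for illustration):
```python
import math
import random

OPS = ["add", "sub", "mul", "sin", "cos"]   # toy primitive set

def random_instr():
    # (op, input register a, input register b, output register)
    return (random.choice(OPS), random.randrange(4), random.randrange(4), random.randrange(4))

def random_program(length=6):
    return [random_instr() for _ in range(length)]

def run(program, x):
    regs = [x, 1.0, 0.0, 0.0]               # register 0 holds the input feature
    for op, a, b, out in program:
        if op == "add":   regs[out] = regs[a] + regs[b]
        elif op == "sub": regs[out] = regs[a] - regs[b]
        elif op == "mul": regs[out] = regs[a] * regs[b]
        elif op == "sin": regs[out] = math.sin(regs[a])
        elif op == "cos": regs[out] = math.cos(regs[a])
    return regs[3]                          # register 3 is the prediction

def fitness(program):
    xs = [i / 10 for i in range(-20, 21)]
    return -sum((run(program, x) - (2 * x + 1)) ** 2 for x in xs)   # target: y = 2x + 1

def mutate(program):
    child = list(program)
    child[random.randrange(len(child))] = random_instr()
    return child

# Regularized evolution: tournament-select a parent, add a mutated copy,
# always drop the oldest individual from the population.
population = [random_program() for _ in range(50)]
for _ in range(2000):
    parent = max(random.sample(population, 10), key=fitness)
    population.append(mutate(parent))
    population.pop(0)

print("best fitness:", max(fitness(p) for p in population))
```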
Paper: https://arxiv.org/abs/2003.03384
Code: https://git.io/JvKrZ
#automl #genetic
ODS breakfast in Paris! ☕️ 🇫🇷 See you this Saturday at 10:30 (some people come around 11:00) at Malongo Café, 50 Rue Saint-André des Arts. We are expecting from 6 to 12 coronafearless people.
Forwarded from Karim Iskakov - канал (Vladimir Ivashkin)
New paper by Yandex.MILAB 🎉
Tired of waiting for backprop to project your face into the StyleGAN latent space just to apply some funny vector to it? Just distill this transformation with pix2pixHD!
📝 arxiv.org/abs/2003.03581
👤 @iviazovetskyi, @vlivashkin, @digitman
📉 @loss_function_porn
We ignored lots of news on 👑🦠
What do you think?
Anonymous poll:
- 19% IT’S NEVER ENOUGH
- 47% We need only good stuff
- 34% Please ignore it completely
Transferring Dense Pose to Proximal Animal Classes
An article on how to train DensePose for animals without labels.
The DensePose approach predicts the pose of humans densely and accurately, given a large dataset of poses annotated in detail. It is very expensive to collect DensePose annotations for all the different classes of animals, so the authors show that, at least for proximal animal classes such as chimpanzees, it is possible to transfer the knowledge existing in DensePose for humans. They propose to utilize the existing annotations of humans and do self-training on unlabeled images of animals.
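A schematic sketch of that self-training loop (the teacher pseudo-labels unlabeled chimp images, the student trains on the confident pixels); teacher, student, the data loader and the confidence threshold are placeholders, not the authors' DensePose code:
```python
import torch
import torch.nn.functional as F

def self_train_epoch(teacher, student, chimp_loader, optimizer, confidence_thr=0.9):
    teacher.eval()
    student.train()
    for images in chimp_loader:                      # unlabeled animal images
        with torch.no_grad():
            probs = teacher(images).softmax(dim=1)   # (N, parts, H, W) dense predictions
            conf, pseudo_labels = probs.max(dim=1)   # per-pixel pseudo-labels
        mask = conf > confidence_thr                 # keep only confident pixels
        student_logits = student(images)             # (N, parts, H, W)
        loss = F.cross_entropy(
            student_logits.permute(0, 2, 3, 1)[mask],  # (K, parts)
            pseudo_labels[mask],                       # (K,)
        )
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```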
Link: https://asanakoy.github.io/densepose-evolution/
YouTube: https://youtu.be/OU3Ayg_l4QM
Paper: https://arxiv.org/pdf/2003.00080.pdf
#Facebook #FAIR #CVPR #CVPR2020 #posetransfer #dl
Video: frame-by-frame DensePose predictions on chimps before self-training (teacher, left) and after (student, right); after self-training the 24-class body part segmentation is more accurate and stable.
👑🦠 We are building an ultimate post on coronavirus, with the purpose of gathering all the reliable and informative (not entertaining or just making you worry more) content there is to date.
We just want to make a sane post on coronavirus which will (to the best extent of our efforts) be free of bias and fake/unreliable news, and comply with the following rules:
1 Provided information should be correct, better if it is verifiable.
2 Source should be provided, if applicable. Only trustworthy sources are allowed (WHO, UN, academic institutions).
3 Biases and distributions should be taken into account: raw information is not that representative and can misguide opinions.
4 If applicable, information should be actionable: readers should get a clear picture of what they can do after reading it, not just get upset or worried.
You can submit information for consideration before the release of the post via our @opendatasciencebot, if you believe that it will be helpful to our dear audience and will serve your fellows well.
The post will be shared in the form of a GitHub repo, so contributions are welcome in advance 👹
MaxUp: A Simple Way to Improve Generalization of Neural Network Training
A new approach to augmenting both images and text. The idea is to generate a set of augmented data with random perturbations or transforms and minimize the maximum (worst-case) loss over the augmented data. By doing so, the authors implicitly introduce a smoothness or robustness regularization against the random perturbations, and hence improve generalization performance. Tested on a range of tasks, including image classification, language modeling, and adversarial certification, MaxUp consistently outperforms the existing best baseline methods without introducing substantial computational overhead.
Each sample in the batch is augmented m times, the augmentation with the maximum loss is selected, and backprop runs only through that one, i.e. the maximum loss is minimized (see the sketch below).
There is a proof that MaxUp amounts to gradient-norm regularization when the loss is minimized over the whole batch. It can also be viewed as an adversarial variant of data augmentation, in that it minimizes the worst-case loss on the perturbed data instead of an average loss like typical data augmentation methods.
MaxUp is easy to mix with other augmentations without much overhead: m forward passes per sample, but only one backward pass.
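A hedged sketch of that training step (model, augment and the data are placeholders, not the paper's implementation):
```python
import torch
import torch.nn.functional as F

def maxup_step(model, x, y, augment, optimizer, m=4):
    # m forward passes over randomly augmented copies of the batch.
    losses = []
    for _ in range(m):
        logits = model(augment(x))
        losses.append(F.cross_entropy(logits, y, reduction="none"))   # (B,)
    losses = torch.stack(losses)                 # (m, B)
    worst_case = losses.max(dim=0).values        # per-sample worst-case loss
    loss = worst_case.mean()
    optimizer.zero_grad()
    loss.backward()    # one backward pass; the gradient flows only through the max
    optimizer.step()
    return loss.item()
```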
paper: https://arxiv.org/abs/2002.09024
#augmentations #SOTA #ml
Recurrent Hierarchical Topic-Guided Neural Language Models
The authors propose a recurrent gamma belief network (rGBN) guided neural language modeling framework, a novel method to learn a language model and a deep recurrent topic model simultaneously.
For scalable inference, they develop hybrid SG-MCMC and recurrent autoencoding variational inference, allowing efficient end-to-end training.
Experimental results on real-world corpora demonstrate that the proposed models outperform a variety of shallow-topic-model-guided neural language models, and effectively generate sentences from the designated multi-level topics or noise, while inferring the interpretable hierarchical latent topic structure of the document and hierarchical multiscale structures of sequences.
paper: https://openreview.net/forum?id=Byl1W1rtvH
#ICLR2020 #nlm #nlg
How to generate text: using different decoding methods for language generation with Transformers
by Hugging Face
In this blog post, the author talks about how to generate text and compares approaches like the following (sketched in code below):
– greedy search
– beam search
– top-K sampling
– top-p (nucleus) sampling
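A short sketch of these decoding strategies via transformers' generate(); the GPT-2 checkpoint, prompt and parameter values are just illustrative defaults, not necessarily those used in the blog post:
```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
input_ids = tokenizer.encode("I enjoy walking with my cute dog", return_tensors="pt")

greedy = model.generate(input_ids, max_length=50)                                      # greedy search
beam   = model.generate(input_ids, max_length=50, num_beams=5, early_stopping=True)    # beam search
top_k  = model.generate(input_ids, max_length=50, do_sample=True, top_k=50)            # top-K sampling
top_p  = model.generate(input_ids, max_length=50, do_sample=True, top_k=0, top_p=0.92) # nucleus sampling

for name, out in [("greedy", greedy), ("beam", beam), ("top-k", top_k), ("top-p", top_p)]:
    print(name, ":", tokenizer.decode(out[0], skip_special_tokens=True))
```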
blog post: https://huggingface.co/blog/how-to-generate
#nlp #nlg #transformers
Forwarded from Karim Iskakov - канал (Karim Iskakov)
Representing Scenes as Neural Radiance Fields for View Synthesis. You first feed a set of images to the model and then it can generate photorealistic novel views of the scene conditioning on your viewing direction. Amazing results!
🔎 matthewtancik.com/nerf
📝 arxiv.org/abs/2003.08934
📉 @loss_function_porn
👑🦠
As we promised, we compiled all the interesting and relevant information in one post, so as not to lose focus on DS in our channel. We put special emphasis on what you can do as engineers and active community members:
1 Follow WHO's advice (in the article below, and in any self-respecting source of information you read) to lower your chances of getting infected.
2 Stay inside, switch to remote work if possible.
3 Spread the word about the pandemic, share trustworthy information.
4 Take part in projects: review information, build models, research.
Needless to say, we are open to PRs and corrections. You are most welcome.
Link: https://github.com/open-data-science/ultimate_posts/blob/master/COVID_2019/README.md
P.S. We saw this on TikTok and Twitter: let’s try to keep emojis balanced.
#coronafeerless #covid2019 #ultimatepost