Data Phoenix
Data Phoenix is your best friend in learning and growing in the data world!
We publish digests, organize events, and help expand the frontiers of your knowledge in ML, CV, NLP, and other areas of AI. Idea and implementation: @dmitryspodarets
Comparing Test Sets with Item Response Theory

In this paper, Clara Vania et al. use Item Response Theory (IRT) to evaluate 29 datasets, using predictions from 18 pretrained Transformer models on individual test examples. Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models, while SNLI, MNLI, and CommitmentBank appear saturated for current strong models.

https://bit.ly/3xHDkJj
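The core of IRT is an item characteristic curve: the probability that a model of a given ability answers an item correctly, parameterized by the item's discrimination and difficulty. A minimal sketch of the two-parameter logistic (2PL) model, with made-up item parameters, shows why an easy, "saturated" item tells strong models apart far less than a hard, steep one:

```python
import math

def p_correct(theta, a, b):
    """2PL item response: probability that a model with ability `theta`
    answers an item with discrimination `a` and difficulty `b` correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Two hypothetical test items (parameters invented for illustration):
saturated = dict(a=1.0, b=-3.0)      # easy: nearly all models get it right
discriminating = dict(a=2.5, b=1.5)  # hard and steep: separates strong models

for theta in (0.5, 2.0):  # a decent model vs. a state-of-the-art model
    p_sat = p_correct(theta, **saturated)
    p_dis = p_correct(theta, **discriminating)
    print(f"ability={theta}: saturated item {p_sat:.2f}, "
          f"discriminating item {p_dis:.2f}")
```

Both models score near 1.0 on the saturated item, while the discriminating item spreads them apart, which is the behavior the paper uses to rank test sets.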
How Airbnb Standardized Metric Computation at Scale

The engineering team of Airbnb reveals the design principles of Minerva compute infrastructure. Minerva is a single source of truth metric platform that standardizes the way business metrics are created, computed, served, and consumed. The article features the link to the first post on Minerva. Check it out, too!

https://bit.ly/3zLDv8g
AI Can Now Emulate Text Style in Images in One Shot — Using Just a Single Word

In this article, the engineering team of Facebook AI presents TextStyleBrush, an AI research project that can copy the style of text in a photo using just a single word. With this AI model, you can edit and replace text in images. The team hopes to spur dialogue and research into detecting potential misuse of this type of technology, so make sure to contribute.

https://bit.ly/3wU3t7I
Data Science Digest — 24.06.21

The new issue of DataScienceDigest is here! The impact of NLP and the growing budgets to drive AI transformations. How Airbnb standardized metric computation at scale. Cross-Validation, MASA-SR, AgileGAN, EfficientNetV2, and more…

https://bit.ly/3qnuy0u

Join 👉@DataScienceDigest
MLOps Toys

MLOps Toys is a curated collection of MLOps projects, organized by category: data versioning, training orchestration, feature stores, experiment tracking, model serving, model monitoring, and explainability.

https://bit.ly/3xV97GF
DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification

In this research, Yongming Rao et al. propose a dynamic token sparsification framework that prunes redundant tokens progressively and dynamically based on the input. A lightweight prediction module estimates the importance score of each token given the current features; the module is added at multiple layers to prune redundant tokens hierarchically.

Web Page — https://bit.ly/3dgESlq
Paper — https://bit.ly/3xWf5XJ
Code — https://bit.ly/3jlUtUK
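The pruning step itself reduces to scoring tokens and keeping the most important ones. A minimal NumPy sketch of that step (a hypothetical simplification; DynamicViT learns the scores end-to-end and uses Gumbel-softmax to keep pruning differentiable during training):

```python
import numpy as np

rng = np.random.default_rng(0)

def prune_tokens(tokens, scores, keep_ratio):
    """Keep only the top `keep_ratio` fraction of tokens by predicted
    importance score; the rest are dropped."""
    n_keep = max(1, int(round(keep_ratio * tokens.shape[0])))
    keep_idx = np.argsort(scores)[::-1][:n_keep]
    return tokens[np.sort(keep_idx)]  # preserve original token order

tokens = rng.normal(size=(196, 64))  # e.g. 14x14 patch tokens, dim 64
scores = rng.uniform(size=196)       # stand-in for the prediction module
pruned = prune_tokens(tokens, scores, keep_ratio=0.7)
print(pruned.shape)  # (137, 64)
```

Applying this at several layers with a fixed keep ratio is what makes the pruning hierarchical: each stage operates only on the tokens that survived the previous one.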
Consistent Instance False Positive Improves Fairness in Face Recognition

In this paper, Xingkun Xu et al. propose a false positive rate penalty loss, a novel method to mitigate face recognition bias by increasing the consistency of the instance False Positive Rate (FPR). The method requires no demographic annotations, making it possible to mitigate bias among demographic groups divided by various attributes.

Paper — https://bit.ly/361fHiQ
Code — https://bit.ly/2UKAqoF
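The quantity being regularized is easy to state: each probe has its own instance FPR (the fraction of its impostor comparisons that pass the verification threshold), and the loss penalizes probes whose instance FPR strays from the overall FPR. A toy sketch of that consistency measure, using a simplified mean-absolute-deviation penalty rather than the paper's exact loss:

```python
import numpy as np

def instance_fpr_penalty(impostor_sims, threshold):
    """For each probe, its instance FPR is the fraction of impostor
    similarities above the verification threshold; the penalty (a
    simplified stand-in for the paper's loss) is the mean absolute
    deviation of instance FPRs from the overall FPR."""
    inst_fpr = (impostor_sims > threshold).mean(axis=1)  # per-probe FPR
    overall_fpr = inst_fpr.mean()
    return np.abs(inst_fpr - overall_fpr).mean()

rng = np.random.default_rng(1)
# 8 probes x 100 impostor comparisons each (synthetic similarity scores)
sims = rng.uniform(-1, 1, size=(8, 100))
penalty = instance_fpr_penalty(sims, threshold=0.8)
print(round(float(penalty), 4))
```

When every probe's instance FPR matches the overall FPR, the penalty is zero, which is exactly the consistency the method drives toward.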
The FLORES-101 Data Set: Helping Build Better Translation Systems Around the World

Building on the success of machine translation systems like M2M-100, Facebook AI has open-sourced FLORES-101, a many-to-many evaluation data set covering 101 languages from all over the world, to enable researchers to rapidly test and improve upon multilingual translation models like M2M-100. This article covers the basics.

https://bit.ly/3hja98z
Multivariate Probabilistic Regression with Natural Gradient Boosting

Natural Gradient Boosting (NGBoost) is a recently proposed method for multivariate probabilistic regression, based on nonparametrically modeling the conditional parameters of the multivariate predictive distribution. The method is robust, works out of the box without extensive tuning, is modular with respect to the assumed target distribution, and performs competitively against existing approaches.

Paper — https://bit.ly/3haEF5C
Code — https://bit.ly/3qC1SB3
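What distinguishes probabilistic regression from point prediction is the training objective: a proper scoring rule over a full predictive distribution, such as the negative log-likelihood that NGBoost minimizes via natural gradients. A toy illustration of that scoring rule for a Gaussian prediction (not the library's implementation):

```python
import math

def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of y under N(mu, sigma^2) -- the kind of
    scoring rule NGBoost minimizes with natural gradients."""
    return 0.5 * math.log(2 * math.pi * sigma**2) + (y - mu)**2 / (2 * sigma**2)

# A well-calibrated prediction scores better (lower NLL) than an
# overconfident one that misses the target.
y = 3.0
calibrated = gaussian_nll(y, mu=2.8, sigma=0.5)
overconfident = gaussian_nll(y, mu=2.0, sigma=0.1)
print(calibrated < overconfident)  # True
```

Because the score rewards calibrated uncertainty, not just accurate means, the fitted model outputs a usable predictive distribution rather than a single number.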
Channel name was changed to «Data Phoenix»
Channel photo updated
Data Phoenix Rises
We at Data Science Digest have always strived to ignite the fire of knowledge in the AI community. We're proud to have helped thousands of people learn something new and gain the tools to push ahead. And we haven't been standing still, either.
Please meet Data Phoenix: Data Science Digest, rebranded and risen anew from our own flame. Our mission is to help everyone interested in Data Science and AI/ML expand the frontiers of knowledge. More news, more updates, and webinars (!) are coming. Stay tuned!

Data Phoenix Digest — 01.07.2021
The new issue of Data Phoenix Digest is here! AI that helps write code, EU’s ban on biometric surveillance, genetic algorithms for NLP, multivariate probabilistic regression with NGBoosting, alias-free GAN, MLOps toys, and more…

https://bit.ly/3h722gE

Join 👉@DataPhoenix
What Is MLOps? — Everything You Must Know to Get Started

MLOps is a buzzword right now. Everyone talks about it; everybody wants to implement it and drive MLOps transformations. If you're curious about MLOps too, this article gives you an overview of the ML systems development lifecycle and explains why you need MLOps.

https://bit.ly/3dB9gau
To Retrain, or Not to Retrain? Let’s Get Analytical About ML Model Updates

In this ML 101 article, you'll find answers to questions like «How often should I retrain a model?», «Should I retrain the model now?», and «Should I retrain, or should I update the model?». Dig in; it's an easy but important read!

https://bit.ly/3qVwDRL
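In practice, the "should I retrain now?" question often comes down to comparing a monitored metric against an agreed tolerance. A hypothetical rule of thumb in the article's spirit (the threshold and metric are illustrative, not prescribed by the article):

```python
def should_retrain(baseline_metric, recent_metric, tolerance=0.02):
    """Flag a retrain when the monitored metric (e.g. rolling accuracy)
    degrades beyond the agreed tolerance relative to the baseline."""
    return (baseline_metric - recent_metric) > tolerance

print(should_retrain(0.91, 0.90))  # within tolerance -> False
print(should_retrain(0.91, 0.85))  # clear degradation -> True
```

The article's larger point is choosing the tolerance and monitoring window analytically, from how fast your data drifts, rather than retraining on a fixed calendar schedule.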
PlanSys2: A Planning System Framework for ROS2

In this paper, the researchers reveal the ROS2 Planning System (PlanSys2), a framework for symbolic planning that incorporates novel approaches for execution on robots working in demanding environments. PlanSys2 aims to be the reference task planning framework for ROS2, the latest version of the de facto standard in robotics software development.

Paper — https://bit.ly/3qRr8U4
Code — https://bit.ly/3yy3Sx5
A Discourse on Reinforcement Learning [Part 1]

This is the first of the 3-article series «A Discourse on Reinforcement Learning» that kicks off with a holistic overview of Reinforcement Learning with an expansive setting. Save the article not to miss parts 2 and 3 about more advanced RL topics.

https://bit.ly/3ApUZr0
The MultiBERTs: BERT Reproductions for Robustness Analysis

In this paper, an international team of researchers introduces MultiBERTs: a set of 25 BERT-Base checkpoints, trained with hyper-parameters similar to the original BERT model's but differing in random initialization and data shuffling. The aim is to enable researchers to draw robust and statistically justified conclusions about pre-training procedures.

Paper — https://bit.ly/3ylSVyh
Code — https://bit.ly/3dGW3wY
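The point of 25 seeds is that conclusions get reported as a distribution over runs rather than a single number. A minimal sketch of that aggregation, with made-up per-seed accuracies for illustration:

```python
import statistics

# Hypothetical dev-set accuracies from 5 of the 25 MultiBERTs seeds
# (values invented for illustration).
per_seed_acc = [0.842, 0.851, 0.838, 0.847, 0.844]

mean = statistics.mean(per_seed_acc)
stdev = statistics.stdev(per_seed_acc)
# Reporting mean +/- stdev across seeds separates pre-training-procedure
# effects from run-to-run noise, which a single checkpoint cannot do.
print(f"accuracy = {mean:.3f} +/- {stdev:.3f}")
```

Two pre-training procedures can then be compared by whether their across-seed distributions actually separate, instead of by a lucky single-checkpoint difference.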