Data Phoenix – Telegram
Data Phoenix
Data Phoenix is your best friend in learning and growing in the data world!
We publish a digest, organize events, and help expand the frontiers of your knowledge in ML, CV, NLP, and other aspects of AI. Idea and implementation: @dmitryspodarets
​​Paper Review: Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

Microsoft Research Asia has presented Swin Transformer, a new vision Transformer that can serve as a general-purpose backbone for computer vision, much as CNNs do in vision and Transformers do in natural language processing. The author provides a detailed review of the paper, exploring the do’s and don’ts of the new approach and the possibilities it offers for developing a unified architecture for CV and NLP tasks.

https://bit.ly/32nLZTu

Subscribe to our weekly newsletter — https://bit.ly/3tvunB8
​​Transferable Visual Words: Exploiting the Semantics of Anatomical Patterns for Self-supervised Learning

In this paper, Fatemeh Haghighi and co-authors introduce "transferable visual words" (TransVW), a concept designed to improve annotation efficiency for deep learning in medical image analysis. Learn about the team’s extensive experiments and the advantages that TransVW has demonstrated. Code, pre-trained models, and curated visual words are available.

Paper — https://bit.ly/3gjtZlj
Code — https://bit.ly/32ms9rP

Subscribe to our weekly newsletter — https://bit.ly/3ahhZNd
​​How Graph Neural Networks (GNN) Work: Introduction to Graph Convolutions from Scratch

The title of this one is quite self-explanatory: the author explores graph neural networks and graph convolutions, explaining how they work and how you can apply them in your projects, both in theory and in practice. All points are illustrated with code for convenience.
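
To give a feel for what a graph convolution does, here is a minimal pure-Python sketch of a single layer. It assumes the commonly used normalised form ReLU(D^-1/2 (A + I) D^-1/2 X W), which may differ from the code in the linked tutorial; the function names `matmul` and `graph_conv`, the toy graph, and the identity weight are all illustrative assumptions.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def graph_conv(adjacency, features, weights):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adjacency)
    # Add self-loops so every node also keeps its own features.
    a_hat = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    # Node degrees in the self-looped graph.
    degree = [sum(row) for row in a_hat]
    # Symmetric normalisation keeps high-degree nodes from dominating.
    a_norm = [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
              for i in range(n)]
    # Aggregate neighbour features, then apply the (normally learned) weights.
    mixed = matmul(matmul(a_norm, features), weights)
    return [[max(0.0, value) for value in row] for row in mixed]


# Path graph 0 - 1 - 2; only node 0 starts with a non-zero feature.
adjacency = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
features = [[1.0], [0.0], [0.0]]
weights = [[1.0]]  # identity weight, chosen purely for illustration

hidden = graph_conv(adjacency, features, weights)
print(hidden)
```

After one layer the feature has spread from node 0 to its neighbour, node 1, but has not yet reached node 2. Stacking a second layer would carry it one hop further.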

https://bit.ly/3mY2cYS

@DataScienceDigest
​​OpenCV Face Detection with Haar Cascades

Face detection is one of the most popular computer vision use cases (at least, as perceived by the general public). Learning how to use OpenCV and Haar cascades can be critical if you want to go deep into the field, and this detailed tutorial provides a fresh and easy start for new learners. Just follow the instructions step by step and see the results in action.
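
The tutorial itself relies on OpenCV's pretrained cascades. As background, here is a small sketch of the building block behind Haar cascades: a two-rectangle Haar-like feature computed in constant time from an integral image. This is not OpenCV's API; the function names and the toy 4x4 image are assumptions made for illustration.

```python
def integral_image(img):
    """Return a summed-area table with one extra row and column of zeros."""
    height, width = len(img), len(img[0])
    table = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += img[y][x]
            # Sum of everything above and to the left, inclusive.
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table


def rect_sum(table, top, left, height, width):
    """Sum of pixels in a rectangle using four table lookups."""
    bottom, right = top + height, left + width
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])


def two_rect_feature(table, top, left, height, width):
    """Haar-like edge feature: left half minus right half of the window."""
    half = width // 2
    return (rect_sum(table, top, left, height, half)
            - rect_sum(table, top, left + half, height, half))


# Toy image: bright left half, dark right half (a vertical edge).
edge = [[9, 9, 1, 1] for _ in range(4)]
flat = [[5, 5, 5, 5] for _ in range(4)]

edge_response = two_rect_feature(integral_image(edge), 0, 0, 4, 4)
flat_response = two_rect_feature(integral_image(flat), 0, 0, 4, 4)
print(edge_response, flat_response)
```

The feature responds strongly to the vertical edge and not at all to the flat patch. A cascade combines many such features, with learned thresholds, to reject non-face windows quickly.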

https://bit.ly/3v5C3KB

​​Lviv Data Science Summer School

Hi folks,
I’m pleased to invite you all to enroll in the Lviv Data Science Summer School to delve into advanced methods and tools of Data Science and Machine Learning, including such domains as CV, NLP, Healthcare, Social Network Analysis, and Urban Data Science. The courses are practice-oriented and geared towards undergraduates, Ph.D. students, and young professionals (intermediate level). The school runs July 19–30 and will be hosted online. Make sure to apply: spots are filling up fast!

https://bit.ly/2Qc0QOx

​​Interpretable Machine Learning: A Guide for Making Black Box Models Explainable

The book by Christoph Molnar explains in depth how to make supervised machine learning models more interpretable. You’ll start by exploring the concepts of interpretability and then learn about simple, interpretable models such as decision trees, decision rules, and linear regression. Then, you’ll look into general model-agnostic methods for interpreting black box models, such as feature importance and accumulated local effects, and for explaining individual predictions with Shapley values and LIME. The book focuses on ML models for tabular data and less on computer vision and natural language processing tasks. It is recommended for machine learning practitioners, data scientists, statisticians, and anyone else interested in making machine learning models interpretable.
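
As a taste of the Shapley-value idea mentioned above, here is a minimal sketch that computes exact Shapley values for one prediction by averaging each feature's marginal contribution over all feature orderings. It is not taken from the book: the toy model, the all-zero baseline used to represent "absent" features, and the function names are assumptions, and the brute-force loop is only feasible for a handful of features.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values by enumerating every feature ordering."""
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        # Start from the baseline and switch features on one at a time.
        current = list(baseline)
        previous = predict(current)
        for feature in order:
            current[feature] = instance[feature]
            updated = predict(current)
            # Credit the change in prediction to the feature just added.
            totals[feature] += updated - previous
            previous = updated
    return [total / len(orderings) for total in totals]


def toy_model(x):
    """Hypothetical model: one additive term plus one interaction."""
    return 2 * x[0] + x[1] * x[2]


instance = [1, 2, 3]
baseline = [0, 0, 0]
phi = shapley_values(toy_model, instance, baseline)
print(phi)
```

The interaction term `x[1] * x[2]` is split evenly between the two features involved, and the values sum to the difference between the prediction for the instance and the prediction for the baseline.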

https://bit.ly/3sH8Ofq

​​Zero-Shot Learning: Can You Classify an Object Without Seeing It Before?

Developing machine learning models that can make predictions on data from classes they have never seen during training has become an important research area called zero-shot learning. Humans tend to be pretty good at recognizing things they have never seen before, and zero-shot learning offers a possible path toward mimicking this powerful human capability.

https://bit.ly/3xxMF7c

​​Shedding Light on Fairness in AI with a New Data Set

Bias and fairness in AI are hotly debated topics. To help address the problem, Facebook AI has created Casual Conversations, a new dataset consisting of 45,186 videos of participants having unscripted conversations. It is intended to help AI researchers evaluate the fairness of their computer vision and audio models across subgroups of age, gender, apparent skin tone, and ambient lighting.

https://bit.ly/3tRt3bX

​​VideoGPT: Video Generation using VQ-VAE and Transformers

In this research paper, Wilson Yan et al. present VideoGPT, a simple architecture for scaling likelihood-based generative modeling to natural videos. Despite its simplicity, it can generate samples competitive with advanced GAN models for video generation, as well as high-fidelity natural videos from UCF-101 and the Tumblr GIF dataset (TGIF).

Paper - https://bit.ly/3aHbpAa
Code - https://bit.ly/32NQQxw
Demo - https://bit.ly/3dUzuFF

​​Data Science Digest — 28.04.21

The new issue of Data Science Digest is here! Hop in to learn about the latest articles, tutorials, research papers, and books on Data Science, AI, ML, and Big Data. All sections are prioritized for your convenience. Enjoy!

https://bit.ly/3nrBYOT

Join 👉 @DataScienceDigest
​​Token Labeling: Training a 85.4% Top-1 Accuracy Vision Transformer with 56M Parameters on ImageNet

In this paper, Zihang Jiang, Qibin Hou et al. explore vision transformers applied to ImageNet classification. They have developed new training techniques to demonstrate that by slightly tuning the structure of vision transformers and introducing token labeling, the models can achieve better results than the CNN counterparts and other transformer-based classification models.

Paper - https://bit.ly/32VtFRQ
Code - https://bit.ly/3eDxAbn
​​Boosting Natural Language Processing with Wikipedia

In this hands-on tutorial, Nicola Melluso explains how you can take advantage of Wikipedia to improve your Natural Language Processing models. To illustrate how it works, he takes two NLP tasks, Named Entity Recognition and Topic Modeling, and walks step by step through collecting and processing the data, building and training the models, and more.

https://bit.ly/3tfdiul
​​NLP Profiler

A simple but useful NLP library created by @neomatrix369. It enables Data Science practitioners to easily profile datasets with one or more text columns. Given a dataset and the name of a column containing text data, the library returns either high-level insights or low-level, granular statistical information about the text in that column. Check out the library and let us know what you think.

https://bit.ly/3xEgSBp
​​Deep Learning for Audio with the Speech Commands Dataset

If you want to learn how to train a simple model on the Speech Commands audio dataset, this article by Peter Gao is for you. He explains how to choose a dataset and handle the data, how to train, test, and tune the model, and, most importantly, how to do error analysis (analyzing failure cases) to improve model performance over time.

https://bit.ly/2SbUIpR
​​skweak: Weak Supervision Made Easy for NLP

In this paper, Pierre Lison et al. present skweak, a versatile, Python-based software toolkit to help NLP developers apply weak supervision to a wide range of NLP tasks. The toolkit makes it easy to implement a large spectrum of labeling functions (such as heuristics, gazetteers, neural models, or linguistic constraints) on text data, apply them on a corpus, and aggregate their results in a fully unsupervised fashion.
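
To make the idea of labeling functions concrete, here is a minimal plain-Python sketch of weak supervision for token labeling. It does not use skweak's API. The three labeling functions, the tiny gazetteer, and the example sentence are hypothetical, and a simple majority vote stands in for the toolkit's own aggregation step.

```python
from collections import Counter

ABSTAIN = None  # a labeling function returns this when it has no opinion

# Hypothetical gazetteer for illustration only.
GAZETTEER = {"Lviv": "LOC", "Microsoft": "ORG"}


def lf_honorific(tokens, i):
    """Heuristic: a token right after an honorific is likely a person."""
    if i > 0 and tokens[i - 1] in {"Mr.", "Dr.", "Prof."}:
        return "PER"
    return ABSTAIN


def lf_gazetteer(tokens, i):
    """Look the token up in a fixed list of known entities."""
    return GAZETTEER.get(tokens[i], ABSTAIN)


def lf_after_in(tokens, i):
    """Heuristic: a capitalised token after 'in' is likely a location."""
    if i > 0 and tokens[i - 1] == "in" and tokens[i][:1].isupper():
        return "LOC"
    return ABSTAIN


def aggregate(tokens, labeling_functions):
    """Majority vote over non-abstaining functions; 'O' if all abstain."""
    labels = []
    for i in range(len(tokens)):
        votes = Counter()
        for labeling_function in labeling_functions:
            vote = labeling_function(tokens, i)
            if vote is not ABSTAIN:
                votes[vote] += 1
        labels.append(votes.most_common(1)[0][0] if votes else "O")
    return labels


tokens = ["Dr.", "Smith", "met", "Microsoft", "staff", "in", "Lviv"]
labels = aggregate(tokens, [lf_honorific, lf_gazetteer, lf_after_in])
print(list(zip(tokens, labels)))
```

No labeled data is needed at any point: the noisy votes are combined into a single label per token, which can then be used to train a model.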

Paper — https://bit.ly/3tk0ORU
Code — https://bit.ly/33aEmAj
​​Face Detection Tips, Suggestions, and Best Practices

In this tutorial, Adrian Rosebrock continues to explore the topic of face detection. You will learn his tips, suggestions, and best practices for achieving high face detection accuracy with OpenCV and dlib. Though the tutorial is mostly theoretical, it features code and plenty of useful links.

https://bit.ly/3ehR0na
​​Data Science Digest — 05.05.21

The new issue of Data Science Digest is here! Hop in to learn about the latest articles, tutorials, research papers, and projects on Data Science, AI, ML, and Big Data. All sections are prioritized for your convenience. Enjoy!

https://bit.ly/33mYRd3

Join 👉 @DataScienceDigest
​​Improving Model Performance Through Human Participation

In this article, Preetam Joshi (Netflix) and Mudit Jain (Google) explore the complex topic of AI-to-human cooperation. Specifically, they explain how to incorporate human feedback at inference time (human-in-the-loop) and how that input can raise the final precision and recall.
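
One common way to put a human in the inference loop is to send only low-confidence predictions for review. The sketch below illustrates that pattern and is not the authors' implementation. The confidence band (0.3 to 0.7), the scores, and the assumption that the reviewer is always right are illustrative choices.

```python
def precision_recall(predictions, truths):
    """Precision and recall for boolean predictions."""
    pairs = list(zip(predictions, truths))
    true_pos = sum(1 for p, t in pairs if p and t)
    false_pos = sum(1 for p, t in pairs if p and not t)
    false_neg = sum(1 for p, t in pairs if not p and t)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


def route_with_human(scores, human_label, low=0.3, high=0.7):
    """Accept confident model decisions; ask a human about the rest."""
    decisions = []
    for index, score in enumerate(scores):
        if low < score < high:
            # Uncertain band: defer to the human reviewer.
            decisions.append(human_label(index))
        else:
            decisions.append(score >= high)
    return decisions


# Hypothetical model scores and ground truth.
scores = [0.95, 0.65, 0.55, 0.40, 0.10, 0.85]
truths = [True, False, True, True, False, True]

# Model alone, with a plain 0.5 cut-off.
model_only = [score >= 0.5 for score in scores]
# Model plus reviewer; the reviewer is assumed to be always correct.
with_human = route_with_human(scores, lambda index: truths[index])

print(precision_recall(model_only, truths))
print(precision_recall(with_human, truths))
```

In this toy setup, three of the six items fall in the uncertain band and go to the reviewer, which corrects both of the model's mistakes. In practice the band width trades review cost against quality.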

https://bit.ly/3eUWm7b
​​Motion Representations for Articulated Animation

In this research, Aliaksandr Siarohin et al. present novel motion representations for animating articulated objects consisting of distinct parts. Learn about the new method they propose, how it differs from keypoint-based works, and how it can be used to animate a variety of objects, surpassing previous methods on existing benchmarks.

Paper — https://bit.ly/3eVsVlk
Code — https://bit.ly/33q5nj4
Video — https://bit.ly/3tmvlOZ
​​How to Plot XGBoost Trees in R

XGBoost is a popular ML algorithm that is frequently used in Kaggle competitions and has many practical use cases. If you have always wanted to learn more about XGBoost, this short tutorial is for you. You will learn how to prepare the dataset for modeling, train the XGBoost model, plot the XGBoost trees, export tree plots, and plot multiple trees at once.

https://bit.ly/33pYiiv
​​Multiple Time Series Forecasting with PyCaret

PyCaret is a popular machine learning library and a model management tool for automating machine learning workflows. It allows us to build and deploy end-to-end ML prototypes quickly and efficiently. In this step-by-step tutorial, you will learn how to use PyCaret to forecast multiple time series in fewer than 50 lines of code.

https://bit.ly/3xWy2KO