Data Phoenix – Telegram
Data Phoenix is your best friend in learning and growing in the data world!
We publish a digest, organize events, and help expand the frontiers of your knowledge in ML, CV, NLP, and other aspects of AI. Idea and implementation: @dmitryspodarets
Albumentations 1.0.0 has been released!

Albumentations is a computer vision tool and a Python library designed to improve the performance of deep convolutional neural networks by enabling fast, flexible, cost- and resource-efficient image augmentations. The tool can be used for different CV tasks, including object classification, segmentation, and detection.
The new version adds 10 new transforms, no longer depends on imgaug, and includes bug fixes.

https://bit.ly/3fKM6jC
Data Science Digest — 02.06.21

The new issue of DataScienceDigest is here! Hop in to learn about the latest news, articles, tutorials, research papers, datasets, videos, and tools on DataScience, AI, ML, and BigData. All sections are prioritized for your convenience. Enjoy!

https://bit.ly/3vN2CF4

Join 👉@DataScienceDigest
Fraud Detection: Using Relational Graph Learning to Detect Collusion

Uber’s popularity has attracted the attention of financial criminals in cyberspace. One type of fraudulent behavior is collusion, a cooperative fraud action among users. In this article, Uber Engineering presents a case study of applying a deep graph learning model, the relational graph convolutional network (RGCN), to detect such collusion.

https://ubr.to/3irmc6f
CogView: Mastering Text-to-Image Generation via Transformers

Text-to-Image generation is a challenging task that requires powerful generative models and cross-modal understanding. CogView is a 4-billion-parameter Transformer with a VQ-VAE tokenizer that, according to the authors, achieves a new state-of-the-art FID on blurred MS COCO, outperforming previous GAN-based models and the recent, similar DALL-E.

Paper — https://bit.ly/3chno8b
Code — https://bit.ly/3ciDoqp
Demo — https://bit.ly/3z4Ba7N
Airflow and Ray: A Data Science Story

In this article, you’ll learn about a Ray provider for Apache Airflow. Ray is a Python-first cluster computing framework that allows Python code, even with complex libraries or packages, to be distributed and run on clusters of any size, enabling fast transformations of Airflow DAGs into scalable machine learning pipelines.

https://bit.ly/3pp5Ufe
A Checklist to Track Your Data Science Progress

Progress is fickle. You may think you are moving forward while actually being stuck in a rut of repetition. That’s why you need a system to track your progress; for example, you can use this checklist by Pascal Janetzky. Get an overview of your progress and find your next goal by following its steps.

https://bit.ly/3gct7gK
DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort

DatasetGAN is an automatic procedure to generate massive datasets of high-quality semantically segmented images requiring minimal human effort. Presented by an international team of researchers, it outperforms all semi-supervised baselines and is on par with fully supervised methods using labor-intensive annotations.

Web Page — https://bit.ly/2T8S5WA
Paper — https://bit.ly/2SfcDMP
Code — Coming Soon
#DataScienceDigest #DataScience #MachineLearning #ArtificialIntelligence #AI #ML

Subscribe to our weekly newsletter — https://bit.ly/3fXLuXW
Long Text Generation by Modeling Sentence-Level and Discourse-Level Coherence

Generating long and coherent text is an important but challenging task. In this paper, the authors propose a long text generation model that represents the prefix sentences at sentence level and discourse level in the decoding process. Extensive experiments show that the model can generate more coherent texts than state-of-the-art baselines.

Paper — https://bit.ly/3uZvvfZ
Code — https://bit.ly/3vXsj61
Introducing Orbit, An Open Source Package for Time Series Inference and Forecasting

Orbit (Object-ORiented BayesIan Time Series) is a general interface for Bayesian time series modeling developed by Uber Engineering. In this article, you’ll learn the ins and outs of Orbit, from the basics and use cases to a tutorial and benchmarks. Uber plans to introduce more dedicated Bayesian time series models, so the project is worth a look.

https://ubr.to/3v6Hbxy
Data Science Digest — 10.06.21

The new issue of DataScienceDigest is here! Machine learning in healthcare, the top 10 TED talks on AI, fraud detection in Uber, DatasetGAN, Text-to-Image generation via transformers, and more…

https://bit.ly/2TR87o9

Join 👉@DataScienceDigest
Metric Learning Tips & Tricks

In this article, the author presents ways of overcoming the limitations of classification, such as the number of training samples, production integration, and scaling. Specifically, he’ll explain how to train an object matching model with no labeled data and use it in production, to ensure metric learning is more scalable and flexible.

https://bit.ly/3cznb05

Subscribe to our weekly newsletter — https://bit.ly/3gqPUp5
Tinkering with the Mobile Apps Dataset

In this article, the author demonstrates how you can use an open-source dataset of mobile app data to build your own models. The article covers steps such as choosing a dataset, exploratory data analysis, feature engineering, and predicting with a model. The dataset and the models are available for re-use.

https://bit.ly/3xiMQT0
Building Scalable Machine Learning Pipelines for Multimodal Health Data on AWS

Machine learning is used extensively in the healthcare and life sciences industries. Among the many approaches and methods for increasing the accuracy and efficiency of ML models, Multimodal ML stands out as one of the most promising. In this article, you’ll learn how to build a scalable cloud architecture for Multimodal ML on health data.

https://amzn.to/3wmXM1D
PyCon US 2021 [Conference Materials]

This playlist features all keynotes, talks, and other materials from PyCon US 2021, a virtual conference for the community using and developing the open-source Python programming language. Over 80 videos in total!

https://bit.ly/2TsBJIi
Session-based Recommender Systems

In this extensive research report by Cloudera Fast Forward, you’ll learn all the ins and outs of designing, building, and managing AI/ML-powered recommender systems. The authors will demonstrate how to use specific algorithms and datasets to arrive at conclusions about the do’s and don’ts of building such systems (e.g. while using word2vec).
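
To give a feel for the session-based idea, here is a minimal sketch that is not taken from the report. word2vec-style approaches treat each session as a "sentence" and each item as a "word"; this sketch swaps word2vec for plain co-occurrence counts so it runs with the standard library only. The sessions, item names, and the helper functions build_cooccurrence and recommend are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical browsing sessions: each list is one user's ordered item views.
SESSIONS = [
    ["phone", "case", "charger"],
    ["phone", "case"],
    ["laptop", "mouse"],
    ["laptop", "charger"],
]


def build_cooccurrence(sessions, window=2):
    """Count how often two items appear within `window` positions of each other."""
    counts = defaultdict(Counter)
    for session in sessions:
        for i, item in enumerate(session):
            lo, hi = max(0, i - window), min(len(session), i + window + 1)
            for j in range(lo, hi):
                if j != i and session[j] != item:
                    counts[item][session[j]] += 1
    return counts


def recommend(counts, item, k=2):
    """Return up to k items most often seen near `item` in past sessions."""
    return [other for other, _ in counts[item].most_common(k)]


counts = build_cooccurrence(SESSIONS)
print(recommend(counts, "phone", k=1))  # ['case']
```

A real system would likely replace the counts with learned item embeddings and rank by vector similarity, but the input shape (ordered items grouped by session) stays the same.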

https://bit.ly/3pS8URC
Dynamically Generating DAGs in Airflow

In this guide, the Astronomer team looks into specific methods of dynamically generating DAGs in Airflow, from single-file methods to multiple-file methods. Every method is accompanied by code and examples. The team also presents DAG Factory, an open source Python library for dynamically generating Airflow DAGs from YAML files.
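
As a rough illustration of the single-file approach, the sketch below builds several DAG objects in a loop and binds each one to a module-level name, which is how Airflow discovers DAGs in a DAG file. It is not the guide's code: the Dag dataclass is a hypothetical stand-in for airflow.DAG so the example runs without Airflow installed, and CONFIGS, create_dag, and the load_ naming scheme are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Dag:
    """Hypothetical stand-in for airflow.DAG so the sketch runs without Airflow."""
    dag_id: str
    schedule: str
    start_date: datetime


# Hypothetical per-source settings; in practice these might come from a
# YAML file, a database table, or Airflow Variables.
CONFIGS = {
    "orders": {"schedule": "@daily"},
    "customers": {"schedule": "@hourly"},
}


def create_dag(name: str, settings: dict) -> Dag:
    """Build one DAG object from one config entry."""
    return Dag(
        dag_id=f"load_{name}",
        schedule=settings["schedule"],
        start_date=datetime(2021, 6, 1),
    )


# Single-file method: create every DAG in a loop and register it under a
# module-level name, because Airflow collects DAG objects it finds in the
# global namespace of a DAG file.
for name, settings in CONFIGS.items():
    dag = create_dag(name, settings)
    globals()[dag.dag_id] = dag
```

With real Airflow, create_dag would return an airflow.DAG with tasks attached; the loop and the globals() registration would look much the same.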

https://bit.ly/3xt8RhF
Data Science Digest — 17.06.21

The new issue of DataScienceDigest is here! Facebook AI migrates its systems to PyTorch, metric learning tips & tricks, session-based recommender systems, AndroidEnv, materials from PyCon US 2021, and more…

https://bit.ly/3vshrMs

Join 👉@DataScienceDigest