Data Science Digest — 26.05.21
The new issue of DataScienceDigest is here! Hop in to learn about the latest news, articles, tutorials, research papers, event materials, and projects on DataScience, AI, ML, and BigData. All sections are prioritized for your convenience. Enjoy!
https://bit.ly/3hW5agi
Join 👉 @DataScienceDigest
Easy MLOps with PyCaret + MLflow
PyCaret is an open-source, low-code machine learning library for Python that lets you build, compare, and manage ML models quickly. MLflow is an open-source platform for managing the ML lifecycle. In this article, you’ll learn how to bring MLOps into your ML experiments using PyCaret and MLflow.
https://bit.ly/3yJmomP
Lessons on ML Platforms — From Netflix, DoorDash, Spotify, and More
In this article, the author draws on the experience of AI industry leaders to answer a common question: how can organizations enable data scientists to repeatedly deliver value beyond the scope of existing ML production systems? He also looks into best practices, tools, and management approaches for solving the value delivery problem.
https://bit.ly/3uA8Hna
Image Cropping on Twitter: Fairness Metrics, their Limitations, and the Importance of Representation, Design, and Agency
In this paper, researchers look into fairness and bias issues in Twitter’s automated image cropping system. They found systematic disparities in cropping, identified contributing factors, and proposed removing saliency-based cropping in favor of a solution that better preserves user agency.
Paper — https://bit.ly/3yM4ksa
Code — https://bit.ly/3fSifUX
LAPAR: Linearly-Assembled Pixel-Adaptive Regression Network for Single Image Super-Resolution and Beyond
In this paper, a team of researchers proposes a linearly-assembled pixel-adaptive regression network (LAPAR) for single image super-resolution (SISR), the fundamental problem of upsampling a low-resolution (LR) image to its high-resolution (HR) version. According to the authors, LAPAR is lightweight, easy to optimize, and achieves strong results on SISR benchmarks.
Paper — https://bit.ly/3yYFzcC
Code — https://bit.ly/2RPlPaG
GAN Prior Embedded Network for Blind Face Restoration in the Wild
In this paper, Tao Yang et al. build on generative adversarial network (GAN) methods to tackle blind face restoration (BFR) from severely degraded face images in the wild. According to the authors, the proposed GAN prior embedded network (GPEN) generates photo-realistic results that are significantly superior to those of existing BFR methods, both quantitatively and qualitatively.
Paper — https://bit.ly/3fCV8PA
Code — https://bit.ly/2SIt04e
Build a Scalable Machine Learning Pipeline for Ultra-High Resolution Medical Images using Amazon SageMaker
In this comprehensive article by the AWS team, you’ll learn how to preprocess ultra-high-resolution medical images, train an image classifier on the preprocessed images, and deploy a pretrained model as an API. Everything runs on Amazon SageMaker, and the result is a highly scalable machine learning pipeline.
https://amzn.to/3fAJowT
Albumentations 1.0.0 has been released!
Albumentations is a computer vision tool and a Python library designed to improve the performance of deep convolutional neural networks by enabling fast, flexible, cost- and resource-efficient image augmentations. The tool can be used for different CV tasks, including object classification, segmentation, and detection.
The new version adds 10 new transforms, no longer depends on imgaug, and includes bug fixes.
https://bit.ly/3fKM6jC
Data Science Digest — 02.06.21
The new issue of DataScienceDigest is here! Hop in to learn about the latest news, articles, tutorials, research papers, datasets, videos, and tools on DataScience, AI, ML, and BigData. All sections are prioritized for your convenience. Enjoy!
https://bit.ly/3vN2CF4
Join 👉 @DataScienceDigest
Fraud Detection: Using Relational Graph Learning to Detect Collusion
Uber’s popularity has attracted the attention of financial criminals in cyberspace. One type of fraudulent behavior is collusion, in which users cooperate to commit fraud. In this article, Uber Engineering presents a case study of applying a deep graph learning model, the relational graph convolutional network (RGCN), to detect such collusion.
https://ubr.to/3irmc6f
CogView: Mastering Text-to-Image Generation via Transformers
Text-to-image generation is a challenging task that requires powerful generative models and cross-modal understanding. CogView is a 4-billion-parameter Transformer with a VQ-VAE tokenizer that, according to the authors, achieves a new state-of-the-art FID on blurred MS COCO, outperforming previous GAN-based models and the recent, similar DALL-E.
Paper — https://bit.ly/3chno8b
Code — https://bit.ly/3ciDoqp
Demo — https://bit.ly/3z4Ba7N
Airflow and Ray: A Data Science Story
In this article, you’ll learn about a Ray provider for Apache Airflow. Ray is a Python-first cluster computing framework that lets Python code, even code that uses complex libraries or packages, be distributed and run on clusters of any size. With the provider, Airflow DAGs can be quickly turned into scalable machine learning pipelines.
https://bit.ly/3pp5Ufe
A Checklist to Track Your Data Science Progress
Progress is fickle. You may think you are moving forward while actually being stuck in a rut of repetition. That’s why you need a system to track your progress; for example, you can use this awesome checklist by Pascal Janetzky. Follow its steps to get an overview of your progress and find your next goal.
https://bit.ly/3gct7gK
DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort
DatasetGAN is an automatic procedure to generate massive datasets of high-quality semantically segmented images requiring minimal human effort. Presented by an international team of researchers, it outperforms all semi-supervised baselines and is on par with fully supervised methods using labor-intensive annotations.
Web Page — https://bit.ly/2T8S5WA
Paper — https://bit.ly/2SfcDMP
Code — Coming Soon
#DataScienceDigest #DataScience #MachineLearning #ArtificialIntelligence #AI #ML
Subscribe to our weekly newsletter — https://bit.ly/3fXLuXW
Long Text Generation by Modeling Sentence-Level and Discourse-Level Coherence
Generating long and coherent text is an important but challenging task. In this paper, the authors propose a long text generation model that represents the prefix sentences at sentence level and discourse level in the decoding process. Extensive experiments show that the model can generate more coherent texts than state-of-the-art baselines.
Paper — https://bit.ly/3uZvvfZ
Code — https://bit.ly/3vXsj61
Introducing Orbit, An Open Source Package for Time Series Inference and Forecasting
Orbit (Object-ORiented BayesIan Time Series) is a general interface for Bayesian time series modeling developed by Uber Engineering. In this article, you’ll learn the ins and outs of Orbit, from the basics and use cases to a tutorial and benchmarks to follow. Uber is going to introduce more dedicated Bayesian time series models, so the project is worth a look.
https://ubr.to/3v6Hbxy
Data Science Digest — 10.06.21
The new issue of DataScienceDigest is here! Machine learning in healthcare, the top 10 TED talks on AI, fraud detection at Uber, DatasetGAN, text-to-image generation via transformers, and more…
https://bit.ly/2TR87o9
Join 👉 @DataScienceDigest