DE & ML Digest
@mlbigdata
120 subscribers
9 photos
1 video
34.7K links
Collection of all articles on Data Engineering and Machine Learning
Contact: @luminousmen
Big Data
CAP and PACELC theorems in plain English
Blog | iamluminousmen
CAP and PACELC Theorems in Plain English
Understand the CAP and PACELC theorems in distributed systems. Learn how to navigate tradeoffs between consistency, availability, and partition tolerance for optimal system design.
Big Data
Architecturally Significant Requirements
Blog | iamluminousmen
Discover the crucial Architecturally Significant Requirements (ASR) for distributed systems, including Availability, Durability, Resiliency, Reliability, and Scalability. Learn how these factors impact system design and performance.
Big Data
Explaining the mechanics of Spark caching
Blog | iamluminousmen
Caching... There is so much in that word: the pain of invalidation and the joy of reusing computation. In Spark, this is known as an optimization technique.
Big Data
Machine Learning types
Blog | iamluminousmen
Machine Learning is based on the idea that analytic systems can learn to identify patterns and make decisions with minimal human involvement
Big Data
What is Serverless Architecture and what are its benefits?
Blog | iamluminousmen
There is so much hype around serverless architectures, but what do they really bring to the table for us? Are they the next standard in application development?
Big Data
Get Hive count in seconds
Big Data
Databricks’ Open Source Genomics Toolkit Outperforms Leading Tools
Databricks
How Glow Performs Genetic Association Studies 10x More Efficiently Than Hail
Learn more about Glow, the open-source toolkit for genomics data analytics that scales to population levels and the testing and benchmarking that shows it is up to 10x faster than competitors.
Big Data
Ray on Databricks
Databricks
How to Use Ray, a Distributed Python Framework, on Databricks
Learn how to use Ray on the Databricks Lakehouse Platform for reinforcement learning and custom distributed Python pipelines, including new use cases and optimizations.
Big Data
Things to consider while running Google Cloud Dataproc
Blog | iamluminousmen
There are many pitfalls that inexperienced engineers may encounter when building pipelines based on Cloud Dataproc. Let's look into them.
Big Data
Data Stream Processing
dzone.com
Data Stream Processing - DZone Big Data
In this post, we'll explore unbounded data, what data stream processing is, its characteristics and workflow, and why we should leverage the cloud for it.
Big Data
Scaling With Presto on Spark
dzone.com
Scaling With Presto on Spark - DZone Big Data
Presto on Spark enables more use cases for data analytics, providing a unified SQL experience for both interactive and batch use cases.
Big Data
Spark tips. Caching
Blog | iamluminousmen
Spark Tips. Caching
Another batch of Apache Spark tips, this time about caching and checkpointing data.
Big Data
HDFS vs Cloud-based Object Storage (S3)
Blog | iamluminousmen
I am very annoyed that all sorts of big data engineers confuse S3 and HDFS systems, assuming that S3 is the same as HDFS. That’s not true.
Big Data
Announcing Amazon SageMaker Ground Truth Plus – Create Training Datasets Without Code or In-house Resources
Amazon
Announcing Amazon SageMaker Ground Truth Plus – Create Training Datasets Without Code or In-house Resources | Amazon Web Services
Today, we’re pleased to announce the latest service in the Amazon SageMaker suite that will make labeling datasets easier than ever before. Ground Truth Plus is a turn-key service that uses an expert workforce to deliver high-quality training datasets fast…
Big Data
Announcing Amazon SageMaker Inference Recommender
Amazon
Announcing Amazon SageMaker Inference Recommender | Amazon Web Services
Today, we’re pleased to announce Amazon SageMaker Inference Recommender — a brand-new Amazon SageMaker Studio capability that automates load testing and optimizes model performance across machine learning (ML) instances. Ultimately, it reduces the time it…
Big Data
New – Introducing SageMaker Training Compiler
Amazon
New – Introducing SageMaker Training Compiler | Amazon Web Services
Today, we’re pleased to announce Amazon SageMaker Training Compiler, a new Amazon SageMaker capability that can accelerate the training of deep learning (DL) models by up to 50%. As DL models grow in complexity, so too does the time it can take to optimize…
Big Data
New AWS Scholarship Program Helps Underrepresented and Underserved Students Prep for Careers in AI and ML
Amazon
New AWS Scholarship Program Helps Underrepresented and Underserved Students Prep for Careers in AI and ML | Amazon Web Services
As a woman working in information technology (IT) for many years, it has always been close to my heart to challenge long-standing gender stereotypes and inspire more young learners to consider a career in tech. With artificial intelligence (AI) and machine…
Big Data
Turning 2 Trillion Data Points of Traffic Intelligence into Critical Business Insights
Databricks
Simplifying Geospatial Data Analysis With Python Using Databricks
Learn how we optimized our workflow at Intelematics by creating a library to work with geospatial data using Python on the Databricks Lakehouse Platform.
Big Data
Network Address Management and Auditing at Scale with Amazon VPC IP Address Manager
Amazon
Network Address Management and Auditing at Scale with Amazon VPC IP Address Manager | Amazon Web Services
Managing, monitoring, and auditing IP address allocation for at-scale networks, as the growth in cloud workloads and connected devices continues at a rapid pace, is a complex, time-consuming, and potentially error-prone task. Traditionally, network administrators…
ML
Principal Component Analysis for Visualization
MachineLearningMastery.com
Principal Component Analysis for Visualization - MachineLearningMastery.com
Principal component analysis (PCA) is an unsupervised machine learning technique. Perhaps the most popular use of principal component analysis is dimensionality reduction. Besides using PCA as a data preparation technique, we can also use it to help visualize…
ML
Face Recognition using Principal Component Analysis