Here is a great resource for getting familiar with AWS Data Analytics solutions, even if you are not planning to take the exam. It is a very interactive and visually pleasing course.
https://www.aws.training/Details/eLearning?id=46612
Forwarded from DataEng
CAP theorem for data engineers: https://www.analyticsvidhya.com/blog/2020/08/a-beginners-guide-to-cap-theorem-for-data-engineering/
Analytics Vidhya
A Beginner's Guide to CAP Theorem for Data Engineering
CAP theorem helps to handle your distributed database systems when a few database servers refuse to communicate with each other.
If you noticed, I have not been actively posting in the channel recently because I was busy preparing for an exam. I want to share this guide in case you are considering taking it as well.
https://towardsdatascience.com/becoming-an-aws-certified-data-analytics-new-april-2020-4a3ef0d9f23a?gi=b2f48e1e3986
Medium
Becoming an AWS Certified Data Analytics — NEW April 2020
A guide to become the next AWS Certified Data Analytics expert.
An interesting new feature-rich data warehouse based on AWS, with an unusual pricing approach. I especially like the dedup feature.
https://www.dataengineeringpodcast.com/firebolt-cloud-data-warehouse-episode-148/
https://www.firebolt.io/
Data Engineering Podcast
Building A Better Data Warehouse For The Cloud At Firebolt - Episode 148
Data warehouse technology has been around for decades and has gone through several generational shifts in that time. The current trends in data warehousing are oriented around cloud native architectures that take advantage of dynamic scaling and the separation…
Forwarded from DataEng
Skills map of a modern data engineer: https://github.com/datastacktv/data-engineer-roadmap
A good complement to my article: https://khashtamov.com/ru/data-engineer/
GitHub
GitHub - datastacktv/data-engineer-roadmap: Roadmap to becoming a data engineer in 2021
Roadmap to becoming a data engineer in 2021. Contribute to datastacktv/data-engineer-roadmap development by creating an account on GitHub.
Forwarded from DataEng
Amazon Redshift now supports working with the database over HTTPS: https://aws.amazon.com/ru/about-aws/whats-new/2020/09/announcing-data-api-for-amazon-redshift/
Amazon
Announcing Data API for Amazon Redshift
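For readers who want to see what the Data API looks like in practice, here is a minimal sketch, not an official recipe. It assumes a client that behaves like boto3's `redshift-data` client (`execute_statement`, `describe_statement`, `get_statement_result`). The function name `run_query` and all argument values are hypothetical, and the statement is assumed to be a SELECT that returns rows.

```python
import time


def run_query(client, sql, cluster_id, database, db_user, poll_seconds=1.0):
    """Submit SQL through the Redshift Data API and wait for the rows.

    `client` is assumed to behave like boto3.client("redshift-data").
    """
    # execute_statement is asynchronous: it returns a statement Id right away.
    submitted = client.execute_statement(
        ClusterIdentifier=cluster_id,
        Database=database,
        DbUser=db_user,
        Sql=sql,
    )
    statement_id = submitted["Id"]

    # Poll until the statement reaches a terminal state.
    while True:
        status = client.describe_statement(Id=statement_id)
        if status["Status"] == "FINISHED":
            break
        if status["Status"] in ("FAILED", "ABORTED"):
            raise RuntimeError(status.get("Error", "statement did not finish"))
        time.sleep(poll_seconds)

    # Only the first page of results is fetched in this sketch.
    return client.get_statement_result(Id=statement_id)["Records"]
```

With real credentials, this would probably be called as `run_query(boto3.client("redshift-data"), "select 1", "my-cluster", "dev", "awsuser")`, where the cluster, database, and user names are placeholders. No JDBC/ODBC driver or persistent connection is needed, which is the main point of the feature.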
AWS Glue now supports serverless streaming ETL
https://aws.amazon.com/about-aws/whats-new/2020/04/aws-glue-now-supports-serverless-streaming-etl/
Amazon
AWS Glue now supports serverless streaming ETL
You may find the above AWS updates random and unimportant, but to me they show a general trend. Basically, you can do one thing in many ways, and there is better integration between different ETL types: streaming, batch, and CDC. However, it is sometimes hard to choose between services. #AWS
Data Quality Implementation in Data Warehouses | Toptal
https://www.toptal.com/database/data-warehouse-data-quality-process?utm_campaign=Toptal%20Engineering%20Blog&utm_medium=email&_hsmi=94506066&_hsenc=p2ANqtz-96vNKU-BhpAA5fMC-8gJ3pDBq23ob6VF1lvqOxgYVoATaLUYMQexXEDzJN9c-dFoeH5APIz07aa8hrA2PL9wlSXo1PA62qBFqdkzcTTTOV4AIoWbE&utm_content=94506066&utm_source=hs_email
Toptal Engineering Blog
Data Quality Implementation in Data Warehouses
Data quality is a crucial element of any successful data warehouse solution. As the complexity of data warehouses increases, so does the need for data quality processes.
A very useful and relevant blog post about data deletion in a data lake. Besides the suggested solution, I would also mention Delta Lake as an alternative. Finally, it would have been great if the author had mentioned cost considerations.
https://aws.amazon.com/blogs/big-data/how-to-delete-user-data-in-an-aws-data-lake/
Amazon
How to delete user data in an AWS data lake | Amazon Web Services
General Data Protection Regulation (GDPR) is an important aspect of today’s technology world, and processing data in compliance with GDPR is a necessity for those who implement solutions within the AWS public cloud. One article of GDPR is the “right to erasure”…
Amazing statistics about data.
https://www.datanami.com/2020/09/04/10-big-data-statistics-that-will-blow-your-mind/?utm_source=rss&utm_medium=rss&utm_campaign=10-big-data-statistics-that-will-blow-your-mind
Datanami
10 Big Data Statistics That Will Blow Your Mind
They call it “big data” for a reason--it's really, really big. But getting your head wrapped around the growth of information digitization is not easy.
20x improvement compared to #Spark 2.4
https://techcommunity.microsoft.com/t5/azure-databricks/turbocharge-azure-databricks-with-photon-powered-delta-engine/ba-p/1694929
TECHCOMMUNITY.MICROSOFT.COM
Turbocharge Azure Databricks with Photon powered Delta Engine
Today we are excited to announce the preview of Photon powered Delta Engine on Azure Databricks – fast, easy, and collaborative Analytics and AI service. Built from scratch in C++ and fully compatible with Spark APIs, Photon is a vectorized query engine that…
Most of the subscribers know why I've paused posting in the channel. I think most of you are busy now with other important issues. So I would like to create a poll to ask you whether you would like to see new posts or not yet. Thank you for understanding.
Anonymous Poll
Yes: 63%
Not yet: 37%