Apache Beam in 12 minutes: https://www.youtube.com/watch?v=yZUe4th9gwY
YouTube
Apache Beam Explained in 12 Minutes
Apache Beam is a popular parallel processing framework. In this video, Alexandra will give you an overview of Apache Beam and by the end of the video you will hopefully have the skills that you need to write a simple pipeline.
Source code - https://gith…
Serverless Data Lake Framework Workshop :: Serverless Data Lake Framework (SDLF) Workshop
https://sdlf.workshop.aws/
sdlf.workshop.aws
Serverless Data Lake Framework (SDLF) Workshop
Out of the blue: Packt Publishing has released the book Data Engineering with Python: https://www.packtpub.com/product/data-engineering-with-python/9781839214189
The book focuses on building data pipelines with Apache Airflow and Apache NiFi. It also has chapters on Kafka and Spark.
Packt
Data Engineering with Python | Packt
Build, monitor, and manage real-time data pipelines to create data engineering infrastructure efficiently using open-source Apache projects
A Python connector has been released for Redshift: https://github.com/aws/amazon-redshift-python-driver
GitHub
GitHub - aws/amazon-redshift-python-driver: Redshift Python Connector. It supports Python Database API Specification v2.0.
Redshift Python Connector. It supports Python Database API Specification v2.0. - aws/amazon-redshift-python-driver
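Since the driver advertises DB-API 2.0 support, query code can be written against that generic interface. The sketch below is an illustration, not the driver's documented usage: it runs against the standard-library `sqlite3` module as a stand-in so that it works without a cluster, the `redshift_connector.connect(...)` call appears only in a comment with placeholder values, and the helper name `fetch_rows` and the `events` table are hypothetical.

```python
import sqlite3


def fetch_rows(conn, sql):
    """Run a query through any DB-API 2.0 connection and return all rows."""
    cursor = conn.cursor()
    try:
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        cursor.close()


# Stand-in connection (assumption: used only so the sketch runs locally).
# With the Redshift driver installed, the connection would instead come from
# something like:
#   import redshift_connector
#   conn = redshift_connector.connect(
#       host="<cluster-endpoint>", database="<db>",
#       user="<user>", password="<password>")
conn = sqlite3.connect(":memory:")
setup = conn.cursor()
setup.execute("CREATE TABLE events (id INTEGER, name TEXT)")
setup.execute("INSERT INTO events VALUES (1, 'click')")
setup.execute("INSERT INTO events VALUES (2, 'view')")
conn.commit()
setup.close()

rows = fetch_rows(conn, "SELECT id, name FROM events ORDER BY id")
print(rows)
```

Parameter placeholder styles differ between DB-API drivers, so the sketch avoids query parameters altogether.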
A great comparison of the most popular cloud data warehouses (BigQuery, Amazon Redshift, and Snowflake): https://poplindata.com/data-warehouses/2020-database-showdown-bigquery-vs-redshift-vs-snowflake/
Snowplow
Snowplow Behavioral Data Platform - Fuel AI, Analytics, Marketing | Snowplow
Snowplow empowers organizations to unlock the value of its customer behavioral data in their cloud data warehouse to fuel next-gen AI, analytics, and marketing.
You can enroll for free in the Google Associate Cloud Engineer 2020 course on Udemy: https://www.udemy.com/course/google-certified-associate-cloud-engineer-2019-prep-course/
Udemy
Google Cloud Associate Cloud Engineer: Get Certified 2024
Learn How to Pass the Exam from the author of the Official Certification Guide for Google
Link with a coupon for the free course: https://www.udemy.com/course/google-certified-associate-cloud-engineer-2019-prep-course/?couponCode=23FFEC011AB4ED7E351B
Lectures on distributed systems: https://www.youtube.com/playlist?list=PLeKd45zvjcDFUEv_ohr_HdUFe97RItdiB
Forwarded from Data1984
A comparison of data version control tools.
https://dagshub.com/blog/data-version-control-tools/
DagsHub Blog
Comparing Data Version Control Tools - 2020
Data versioning is one of the keys to automating a team's machine learning model development. While it can be very complicated if your team attempts to develop its own system to manage the process, this doesn’t need to be the case.
Forwarded from Data1984
Some important updates from #AWS:
✅ Amazon Kinesis Data Streams enables data stream retention up to one year.
✅ Now you can export your Amazon DynamoDB table data to your data lake in Amazon S3 to perform analytics at any scale.
✅ Amazon Redshift now supports modifying column compression encodings to optimize storage utilization and query performance.
✅ Amazon Athena announces availability of engine version 2.
Amazon
Amazon Kinesis Data Streams enables data stream retention up to one year
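To make the Kinesis update concrete: one year of retention is 365 × 24 = 8760 hours, and hours are the unit the Kinesis API uses for the retention period. A minimal sketch, assuming boto3 and its `increase_stream_retention_period` call; the stream name `example-stream` and both helper names are hypothetical, and the function that actually calls AWS is defined but not invoked here because it needs credentials.

```python
HOURS_PER_DAY = 24
ONE_YEAR_HOURS = 365 * HOURS_PER_DAY  # 8760


def retention_request(stream_name, days):
    """Build keyword arguments for a Kinesis retention-period change."""
    hours = days * HOURS_PER_DAY
    if hours > ONE_YEAR_HOURS:
        raise ValueError("retention longer than one year is not supported")
    return {"StreamName": stream_name, "RetentionPeriodHours": hours}


def extend_retention(stream_name, days):
    """Apply the change with boto3 (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper above works without it

    client = boto3.client("kinesis")
    client.increase_stream_retention_period(**retention_request(stream_name, days))


params = retention_request("example-stream", 365)
print(params)
```

Keeping the request construction separate from the API call lets the arithmetic be checked locally before anything is sent to AWS.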
Found an interesting Apache project (still in the incubator stage), Apache Liminal: http://liminal.incubator.apache.org/
It is a platform for orchestrating machine learning. As far as I understand, it uses Apache Airflow under the hood.
liminal.incubator.apache.org
Apache Liminal official site
A little earlier I posted a series of lectures on distributed systems by Martin Kleppmann, and a new post recently appeared on his blog: https://martin.kleppmann.com/2020/11/18/distributed-systems-and-elliptic-curves.html
Forwarded from Инжиниринг Данных (Dmitry Anoshin)
Netflix has built yet another tool, Bulldozer, for exporting data from a data warehouse to NoSQL stores. https://netflixtechblog.com/bulldozer-batch-data-moving-from-data-warehouse-to-online-key-value-stores-41bac13863f8
Medium
Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores
By Tianlong Chen and Ioannis Papapanagiotou
A video series on what's new in Airflow 2.0: https://bit.ly/395ib2C
YouTube
Airflow 2.0 - YouTube