If you work in one of these areas:
- Back-end development
- Data analytics
- Data science
- DevOps
and you are based in Armenia and want to become a data engineer, reach out to me (check channel description).
After sort and distribution keys, column compression encoding can now also be optimized automagically 😉
Amazon
Amazon Redshift announces Automatic Table Optimization
Came across this comparison while reading the Firebolt white paper. Not sure why the authors used a Redshift cluster that is much larger than the 1 TB dataset, maybe to add more computational power. Anyway, don't get scared by these clusters' prices.
Fivetran
Cloud Data Warehouse Benchmark | Blog | Fivetran
Our newest benchmark compares price, performance and differentiated features for Redshift, Snowflake, BigQuery, Databricks and Synapse.
It seems these days it's all about Ops: DevOps, MLOps, DevSecOps, and now we have Ops for data, DataOps 😎. And yes, this term is heard more often lately, but I'm not sure it is something new. I guess data engineers were already covering these aspects.
https://www.linkedin.com/posts/firebolt_how-vimeo-keeps-data-intact-with-85-billion-activity-6833790832726810624-Sdjs
Linkedin
Firebolt on LinkedIn: How Vimeo Keeps Data Intact with 85 Billion Events Per Month
WHEN IT'S CLEAR YOU NEED A DATA OPS TEAM
Lior Solomon, VP Data Engineering at Vimeo shares his own experience on The Data Engineering Show:
* What made him…
In the data engineering landscape there are always new and interesting projects. And sometimes the only way you hear about them is by talking to other data professionals. So here are two cool projects I learned about from a friend:
1. Presidio: Data protection and anonymization library from Microsoft
2. Trino: a query engine from the creators of Presto (formerly known as PrestoSQL)
microsoft.github.io
Home - Microsoft Presidio
PII anonymization for text, images, and structured data.
It seems one tool, dbt, is driving demand for the new analytics engineer specialization. Spark is popular too, and is often considered the main tool for data engineers, but it did not create a specialization.
Medium
Analytics Engineer: The Newest Data Career Role
Is it analytics? Is it engineering? What do these people do?
Data Engineering Annotated Monthly – August 2021 | The Big Data Tools Blog
https://blog.jetbrains.com/big-data-tools/2021/09/06/data-engineering-annotated-monthly-august-2021/
The JetBrains Blog
August is usually a quiet month, with vacations taking their toll. But data engineering never stops. I’m Pasha Finkelshteyn and I will be your guide through this month’s news, my impressions of the de
Beginner's Series to Rust | Channel 9
https://channel9.msdn.com/Series/Beginners-Series-to-Rust?ocid=eml_pg293709_gdc_comm_az&mkt_tok=MTU3LUdRRS0zODIAAAF_aGNH8C6GnJuFIXOAMh12cOOZzlysGY-QSiGzExcs0QGgifOYInHhOMlSCA6styKmVWUcN3lrkTDBm1Bx3q8FUqYN0pt0w9Iqv4MXAoq-GZ-CDNm86IIZOw
Msdn
Rust has been ranked as one of the most loved languages by developers. In this series, you will learn the fundamentals of Rust development. We'll start by downloading the tools you need to program wit
I can say from my own experience that this is much better than post-factum analytics integration with traditional ETL. I just did not know that the term for it is IDT.
https://medium.com/whispering-data/the-end-of-etl-as-we-know-it-92166c19084c
Medium
The End of ETL As We Know It
If you’re as sick of this three-letter phrase as I am, you’ll be happy to know there is another way.
Great presentation about Data Mesh, a term coined by Zhamak Dehghani in her original article.
YouTube
Data Mesh in Practice: How Europe's Leading Online Platform for Fashion Goes Beyond the Data Lake
The Data Lake paradigm is often considered the scalable successor of the more curated Data Warehouse approach when it comes to democratization of data. However, many who went out to build a centralized Data Lake came out with a data swamp of unclear responsibilities…
There are almost no books on data engineering that focus on concepts and problems rather than on specific technologies. This book is one of the rare ones. It is a collection of advice, problems and solutions, or just ideas to reflect on.
Goodreads
97 Things Every Data Engineer Should Know: Collective W…
Take advantage of today's sky-high demand for data engi…
I wrote a summary comparing Azure Synapse, Databricks and Azure Data Explorer, focusing on the features that I find important.
https://medium.com/@gorros/azure-synapse-databricks-and-azure-data-explorer-kusto-73a3a0339cf2
Medium
Azure Synapse, Databricks, and Azure Data Explorer (Kusto)
Which analytical platform to choose?
HPE introduced its new unified analytics platform, HPE GreenLake.
Hpe
HPE GreenLake Announcement
See why enterprises are embracing a cloud-everywhere strategy with HPE GreenLake. Join HPE President and CEO Antonio Neri for a special broadcast on April 4. Learn about the next set of cloud services launching on HPE GreenLake edge-to-cloud platform.