Data1984 – Telegram
This channel is mostly about data-related topics. The main ones are #DataEngineering #SQL #Python #cloud.

Contact: @gorros
#reInvent #AWS
Data analytics and engineering related updates:
Announcing Amazon Redshift data sharing (preview)
Amazon EMR Studio (Preview): A new notebook-first IDE experience with Amazon EMR
Amazon Redshift announces native console integration with partners (Preview)
Amazon Redshift announces support for native JSON and semi-structured data processing (preview)
Simplify running Apache Spark jobs with Amazon EMR on Amazon EKS
#AWS #reInvent #Redshift
Amazon Redshift announces Automatic Table Optimization
Amazon Redshift now includes Amazon RDS for MySQL and Amazon Aurora MySQL databases as new data sources for federated querying (Preview)
Amazon Redshift launches RA3.xlplus nodes with managed storage
Forwarded from Инжиниринг Данных (Dmitry Anoshin)
Lakehouse = DW + Data Lake.

Examples of a lakehouse:
- Redshift + Redshift Spectrum
- Snowflake
- Databricks Delta Lake
- Azure Synapse Analytics

I came across a very interesting paper that was published only recently by the founders of Databricks.

This paper argues that the data warehouse architecture as we know it today will wither in the coming years and be replaced by a new architectural pattern, the Lakehouse, which will (i) be based on open direct-access data formats, such as Apache Parquet, (ii) have first class support for machine learning and data science, and (iii) offer state-of-the-art performance. Lakehouses can help address several major challenges with data warehouses, including data staleness, reliability, total cost of ownership, data lock-in, and limited use-case support.
I am using #dbt for my latest project and it is definitely a great tool. I wouldn't compare it with #Spark, which is a great tool for crunching data both on a single machine and on a cluster. But for me #dbt is great because you can store your reports in version control and deploy them via CI/CD.
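A rough sketch of the "reports in version control" idea, not dbt itself: each report is a plain SELECT statement kept as text, and a deploy step turns it into a view. It uses Python's built-in sqlite3 so it runs anywhere. The model name `daily_revenue`, the `orders` table, the sample rows, and the helpers `deploy_models` and `build_demo` are all made up for illustration.

```python
import sqlite3

# Hypothetical "model": in a dbt-style project this SELECT would live in its own
# .sql file under version control. The name and query are illustrative only.
MODELS = {
    "daily_revenue": """
        SELECT order_date, SUM(amount) AS revenue
        FROM orders
        GROUP BY order_date
    """,
}


def deploy_models(conn, models):
    """Materialize each model as a view, replacing any previous version."""
    for name, select_sql in models.items():
        conn.execute(f"DROP VIEW IF EXISTS {name}")
        conn.execute(f"CREATE VIEW {name} AS {select_sql}")
    conn.commit()


def build_demo():
    """Create an in-memory database with made-up sample rows and deploy the models."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_date TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("2020-12-01", 10.0), ("2020-12-01", 5.0), ("2020-12-02", 7.5)],
    )
    deploy_models(conn, MODELS)
    return conn


if __name__ == "__main__":
    conn = build_demo()
    for row in conn.execute("SELECT * FROM daily_revenue ORDER BY order_date"):
        print(row)
```

Because the deploy step drops and recreates each view, it can be rerun safely, which is roughly what a CI/CD job would do after a change to the SQL is merged.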