Data1984 – Telegram
Data1984
787 subscribers
44 photos
1 video
17 files
762 links
This channel is mostly about data-related topics; the main ones are #DataEngineering, #SQL, #Python and #cloud.

Contact: @gorros
Forwarded from VG Recruiting Agency (IT) (Вазген Галоян)
Talkdesk is looking for a savvy Senior Data Engineer (REMOTE).

Deadline: 28.12.2020

Full details about this and all our other vacancies are available on our website via the following link:

https://www.talkdesk.com/careers/td/remote/engineering/senior-data-engineer-2445777?gh_jid=2445777
Here are some major updates for Lambda that will probably make you rethink your serverless architecture 😉
#reInvent #AWS #Lambda
New for AWS Lambda – Container Image Support
New for AWS Lambda – 1ms Billing Granularity Adds Cost Savings
New for AWS Lambda – Functions with Up to 10 GB of Memory and 6 vCPUs
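To make the 1 ms billing change concrete, here is a minimal sketch of the arithmetic. It assumes the previous model rounded each invocation's duration up to the next 100 ms and that duration is billed in GB-seconds (configured memory in GB × billed seconds). The function name and sample figures are illustrative, and per-GB-second prices (which vary by region) are deliberately left out.

```python
import math


def billed_gb_seconds(duration_ms: float, memory_mb: int, granularity_ms: int) -> float:
    """Return billed compute (GB-seconds) for a single invocation.

    duration_ms    -- measured run time of the invocation
    memory_mb      -- memory configured for the function
    granularity_ms -- billing increment (assumed 100 before the change, 1 after)
    """
    # Round the duration up to the next billing increment.
    billed_ms = math.ceil(duration_ms / granularity_ms) * granularity_ms
    # Convert milliseconds to seconds and megabytes to gigabytes.
    return (billed_ms / 1000) * (memory_mb / 1024)


if __name__ == "__main__":
    old = billed_gb_seconds(42, 128, granularity_ms=100)  # billed as 100 ms
    new = billed_gb_seconds(42, 128, granularity_ms=1)    # billed as 42 ms
    print(f"100 ms rounding: {old:.5f} GB-s")
    print(f"1 ms rounding:   {new:.5f} GB-s")
    print(f"saving: {1 - new / old:.0%}")
```

Short-running functions benefit the most: a 42 ms invocation used to be billed as 100 ms, so in this example billed compute drops by 58%.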
#reInvent
Amazon S3 Update – Strong Read-After-Write Consistency:
Effective immediately, all S3 GET, PUT, and LIST operations, as well as operations that change object tags, ACLs, or metadata, are now strongly consistent!
This is especially important if you use S3 as a data lake and process the data with EMR.
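As an illustration of what strong consistency means in practice, here is a minimal sketch of a write-then-read check. It assumes `s3` is a boto3 S3 client (for example `boto3.client("s3")`); the function name `write_then_verify` and any bucket or key values are hypothetical. Under the new model, the GET and LIST issued right after the PUT should see the new object without the retries or extra consistency layers (such as EMRFS consistent view) that pipelines used to rely on.

```python
def write_then_verify(s3, bucket: str, key: str, body: bytes) -> bool:
    """Write an object, then immediately read it back and list it.

    `s3` is assumed to be a boto3 S3 client; this helper is illustrative only.
    """
    s3.put_object(Bucket=bucket, Key=key, Body=body)

    # Read-after-write: the GET should return the bytes just written.
    fetched = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

    # LIST is now strongly consistent too, so the new key should appear at once.
    response = s3.list_objects_v2(Bucket=bucket, Prefix=key)
    listed_keys = {obj["Key"] for obj in response.get("Contents", [])}

    return fetched == body and key in listed_keys
```

Passing the client in, rather than creating it inside the function, keeps the sketch easy to exercise against a stub instead of a real bucket.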
#reInvent #AWS
Data analytics and engineering-related updates:
Announcing Amazon Redshift data sharing (preview)
Amazon EMR Studio (Preview): A new notebook-first IDE experience with Amazon EMR
Amazon Redshift announces native console integration with partners (Preview)
Amazon Redshift announces support for native JSON and semi-structured data processing (preview)
Simplify running Apache Spark jobs with Amazon EMR on Amazon EKS
#AWS #reInvent #Redshift
Amazon Redshift announces Automatic Table Optimization
Amazon Redshift now includes Amazon RDS for MySQL and Amazon Aurora MySQL databases as new data sources for federated querying (Preview)
Amazon Redshift launches RA3.xlplus nodes with managed storage
Forwarded from Инжиниринг Данных (Dmitry Anoshin)
Lakehouse = DW + Data Lake.

Lakehouse examples:
- Redshift + Redshift Spectrum
- Snowflake
- Databricks Delta Lake
- Azure Synapse Analytics

I came across a very interesting paper, recently published by the founders of Databricks.

This paper argues that the data warehouse architecture as we know it today will wither in the coming years and be replaced by a new architectural pattern, the Lakehouse, which will (i) be based on open direct-access data formats, such as Apache Parquet, (ii) have first class support for machine learning and data science, and (iii) offer state-of-the-art performance. Lakehouses can help address several major challenges with data warehouses, including data staleness, reliability, total cost of ownership, data lock-in, and limited use-case support.
I am using #dbt on my latest project, and it is definitely a great tool. I wouldn't compare it with #Spark, which is a great tool for crunching data both on a single machine and on a cluster. For me, #dbt shines because you can keep your reports in version control and deploy them via CI/CD.
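The CI/CD point can be sketched as a small Python step that a pipeline runs on every merge. This assumes the dbt CLI is installed on the CI runner and that `profiles.yml` defines a target named `ci` (a hypothetical name). `dbt deps`, `dbt run`, `dbt test` and the `--target` flag are standard dbt commands; the helper functions themselves are illustrative only.

```python
import subprocess
from typing import Callable, List


def dbt_ci_commands(target: str) -> List[List[str]]:
    """Build the dbt commands a CI job might run against `target`."""
    return [
        ["dbt", "deps"],                      # install packages listed in packages.yml
        ["dbt", "run", "--target", target],   # build the models
        ["dbt", "test", "--target", target],  # run the project's tests
    ]


def run_all(commands: List[List[str]], runner: Callable = subprocess.run) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        # check=True makes subprocess.run raise CalledProcessError on a non-zero
        # exit code, which fails the CI job before any later step runs.
        runner(cmd, check=True)
```

In a pipeline this would be invoked as `run_all(dbt_ci_commands("ci"))`; keeping the command list separate from execution makes it easy to review in version control alongside the models.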