This reminds me of a solution we designed and implemented a couple of years ago. Back then we used DynamoDB Streams to capture item-level changes with exactly-once semantics, Lambda to transform the data, and Kinesis Data Firehose to deliver it to Redshift. Things look much simpler now.
Amazon
Near-real-time analytics using Amazon Redshift streaming ingestion with Amazon Kinesis Data Streams and Amazon DynamoDB | Amazon…
Amazon Redshift is a fully managed, scalable cloud data warehouse that accelerates your time to insights with fast, easy, and secure analytics at scale. Tens of thousands of customers rely on Amazon Redshift to analyze exabytes of data and run complex analytical…
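For comparison, the Lambda step of that older DynamoDB Streams → Lambda → Firehose design might look roughly like the sketch below. It is not the original implementation. It assumes the stream view type includes new images (NEW_IMAGE or NEW_AND_OLD_IMAGES), a hypothetical delivery stream named `my-delivery-stream`, and newline-delimited JSON as the format Redshift's COPY reads from the Firehose staging files. Retrying partial failures is left out.

```python
import json

def from_dynamodb_json(value):
    """Convert one DynamoDB-typed attribute, e.g. {"N": "5"}, into a plain Python value."""
    type_tag, raw = next(iter(value.items()))
    if type_tag in ("S", "BOOL"):
        return raw
    if type_tag == "N":
        # Numbers arrive as strings; int if integral, else float (a simplification).
        return int(raw) if raw.lstrip("-").isdigit() else float(raw)
    if type_tag == "NULL":
        return None
    if type_tag == "M":
        return {k: from_dynamodb_json(v) for k, v in raw.items()}
    if type_tag == "L":
        return [from_dynamodb_json(v) for v in raw]
    raise ValueError(f"unsupported DynamoDB type in this sketch: {type_tag}")

def to_firehose_records(event):
    """Flatten DynamoDB stream records into newline-delimited JSON Firehose records."""
    records = []
    for rec in event.get("Records", []):
        change = rec["dynamodb"]
        # REMOVE events carry no NewImage, so fall back to the primary key.
        image = change.get("NewImage") or change.get("Keys", {})
        row = {k: from_dynamodb_json(v) for k, v in image.items()}
        row["_op"] = rec["eventName"]  # INSERT, MODIFY or REMOVE
        records.append({"Data": (json.dumps(row) + "\n").encode("utf-8")})
    return records

def handler(event, context, firehose=None, stream_name="my-delivery-stream"):
    """Lambda entry point; `firehose` and the hypothetical `stream_name` are injectable for testing."""
    records = to_firehose_records(event)
    if records and firehose is None:
        import boto3  # bundled with the Lambda Python runtime
        firehose = boto3.client("firehose")
    # put_record_batch accepts at most 500 records per call.
    # FailedPutCount in the response is not retried in this sketch.
    for start in range(0, len(records), 500):
        firehose.put_record_batch(
            DeliveryStreamName=stream_name, Records=records[start:start + 500]
        )
    return {"forwarded": len(records)}
```

In practice `boto3.dynamodb.types.TypeDeserializer` covers the full type set (sets, binary, Decimal numbers); the hand-written converter only keeps the sketch dependency-free. Redshift streaming ingestion removes this whole layer.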
Andy Warfield, VP and Distinguished Engineer at Amazon S3, tells the story of building S3.
YouTube
FAST '23 - Building and Operating a Pretty Big Storage System (My Adventures in Amazon S3)
Building and Operating a Pretty Big Storage System (My Adventures in Amazon S3)
Andy Warfield, Amazon
Five years ago I decided to leave my faculty position at UBC and join Amazon. A lot of that time has been spent working as an engineer on the S3 team.…
LinkedIn remains one of the coolest places for data engineering, where many popular open-source technologies originate.
https://engineering.linkedin.com/blog/2023/declarative-data-pipelines-with-hoptimator
LinkedIn
Declarative Data Pipelines with Hoptimator
Microsoft today announced the public preview of Python in Excel. So this is what Guido van Rossum has been working on 😁 (the creator of Python joined Microsoft three years ago).
TECHCOMMUNITY.MICROSOFT.COM
Announcing Python in Excel
Announcing Python in Excel: Combining the power of Python and the flexibility of Excel.
It looks like data lineage, profiling, and quality are getting more attention in data tools.
Finally, there is a dedicated AWS certification for data engineers:
https://aws.amazon.com/certification/certified-data-engineer-associate/
Amazon
AWS Certified Data Engineer - Associate
Category: Associate. Exam duration: 130 minutes. Exam format: 65 questions, either multiple choice or multiple response. Cost: 150 USD.
https://www.dremio.com/blog/exploring-the-architecture-of-apache-iceberg-delta-lake-and-apache-hudi/
Dremio
Exploring the Architecture of Apache Iceberg, Delta Lake, and Apache Hudi | Dremio
Understand how different formats handle metadata for ACID transactions, time travel, and schema evolution in data lakehouses.
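All three formats implement time travel by keeping a log of table versions in metadata (Iceberg snapshots, Delta Lake commit files under `_delta_log`, Hudi timeline instants). The toy model below shows only that shared idea for the Iceberg-style case: an ordered snapshot log and an "as of timestamp" lookup. The `Snapshot` fields and manifest paths are simplified assumptions, not any format's real schema.

```python
from bisect import bisect_right
from dataclasses import dataclass

@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    timestamp_ms: int
    manifest_list: str  # path to the files that make up this table version (hypothetical)

class TableMetadata:
    """Toy table metadata: an append-only, time-ordered snapshot log."""

    def __init__(self):
        self.snapshots = []

    def commit(self, snapshot):
        # Real formats make this step atomic (Iceberg, for example, swaps the
        # catalog's pointer to a new metadata file); here we only enforce ordering.
        if self.snapshots and snapshot.timestamp_ms <= self.snapshots[-1].timestamp_ms:
            raise ValueError("snapshot timestamps must increase")
        self.snapshots.append(snapshot)

    def as_of(self, timestamp_ms):
        """Time travel: the latest snapshot committed at or before timestamp_ms."""
        times = [s.timestamp_ms for s in self.snapshots]
        i = bisect_right(times, timestamp_ms)  # index just past all times <= timestamp_ms
        if i == 0:
            raise LookupError("no snapshot exists at or before that time")
        return self.snapshots[i - 1]
```

Readers never see a half-written version: they resolve one snapshot and read only the files it references, which is what gives snapshot isolation on top of immutable data files.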
RAPIDS cuDF Accelerates pandas Nearly 150x with Zero Code Changes | NVIDIA Technical Blog
https://developer.nvidia.com/blog/rapids-cudf-accelerates-pandas-nearly-150x-with-zero-code-changes/
NVIDIA Technical Blog
RAPIDS cuDF Accelerates pandas Nearly 150x with Zero Code Changes
At NVIDIA GTC 2024, it was announced that RAPIDS cuDF can now bring GPU acceleration to 9.5 million pandas users without requiring them to change their code. pandas, a flexible and powerful data…
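"Zero code changes" means the pandas code itself stays the same; acceleration is switched on from outside it, via `%load_ext cudf.pandas` in Jupyter or `python -m cudf.pandas script.py` on the command line. The sketch below uses the third documented route, calling `cudf.pandas.install()` before pandas is imported, and falls back to plain CPU pandas when cuDF is not installed. The `revenue_by_region` function and its columns are made-up examples.

```python
# Opt into RAPIDS cuDF's pandas accelerator mode when it is available.
# install() must run before pandas is imported; otherwise we stay on CPU pandas.
try:
    import cudf.pandas
    cudf.pandas.install()
except ImportError:
    pass

import pandas as pd  # unchanged pandas code from here on

def revenue_by_region(df: pd.DataFrame) -> pd.DataFrame:
    """Ordinary pandas: filter, group, aggregate, sort (hypothetical columns)."""
    return (
        df[df["amount"] > 0]
        .groupby("region", as_index=False)["amount"]
        .sum()
        .sort_values("amount", ascending=False, ignore_index=True)
    )
```

Operations cuDF does not support fall back to CPU pandas automatically, so speedups depend heavily on the workload; the "nearly 150x" figure is NVIDIA's benchmark, not a general expectation.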
Important announcements from AWS re:Invent 2023:
1. Announcing zero-ETL integrations with AWS Databases and Amazon Redshift
2. AWS launches S3 Express One Zone, promises 10x write speed improvement
3. AWS announces support for large language models in Amazon Redshift ML
Amazon
Announcing zero-ETL integrations with AWS Databases and Amazon Redshift | Amazon Web Services
As customers become more data driven and use data as a source of competitive advantage, they want to easily run analytics on their data to better understand their core business drivers to grow sales, reduce costs, and optimize their businesses. To run analytics…
The new Elasticsearch Query Language (ES|QL) looks a lot like Kusto Query Language (KQL), which I currently use a lot.
Datanami
Elastic Looks to Simplify Queries with New Piped Language
Elasticsearch Query Language (ES|QL), Elastic’s latest offering that introduces a piped query syntax and supports concurrent processing, is being touted
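What ES|QL and KQL share is the piped shape: a source followed by stages, each consuming the previous stage's output. The toy Python model below illustrates only that idea, not either engine's implementation; the `logs` rows, field names, and stage helpers are made up. The query strings in the comments are approximate and worth checking against each language's reference.

```python
from collections import Counter
from functools import reduce

# ES|QL (approx.): FROM logs | WHERE status >= 500 | STATS count = COUNT(*) BY host
#                  | SORT count DESC | LIMIT 2
# KQL   (approx.): logs | where status >= 500 | summarize Count = count() by host
#                  | top 2 by Count desc

def where(pred):
    return lambda rows: [r for r in rows if pred(r)]

def stats_count_by(field):
    def stage(rows):
        counts = Counter(r[field] for r in rows)  # keeps first-seen key order
        return [{field: key, "count": n} for key, n in counts.items()]
    return stage

def sort_by(field, descending=False):
    # sorted() is stable, so ties keep their incoming order even with reverse=True.
    return lambda rows: sorted(rows, key=lambda r: r[field], reverse=descending)

def limit(n):
    return lambda rows: rows[:n]

def pipe(source, *stages):
    """Feed the output of each stage into the next, left to right."""
    return reduce(lambda rows, stage: stage(rows), stages, source)
```

Because every stage has the same "rows in, rows out" signature, stages can be reordered or appended freely, which is what makes piped queries easy to write incrementally.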
It's nice to see new BI-as-code tools.
YouTube
The future of BI: Exploring the impact of BI-as-code tools with DuckDB
In this video @mehdio dives into 3 BI-as-code tools (Evidence, Rill, Streamlit) to create beautiful dashboards with DuckDB!
📓 Resources
* Github Repo of the tutorial : https://github.com/mehd-io/duckdb-dataviz-demo
* DuckDB's getting started video : h…
This use case puts Apache Doris into perspective. I don't know many open-source alternatives to ClickHouse as a data warehouse, but it looks like Doris is one of them.
Medium
Apache Doris speeds up data reporting, tagging, and data lake analytics
As much as we say Apache Doris is an all-in-one data platform that is capable of various analytics workloads, it is always compelling to…