Source: https://www.linkedin.com/posts/timo-dechau_in-our-little-data-world-are-we-naming-things-activity-6925303646817529856-Nu-U/
---
In our little data world, are we naming things too much from a marketing perspective? And is there serious overselling going on?
Maybe yes.
Let’s look at some examples:
dbt is not a data modeling tool. I see this notion quite often. It is first a SQL orchestration and testing tool. Of course, I can use it to build and manage a data model. But this requires me to do the thinking, not dbt.
Snowflake and BigQuery are not data warehouses. Great people like Rogier Werschkull and Chad Sanderson remind us about that. They are analytical databases in the cloud. Of course, you can build a data warehouse with them. But this requires you to come up with a concept and an architecture.
Fivetran and Airbyte are not ELT tools - they extract and load for you. And you are in charge of the transformation. They are basically supermarkets with self-checkout. Great idea, but you have to do more.
Segment and Rudderstack are not really CDPs - Arpit Choudhury has written a great piece about it - they are customer data infrastructure, the collection and identity stitching layer.
Reverse ETL is just ETL.
Why is this important?
Because often these labels create expectations about the solution that these tools can’t fulfill.
When I set up Snowflake and think that I have a data warehouse now - I create huge expectations in my organization that I can’t fulfill.
Same with dbt - OK, we need a data model, let’s use dbt for this. And then you add one SQL file to the next one and call it a model.
Tools are tools, just that.
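The Fivetran/Airbyte point above (the tool extracts and loads, the transformation stays with you) can be sketched roughly as follows. This is only an illustration under stated assumptions: the in-memory `sqlite3` database stands in for a cloud analytical database, and `SOURCE_ROWS`, `raw_orders`, `customer_revenue`, `extract_and_load`, `transform` and `run_pipeline` are hypothetical names, not part of any of the tools mentioned.

```python
import sqlite3

# Hypothetical source records, standing in for what an extract-and-load tool copies.
SOURCE_ROWS = [
    ("o-1", "alice@example.com", "19.90"),
    ("o-2", "ALICE@example.com", "5.10"),
    ("o-3", "bob@example.com", "42.00"),
]


def extract_and_load(conn, rows):
    """The part a tool can do for you: land source rows unchanged in a raw table."""
    conn.execute("CREATE TABLE raw_orders (order_id TEXT, email TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)


def transform(conn):
    """The part that stays with you: deciding what 'customer' and 'revenue' mean."""
    conn.execute(
        """
        CREATE TABLE customer_revenue AS
        SELECT lower(email) AS customer,
               ROUND(SUM(CAST(amount AS REAL)), 2) AS revenue
        FROM raw_orders
        GROUP BY lower(email)
        """
    )


def run_pipeline():
    """Run both steps against an in-memory database and return the modeled rows."""
    conn = sqlite3.connect(":memory:")
    extract_and_load(conn, SOURCE_ROWS)
    transform(conn)
    rows = conn.execute(
        "SELECT customer, revenue FROM customer_revenue ORDER BY customer"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_pipeline())
```

Everything in `transform` (merging e-mail casings into one customer, casting amounts, choosing the grain) is a modeling decision that no loading tool makes for you.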
---
Linkedin
In our little data world are we naming things too much based on our… | Timo Dechau | 48 comments
From DataCamp
1. Introduction to Airflow
2. Airflow DAGs
3. Airflow web interface
Kubernetes Cheat Sheet - Page 1 & Page 2
Pandas vs PySpark DataFrame With Examples
Let’s learn the difference between Pandas vs PySpark DataFrame, their definitions, features, advantages, how to create them and transform one to another with Examples.
👉 @devops_dataops
https://sparkbyexamples.com/pyspark/pandas-vs-pyspark-dataframe-with-examples/
Spark By {Examples}
Pandas vs PySpark DataFrame With Examples
What are the differences between Pandas and PySpark DataFrame? Pandas and PySpark are both powerful tools for data manipulation and analysis in Python.
Artem Shutak: Inserting into ClickHouse without dying
Slides
ClickHouse + Spark collection - Altinity Knowledge Base
-----------
👉 @devops_dataops
YouTube
Artem Shutak: Inserting into ClickHouse without dying
More about the SmartData conference: https://jrg.su/aTWU2K
----
What could be simpler than inserting data into a database?! Yet at Odnoklassniki they have been doing it for 2 years, and ClickHouse never stops surprising them.
Artem Shutak is from Odnoklassniki. Their installation is roughly…
🔥2
GitHub Actions - Introduction to CI/CD
00:00 - What the course is about
03:50 - Introductory GitHub course
12:35 - Getting started with GitHub Actions
18:20 - Writing the first workflow
29:17 - Automatically testing React
37:57 - What Actions are
48:25 - Making the workflow more complex (hands-on)
53:40 - Job dependencies and their order
01:00:18 - Context & Events
01:21:19 - Adding a cache
01:28:13 - Matrix
01:35:44 - Artifacts
01:45:25 - Environment & Secrets
https://www.youtube.com/watch?v=e0A2hDObLmg
YouTube
GitHub Actions - Introduction to CI/CD
If you want to see how AI works from the inside and build your first project in 3 days, join the marathon. You will follow a developer's path at a comfortable pace and get a result on the very first day.
Registration is open 👉🏼 https://resuni.ru/obvUR
Telegram…
This software has an interesting monetization model: it is open source, but some sensible extras are available only in the paid version (users and roles + support).
The very idea of low-code platforms appearing as open source is interesting too.
----
Tooljet | Open-source low-code platform to build internal tools
Extensible low-code framework for building business applications. Connect to databases, cloud storages, GraphQL, API endpoints, Airtable, etc and build apps using drag and drop application builder. Built using JavaScript/TypeScript.
https://www.tooljet.com/
Tooljet
ToolJet | Build Full-Stack Enterprise Apps in Minutes with AI
ToolJet helps you build enterprise apps, AI agents and workflows in minutes, not months. Create secure internal apps that are enterprise ready and SOC2, GDPR and ISO compliant.
Kubernetes Explained in 6 Minutes | k8s Architecture - YouTube
https://m.youtube.com/watch?v=TlHvYWVUZyc
YouTube
Kubernetes Explained in 6 Minutes | k8s Architecture
To get better at system design, subscribe to our weekly newsletter: https://bit.ly/3tfAlYD
Checkout our bestselling System Design Interview books:
Volume 1: https://amzn.to/3Ou7gkd
Volume 2: https://amzn.to/3HqGozy
ABOUT US:
Covering topics and trends…
ChatGPT Tutorial for Developers - 38 Ways to 10x Your Productivity
https://www.youtube.com/watch?v=sTeoEFzVNSc
YouTube
ChatGPT Tutorial for Developers - 38 Ways to 10x Your Productivity
Learn how to use ChatGPT to 10x your productivity! 38 examples using Python, JavaScript, HTML, CSS, React, SQL and more!
- Subscribe for more ChatGPT tutorials: https://goo.gl/6PYaGF
ChatGPT Desktop App: https://github.com/f/awesome-chatgpt-prompts
ChatGPT…
👍2
Data Build Tool (dbt). Transformation in Modern data stack | by Amit Singh Rathore | Jan, 2023 | Dev Genius
https://blog.devgenius.io/data-build-tool-dbt-1f0b03d97cc6
Medium
Data Build Tool (dbt)
Transformation in Modern data stack
How to push an npm package to Gitlab; this one helped 👍
Publishing your private npm packages to Gitlab NPM Registry
https://shivamarora.medium.com/publishing-your-private-npm-packages-to-gitlab-npm-registry-39d30a791085
Medium
Publishing your private npm packages to Gitlab NPM Registry
Configure npm, yarn, lerna to publish packages to Gitlab Package Registry and use them as dependencies in your project
Another collection of tools: Awesome-Selfhosted
A list of Free Software network services and web applications which can be hosted on your own servers
https://github.com/awesome-selfhosted/awesome-selfhosted
GitHub
GitHub - awesome-selfhosted/awesome-selfhosted: A list of Free Software network services and web applications which can be hosted…
A list of Free Software network services and web applications which can be hosted on your own servers - awesome-selfhosted/awesome-selfhosted
👍1
Prescriber-ETL-data-pipeline
An End-to-End ETL data pipeline that leverages pyspark parallel processing to process about 25 million rows of data coming from a SaaS application using Apache Airflow as an orchestration tool and various data warehouse technologies and finally using Apache Superset to connect to DWH for generating BI dashboards for weekly reports
https://github.com/judeleonard/Prescriber-ETL-data-pipeline
GitHub
GitHub - judeleonard/Prescriber-ETL-data-pipeline: An End-to-End ETL data pipeline that leverages pyspark parallel processing to…
An End-to-End ETL data pipeline that leverages pyspark parallel processing to process about 25 million rows of data coming from a SaaS application using Apache Airflow as an orchestration tool and ...
👍1
airflow-docker
This is my Apache Airflow Local development setup on Windows 10 WSL2/Mac using docker-compose. It will also include some sample DAGs and workflows.
https://github.com/anilkulkarni87/airflow-docker
GitHub
GitHub - anilkulkarni87/airflow-docker: This is my Apache Airflow Local development setup on Windows 10 WSL2/Mac using docker-compose.…
This is my Apache Airflow Local development setup on Windows 10 WSL2/Mac using docker-compose. It will also include some sample DAGs and workflows. - anilkulkarni87/airflow-docker
A decent comparison table of metadata management tools
Awesome Data Discovery and Observability
https://github.com/opendatadiscovery/awesome-data-catalogs
GitHub
GitHub - opendatadiscovery/awesome-data-catalogs: 📙 Awesome Data Catalogs and Observability Platforms.
📙 Awesome Data Catalogs and Observability Platforms. - GitHub - opendatadiscovery/awesome-data-catalogs: 📙 Awesome Data Catalogs and Observability Platforms.
👍1
Examples from a course on Apache Airflow 2.0
https://github.com/adilkhash/apache-airflow-course-materials
GitHub
GitHub - adilkhash/apache-airflow-course-materials: A course on Apache Airflow 2.0
A course on Apache Airflow 2.0. Contribute to adilkhash/apache-airflow-course-materials development by creating an account on GitHub.
❤1👍1🔥1