Data Engineering / Инженерия данных / Data Engineer / DWH
2.28K subscribers
Data Engineering: ETL / DWH / Data Pipelines based on open-source software.

DWH / SQL
Python / ETL / ELT / dbt / Spark
Apache Airflow

No ads are posted here.
Questions: @iv_shamaev | datatalks.ru
Apache Spark / PySpark Tutorial: Basics In 15 Mins

This video gives an introduction to the Spark ecosystem and world of Big Data, using the Python Programming Language and its PySpark API. We also discuss the idea of parallel and distributed computing, and computing on a cluster of machines.

https://youtu.be/QLQsW8VbTN4
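The core idea the video covers, splitting data into partitions, processing each partition independently (which is what a cluster parallelizes), and then reducing the partial results, can be sketched in plain Python. This is not PySpark's actual API; the function names here are illustrative.

```python
# Plain-Python sketch of the partition -> map -> reduce idea behind Spark.
# Illustrative only: Spark would distribute partitions across a cluster,
# here we just fan them out to a local thread pool.
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def partition(data, n):
    """Split data into n roughly equal chunks (like Spark partitions)."""
    size = (len(data) + n - 1) // n
    return [data[i:i + size] for i in range(0, len(data), size)]

def map_partition(chunk):
    # Per-partition work: square each element and sum locally.
    return sum(x * x for x in chunk)

data = list(range(1, 101))
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(map_partition, partition(data, 4)))

total = reduce(lambda a, b: a + b, partials)  # final reduce step
print(total)  # sum of squares of 1..100
```

Because each partition is processed independently, the map step scales out to as many workers (or machines) as there are partitions; only the small partial results travel to the reduce step.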
Source: https://www.linkedin.com/posts/timo-dechau_in-our-little-data-world-are-we-naming-things-activity-6925303646817529856-Nu-U/
---
In our little data world, are we naming things too much based on our marketing perspective? And is there serious over-selling going on?

Maybe yes.

Let’s do some examples:

dbt is not a data modeling tool. I see this notion quite often. It is first and foremost a SQL orchestration and testing tool. Of course, I can use it to build and manage a data model. But that requires me to do the thinking, not dbt.
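To make the "SQL orchestration and testing" point concrete: a dbt model is just a SELECT statement in a file, and dbt's job is to build models in dependency order and run tests against them. A minimal sketch (source, file, and column names here are made up for illustration):

```sql
-- models/stg_orders.sql
-- dbt materializes this SELECT as a view/table; the source() call is how
-- dbt learns the dependency graph it orchestrates.
select
    order_id,
    customer_id,
    amount
from {{ source('shop', 'raw_orders') }}
where amount is not null
```

The actual data-model thinking (grain, naming, relationships, which tests matter, e.g. `unique` and `not_null` on `order_id` in a schema.yml) is still on you.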

Snowflake and BigQuery are not data warehouses. Great people like Rogier Werschkull and Chad Sanderson remind us of that. They are analytical databases in the cloud. Of course, you can build a data warehouse with them. But this requires you to come up with a concept and an architecture.

Fivetran and Airbyte are not ELT tools - they extract and load for you, and you are in charge of the transformation. They are basically supermarkets with self-checkout: a great idea, but you have to do more yourself.

Segment and Rudderstack are not really CDPs - Arpit Choudhury has written a great piece about it - they are customer data infrastructure: the collection and identity-stitching layer.

Reverse ETL is just ETL.


Why is this important?

Because often these labels create expectations about the solution that these tools can’t fulfill.

When I set up Snowflake and think that I have a data warehouse now - I create huge expectations in my organization that I can’t fulfill.

Same with dbt - OK, we need a data model, let's use dbt for this. And then you add one SQL file to the next and call it a model.

Tools are tools, just that.
GitHub Actions - Introduction to CI/CD

00:00 - What the course is about
03:50 - GitHub intro
12:35 - Getting started with GitHub Actions
18:20 - Writing the first workflow
29:17 - Automatically testing React
37:57 - What Actions are
48:25 - A more complex workflow (practice)
53:40 - Job dependencies and their ordering
01:00:18 - Context & Events
01:21:19 - Adding a cache
01:28:13 - Matrix
01:35:44 - Artifacts
01:45:25 - Environment & Secrets

https://www.youtube.com/watch?v=e0A2hDObLmg
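A first workflow of the kind written in the course might look roughly like this (the repository layout and npm scripts are assumptions, not taken from the video):

```yaml
# .github/workflows/ci.yml - runs on every push and pull request
name: CI
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4      # check out the repository
      - uses: actions/setup-node@v4    # install Node.js
        with:
          node-version: 20
          cache: npm                   # dependency caching, as covered later
      - run: npm ci                    # install dependencies
      - run: npm test                  # run the (React) test suite
```

Job ordering is declared with `needs:`, and a `strategy.matrix` block would run the same job across several Node versions, which is what the Matrix chapter covers.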
This software has an interesting monetization model: it is nominally open source, but there are sensible perks you can only get in the paid version (users and roles, plus support).
The very idea of low-code platforms emerging as open source is interesting in itself.
----
Tooljet | Open-source low-code platform to build internal tools

An extensible low-code framework for building business applications. Connect to databases, cloud storage, GraphQL, API endpoints, Airtable, etc., and build apps using a drag-and-drop application builder. Built with JavaScript/TypeScript.

https://www.tooljet.com/
Prescriber-ETL-data-pipeline

An end-to-end ETL data pipeline that leverages PySpark parallel processing to process about 25 million rows of data coming from a SaaS application. Apache Airflow serves as the orchestration tool, various data warehouse technologies store the results, and Apache Superset connects to the DWH to generate BI dashboards for weekly reports.

https://github.com/judeleonard/Prescriber-ETL-data-pipeline
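The repo's actual DAG and Spark jobs are not reproduced here, but the stage structure the description lists (extract -> transform -> load -> report source) can be sketched in plain Python. All names below are illustrative; in the real project these stages run on PySpark and are scheduled as Airflow tasks.

```python
# Illustrative sketch of the pipeline stages, NOT the repo's code.
from collections import defaultdict

def extract(raw_rows):
    """Parse raw SaaS export tuples into dicts (extract step)."""
    fields = ("prescriber_id", "state", "claims")
    return [dict(zip(fields, row)) for row in raw_rows]

def transform(rows):
    """Aggregate claim counts per state - the kind of rollup a weekly
    BI dashboard would read from the DWH."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["state"]] += row["claims"]
    return dict(totals)

def load(aggregates, warehouse):
    """Write aggregates into a dict standing in for a DWH table."""
    warehouse["claims_by_state"] = aggregates
    return warehouse

raw = [("P1", "NY", 10), ("P2", "NY", 5), ("P3", "CA", 7)]
dwh = load(transform(extract(raw)), {})
print(dwh["claims_by_state"])  # {'NY': 15, 'CA': 7}
```

In the Airflow version, each function would be one task, with the orchestrator handling scheduling, retries, and the dependency order between them.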