AWS denounces its own error logs
Your post may include a non-inclusive word (master)
😭😂
Hey, long time no see 🙂
We have just started our engagement with dbt Labs at Wheely.
Their team will conduct an audit and help us improve our dbt deployment even further.
They already have access to:
– dbt Cloud (jobs)
– GitHub repo (code)
– Redshift (database)
– Looker (BI + monitoring)
– Slack (communication)
Yesterday we had the kick-off call. Overall, great impressions.
What is going to happen:
– Audit: Deployment + Performance
– Audit: Project Structure
– Features / fixes
I will keep you posted.
Airbyte ClickHouse destination
Airbyte has released a ClickHouse destination, which I already use to gather data from multiple sources.
By default it replicates all the data as JSON blobs (all attributes inside one String field).
To get it flattened, you either have to do it yourself or use Airbyte normalization.
1. Flattening manually with JSON functions (sketch after this list)
JSONExtract(_airbyte_data, 'id', 'UInt64') as id
➕ Works extremely fast
➖ Could be tricky and exhausting if you have a lot of attributes
2. Airbyte normalization (= dbt underneath 😉)
➕ It will flatten all your data automatically
Technically it is an auto-generated dbt project.
➖ Still a little bit buggy and looks like a work in progress.
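For option 1, here is a minimal sketch of manual flattening in ClickHouse. Only the _airbyte_data column name comes from Airbyte; the raw table name and the attributes are hypothetical and depend on your stream:

-- Hypothetical raw table written by the Airbyte ClickHouse destination
CREATE VIEW users_flat AS
SELECT
    JSONExtract(_airbyte_data, 'id', 'UInt64')                               AS id,
    JSONExtractString(_airbyte_data, 'email')                                AS email,
    parseDateTimeBestEffort(JSONExtractString(_airbyte_data, 'created_at'))  AS created_at
FROM _airbyte_raw_users;

Wrapping it in a view keeps the raw data untouched while giving analysts typed columns to query.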
I almost managed to get it done, but I use Yandex's Managed ClickHouse, which requires an SSL / secure connection.
Unfortunately, Airbyte's dbt profiles.yml is hard-coded to secure: False at the moment.
I might create a PR to fix this when I have some time.
#airbyte #clickhouse #dbt
I will try to overcome normalization another day 😄
Leave a comment / reaction if you are interested
DataLens from Yandex is quite a powerful BI tool.
Especially when you use it on top of ClickHouse, which makes analytics interactive with sub-second latency.
Among the outstanding features I've already tried:
— Advanced functions to build almost anything one can imagine: timeseries, arrays, geo, window functions
— Nice, customizable charts that integrate with dashboards
— Sharing with the team / anyone on the internet
— Useful docs with examples and how-tos
— Really friendly community here in Telegram (important!)
— It is free of charge!
The more I use it, the more I love it.
Take a look at how I managed to build a Year-over-Year analysis with the LAG function and draw different kinds of viz!
LAG([Выручка (₽)], 52 ORDER BY [Неделя] BEFORE FILTER BY [Неделя])
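For reference, here is roughly the same calculation in plain ClickHouse SQL ([Выручка (₽)] is revenue, [Неделя] is week). The weekly_revenue table and its columns are hypothetical, and this sketch does not reproduce BEFORE FILTER BY, which in DataLens makes the lag be computed before dashboard filters are applied:

-- Hypothetical table: weekly_revenue(week Date, revenue Float64)
SELECT
    week,
    revenue,
    lagInFrame(revenue, 52) OVER (
        ORDER BY week ASC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS revenue_year_ago
FROM weekly_revenue
ORDER BY week;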
#datalens #clickhouse
Hi! New post on Habr ⬇️⬇️⬇️
Give it an upvote if you like the material; part two is already in the works.
[RU] Bad advice for building Analytics (Data Lake / DWH / BI) – what to avoid
Over the last few months I have been doing a lot of refactoring of the code base and optimizing processes and calculations in the Data Analytics space.
In the spirit of "bad advice", I wanted to highlight a set of practices and approaches that can lead to rather unpleasant consequences and sometimes cost your company dearly.
The post covers:
– Using select * – everything at once (quick illustration after this list)
– Using an excessive number of CTEs (common table expressions)
– NOT DRY (Don't Repeat Yourself) – repetitive, kaleidoscopic calculations
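A quick illustration of the first item (hypothetical table and column names): naming the columns you actually need keeps a model readable and lets the warehouse scan less data.

-- Bad advice: pull everything and let downstream sort it out
select * from orders;

-- Better: name the columns the model actually needs
select order_id, customer_id, order_ts, amount
from orders;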
#best_practices #dwh
Read on Habr →
What is the easiest way to write custom data integration?
1. Fetch source data via API calls – E of ELT
2. Store raw data via S3 / Data Lake – L of ELT
3. Transform data as you wish via dbt – T of ELT (sketch at the end of this post)
While the focus is mainly on Transformations (the most complex and interesting part, and the one that delivers business value), it is still essential to perform Extract-Load in a clear and understandable way.
A shell script is the easiest way to perform EL, in my opinion (where possible 😉).
Take a look at the example of Fetching exchange rates →
1. Useful shell options – debugging and safe exit
set -x expands variables and prints a little + sign before each line.
set -e instructs bash to exit immediately if any command has a non-zero exit status.
2. Variables
Either assign directly in the bash script or provide as environment variables (preferred).
TS=`date +"%Y-%m-%d-%H-%M-%S-%Z"`
3. Chain or pipe commands
The result of one command can be the input to another command.
The JSON response fetched from the API call is transferred to an AWS S3 bucket directly, without any intermediate storage:
curl -H "Authorization: Token $OXR_TOKEN" \
"https://openexchangerates.org/api/historical/$BUSINESS_DT.json?base=$BASE_CURRENCY&symbols=$SYMBOLS" \
| aws s3 cp - s3://$BUCKET/$BUCKET_PATH/$BUSINESS_DT-$BASE_CURRENCY-$TS.json
4. Echo log messages
5. Schedule and monitor with Airflow.
Use templates, variables, loops, dynamic DAGs.
Do it the right way once and just monitor for errors. As simple as that.
6. Additional pros:
- Shell (bash, zsh) is already installed on most VMs
- No module imports / libs / dependency crap
- Ability to parallelize heavy commands and do it in an optimal way
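To close the loop on step 3 (the T), here is a minimal sketch of a dbt model over those raw files. Everything in it is an assumption: a source raw.exchange_rates exposing the S3 JSON files (e.g. via Redshift Spectrum) with one document per row in a payload column, and EUR as one of the requested symbols.

-- models/staging/stg_exchange_rates.sql (hypothetical dbt model, Redshift SQL)
select
    json_extract_path_text(payload, 'base')                 as base_currency,
    timestamp 'epoch'
        + json_extract_path_text(payload, 'timestamp')::bigint * interval '1 second'
                                                             as rate_ts,
    json_extract_path_text(payload, 'rates', 'EUR')::float8 as eur_rate
from {{ source('raw', 'exchange_rates') }}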