I love Looker.
Since version 21.0, dashboards are rendered in the new style by default.
However, some users are used to the previous (legacy) look.
For backward compatibility, the option to enable Legacy Features support has been thoughtfully kept in place.
I switched our dashboards back to the previous look. I can't show the dashboard itself :)
#looker #bi
We have started hoovering up GitHub events from the Wheely organization into our Data Warehouse.
The integration works via a webhook:
– PushEvent
– PullRequestEvent
– ReleaseEvent
For now we are building on GitLab's experience with Centralized Engineering Metrics. Metrics of interest:
– MR Rate
– MRs vs Issues
– MRs by team members
The idea is to track these metrics and tie developers' and teams' goals/OKRs to them; a rough sketch of one such query is below.
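As a rough illustration, MR Rate (pull requests opened per author per week) could be computed with a query like this one; the table and column names are hypothetical and depend on how the webhook payloads land in the warehouse:
select
    date_trunc('week', received_at) as week,
    sender_login,
    count(*) as pull_requests_opened
from github_webhook_events
where event_type = 'pull_request'
  and action = 'opened'
group by 1, 2
order by 1, 2;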
I'll keep you posted.
#dwh #pipelines
Here's an easy way to generate a comprehensive definition (.yml) of your data sources:
dbt run-operation generate_source --args '{"schema_name": "hevo", "generate_columns": "True", "include_descriptions": "True"}' > src.yaml
- get the list of attributes for every source table
- include descriptions (docs) to be filled in
My goals are to:
- Create a single source of truth unifying source data (backend, marketing, events), data marts (DWH), and exposures (dashboards, reports) in one place – dbt Docs
- Provide smooth access to the docs website via Google SSO / AWS Cognito for the whole company
- Cover source tables with freshness and schema tests
- Enable filling in comments and descriptions into a predefined structure
- Put docs where the code is (version controlled)
I will use the same generator for all the dbt models; a sketch of the generated output is below.
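For illustration, the generated src.yaml looks roughly like the sketch below. The table and column names are hypothetical, and the freshness block with loaded_at_field is my own addition to show where the freshness tests from the goals above would go; the empty description fields are the placeholders to be filled in.
version: 2
sources:
  - name: hevo
    loaded_at_field: _loaded_at
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: orders
        description: ""
        columns:
          - name: order_id
            description: ""
          - name: status
            description: ""
          - name: created_at
            description: ""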
#docs #pipelines #catalog
Long live Telegram
Rest in peace Facebook, WhatsApp, Instagram
Right now 😄
Here's what GitHub has sent to our webhook over the past 5 days.
Events of most interest:
- Issues & Pull requests (#, who, when, where, how hard)
- Push (commit frequency and complexity by repos, teams)
- Workflows (Actions metrics)
- Checks (Continuous Integration metrics)
The detailed event payloads are described in the GitHub Docs.
This data is heavily nested, so the new SUPER data type in Redshift comes in really handy for unnesting and flattening it.
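A minimal sketch of that unnesting with Redshift's PartiQL syntax, assuming the raw webhook body lands in a SUPER column called payload (all names here are hypothetical):
select
    e.event_id,
    e.received_at,
    c.id      as commit_sha,
    c.message as commit_message
from github_webhook_events as e,
     e.payload.commits as c   -- iterate over the nested commits array
where e.event_type = 'push';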
Soon I will build something worthwhile on top of this data.
#dwh #pipelines #github
Did a brief Data Infrastructure overview today during an onboarding session for new Product Analysts @ Wheely
Follow-up to share with you all
1. dbtCloud – invitations sent, start exploring
– Docs
– Data Sources
– Jobs definition
2. Redshift credentials sent to PM
3. Access Jupyter Hub via corporate email
4. Read more about dbt:
– dbt basics
– Start with dbtCloud IDE
– Alternatively, install dbt@0.20.2 locally (via Homebrew on Mac, see the sketch below) and use it with any local IDE (VSCode, PyCharm)
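For reference, a local install might look like the lines below. The Homebrew tap and versioned formula names are my assumption based on the dbt docs of the time, so double-check them; the pip route is the plain fallback.
# Homebrew on Mac (assuming the dbt-labs tap ships a versioned formula)
brew tap dbt-labs/dbt
brew install dbt@0.20.2
# or with pip inside a virtualenv
python3 -m venv dbt-env && source dbt-env/bin/activate
pip install dbt==0.20.2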
Shout-out to you guys!
#dataops #onboarding
Sometimes you have to test a lot of data quality expectations.
And sometimes tests keep catching something glitchy and annoying over and over again, which in fact turns out to be OK.
For example, late-arriving data or an ELT process time lag.
Since 0.20.0, dbt has introduced the error_if + warn_if configs.
Now a test won't fail with an ERROR, waking me up in the morning.
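A minimal sketch of such a test config in a schema.yml; the model and column names are hypothetical, and the thresholds mirror the "< 10 rows" case from the screenshots:
version: 2
models:
  - name: fct_orders
    columns:
      - name: order_id
        tests:
          - not_null:
              config:
                severity: error
                warn_if: ">0"     # 1-10 failing rows raise only a warning
                error_if: ">10"   # more than 10 failing rows raise an error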
1st pic: error message in Slack
2nd pic: error details (failed tests)
3rd pic: new config which helps avoid errors with < 10 rows
#testing #dbt
I have updated Wheely's production workloads to the new version, dbt==0.21.0.
Along with major improvements to performance, stability, and speed, we now have:
– A dbt build command for multi-resource runs
– Handling for column schema changes in incremental models
– Defining configs in all the places you’d expect
A typical prod job definition looks like: dbt seed + dbt run + dbt snapshot + dbt test.
Now, with the single dbt build command, this gets much simpler and more convenient: resources are built one by one, from left to right across your DAG.
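Roughly, the change to a prod job definition looks like this:
# before 0.21: several commands chained one after another
dbt seed
dbt run
dbt snapshot
dbt test
# since 0.21: one command that builds and tests each resource in DAG order
dbt build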
The new on_schema_change parameter gives additional control when incremental model columns change (see the sketch after this list). Possible strategies are:
– ignore (default): new columns will not appear in your target table.
– fail: triggers an error message when the source and target schemas diverge.
– append_new_columns: appends new columns to the existing table.
– sync_all_columns: adds any new columns to the existing table and removes any columns that are now missing.
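A minimal sketch of an incremental model header using it; the model, key, and timestamp names are hypothetical:
-- models/fct_github_events.sql
{{ config(
    materialized='incremental',
    unique_key='event_id',
    on_schema_change='append_new_columns'
) }}
select *
from {{ ref('stg_github_events') }}
{% if is_incremental() %}
where received_at > (select max(received_at) from {{ this }})
{% endif %}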
But not everything went smoothly; I've run into a couple of bugs these days:
– One that broke my materialization macro with the new dispatch logic
– A serializable isolation violation (unrelated to dbt)
I will describe them in next posts.
Meanwhile, read more about the upcoming dbt v1.0, due in December 2021!
#dbt #release
Just received a small viz on Data professions, built in Metabase by a student as a homework assignment.
Simple but still incredible.
We might step up to the next level by:
– refreshing the data on a regular basis
– expanding to a list of other professions like Web, Backend, Designer
– introducing some additional dimensions and metrics
– granting public access to this dashboard
#bi #metabase #hh
Data Vault 2.0 + Greenplum + dbtVault assignment
A step-by-step instruction is available as a GitHub Gist:
1. Spin up Greenplum Cluster on Yandex.Cloud
2. Generate TPCH dataset (10GB)
3. Populate the Data Vault with the dbtVault package (see the hub sketch below)
https://gist.github.com/kzzzr/4ab36bec6897e48e44e792dc2e706de9
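For step 3, a hub model in dbtVault boils down to a single macro call on top of a staging view. The column and model names below follow the TPCH-style staging used in the assignment, but treat them as an illustration rather than the exact code:
-- models/raw_vault/hub_customer.sql
{{ dbtvault.hub(
    src_pk="CUSTOMER_PK",
    src_nk="C_CUSTKEY",
    src_ldts="LOAD_DATE",
    src_source="RECORD_SOURCE",
    source_model="v_stg_orders"
) }}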
We have been discussing this topic for 3 lessons so far (6 hours).
Anyone interested can try it out.
#datavault #greenplum #dbtvault