Forwarded from LEFT JOIN
SQL best practices according to Metabase.
Some of the tips are genuinely valuable. Of the ones I have actually run into in practice, I would single out one in particular, because it once let me substantially cut a query's execution time:
Prefer EXISTS to IN
If you just need to verify the existence of a value in a table, prefer EXISTS to IN, as the EXISTS process exits as soon as it finds the search value, whereas IN will scan the entire table. IN should be used for finding values in lists.
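To make the two forms concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customers and orders tables and their rows are hypothetical, made up for illustration. It only shows that both queries return the same rows; whether EXISTS is actually faster depends on the database, since many query planners rewrite both forms into the same plan, so it is worth checking the plan on your own engine.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3);
""")

# EXISTS: the subquery only has to find one matching order per customer.
EXISTS_SQL = """
    SELECT c.name FROM customers AS c
    WHERE EXISTS (SELECT 1 FROM orders AS o WHERE o.customer_id = c.id)
    ORDER BY c.name
"""

# IN with a subquery: logically the same filter, written as set membership.
IN_SQL = """
    SELECT c.name FROM customers AS c
    WHERE c.id IN (SELECT o.customer_id FROM orders AS o)
    ORDER BY c.name
"""

# IN with a literal list: the use case the tip recommends IN for.
IN_LIST_SQL = "SELECT name FROM customers WHERE id IN (1, 3) ORDER BY name"

exists_rows = [row[0] for row in conn.execute(EXISTS_SQL)]
in_rows = [row[0] for row in conn.execute(IN_SQL)]
in_list_rows = [row[0] for row in conn.execute(IN_LIST_SQL)]
```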
Metabase is a rather interesting tool. We use it on one of our projects, and I should record a video as a follow-up to the BI Guide.
If anyone has the time and interest to explore Metabase and record a video, message me in DM: @valiotti.
Metabase | Business Intelligence, Dashboards, and Data Visualization
Best practices for writing SQL queries | Metabase Learn
SQL best practices: a brief guide to writing better SQL queries.
Here are more than 100 free courses from Microsoft on #DataEngineering, #DataScience and #DataAnalytics. Many of them are related to #Azure, but many also focus on general knowledge in those fields.
https://docs.microsoft.com/en-us/learn/browse/?terms=data&roles=ai-engineer%2Cdata-analyst%2Cdata-engineer%2Cdata-scientist
Docs
Browse all - Learn
Learn new skills and discover the power of Microsoft products with step-by-step guidance. Start your journey today by exploring our learning paths and modules.
An interesting long read about databases for website analytics. Besides sharing their experience of choosing the right solution, the author describes the migration process and the options they considered. However, he doesn't mention Druid, which I think could be a good fit.
https://usefathom.com/blog/worlds-fastest-analytics
Credits to: https://news.1rj.ru/str/rockyourdata/2448
Fathom Analytics
Building the world’s fastest website analytics - Fathom Analytics
For over a year, we’d been struggling to keep up with our analytics data growth. Fathom had been growing at the speed of light, with more and more people ditching Google Analytics, our data ingestion was going through the roof.
Have you noticed that there are no books on data engineering, or only a couple that mostly describe the different technologies data engineers can use? Of course, there is "Designing Data-Intensive Applications", but that book only makes sense if you already have some experience. Courses also focus on technologies rather than on the general knowledge a data engineer should have. Technologies, in turn, advertise the problems they solve, but those problems are familiar only to more or less experienced data engineers. And right now data engineers are in high demand.
What will you look for to become a data engineer?
Anonymous Poll
31%
A book
19%
A video course
25%
A training or bootcamp
25%
An internship / real project
Azure Synapse Analytics vs Azure Databricks: a detailed comparison of features and use cases
https://www.element61.be/en/resource/when-use-azure-synapse-analytics-andor-azure-databricks
element61
When to use Azure Synapse Analytics & Azure Databricks?
What is Azure Synapse Analytics? Azure Synapse Analytics is the Azure SQL Datawarehouse rebranded. Azure Synapse Analytics v2 (workspaces incl. Azure Synapse Studio) is still in preview. This version of Azure Synapse Analytics integrates existing and new analytical…
As you may know, AWS has Lambda and Step Functions. Azure seems to have taken a slightly different approach with Durable Functions. I am not sure this is the right comparison, but it is an interesting idea to have not only stateless functions but also stateful ones.
Docs
Durable Functions Overview - Azure
Introduction to the Durable Functions extension for Azure Functions.
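To illustrate what a "stateful" function can mean, here is a toy sketch in plain Python. It is not the Azure Durable Functions API: the orchestrator, step, and ACTIVITIES names are hypothetical. It only mimics the general idea of replaying an orchestrator against a recorded history of activity results, so that each invocation can start from scratch and still pick up where the previous one stopped.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical activity registry; a real system would run these remotely.
ACTIVITIES: Dict[str, Callable[[Any], Any]] = {"double": lambda x: x * 2}


def orchestrator():
    """Workflow written as a generator: each yield requests one activity."""
    first = yield ("double", 2)
    second = yield ("double", first)
    return first + second


def step(history: List[Any]) -> Tuple[bool, Any]:
    """Replay the orchestrator from history, then run at most one new activity."""
    gen = orchestrator()
    try:
        request = next(gen)
        # Feed previously recorded results back in instead of re-running them.
        for recorded in history:
            request = gen.send(recorded)
    except StopIteration as done:
        return True, done.value
    name, arg = request
    history.append(ACTIVITIES[name](arg))  # persist the new result
    return False, None


def run_to_completion() -> Tuple[Any, List[Any], int]:
    """Invoke step() repeatedly, as if each call were a fresh function start."""
    history: List[Any] = []
    steps = 0
    while True:
        finished, result = step(history)
        steps += 1
        if finished:
            return result, history, steps
```

The only state that survives between invocations is the history list, which is what lets an otherwise stateless function runtime host a long-running workflow.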
I wonder how this is implemented under the hood, and whether AWS is planning a similar feature for S3.
https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-namespace
Docs
Azure Data Lake Storage Gen2 Hierarchical Namespace
Describes the concept of a hierarchical namespace for Azure Data Lake Storage Gen2
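As a rough intuition for why a hierarchical namespace matters, here is a toy sketch in Python. It says nothing about how Azure actually implements it; the rename_flat and rename_tree helpers and the sample paths are hypothetical. In a flat key space a "directory" is only a shared key prefix, so renaming it touches every object under it, while in a tree it is a single change on the parent node.

```python
from typing import Dict, List


def rename_flat(store: Dict[str, bytes], old: str, new: str) -> int:
    """Flat keyspace: re-key every object whose name starts with the prefix."""
    touched = 0
    for key in [k for k in store if k.startswith(old)]:
        store[new + key[len(old):]] = store.pop(key)
        touched += 1
    return touched


def rename_tree(root: dict, parent: List[str], old: str, new: str) -> int:
    """Tree namespace: walk to the parent and relink one child entry."""
    node = root
    for part in parent:
        node = node[part]
    node[new] = node.pop(old)
    return 1


# Hypothetical sample data: the same three files in both layouts.
flat = {
    "logs/2021/a.txt": b"a",
    "logs/2021/b.txt": b"b",
    "logs/2020/c.txt": b"c",
}
tree = {"logs": {"2021": {"a.txt": b"a", "b.txt": b"b"}, "2020": {"c.txt": b"c"}}}

flat_ops = rename_flat(flat, "logs/2021/", "logs/archive-2021/")
tree_ops = rename_tree(tree, ["logs"], "2021", "archive-2021")
```

The flat rename cost grows with the number of objects under the prefix, while the tree rename stays a single operation regardless of directory size.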
A new Data Engineering Podcast episode about Superset, with the creator of Superset and Airflow.
Data Engineering Podcast
Data Engineering Podcast: Self Service Data Exploration And Dashboarding With Superset
An interview with Maxime Beauchemin about how to use Apache Superset as a platform for self-service data exploration and analytics.
A great talk about Kusto (Azure Data Explorer) from its creators. According to the talk, practically every team at Microsoft uses it. It can replace ElasticSearch and Solr. To me it looks a bit like Druid.
https://youtu.be/Kkd2rYQZAVU
YouTube
Alexander Slutsky, Gleb Lesnikov: Kusto (Azure Data Explorer), an interactive Big Data platform
Upcoming conference: SmartData 2022, October 17-18 (online), October 29 (offline)
Tickets: https://bit.ly/3amdcNO. Kusto is a new and rapidly growing platform for working with Big Data. A few years ago it took over all of Microsoft…
New support for Databricks and Apache Spark in dbt Cloud
https://blog.getdbt.com/analytics-engineering-for-everyone-databricks-in-dbt-cloud/
Transform data in your warehouse
Analytics Engineering for Everyone: Databricks in dbt Cloud
This SQL-first integration with Databricks means that analysts can build fully automated data pipelines in the same space that data engineers & data scientists work in their preferred frameworks.
#Rust is paving its way into data engineering
https://arrow.apache.org/blog/2021/04/12/ballista-donation/
Apache Arrow
Ballista: A Distributed Scheduler for Apache Arrow
We are excited to announce that Ballista has been donated to the Apache Arrow project. Ballista is a distributed scheduler for the Rust implementation of Apache Arrow.