Top 15 Scala Libraries for Data Science in 2021 | Scalac.io
https://scalac.io/blog/top-15-scala-libraries-for-data-science-in-2021/
Scalac - Software Development Company - Akka, Kafka, Spark, ZIO
Top 15 Scala Libraries for Data Science in 2023
In this article, we will take a look at what libraries can help us with our first custom ML algorithm.
Forwarded from DataEng
About data engineering for those who are new to the topic: https://www.youtube.com/watch?v=qWru-b6m030
A great introductory video.
YouTube
How Data Engineering Works
So, the sole purpose of data engineering is to take data from the source and save it to make it available for analysis. Sounds simple, but it’s the matter of the system that works under the hood.
Watch our video to find out more about data engineering:
00:00…
Cloud Data Warehousing: Understanding Your Options
https://www.datanami.com/2021/04/01/cloud-data-warehousing-understanding-your-options/?utm_source=rss&utm_medium=rss&utm_campaign=cloud-data-warehousing-understanding-your-options
Datanami
Cloud Data Warehousing: Understanding Your Options
Cloud data warehouses have emerged as the go-to repositories for amassing huge amounts of data and running advanced analytics and AI upon it. This is
Forwarded from LEFT JOIN
SQL best practices, according to Metabase.
Some of the tips are genuinely valuable. Of those I have actually run into in practice, I would single out one in particular, since it once let me substantially cut a query's execution time:
Prefer EXISTS to IN
If you just need to verify the existence of a value in a table, prefer EXISTS to IN, as the EXISTS process exits as soon as it finds the search value, whereas IN will scan the entire table. IN should be used for finding values in lists.
Metabase is quite an interesting tool; we use it on one of our projects, and I should record a video as a follow-up to the BI Guide.
If anyone has the time and interest to explore Metabase and also record a video, message me in DM: @valiotti.
Metabase | Business Intelligence, Dashboards, and Data Visualization
Best practices for writing SQL queries | Metabase Learn
SQL best practices: a brief guide to writing better SQL queries.
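To make the "Prefer EXISTS to IN" tip quoted above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `customers` and `orders` tables and both helper functions are made up for illustration; they are not from the Metabase guide. The sketch only shows that the two query forms return the same rows. Whether EXISTS is actually faster depends on the database engine and its optimizer, so measure on your own data.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database (hypothetical schema, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
        INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
        INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3);
    """)
    return conn


def customers_with_orders(conn):
    """Return the customers that have at least one order, computed two ways."""
    # Variant 1: membership test against the subquery's result set.
    in_query = """
        SELECT name FROM customers
        WHERE id IN (SELECT customer_id FROM orders)
        ORDER BY name
    """
    # Variant 2: correlated existence check; one matching row per customer is enough.
    exists_query = """
        SELECT name FROM customers c
        WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
        ORDER BY name
    """
    in_rows = [row[0] for row in conn.execute(in_query)]
    exists_rows = [row[0] for row in conn.execute(exists_query)]
    return in_rows, exists_rows
```

Calling `customers_with_orders(build_demo_db())` returns the same list of names for both variants, so switching from IN to EXISTS here is purely a performance decision.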
Here are more than 100 free courses from Microsoft on #DataEngineering, #DataScience and #DataAnalytics. Many of them are #Azure-related, but many others focus on general knowledge in these fields.
https://docs.microsoft.com/en-us/learn/browse/?terms=data&roles=ai-engineer%2Cdata-analyst%2Cdata-engineer%2Cdata-scientist
Docs
Browse all - Learn
Learn new skills and discover the power of Microsoft products with step-by-step guidance. Start your journey today by exploring our learning paths and modules.
An interesting long read about databases for website analytics. Besides sharing their experience of choosing the right solution, the author describes the migration process and the options they considered. However, he doesn't mention Druid, which I think could be a good fit.
https://usefathom.com/blog/worlds-fastest-analytics
Credits to: https://news.1rj.ru/str/rockyourdata/2448
Fathom Analytics
Building the world’s fastest website analytics - Fathom Analytics
For over a year, we’d been struggling to keep up with our analytics data growth. Fathom had been growing at the speed of light, with more and more people ditching Google Analytics, our data ingestion was going through the roof.
Have you noticed that there are almost no books on data engineering, and the few that exist mostly describe the different technologies data engineers can use? Of course, there is "Designing Data-Intensive Applications", but that book only makes sense if you already have some experience. Courses also focus on technologies rather than on the general knowledge a data engineer should have. Technologies, in turn, advertise the problems they solve, but those problems are familiar only to fairly experienced data engineers. Meanwhile, data engineers are currently in high demand.
What would you look for to become a data engineer?
Anonymous Poll
31% A book
19% A video course
25% A training or bootcamp
25% An internship / real project
Azure Synapse Analytics vs Azure Databricks: a detailed comparison of features and use cases
https://www.element61.be/en/resource/when-use-azure-synapse-analytics-andor-azure-databricks
element61
When to use Azure Synapse Analytics & Azure Databricks?
What is Azure Synapse Analytics? Azure Synapse Analytics is the Azure SQL Datawarehouse rebranded. Azure Synapse Analytics v2 (workspaces incl. Azure Synapse Studio) is still in preview. This version of Azure Synapse Analytics integrates existing and new analytical…
As you may know, AWS has Lambda and Step Functions. Azure seems to have taken a somewhat different approach with Durable Functions. I am not sure this is the right comparison, but having stateful functions in addition to stateless ones is an interesting idea.
Docs
Durable Functions Overview - Azure
Introduction to the Durable Functions extension for Azure Functions.
I wonder how this is implemented under the hood, and whether AWS is planning a similar feature for S3.
https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-namespace
Docs
Azure Data Lake Storage Gen2 Hierarchical Namespace
Describes the concept of a hierarchical namespace for Azure Data Lake Storage Gen2
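As a rough intuition for why a hierarchical namespace matters, here is a toy Python sketch contrasting a flat key-value store, where a "directory" is only a key prefix, with a tree-shaped store, where directories are real nodes. Both classes are hypothetical illustrations and say nothing about how Azure actually implements the feature. The point is only that renaming a directory touches every object under the prefix in the flat model, but a single entry in the tree model.

```python
class FlatStore:
    """Flat object store: every object is keyed by its full path string."""

    def __init__(self):
        self.objects = {}

    def put(self, path, data):
        self.objects[path] = data

    def rename_dir(self, old, new):
        """Rename a 'directory' by rewriting every key under its prefix.

        Returns the number of keys that had to be touched.
        """
        old_prefix = old.rstrip("/") + "/"
        new_prefix = new.rstrip("/") + "/"
        moved = [key for key in self.objects if key.startswith(old_prefix)]
        for key in moved:
            self.objects[new_prefix + key[len(old_prefix):]] = self.objects.pop(key)
        return len(moved)


class HierarchicalStore:
    """Tree of nested dicts: directories are nodes, files are leaves."""

    def __init__(self):
        self.root = {}

    def put(self, path, data):
        *dirs, name = path.strip("/").split("/")
        node = self.root
        for d in dirs:
            node = node.setdefault(d, {})
        node[name] = data

    def get(self, path):
        node = self.root
        for part in path.strip("/").split("/"):
            node = node[part]
        return node

    def rename_dir(self, old, new):
        """Rename a top-level directory by re-linking one entry (kept simple on purpose).

        Returns the number of entries that had to be touched.
        """
        self.root[new] = self.root.pop(old)
        return 1
```

With three files under `raw/`, `FlatStore.rename_dir("raw", "staged")` has to touch three keys, while `HierarchicalStore.rename_dir("raw", "staged")` touches one entry no matter how many files the directory holds.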