💥Mathematics for the Data Scientist, Part 2: Zipf's Law
This empirical regularity in the frequency distribution of natural-language words is widely used in quantitative linguistics and NLP. Zipf's law says: if all words in a large text are sorted in descending order of frequency, the frequency of the n-th word in the list is approximately inversely proportional to its rank n. For example, the second most common word occurs about half as often as the first, the third about one third as often, and so on.
The pattern was first noticed by the French stenographer Jean-Baptiste Estoup in 1908. The German physicist Felix Auerbach applied it in 1913 to describe the distribution of city sizes. The American linguist George Kingsley Zipf actively popularized the pattern in 1949, proposing to use it to describe the distribution of economic power and social status: the richest person has twice as much money as the next richest, etc. An explanation of Zipf's law based on the correlation properties of additive Markov chains (with a step memory function) was given in 2005. Mathematically, Zipf's law is closely related to the Pareto distribution (the source of the well-known 80/20 principle).
The law's wide range of applications (not only in linguistics) is explained by the American bioinformatics specialist Wentian Li, who showed that a random sequence of characters also obeys Zipf's law. Li argues that Zipf's law is a statistical phenomenon that has nothing to do with the semantics of a text: the probability of any word of length n occurring by chance in a chain of random characters decreases as n grows, in the same proportion as the word's rank in the frequency list increases. As a result, the product of a word's rank and its frequency is approximately constant.
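A minimal sketch of how the rank-frequency relation can be checked on your own text. The function names `rank_frequency` and `ideal_zipf_counts` are made up for this illustration, and the tokenizer is a deliberately crude assumption (lowercase Latin letters and apostrophes only). On a real corpus the rank × count column should stay roughly constant for the top ranks if the text follows Zipf's law.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, count, rank * count) rows, most frequent word first."""
    # Crude tokenizer (an assumption for this sketch): lowercase Latin words only.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words).most_common()
    return [
        (rank, word, count, rank * count)
        for rank, (word, count) in enumerate(counts, start=1)
    ]


def ideal_zipf_counts(top_count, n_ranks):
    """Counts predicted by an ideal Zipf law: count(rank) = top_count / rank."""
    return [top_count / rank for rank in range(1, n_ranks + 1)]


if __name__ == "__main__":
    sample = "the cat and the dog and the bird saw the cat"
    for row in rank_frequency(sample)[:5]:
        print(row)
```

A short sample like the one above is far too small to show the law; it only demonstrates the mechanics. Feed in a book-length text to see the pattern.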
😜Welcome to the office with a smile!
Canon's biometric access control system lets only smiling employees into its Chinese offices and other work areas: a smile recognition function is built into the face recognition module of the video cameras at the entrance. This is expected to boost corporate spirit and employee loyalty.
https://www.theverge.com/2021/6/17/22538160/ai-camera-smile-recognition-office-workers-china-canon
The Verge
Canon put AI cameras in its Chinese offices that only let smiling workers inside
Smile, you hate it here!
👍🏻One word is enough: Facebook AI announced a new project, TextStyleBrush, capable of copying the style of handwritten or printed text in an image from just one word. This is useful for editing and replacing text in photos. Unlike previous AI systems that copy text from photographs, TextStyleBrush can work with all kinds of calligraphy and typography and handles various rotations and transformations of text, from curved characters to the natural deformation of paper under the pressure of a pen. Facebook acknowledges that TextStyleBrush could be misused for malicious purposes such as text deepfakes. To help counter such attacks, the company will share its research results alongside the Deepfake Detection Challenge dataset, adding to the broader knowledge base on deepfakes.
https://techxplore.com/news/2021-06-facebook-ai-word-mimic-text.html
Techxplore
Facebook AI can now use just one word to mimic text style from images
Facebook has announced their new AI project TextStyleBrush, a software capable of copying the style of handwritten or printed text in an image using only one word. Users can utilize this model to alter ...
💥The future of AI chips: a long history of collaboration and rivalry among the AI giants developing DS solutions. NVIDIA vs Google, the outlook for deep neural networks and chips, plus money, cats and the Internet. https://www.wired.co.uk/article/nvidia-ai-chips
WIRED UK
NVIDIA and the battle for the future of AI chips
NVIDIA’s GPUs dominate AI chips. But a raft of startups say new architecture is needed for the fast-evolving AI field
👆🏻The 10 most interesting DS conferences around the world in July 2021
• 11.07 - MDA 2021: 16th International Conference on Mass Data Analysis of Images and Signals with Applications in Medicine, Biotech, and more. New York, NY, USA.
• 12.07 - International Conference on Mobile Geomatics and Geodata Science (ICMGGS). Ottawa, Canada https://waset.org/mobile-geomatics-and-geodata-science-conference-in-july-2021-in-ottawa
• 14.07 - MLCon: The AI and ML Developer Virtual Conference. Online https://cnvrg.io/mlcon/
• 18.07 - MLDM 2021: 17th International Conference on Machine Learning and Data Mining. New York, NY, USA http://www.mldm.de/
• 18.07 - International Conference on Machine Learning 2021. Online https://icml.cc/
• 20.07 - Chief Data & Analytics Officers (CDAO) Government Live, leading data-driven transformation across the government sector by Corinium. Online https://cdao-gov.coriniumintelligence.com
• 21.07 - Subsurface™ LIVE Summer 2021 - Free and Virtual https://www.dremio.com/subsurface/live/
• 22.07 - Business of Data Festival – Online https://www.boddigitalbroadcast.com/festival/home
• 22.07 - MarketsandMarkets Big Data Virtual Summit. Online https://events.marketsandmarkets.com/2nd-edition-marketsandmarkets-big-data-virtual-summit/
• 26.07 - The 2021 International Conference on Data Science (ICDATA'21). Las Vegas, NV, USA https://icdata.org/
waset.org
International Conference on Mobile Geomatics and Geodata Science ICMGGS in July 2021 in Ottawa
Mobile Geomatics and Geodata Science Conference scheduled on July 12-13, 2021 in July 2021 in Ottawa is for the researchers, scientists, scholars, engineers, academic, scientific and university practitioners to present research activities that might want…
👁Preparing for code review, speeding up development and testing with GitHub Copilot for Visual Studio Code, powered by OpenAI
Write faster and better in Python, JavaScript, TypeScript, Ruby, Go, and a dozen other languages. Copilot runs on Codex, a new AI system from OpenAI, and understands more context than typical IDE assistants. Whether it is given a docstring, a comment, a function name, or the code itself, Copilot uses that context to synthesize matching code and help the developer build a quality product. https://copilot.github.com/
GitHub
GitHub Copilot
AI that builds with you
🏸Game theory as an engine for large-scale data analysis
A new look at principal component analysis as a competitive game in which each approximate eigenvector is controlled by a player whose goal is to maximize their own utility function. This multi-agent perspective has led to new insights and algorithms that make efficient use of the latest computing resources and scale to very large datasets. Brief overview https://deepmind.com/blog/article/EigenGame and detailed article https://openreview.net/forum?id=NzTU59SYbNq
Deepmind
Game theory as an engine for large-scale data analysis
Modern AI systems approach tasks like recognising objects in images and predicting the 3D structure of proteins as a diligent student would prepare for an exam. By training on many example problems, they minimise their mistakes over time until they achieve…
🥁NetHack Challenge at NeurIPS 2021 from Facebook: open-source project for Reinforcement Learning (RL) as a game
Many advances in RL have been achieved through simulation environments in games such as Dota 2, Minecraft, and StarCraft II. But this requires a lot of computation, on thousands of GPUs at a time for just one experiment. To reduce the cost of RL research, Facebook in 2020 initiated the development of the open-source NetHack Learning Environment project. And in 2021, Facebook announced the NetHack Challenge, a NeurIPS 2021 competition run in conjunction with AIcrowd, an AI crowdsourcing organization. The competition runs from early June to October 15, 2021, and the winners will be announced at NeurIPS in December.
The NetHack game has existed since the 1980s. It is visually simple and completely free to play, but more complicated than StarCraft II because of the very intricate interaction of players with their environment and its objects: users have to think outside the box or consult the NetHack Wiki. The main difficulty of NetHack is that once a character dies, that player's game session ends. Within this RL environment, researchers therefore hope to find new ways to control agents so that in the future AI can think creatively in difficult situations and help people. Because NetHack runs in a terminal, gameplay can be simulated quickly, training billions of agents a day on just 2 GPUs. This is how the NetHack Challenge tests the latest AI techniques in a complex environment without the enormous power of a supercomputer. https://techxplore.com/news/2021-06-facebook-nethack-neurips.html
Tech Xplore
Facebook to launch NetHack Challenge at NeurIPS 2021
Historically, significant progress in the area of reinforcement learning (RL) has resulted from simulation environments in games such as Dota 2, Minecraft and StarCraft II. Unfortunately, these developments ...
✍🏻Need a quick rewrite? Try NLPAug!
NLPAug is a Python library that helps improve the performance of neural networks on NLP tasks without changing their architecture or fine-tuning them. With it, you can synthesize new text from the data you already have by replacing some words with synonyms, including words chosen by cosine similarity in embedding spaces such as word2vec or GloVe. NLPAug also performs context-based word replacement using transformers such as BERT, and back-translation: translating the text into another language and back again. https://github.com/makcedward/nlpaug
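To illustrate the simplest of these ideas, synonym replacement, here is a standard-library sketch. It does not use the NLPAug API: the `SYNONYMS` table and the `synonym_augment` function are hypothetical names invented for this example, and a real setup would take candidates from WordNet or from nearest neighbours in an embedding space.

```python
import random

# Hypothetical toy synonym table (an assumption for this sketch only).
SYNONYMS = {
    "quick": ["fast", "rapid"],
    "rewrite": ["paraphrase", "rephrasing"],
    "need": ["require"],
}


def synonym_augment(sentence, synonyms=SYNONYMS, p=0.5, rng=None):
    """Return a copy of `sentence` where each known word is swapped
    for a random synonym with probability `p`."""
    if rng is None:
        rng = random.Random()
    out = []
    for token in sentence.split():
        candidates = synonyms.get(token.lower())
        if candidates and rng.random() < p:
            out.append(rng.choice(candidates))
        else:
            out.append(token)  # unknown word or not selected: keep as-is
    return " ".join(out)
```

Calling `synonym_augment("need a quick rewrite", rng=random.Random(0))` several times with the same `rng` produces different variants of the sentence, which can then be added to the training set with the original label.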
GitHub
GitHub - makcedward/nlpaug: Data augmentation for NLP
Data augmentation for NLP . Contribute to makcedward/nlpaug development by creating an account on GitHub.
☀️ML for prediction of Solar Radiation
From a practical agronomic point of view, an accurate estimate of solar radiation is vital because it is a key factor in crop development. Most existing weather stations around the world have temperature and rain sensors, but only some of them measure solar radiation. Measuring it is usually very expensive because it requires complex sensors (pyranometers and radiometers), so reliable data are scarce. A group of researchers from the University of Cordoba has therefore developed ML models to predict solar radiation in southern Spain and the United States.
The ML models are based not only on actual measurements but are also enriched with data on the geoclimatic conditions of the area (aridity, distance to the sea, altitude, etc.). To estimate daily solar radiation, the proposed neural network algorithms need only air temperature as current input, which is relatively cheap to obtain thanks to inexpensive sensors and IoT technologies. Bayesian algorithms are used to optimize hyperparameters, and the models themselves can be adapted to any terrain, depending on its aridity.
https://techxplore.com/news/2021-07-machine-based-thermal-solar.html
Tech Xplore
Machine learning models based on thermal data predict solar radiation
A research team at the University of Córdoba has developed and evaluated models for the prediction of solar radiation in nine locations in southern Spain and North Carolina (USA).
🌦Why the rain doesn't follow the forecast, and how Yandex Meteum 2.0 deals with it
The story of replacing MatrixNet with CatBoost and of new datasets for training NN models. Meteum's neural nets now learn not only from professional weather station data, but also from information about terrain features and user reports. https://tekdeeps.com/yandex-has-launched-meteum-2-0-a-new-technology-for-weather-forecasting-based-on-machine-learning/
Tek Deeps
Yandex has launched Meteum 2.0 - a new technology for weather forecasting based on machine learning
Meteum 2.0 technology will help Yandex make more accurate weather forecasts using machine learning algorithms that take into account unusual factors.
✈️AI will schedule flight crews for the US Air Force
An AI system from MIT helps the US Air Force schedule aircrews for cargo flights, taking many factors into account: airspace availability, pilot qualifications, work and rest requirements, etc. Combining integer programming optimization with RL neural networks, the system generates flight schedules that satisfy both explicit and implicit constraints. https://news.mit.edu/2021/us-air-force-pilots-artificial-intelligence-assist-scheduling-aircrews-0708
MIT News
US Air Force pilots get an artificial intelligence assist with scheduling aircrews
MIT, Lincoln Laboratory, and the U.S. Air Force created an AI tool to automate and optimize aircrew scheduling. The tool is designed for the widely used C-17 aircraft and was developed as part of the Dept. of Air Force–MIT AI Accelerator partnership.
🔥Video translation from Yandex
On July 16, 2021, Yandex showed the world's first prototype of machine video translation, built on AI technologies for biometrics, speech recognition and speech synthesis. Users of the desktop Yandex Browser can already use it to watch English-language videos with a voice-over translation. The product will support other languages in the future. https://yandex.ru/company/services_news/2021/2021-07-16
Video:
https://disk.yandex.ru/d/7DYUm9QSfTPn5A
https://www.youtube.com/playlist?list=PLkMNi_iVG-shtwkqd918VUJ80NIOJ2pQf
👀Looking for an enterprise AI solution?
Try NVIDIA's NGC™ Catalog, a registry of GPU-optimized software for high-performance computing and big data analytics, with use cases across industries ranging from retail chatbots to medical imaging and recommender systems. NGC contains enterprise-grade application containers, pre-trained AI models, and industry-specific SDKs that can be deployed on-premises, in the cloud, or at the edge of the network. For example, NVIDIA TAO is a platform for training, adapting and optimizing AI models that lets you create enterprise-grade AI applications without deep expert knowledge or large training datasets. https://ngc.nvidia.com/
😎Neural networks of the Cloud Mail.ru service help preserve memories
ML algorithms automatically find pictures taken on a specific day and display them as stories, building a custom photo calendar that illustrates memorable events. Thanks to image recognition methods, only good shots make it into the result. If the user doesn't like a picture, it can be removed directly from the video story. You can share the animated photo gallery with your friends by sending it in a message or posting it on VK and Instagram. The update is already available in the app for iOS and Android. https://corp.mail.ru/ru/press/releases/10947/
vk.company
VK / The Cloud Mail.ru neural network will pick the best photos for social networks
Using machine learning algorithms, the Cloud service automatically selects pictures taken on a specific day and displays them in the familiar stories format. This gives the user a kind of photo calendar illustrating memorable events.
…
💦3 main components for building an ML pipeline
Only 3 basic tools are needed to build an effective machine learning pipeline:
• Feature Store to handle offline and online feature transformations. It supports version control and integration with data lakes and DWHs. It also enables fast serving and rapid deployment of code to production. Examples: Tecton, Hopsworks, Michelangelo Palette, Zipline, and the feature stores from Amazon SageMaker and Databricks.
• Model Store as a central registry of models and experiment tracking. It provides reproducible versions and a tracked history of ML models and related artifacts such as Git commits, pickle files, scores, regression, etc. Examples: Weights and Biases, MLFlow, Neptune.ai, EthicalML, and solutions by Amazon, Azure, Google.
• Evaluation Store for monitoring and improving the performance of models. It tracks performance metrics for each ML model in any environment, from training to production, and includes A/B testing tools and visual dashboards. Examples: Arize and Neptune.ai.
In addition, data annotation platforms (Appen), model serving tools (Kubeflow, Algorithmia) and AI orchestration (Spell) are useful to all teams involved in MLOps processes.
https://towardsdatascience.com/the-only-3-ml-tools-you-need-1aa750778d33
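To make the Model Store idea more concrete, here is a toy in-memory registry. It is a sketch only: `ModelVersion` and `ModelStore` are hypothetical names invented for this example and do not reflect the API of MLFlow, Weights and Biases or any other product listed above. It shows the core bookkeeping such a store performs, namely versioned entries that link a model artifact to the Git commit and metrics that produced it.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ModelVersion:
    name: str
    version: int
    git_commit: str      # code revision that produced the model
    artifact_path: str   # e.g. path to a pickle file
    metrics: dict        # evaluation scores recorded at registration time
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class ModelStore:
    """Hypothetical in-memory model registry, for illustration only."""

    def __init__(self):
        self._versions = {}  # model name -> list of ModelVersion

    def register(self, name, git_commit, artifact_path, metrics):
        history = self._versions.setdefault(name, [])
        entry = ModelVersion(
            name, len(history) + 1, git_commit, artifact_path, dict(metrics)
        )
        history.append(entry)
        return entry

    def latest(self, name):
        return self._versions[name][-1]

    def best(self, name, metric):
        # Assumes a higher value of `metric` is better.
        return max(self._versions[name], key=lambda v: v.metrics[metric])
```

A real model store adds persistent storage, access control and links to experiment runs, but the lookup pattern (latest version vs best version by a metric) stays the same.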
Medium
The Only 3 ML Tools You Need
At a rapid pace, many machine learning techniques have moved from proof of concepts to powering crucial pieces of technology that people…
😎The 10 most interesting DS conferences around the world in August 2021
• 09.08 - 2nd Workshop on Knowledge Guided Machine Learning (KGML2021). Online event by the University of Minnesota https://sites.google.com/umn.edu/kgmlworkshop/workshop
• 09.08 - International Conference on Sports Analytics and Data Science. New York, USA https://waset.org/sports-analytics-and-data-science-conference-in-august-2021-in-new-york
• 11.08 - ML Data Engineering Community online meetup by Tecton: Feature Store, streaming architecture, MLOps and other DS topics. Free registration https://www.applyconf.com/
• 14.08 - KDD 2021, the premier interdisciplinary data science conference (Singapore). Online https://kdd.org/kdd2021/
• 14.08 - Fragile Earth 2021: developing radically new technological foundations for advancing and meeting the Sustainable Development Goals. This annual online workshop is part of the Earth Day events at ACM's KDD 2021 Conference on research in machine learning and its applications. https://ai4good.org/fragile-earth-2021/
• 17.08 - Ai4 2021. An online conference that brings together business leaders and data practitioners to facilitate the adoption of AI and ML technology. https://ai4.io/2021/
• 19.08 - IJCAI-21: 30th International Joint Conference on Artificial Intelligence. Montreal-themed virtual reality. Online https://ijcai-21.org/
• 25.08 - Data Science Salon: Applying ML and AI to Retail and Ecommerce. Online https://www.datascience.salon/retail-and-ecommerce/
• 25.08 - DataOps Virtual Event. Zaloni, the vendor of the Arena DataOps platform, invites CDOs and lead DataOps engineers from AWS, KPMG, PwC and others to share current practice in data management and engineering across business areas. Free registration https://www.zaloni.com/dataops-virtual-event-second-annual/
• 26.08 - International Conference on Smart Technologies in Data Science and Communication. Paris, France https://waset.org/smart-technologies-in-data-science-and-communication-conference-in-august-2021-in-paris
🙌🏻🚗On July 22, 2021, Yandex opened the world's largest self-driving vehicle dataset: more than 1,600 hours of driving, divided into 600,000 labeled trip fragments recorded on the roads of Russia, Israel and the United States in different weather conditions. The dataset was published for the Shifts Challenge at the international conference NeurIPS 2021 in order to draw attention to the problem of "data shift" in machine learning and to reduce the uncertainty of applying ML models in new conditions. All data are depersonalized: the dataset contains high-precision route maps and tracks of all surrounding cars and pedestrians (their position, speed, acceleration, etc.), without personal data such as license plates or people's faces. Participants have to train ML algorithms on the provided data and check the quality of their work under shifted conditions. The developers of the best-performing algorithms will receive cash prizes of 5, 3 and 1 thousand dollars.
https://research.yandex.com/shifts
https://github.com/yandex-research/shifts
👆🏻What is the AUC-ROC curve and why is it so important for evaluating the quality of an ML model?
AUC-ROC (Area Under the Receiver Operating Characteristic curve) is an evaluation metric used to check or visualize the performance of a classification model, including multi-class classification problems.
The AUC-ROC curve measures classification performance at various threshold settings. ROC is a probability curve, and AUC represents the degree of separability: it tells how well the model can distinguish between classes. The higher the AUC, the better the model is at predicting class 0 as 0 and class 1 as 1. For example, the higher the AUC, the better the model distinguishes patients with a disease from patients without it.
An excellent model has an AUC near 1, which means it has a good measure of separability. A poor model has an AUC near 0, which means it has the worst measure of separability: it is inverting the result, predicting 0s as 1s and 1s as 0s. When the AUC is 0.5, the model has no capacity to separate the classes at all, i.e. it does no better than random guessing.
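To make the three cases concrete, here is a minimal pure-Python sketch. It relies on a standard interpretation of AUC: the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counted as half. The function name `auc` and the toy scores are made up for illustration and are not taken from the article.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
perfect = auc(labels, [0.1, 0.2, 0.8, 0.9])   # every positive outranks every negative
inverted = auc(labels, [0.9, 0.8, 0.2, 0.1])  # every negative outranks every positive
constant = auc(labels, [0.5, 0.5, 0.5, 0.5])  # scores carry no information

print(perfect, inverted, constant)  # 1.0 0.0 0.5
```

The pairwise loop is quadratic and only meant for small examples; libraries usually compute the same quantity from sorted scores.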
Sensitivity and specificity trade off against each other: when we increase sensitivity, specificity generally decreases, and vice versa. When we lower the threshold, more examples are predicted positive, which increases sensitivity and decreases specificity. Similarly, when we raise the threshold, more examples are predicted negative, which gives higher specificity and lower sensitivity.
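The trade-off can be seen by evaluating the same scores at two thresholds. This is a small sketch with invented labels and scores; the helper `sensitivity_specificity` is a hypothetical name, and it assumes an example is predicted positive when its score is at or above the threshold.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Predict positive when score >= threshold; return (sensitivity, specificity)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data: four negatives and four positives with overlapping scores.
labels = [0, 0, 0, 1, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.65, 0.8, 0.9]

low = sensitivity_specificity(labels, scores, 0.3)   # low threshold
high = sensitivity_specificity(labels, scores, 0.7)  # high threshold

print(low)   # (1.0, 0.5): all positives caught, half the negatives misclassified
print(high)  # (0.5, 1.0): no false positives, but half the positives missed
```

Sweeping the threshold over all score values and plotting sensitivity against 1 - specificity yields the ROC curve itself.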
In a multi-class model, we can plot N AUC-ROC curves for N classes using the One-vs-All methodology. For example, if you have three classes named X, Y, and Z, you will have one ROC curve for X classified against Y and Z, another for Y classified against X and Z, and a third for Z classified against X and Y.
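The One-vs-All scheme can be sketched as follows: for each class, relabel the samples as "this class" versus "everything else" and compute a binary AUC from the model's score for that class. The data and the names `auc` and `one_vs_rest_auc` are illustrative assumptions; the pairwise AUC definition is the same rank-based one used for binary problems.

```python
def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(true_classes, class_scores):
    """class_scores maps a class name to the per-sample scores for that class."""
    result = {}
    for cls, scores in class_scores.items():
        # Relabel: 1 for the current class, 0 for all other classes.
        binary = [1 if t == cls else 0 for t in true_classes]
        result[cls] = auc(binary, scores)
    return result


true_classes = ["X", "X", "Y", "Y", "Z", "Z"]
class_scores = {
    "X": [0.8, 0.6, 0.3, 0.1, 0.2, 0.1],
    "Y": [0.1, 0.3, 0.5, 0.2, 0.3, 0.4],
    "Z": [0.1, 0.1, 0.2, 0.7, 0.5, 0.5],
}

per_class = one_vs_rest_auc(true_classes, class_scores)
print(per_class)  # {'X': 1.0, 'Y': 0.625, 'Z': 0.75}
```

Each of the three values corresponds to one of the ROC curves described above; they can be reported separately or averaged into a single score.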
https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5
🚗Yandex robots will deliver food to American students
On July 6, 2021, Yandex entered into a cooperation agreement with the American food delivery service Grubhub to deliver food on US college campuses using Rovers. Developed by Yandex, these autonomous courier robots are based on self-driving car technology and can operate in any weather 24/7. Rovers drive on sidewalks and cross roads at pedestrian crossings. Since the beginning of 2021, the robots have delivered thousands of orders from Yandex.Food and Yandex.Lavka in Russia. And since April, they have been delivering orders from restaurants in the American city of Ann Arbor, Michigan.
https://yandex.ru/company/press_releases/2021/07-06-2021
✈️Second release of TF-Ranking by Google AI
In December 2018, Google AI introduced TF-Ranking, an open-source library based on TensorFlow for developing scalable neural learning-to-rank (LTR) models, which return an ordered list of items in response to a user query. Unlike standard classification models, which classify one item at a time, LTR models take a complete list of items as input and look for an ordering that maximizes the usefulness of the entire list. LTR models are most common in search and recommendation systems, but TF-Ranking is also used in e-commerce and in building smart spaces and cities.
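As an illustration of what "usefulness of the entire list" can mean, here is a minimal sketch of NDCG (normalized discounted cumulative gain), one widely used list-level ranking metric. The post does not say which metric is optimized, so the choice of NDCG, the exponential gain, and the toy relevance grades are assumptions; this is plain Python and does not use the TF-Ranking API.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of relevance grades given in ranked order."""
    return sum(
        (2 ** rel - 1) / math.log2(position + 2)  # position 0 is the top of the list
        for position, rel in enumerate(relevances)
    )


def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# The same four items, ranked in two different orders.
good = ndcg([3, 2, 1, 0])   # most relevant items first: the ideal ordering
worse = ndcg([0, 1, 2, 3])  # most relevant items last

print(good)          # 1.0
print(worse < good)  # True
```

The score depends on the order of the whole list rather than on each item in isolation, which is what distinguishes a ranking objective from per-item classification.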
In May 2021, Google AI published the second release of TF-Ranking, which provides full support for natively building LTR models with Keras, the high-level TensorFlow 2 API. The Keras ranking model has a new workflow design, including a flexible ModelBuilder, a DatasetBuilder for setting up the training data, and a pipeline for training the model. This version of TF-Ranking also supports RaggedTensors and the Orbit training library, along with many other improvements.
After an in-depth study of the capabilities of the TF-Ranking library, the Google AI team also created the Data Augmented Self-Attentive Latent Cross (DASALC) model, which combines neural feature transformation with data augmentation, ensemble methods, and ranking losses. DASALC aims to overcome the weaknesses of neural LTR models and gradient-boosted decision trees while retaining the advantages of both methods.
https://ai.googleblog.com/2021/07/advances-in-tf-ranking.html
https://research.google/pubs/pub50030/