
Big Data Science

🏸FastMoE: A Fast Mixture-of-Expert Training System
Mixture-of-Experts (MoE) shows strong potential for scaling language models to trillions of parameters. However, training a trillion-scale MoE requires algorithm and system co-design to build a well-tuned, high-performance distributed training system. Unfortunately, the only existing platform that meets these requirements depends heavily on Google's hardware (TPU) and software (Mesh TensorFlow) stack, and it is not open to the public, in particular to the GPU and PyTorch communities.
FastMoE is a distributed, open-source MoE training system based on PyTorch that runs on common accelerators. It provides a hierarchical interface for both flexible model design and easy adaptation to different applications, such as Transformer-XL and Megatron-LM. Unlike a direct implementation of MoE models in PyTorch, FastMoE is heavily optimized for training speed with high-performance acceleration techniques. The system supports placing different experts on multiple GPUs across multiple nodes, so the number of experts can grow linearly with the number of GPUs.
https://github.com/laekov/fastmoe
https://arxiv.org/abs/2103.13262
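
To make the MoE idea more concrete, here is a minimal pure-Python sketch of top-k expert routing for a single input. It is an illustration only, not FastMoE's API: the function names (`softmax`, `moe_forward`), the toy experts, and the gate scores are all assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, gate_scores, experts, top_k=2):
    """Route one input to its top_k experts and mix their outputs.

    x           -- the input (a single float here, for simplicity)
    gate_scores -- one raw gate score per expert (hypothetical values)
    experts     -- list of callables, one per expert
    """
    # Rank experts by gate score and keep the best top_k.
    ranked = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)
    chosen = ranked[:top_k]
    # Renormalize the gate weights over the chosen experts only.
    weights = softmax([gate_scores[i] for i in chosen])
    # Only the chosen experts are evaluated; the others are skipped entirely.
    mixed = sum(w * experts[i](x) for w, i in zip(weights, chosen))
    return mixed, chosen


# Four toy "experts" standing in for feed-forward sub-networks.
experts = [lambda v: v + 1.0, lambda v: v * 2.0, lambda v: -v, lambda v: v * v]
output, chosen = moe_forward(3.0, [0.1, 2.0, -1.0, 2.0], experts, top_k=2)
```

Because only `top_k` experts run per input, adding more experts increases model capacity without a matching increase in per-input compute, which is why spreading experts across GPUs is attractive.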


Big Data Science

🔥Not only GPT-3: what is GPT-J-6B
OpenAI's powerful GPT-3 language model is not open source, so other teams offer alternatives. One of the most interesting right now is GPT-J from EleutherAI, with 6 billion parameters. The developers promise more flexible and faster inference than TensorFlow + TPU counterparts across a range of downstream tasks.
https://6b.eleuther.ai/
https://colab.research.google.com/github/kingoflolz/mesh-transformer-jax/blob/master/colab_demo.ipynb
https://github.com/kingoflolz/mesh-transformer-jax/#gpt-j-6b
https://arankomatsuzaki.wordpress.com/2021/06/04/gpt-j/
https://minimaxir.com/2021/06/gpt-j-6b/


Big Data Science

🌸News from MIT: A New AI-Powered Probabilistic Programming Language
It can assess the "fairness" of AI algorithms impartially, and it does so more accurately and faster than existing alternatives. The Sum-Product Probabilistic Language (SPPL) is a probabilistic programming system. Probabilistic programming is a new field at the intersection of programming languages and AI that simplifies building AI solutions based on probabilistic models and inference from observed data.
SPPL offers flexibility and reliability thanks to the expressiveness of the language, its precise and simple semantics, and the speed and robustness of its exact symbolic inference engine. It avoids errors by restricting programs to a carefully designed class of models, which still includes decision tree classifiers. SPPL works by compiling probabilistic programs into a specialized data structure called a sum-product expression. Although it is faster than comparable solutions, this approach cannot analyze neural networks. SPPL is a Python-based open-source project.
https://news.mit.edu/2021/exact-symbolic-artificial-intelligence-faster-better-assessment-ai-fairness-0809
https://github.com/probcomp/sppl


Big Data Science

👻What is AIOps and how it differs from MLOps
MLOps is an interdisciplinary approach to managing machine learning models as standalone products with their own life cycle, with a focus on developing, scaling, and applying ML algorithms on an ongoing basis.
MLOps aims to bridge the gap between creating ML models and maintaining them, while AIOps focuses on automating incident management and intelligent root cause analysis.
AIOps solutions use monitoring data, reports, and logs to detect events, and apply machine learning and deep learning to notify IT operations of any issues or disruptions.
The goal of AIOps is to improve the efficiency of IT operations by automating event diagnosis and using machine learning to pinpoint root causes. By filtering the noise generated by monitoring tools and reducing false positives, AIOps gives technical teams high-quality, easy-to-understand data to support decision making. AIOps goes beyond preventing downtime: it also covers cost containment, security, and AI-powered policy compliance.
MLOps helps teams choose the tools, methodologies, and documentation that will get their ML models into production, and AIOps helps teams automate their technology life cycles.
The greatest effect comes from using MLOps and AIOps together.
https://ai.plainenglish.io/whats-the-difference-between-aiops-and-mlops-15316cfa803d


Big Data Science

👆🏻BYOL - Bootstrap Your Own Latent
BYOL is a new approach to self-supervised learning of image representations with 2 neural networks that interact and learn from each other. The online network learns to predict the representation that the target network produces for the same image under a different augmentation. The underlying encoder is an existing architecture such as ResNet-50. An input x is augmented into two views, t and t', which are passed through the online and target networks separately.
The difference between the online and target networks is that the former has an additional predictor: an MLP with two fully connected layers, with batch normalization and ReLU in between. The online network's view is trained to predict the view generated by the target network. The online network is updated with a regression loss whose targets are set by the target network, and the parameters of the target network are updated as an exponential moving average of the online network, which helps avoid representation collapse.
Under the linear evaluation protocol, BYOL's performance is comparable to that of state-of-the-art supervised learning. Performance degrades slightly when random cropping is the only image augmentation, but BYOL still performs better than SimCLR, because it learns iteratively from previous versions of its own output rather than from negative pairs. However, the BYOL approach has not yet been applied to text, video, and audio processing tasks.
https://www.youtube.com/watch?v=YPfUiOMYOEE
https://ai.plainenglish.io/byol-bootstrap-your-own-latent-dacee62a3dc8
https://arxiv.org/abs/2006.07733
https://arxiv.org/abs/2010.10241
https://github.com/lucidrains/byol-pytorch
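
The two update rules in BYOL, the regression loss for the online network and the moving-average update for the target network, can be sketched in a few lines of plain Python. This is a toy illustration on small lists of floats, not code from the paper or from byol-pytorch; the names `regression_loss`, `ema_update`, and the decay `tau` are assumptions chosen for this example.

```python
import math


def normalize(vector):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def regression_loss(online_prediction, target_projection):
    """Squared distance between the L2-normalized vectors.

    Ranges from 0 (same direction) to 4 (opposite directions).
    """
    p = normalize(online_prediction)
    z = normalize(target_projection)
    return sum((a - b) ** 2 for a, b in zip(p, z))


def ema_update(target_params, online_params, tau=0.99):
    """Move each target parameter a small step toward the online one."""
    return [tau * t + (1.0 - tau) * o for t, o in zip(target_params, online_params)]


# Toy parameters: the target slowly follows the online network.
online = [1.0, -2.0, 0.5]
target = [0.0, 0.0, 0.0]
for _ in range(3):
    target = ema_update(target, online, tau=0.5)
```

In practice the gradient of the loss flows only into the online network, while the target is changed exclusively through the moving average, which is why no negative pairs are needed.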


Big Data Science

💐TOP-15 most interesting DS conferences around the world in September 2021
- 6-7.09 (offline) and 13-15.09 (online) - AI & Big Data Expo Global, the leading Artificial Intelligence & Big Data Conference & Exhibition, at the Business Design Centre, London https://www.ai-expo.net/global/
- 9-10.09 - R Conference, New York, Online https://rstats.ai/nyr/
- 13-17.09 - Data Science Salon Miami Machine Learning & AI Meetup Week. Miami, FL, USA https://www.datascience.salon/miami-ml-meetup-week
- 14-16.09 - Insurance AI and Innovative Tech USA 2021, Online Conference by Reuters https://reutersevents.com/events/analyticsusa/
- 15-16.09 - DATA festival #online https://datafestival.de/
- 15-16.09 - Open Data Science Conference, Online https://odsc.com/apac
- 20.09 – 1st Citizen Data Science Summit, Boston https://www.citizen-data-science.org/
- 20-21.09 - International Conference on Advances in Big Data and Data Sciences, Toronto, Canada https://waset.org/advances-in-big-data-and-data-sciences-conference-in-september-2021-in-toronto
- 21.09 – Data Champions Online, Canada https://dco-canada.coriniumintelligence.com/
- 22-23.09 - Big Data LDN, UK largest data & analytics event, Olympia London, UK https://bigdataldn.com/
- 22-23.09 - RE.WORK Deep Learning Summit https://www.re-work.co/events/deep-learning-summit-research and https://www.re-work.co/events/deep-learning-summit-applications
- 28-29.09 – Chief Data & Analytics Officer, Financial Services, Online https://cdao-fs-eu.coriniumintelligence.com/
- 28-30.09 – DataOps Summit Online https://www.dataopssummit-sf.com/about/
- 30.09 - Web Data Extraction Summit 2021 by Zyte https://www.extractsummit.io/


Big Data Science

🏸What is AIOps
Just as we got used to MLOps, another Ops practice has emerged in IT, although the need for it arose long ago. Meet AIOps: the use of AI to simplify IT operations management and to accelerate and automate problem resolution in today's complex IT environments. AIOps leverages big data, analytics, and machine learning for the following purposes:
• Collecting and aggregating the huge and ever-growing volumes of operational data generated by IT infrastructure components, applications, and performance monitoring tools;
• Filtering useful signals from noise to reveal the important events and patterns related to system performance and availability;
• Identifying root causes and responding quickly to problems, sometimes automatically without human intervention.
By replacing many separate manual IT operations tools with a single intelligent, automated platform, AIOps lets you respond quickly, and even proactively, to slowdowns and system failures with much less effort. It bridges the gap between an increasingly diverse, dynamic, and complex IT landscape and users' expectations of uninterrupted application performance and availability. With more companies moving from traditional IT infrastructure to a dynamic mix of on-premises clusters, private clouds, and public clouds, AIOps is relevant for many enterprises.
https://medium.com/geekculture/aiops-6e463cbe617a


Big Data Science

👻What is anomaly detection and how does it work
Anomaly detection is a mathematical search for deviations in labeled or unlabeled numerical data, based on how much a particular value differs from the others in a sample, for example by how many standard deviations. There are many methods for detecting anomalies, called outlier detection algorithms. Each uses different criteria and therefore suits different scenarios. The most common methods are:
• General density-based methods: K-Nearest Neighbor (KNN), Local Outlier Factor (LOF), Isolation Forests, and other algorithms that can be applied to regression or classification scenarios. Each of them models the expected behavior by following the regions with the highest density of data points. Points that fall outside these dense zones by a statistically significant amount are flagged as anomalies. Most of these methods are based on the distance between points, so it is important to normalize the units and scale of the dataset to get accurate results. For example, in KNN, data points are weighted by 1 / k, where k is the distance to the nearest neighbor. Points that are close to each other therefore carry a lot of weight and have more influence on what counts as normal than distant points. The algorithm marks points with a low 1 / k value as outliers. This is suitable for normalized, unlabeled data when there is no need or ability to use more computationally complex algorithms.
• Support vector machine (SVM), including its one-class variant, is a supervised learning algorithm that builds a robust prediction model and is often used for classification. It takes a training set of examples, each labeled as belonging to one of two categories. The system learns criteria for sorting new examples into each category by mapping the examples to points in space so that the two categories are separated as widely as possible. The system flags a point as an outlier if it falls outside both categories. In the absence of labeled data, you can use unsupervised learning, which looks for clusters among the examples to define the categories. This is suitable for data with 2 categories when you need to find the data points that lie outside each of them.
• K-means clustering combines ideas from KNN, since it relies on the proximity of each data point to other nearby points, and from SVM, since it focuses on sorting points into categories. Each data point is assigned to a category based on its features. Each category has a center point that serves as the prototype for all other data points in the cluster. Every point is compared with these prototypes by its distance to them, and points that are close to a prototype form its cluster. K-means clustering can detect anomalies by marking points that do not fit any of the established categories. This is suitable for unlabeled data of many different types that needs to be organized around the learned prototypes.
There are also more sophisticated algorithms for unsupervised anomaly detection and multidimensional datasets. For example, Gaussian mixture models are an alternative to K-means that describe clusters with Gaussian distributions, and Bayesian methods use Bayesian probability to detect anomalies. Autoencoders can also be used: these are neural networks that learn a compressed encoding of the input and try to reconstruct the input from it. Inputs that are reconstructed poorly are considered anomalies, which makes autoencoders well suited to high-dimensional detection tasks.
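
The distance-based KNN scoring described above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the function names (`knn_weights`, `flag_outliers`), the threshold `min_weight`, and the sample points are hypothetical, and the data is assumed to be normalized already.

```python
import math


def knn_weights(points, k=1):
    """Weight each point by 1 / (distance to its k-th nearest neighbor)."""
    weights = []
    for i, point in enumerate(points):
        # Distances from this point to every other point, nearest first.
        distances = sorted(
            math.dist(point, other) for j, other in enumerate(points) if j != i
        )
        kth_distance = distances[k - 1]
        # Duplicate points have zero distance; treat them as maximally "normal".
        weights.append(1.0 / kth_distance if kth_distance > 0 else math.inf)
    return weights


def flag_outliers(points, k=1, min_weight=0.5):
    """Return the points whose weight falls below min_weight."""
    return [p for p, w in zip(points, knn_weights(points, k)) if w < min_weight]


# Four points in a tight square plus one far-away point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
outliers = flag_outliers(points, k=1, min_weight=0.5)
```

This brute-force version compares every pair of points, so it is only practical for small datasets; the threshold would normally be tuned on real data rather than fixed in advance.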


Big Data Science

💥TOP 5 useful Python tools for data engineers and web developers
• Requests - an easy-to-use HTTP library for Python that lets you send requests and interact with APIs https://docs.python-requests.org/en/master/
• Advanced Python Scheduler (APScheduler) - a library for deferred execution of Python code, either once or periodically. If jobs are stored in a database, they survive scheduler restarts and keep their state. APScheduler can also be used as a cross-platform, application-specific replacement for platform-specific schedulers such as the cron daemon or the Windows task scheduler. However, APScheduler is not a daemon or service itself and does not come with command line tools: it is intended to run inside existing applications. The library does provide ready-made building blocks for creating a scheduler service or running the scheduler in a separate process. https://apscheduler.readthedocs.io/en/stable/userguide.html
• Watchdog - a module for tracking filesystem events through a Python API and shell utilities https://pypi.org/project/watchdog/
• Twilio - a library for automating text messages and phone calls. It is very convenient for automatic monitoring of events on third-party sites, for example, promptly tracking discounts on products you want or the arrival of new products https://pypi.org/project/twilio/
• Random User Agent - a library for adding random user agents to requests, which is useful when scraping web data or sending a large number of requests https://pypi.org/project/random-user-agent/


Big Data Science

✈️Real-Time ML Predictions with Google's Vertex AI
One of the biggest challenges in serving ML models is providing predictions in near real time. Some business scenarios are especially sensitive to latency, for example, recommendation systems for online store users or delivery time estimates for food tech companies. On August 25, 2021, Google announced that Vertex AI, its unified ML platform, can be reached directly through private endpoints. Vertex AI lets you quickly connect a trained and tested ML model to a working application, upload it to a specially prepared server in Google Cloud, or export it to the desired format.
Vertex Predictions is a serverless way of serving ML models: you host a model in the cloud and request predictions via a REST API. For online predictions, you deploy the model to an endpoint, which associates it with physical computing resources so that it can serve predictions in near real time. With VPC Peering, you can configure a private connection to reach the endpoint. User data then does not pass through the public Internet, which reduces the latency of online predictions and improves security.
https://cloud.google.com/blog/products/ai-machine-learning/creating-a-private-endpoint-on-vertex-ai


Big Data Science

🏂🏸Adversarial attacks to refine molecular energy predictions
Researchers at MIT have proposed a new way to quantify the uncertainty of molecular energies predicted by neural networks. Neural networks are often used to predict new materials, reaction rates, and other properties orders of magnitude faster than traditional methods such as quantum mechanical simulation. The results can be unreliable: because ML models interpolate, they may fail when applied to data outside the domain of their training set. This is especially true for predicting the potential energy surface (PES), the energy map of a molecule in all its configurations. To address this, the scientists proposed finding the safe zones of a neural network using adversarial attacks. Real simulations are run only for a small fraction of the molecules, and the data is fed into a neural network, which learns to predict the same properties for the rest of the molecules. These methods have been successfully tested on new materials, including catalysts for producing hydrogen from water, cheaper polymer electrolytes for electric vehicles, magnets, etc. However, the accuracy of neural networks depends on the quality of the training data, and incorrect predictions can have disastrous consequences.
One way to estimate the uncertainty of a model is to run the same data through several versions of it. To do this, the researchers had several neural networks predict the potential energy surface from the same data. If the networks are confident in the prediction, the difference between their outputs is minimal and the surfaces largely converge. Otherwise, the predictions of the models vary greatly, producing a range of outputs, any of which may be the correct surface.
The spread of the predictions represents the uncertainty at a particular point. An ML model should report not only its best prediction but also the uncertainty of that prediction. However, each simulation can take tens to thousands of CPU hours, and to get meaningful results you need to run multiple models at a sufficient number of points.
Therefore, the new approach selects only the data points with low prediction confidence. These molecules are then modified slightly to increase the uncertainty even further. Additional data is computed for these molecules through simulation and added to the original training pool. The neural networks are trained again, and a new set of uncertainties is calculated. This process is repeated until the uncertainty at the various points on the surface is well defined and cannot be reduced further.
The proposed approach has been tested on zeolites: cavernous crystals with controllable shape selectivity that are used in catalysis, gas separation, and ion exchange. Modeling large zeolite structures is very expensive, and the researchers show how their method can provide significant savings in computer simulations. The adversarial approach to retraining neural networks increases their performance without significant computational cost.
https://news.mit.edu/2021/using-adversarial-attacks-refine-molecular-energy-predictions-0901
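
The ensemble idea above (use the disagreement between several models as an uncertainty estimate, then spend simulation time on the most uncertain points) can be sketched in plain Python. This is a toy illustration, not the researchers' code: the three "models" are hypothetical one-line functions, and the adversarial step that perturbs molecules to increase uncertainty is left out.

```python
import statistics


def ensemble_uncertainty(models, x):
    """Return (mean prediction, spread) of an ensemble at point x.

    The spread is the population standard deviation of the predictions
    and serves as the uncertainty estimate.
    """
    predictions = [model(x) for model in models]
    return statistics.mean(predictions), statistics.pstdev(predictions)


def most_uncertain(models, candidates, n=1):
    """Pick the n candidate points where the ensemble disagrees the most."""
    ranked = sorted(
        candidates,
        key=lambda x: ensemble_uncertainty(models, x)[1],
        reverse=True,
    )
    return ranked[:n]


# Three hypothetical models that agree near 0 and drift apart for larger x.
models = [
    lambda x: x ** 2,
    lambda x: x ** 2 + 0.1 * x,
    lambda x: x ** 2 - 0.1 * x,
]
candidates = [0.0, 1.0, 5.0]
picked = most_uncertain(models, candidates, n=1)
```

In an active learning loop, the picked points would be sent to the expensive simulation, the results added to the training pool, and the ensemble retrained before the next round.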


Big Data Science

🕸Web scraping automation: 3 popular tools
Do you want to track prices in an online store or automate ordering food from a restaurant? Try the following tools:
• Selenium is a well-known test automation framework that can simulate user behavior and perform actions on websites, such as filling out forms and clicking buttons. https://selenium-python.readthedocs.io/
• Beautiful Soup is a Python package for parsing HTML and XML documents. It builds a parse tree that can be used to extract data from web pages. Very good for simple projects. https://pypi.org/project/beautifulsoup4/
• Scrapy is a fast, high-level web crawling and scraping framework used to extract structured data for data mining, monitoring, and automated testing. It is great for complex projects and is much faster than the tools above. https://docs.scrapy.org/en/latest/


Big Data Science

😎Need to develop an app for real-time emotion recognition on video?
Use the Face Recognition API! It is an open-source project for recognizing and manipulating faces from Python or the command line. It is built on a deep-learning face recognition model that reaches 99.38% accuracy on the Labeled Faces in the Wild benchmark.
With the Face Recognition API, application development consists of 5 steps:
• receiving video in real time;
• applying Python functions from the ready-to-use API to detect faces and emotions in the video stream;
• classifying emotions into categories;
• developing a recommendation system;
• building the application and deploying it to Heroku, Dash, or a web server.
https://github.com/ageitgey/face_recognition


Big Data Science

🚀Data Science in the city: we continue Citymobil's series of meetups on Data Science in geoservices, logistics, Smart City applications, and more. Join us for the 2nd online meetup on September 23 at 18:00 MSK. Expect interesting talks from DS practitioners at Citymobil, Optimate AI, and Yandex.Routing:
🚕Maxim Shalankin (Data Scientist in Citymobil's geoservice) will talk about the life cycle of an ML model that predicts travel time under heavy load
🚚Sergey Sviridov (CTO at Optimate AI) will explain what is wrong with classical heuristics and combinatorial optimization methods for building optimal routes, and how they can be replaced with dynamic programming
🚛Daniil Tararukhin (Head of the Analytics Group at Yandex.Routing) will share how traffic jams affect the search for an optimal route and the simulation modeling of this problem.
After the talks, the speakers will answer questions from the audience.
Host: Alexey Chernobrovov🛸
Registration for free participation: https://citymobil.timepad.ru/event/1773649/


Big Data Science

🗣4 best practices for using the Google Cloud Translation API more efficiently
This web service translates dynamically between languages using Google ML models, supports over 100 languages, and is widely used in practice. With a few useful tricks, you can reduce costs, increase performance, and improve the security of this translation API on websites.
1. Caching translated content not only reduces the number of calls to the Google Cloud Translation API, but also reduces the load and compute usage on internal web servers and databases. This optimizes application performance and reduces content delivery costs. You can configure caching at different levels of the application architecture, for example, at the proxy level (NGINX or HAProxy), in memory on the web servers of the application itself, in an external in-memory caching service, or through a CDN.
2. Secure access based on the principle of least privilege. When accessing the Google Cloud Translation API, it is recommended that you use a Google Cloud service account rather than API keys. A service account is a special type of account that represents a non-human user and can be authorized to access data in Google APIs. Service accounts are not assigned passwords and cannot be used to log in through a browser, which removes this threat vector. Following the principle of least privilege, grant the service account a role with only the permissions needed to access the translation API.
3. Customizing translations. If your content includes domain-specific and context-specific terms, Google Cloud Translation API Advanced supports custom terminology through a glossary. You can also create and use your own translation models with Google AutoML Translation. Telling users that content has been automatically translated by Google helps them understand the potential for errors and inaccuracies.
4. Budget control. The costs associated with the Google Cloud Translation API depend mainly on the number of characters sent to the API. For example, at $10 per million characters, if a website contains 20 million characters and needs to be translated into 10 languages, the cost would be $10 * 20 * 10 = $2,000. Setting up budget alerts in your environment will help you keep track of spending.
https://cloud.google.com/blog/products/ai-machine-learning/four-best-practices-for-translating-your-website
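
Caching and budget control can be sketched together in plain Python. This is a minimal illustration, not Google's client library: `CachedTranslator`, `estimate_cost`, and the stand-in `fake_translate` function are hypothetical names, and the $10 per million characters rate is just the example figure from the post, not a quoted price.

```python
def estimate_cost(chars_per_language, n_languages, usd_per_million=10.0):
    """Estimate the translation cost in USD for a given character volume."""
    total_chars = chars_per_language * n_languages
    return total_chars / 1_000_000 * usd_per_million


class CachedTranslator:
    """Wrap a translate function with an in-memory cache.

    Only cache misses reach the underlying (billable) function, so
    billed_chars counts the characters that would actually be charged.
    """

    def __init__(self, translate_fn):
        self._translate = translate_fn
        self._cache = {}
        self.billed_chars = 0

    def translate(self, text, target):
        key = (text, target)
        if key not in self._cache:
            self.billed_chars += len(text)
            self._cache[key] = self._translate(text, target)
        return self._cache[key]


def fake_translate(text, target):
    """Stand-in for a real API call; it only tags the text."""
    return f"[{target}] {text}"


translator = CachedTranslator(fake_translate)
for _ in range(3):
    result = translator.translate("Hello", "de")

site_cost = estimate_cost(20_000_000, 10)  # 20M characters into 10 languages
```

A production setup would more likely keep the cache in a shared service or a CDN, as the post suggests, so that all web servers benefit from the same cached translations.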


Big Data Science

🍏3 useful Python libraries for Data Scientists
• JMESPath - a library for querying JSON. Useful when working with a large multi-level JSON document or dictionary. JMESPath lets you address nested fields with a JavaScript-like path syntax, which makes code easier to develop and test. It is also safe: if any part of the path does not exist, the JMESPath lookup function returns None. https://github.com/jmespath/jmespath.py
• Inflection - a Ruby-derived library that helps you handle complex string processing logic. It converts English words between singular and plural, and also converts strings from CamelCase to underscore. Useful when variable or data point names were generated in another language or on another system and need to be converted to Pythonic style in line with PEP 8. https://github.com/jpvanhal/inflection
• more-itertools - a library with a set of useful functions for various development tasks. For example, it lets you write quick and elegant code to split one dictionary into multiple lists based on a common repeating key, or to loop through multiple lists. https://github.com/more-itertools/more-itertools


Big Data Science

👀How to evaluate the quality of a multi-object tracking model in computer vision?
Tracking multiple objects in a real-world environment is challenging, not least because of the metrics needed to evaluate the ML model: they must measure both the tracking accuracy and the consistency of each moving object's trajectory. Suppose that, for each frame in the video stream, the tracking system outputs n hypotheses, and there are m ground-truth objects in the frame. Then the evaluation process is as follows:
• Find the best match between hypotheses and ground-truth objects based on their coordinates, using a matching algorithm.
• For each matched pair, find the error in the position of the object.
• Calculate the sum of several errors: misses (the tracker produced no hypothesis for an object), false positives (the tracker produced a hypothesis, but no object was present), and mismatch errors (the hypothesis matched to a ground-truth object changed compared with previous frames).
So the performance of the ML model can be expressed in two metrics:
• MOTP (Multi-Object Tracking Precision) shows how precisely the positions of objects are estimated. It is the total position error for the matched ground truth-hypothesis pairs across all frames, averaged over the total number of matches made. This metric does not cover recognizing object configurations or evaluating object trajectories. With the error normalized to the range from 0 to 1, a MOTP value close to 1 means the system's precision is poor, and a value close to zero means it is good.
• MOTA (Multi-Object Tracking Accuracy) shows how many errors the tracking system made (misses, false positives, mismatch errors) relative to the number of ground-truth objects. The metric ranges from -inf to 1. If the MOTA is 1, the accuracy of the system is good. If the MOTA is near zero or negative, the accuracy of the system is poor.
https://pub.towardsai.net/multi-object-tracking-metrics-1e602f364c0c
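
Both metrics are easy to compute once the per-frame matching is done. Below is a minimal sketch that assumes the matching step has already produced, for each frame, the error counts and the position errors of the matched pairs. The dictionary keys and the sample numbers are hypothetical, and the position errors are assumed to be normalized to the range from 0 to 1.

```python
def mota(frames):
    """Multi-Object Tracking Accuracy: 1 - (all errors / all ground-truth objects)."""
    errors = sum(
        f["misses"] + f["false_positives"] + f["mismatches"] for f in frames
    )
    total_ground_truth = sum(f["ground_truth"] for f in frames)
    return 1.0 - errors / total_ground_truth


def motp(frames):
    """Multi-Object Tracking Precision: mean position error over all matches."""
    total_error = sum(sum(f["match_errors"]) for f in frames)
    total_matches = sum(len(f["match_errors"]) for f in frames)
    return total_error / total_matches


# Two frames with 4 ground-truth objects each (made-up numbers).
frames = [
    {"ground_truth": 4, "misses": 1, "false_positives": 0, "mismatches": 0,
     "match_errors": [0.1, 0.2, 0.3]},
    {"ground_truth": 4, "misses": 0, "false_positives": 1, "mismatches": 1,
     "match_errors": [0.2, 0.2, 0.1, 0.1]},
]
accuracy = mota(frames)   # 3 errors over 8 objects
precision = motp(frames)  # 1.2 total error over 7 matches
```

MOTA can go below zero because false positives are not bounded by the number of ground-truth objects: a tracker that hallucinates many objects makes more errors than there are objects to track.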


Big Data Science

😜Need sentiment analysis of YouTube comments?
Over 2 billion users watch YouTube videos at least once a month, and popular YouTube bloggers have billions of views. But you can't please every subscriber, and public opinion is constantly changing. Build your own user sentiment analysis model with Youtube-Comment-Scraper, a Python library that collects comments on YouTube videos using browser automation (it only works on Windows for now). This open-source project will help you create a dashboard that analyzes how subscribers feel about videos from popular YouTubers. The work comes down to the following steps:
• collecting the comments that users left on the video;
• using a pretrained ML model to make a prediction for each comment;
• visualizing the model's predictions on a dashboard, for example with Dash in Python or Shiny in R.
Add interactivity to the sentiment analysis results with filters by release time, video author, and genre.
https://pypi.org/project/youtube-comment-scraper-python/


Big Data Science

Register for the free international online conference DataArt IT NonStop 2021!

IT NonStop will be held on November 18-20, 2021.
This year, we will be focusing on Cloud, Data, and Machine Learning & Artificial Intelligence. Market leaders will take the stage and share their knowledge, case studies, and best solutions. The main working language of the conference is English; however, there will be a special Junior track on November 20 that will be delivered mostly in Russian. November 20 will also be dedicated to workshops.

More than 30 speakers from Microsoft, AWS, Ocado, Codete, Ciklum, Eleks, SoftServe, Toloka, Yandex, DataArt, and other market leaders will take the stage at IT NonStop 2021. We can't list all of them in one post, so here are a few selected workshops:
— "Creating Real-Time Data Streaming powered by SQL on Kubernetes", Albert Lewandowski, Big Data DevOps Engineer, GetInData.
— "Create your own cognitive portrait in 60 minutes", Dmitry Soshnikov, Cloud Developer Advocate, Microsoft.
— "Training unbiased and accurate AI models", Robert Yenokyan, AI Lead, Pinsight.
The full list of speakers and topics is available on our webpage, and it's constantly growing.
You can still sign up for our conference. Registration is open and it's free for everyone!

Briefly about the IT NonStop Conference:
When: November 18-20
Venue: online and free of charge
Registration: https://it-nonstop.net/register-to-the-conference/?utm_source=bdscience&utm_medium=referral


Big Data Science

🍁TOP-10 most interesting DS conferences around the world in October 2021
1. 5-8 Oct - NLP Summit, Applied Natural Language Processing, online. https://www.nlpsummit.org/nlp-2021/
2. 6-7 Oct - TransformX AI Conference, with 100+ speakers including Andrew Ng, Fei-fei Li, free and open to the public. Online. https://www.aicamp.ai/event/eventdetails/W2021100608
3. 6-9 Oct - The 8th IEEE International Conference on Data Science and Advanced Analytics, Porto, Portugal https://dsaa2021.dcc.fc.up.pt/
4. 12-14 Oct - Google Cloud Next '21, a global digital experience. Online. https://cloud.withgoogle.com/next
5. 12-14 Oct - Chief Data & Analytics Officers (CDAO). Online. https://cdao-fall.coriniumintelligence.com/virtual-home
6. 13-14 Oct - Big Data and AI Toronto. Online. https://www.bigdata-toronto.com/register
7. 15 – 17 Oct - DataScienceGO, UCLA Campus, Los Angeles, USA https://www.datasciencego.com/united-states
8. 19 Oct - Graph + AI Summit Fall 2021 - open conference for accelerating analytics and AI with Graph. New York, NY, USA and virtual https://info.tigergraph.com/graphai-fall
9. 20 – 21 Oct - RE.WORK Conversational AI for Enterprise Summit, Online. https://www.re-work.co/summits/conversational-ai-enterprise-summit
10. 21 Oct - DSS Mini Salon: The Future of Retail with Behavioral Data, Online. https://www.datascience.salon/snowplow-analytics-mini-virtual-salon
🍂

