Big Data Science
The Big Data Science channel gathers interesting facts about Data Science.
For cooperation: a.chernobrovov@gmail.com
💼https://news.1rj.ru/str/bds_job — channel about Data Science jobs and career
💻https://news.1rj.ru/str/bdscience_ru — Big Data Science [RU]
🚀Google confidently holds the lead in the AI race! Meet the Switch Transformer, a new approach to large-scale training in which only a subset of the ML model's weights (the parameters that transform the incoming data) is used for each input. This simple architecture cuts training time and cost, allowing huge volumes of data to be processed more efficiently than with more complex algorithms. For example, an ML model with 1.6 trillion parameters trained 4 times faster than Google's own T5-XXL and outpaced its main competitor, OpenAI's GPT-3 algorithm, by a factor of 10.
https://syncedreview.com/2021/01/14/google-brains-switch-transformer-language-model-packs-1-6-trillion-parameters/
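Below is a minimal sketch of the core idea (my own illustration, not Google's implementation): a top-1 "switch" routing layer that sends each token to a single expert feed-forward network, so only a fraction of the parameters is active per token.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchFFN(nn.Module):
    """Toy top-1 mixture-of-experts layer in the spirit of the Switch Transformer."""
    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)   # routing logits per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                                # x: (tokens, d_model)
        probs = F.softmax(self.router(x), dim=-1)        # routing probabilities
        gate, expert_idx = probs.max(dim=-1)             # top-1 expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i                       # tokens routed to expert i
            if mask.any():
                # scale by the gate value so the router stays differentiable
                out[mask] = gate[mask].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(16, 64)                             # 16 tokens, d_model=64
layer = SwitchFFN(d_model=64, d_ff=128, num_experts=4)
print(layer(tokens).shape)                               # torch.Size([16, 64])
```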
Learning is light!☀️ TOP 5 useful books with practical advice for data engineers
1. I Heart Logs (Jay Kreps, 2014, 50 pages) on the role of logs in distributed systems and the principles behind Apache Kafka
2. Designing Data-Intensive Applications (Martin Kleppmann, 2017, 550 pages) – core concepts of building data-intensive applications, from data models to stream processing
3. Rebuilding Reliable Data Pipelines Through Modern Tools (Ted Malaska, 2019, 100 pages) – fundamentals of data pipelines and how to build efficient ones on top of modern Big Data technologies
4. Expert Hadoop Administration (Sam R. Alapati, 2016, 750 pages) – from MapReduce and HDFS concepts to building and securing Spark clusters, optimizing Hadoop, and tuning YARN
5. Architecting Modern Data Platforms (Jan Kunigk, Ian Buss, Paul Wilkinson, Lars George, 2018, 600 pages) – on-premises and cloud deployment of Big Data infrastructure, including the finer points of administering Hadoop services, from server RAM and CPU specifications for cluster nodes to networking requirements
https://towardsdatascience.com/5-books-for-data-engineers-f174bc1e7906
🎯MLOps tools save you time and effort when developing, testing, and deploying machine learning models. MLflow is one of the most useful and popular MLOps tools. If you are interested in how to use it in practice, read this brief article https://medium.com/hashmapinc/why-i-love-mlflow-951b8d1134be
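A minimal tracking sketch using MLflow's standard Python API (the model, hyperparameters, and dataset here are placeholders chosen for illustration):
```python
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(*load_iris(return_X_y=True), random_state=42)

with mlflow.start_run(run_name="rf-baseline"):                   # one tracked experiment run
    params = {"n_estimators": 100, "max_depth": 5}
    mlflow.log_params(params)                                    # record hyperparameters
    model = RandomForestClassifier(**params).fit(X_train, y_train)
    mlflow.log_metric("accuracy", model.score(X_test, y_test))   # record a metric
    mlflow.sklearn.log_model(model, "model")                     # store the fitted model artifact
```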
😁From now on we will post interesting news and articles in this channel directly in English. Russian-language publications and digests of Russian events are available here: https://news.1rj.ru/str/bdscience_ru
How to streamline the implementation of reasoning systems with ReAgent from Facebook.
ReAgent is an end-to-end platform for applied Reinforcement Learning, designed for large-scale, distributed recommendation and optimization tasks where we don't have access to a simulator. The main purpose of this framework is to make development of and experimentation with deep reinforcement learning algorithms fast. ReAgent is built in Python and uses the PyTorch framework for modelling. It bundles algorithms for data preprocessing, feature engineering, model training and evaluation, and, finally, optimized serving. It is capable of handling high-dimensional datasets, provides optimized algorithms for preprocessing and training, and offers a highly efficient production environment for model serving. https://analyticsindiamag.com/hands-on-to-reagent-end-to-end-platform-for-applied-reinforcement-learning/
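As a generic illustration of the setting (this is not ReAgent's API, just a plain PyTorch sketch): a discrete-action Q-network updated from a batch of logged transitions, i.e. the offline, no-simulator scenario that ReAgent targets.
```python
import torch
import torch.nn as nn

# Toy discrete-action Q-network; a generic sketch, not ReAgent's actual API
class QNetwork(nn.Module):
    def __init__(self, state_dim: int, num_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, num_actions)
        )

    def forward(self, state):
        return self.net(state)                           # one Q-value per action

# One TD update on a batch of logged transitions (offline / batch RL, no simulator)
q = QNetwork(state_dim=8, num_actions=4)
target_q = QNetwork(state_dim=8, num_actions=4)
optimizer = torch.optim.Adam(q.parameters(), lr=1e-3)
gamma = 0.99

state = torch.randn(32, 8)                               # fake logged batch for illustration
action = torch.randint(0, 4, (32, 1))
reward = torch.randn(32, 1)
next_state = torch.randn(32, 8)

with torch.no_grad():
    target = reward + gamma * target_q(next_state).max(dim=1, keepdim=True).values
loss = nn.functional.mse_loss(q(state).gather(1, action), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```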
💦Transparent interpretation of results and continuous learning in production, with non-stop adaptation of the neural network to new conditions and data
Liquid neural networks from MIT, aimed at decision making in autonomous driving and medical diagnosis, are inspired by the nervous system of a microscopic nematode with only 302 neurons and by principles of time-series analytics. This ML model edged out other state-of-the-art time-series algorithms by a few percentage points in accurately predicting future values in datasets ranging from atmospheric chemistry to traffic patterns. Just by changing the representation of a neuron to a differential equation, you can work with a small number of highly expressive neurons, peer into the "black box" of the network's decision making, and diagnose why the network made a certain characterization.
https://news.mit.edu/2021/machine-learning-adapts-0128
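A very rough sketch of the underlying idea (my own simplification in PyTorch, not MIT's published model): a recurrent cell whose hidden state follows a differential equation with an input-dependent time constant, integrated with a few Euler steps.
```python
import torch
import torch.nn as nn

class ODECell(nn.Module):
    """Toy 'liquid' recurrent cell: dh/dt = -h / tau(x, h) + f(x, h).
    A simplified illustration of ODE-defined neurons, not the published LTC model."""
    def __init__(self, input_dim: int, hidden_dim: int, steps: int = 4, dt: float = 0.25):
        super().__init__()
        self.f = nn.Linear(input_dim + hidden_dim, hidden_dim)     # drive toward a new state
        self.tau = nn.Linear(input_dim + hidden_dim, hidden_dim)   # input-dependent time constant
        self.steps, self.dt = steps, dt

    def forward(self, x, h):
        for _ in range(self.steps):                                # Euler integration of the ODE
            z = torch.cat([x, h], dim=-1)
            tau = nn.functional.softplus(self.tau(z)) + 1e-2       # keep time constants positive
            dh = -h / tau + torch.tanh(self.f(z))
            h = h + self.dt * dh
        return h

cell = ODECell(input_dim=3, hidden_dim=8)
h = torch.zeros(1, 8)
for t in range(10):                                                # unroll over a short sequence
    h = cell(torch.randn(1, 3), h)
print(h.shape)                                                     # torch.Size([1, 8])
```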
🌷Not only LightGBM and XGBoost: meet a new probabilistic prediction algorithm, Natural Gradient Boosting (NGBoost). Released in 2019, NGBoost uses the natural gradient to address the technical challenges that make generic probabilistic prediction hard with existing gradient boosting methods. The algorithm consists of three abstract modular components: a base learner, a parametric probability distribution, and a scoring rule, all treated as hyperparameters chosen before training. NGBoost makes it easier to do probabilistic regression with flexible tree-based models. Probabilistic classification, by contrast, has been possible for quite some time, since most classifiers already return probabilities over classes (logistic regression, for instance, outputs class probabilities), so in that respect NGBoost does not add much new. Still, experiments on several regression datasets showed that this ML algorithm delivers competitive predictive performance both in uncertainty estimates and in traditional metrics. On the other hand, its computing time is considerably longer than that of the other two algorithms, and it lacks some useful options such as early stopping, reporting of intermediate results, and flexibility in choosing the base learner (it works only with decision trees and Ridge regression) or setting a random seed. Nevertheless, this modular ML algorithm for probabilistic prediction is quite competitive against other popular boosting methods. See more (a minimal usage sketch follows the links):
http://www.51anomaly.org/pdf/NGBOOST.pdf
https://medium.com/@ODSC/using-the-ngboost-algorithm-8d337b753c58
https://towardsdatascience.com/ngboost-explained-comparison-to-lightgbm-and-xgboost-fda510903e53
https://www.groundai.com/project/ngboost-natural-gradient-boosting-for-probabilistic-prediction/1
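A minimal usage sketch with the ngboost Python package, assuming its current scikit-learn-style API (the dataset and hyperparameter values are placeholders); the constructor arguments mirror the three components above: base learner, distribution, and scoring rule.
```python
from ngboost import NGBRegressor
from ngboost.distns import Normal
from ngboost.scores import LogScore
from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(*load_diabetes(return_X_y=True), random_state=0)

# The three modular components: base learner, probability distribution, scoring rule
ngb = NGBRegressor(
    Base=DecisionTreeRegressor(max_depth=3),
    Dist=Normal,
    Score=LogScore,
    n_estimators=500,
    learning_rate=0.01,
)
ngb.fit(X_train, y_train)

point_preds = ngb.predict(X_test)        # usual point predictions
dist = ngb.pred_dist(X_test)             # full predictive distribution
print(dist.params["loc"][:3], dist.params["scale"][:3])   # per-sample mean and std
```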
Deep into NGBoost and probabilistic regression: what probabilistic supervised learning is and how to work with prediction intervals, plus how to interpret this ML algorithm's predictions correctly
https://towardsdatascience.com/interpreting-the-probabilistic-predictions-from-ngboost-868d6f3770b2
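For a Normal predictive distribution, a 95% prediction interval follows directly from the predicted per-sample mean and standard deviation. A small self-contained sketch (the mu, sigma, and y_true arrays are placeholder values standing in for dist.params["loc"], dist.params["scale"], and the test targets from the NGBoost sketch above):
```python
import numpy as np
from scipy.stats import norm

# Placeholder per-sample predictive parameters and true targets, for illustration only
mu = np.array([151.0, 120.5, 180.2])
sigma = np.array([40.3, 35.1, 50.8])
y_true = np.array([160.0, 100.0, 170.0])

lower, upper = norm.interval(0.95, loc=mu, scale=sigma)     # 95% prediction interval per sample
coverage = ((y_true >= lower) & (y_true <= upper)).mean()   # empirical coverage of the interval
print(np.c_[lower, upper], coverage)
```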
About tensor holography for creating real-time 3D holograms for virtual reality, 3D printing, and medical visualization that could run on your smartphone. Meet the new AI method from MIT researchers https://news.mit.edu/2021/3d-holograms-vr-0310
🤓Deepfakes are not that simple: an interview with Belgian VFX specialist Chris Ume, creator of the viral fake Tom Cruise videos, on why an ML algorithm alone is not enough for a high-quality result and why you still need to fine-tune the video effects thoroughly by hand
https://www.theverge.com/2021/3/5/22314980/tom-cruise-deepfake-tiktok-videos-ai-impersonator-chris-ume-miles-fisher
😜Not only Deep Learning: a new approach to building AI systems that work like the human brain, using the sparse coding principle to supply a series of local functions in synaptic learning rules and reduce how much of the NN model needs adjusting. The startup Nara Logics, founded by an MIT alumnus, is trying to increase the effectiveness of AI by mimicking the brain's structure and function at the circuit level.
https://news.mit.edu/2021/nara-logics-ai-0312
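A minimal sketch of the sparse coding principle itself, using scikit-learn as a generic illustration (unrelated to Nara Logics' own approach): learn an overcomplete dictionary and represent each input with only a few active atoms.
```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))                  # toy data: 200 samples, 16 features

# Learn an overcomplete dictionary; each sample is encoded by a handful of atoms
dico = DictionaryLearning(
    n_components=32,                            # more atoms than features (overcomplete)
    transform_algorithm="omp",                  # orthogonal matching pursuit for sparse codes
    transform_n_nonzero_coefs=3,                # at most 3 active atoms per sample
    random_state=0,
)
codes = dico.fit_transform(X)

print(codes.shape)                              # (200, 32)
print((codes != 0).sum(axis=1).mean())          # roughly 3 nonzero coefficients per sample
```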
💥Meet CLIP (Contrastive Language-Image Pre-training), a new neural net from OpenAI: it can be instructed in natural language to perform a great variety of classification benchmarks without directly optimizing for each benchmark's performance, similar to the "zero-shot" capabilities of GPT-2 and GPT-3. CLIP is based on zero-shot transfer, natural language supervision, and multimodal learning to recognize a wide variety of visual concepts in images and associate them with their names. Read more about where you can use this unique ML model: https://openai.com/blog/clip/
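A minimal zero-shot classification sketch with OpenAI's released CLIP package (the image path and label set are placeholders):
```python
import torch
import clip                      # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Placeholder image and candidate labels expressed in natural language
image = preprocess(Image.open("photo.jpg")).unsqueeze(0).to(device)
text = clip.tokenize([f"a photo of a {c}" for c in ["dog", "cat", "car"]]).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)              # image-text similarity scores
    probs = logits_per_image.softmax(dim=-1)              # zero-shot class probabilities

print(probs.cpu().numpy())
```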
👀 Why modern AI for Computer Vision should have multimodal neurons, and how Faceted Feature Visualization raises the accuracy of predictions and classifications. A new paper from OpenAI researchers: https://distill.pub/2021/multimodal-neurons/
🤓How to assess the potential effectiveness of medical drugs: DeepBAR, a new method from MIT researchers for calculating the binding affinities between drug candidates and their targets. It is based on GAN models that analyze molecular structures as images.
https://news.mit.edu/2021/drug-discovery-binding-affinity-0315