#Transform-Invariant #ConvolutionalNeuralNetworks for Image #Classification and Search
Convolutional neural networks (CNNs) have achieved state-of-the-art results on many visual recognition tasks. However, current CNN models are still poorly invariant to spatial transformations of images. Intuitively, with sufficient layers and parameters, hierarchical combinations of convolution (matrix multiplication and non-linear activation) and pooling operations should be able to learn a robust mapping from transformed input images to transform-invariant representations. In this paper, we propose randomly transforming (rotating, scaling, and translating) the feature maps of CNNs during the training stage. This prevents #CNN models from forming complex dependencies on the specific rotation, scale, and translation levels of the training images. Rather, each convolutional kernel learns to detect a feature that is generally helpful for producing the transform-invariant answer, given the combinatorially large variety of transform levels of its input feature maps. In this way, we require no extra training supervision and no modification to the optimization process or the training images. We show that random transformation significantly improves CNNs on many benchmark tasks, including small-scale image recognition, large-scale image recognition, and image retrieval. The code is available at https://github.com/jasonustc/caffe-multigpu/tree/TICNN.
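As a concrete illustration of the idea, here is a minimal PyTorch-style sketch (the module name, transform ranges, and placement are assumptions, not the paper's exact Caffe recipe): during training, each feature map in the batch is warped by a random rotation, scale, and translation before the next convolution; at test time the layer is the identity.
```python
# Training-time random affine warp of intermediate feature maps; identity at
# test time. Module name and transform ranges are illustrative assumptions.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class RandomFeatureTransform(nn.Module):
    def __init__(self, max_angle=math.pi / 6, scale_range=(0.9, 1.1), max_shift=0.1):
        super().__init__()
        self.max_angle = max_angle
        self.scale_range = scale_range
        self.max_shift = max_shift

    def forward(self, x):                      # x: (N, C, H, W) feature maps
        if not self.training:
            return x
        n = x.size(0)
        # Sample a random rotation angle, isotropic scale, and shift per sample.
        angle = (torch.rand(n, device=x.device) * 2 - 1) * self.max_angle
        scale = torch.empty(n, device=x.device).uniform_(*self.scale_range)
        shift = (torch.rand(n, 2, device=x.device) * 2 - 1) * self.max_shift
        cos, sin = torch.cos(angle) * scale, torch.sin(angle) * scale
        # Assemble 2x3 affine matrices and resample the feature maps.
        theta = torch.zeros(n, 2, 3, device=x.device)
        theta[:, 0, 0], theta[:, 0, 1], theta[:, 0, 2] = cos, -sin, shift[:, 0]
        theta[:, 1, 0], theta[:, 1, 1], theta[:, 1, 2] = sin, cos, shift[:, 1]
        grid = F.affine_grid(theta, list(x.size()), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)
```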
Paper
🔭 @DeepGravity
#AI Transformation Playbook, How to lead your company into the AI era, by Andrew #Ng
This AI Transformation Playbook draws on insights gleaned from leading the #Google Brain team and the Baidu AI Group, which played leading roles in transforming both Google and Baidu into great AI companies. It is possible for any enterprise to follow this Playbook and become a strong AI company, though these recommendations are tailored primarily for larger enterprises with a market cap/valuation from $500M to $500B.
Link
🔭 @DeepGravity
When Does Label Smoothing Help?
Rafael Müller, Simon Kornblith, Geoffrey #Hinton
The #generalization and learning speed of a multi-class neural network can often be significantly improved by using soft targets that are a weighted average of the hard targets and the uniform distribution over labels. Smoothing the labels in this way prevents the network from becoming over-confident, and label smoothing has been used in many state-of-the-art models, including image classification, language translation, and speech recognition. Despite its widespread use, label smoothing is still poorly understood. Here we show empirically that, in addition to improving generalization, label smoothing improves model calibration, which can significantly improve beam search. However, we also observe that if a teacher network is trained with label smoothing, knowledge distillation into a student network is much less effective. To explain these observations, we visualize how label smoothing changes the representations learned by the penultimate layer of the network. We show that label smoothing encourages the representations of training examples from the same class to group in tight clusters. This results in a loss of information in the logits about resemblances between instances of different classes, which is necessary for distillation, but does not hurt generalization or calibration of the model's predictions.
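For reference, a minimal sketch of the soft targets described above (the function name and the smoothing weight alpha=0.1 are illustrative): the target distribution is (1 - alpha) times the one-hot labels plus alpha times the uniform distribution, trained with cross-entropy. Setting alpha=0 recovers standard cross-entropy.
```python
# Label smoothing: soft targets as a weighted average of the one-hot (hard)
# targets and the uniform distribution over labels.
import torch
import torch.nn.functional as F

def label_smoothing_loss(logits, labels, alpha=0.1):
    n_classes = logits.size(-1)
    one_hot = F.one_hot(labels, n_classes).float()
    targets = (1.0 - alpha) * one_hot + alpha / n_classes  # smoothed targets
    # Cross-entropy against the soft targets.
    return -(targets * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
```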
Paper
🔭 @DeepGravity
#DeepSpeech 0.6: Mozilla’s #Speech_to_Text Engine Gets Fast, Lean, and Ubiquitous
The #MachineLearning team at #Mozilla continues work on DeepSpeech, an automatic speech recognition (ASR) engine which aims to make speech recognition technology and trained models openly available to developers. DeepSpeech is a deep learning-based ASR engine with a simple API. We also provide pre-trained English models.
Our latest release, version v0.6, offers the highest quality, most feature-packed model so far. In this overview, we’ll show how DeepSpeech can transform your applications by enabling client-side, low-latency, and privacy-preserving speech recognition capabilities.
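A minimal usage sketch of the Python API (hedged: constructor and decoder arguments changed between 0.x releases, and the file paths here are placeholders, so check the signatures of your installed version):
```python
# Transcribe a 16 kHz, 16-bit mono WAV file with a pre-trained model.
# Model path and beam width are placeholders, not canonical values.
import wave
import numpy as np
import deepspeech

model = deepspeech.Model('output_graph.pbmm', 500)  # model file, beam width
with wave.open('audio.wav', 'rb') as w:
    audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
print(model.stt(audio))  # returns the transcript as a string
```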
Link
🔭 @DeepGravity
Beyond #Accuracy: #Precision and #Recall
Precision is defined as the number of true positives divided by the number of true positives plus the number of false positives. False positives are cases the model incorrectly labels as positive that are actually negative, or in our example, individuals the model classifies as terrorists who are not. While recall expresses the ability to find all relevant instances in a dataset, precision expresses the proportion of the data points the model labels as relevant that actually are relevant.
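The definitions above translate directly into code; a minimal sketch for binary labels (1 = positive, names illustrative):
```python
def precision_recall(y_true, y_pred):
    # Count true positives, false positives, and false negatives.
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many are real
    recall = tp / (tp + fn) if tp + fn else 0.0     # of real positives, how many were found
    return precision, recall
```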
Link
🔭 @DeepGravity
A Gentle Introduction to KFold Cross-Validation
KFold vs StratifiedKFold
Just a small notebook to point out that KFold and StratifiedKFold may not do what you think.
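A minimal sketch of the pitfall on made-up toy data: with sorted labels and no shuffling, plain KFold can put an entire class into the test fold, while StratifiedKFold preserves the class ratio in every fold.
```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

X = np.arange(20).reshape(-1, 1)
y = np.array([0] * 10 + [1] * 10)  # sorted labels: the worst case for plain KFold

for name, cv in [("KFold", KFold(n_splits=2)),
                 ("StratifiedKFold", StratifiedKFold(n_splits=2))]:
    for _, test_idx in cv.split(X, y):
        print(name, "test labels:", y[test_idx])
# KFold's first test fold contains only class 0; StratifiedKFold keeps a 50/50 mix.
```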
🔭 @DeepGravity
Deciphering interaction fingerprints from protein molecular surfaces using geometric #DeepLearning
Abstract
Predicting interactions between proteins and other biomolecules solely based on structure remains a challenge in biology. A high-level representation of protein structure, the molecular surface, displays patterns of chemical and geometric features that fingerprint a protein’s modes of interactions with other biomolecules. We hypothesize that proteins participating in similar interactions may share common fingerprints, independent of their evolutionary history. Fingerprints may be difficult to grasp by visual analysis but could be learned from large-scale datasets. We present MaSIF (molecular surface interaction fingerprinting), a conceptual framework based on a geometric deep learning method to capture fingerprints that are important for specific biomolecular interactions. We showcase MaSIF with three prediction challenges: protein pocket-ligand prediction, protein–protein interaction site prediction and ultrafast scanning of protein surfaces for prediction of protein–protein complexes. We anticipate that our conceptual framework will lead to improvements in our understanding of protein function and design.
Paper
🔭 @DeepGravity
Machine Unlearning
Once users have shared their data online, it is generally difficult for them to revoke access and ask for the data to be deleted. Machine learning (ML) exacerbates this problem because any model trained with said data may have memorized it, putting users at risk of a successful privacy attack exposing their information. Yet, having models unlearn is notoriously difficult. After a data point is removed from a training set, one often resorts to entirely retraining downstream models from scratch. We introduce SISA training, a framework that decreases the number of model parameters affected by an unlearning request and caches intermediate outputs of the training algorithm to limit the number of model updates that need to be computed to have these parameters unlearn. This framework reduces the computational overhead associated with unlearning, even in the worst-case setting where unlearning requests are made uniformly across the training set. In some cases, we may have a prior on the distribution of unlearning requests that will be issued by users. We may take this prior into account to partition and order data accordingly and further decrease overhead from unlearning. Our evaluation spans two datasets from different application domains, with corresponding motivations for unlearning. Under no distributional assumptions, we observe that SISA training improves unlearning for the Purchase dataset by 3.13x, and 1.658x for the SVHN dataset, over retraining from scratch. We also validate how knowledge of the unlearning distribution provides further improvements in retraining time by simulating a scenario where we model unlearning requests that come from users of a commercial product that is available in countries with varying sensitivity to privacy. Our work contributes to practical data governance in machine learning.
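A heavily simplified sketch of the sharding half of SISA (the "slicing" of shards and the caching of training checkpoints are omitted, and the base learner is an arbitrary stand-in): train one independent model per shard and aggregate by majority vote, so an unlearning request retrains only the shard that held the deleted point.
```python
# Sharded training with per-shard unlearning; labels must be non-negative ints.
import numpy as np
from sklearn.linear_model import LogisticRegression

class SISAEnsemble:
    def __init__(self, n_shards=4):
        self.n_shards = n_shards

    def fit(self, X, y):
        self.X, self.y = X, y
        # Interleaved sharding: index j goes to shard j % n_shards.
        self.shards = [list(range(i, len(X), self.n_shards))
                       for i in range(self.n_shards)]
        self.models = [LogisticRegression().fit(X[idx], y[idx])
                       for idx in self.shards]
        return self

    def unlearn(self, point_idx):
        # Retrain only the shard containing the deleted point.
        s = point_idx % self.n_shards
        self.shards[s].remove(point_idx)
        idx = self.shards[s]
        self.models[s] = LogisticRegression().fit(self.X[idx], self.y[idx])

    def predict(self, X):
        votes = np.stack([m.predict(X) for m in self.models])  # (n_models, n)
        return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```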
Paper
🔭 @DeepGravity
Join millions of teachers and students around the globe by doing a one-hour coding activity during Computer Science Education Week this December 9-15
Link
🔭 @DeepGravity
In Defense of Uniform Convergence: Generalization via derandomization with an application to interpolating predictors
We propose to study the generalization error of a learned predictor ĥ in terms of that of a surrogate (potentially randomized) classifier that is coupled to ĥ and designed to trade empirical risk for control of generalization error. In the case where ĥ interpolates the data, it is interesting to consider theoretical surrogate classifiers that are partially derandomized or rerandomized, e.g., fit to the training data but with modified label noise. We show that replacing ĥ by its conditional distribution with respect to an arbitrary σ-field is a viable method to derandomize. We give an example, inspired by the work of Nagarajan and Kolter (2019), where the learned classifier ĥ interpolates the training data with high probability, has small risk, and, yet, does not belong to a nonrandom class with a tight uniform bound on two-sided generalization error. At the same time, we bound the risk of ĥ in terms of a surrogate that is constructed by conditioning and shown to belong to a nonrandom class with uniformly small generalization error.
Link
🔭 @DeepGravity
Multi-Task #ReinforcementLearning without Interference
While deep reinforcement learning systems have demonstrated impressive results in domains ranging from game playing to robotic control, sample efficiency remains a major challenge, particularly as these algorithms learn individual tasks from scratch. Multi-task and goal-conditioned reinforcement learning have emerged as promising approaches for sharing structure across multiple tasks to enable more efficient learning. However, challenges in optimization have prevented such methods from realizing efficiency gains over learning tasks independently from scratch. Motivated by these challenges, we develop a general approach that can change the multi-task optimization landscape to alleviate conflicting gradients across tasks. In particular, we introduce two instantiations of this approach, one architectural and one algorithmic, that prevent gradients for different tasks from interfering with one another. On two challenging multi-task RL problems, we find that our approaches lead to greater final performance and learning efficiency than prior approaches.
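One common way to realize non-interfering gradients, sketched here purely as an illustration (this mirrors the gradient-projection idea popularized in the related "gradient surgery" line of work, not necessarily this paper's exact instantiations): when two task gradients conflict, drop the component of one along the other.
```python
# If g_i conflicts with g_j (negative inner product), project g_i onto the
# plane normal to g_j; otherwise leave it unchanged.
import torch

def deconflict(g_i, g_j):
    dot = torch.dot(g_i, g_j)
    if dot < 0:  # gradients point in conflicting directions
        g_i = g_i - (dot / g_j.norm() ** 2) * g_j
    return g_i
```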
Paper
🔭 @DeepGravity
Meta-gradient updates for training return functions for #ReinforcementLearning systems
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for reinforcement learning. The embodiments described herein apply meta-learning (and in particular, meta-gradient reinforcement learning) to learn an optimum return function G so that the training of the system is improved. This provides a more effective and efficient means of training a reinforcement learning system as the system is able to converge on an optimum set of one or more policy parameters θ more quickly by training the return function G as it goes. In particular, the return function G is made dependent on the one or more policy parameters θ and a meta-objective function J′ is used that is differentiated with respect to the one or more return parameters η to improve the training of the return function G.
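In the notation of the abstract, the two-level update can be sketched as follows (a hedged reconstruction in the style of meta-gradient reinforcement learning; the patent's exact formulation may differ):
```latex
% Inner update: policy parameters follow an update built from the
% parametrized return G_\eta computed on experience \tau.
\theta' = \theta + f(\tau, \theta, \eta)
% Meta update: differentiate the meta-objective J' through \theta' with
% respect to the return parameters \eta via the chain rule.
\Delta\eta \propto -\,\frac{\partial J'(\tau', \theta', \bar{\eta})}{\partial \eta}
            = -\,\frac{\partial J'}{\partial \theta'}\,
                 \frac{\partial \theta'}{\partial \eta}
```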
#Google
#DeepMind
Paper
🔭 @DeepGravity
#DeepMind ’s Dreamer #AI learns from the past to predict the future
Some AI systems achieve goals in challenging environments by drawing on representations of the world informed by past experiences. They generalize these to novel situations, enabling them to complete tasks even in settings they haven’t encountered before. As it turns out, reinforcement learning — a training technique that employs rewards to drive software policies toward goals — is particularly well-suited to learning world models that summarize an agent’s experience, and by extension to facilitating the learning of novel behaviors.
Article
🔭 @DeepGravity
Improved Few-Shot Visual Classification, by Peyman Bateni et al.
Few-shot learning is a fundamental task in computer vision that carries the promise of alleviating the need for exhaustively labeled data. Most few-shot learning approaches to date have focused on progressively more complex neural feature extractors and classifier adaptation strategies, as well as the refinement of the task definition itself. In this paper, we explore the hypothesis that a simple class-covariance-based distance metric, namely the Mahalanobis distance, adopted into a state of the art few-shot learning approach (CNAPS) can, in and of itself, lead to a significant performance improvement. We also discover that it is possible to learn adaptive feature extractors that allow useful estimation of the high dimensional feature covariances required by this metric from surprisingly few samples. The result of our work is a new "Simple CNAPS" architecture which has up to 9.2% fewer trainable parameters than CNAPS and performs up to 6.1% better than state of the art on the standard few-shot image classification benchmark dataset.
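A minimal sketch of the core classification rule in feature space (shrinkage toward the identity stands in for the paper's blended class/task covariance estimate; the function name and regularizer eps are illustrative):
```python
# Classify queries by squared Mahalanobis distance to class means, using a
# per-class covariance regularized toward the identity. Needs at least two
# support examples per class for np.cov to be defined.
import numpy as np

def mahalanobis_classify(query, support, labels, eps=1.0):
    classes = np.unique(labels)
    dists = []
    for c in classes:
        xc = support[labels == c]
        mu = xc.mean(axis=0)
        cov = np.cov(xc, rowvar=False) + eps * np.eye(support.shape[1])
        inv = np.linalg.inv(cov)
        d = query - mu
        dists.append(np.einsum('nd,de,ne->n', d, inv, d))  # (x-mu)^T S^-1 (x-mu)
    return classes[np.argmin(np.stack(dists, axis=1), axis=1)]
```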
Paper
🔭 @DeepGravity
#DeepDoubleDescent : Where Bigger Models and More Data Hurt
A very interesting paper by #Harvard University and #OpenAI. Abstract: We show that a variety of modern deep learning tasks exhibit a "double-descent" phenomenon where, as we increase model size,…
YouTube
Deep Double Descent
This video explores a new study on double descent evident in Deep Learning models such as CNNs, ResNets and Transformers. The double descent phenomenon is an...