Deep Gravity
AI

Contact:
DeepL.Gravity@gmail.com
A New Framework for Query Efficient Active #ImitationLearning

We seek to align an agent's policy with human expert behavior in a #reinforcementlearning (#RL) setting, without any prior knowledge of the dynamics, reward function, or unsafe states. A human expert knows the rewards and unsafe states according to their preferences and objective, but querying that expert is expensive. To address this challenge, we propose a new framework for imitation learning (IL) that actively and interactively learns a model of the user's reward function with efficient queries. We build an adversarial generative model of states and a successor representation (SR) model trained on transition experience collected by the learning policy. Our method uses these models to select state-action pairs, asks the user to comment on their optimality or safety, and trains an adversarial neural network to predict the rewards. Unlike previous work, which is almost entirely based on uncertainty sampling, the key idea is to actively and efficiently select state-action pairs from both on-policy and off-policy experience by discriminating between the queried (expert) and unqueried (generated) data and maximizing the efficiency of value function learning. We call this method adversarial reward query with successor representation. We evaluate the proposed method with a simulated human on a state-based 2D navigation task, robotic control tasks, and image-based video games, which have high-dimensional observations and complex state dynamics. The results show that the proposed method significantly outperforms uncertainty-based methods at learning reward models, achieving better query efficiency: the adversarial discriminator helps the agent learn human behavior more efficiently, and the SR selects states with a stronger impact on the value function. Moreover, the proposed method also learns to avoid unsafe states while training the reward model.
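
A minimal sketch of the query-selection idea described above, assuming a discriminator that scores how well a candidate state-action pair is already explained by queried data and a successor-representation model that scores its influence on value learning. All names here (Discriminator, SuccessorFeatures, select_query) and the scoring rule are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

class Discriminator:
    """Scores how well a state-action feature vector is explained by already-queried (expert) data."""
    def __init__(self, dim):
        self.w = np.zeros(dim)  # stand-in for a trained adversarial network

    def score(self, phi):
        # Higher = looks like already-queried data; lower = more informative to query.
        return 1.0 / (1.0 + np.exp(-phi @ self.w))

class SuccessorFeatures:
    """Rough proxy for how strongly a state-action pair influences value-function learning."""
    def __init__(self, dim):
        self.psi = np.eye(dim)  # stand-in for a learned successor-representation model

    def impact(self, phi):
        return np.linalg.norm(self.psi @ phi)

def select_query(candidates, disc, sr):
    """Pick the state-action pair to show the human expert: prefer pairs the
    discriminator cannot attribute to the queried set, weighted by their
    estimated impact on the value function."""
    scores = [(1.0 - disc.score(phi)) * sr.impact(phi) for phi in candidates]
    return candidates[int(np.argmax(scores))]

# Usage: rank candidates drawn from on- and off-policy experience, then ask
# the (simulated) human to label the selected pair's optimality or safety.
candidates = [np.random.randn(8) for _ in range(32)]
query = select_query(candidates, Discriminator(8), SuccessorFeatures(8))
```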

Paper

🔭 @DeepGravity
#MachineLearning from a Continuous Viewpoint

We present a continuous formulation of machine learning, as a problem in the calculus of variations and differential-integral equations, very much in the spirit of classical numerical analysis and statistical physics. We demonstrate that conventional machine learning models and algorithms, such as the random feature model, the shallow neural network model and the residual neural network model, can all be recovered as particular discretizations of different continuous formulations. We also present examples of new models, such as the flow-based random feature model, and new algorithms, such as the smoothed particle method and spectral method, that arise naturally from this continuous formulation. We discuss how the issues of generalization error and implicit regularization can be studied under this framework.
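
As a concrete instance of the "discretization" claim, a standard way to write the continuous analogue of the random feature / shallow network model is as an expectation over features, whose Monte Carlo discretization recovers the finite-width model. The notation below is illustrative, not copied from the paper.

```latex
% Continuous (integral) formulation of a two-layer model
\[
    f(x) \;=\; \int a(w)\,\sigma(w^{\top}x)\,\mathrm{d}\pi(w)
         \;=\; \mathbb{E}_{w\sim\pi}\!\left[a(w)\,\sigma(w^{\top}x)\right],
\]
% Monte Carlo discretization with samples w_1,\dots,w_m \sim \pi
\[
    f_m(x) \;=\; \frac{1}{m}\sum_{j=1}^{m} a_j\,\sigma\!\left(w_j^{\top}x\right).
\]
```

Keeping the sampled w_j fixed and training only the a_j gives the random feature model, while training both gives a shallow neural network; the residual network arises analogously as a forward-Euler discretization of a continuous-in-depth flow.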

Paper

🔭 @DeepGravity
World Programs for Model-Based Learning and Planning in Compositional State and Action Spaces

Some of the most important tasks take place in environments that lack cheap and perfect simulators, hampering the application of model-free #reinforcementlearning (#RL). While model-based RL aims to learn a dynamics model, in the more general case the learner does not even know a priori what the action space is. Here we propose a formalism in which the learner induces a world program, learning both the dynamics model and the actions of graph-based compositional environments from observed state-state transition examples. The learner can then perform RL with the world program as the simulator for complex planning tasks. We highlight a recent application and propose a challenge for the community to assess world program-based planning.
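
A toy illustration of the formalism, assuming a learner that induces deterministic rewrite rules (stand-ins for actions) from observed state-state pairs over set-valued states and then exposes them as a simulator. The interface (induce_rules, WorldProgram.step) and the rule representation are hypothetical and far simpler than the paper's graph-based compositional setting.

```python
def induce_rules(transitions):
    """Infer candidate actions as (deleted facts, added facts) pairs from state-state examples."""
    rules = set()
    for state, next_state in transitions:
        removed = frozenset(state - next_state)  # facts the action deletes
        added = frozenset(next_state - state)    # facts the action adds
        rules.add((removed, added))
    return rules

class WorldProgram:
    """Induced dynamics model, usable as the simulator for downstream planning or RL."""
    def __init__(self, rules):
        self.rules = rules

    def step(self, state, rule):
        removed, added = rule
        if removed <= state:                     # precondition: deleted facts must hold
            return (state - removed) | added
        return state                             # rule not applicable; state unchanged

# Two observed transitions over set-valued states induce a pick-up / put-down pair of rules.
transitions = [
    ({"block_on_table", "hand_empty"}, {"block_in_hand"}),
    ({"block_in_hand"}, {"block_on_table", "hand_empty"}),
]
world = WorldProgram(induce_rules(transitions))
```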

Paper

🔭 @DeepGravity
Long-Term Visitation Value for Deep Exploration in Sparse Reward #ReinforcementLearning

Reinforcement learning with sparse rewards is still an open challenge. Classic methods rely on getting feedback via extrinsic rewards to train the agent, and in situations where this occurs very rarely, the agent learns slowly or cannot learn at all. Similarly, if the agent also receives rewards that create suboptimal modes of the objective function, it will likely stop exploring prematurely. More recent methods add auxiliary intrinsic rewards to encourage exploration. However, auxiliary rewards lead to a non-stationary target for the Q-function. In this paper, we present a novel approach that (1) plans exploration actions far into the future by using a long-term visitation count, and (2) decouples exploration and exploitation by learning a separate function assessing the exploration value of the actions. Contrary to existing methods, which use models of reward and dynamics, our approach is off-policy and model-free. We further propose new tabular environments for benchmarking exploration in reinforcement learning. Empirical results on classic and novel benchmarks show that the proposed approach outperforms existing methods in environments with sparse rewards, especially in the presence of rewards that create suboptimal modes of the objective function. Results also suggest that our approach scales gracefully with the size of the environment. Source code is available at https://github.com/sparisi/visit-value-explore
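
A minimal tabular sketch of the decoupling: a standard Q-function for exploitation plus a separate visitation-value function W trained off-policy on a count-based bonus, with exploratory actions chosen from W. The update rules and the 1/sqrt(count) bonus are simplified stand-ins; the paper and the linked repository define the actual algorithm.

```python
import numpy as np

n_states, n_actions = 50, 4
gamma, gamma_w, alpha = 0.99, 0.99, 0.1

Q = np.zeros((n_states, n_actions))  # exploitation value (extrinsic, possibly sparse reward)
W = np.zeros((n_states, n_actions))  # long-term visitation value (exploration)
N = np.zeros((n_states, n_actions))  # visitation counts

def update(s, a, r, s2):
    """Off-policy updates for both value functions on a single transition."""
    N[s, a] += 1
    # Standard Q-learning on the extrinsic reward.
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
    # Visitation value: bootstrap a count-based bonus far into the future,
    # so exploration is planned rather than driven by one-step novelty.
    bonus = 1.0 / np.sqrt(N[s, a])
    W[s, a] += alpha * (bonus + gamma_w * W[s2].max() - W[s, a])

def act(s, explore=True):
    """Exploration and exploitation are decoupled: act on W to explore, on Q to exploit."""
    return int((W if explore else Q)[s].argmax())
```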

Paper

🔭 @DeepGravity
Dive into Deep Learning

An interactive #DeepLearning #book with code, math, and discussions, based on the #NumPy interface.

Book

🔭 @DeepGravity
Research Fellow / Fellow at Australian National University. Mathematical Sciences Institute and Research School of Computer Science

Lecturer/Senior Lecturer in Computer Science (Industry 4.0 Analytics). Edge Hill University UK

Postdoctoral Researcher in Energy Analytics and Machine Learning at the University of Pennsylvania

Postdoc Computer Science, Computational Biology - Next Generation Sequencing Data Analysis (m/f/d). Genomik und Immunregulation (LIMES) Bonn

Postdoc Fellow in Large Scale Sensor Fusion / Intelligent Infrastructure Systems

Post-Doc in "Large Scale Senor Fusion / Intelligent Infrastructure Systems" at TUM

Two postdoctoral positions at the University of Venice, Italy. Artificial Intelligence Unit

Postdoctoral Researcher at ETS Montreal - Deep Learning for Visual Recognition

Neural Network models for language and interactive robots

Machine Learning Research Scientist position at the NYU School of Medicine

Postdoc Position at Qatar Computing Research Institute (QCRI)

5-Year Fellowships at RISE Cyprus on AI, Communications, Visual Sciences, Human Factors, Design

PhD position: Hybrid process modeling combining mechanistic transport equations with machine learning for thermodynamic equilibria, The Helmholtz School for Data Science in Life, Earth and Energy

Two PhD positions in Deep Probabilistic Programming and protein structure prediction, Copenhagen

PhD Student - Meteorologist, Physicist, Computer Scientist or Engineer. Institut für Geowissenschaften, Tübingen

PhD Studentship Artificial Intelligence Enabling Next Generation Synthesis

Staff Scientist/Postdoctoral Scholar, Neural Computation Unit, Okinawa Institute of Science and Technology

FENS-SfN Summer School on Artificial and natural computations for sensory perception: what is the link? (7-13 June 2020, Italy)

Postdoctoral Researcher in Computer Vision and Deep Learning

Research Assistant Artificial Intelligence in Life Science Applications

PhD Studentship in Neural Data Science, Computational Neuromodulation and Metalearning

#Job

🔭 @DeepGravity
Computational model discovery with #ReinforcementLearning

The motivation of this study is to leverage recent breakthroughs in artificial intelligence research to unlock novel solutions to important scientific problems encountered in computational science. To address the limitations of human intelligence in discovering reduced-order models, we propose to supplement human thinking with artificial intelligence. Our three-pronged strategy consists of learning (i) models expressed in analytical form, (ii) which are evaluated a posteriori, and (iii) using exclusively integral quantities from the reference solution as prior knowledge. In point (i), we pursue interpretable models expressed symbolically as opposed to black-box neural networks, the latter only being used during learning to efficiently parameterize the large search space of possible models. In point (ii), learned models are dynamically evaluated a posteriori in the computational solver instead of based on a priori information from preprocessed high-fidelity data, thereby accounting for the specificity of the solver at hand, such as its numerics. Finally, in point (iii), the exploration of new models is solely guided by predefined integral quantities, e.g., averaged quantities of engineering interest in Reynolds-averaged or large-eddy simulations (LES). We use a coupled deep reinforcement learning framework and computational solver to concurrently achieve these objectives. The combination of reinforcement learning with objectives (i), (ii) and (iii) differentiates our work from previous modeling attempts based on machine learning. In this report, we provide a high-level description of the model discovery framework with reinforcement learning. The method is detailed for the application of discovering missing terms in differential equations. An elementary instantiation of the method is described that discovers missing terms in the Burgers' equation.
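
An elementary sketch of the discovery loop on the Burgers' example: candidate closure terms M(u) are inserted into the solver, rolled out, and scored only against an integral quantity of a reference solution (a posteriori evaluation). The candidate library, solver settings, and greedy selection below are illustrative placeholders; in the paper the selection is driven by a deep reinforcement learning agent rather than exhaustive scoring.

```python
import numpy as np

# Hypothetical symbolic candidates M(u) for  u_t + u u_x = nu u_xx + M(u)
CANDIDATES = {
    "zero":      lambda u, ux, uxx: 0.0 * u,
    "linear":    lambda u, ux, uxx: -0.5 * u,
    "advective": lambda u, ux, uxx: -u * ux,
    "diffusive": lambda u, ux, uxx: 0.01 * uxx,
}

def solve_burgers(term, nx=128, nt=200, nu=0.01, dt=1e-3):
    """A posteriori evaluation: run the solver with the candidate term plugged in."""
    x = np.linspace(0.0, 2.0 * np.pi, nx, endpoint=False)
    dx = x[1] - x[0]
    u = np.sin(x)
    for _ in range(nt):
        ux = np.gradient(u, dx)
        uxx = np.gradient(ux, dx)
        u = u + dt * (-u * ux + nu * uxx + term(u, ux, uxx))
    return u

def reward(u, u_ref):
    """Score only an integral quantity of the solution (mean kinetic energy here)."""
    return -abs(np.mean(u ** 2) - np.mean(u_ref ** 2))

u_ref = solve_burgers(CANDIDATES["diffusive"])  # stand-in for the reference solution
scores = {name: reward(solve_burgers(f), u_ref) for name, f in CANDIDATES.items()}
best = max(scores, key=scores.get)              # greedy stand-in for the RL agent's choice
```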

Paper

🔭 @DeepGravity