Data Science by ODS.ai 🦜
First Telegram Data Science channel. Covering all technical and popular stuff about anything related to Data Science: AI, Big Data, Machine Learning, Statistics, general Math and the applications of the former. To reach editors contact: @malev
Finally, there is Reinforcement Learning for MOBA games (like DotA / Heroes of the Storm / League of Legends). It was published several months ago in Korea and demonstrates a squad of agents fighting each other in a League of Legends-like mini game (6 layers with LSTM/MaxOut).

The mini game has two champions, each with a unique skill set and attributes that can buff/debuff targets, including themselves.

So here is a good starting point for those who want to start digging in this direction:

https://onedrive.live.com/view.aspx?resid=166F2AF156F7AB19!1089&ithint=file%2cpptx&app=PowerPoint&authkey=!AA1FLzme4BNhWUE

https://www.youtube.com/watch?v=e1eTJvS_Inw
The MegaFace competition on face recognition ended today.

The competition was about precision: teams had to find the most similar faces.

There is a Russian startup beating Google; that's going to hit every newspaper, so feel free to share this message with your friends.

Yes, you can beat a great company, even if you are in a small team.

It's not always about the size of the dog in the fight, it's about the size of the fight in the dog.

results:
http://megaface.cs.washington.edu/results/
paper:
http://arxiv.org/abs/1512.00596
We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech--two vastly different languages. Because it replaces entire pipelines of hand-engineered components with neural networks, end-to-end learning allows us to handle a diverse variety of speech including noisy environments, accents and different languages. Key to our approach is our application of HPC techniques, resulting in a 7x speedup over our previous system. Because of this efficiency, experiments that previously took weeks now run in days. This enables us to iterate more quickly to identify superior architectures and algorithms. As a result, in several cases, our system is competitive with the transcription of human workers when benchmarked on standard datasets. Finally, using a technique called Batch Dispatch with GPUs in the data center, we show that our system can be inexpensively deployed in an online setting, delivering low latency when serving users at scale.
New paper by Andrew Ng
Guys from Google did what every data scientist has thought of since the ImageNet release: they made a tool which can classify all the images in your collection and tag them with the objects present in the images, so you can search through your memories just by typing keywords.

http://recode.net/2015/12/09/ex-googlers-take-on-google-photos-with-machine-smarts/
This is the recording from July 23rd SF Machine Learning Meetup at Workday Inc. San Francisco office.

Featured speaker - Ilya Sutskever
Ilya Sutskever received his PhD in 2012 from the University of Toronto working with Geoffrey Hinton. He was also a post-doc with Andrew Ng at Stanford University. After completing his PhD, he cofounded DNNResearch with Geoffrey Hinton and Alex Krizhevsky which was acquired by Google. He is interested in all aspects of neural networks and their applications.

https://clip.mn/video/yt-aUTHdgh1OjI
Yoshua Bengio:
A must-read for those interested in dialogue research, with an overview of the corpora available for learning dialogue systems:

http://arxiv.org/abs/1512.05742
We have an announcement to make.

The Russian Deep Learning community is quite excited and enthusiastic about the recent Kaggle challenge put forward by the Allen Institute for Artificial Intelligence (https://www.kaggle.com/c/the-allen-ai-science-challenge). Backed by a large interest group here in Moscow, we want to build on this initiative by organising a winter school paired with an AI hackathon - http://qa.deephack.me . Collaborative work by many teams forms a powerful educational environment that can stimulate people to learn and work better, and may in the end lead to discoveries that would otherwise have been overlooked.

Based on our prior experience, we expect a successful event! The last event of this kind that we organized, a week-long hackathon to improve DeepMind's code for playing Atari games (see http://deephack.me ), went very well. It was an academic event, free for participants but competitive, combining hacking with a crash course of educational lectures by +Yoshua Bengio, Andrey Dergachev, Alexey Dosovitski, Vitali Dunin-Barkovskyi, +Terran Lane, +Anatoly Levenchuk, +Sridhar Mahadevan, Maxim Milakov, +Sergey Plis, +Irina Rish, +Ruslan Salakhutdinov, +Jürgen Schmidhuber, +Thomas Unterthiner, Dmitri Vetrov, and Alexander Zhavoronkov. The winning team was awarded a trip to NIPS, and a paper based on their work was accepted to a NIPS workshop. In fact, many other participants were inspired enough to come to NIPS on their own.

We invite everybody who is interested in participating as a hacker or a speaker :)

More details (and registration form) can be found at http://qa.deephack.me
http://aipoly.com/ - live demo of a computer vision application: an iPhone app that recognizes objects in video
5 Deep Learning papers explained:


Infinite Dimensional Word Embeddings

Abstract:

We describe a method for learning word embeddings with stochastic dimensionality. Our Infinite Skip-Gram (iSG) model specifies an energy-based joint distribution over a word vector, a context vector, and their dimensionality. By employing the same techniques used to make the Infinite Restricted Boltzmann Machine (Cote & Larochelle, 2015) tractable, we define vector dimensionality over a countably infinite domain, allowing vectors to grow as needed during training.
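
For reference (this is background, not taken from the paper), the standard skip-gram scores a word/context pair over a fixed dimensionality d; as I read the abstract, the iSG turns that dimensionality into a random variable z with countably infinite support, so the score only involves the first z coordinates:

```latex
% Standard (fixed-dimensional) skip-gram, shown for reference only:
\[
  p(c \mid w) \;=\;
  \frac{\exp\!\Big(\sum_{i=1}^{d} v_{w,i}\, u_{c,i}\Big)}
       {\sum_{c' \in V} \exp\!\Big(\sum_{i=1}^{d} v_{w,i}\, u_{c',i}\Big)},
  \qquad v_w,\, u_c \in \mathbb{R}^{d}.
\]
% In the iSG (as I read the abstract), the dimensionality is itself a random
% variable z with countably infinite support, so the score of a (word, context,
% z) triple only involves the first z coordinates of the two vectors.
```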

Main Ideas:

This is quite an original use of the "infinite dimensions" trick we introduced in the iRBM. It wasn't entirely "plug and play" either; the authors had to be smart in the approximations they proposed for training the iSG.

The qualitative results showing how the conditional over the number of dimensions contains information about polysemy are really neat! One assumption behind distributed word embeddings is that they should be able to represent the multiple meanings of words using different dimensions, so it's nice to see that this is exactly what is being learned here.

I think the only thing missing from this paper is a comparison with the regular skip-gram, and perhaps other word embedding methods, on a specific task or on a word similarity task. In v2 of this paper, the authors mention that they are working on such results, so I'm looking forward to seeing those!

http://arxiv.org/pdf/1511.05392v2.pdf
Gradient-based Hyperparameter Optimization through Reversible Learning


Abstract (excerpt):

We compute exact gradients of cross-validation performance with respect to all hyperparameters by chaining derivatives backwards through the entire training procedure. These gradients allow us to optimize thousands of hyperparameters, including step-size and momentum schedules, weight initialization distributions, richly parameterized regularization schemes, and neural network architectures.

Two Cents (excerpt):

This is one of my favorite papers of this year. While the method of unrolling several steps of gradient descent (100 iterations in the paper) makes it somewhat impractical for large networks (which is probably why they considered 3-layer networks with only 50 hidden units per layer), it provides an incredibly interesting window into what good hyper-parameter choices for neural networks look like. Note that, to substantially reduce the memory requirements of the method, the authors had to be quite creative and smart about how to encode the changes in the network's weights over training.

There are tons of interesting experiments, which I encourage the reader to go check out (see section 3).

The experiment on "training the training set", i.e. generating the 10 examples (one per class) that would minimize the validation set loss of a network trained on these examples, is a pretty cool idea (it essentially learns prototypical images of the digits from 0 to 9 on MNIST).

Note that approaches like the one in this paper make tools for automatic differentiation incredibly valuable. autograd, the authors' automatic differentiation library for Python https://github.com/HIPS/autograd (which inspired our own Torch autograd https://github.com/twitter/torch-autograd), was in fact developed in the context of this paper.
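
To make this concrete, here is a minimal sketch (my own toy example, not the paper's code) of using autograd to differentiate an unrolled SGD loop with respect to its learning rate; the quadratic "training" and "validation" losses and all constants are made up for illustration.

```python
from autograd import grad  # pip install autograd

def unrolled_training(lr, num_steps=100):
    """Run plain SGD on a toy training loss (w - 3)^2, then return a toy
    validation loss (w - 2.5)^2. Every step is an ordinary traceable
    operation, so autograd can chain derivatives backwards through the
    whole training trajectory."""
    w = 0.0
    for _ in range(num_steps):
        w = w - lr * 2.0 * (w - 3.0)   # gradient step on the training loss
    return (w - 2.5) ** 2              # validation loss of the trained model

# Exact d(validation loss) / d(learning rate), through all 100 steps
hypergrad = grad(unrolled_training)

lr, meta_lr = 0.01, 5e-5               # meta step size kept small for stability
for _ in range(50):                    # gradient descent on the hyperparameter
    lr = lr - meta_lr * hypergrad(lr)
print(lr, unrolled_training(lr))
```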

http://arxiv.org/pdf/1502.03492v3.pdf
Speed Learning on the Fly

Authors: Pierre-Yves Massé, Yann Ollivier
Date posted to arXiv: 8 Nov 2015

Abstract:

Here we propose to adapt the step size by performing a gradient descent on the step size itself, viewing the whole performance of the learning trajectory as a function of step size. Importantly, this adaptation can be computed online at little cost, without having to iterate backward passes over the full data.
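
As a rough illustration of the flavor of this idea (my sketch, not the authors' exact algorithm; the helper names and the least-squares example are made up): the derivative of the current loss with respect to the step size can be approximated online from the inner product of the current stochastic gradient with the previous update direction, so the step size itself is adapted by gradient descent without replaying past data.

```python
import numpy as np

def adapt_step_size_online(grad_fn, w, lr=0.01, meta_lr=1e-4, steps=2000):
    """Toy online step-size adaptation (a sketch, not the paper's exact update).
    The derivative of the current loss w.r.t. the step size is approximated by
    the inner product of the current gradient with the previous update
    direction, so nothing has to be replayed backwards over past data."""
    prev_direction = np.zeros_like(w)
    for _ in range(steps):
        g = grad_fn(w)
        # d(loss)/d(lr) ~= g . d(w)/d(lr) ~= g . (-previous gradient)
        lr = lr - meta_lr * np.dot(g, prev_direction)
        w = w - lr * g
        prev_direction = -g            # direction through which lr acted last step
    return w, lr

# Streaming least squares with noisy targets (made-up example)
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0])

def stochastic_grad(w):
    x = rng.normal(size=2)
    y = x @ w_true + 0.1 * rng.normal()
    return 2.0 * (x @ w - y) * x       # gradient of the squared error on one sample

w, lr = adapt_step_size_online(stochastic_grad, np.zeros(2))
print(w, lr)                           # w approaches w_true; lr adapts on the fly
```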

Main ideas:

I think the authors are right on the money as to the challenges posed by online learning. These challenges are likely to be even greater in the context of training neural networks online, for which few satisfactory solutions exist right now. So this is a direction of research I'm particularly excited about.

At this point, the experiments consider fairly simple learning scenarios, but I don't see any obstacle to applying the same method to neural networks. One interesting observation is that the results are fairly robust to variations of "the learning rate of the learning rate", compared to varying the (fixed) learning rate itself.

Finally, I haven't had time to fully digest one of their theoretical results, which suggests that their approximation actually corresponds to an exact gradient taken "alongside the effective trajectory" of gradient descent. However, that result seems quite interesting and deserves more attention.


http://arxiv.org/pdf/1511.02540v1.pdf
Spatial Transformer Networks

Authors: Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu
Date posted to arXiv: 5 Jun 2015

Abstract:

In this work we introduce a new learnable module, the Spatial Transformer, which explicitly allows the spatial manipulation of data within the network. This differentiable module can be inserted into existing convolutional architectures, giving neural networks the ability to actively spatially transform feature maps, conditional on the feature map itself, without any extra training supervision or modification to the optimisation process.
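
To make the sampling mechanism concrete, here is a minimal numpy sketch (my own illustration, not the paper's code) of the two pieces downstream of the localisation network: an affine grid generator and a bilinear sampler for a single-channel feature map. In the paper both operations are differentiable, so gradients flow back to the predicted transformation parameters theta; the function names here are made up.

```python
import numpy as np

def affine_grid(theta, H, W):
    """Build normalised source coordinates for a 2x3 affine transform theta."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W), indexing="ij")
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(H * W)])   # (3, H*W)
    src = theta @ coords                                          # (2, H*W)
    return src[0].reshape(H, W), src[1].reshape(H, W)

def bilinear_sample(feat, xs, ys):
    """Sample a single-channel map at normalised coords via bilinear interpolation."""
    H, W = feat.shape
    x = (xs + 1) * (W - 1) / 2          # back to pixel coordinates
    y = (ys + 1) * (H - 1) / 2
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = x0 + 1, y0 + 1
    x0c, x1c = np.clip(x0, 0, W - 1), np.clip(x1, 0, W - 1)
    y0c, y1c = np.clip(y0, 0, H - 1), np.clip(y1, 0, H - 1)
    wa = (x1 - x) * (y1 - y)            # weights of the four neighbours
    wb = (x1 - x) * (y - y0)
    wc = (x - x0) * (y1 - y)
    wd = (x - x0) * (y - y0)
    return (wa * feat[y0c, x0c] + wb * feat[y1c, x0c] +
            wc * feat[y0c, x1c] + wd * feat[y1c, x1c])

# Identity transform should reproduce the input feature map
feat = np.arange(16.0).reshape(4, 4)
theta = np.array([[1.0, 0.0, 0.0],      # in practice theta comes from the
                  [0.0, 1.0, 0.0]])     # localisation network
xs, ys = affine_grid(theta, 4, 4)
print(np.allclose(bilinear_sample(feat, xs, ys), feat))   # True
```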

Main ideas:

While the work on DRAW (http://arxiv.org/abs/1502.04623) previously proposed a similar approach to learning transformations on images, this work goes significantly beyond DRAW and generalizes the approach to a much richer family of transformations. I also really like the idea of applying the spatial transformer modules within a CNN, something that wasn't in the DRAW paper.

I really don't have much negative to say about this work; it's really solid!

The only thing that comes to mind is that, in the CUB-200-2011 experiment, the authors used ImageNet pre-trained Inception networks to initialise their models. The only reason it's worth mentioning is that the test set of the CUB-200-2011 dataset actually contains images from the ImageNet training set. But fortunately, there are very few of those, so this doesn't change the overall analysis of the results. Still, I do find it interesting that, with such forms of transfer learning becoming increasingly common, it appears that we, as deep learning researchers, will need to start paying much more attention to such considerations in the future than we used to.

http://arxiv.org/pdf/1506.02025v2.pdf