Data Science by ODS.ai 🦜 – Telegram
The first Telegram Data Science channel. Covering all technical and popular stuff about anything related to Data Science: AI, Big Data, Machine Learning, Statistics, general Math and their applications. To reach the editors, contact @malev
Clustering is Efficient for Approximate Maximum Inner Product Search

Authors: Alex Auvolat, Sarath Chandar, Pascal Vincent, Hugo Larochelle, Yoshua Bengio
Date posted to arXiv: 21 Jul 2015

Abstract:

Efficient Maximum Inner Product Search (MIPS) is an important task that has a wide applicability in recommendation systems and classification with a large number of classes. Solutions based on locality-sensitive hashing (LSH) as well as tree-based solutions have been investigated in the recent literature, to perform approximate MIPS in sublinear time. In this paper, we compare these to another extremely simple approach for solving approximate MIPS, based on variants of the k-means clustering algorithm.

Main ideas:

Update 2015/11/23: Since I first wrote this note, I became involved in the next iterations of this work, which became v2 of the arXiv manuscript. The notes below were made based on v1.

(Editor's note: link to version 1)

Since inner products are one of the main units of computation in neural networks, I'm very interested in MIPS as I suspect it could play an important role in scaling up neural networks. One example mentioned in the paper is that of approximating computations at the output layer of a neural network language model, corresponding to a softmax over a large number of units (as many as words in the vocabulary).
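To make the softmax example concrete, here is a minimal sketch of why the most probable words at the output layer can be found by MIPS. It assumes an output layer without bias terms, and the sizes, the weight matrix `W`, the hidden state `h` and the helper `top_k_mips` are all hypothetical names chosen for illustration. The search here is exact brute force; an approximate MIPS method would replace `top_k_mips` to avoid scoring every word.

```python
import numpy as np

def top_k_mips(W, h, k):
    """Exact brute-force MIPS: indices of the k rows of W with the
    largest inner product with h, sorted from best to worst."""
    scores = W @ h
    top = np.argpartition(-scores, k - 1)[:k]   # unordered top-k
    return top[np.argsort(-scores[top])]        # order them by score

def softmax(z):
    z = z - z.max()            # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
vocab_size, hidden = 1000, 16                   # hypothetical sizes
W = rng.normal(size=(vocab_size, hidden))       # one output vector per word
h = rng.normal(size=hidden)                     # last hidden layer activation

probs = softmax(W @ h)
top = top_k_mips(W, h, 5)

# Softmax is monotonic in the logits, so the most probable words are
# exactly the words whose vectors have the largest inner product with h.
print(top, probs[top])
```

Because the ranking of words depends only on the inner products, a good approximate MIPS index gives the likely words without computing all `vocab_size` logits.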

I find the combination of the "MIPS to MCSS" transformation with spherical clustering clever, cute and simple. Based on how good the results are compared to hashing, I find this direction of research quite compelling.
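As an illustration of the "MIPS to MCSS" idea, the sketch below uses one commonly used reduction: append an extra coordinate to every database vector so that all of them have the same norm, and append a zero to the query. That this matches the paper's exact transformation is an assumption, and the function names are hypothetical. The spherical clustering step itself is not shown; the sketch only checks that, after the transformation, the maximum cosine similarity picks the same item as the maximum inner product.

```python
import numpy as np

def mips_to_mcss(X):
    """Append one coordinate to each row of X so that every row has
    norm M, where M is the largest row norm in X."""
    norms = np.linalg.norm(X, axis=1)
    M = norms.max()
    extra = np.sqrt(np.maximum(M**2 - norms**2, 0.0))  # clip rounding noise
    return np.hstack([X, extra[:, None]])

def augment_query(q):
    """Append a zero so inner products with augmented rows are unchanged."""
    return np.append(q, 0.0)

def cosine_argmax(X_aug, q_aug):
    """Index of the row with the highest cosine similarity to q_aug."""
    Xn = X_aug / np.linalg.norm(X_aug, axis=1, keepdims=True)
    return int(np.argmax(Xn @ (q_aug / np.linalg.norm(q_aug))))

rng = np.random.default_rng(1)
# Rows with very different norms, which is where MIPS and MCSS disagree.
X = rng.normal(size=(500, 8)) * rng.uniform(0.1, 3.0, size=(500, 1))
X_aug = mips_to_mcss(X)

for _ in range(20):
    q = rng.normal(size=8)
    best_ip = int(np.argmax(X @ q))                    # exact MIPS answer
    best_cos = cosine_argmax(X_aug, augment_query(q))  # MCSS answer
    assert best_ip == best_cos
```

Since every augmented vector has norm M, the cosine similarity equals the original inner product divided by M times the query norm, which does not change the ranking. This is what lets a method built for cosine similarity, such as spherical k-means, be used to answer MIPS queries.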

I would like to thank Dr. Larochelle, not only for the fantastic summaries and insights that he has been producing for several months at this point, but also for being gracious enough to allow us to reproduce extended excerpts in this and the previous article. I hope that these notes, along with the original papers themselves, provide you with some additional comprehension of the often-difficult concepts that go along with deep learning research.

Bio: Matthew Mayo is a computer science graduate student currently working on his thesis parallelizing machine learning algorithms. He is also a student of data mining, a data enthusiast, and an aspiring machine learning scientist.

http://arxiv.org/pdf/1507.05910v3.pdf
GPU-Trained System Understands Movies

The questions range from simpler ‘Who’ did ‘What’ to ‘Whom’ that can be solved by computer vision alone, to ‘Why’ and ‘How’ something happened in the movie, questions that can only be solved by exploiting both the visual information and dialogs.

https://news.developer.nvidia.com/gpu-trained-system-understands-movies/
Another paper on an impressive application of Deep Learning: it can now segment glands in histology images, a key step in assessing how malignant a tumor is.

The morphology of glands has been used routinely by pathologists to assess the malignancy degree of adenocarcinomas. Accurate segmentation of glands from histology images is a crucial step to obtain reliable morphological statistics for quantitative diagnosis. In this paper, we proposed an effective deep contour-aware network (DCAN) to solve this challenging problem under a unified multi-task learning framework. In the proposed network, multi-level contextual features from the hierarchical architecture are explored with auxiliary supervision for accurate gland segmentation. When incorporated with multi-task regularization during the training, the discriminative capability of intermediate features can be further improved. Moreover, our network can not only output accurate probability maps of glands, but also depict clear contours simultaneously for separating cluttered objects, which further boosts the gland segmentation performance. This unified framework can be efficient when applied to large-scale histopathological data without resorting to additional post-separating steps based on low-level cues. Our method (CUMedVision Team) won the 2015 MICCAI Gland Segmentation Challenge out of 13 competitive teams (photo of top teams), surpassing all the other methods by a significant margin.

http://appsrv.cse.cuhk.edu.hk/~hchen/research/2015miccai_gland.html
If you have any news to suggest, you can write @malev
Third:

The basic, state-of-the-art and best MOOCs are Andrew Ng's Machine Learning and Hinton's Neural Networks.
Phone gyroscopes are sensitive enough to pick up sound, so an app can eavesdrop on speech without microphone access. Be careful.

https://crypto.stanford.edu/gyrophone/
Nice infographic on Apple app charts.
A new startup by David Yan applies natural language processing to search.

It works much like Facebook's search interface and is available in US English.

https://findo.io/
A very important picture from LeCun's recent CVPR appearance. Worth studying for anyone who wants to work on QA / dialogue systems.