Who are you? I'm a PhD student doing DL research, mostly on weak/self-supervision, and sometimes on unsupervised learning as well.
What happens here? I write short reviews of papers I read.
Why the hell? Because it lets me practice writing and understand the papers I read more deeply.
So what? I'll be happy if it's interesting to someone else too. Anyway, here's my archive: https://www.notion.so/Self-Supervised-Boy-papers-reading-751aa85ffca948d28feacc45dc3cb0c0.
Channel on Telegram.
Self-training über alles. Another paper on self-training from Quoc Le's group.
They compare self-training with supervised and self-supervised pre-training on different tasks. Self-training seems to work better, while pre-training can even hurt final quality when enough labeled data is available or strong augmentation is applied.
The main practical takeaway: self-training adds quality even on top of pre-training, so it may be worth self-training your baseline models to get a better start.
More details, with tables, here: https://www.notion.so/Rethinking-Pre-training-and-Self-training-e00596e346fa4261af68db7409fbbde6
Source here: https://arxiv.org/pdf/2006.06882.pdf
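A minimal sketch of the kind of self-training loop this is about, assuming generic torch classifiers. The `teacher`/`student` names, the confidence threshold and hard pseudo-labels are my simplifications, not the paper's exact recipe (the paper works with detection and segmentation models).

```python
import torch
import torch.nn.functional as F

def self_training_step(teacher, student, optimizer,
                       labeled_batch, unlabeled_batch, threshold=0.9):
    """One step of a naive self-training loop:
    supervised CE on labels + CE on confident teacher pseudo-labels."""
    x_l, y_l = labeled_batch
    x_u = unlabeled_batch

    # Teacher produces hard pseudo-labels on unlabeled data (no gradient through it).
    with torch.no_grad():
        probs = F.softmax(teacher(x_u), dim=1)
        conf, pseudo = probs.max(dim=1)
        mask = conf > threshold            # keep only confident pseudo-labels

    loss = F.cross_entropy(student(x_l), y_l)
    if mask.any():
        loss = loss + F.cross_entropy(student(x_u[mask]), pseudo[mask])

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```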
Unsupervised segmentation with autoregressive models. The authors propose to scan the image in different scanning orders and require that nearby pixels produce close embeddings regardless of the scanning order.
SoTA across unsupervised segmentation benchmarks.
More details, with images and losses, here: https://www.notion.so/Autoregressive-Unsupervised-Image-Segmentation-211c6e8ec6174fe9929e53e5140e1024
Source here: https://arxiv.org/pdf/2007.08247.pdf
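A rough sketch of the order-consistency idea only, assuming a causal encoder that returns per-pixel embeddings of the same spatial size. Here the different scanning orders are emulated by flipping the input, and the cosine consistency term is my stand-in for the paper's actual losses.

```python
import torch
import torch.nn.functional as F

def orderings(x):
    """Emulate different raster scanning orders by flipping the image:
    a causal (autoregressive) encoder then sees pixels in a different order."""
    return [x, torch.flip(x, dims=[3]), torch.flip(x, dims=[2]), torch.flip(x, dims=[2, 3])]

def undo(i, f):
    """Map features computed on a flipped image back to the original layout."""
    dims = [[], [3], [2], [2, 3]][i]
    return torch.flip(f, dims=dims) if dims else f

def order_consistency_loss(encoder, x):
    """Per-pixel embeddings of the same image should agree across scanning orders."""
    feats = [F.normalize(undo(i, encoder(v)), dim=1)
             for i, v in enumerate(orderings(x))]          # each (B, C, H, W)
    ref = feats[0]
    # Cosine consistency: pull the per-pixel embeddings of every order towards the first one.
    return sum((1 - (f * ref).sum(dim=1)).mean() for f in feats[1:]) / (len(feats) - 1)
```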
One more update on the teacher-student paradigm from Quoc Le's group.
Now the teacher is continuously updated to steer the student towards an optimum w.r.t. the labeled data. At each step the update gradient for the teacher is taken as the gradient towards the current pseudo-labels, scaled by the cosine similarity between two gradients of the student model: the one from unlabeled data and the one from labeled data.
Achieves a new SoTA on ImageNet (+1.6% top-1 accuracy).
More details, with formulas, here: https://www.notion.so/Meta-Pseudo-Label-b83ac7b7086e47e1bef749bc3e8e2124
Source here: https://arxiv.org/pdf/2003.10580.pdf
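A rough sketch of the teacher update as I described it above, with hard pseudo-labels and the cosine-alignment scaling made explicit. The paper derives and approximates this more carefully, so treat it as an illustration only.

```python
import torch
import torch.nn.functional as F

def mpl_like_step(teacher, student, opt_t, opt_s, x_l, y_l, x_u):
    """Teacher's pseudo-label gradient is scaled by how well the student's gradient
    on pseudo-labeled data aligns with its gradient on labeled data."""
    # 1. Teacher produces (hard) pseudo-labels for the unlabeled batch.
    with torch.no_grad():
        pseudo = teacher(x_u).argmax(dim=1)

    # 2. Student gradients on pseudo-labeled vs. labeled data.
    params = list(student.parameters())
    g_u = torch.autograd.grad(F.cross_entropy(student(x_u), pseudo), params)
    g_l = torch.autograd.grad(F.cross_entropy(student(x_l), y_l), params)
    flatten = lambda grads: torch.cat([g.reshape(-1) for g in grads])
    h = F.cosine_similarity(flatten(g_u), flatten(g_l), dim=0)   # alignment signal

    # 3. Teacher update: its pseudo-label cross-entropy, scaled by the alignment h.
    opt_t.zero_grad()
    (h * F.cross_entropy(teacher(x_u), pseudo)).backward()
    opt_t.step()

    # 4. Student update: a plain step on the pseudo-labeled batch.
    opt_s.zero_grad()
    F.cross_entropy(student(x_u), pseudo).backward()
    opt_s.step()
```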
An oral from ICLR 2021 on using the teacher-student setup for cross-domain transfer learning. The teacher is trained on the labelled data and produces pseudolabels for unlabelled data in the target domain. This lets the student learn useful in-domain representations and gain 2.9% accuracy on one-shot learning with relatively little training effort.
With more fluff here: https://www.notion.so/Self-training-for-Few-shot-Transfer-Across-Extreme-Task-Differences-bfe820f60b4b474796fd0a5b6b6ad312
Source here: https://openreview.net/pdf?id=O3Y56aqpChA
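A minimal sketch of the student objective, assuming soft pseudo-labels from a frozen teacher and a simple distillation term; the paper's full loss has more components, which I omit here.

```python
import torch
import torch.nn.functional as F

def student_loss(teacher, student, base_batch, x_target):
    """Supervised loss on the labelled base data plus a soft pseudo-label
    (distillation) term on unlabelled target-domain images."""
    x_b, y_b = base_batch

    # Soft pseudo-labels from the frozen teacher on target-domain data.
    with torch.no_grad():
        soft = F.softmax(teacher(x_target), dim=1)

    sup = F.cross_entropy(student(x_b), y_b)
    distill = F.kl_div(F.log_softmax(student(x_target), dim=1), soft,
                       reduction='batchmean')
    return sup + distill
```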
One more oral from ICLR 2021. Theoretical this time, so there is no way I can set up a detailed overview.
Key points (roughly formalised in the sketch below):
1) The authors alter the definition of neighbourhood. Instead of measuring the distance between samples directly, they call x' a neighbour of x if there is an augmentation a(x) such that the distance |a(x) - x'| is below a threshold.
2) Assumption 1 (expansion): any small subset of in-class samples should expand (by adding the neighbours of its points) into a larger in-class subset.
3) Assumption 2 (separation): the probability that x' is a neighbour of x while they have different ground-truth labels is small, almost negligible.
The authors show that these are sufficient conditions for consistency regularisation in self-supervision and transfer learning to give good results.
This nicely complements the previous paper on transfer learning, where the authors showed how consistency regularisation helps, and it also connects to works on smart augmentation strategies.
source: https://openreview.net/pdf?id=rC8sJ4i6kaH
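Roughly, in my own notation (see the paper for the exact statements):

```latex
% Neighbourhood through augmentations: x' is a neighbour of x if some
% augmentation of x lands within distance r of x'.
N(x) = \{\, x' \;:\; \exists\, a \in \mathcal{A}\ \text{s.t.}\ \|a(x) - x'\| \le r \,\}

% Assumption 1 (expansion): small in-class subsets S grow when we add
% the neighbours of their points, with N(S) = \bigcup_{x \in S} N(x):
P\big(N(S)\big) \ge c \cdot P(S), \qquad c > 1.

% Assumption 2 (separation): neighbours almost never cross class boundaries:
P\big(\exists\, x' \in N(x) \;:\; y(x') \ne y(x)\big) \le \mu \quad \text{for some small } \mu.
```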
Spotlight on ICLR 2021 by Schmidhuber. Proposes the method of unsupervised keypoints location algorithm with RL application on Atari.
Very clear and simple idea.:
1. Compressing image with VAE and using features from some intermediate layer of encoder later on.
2. Trying to predict feature vector by its surrounding vectors. If the prediction error is high, we found some important object.
3. Compressing error map for image as the mixture of gaussians with fixed covariance, each center representing one keypoint.
SoTA on Atari games, more robust to input noise.
Probably, could be also used outside of simple Atari framework if you have enough data to train, and take later layers of encoder.
With colorfull images here: https://www.notion.so/Unsupervised-Object-Keypoint-Learning-Using-Local-Spatial-Predictability-ddcf36a856ff4e389050b3089cd710bc
Source here: https://openreview.net/pdf?id=GJwMHetHc73
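A rough sketch of steps 2-3, assuming we already have an encoder feature map of shape (B, C, H, W). The greedy peak picking below is my crude stand-in for the paper's fixed-covariance Gaussian mixture fit.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalPredictor(nn.Module):
    """Predicts each feature vector from its spatial surroundings only:
    the centre tap of the convolution kernel is masked out."""
    def __init__(self, channels, k=5):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, k, padding=k // 2, bias=False)
        mask = torch.ones(1, 1, k, k)
        mask[0, 0, k // 2, k // 2] = 0.0              # never look at the centre itself
        self.register_buffer("mask", mask)

    def forward(self, feats):
        return F.conv2d(feats, self.conv.weight * self.mask, padding=self.conv.padding)

def keypoints_from_error(feats, predictor, num_kp=4, radius=2):
    """High prediction error = locally surprising region = candidate object.
    Keypoints are picked greedily as error peaks, suppressing a small window
    around each picked peak (a crude stand-in for fitting a mixture of
    fixed-covariance Gaussians to the error map)."""
    err = (predictor(feats) - feats).pow(2).mean(dim=1).detach()   # (B, H, W)
    B, H, W = err.shape
    kps = []
    for _ in range(num_kp):
        idx = err.flatten(1).argmax(dim=1)                         # (B,)
        y, x = idx // W, idx % W
        kps.append(torch.stack([y, x], dim=1))
        for b in range(B):                                         # suppress around the peak
            yy, xx = int(y[b]), int(x[b])
            err[b, max(yy - radius, 0): yy + radius + 1,
                   max(xx - radius, 0): xx + radius + 1] = 0.0
    return torch.stack(kps, dim=1)                                 # (B, num_kp, 2) as (row, col)
```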
Yet another paper from ICLR 2021. This one proposes a more advanced method of pseudolabel generation.
In a few words, an encoder-decoder model is simultaneously trained to predict the segmentation on supervised data and to produce a pseudolabel and a prediction on unsupervised data that stay consistent regardless of the augmentation.
As the pseudolabel they use a specifically calibrated Grad-CAM from the encoder part of the model and fuse it with the prediction of the decoder part, again with a fancy procedure.
With some more fluff and notes here.
Source here.
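A minimal sketch of the overall objective, where `fuse_pseudolabel` is just a placeholder for the paper's calibrated Grad-CAM + decoder fusion (which is the actual contribution); the rest is a standard supervised term plus a weak-to-strong consistency term.

```python
import torch
import torch.nn.functional as F

def pseudoseg_like_loss(model, sup_batch, x_weak, x_strong, fuse_pseudolabel):
    """Supervised segmentation loss + weak-to-strong consistency against a fused pseudo-label."""
    x_s, y_s = sup_batch                               # labelled images and masks

    # Supervised part.
    sup_loss = F.cross_entropy(model(x_s), y_s)

    # Pseudo-label from the weakly augmented view (no gradient through it).
    # `fuse_pseudolabel` should return a soft (B, C, H, W) per-pixel distribution.
    with torch.no_grad():
        pseudo = fuse_pseudolabel(model, x_weak)

    # Consistency: the prediction on the strong view should match the pseudo-label.
    log_pred = F.log_softmax(model(x_strong), dim=1)
    cons_loss = F.kl_div(log_pred, pseudo, reduction='batchmean')

    return sup_loss + cons_loss
```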
A pretty simple keypoint localisation pipeline with self-supervision constraints for unlabeled data. Again from ICLR 2021.
The key ideas are (constraints 2 and 3 are sketched in code after the links):
1. Add a classification task for the type of keypoint as a function of the localisation network features. This is usually not required because of the fixed order of keypoints in model predictions, but this small additional loss actually boosts performance more than the next two constraints.
2. Add the constraint that localising keypoints on a spatially augmented image should give the same result as spatially augmenting the localisation map.
3. Add the constraint that the representation vectors of keypoints should be invariant to augmentation.
And there they are, getting SoTA results on several challenging datasets, even with 100% of the dataset used as labeled data.
With a bit more information here.
Source here.
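A quick sketch of constraints 2 and 3 (equivariance and invariance), assuming a model that returns keypoint heatmaps and per-keypoint descriptors, and a `spatial_aug` that can be applied both to images and to heatmaps. The keypoint-type classification head from point 1 is just an extra cross-entropy and is omitted.

```python
import torch
import torch.nn.functional as F

def unsupervised_keypoint_losses(model, x, spatial_aug):
    """model(x) -> (heatmaps, descriptors); spatial_aug must be applicable
    both to images and to heatmaps (e.g. flips / rotations)."""
    heat, desc = model(x)
    heat_aug, desc_aug = model(spatial_aug(x))

    # (2) Equivariance: heatmaps of the augmented image ~ augmented heatmaps.
    equivariance = F.mse_loss(heat_aug, spatial_aug(heat))

    # (3) Invariance: per-keypoint descriptors should not change under augmentation.
    invariance = (1 - F.cosine_similarity(desc, desc_aug, dim=-1)).mean()

    return equivariance + invariance
```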
Yet again a simple approach leading to unsupervised segmentation. Mostly useful as pre-training, though.
The proposed pipeline first mines salient object areas (with any available framework, possibly a supervised one) and then runs contrastive learning on pixel embeddings inside those regions. In the second step, each individual pixel embedding is attracted to the mean embedding of its object and pushed away from the mean embeddings of other objects. This detail distinguishes it from some previously proposed pipelines and allows training at a larger scale, because the number of loss pairs grows more slowly.
Less briefly and with some external links here.
Source here.
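A sketch of the second stage as I understand it, for a batch of pixel embeddings already sampled from inside the mined object masks. The per-object mean acts as a prototype, and the rest is a standard InfoNCE-style cross-entropy; the paper's exact sampling and loss details may differ.

```python
import torch
import torch.nn.functional as F

def pixels_to_prototypes_loss(pix_emb, obj_ids, temperature=0.1):
    """pix_emb: (N, D) embeddings of pixels sampled inside mined object masks.
    obj_ids:  (N,)  integer id of the object each pixel belongs to.
    Each pixel is attracted to its own object's mean embedding (prototype)
    and pushed away from the prototypes of the other objects."""
    pix_emb = F.normalize(pix_emb, dim=1)

    # Mean (prototype) embedding per object.
    ids = obj_ids.unique()                                        # sorted unique object ids
    protos = torch.stack([pix_emb[obj_ids == i].mean(dim=0) for i in ids])
    protos = F.normalize(protos, dim=1)

    # Contrast every pixel against all object prototypes.
    logits = pix_emb @ protos.t() / temperature                   # (N, num_objects)
    targets = torch.searchsorted(ids, obj_ids)                    # index of the pixel's own object
    return F.cross_entropy(logits, targets)
```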
A bit old (NeurIPS 2019), but an interesting take on saliency prediction.
Instead of directly mixing different unsupervised salient-region prediction algorithms and focusing on the fusion strategy, the authors propose to use distillation in neural networks as a way to refine each algorithm's predictions separately. The paper uses several rounds of distillation, self-training, and moving averages to stabilise the predictions of each method on its own. After these steps, the accumulated averages are used as labels to train the final network.
Slightly more words here.
Source here.
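A very rough sketch of the per-method refinement loop, assuming `net(images)` returns per-pixel logits of the same shape as the binary saliency maps; the number of rounds, the momentum, and the re-binarisation rule are placeholders, not the paper's exact schedule.

```python
import torch
import torch.nn.functional as F

def refine_one_method(net, optimizer, images, noisy_maps, rounds=3, momentum=0.7):
    """Repeatedly (self-)train a network against the current pseudo-labels of ONE
    handcrafted saliency method, keeping a moving average of its predictions as the
    refined target for the next round."""
    ema_maps = noisy_maps.detach().clone()             # running refined pseudo-labels
    targets = noisy_maps                               # start from the raw method output
    for _ in range(rounds):
        pred = torch.sigmoid(net(images))
        loss = F.binary_cross_entropy(pred, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Moving average of predictions stabilises the pseudo-labels between rounds.
        with torch.no_grad():
            ema_maps = momentum * ema_maps + (1 - momentum) * torch.sigmoid(net(images))
        targets = (ema_maps > 0.5).float()             # re-binarised refined labels
    return ema_maps
```

The refined maps of all handcrafted methods are then combined and used as labels for the final saliency network, as described above.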