echoinside
ML in computer graphics and random stuff.
Any feedback: @fogside
EfficientNetV2: Smaller Models and Faster Training

A new paper from Google Brain presents a new SOTA architecture called EfficientNetV2. The authors develop a family of CNN models optimized for both accuracy and training speed. The main improvements are:

- an improved training-aware neural architecture search with new building blocks and ideas to jointly optimize training speed and parameter efficiency;
- a new approach to progressive learning that adjusts regularization along with the image size.

As a result, the new models reach SOTA accuracy while training up to 11x faster and being up to 6.8x smaller.
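
For intuition, here is a minimal sketch of the progressive-learning idea described above: image size and regularization strength are increased together over training stages. The linear schedule, stage count, and toy model are my assumptions, not the paper's recipe.

import torch

def stage_settings(stage, num_stages=4,
                   min_size=128, max_size=300,
                   min_dropout=0.1, max_dropout=0.3):
    # Linearly interpolate image size and dropout rate for a training stage.
    t = stage / max(num_stages - 1, 1)
    size = int(min_size + t * (max_size - min_size))
    dropout = min_dropout + t * (max_dropout - min_dropout)
    return size, dropout

# toy model with a configurable dropout layer, standing in for EfficientNetV2
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, stride=2, padding=1),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
    torch.nn.Dropout(p=0.1),
    torch.nn.Linear(16, 1000),
)

for stage in range(4):
    size, p = stage_settings(stage)
    model[3].p = p                              # stronger regularization in later stages
    images = torch.randn(8, 3, size, size)      # stand-in for a batch resized to the stage size
    logits = model(images)
    print(f"stage {stage}: image size {size}, dropout {p:.2f}, logits {tuple(logits.shape)}")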

Paper: https://arxiv.org/abs/2104.00298

Code will be available here:
https://github.com/google/automl/efficientnetv2

A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-effnetv2

#cv #sota #nas #deeplearning
LoFTR: Detector-Free Local Feature Matching with Transformers

* project page
* pdf
* code (not released yet)

We present a novel method for local image feature matching. Instead of performing image feature detection, description, and matching sequentially, we propose to first establish pixel-wise dense matches at a coarse level and later refine the good matches at a fine level. In contrast to dense methods that use a cost volume to search correspondences, we use self and cross attention layers in Transformers to obtain feature descriptors that are conditioned on both images. The global receptive field provided by Transformers enables our method to produce dense matches in low-texture areas, where feature detectors usually struggle to produce repeatable interest points. The experiments on indoor and outdoor datasets show that LoFTR outperforms state-of-the-art methods by a large margin. LoFTR also ranks first on two public benchmarks of visual localization among the published methods.
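
As a rough illustration of the attention step, here is a minimal sketch (not the official LoFTR code) in which flattened coarse feature maps of the two images go through self- and cross-attention, followed by a dual-softmax matching score; the layer sizes and the matching layer are simplifications.

import torch
import torch.nn as nn

d = 256
self_attn = nn.MultiheadAttention(d, num_heads=8, batch_first=True)
cross_attn = nn.MultiheadAttention(d, num_heads=8, batch_first=True)

# flattened coarse feature maps of the two images: (batch, h*w, d)
feat_a = torch.randn(1, 30 * 30, d)
feat_b = torch.randn(1, 30 * 30, d)

# self-attention within each image
feat_a = feat_a + self_attn(feat_a, feat_a, feat_a)[0]
feat_b = feat_b + self_attn(feat_b, feat_b, feat_b)[0]

# cross-attention: descriptors of image A attend to image B and vice versa
feat_a = feat_a + cross_attn(feat_a, feat_b, feat_b)[0]
feat_b = feat_b + cross_attn(feat_b, feat_a, feat_a)[0]

# all-pairs matching scores between the conditioned descriptors
scores = torch.einsum("bnd,bmd->bnm", feat_a, feat_b) / d ** 0.5
matches = scores.softmax(dim=2) * scores.softmax(dim=1)   # dual-softmax
print(matches.shape)   # (1, 900, 900)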
Reconstructing 3D Human Pose by Watching Humans in the Mirror
Very creative idea for data collection.
* project page
* pdf
* code
In this paper, we introduce the new task of reconstructing 3D human pose from a single image in which we can see the person and the person's image through a mirror. Compared to general scenarios of 3D pose estimation from a single view, the mirror reflection provides an additional view for resolving the depth ambiguity. We develop an optimization-based approach that exploits mirror symmetry constraints for accurate 3D pose reconstruction. We also provide a method to estimate the surface normal of the mirror from vanishing points in the single image. To validate the proposed approach, we collect a large-scale dataset named Mirrored-Human. The experiments show that, when trained on Mirrored-Human with our reconstructed 3D poses as pseudo ground-truth, the accuracy and generalizability of existing single-view 3D pose estimators can be largely improved.
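
To make the mirror-symmetry constraint concrete, here is a minimal sketch with made-up tensors (not the authors' optimization): reflecting the person's 3D joints across the estimated mirror plane should reproduce the joints of the reflected person, and the discrepancy can be used as a loss term.

import torch

def reflect_points(points, normal, d):
    # Reflect 3D points across the plane n·x + d = 0 (normal assumed unit length).
    dist = points @ normal + d                   # signed distance of each point to the plane
    return points - 2.0 * dist[:, None] * normal

joints_real = torch.randn(17, 3, requires_grad=True)   # estimated 3D joints of the person
joints_mirror = torch.randn(17, 3)                      # estimated 3D joints of the reflection
normal = torch.tensor([0.0, 0.0, 1.0])                  # mirror normal, e.g. from vanishing points
d = -2.0                                                 # plane offset

# symmetry term that an optimization-based approach could minimize
loss = ((reflect_points(joints_real, normal, d) - joints_mirror) ** 2).mean()
loss.backward()
print(loss.item())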
One of the biggest demoscene events, Revision, was banned on Twitch (the reason is unknown), and they have now moved to the CCC streaming platform. Surprisingly, the streaming quality there is better right now.

So, if you liked demos in your childhood, they are currently streaming some PC 4K demos here.

UPD: Okay, now it's the 256-byte compo, i.e. demos roughly the size of a single tweet.
NeRF-VAE: A Geometry Aware 3D Scene Generative Model
* abs
* pdf
We propose NeRF-VAE, a 3D scene generative model that incorporates geometric structure via NeRF and differentiable volume rendering. In contrast to NeRF, our model takes into account shared structure across scenes, and is able to infer the structure of a novel scene -- without the need to re-train -- using amortized inference. Our model is a VAE that learns a distribution over radiance fields by conditioning them on a latent scene representation. We show that, once trained, NeRF-VAE is able to infer and render geometrically-consistent scenes from previously unseen 3D environments using very few input images. We further demonstrate that NeRF-VAE generalizes well to out-of-distribution cameras, while convolutional models do not.
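
A minimal sketch of the conditioning mechanism, with a toy amortized encoder and a tiny radiance-field MLP of my own choosing (not the paper's architecture): the encoder predicts a per-scene latent, and the decoder maps a 3D point plus that latent to density and color.

import torch
import torch.nn as nn

class ConditionalRadianceField(nn.Module):
    def __init__(self, z_dim=64, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),                 # (density, r, g, b)
        )

    def forward(self, points, z):
        z = z.expand(points.shape[0], -1)
        out = self.net(torch.cat([points, z], dim=-1))
        return torch.relu(out[:, :1]), torch.sigmoid(out[:, 1:])

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 2 * 64))   # toy amortized encoder
views = torch.randn(1, 3, 32, 32)                                        # observed input views
mu, logvar = encoder(views).chunk(2, dim=-1)
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()                     # reparameterization trick

field = ConditionalRadianceField()
sigma, rgb = field(torch.rand(1024, 3), z)                               # query points along camera rays
print(sigma.shape, rgb.shape)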
#nerf #vae #generative
Unconstrained Scene Generation with Locally Conditioned Radiance Fields

* twitter thread
* abs
* pdf
Introducing Generative Scene Networks (GSN), a generative model for learning radiance fields for realistic scenes. With GSN we can sample scenes from the learned prior and move through them with a freely moving camera.
In order to model radiance fields for unconstrained scenes, we decompose them into many small locally conditioned radiance fields, which are conditioned on a latent spatial representation of the scene, W.
The prior learned by GSN can be used for view synthesis: by inverting GSN's generator we can complete unobserved parts of a scene conditioned on a sparse set of views.
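
A minimal sketch of local conditioning as I understand it (toy shapes, not the official GSN code): a 2D grid of latent codes W covers the scene floor plan, and every 3D sample point is decoded with the code bilinearly sampled at its ground-plane location.

import torch
import torch.nn as nn
import torch.nn.functional as F

W = torch.randn(1, 32, 16, 16)                        # latent spatial representation: (1, code_dim, H, W)
decoder = nn.Sequential(nn.Linear(3 + 32, 64), nn.ReLU(), nn.Linear(64, 4))   # local radiance field

points = torch.rand(1, 2048, 3) * 2 - 1               # 3D sample points in [-1, 1]^3

# bilinearly sample the local code at each point's ground-plane (x, z) coordinate
grid = points[..., [0, 2]].unsqueeze(2)               # (1, N, 1, 2)
codes = F.grid_sample(W, grid, align_corners=True)    # (1, 32, N, 1)
codes = codes.squeeze(-1).permute(0, 2, 1)            # (1, N, 32)

out = decoder(torch.cat([points, codes], dim=-1))     # per-point density + color
print(out.shape)   # (1, 2048, 4)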
#nerf #novel_view #indoor
Generating Furry Cars: Disentangling Object Shape and Appearance across Multiple Domains
* project page
* paper under review
The goal is to learn a generative model of an intermediate distribution, which borrows a subset of properties from each domain, enabling the generation of images that did not exist in any single domain. This challenging problem requires accurate disentanglement of object shape, appearance, and background in each domain, so that the appearance and shape factors from the two domains can be interchanged. Our key technical contribution is to represent object appearance with a differentiable histogram of visual features, and to optimize the generator so that two images with the same latent appearance factor but different latent shape factors produce similar histograms. On multiple multi-domain datasets, we demonstrate that our method leads to accurate and consistent appearance and shape transfer across domains.
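
As an illustration of the histogram idea, here is a minimal sketch of a differentiable (soft-binned) histogram and the consistency term described above; the binning, bandwidth, and stand-in features are my assumptions, not the authors' implementation.

import torch

def soft_histogram(features, num_bins=16, bandwidth=0.1):
    # Soft-assign scalar features to bins with a Gaussian kernel, then normalize.
    centers = torch.linspace(-1.0, 1.0, num_bins, device=features.device)
    weights = torch.exp(-((features[..., None] - centers) ** 2) / (2 * bandwidth ** 2))
    hist = weights.sum(dim=1)                         # sum over spatial locations
    return hist / hist.sum(dim=-1, keepdim=True)

# stand-ins for visual features of two generated images that share the appearance
# latent but use different shape latents
feat_a = torch.tanh(torch.randn(4, 4096, requires_grad=True))   # (batch, locations)
feat_b = torch.tanh(torch.randn(4, 4096, requires_grad=True))

appearance_loss = ((soft_histogram(feat_a) - soft_histogram(feat_b)) ** 2).mean()
appearance_loss.backward()
print(appearance_loss.item())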
#gan
Forwarded from Gradient Dude
LatentCLR: A Contrastive Learning Approach for Unsupervised Discovery of Interpretable Directions

A framework that learns meaningful directions in GANs' latent space using unsupervised contrastive learning. Instead of discovering fixed directions as in previous work, this method can discover non-linear directions in pretrained StyleGAN2 and BigGAN models. The discovered directions may be used for image manipulation.

Authors use the differences caused by an edit operation on the feature activations to optimize the identifiability of each direction. The edit operations are modeled by several separate neural nets ∆_i(z), which are learned with this contrastive objective. Given a latent code z and its generated image x = G(z), we seek to find edit operations ∆_i(z) such that the image x' = G(∆_i(z)) has semantically meaningful changes over x while still preserving the identity of x.
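
A minimal sketch of that objective with a toy generator and toy direction models (my simplification, not the authors' code): feature differences produced by the same direction are treated as positives, differences produced by other directions as negatives, in an InfoNCE-style loss.

import torch
import torch.nn as nn
import torch.nn.functional as F

z_dim, feat_dim, num_dirs, batch = 32, 64, 5, 8
G = nn.Linear(z_dim, feat_dim)                                                    # stand-in for feature activations f(G(z))
directions = nn.ModuleList([nn.Linear(z_dim, z_dim) for _ in range(num_dirs)])   # edit operations ∆_i(z)

z = torch.randn(batch, z_dim)
base = G(z)

# feature differences caused by applying each direction to each latent
diffs = torch.stack([G(d(z)) - base for d in directions])              # (num_dirs, batch, feat)
diffs = F.normalize(diffs.reshape(num_dirs * batch, feat_dim), dim=-1)

sim = diffs @ diffs.t() / 0.5                                          # cosine similarity / temperature
labels = torch.arange(num_dirs).repeat_interleave(batch)               # same direction = same class
not_self = ~torch.eye(len(labels), dtype=torch.bool)
pos = (labels[:, None] == labels[None, :]) & not_self

# pull together differences from the same direction, push apart the rest
exp_sim = sim.exp() * not_self
loss = -torch.log((exp_sim * pos).sum(1) / exp_sim.sum(1)).mean()
loss.backward()
print(loss.item())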


📝 Paper
🛠 Code (next week)

#paper_tldr #cv #gan
Decomposing 3D Scenes into Objects via Unsupervised Volume Segmentation
* project page
* abs
* pdf
We present ObSuRF, a method which turns a single image of a scene into a 3D model represented as a set of NeRFs, with each NeRF corresponding to a different object. A single forward pass of an encoder network outputs a set of latent vectors describing the objects in the scene. These vectors are used independently to condition a NeRF decoder, defining the geometry and appearance of each object. We make learning more computationally efficient by deriving a novel loss, which allows training NeRFs on RGB-D inputs without explicit ray marching. We find that after training ObSuRF on RGB-D views of training scenes, it is capable of not only recovering the 3D geometry of a scene depicted in a single input image, but also to segment it into objects, despite receiving no supervision in that regard.
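
A minimal sketch of the per-object decoding (shapes and modules assumed by me, not the ObSuRF code): each latent conditions its own radiance field, and the scene is composited by summing densities and density-weighting colors.

import torch
import torch.nn as nn

num_objects, z_dim = 4, 32
decoder = nn.Sequential(nn.Linear(3 + z_dim, 64), nn.ReLU(), nn.Linear(64, 4))

object_latents = torch.randn(num_objects, z_dim)       # one latent per object, from the encoder
points = torch.rand(2048, 3)                            # query points along camera rays

z = object_latents[:, None, :].expand(-1, points.shape[0], -1)    # (K, N, z_dim)
p = points[None].expand(num_objects, -1, -1)                       # (K, N, 3)
out = decoder(torch.cat([p, z], dim=-1))                           # (K, N, 4)

sigma = torch.relu(out[..., :1])                                   # per-object densities
rgb = torch.sigmoid(out[..., 1:])
sigma_total = sigma.sum(dim=0)                                     # scene density
rgb_total = (sigma * rgb).sum(dim=0) / (sigma_total + 1e-8)        # density-weighted color
print(sigma_total.shape, rgb_total.shape)   # (2048, 1) (2048, 3)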
#nerf #segmentation #depth
ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement
* pdf, abs
* project page
* github
* colab
Recognizing the limitations of current inversion approaches, in this work we present a novel inversion scheme that extends current encoder-based inversion methods by introducing an iterative refinement mechanism. Instead of directly predicting the latent code of a given real image using a single pass, the encoder is tasked with predicting a residual with respect to the current estimate of the inverted latent code in a self-correcting manner. Our residual-based encoder, named ReStyle, attains improved accuracy compared to current state-of-the-art encoder-based methods with a negligible increase in inference time. We analyze the behavior of ReStyle to gain valuable insights into its iterative nature. We then evaluate the performance of our residual encoder and analyze its robustness compared to optimization-based inversion and state-of-the-art encoders.
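
The refinement loop itself is simple; here is a minimal sketch with toy encoder and generator modules (not the released ReStyle models): at each step the encoder sees the target image together with the current reconstruction and predicts a residual update to the latent code.

import torch
import torch.nn as nn

latent_dim = 512
generator = nn.Sequential(nn.Linear(latent_dim, 3 * 64 * 64), nn.Unflatten(1, (3, 64, 64)))
encoder = nn.Sequential(nn.Flatten(), nn.Linear(6 * 64 * 64, latent_dim))   # input: target + reconstruction

target = torch.randn(1, 3, 64, 64)       # real image to invert
w = torch.zeros(1, latent_dim)            # initial latent estimate (e.g. the average latent)

for step in range(5):
    recon = generator(w)
    residual = encoder(torch.cat([target, recon], dim=1))   # condition on image and current estimate
    w = w + residual                                         # self-correcting update
    print(f"step {step}: reconstruction error {((recon - target) ** 2).mean().item():.4f}")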
#gan #inversion
We also introduce a new technique for solving the image toonification task using the iterative nature of our encoders.
- twitter thread
#gan #inversion
torchtyping
Type annotations for a tensor's shape, dtype, names, ...
https://github.com/patrick-kidger/torchtyping
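
A small usage sketch, based on my reading of the repository README (check the repo for the current API): shape dimensions are named in the annotation and can be checked at runtime through typeguard.

import torch
from torchtyping import TensorType, patch_typeguard
from typeguard import typechecked

patch_typeguard()   # make typeguard aware of TensorType annotations

@typechecked
def batch_outer(x: TensorType["batch", "x_dim"],
                y: TensorType["batch", "y_dim"]) -> TensorType["batch", "x_dim", "y_dim"]:
    return x[:, :, None] * y[:, None, :]

print(batch_outer(torch.randn(8, 3), torch.randn(8, 5)).shape)   # torch.Size([8, 3, 5])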
#tools
gradSim: Differentiable simulation for system identification and visuomotor control

* youtube
* project page
* paper under review

Our main contributions are:
* gradSim, a differentiable simulator that demonstrates the ability to backprop from video pixels to the underlying physical attributes (a toy sketch follows this list).
* We demonstrate recovering many physical properties exclusively from video observations, including friction, elasticity, deformable material parameters, and visuomotor controls (sans 3D supervision).
* A PyTorch framework facilitating interoperability with existing machine learning modules.
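
A toy sketch of the backprop-through-simulation-and-rendering idea (this is my illustration, not gradSim itself): a 1D mass-spring system is simulated, each frame is "rendered" as a soft 1D image, and a pixel-space loss against a target video is backpropagated to the spring stiffness.

import torch

def simulate_and_render(stiffness, steps=50, dt=0.05, width=64):
    pos, vel = torch.tensor(1.0), torch.tensor(0.0)
    pixels = torch.linspace(-2.0, 2.0, width)
    frames = []
    for _ in range(steps):
        acc = -stiffness * pos                                  # Hooke's law
        vel = vel + dt * acc                                    # semi-implicit Euler step
        pos = pos + dt * vel
        frames.append(torch.exp(-(pixels - pos) ** 2 / 0.01))   # soft Gaussian "rendering"
    return torch.stack(frames)

target_video = simulate_and_render(torch.tensor(4.0)).detach()   # video generated with stiffness 4.0

stiffness = torch.tensor(3.0, requires_grad=True)                # initial guess
opt = torch.optim.Adam([stiffness], lr=0.05)
for it in range(200):
    opt.zero_grad()
    loss = ((simulate_and_render(stiffness) - target_video) ** 2).mean()
    loss.backward()
    opt.step()
print(f"estimated stiffness after fitting: {stiffness.item():.2f}")   # should move toward 4.0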
#differentiable_rendering #physics #simulation
NPMs: Neural Parametric Models for 3D Deformable Shapes

* project page
* abs
* pdf
Parametric 3D models have enabled a wide variety of tasks in computer graphics and vision, such as modeling human bodies, faces, and hands. However, the construction of these parametric models is often tedious, as it requires heavy manual tweaking, and they struggle to represent additional complexity and details such as wrinkles or clothing.

To this end, we propose Neural Parametric Models (NPMs), a novel, learned alternative to traditional, parametric 3D models, which does not require hand-crafted, object-specific constraints. In particular, we learn to disentangle 4D dynamics into latent-space representations of shape and pose, leveraging the flexibility of recent developments in learned implicit functions. Crucially, once learned, our neural parametric models of shape and pose enable optimization over the learned spaces to fit new observations, similar to the fitting of a traditional parametric model, e.g., SMPL. This enables NPMs to achieve a significantly more accurate and detailed representation of observed deformable sequences.

We show that NPMs improve notably over both parametric and non-parametric state of the art in reconstruction and tracking of monocular depth sequences of clothed humans and hands. Latent-space interpolation as well as shape / pose transfer experiments further demonstrate the usefulness of NPMs.
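
A minimal sketch of the fitting step with toy decoders (not the NPM networks): the learned shape and pose decoders stay frozen, and only the two latent codes are optimized so that the decoded surface matches a new observation.

import torch
import torch.nn as nn

shape_decoder = nn.Sequential(nn.Linear(3 + 64, 128), nn.ReLU(), nn.Linear(128, 1))        # point -> SDF
pose_decoder = nn.Sequential(nn.Linear(3 + 64 + 64, 128), nn.ReLU(), nn.Linear(128, 3))    # point -> offset
for p in list(shape_decoder.parameters()) + list(pose_decoder.parameters()):
    p.requires_grad_(False)                      # decoders stay fixed at fit time

shape_code = torch.zeros(1, 64, requires_grad=True)
pose_code = torch.zeros(1, 64, requires_grad=True)
opt = torch.optim.Adam([shape_code, pose_code], lr=1e-2)

points = torch.rand(1, 4096, 3)                  # samples from the observed depth map
target_sdf = torch.zeros(1, 4096, 1)             # observed points should lie on the surface

for it in range(100):
    opt.zero_grad()
    s = shape_code[:, None, :].expand(-1, points.shape[1], -1)
    c = pose_code[:, None, :].expand(-1, points.shape[1], -1)
    canonical = points + pose_decoder(torch.cat([points, s, c], dim=-1))   # warp to canonical pose
    sdf = shape_decoder(torch.cat([canonical, s], dim=-1))
    loss = (sdf - target_sdf).abs().mean()
    loss.backward()
    opt.step()
print(loss.item())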
#implicit_geometry #depth #non_rigid_reconstruction #avatars
Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis
* project page
* abs
* pdf

* Task: render a scene from novel poses given just a few photos.
* Neural Radiance Fields (NeRF) generate crisp renderings with 20-100 photos, but overfit with only a few.
* Problem: NeRF is only trained to render observed poses, leading to artifacts when few are available.
* Key insight: Scenes share high-level semantic properties across viewpoints, and pre-trained 2D visual encoders can extract these semantics. "An X is an X from any viewpoint."
* Our proposed DietNeRF supervises NeRF from arbitrary poses by ensuring renderings have consistent high-level semantics using the CLIP Vision Transformer (see the sketch below).
* We generate plausible novel views given 1-8 views of a test scene.
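
A minimal sketch of the semantic consistency loss: the embedding of a rendering from an arbitrary pose is pulled toward the embedding of an observed photo. A small frozen CNN stands in here for the CLIP Vision Transformer, and the rendered image is a plain tensor rather than an actual NeRF output.

import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(                                  # stand-in for a frozen CLIP image encoder
    nn.Conv2d(3, 32, 8, stride=8), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 128),
)
for p in encoder.parameters():
    p.requires_grad_(False)

rendered = torch.rand(1, 3, 224, 224, requires_grad=True)   # stand-in for a NeRF rendering at a random pose
photo = torch.rand(1, 3, 224, 224)                           # one of the few observed views

emb_r = F.normalize(encoder(rendered), dim=-1)
emb_p = F.normalize(encoder(photo), dim=-1)
semantic_loss = 1.0 - (emb_r * emb_p).sum(dim=-1).mean()     # cosine distance between embeddings
semantic_loss.backward()                                      # gradients flow back into the rendering
print(semantic_loss.item())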
#nerf #clip #one_shot_learning
Swapping Autoencoder for Deep Image Manipulation
The code for this amazing tool was released today.
* youtube (must watch)
* project page
* code (yes)

We propose the Swapping Autoencoder, a deep model designed specifically for image manipulation, rather than random sampling. The key idea is to encode an image with two independent components and enforce that any swapped combination maps to a realistic image. In particular, we encourage the components to represent structure and texture, by enforcing one component to encode co-occurrent patch statistics across different parts of an image. As our method is trained with an encoder, finding the latent codes for a new input image becomes trivial, rather than cumbersome. As a result, it can be used to manipulate real input images in various ways, including texture swapping, local and global editing, and latent code vector arithmetic.
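
A minimal sketch of the swapping mechanism with toy modules (not the released code): two images are encoded into a spatial structure code and a global texture code, the texture codes are swapped, and the decoder produces the hybrid image.

import torch
import torch.nn as nn

class ToySwappingAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.structure_enc = nn.Conv2d(3, 8, 3, stride=4, padding=1)                 # spatial structure code
        self.texture_enc = nn.Sequential(nn.Conv2d(3, 16, 3, stride=4, padding=1),
                                         nn.AdaptiveAvgPool2d(1))                     # global texture code
        self.decoder = nn.Sequential(nn.Conv2d(8 + 16, 32, 3, padding=1), nn.ReLU(),
                                     nn.Conv2d(32, 3, 3, padding=1))

    def decode(self, structure, texture):
        texture = texture.expand(-1, -1, *structure.shape[2:])
        return self.decoder(torch.cat([structure, texture], dim=1))

model = ToySwappingAE()
img_a, img_b = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)

s_a, t_a = model.structure_enc(img_a), model.texture_enc(img_a)
s_b, t_b = model.structure_enc(img_b), model.texture_enc(img_b)

recon_a = model.decode(s_a, t_a)        # reconstruction of image A
hybrid = model.decode(s_a, t_b)         # structure of A combined with the texture of B
print(recon_a.shape, hybrid.shape)      # both (1, 3, 16, 16) at this toy resolution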
#gan #autoencoder #image_editing #single_image