Repurposing GANs for One-shot Semantic Part Segmentation
* abs
* project page
* not official code
- another similar work from NVIDIA
* abs
* project page
* not official code
Do GANs learn meaningful structural parts of objects during their attempt to reproduce those objects? In this work, we test this hypothesis and propose a simple and effective approach based on GANs for semantic part segmentation that requires as few as one label example along with an unlabeled dataset. Our key idea is to leverage a trained GAN to extract pixel-wise representation from the input image and use it as feature vectors for a segmentation network. Our experiments demonstrate that GANs representation is "readily discriminative" and produces surprisingly good results that are comparable to those from supervised baselines trained with significantly more labels. We believe this novel repurposing of GANs underlies a new class of unsupervised representation learning that is applicable to many other tasks.
#gan #semantic_seg
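Rough sketch of the core trick (my own illustration, not the authors' code): run a pretrained generator once, hook a few intermediate layers, upsample the activations to image resolution, and train a tiny per-pixel classifier on the single labeled example. The generator.blocks attribute, the layer ids, and the training loop are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

def pixelwise_features(generator, z, layer_ids, out_size=256):
    # Capture intermediate activations via forward hooks (generator.blocks is an assumed attribute).
    feats, hooks = [], []
    for i in layer_ids:
        hooks.append(generator.blocks[i].register_forward_hook(
            lambda mod, inp, out, store=feats: store.append(out)))
    with torch.no_grad():
        image = generator(z)
    for h in hooks:
        h.remove()
    # Upsample every feature map to a common resolution and stack along channels.
    feats = [F.interpolate(f, size=(out_size, out_size), mode='bilinear',
                           align_corners=False) for f in feats]
    return image, torch.cat(feats, dim=1)            # (1, C_total, H, W)

# One-shot training: a 1x1-conv classifier on top of the frozen GAN features.
# generator, z (latent code) and part_mask (1, H, W) integer labels are assumed to be given.
image, feat = pixelwise_features(generator, z, layer_ids=[2, 4, 6])
clf = nn.Conv2d(feat.shape[1], 10, kernel_size=1)    # 10 part classes, placeholder
opt = torch.optim.Adam(clf.parameters(), lr=1e-3)
for _ in range(200):
    loss = F.cross_entropy(clf(feat), part_mask)
    opt.zero_grad(); loss.backward(); opt.step()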
Forwarded from эйай ньюз
A special Colab notebook was posted on Reddit that gives you a Tesla P100 GPU and 25 GB of RAM every time.
You can copy it for yourself and use it. Hurry up before they shut it down.
Link: https://colab.research.google.com/drive/1D6krVG0PPJR2Je9g5eN_2h6JP73_NUXz
High-fidelity Face Tracking for AR/VR via Deep Lighting Adaptation
[Facebook Reality Labs]
* youtube
* pdf
* abs
3D video avatars can empower virtual communications by providing compression, privacy, entertainment, and a sense of presence in AR/VR. Best 3D photo-realistic AR/VR avatars driven by video, that can minimize uncanny effects, rely on person-specific models. However, existing person-specific photo-realistic 3D models are not robust to lighting, hence their results typically miss subtle facial behaviors and cause artifacts in the avatar. This is a major drawback for the scalability of these models in communication systems (e.g., Messenger, Skype, FaceTime) and AR/VR. This paper addresses previous limitations by learning a deep learning lighting model, that in combination with a high-quality 3D face tracking algorithm, provides a method for subtle and robust facial motion transfer from a regular video to a 3D photo-realistic avatar. Extensive experimental validation and comparisons to other state-of-the-art methods demonstrate the effectiveness of the proposed framework in real-world scenarios with variability in pose, expression, and illumination.
#face_tracking
Forwarded from Being Danil Krivoruchko
Matt Winckelmann really is a remarkable person.
Besides working at two of the best motion design studios on the planet (and a great intro UE course on Entagma), he also has personal projects. Today I learned about a recent one, and it is genuinely beautiful. Matt launched a bot named Rachael (hello, Blade Runner) that generated 3D dailies for a year (which, to my eye, are not much different from 99% of other dailies) and posted them to an Instagram account set up for it.
The result: the bot has 1.5x more followers than Matt himself. To me, a perfect artistic commentary on the "attention economy", "influencers", and the rest of IG culture.
https://www.mwinckelmann.com/rachaelisnotreal
Conway's Game of Life in Blender nodes. See the thread for the node setup.
https://twitter.com/GelamiSalami/status/1375139627351220234
#b3d
Tweet from GelamiSalami: "Found out a way to have image buffers in the node editor. Here's Conway's Game of Life with nodes" #blender #b3d #eevee https://t.co/hddRPT90MP
Unreal and Unity released proper good bois today
* Unreal MetaPet
* Unity pettable object (poor Unity...)
Tweet from Unreal Engine: "Say hello to MetaPets 🐾 the next-generation of fur-ever friends from Unreal Engine. Creating #MetaPets is as easy as a walk in the park using the new 🐶 MetaPet Creator. #UE4 Unleash your potential and see the pawsibilities 👇"
Forwarded from Data Science by ODS.ai 🦜
EfficientNetV2: Smaller Models and Faster Training
A new paper from Google Brain with a new SOTA architecture called EfficientNetV2. The authors develop a new family of CNN models that are optimized both for accuracy and training speed. The main improvements are:
- an improved training-aware neural architecture search with new building blocks and ideas to jointly optimize training speed and parameter efficiency;
- a new approach to progressive learning that adjusts regularization along with the image size;
As a result, the new approach reaches SOTA results while training up to 11x faster and being up to 6.8x smaller.
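A rough sketch of what that progressive learning could look like (placeholder numbers, not the paper's exact schedule): image size and regularization strength grow together across training stages.

# Illustrative progressive-learning schedule: weak regularization on small images first.
stages = [
    # (image_size, dropout, randaug_magnitude, mixup_alpha), values are placeholders
    (128, 0.10,  5, 0.0),
    (192, 0.20, 10, 0.1),
    (256, 0.30, 15, 0.2),
    (300, 0.40, 20, 0.3),
]

def train_progressively(model, dataset, epochs_per_stage=25):
    for size, dropout, randaug, mixup in stages:
        model.set_dropout(dropout)                         # assumed helper on the model
        loader = make_loader(dataset, image_size=size,     # assumed data pipeline
                             randaug_magnitude=randaug, mixup_alpha=mixup)
        for _ in range(epochs_per_stage):
            run_one_epoch(model, loader)                   # assumed training step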
Paper: https://arxiv.org/abs/2104.00298
Code will be available here:
https://github.com/google/automl/efficientnetv2
A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-effnetv2
#cv #sota #nas #deeplearning
LoFTR: Detector-Free Local Feature Matching with Transformers
* project page
* pdf
* code (not released yet)
We present a novel method for local image feature matching. Instead of performing image feature detection, description, and matching sequentially, we propose to first establish pixel-wise dense matches at a coarse level and later refine the good matches at a fine level. In contrast to dense methods that use a cost volume to search correspondences, we use self and cross attention layers in Transformers to obtain feature descriptors that are conditioned on both images. The global receptive field provided by Transformers enables our method to produce dense matches in low-texture areas, where feature detectors usually struggle to produce repeatable interest points. The experiments on indoor and outdoor datasets show that LoFTR outperforms state-of-the-art methods by a large margin. LoFTR also ranks first on two public benchmarks of visual localization among the published methods.
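Toy sketch of the coarse stage (my reading of the abstract, not the released code): tokens from the coarse feature maps of both images are refined with self and cross attention, then matched with a dual-softmax over the similarity matrix.

import torch
import torch.nn as nn

class CoarseMatcher(nn.Module):
    def __init__(self, dim=256, heads=8, temperature=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temperature = temperature

    def forward(self, featA, featB):
        # featA, featB: (B, N, dim) tokens from coarse-resolution feature maps
        featA = featA + self.self_attn(featA, featA, featA)[0]
        featB = featB + self.self_attn(featB, featB, featB)[0]
        featA = featA + self.cross_attn(featA, featB, featB)[0]
        featB = featB + self.cross_attn(featB, featA, featA)[0]
        sim = torch.einsum('bnd,bmd->bnm', featA, featB) / self.temperature
        # Dual-softmax: a pair is confident only if it wins in both directions.
        conf = sim.softmax(dim=2) * sim.softmax(dim=1)
        return conf   # mutual nearest neighbours above a threshold become coarse matches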
Reconstructing 3D Human Pose by Watching Humans in the Mirror
Very creative idea for data collection.
* project page
* pdf
* code
In this paper, we introduce the new task of reconstructing 3D human pose from a single image in which we can see the person and the person's image through a mirror. Compared to general scenarios of 3D pose estimation from a single view, the mirror reflection provides an additional view for resolving the depth ambiguity. We develop an optimization-based approach that exploits mirror symmetry constraints for accurate 3D pose reconstruction. We also provide a method to estimate the surface normal of the mirror from vanishing points in the single image. To validate the proposed approach, we collect a large-scale dataset named Mirrored-Human. The experiments show that, when trained on Mirrored-Human with our reconstructed 3D poses as pseudo ground-truth, the accuracy and generalizability of existing single-view 3D pose estimators can be largely improved.
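The mirror-symmetry constraint itself is simple enough to sketch (my own illustration, not the authors' code): with the mirror plane given by a unit normal n and offset d, the reflection of a joint P is P - 2(n·P - d)n, and the mismatch between the reflected real pose and the pose seen in the mirror can be penalized during optimization.

import numpy as np

def reflect(points, n, d):
    """Reflect Nx3 points across the plane {x : n.x = d}, with n a unit normal."""
    n = n / np.linalg.norm(n)
    dist = points @ n - d                      # signed distance to the plane
    return points - 2.0 * dist[:, None] * n[None, :]

def mirror_symmetry_loss(joints_real, joints_in_mirror, n, d):
    # Penalty that grows when the reflected real pose disagrees with
    # the pose reconstructed from the mirrored view.
    return np.mean(np.linalg.norm(reflect(joints_real, n, d) - joints_in_mirror, axis=1))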
One of the biggest demoscene events, Revision, was banned on Twitch (the reason is unknown), and they have now moved to CCC. Surprisingly, the streaming quality is better right now.
So, if you liked the demoscene in your childhood, they are streaming some PC 4K demos here.
UPD: Okay, now it's 256 bytes — about what you can fit into a single tweet.
NeRF-VAE: A Geometry Aware 3D Scene Generative Model
* abs
* pdf
We propose NeRF-VAE, a 3D scene generative model that incorporates geometric structure via NeRF and differentiable volume rendering. In contrast to NeRF, our model takes into account shared structure across scenes, and is able to infer the structure of a novel scene -- without the need to re-train -- using amortized inference. Our model is a VAE that learns a distribution over radiance fields by conditioning them on a latent scene representation. We show that, once trained, NeRF-VAE is able to infer and render geometrically-consistent scenes from previously unseen 3D environments using very few input images. We further demonstrate that NeRF-VAE generalizes well to out-of-distribution cameras, while convolutional models do not.
#nerf #vae #generative
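Bare-bones sketch of the conditioning (assumed architecture, not DeepMind's code): the radiance-field MLP takes the sample position together with a per-scene latent from the VAE encoder, so a single network covers many scenes.

import torch
import torch.nn as nn

class ConditionalRadianceField(nn.Module):
    def __init__(self, latent_dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),              # RGB + density
        )

    def forward(self, xyz, z):
        # xyz: (N, 3) sample points; z: (latent_dim,) scene latent from the VAE encoder
        z = z.expand(xyz.shape[0], -1)
        out = self.net(torch.cat([xyz, z], dim=-1))
        rgb = torch.sigmoid(out[:, :3])
        sigma = torch.relu(out[:, 3])
        return rgb, sigma                      # fed into standard volume rendering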
Unconstrained Scene Generation with Locally Conditioned Radiance Fields
* twitter thread
* abs
* pdf
Introducing Generative Scene Networks (GSN), a generative model for learning radiance fields for realistic scenes. With GSN we can sample scenes from the learned prior and move through them with a freely moving camera.
In order to model radiance fields for unconstrained scenes we decompose them into many small locally conditioned radiance fields which are conditioned on a latent spatial representation of a scene W.
The prior learned by GSN can be used for view synthesis: by inverting GSN's generator, we can complete unobserved parts of a scene conditioned on a sparse set of views.
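Toy sketch of the "locally conditioned" part (my reading of the summary above, not the released code): W is a 2D grid of local latent codes, and each 3D sample point bilinearly looks up the code under its ground-plane location before it is fed to the radiance field.

import torch
import torch.nn.functional as F

def local_codes(W, xyz, scene_extent):
    """W: (1, C, H, W_grid) latent floorplan; xyz: (N, 3) world-space points.
    Returns one local code per point via bilinear sampling at its (x, y)."""
    # Normalize ground-plane coordinates to [-1, 1] for grid_sample
    # (assumes the scene is centered at the origin with half-extent scene_extent).
    uv = xyz[:, :2] / scene_extent
    grid = uv.view(1, 1, -1, 2)                        # (1, 1, N, 2)
    codes = F.grid_sample(W, grid, mode='bilinear', align_corners=True)
    return codes.view(W.shape[1], -1).t()              # (N, C), fed to the radiance MLP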
#nerf #novel_view #indoor
Generating Furry Cars: Disentangling Object Shape and Appearance across Multiple Domains
* project page
* paper under review
The goal is to learn a generative model that learns an intermediate distribution, which borrows a subset of properties from each domain, enabling the generation of images that did not exist in any domain exclusively. This challenging problem requires an accurate disentanglement of object shape, appearance, and background from each domain, so that the appearance and shape factors from the two domains can be interchanged. Our key technical contribution is to represent object appearance with a differentiable histogram of visual features, and to optimize the generator so that two images with the same latent appearance factor but different latent shape factors produce similar histograms. On multiple multi-domain datasets, we demonstrate our method leads to accurate and consistent appearance and shape transfer across domains.
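The differentiable histogram is the piece worth sketching (soft binning with a Gaussian kernel; the exact binning in the paper may differ, so treat this as an assumption):

import torch

def soft_histogram(features, num_bins=32, vmin=0.0, vmax=1.0, bandwidth=0.02):
    """features: (N,) scalar feature values in [vmin, vmax].
    Each value contributes to nearby bins with a Gaussian weight, so gradients flow."""
    centers = torch.linspace(vmin, vmax, num_bins, device=features.device)
    weights = torch.exp(-(features[:, None] - centers[None, :]) ** 2 / (2 * bandwidth ** 2))
    hist = weights.sum(dim=0)
    return hist / (hist.sum() + 1e-8)          # normalized histogram, still differentiable

# Appearance-consistency loss: two images sharing an appearance latent should match histograms.
def histogram_loss(feat_a, feat_b):
    return torch.abs(soft_histogram(feat_a) - soft_histogram(feat_b)).sum()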
#gan
Forwarded from Gradient Dude
LatentCLR: A Contrastive Learning Approach for Unsupervised Discovery of Interpretable Directions
A framework that learns meaningful directions in GANs' latent space using unsupervised contrastive learning. Instead of discovering fixed directions such as in previous work, this method can discover non-linear directions in pretrained StyleGAN2 and BigGAN models. The discovered directions may be used for image manipulation.
The authors use the differences caused by an edit operation on the feature activations to optimize the identifiability of each direction. The edit operations are modeled by several separate neural nets ∆_i(z): given a latent code z and its generated image x = G(z), the method seeks edit operations ∆_i(z) such that the image x' = G(∆_i(z)) has semantically meaningful changes over x while still preserving the identity of x.
📝 Paper
🛠 Code (next week)
#paper_tldr #cv #gan
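Compact sketch of the contrastive objective (my paraphrase, not the authors' code): each direction model shifts the latent code, the induced change in generator features acts as the "view", and changes from the same direction index are pulled together while different directions are pushed apart.

import torch
import torch.nn.functional as F

def direction_contrastive_loss(G_feat, directions, z, alpha=3.0, tau=0.5):
    """G_feat: maps a latent batch to intermediate generator features, (B, D).
    directions: list of K modules, each producing an edit for latent codes z (B, latent_dim)."""
    base = G_feat(z)                                          # (B, D)
    diffs = [F.normalize(G_feat(z + alpha * d(z)) - base, dim=1) for d in directions]
    diffs = torch.cat(diffs, dim=0)                           # (K*B, D)
    labels = torch.arange(len(directions)).repeat_interleave(z.shape[0]).to(z.device)
    sim = diffs @ diffs.t() / tau                             # cosine similarities
    sim.fill_diagonal_(float('-inf'))                         # ignore self-pairs
    pos = (labels[:, None] == labels[None, :]).float()
    pos.fill_diagonal_(0.0)
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    return -(log_prob * pos).sum() / pos.sum()                # pull same-direction edits together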