Forwarded from Denis Sexy IT 🤖
A logical follow-up to the neural network that became popular for how neatly it animates photos of faces: the moving black-and-white photographs, portraits, and memes you have seen are all the output of an algorithm called First Order Model.
Although the algorithm works well, animating anything other than faces with it is rather tricky, even though it technically supports that: glitches and artifacts produce a fairly unpleasant effect.
And now, thanks to a group of researchers, memes will soon be animatable at full height: the new algorithm can already work out which parts of the body should move in a photo, and how, based on a driving video. I put together a cut from the project video; it speaks for itself (the animated figurine came out especially creepy).
Project page:
https://snap-research.github.io/articulated-animation/
(the project code will be released a bit later)
Sketch-based Normal Map Generation with Geometric Sampling
* pdf, abs
A normal map is an important and efficient way to represent complex 3D models. A designer may benefit from the auto-generation of high-quality, accurate normal maps from freehand sketches in 3D content creation. This paper proposes a deep generative model for generating normal maps from a user's sketch with geometric sampling. Our generative model is based on a Conditional Generative Adversarial Network with curvature-sensitive point sampling of the conditional masks. This sampling process helps eliminate the ambiguity of the generation results given the network input. In addition, we adopt a U-Net-style discriminator to help the generator train better. It is verified that the proposed framework can generate more accurate normal maps.
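For readers who want the gist in code, here is a generic conditional-GAN training step in the spirit of the abstract (a rough sketch under my own assumptions: G, D, cgan_step and the L1 term are illustrative, and the paper's curvature-sensitive mask sampling and U-Net discriminator are not reproduced):
```python
import torch
import torch.nn.functional as F

def cgan_step(G, D, opt_g, opt_d, sketch, normal_gt, l1_weight=100.0):
    # --- discriminator: real (sketch, normal) pairs vs. generated pairs ---
    fake = G(sketch)
    d_real = D(sketch, normal_gt)
    d_fake = D(sketch, fake.detach())
    loss_d = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) + \
             F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # --- generator: fool D and stay close to the ground-truth normal map ---
    d_fake = D(sketch, fake)
    loss_g = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake)) + \
             l1_weight * F.l1_loss(fake, normal_gt)
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```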
#gan #sketch
echoinside
Few-shot Image Generation via Cross-domain Correspondence [Adobe Research, UC Davis, UC Berkeley] * project page * pdf * code Training generative models, such as GANs, on a target domain containing limited examples (e.g., 10) can easily result in overfitting.…
code for this paper is available ☺️
https://github.com/utkarshojha/few-shot-gan-adaptation
NVIDIA Omniverse Audio2Face is now available in open beta. Unfortunately, it currently works only on Windows and requires an RTX GPU. For some reason I think this kind of product would be much more consumer-friendly as a web app, like Mixamo or MetaHuman Creator.
- download
- tutorial
Audio2Face simplifies animation of a 3D character to match any voice-over track, whether you’re animating characters for a game, film, real-time digital assistants, or just for fun. You can use the app for interactive real-time applications or as a traditional facial animation authoring tool. Run the results live or bake them out, it’s up to you.
#speech2animation
dualFace: Two-Stage Drawing Guidance for Freehand Portrait Sketching (CVMJ)
[JAIST, University of Tokyo]
* youtube
* project page
* code
* abs, pdf
In this paper, we propose dualFace, a portrait drawing interface to assist users with different levels of drawing skills to complete recognizable and authentic face sketches. dualFace consists of two-stage drawing assistance to provide global and local visual guidance: global guidance,
which helps users draw contour lines of portraits (i.e., geometric structure), and local guidance, which helps users draw details of facial parts (conforming to the user-drawn contour lines), inspired by traditional artist workflows in portrait drawing. In the stage of global guidance, the user draws several contour lines, and dualFace then searches several relevant images from an internal database and displays the suggested face contour lines over the background of the canvas. In the stage of local guidance, we synthesize detailed portrait images with a deep generative model from the user-drawn contour lines and use the synthesized results as detailed drawing guidance.
#sketch #retrieval #face
Sceneformer: Indoor Scene Generation with Transformers
[Technical University of Munich]
* project page
* github
* pdf, abs
We address the task of indoor scene generation by generating a sequence of objects, along with their locations and orientations, conditioned on a room layout. Large-scale indoor scene datasets allow us to extract patterns from user-designed indoor scenes and generate new scenes based on these patterns. Existing methods rely on the 2D or 3D appearance of these scenes in addition to object positions, and make assumptions about the possible relations between objects. In contrast, we do not use any appearance information and implicitly learn object relations using the self-attention mechanism of transformers. Our method is also flexible, as it can be conditioned not only on the room layout but also on text descriptions of the room, using only the cross-attention mechanism of transformers.
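As a toy illustration of the attention setup described above (my own sketch under assumptions, not the Sceneformer code: ToySceneTransformer, the layer sizes, and the category-only output head are made up; the real model also predicts locations and orientations):
```python
import torch
import torch.nn as nn

class ToySceneTransformer(nn.Module):
    def __init__(self, num_categories, d_model=128):
        super().__init__()
        self.embed = nn.Embedding(num_categories, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, num_categories)

    def forward(self, object_tokens, condition):
        # object_tokens: (batch, seq) category ids of already placed objects
        # condition:     (batch, cond_len, d_model) room-layout or text features
        x = self.embed(object_tokens)
        seq = x.size(1)
        causal = torch.triu(torch.full((seq, seq), float('-inf')), diagonal=1)
        # causal self-attention over placed objects, cross-attention to the condition
        h = self.decoder(x, condition, tgt_mask=causal)
        return self.head(h)  # logits for the next object's category
```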
#indoor
Explaining in Style: Training a GAN to explain a classifier in StyleSpace
[Google research]
* project page
* pdf
Image classification models can depend on multiple different semantic attributes of the image. An explanation of the decision of the classifier needs to both discover and visualize these properties. Here we present StylEx, a method for doing this, by training a generative model to specifically explain multiple attributes that underlie classifier decisions. We apply StylEx to multiple domains, including animals, leaves, faces and retinal images. For these, we show how an image can be modified in different ways to change its classifier output. Our results show that the method finds attributes that align well with semantic ones, generate meaningful image-specific explanations, and are human-interpretable as measured in user-studies.
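For intuition, here is a toy probe in the spirit of the idea (my own simplification, not StylEx's actual training or attribute-search procedure; generator, classifier and rank_style_coords are hypothetical names): perturb each style coordinate and rank coordinates by how much they move the classifier's output.
```python
import torch

@torch.no_grad()
def rank_style_coords(generator, classifier, w_styles, step=2.0, target_class=1):
    # w_styles: (batch, num_style_coords) style vectors; generator and classifier
    # are hypothetical callables returning images and class logits respectively
    base = classifier(generator(w_styles)).softmax(dim=-1)[:, target_class]
    effects = []
    for i in range(w_styles.shape[1]):
        shifted = w_styles.clone()
        shifted[:, i] += step                       # nudge one style coordinate
        prob = classifier(generator(shifted)).softmax(dim=-1)[:, target_class]
        effects.append((prob - base).abs().mean().item())
    # style coordinates sorted by how strongly they change the classifier output
    return sorted(range(len(effects)), key=lambda i: -effects[i])
```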
#gan
KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control
[University of Oxford, UC Berkeley, Stanford University, Google Research]
* project page
* demo
* pdf, abs
* code (plan to release on April 30)
We present KeypointDeformer, a novel unsupervised method for shape control through automatically discovered 3D keypoints. Our approach produces intuitive and semantically consistent control of shape deformations. Moreover, our discovered 3D keypoints are consistent across object category instances despite large shape variations. Since our method is unsupervised, it can be readily deployed to new object categories without requiring expensive annotations for 3D keypoints and deformations. Our method also works on real-world 3D scans of shoes from Google scanned objects.
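To make the keypoints-as-handles idea concrete, a hedged toy deformation (entirely my own simplification; the paper learns the deformation model itself, which this distance-weighted blend does not reproduce):
```python
import torch

def deform(points, keypoints_src, keypoints_tgt, sigma=0.2):
    # points: (N, 3) source shape; keypoints_src / keypoints_tgt: (K, 3)
    disp = keypoints_tgt - keypoints_src                                  # (K, 3) handle displacements
    d2 = ((points[:, None, :] - keypoints_src[None, :, :]) ** 2).sum(-1)  # (N, K) squared distances
    w = torch.softmax(-d2 / (2.0 * sigma ** 2), dim=-1)                   # soft influence of each keypoint
    return points + w @ disp                                              # points follow nearby keypoints
```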
#3d #unsupervised
Total Relighting:
Learning to Relight Portraits for Background Replacement
[Google Research]
* youtube
* project page
* pdf
We propose a novel system for portrait relighting and background replacement. Our technique includes alpha matting, relighting, and compositing. We demonstrate that each of these stages can be tackled in a sequential pipeline without the use of priors (e.g. background or illumination) and with no specialized acquisition techniques, using only a single RGB portrait image and a novel, target HDR lighting environment as inputs. We train our model using relit portraits of subjects captured in a light stage computational illumination system, which records multiple lighting conditions, high quality geometry, and accurate alpha mattes. To perform realistic relighting for compositing, we introduce a novel per-pixel lighting representation in a deep learning framework, which explicitly models the diffuse and the specular components of appearance.
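For intuition about the diffuse/specular split only, here is a much-simplified per-pixel shading sketch (entirely my own toy with a single directional light and a Phong-style lobe; the paper's learned per-pixel light-map representation from an HDR environment is far richer):
```python
import torch
import torch.nn.functional as F

def shade(albedo, normals, view_dir, light_dir, light_rgb, shininess=32.0):
    # albedo, normals: (H, W, 3); view_dir, light_dir, light_rgb: (3,)
    n = F.normalize(normals, dim=-1)
    l = F.normalize(light_dir, dim=0)
    v = F.normalize(view_dir, dim=0)
    dot = (n @ l).unsqueeze(-1)                          # (H, W, 1) cosine between normal and light
    diffuse = albedo * light_rgb * dot.clamp(min=0.0)    # Lambertian term
    r = 2.0 * dot * n - l                                # light direction reflected about the normal
    specular = light_rgb * ((r * v).sum(-1).clamp(min=0.0) ** shininess).unsqueeze(-1)
    return (diffuse + specular).clamp(0.0, 1.0)
```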
#single_image #relighting #matting
echoinside
https://stephaneginier.com/sculptgl Sculpting in web #web #tools
Emerging Properties in Self-Supervised Vision Transformers (DINO)
[Facebook AI Research, Inria, Sorbonne University]
(probably everyone already noticed)
* pdf, abs
* github
* blog
* YC review
tldr; DINO — self-distillation with no labels.
Non-contrastive SSL that looks like BYOL with some tweaks: local & global image crops are sent to the student network, while only global crops are sent to the teacher (the multi-crop trick); cross-entropy loss between teacher & student representations. Tested with ResNet and Vision Transformer backbones, achieving better results with the latter. Works extremely well as a feature extractor for kNN and simple linear models. It is also shown that ViTs extract impressive class-specific segmentation maps in this unsupervised setting, which look much better than in the supervised setting.
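As a mental model, here is a minimal sketch of the self-distillation loss and the teacher update, written from the paper's description under my own simplifying assumptions (the names dino_loss/ema_update, the temperatures and the momentum value are illustrative; centering updates and multi-crop scheduling are omitted):
```python
import torch
import torch.nn.functional as F

def dino_loss(student_logits, teacher_logits, center, tau_s=0.1, tau_t=0.04):
    """Cross-entropy between the sharpened teacher and the student distributions."""
    # center is kept as an EMA of teacher outputs in the real recipe; here it is just an input
    t = F.softmax((teacher_logits - center) / tau_t, dim=-1).detach()  # no gradient to the teacher
    log_s = F.log_softmax(student_logits / tau_s, dim=-1)
    return -(t * log_s).sum(dim=-1).mean()

@torch.no_grad()
def ema_update(student, teacher, momentum=0.996):
    """The teacher's weights are an exponential moving average of the student's."""
    for p_s, p_t in zip(student.parameters(), teacher.parameters()):
        p_t.mul_(momentum).add_(p_s, alpha=1.0 - momentum)
```
During training, all crops go through the student while only the global crops go through the teacher, the loss is averaged over student/teacher crop pairs, and the teacher receives no gradients, only the EMA update.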
Also, batch size matters, and multi-crop matters.
For example, the performance is 72.5% after 46 hours of training without multi-crop (i.e. 2×224^2), while DINO in the 2×224^2 + 10×96^2 crop setting reaches 74.6% in only 24 hours. Memory usage is 15.4 GB vs 9.3 GB. The best result of 76.1% top-1 accuracy is achieved using 16 GPUs for 3 days, which is still a smaller compute budget than in previous unsupervised works. It is possible to use this method with small batches without multi-crop, but this setup wasn't studied well.
Results with the smaller batch sizes (bs = 128) are slightly below our default training setup of bs = 1024, and would certainly require to re-tune hyperparameters like the momentum rates for example. We have explored training a model with a batch size of 8, reaching 35.2% after 50 epochs, showing the potential for training large models that barely fit an image per GPU.
See images in comments.
#self_supervised
Vector Neurons: A General Framework for SO(3)-Equivariant Networks
[Stanford University, NVIDIA Research, Google Research, University of Toronto]
* pdf, abs
* github
* project page
Invariance and equivariance to the rotation group have been widely discussed in the 3D deep learning community for pointclouds. We introduce a general framework built on top of what we call Vector Neuron representations for creating SO(3)-equivariant neural networks for pointcloud processing. Extending neurons from 1D scalars to 3D vectors, our vector neurons enable a simple mapping of SO(3) actions to latent spaces thereby providing a framework for building equivariance in common neural operations -- including linear layers, non-linearities, pooling, and normalizations. Due to their simplicity, vector neurons are versatile and, as we demonstrate, can be incorporated into diverse network architecture backbones.
We also show for the first time a rotation equivariant reconstruction network.
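The core trick is compact enough to sketch. Below is my own toy VNLinear (an assumption-level illustration, not the official code): each channel is a 3D vector, the learned weights only mix channels and never touch the 3D axis, so rotating the input is the same as rotating the output.
```python
import torch
import torch.nn as nn

class VNLinear(nn.Module):
    """Linear layer over vector-valued channels: mixes channels, leaves the 3D axis alone."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_channels, in_channels) * 0.02)

    def forward(self, v):                                    # v: (batch, in_channels, 3)
        return torch.einsum('oi,bic->boc', self.weight, v)   # (batch, out_channels, 3)

# quick equivariance check: rotating the input rotates the output
v = torch.randn(2, 8, 3)
Q, _ = torch.linalg.qr(torch.randn(3, 3))                    # random orthogonal matrix
layer = VNLinear(8, 16)
assert torch.allclose(layer(v @ Q), layer(v) @ Q, atol=1e-5)
```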
#3d #point_clouds #implicit_geometry
echoinside
* youtube * paper A work that is pleasing in its elegant simplicity and effectiveness. What was done: a set of human scans was registered to the SMPL parametric model, yielding a dataset of the form {a mesh with an RGB value for each vertex, a set of photos with different known parameters…
The code is now available: https://github.com/sergeyprokudin/smplpix
It is also possible to use the SMPLpix renderer with DECA to render only faces.
#smpl #neural_rendering
Project Starline: Feel like you're there, together
- blogpost
Very impressive press release from Google that pushes the boundaries of telepresence.
Imagine looking through a sort of magic window, and through that window, you see another person, life-size and in three dimensions. You can talk naturally, gesture and make eye contact.
There's not much information about the technical part yet. The project has been under development for a few years, and some trial deployments are planned this year.
To make this experience possible, we are applying research in computer vision, machine learning, spatial audio and real-time compression. We've also developed a breakthrough light field display system that creates a sense of volume and depth that can be experienced without the need for additional glasses or headsets.
echoinside
#sdf #tools #implicit_geometry The author of the video, @ephtracy, is sharing progress on Twitter on his own 3D editor that works with signed distance fields. In Blender, for comparison, you can model hard-surface objects with metaballs, which gives a similar…
Small release for win64
https://ephtracy.github.io/index.html?page=magicacsg
review: https://youtu.be/rgwNsNCpbhg?t=208
#sdf #tools
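The quoted message above compares SDF editing with metaballs; for anyone curious what "modeling with signed distance fields" means in code, here is a tiny generic sketch (standard SDF math, not MagicaCSG internals; the function names are mine):
```python
import numpy as np

def sd_sphere(p, center, radius):
    """Signed distance from points p (..., 3) to a sphere."""
    return np.linalg.norm(p - center, axis=-1) - radius

def smooth_union(d1, d2, k=0.2):
    """Polynomial smooth minimum of two distance fields: blends shapes instead of hard-joining them."""
    h = np.clip(0.5 + 0.5 * (d2 - d1) / k, 0.0, 1.0)
    return d2 * (1.0 - h) + d1 * h - k * h * (1.0 - h)

# example: two overlapping spheres blended into one smooth blob
p = np.random.rand(1000, 3)
d = smooth_union(sd_sphere(p, np.array([0.4, 0.5, 0.5]), 0.2),
                 sd_sphere(p, np.array([0.6, 0.5, 0.5]), 0.2))
```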