https://digitalhumans.com/sophie/
Here you can try to talk to Sophie.
I think it's very funny, even though her voice is too robotic. But it gives the feeling that if this issue were solved, it could turn into a very nice experience.
Also, they have Einstein.
https://tinytools.directory/
🦊
Really cool collection of tools for visual interactive projects
- author
#tools
Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks
[MIT, ChipBrain, Amazon]
- pdf
- cleanlab (for label cleaning)
- 10 common benchmark datasets were investigated: ImageNet, CIFAR-10, CIFAR-100, Caltech-256, Quickdraw, MNIST, Amazon Reviews, IMDB, 20 News Groups, AudioSet
We identify label errors in the test sets of 10 common benchmark datasets.
Label errors are identified using confident learning algorithms and then human-validated via crowdsourcing.
Errors in test sets are numerous and widespread: we estimate an average of 3.4% errors across the 10 datasets, where for example 2916 label errors comprise 6% of the ImageNet validation set.
Surprisingly, we find that lower capacity models may be practically more useful than higher capacity models in real-world datasets with high proportions of erroneously labeled data. For example, on ImageNet with corrected labels: ResNet-18 outperforms ResNet-50.
On CIFAR-10 with corrected labels: VGG-11 outperforms VGG-19.
- related article
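The confident-learning step used in the paper is available off the shelf in the cleanlab library. Here is a minimal sketch of flagging suspect test labels with it, assuming the cleanlab 2.x API; the file names and the way you obtain out-of-sample predicted probabilities are placeholders, not part of the paper.

```python
# Minimal sketch: flag likely label errors with confident learning via cleanlab 2.x.
# `test_labels.npy` / `test_pred_probs.npy` are hypothetical placeholders; pred_probs
# should be out-of-sample predicted probabilities (e.g. from cross-validation).
import numpy as np
from cleanlab.filter import find_label_issues

labels = np.load("test_labels.npy")          # shape (n,), integer class labels
pred_probs = np.load("test_pred_probs.npy")  # shape (n, n_classes)

issue_idx = find_label_issues(
    labels=labels,
    pred_probs=pred_probs,
    return_indices_ranked_by="self_confidence",  # most suspicious examples first
)
print(f"{len(issue_idx)} likely label errors, e.g. indices {issue_idx[:10]}")
```

The flagged indices can then be sent to crowdsourcing for human validation, as the authors did.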
So, I got early access to MetaHuman Creator and just want to share a few of my impressions.

First of all, this thing seems to work like GeForce Now: all the expensive computation happens on their side, and a live interactive video is streamed to your browser. That's why the maximum session length is 1 hour.
Currently MH lets you make a character by morphing toward presets and by moving parts of the face rig manually.
It lets you make renders at different fidelity levels and preview LODs, but I didn't find any export option.
Many people were curious about its ability to represent some particular personalities.
I made some attempts, starting from myself ofc 😊. There's definitely something similar in the end result, but it feels like something very important is missing. I felt like I was struggling to find the right shapes for the eyes, jaw line, and eyebrows. And yes, I'm not a 3D artist, so it's quite easy for me to miss something. So here is a review by a 3D artist.
So, in its current state MH lets you quickly make a high-fidelity NPC with quite a wide variety of options, but it is not easy to recreate a specific person in it.
UPD:
another artist's review
Few-shot Image Generation via Cross-domain Correspondence
[Adobe Research, UC Davis, UC Berkeley]
* project page
* pdf
* code
#gan #one_shot_learning
Training generative models, such as GANs, on a target domain containing limited examples (e.g., 10) can easily result in overfitting. In this work, we seek to utilize a large source domain for pretraining and transfer the diversity information from source to target. We propose to preserve the relative similarities and differences between instances in the source via a novel cross-domain distance consistency loss. To further reduce overfitting, we present an anchor-based strategy to encourage different levels of realism over different regions in the latent space. With extensive results in both photorealistic and non-photorealistic domains, we demonstrate qualitatively and quantitatively that our few-shot model automatically discovers correspondences between source and target domains and generates more diverse and realistic images than previous methods.
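A minimal sketch of the cross-domain distance consistency idea: images generated from the same noise batch by the frozen source generator and by the adapted target generator should keep the same relative pairwise similarities. The feature extractor, the cosine similarity measure, and the loss weighting below are my assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def cross_domain_dist_consistency(feats_src, feats_tgt):
    """Sketch of a cross-domain distance consistency loss.
    feats_src / feats_tgt: (N, D) features of images generated from the SAME
    noise batch by the frozen source generator and the adapted target generator."""
    def pairwise_sim(f):
        sim = F.cosine_similarity(f.unsqueeze(1), f.unsqueeze(0), dim=-1)  # (N, N)
        n = sim.size(0)
        mask = ~torch.eye(n, dtype=torch.bool, device=sim.device)
        return sim[mask].view(n, n - 1)  # drop self-similarity

    p_src = F.softmax(pairwise_sim(feats_src), dim=-1)
    log_p_tgt = F.log_softmax(pairwise_sim(feats_tgt), dim=-1)
    # KL(source || target): the target domain should preserve the source's relative distances
    return F.kl_div(log_p_tgt, p_src, reduction="batchmean")
```

This term is added to the usual adversarial loss of the adapted generator, so the 10-shot target model inherits the diversity structure of the source domain.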
MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement
[Facebook Reality Labs, Carnegie Mellon University]
* pdf
This paper presents a generic method for generating full facial 3D animation from speech. Existing approaches to audio-driven facial animation exhibit uncanny or static upper face animation, fail to produce accurate and plausible co-articulation or rely on person-specific models that limit their scalability. To improve upon existing models, we propose a generic audio-driven facial animation approach that achieves highly realistic motion synthesis results for the entire face. At the core of our approach is a categorical latent space for facial animation that disentangles audio-correlated and audio-uncorrelated information based on a novel cross-modality loss. Our approach ensures highly accurate lip motion, while also synthesizing plausible animation of the parts of the face that are uncorrelated to the audio signal, such as eye blinks and eye brow motion.
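The core ingredient is the categorical latent space for facial animation. As a rough illustration only (not the paper's exact architecture; all sizes are made up), such a latent can be built with a straight-through Gumbel-softmax over several categorical heads:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CategoricalLatent(nn.Module):
    """Hypothetical categorical latent layer (sizes and layout are assumptions)."""
    def __init__(self, feat_dim=128, n_heads=16, n_classes=32):
        super().__init__()
        self.n_heads, self.n_classes = n_heads, n_classes
        self.to_logits = nn.Linear(feat_dim, n_heads * n_classes)
        self.from_onehot = nn.Linear(n_heads * n_classes, feat_dim)

    def forward(self, x, tau=1.0):
        # x: (batch, time, feat_dim) fused audio + expression features
        logits = self.to_logits(x).view(*x.shape[:2], self.n_heads, self.n_classes)
        # straight-through Gumbel-softmax gives a differentiable categorical sample
        onehot = F.gumbel_softmax(logits, tau=tau, hard=True)
        return self.from_onehot(onehot.flatten(2))
```

The cross-modality loss then decides which parts of this latent are driven by audio (lips) and which are free to vary independently (blinks, eyebrows).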
#speech2animation

StylePeople: A Generative Model of Fullbody Human Avatars
[Samsung AI Center, SkolTech]
* pdf, abs
We propose a new type of full-body human avatars, which combines a parametric mesh-based body model with a neural texture. We show that with the help of neural textures, such avatars can successfully model clothing and hair, which usually poses a problem for mesh-based approaches. We also show how these avatars can be created from multiple frames of a video using backpropagation. We then propose a generative model for such avatars that can be trained from datasets of images and videos of people. The generative model allows us to sample random avatars as well as to create dressed avatars of people from one or few images.
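To make the "mesh + neural texture" combination concrete, here is a rough sketch of a neural-texture lookup and rendering step; the channel counts and the small rendering network are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuralTextureRenderer(nn.Module):
    """Sketch: sample a learnable multi-channel texture at rasterized UV coords,
    then translate the sampled features to RGB with a small rendering network."""
    def __init__(self, tex_channels=16, tex_size=512):
        super().__init__()
        self.texture = nn.Parameter(torch.randn(1, tex_channels, tex_size, tex_size) * 0.01)
        self.renderer = nn.Sequential(
            nn.Conv2d(tex_channels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, uv):
        # uv: (B, H, W, 2) UV coordinates in [-1, 1], rasterized from the body mesh
        tex = self.texture.expand(uv.size(0), -1, -1, -1)
        sampled = F.grid_sample(tex, uv, align_corners=True)  # (B, C, H, W)
        return self.renderer(sampled)
```

Because everything here is differentiable, the texture (and, in the paper, the generative model behind it) can be fit to video frames of a person by plain backpropagation.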
#avatars #neural_rendering #gan #smpl #3d
Geometry-Free View Synthesis: Transformers and no 3D Priors
* pdf, abs
* code
Is a geometric model required to synthesize novel views from a single image? Being bound to local convolutions, CNNs need explicit 3D biases to model geometric transformations. In contrast, we demonstrate that a transformer-based model can synthesize entirely novel views without any hand-engineered 3D biases. This is achieved by (i) a global attention mechanism for implicitly learning long-range 3D correspondences between source and target views, and (ii) a probabilistic formulation necessary to capture the ambiguity inherent in predicting novel views from a single image, thereby overcoming the limitations of previous approaches that are restricted to relatively small viewpoint changes. We evaluate various ways to integrate 3D priors into a transformer architecture. However, our experiments show that no such geometric priors are required and that the transformer is capable of implicitly learning 3D relationships between images.
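As a rough sketch of the no-3D-priors setup (sizes, the camera encoding, and layer choices are assumptions; the paper works on discrete image tokens from a VQGAN-style autoencoder): source-view tokens and a camera embedding are prepended as conditioning, and the target-view tokens are predicted autoregressively.

```python
import torch
import torch.nn as nn

class ViewSynthesisTransformer(nn.Module):
    """Hypothetical sketch of a geometry-free view-synthesis transformer."""
    def __init__(self, vocab=1024, dim=512, n_layers=8, n_heads=8, max_len=1024):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab, dim)
        self.pos_emb = nn.Parameter(torch.zeros(1, max_len, dim))
        self.cam_proj = nn.Linear(12, dim)  # e.g. flattened 3x4 relative camera pose
        block = nn.TransformerEncoderLayer(dim, n_heads, 4 * dim, batch_first=True)
        self.blocks = nn.TransformerEncoder(block, n_layers)
        self.head = nn.Linear(dim, vocab)

    def forward(self, src_tokens, cam, tgt_tokens):
        # src_tokens: (B, S), cam: (B, 12), tgt_tokens: (B, T)
        x = torch.cat([self.tok_emb(src_tokens),
                       self.cam_proj(cam).unsqueeze(1),
                       self.tok_emb(tgt_tokens)], dim=1)
        x = x + self.pos_emb[:, : x.size(1)]
        # causal mask: every position attends only to earlier positions
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1)).to(x.device)
        h = self.blocks(x, mask=mask)
        return self.head(h)  # next-token logits; train with cross-entropy on tgt_tokens
```

Sampling the target tokens (rather than regressing pixels) is what gives the probabilistic formulation: large viewpoint changes are ambiguous, and the model can produce several plausible completions.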
Regularizing Generative Adversarial Networks under Limited Data
[Google Research, UC Merced, Waymo, Yonsei University]
* pdf, abs
* code
#gan #limited_data
The success of the GAN models hinges on a large amount of training data. This work proposes a regularization approach for training robust GAN models on limited data. We theoretically show a connection between the regularized loss and an f-divergence called LeCam-divergence, which we find is more robust under limited training data. Extensive experiments on several benchmark datasets demonstrate that the proposed regularization scheme 1) improves the generalization performance and stabilizes the learning dynamics of GAN models under limited training data, and 2) complements the recent data augmentation methods. These properties facilitate training GAN models to achieve state-of-the-art performance when only limited training data of the ImageNet benchmark is available.
- related work
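The regularizer itself is simple: track exponential moving averages of the discriminator's outputs on real and fake images, and penalize deviation from the opposite average. A minimal sketch follows; the decay and the loss weight are assumptions, not the paper's exact hyperparameters.

```python
import torch

class LeCamEMA:
    """Sketch of a LeCam-style regularizer for GAN discriminators under limited data."""
    def __init__(self, decay=0.99):
        self.decay = decay
        self.ema_real, self.ema_fake = 0.0, 0.0

    def update(self, d_real, d_fake):
        # exponential moving averages of D's mean predictions on real / fake batches
        self.ema_real = self.decay * self.ema_real + (1 - self.decay) * d_real.mean().item()
        self.ema_fake = self.decay * self.ema_fake + (1 - self.decay) * d_fake.mean().item()

    def penalty(self, d_real, d_fake):
        # pull real predictions toward the fake average and vice versa
        return (torch.mean((d_real - self.ema_fake) ** 2)
                + torch.mean((d_fake - self.ema_real) ** 2))

# Usage in the discriminator step (lambda_lc is a hypothetical weight, e.g. 0.01):
#   lecam.update(d_real, d_fake)
#   d_loss = adversarial_loss + lambda_lc * lecam.penalty(d_real, d_fake)
```

The term damps the discriminator's confidence when data is scarce, which is why it combines well with differentiable data augmentation methods.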
Series of lectures from Samsung AI, in Russian
- Overview of neural network applications in computer graphics, Gleb Sterkin
- Neural rendering: generating new images without reconstructing the scene geometry, Gleb Sterkin
- The whole lecture series is here
#courses
YouTube
Overview of neural network applications in computer graphics. Gleb Sterkin
The lecture covers a number of scenarios in which neural networks can simplify the creation of computer graphics: rendering, object synthesis, and character animation. The author is Gleb Sterkin, a research engineer at Samsung.
Link to the presentation - https://docs.goog…