🧧Hyper-Fast Instance Segmentation🧧
👉Novel Temporally Efficient Vision Transformer (TeViT) for VIS
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Video instance segmentation transformer
✅Contextual-info at frame/instance level
✅Nearly convolution-free framework 🤷‍♂️
✅The new SOTA for VIS, ~70 FPS!
✅Code & models under MIT license
More: https://bit.ly/3rCMXIn
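For the frame-level context above, TeViT describes a messenger-token shift: each frame carries extra messenger tokens that are rolled along the temporal axis between layers so neighboring frames exchange information at near-zero cost. A minimal sketch of that shifting step, assuming an even split of tokens between directions (not the authors' exact implementation):

```python
import torch

def messenger_shift(msg):
    """msg: (T, M, C) messenger tokens for T frames, M tokens each.
    Roll half of the tokens forward and half backward in time so
    every frame receives context from its temporal neighbors."""
    half = msg.shape[1] // 2
    fwd = torch.roll(msg[:, :half], shifts=1, dims=0)   # context from frame t-1
    bwd = torch.roll(msg[:, half:], shifts=-1, dims=0)  # context from frame t+1
    return torch.cat([fwd, bwd], dim=1)
```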
📗Unified Scene Text/Layout Detection📗
👉World's first hierarchical scene text dataset + novel detection method
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Unified detection & geometric layout
✅Hierarchical annotations in natural scenes
✅Word, line, & paragraph level annotations
✅Source under CC Attribution Share Alike 4.0
More: https://bit.ly/3jRpezV
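The hierarchy runs word → line → paragraph, each level carrying its own polygon. A hypothetical record illustrating that nesting (field names are assumptions, not the dataset's actual schema):

```python
# Hypothetical annotation illustrating word/line/paragraph nesting.
annotation = {
    "image_id": "scene_0001",
    "paragraphs": [{                       # geometric layout level
        "vertices": [[10, 20], [300, 20], [300, 120], [10, 120]],
        "lines": [{                        # text-line level
            "vertices": [[12, 22], [298, 22], [298, 60], [12, 60]],
            "words": [{                    # word level, with transcription
                "text": "OPEN",
                "vertices": [[12, 22], [90, 22], [90, 60], [12, 60]],
            }],
        }],
    }],
}
```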
🙌 #Oculus' new Hand Tracking 🙌
👉Hands move as naturally and intuitively in the #metaverse as they do in real life
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Hands 2.0 powered by CV & ML
✅Tracking hand-over-hand interactions
✅Crossing hands, clapping, high-fives
✅Accurate thumbs-up gesture
More: https://bit.ly/3JXPvY2
🎗️New SOTA in #3D human avatar🎗️
👉PHORHUM: photorealistic 3D human from mono-RGB
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Pixel-aligned method for 3D geometry
✅Unshaded surface color + illumination
✅Patch-based rendering losses for visible regions
✅Plausible color estimation for non-visible regions
More: https://bit.ly/3MkvBrA
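"Pixel-aligned" means every 3D query point is projected into the image and picks up the local feature there before an MLP predicts geometry and color. A minimal sketch of that sampling step, assuming query points are already projected to [-1, 1] image coordinates (not PHORHUM's actual code):

```python
import torch
import torch.nn.functional as F

def pixel_aligned_features(feat_map, points_ndc):
    """feat_map: (B, C, H, W) image features; points_ndc: (B, N, 2)
    query points projected into [-1, 1] image coords.
    Returns one bilinearly sampled feature per query point."""
    grid = points_ndc.unsqueeze(2)                               # (B, N, 1, 2)
    sampled = F.grid_sample(feat_map, grid, align_corners=True)  # (B, C, N, 1)
    return sampled.squeeze(-1).transpose(1, 2)                   # (B, N, C)
```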
📟 What's in your hands (#3D)? 📟
👉Reconstructing hand-held objects (from a single RGB image) without knowing their 3D templates 🤷‍♂️
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Hand is highly predictive of object shape
✅Conditioning on hand articulation
✅Visual feats. / articulation-aware coords.
✅Code and models available!
More: https://bit.ly/3vuYn2a
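The articulation-aware coordinates above amount to expressing each object query point in the local frame of every hand joint, so the predicted shape can follow the grasp. A sketch of that re-parameterization (the paper's exact formulation may differ):

```python
import torch

def articulation_aware_coords(points, joint_rots, joint_trans):
    """points: (N, 3) object query points; joint_rots: (J, 3, 3) and
    joint_trans: (J, 3) hand-joint poses. Express each point in every
    joint's local frame: p_local_j = R_j^T (p - t_j)."""
    rel = points[None, :, :] - joint_trans[:, None, :]           # (J, N, 3)
    local = torch.einsum('jab,jna->jnb', joint_rots, rel)        # apply R^T
    return local.permute(1, 0, 2).reshape(points.shape[0], -1)   # (N, J*3)
```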
🔋YODO: You Only Demonstrate Once🔋
👉Novel category-level manipulation learned in sim from a single demonstration video🤯
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅One-shot IL, model-free 6D pose tracking
✅Demonstration via a single third-person view
✅Manipulation including high-precision tasks
✅Category-level Behavior Cloning
✅Attention for dynamic coords selection
✅Generalizability to novel unseen obj/env
More: https://bit.ly/3v0V4R4
👗 Dress Code for Virtual Try-On 👗
👉UniMORE (+ YOOX) unveils a novel dataset/approach for virtual try-on.
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Hi-res paired front-view, full-body images
✅Pixel-level Semantic-Aware Discriminator
✅9 SOTA VTON approaches / 3 baselines
✅New SOTA in resolution & garment variety
More: https://bit.ly/3xKXSUw
🍃Deep Equilibrium for Optical Flow🍃
👉DEQ: converges faster, uses less memory, and is often more accurate
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Novel formulation of the optical flow problem
✅Compatible with prior modeling & data-related improvements
✅Sparse fixed-point correction for stability
✅Code/models under GNU Affero GPL v3.0
More: https://bit.ly/3v4fZmi
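A deep equilibrium model swaps stacked refinement iterations for a direct solve of the fixed point z* = f(z*) of the flow-update operator. A naive forward-iteration sketch of that solve; the paper pairs it with faster solvers and implicit gradients:

```python
import torch

def solve_fixed_point(f, z0, max_iter=50, tol=1e-4):
    """Iterate z <- f(z) until the relative update falls below tol.
    f is the flow-update operator, z0 the initial flow estimate.
    Usage: z_star = solve_fixed_point(update_op, torch.zeros_like(flow))."""
    z = z0
    for _ in range(max_iter):
        z_next = f(z)
        if (z_next - z).norm() / (z.norm() + 1e-8) < tol:
            return z_next
        z = z_next
    return z
```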
🌳Ultra High-Resolution Neural Saliency🌳
👉A novel ultra high-resolution saliency detector with dataset!
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Ultra Hi-Res Saliency Detection
✅5,920 pics at 4K-8K resolution
✅Pyramid Grafting Network
✅Cross-Model Grafting Module
✅AGL: Attention Guided Loss
✅Code/models under MIT
More: https://bit.ly/3MnU1Rf
🪆StyleGAN-Human for fashion 🪆
👉A novel unconditional human generation based on StyleGAN is out!
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅200,000+ labeled samples (pose/texture)
✅1024x512 StyleGAN-Human (StyleGAN3 based)
✅512x256 StyleGAN-Human (StyleGAN1 based)
✅Face model for downstream: InsetGAN
✅Source code and model available!
More: https://bit.ly/3xMg5B2
💀 OSSO: Skeletal Shape from Outside 💀
👉Anatomical skeleton of a person from the 3D surface of the body 🦴
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Max Planck + IMATI-CNR + INRIA
✅DXA images to obtain #3D shape
✅External body to internal skeleton
More: https://bit.ly/3v7Z5TQ
🎷 Pix2Seq: object detection by #Google 🎷
👉A novel framework to perform object detection as a language modeling task
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Obj. detection as a lang-modeling task
✅BBs/labels -> seq. of discrete tokens
✅Encoder-decoder (one token at a time)
✅Code under Apache License 2.0
More: https://bit.ly/3F49PX3
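The core trick: quantize each box corner into one of N bins so a box becomes four coordinate tokens plus a class token, turning detection into next-token prediction. A sketch of that tokenization (bin count and vocabulary layout here are illustrative assumptions):

```python
import numpy as np

def boxes_to_tokens(boxes, labels, num_bins=1000, img_size=640.0):
    """boxes: list of (ymin, xmin, ymax, xmax) in pixels; labels: class ids.
    Coordinate tokens use ids 0..num_bins-1; class tokens are offset past them."""
    tokens = []
    for box, cls in zip(boxes, labels):
        for coord in box:  # quantize each corner coordinate into a bin
            tokens.append(int(np.clip(coord / img_size * (num_bins - 1),
                                      0, num_bins - 1)))
        tokens.append(num_bins + cls)  # class token after the 4 coord tokens
    return tokens

# e.g. boxes_to_tokens([(32, 48, 256, 320)], [7]) -> [49, 74, 399, 499, 1007]
```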
🌹 Generalizable Neural Performer 🌹
👉General neural framework to synthesize free-viewpoint images of arbitrary human performers
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Free-viewpoint synthesis of humans
✅Implicit Geometric Body Embedding
✅Screen-Space Occlusion-Aware Blending
✅GeneBody: 4M frames, multi-view cams
More: https://cutt.ly/SGcnQzn
🚌 Tire-defect inspection 🚌
👉Unsupervised detection of tire defects using neural networks
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Impurity, same material as tire
✅Impurity, with different material
✅Damage by temp/pressure
✅Crack or etched material
More: https://bit.ly/37GX1JT
🧋#4D Neural Fields🧋
👉4D neural-field visual representations from monocular RGB-D 🤯
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅4D scene completion (occlusions)
✅Scene completion in cluttered scenes
✅Novel #AI for contextual point clouds
✅Data, code, models under MIT license
More: https://cutt.ly/6GveKiJ
👔Largest dataset of human-object interactions👔
👉BEHAVE: the largest dataset of human-object interactions to date
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅8 subjects, 20 objects, 5 envs.
✅321 clips with 4 Kinect RGB-D
✅Masks and segmented point clouds
✅3D SMPL & mesh registration
✅Textured scan reconstructions
More: https://bit.ly/3Lx6NNo
🦴ENARF-GAN Neural Articulations🦴
👉Unsupervised method for 3D geometry-aware representation of articulated objects
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Novel efficient neural representation
✅Tri-plane deformation fields for training
✅Novel GAN for articulated representations
✅Controllable 3D from real unlabeled images
More: https://bit.ly/3xYqedN
🖲️ HuMMan: 4D human dataset 🖲️
👉HuMMan: 4D dataset with 1000 humans, 400k sequences & 60M frames 🤯
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅RGB, pt-clouds, keypts, SMPL, texture
✅Mobile device in the sensor suite
✅500+ actions to cover movements
More: https://bit.ly/3vTRW8Z
🔥Neighborhood Attention Transformer 🔥
👉A novel transformer for both image classification and downstream vision tasks
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Neighborhood Attention (NA)
✅Neighborhood Attention Transformer (NAT)
✅Faster training/inference, good throughput
✅Checkpoints, training code, #CUDA kernel available
More: https://bit.ly/3F5aVSo
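Neighborhood Attention restricts each pixel's attention to the k×k window around it instead of the full image. A naive single-head sketch with zero padding at borders (NAT itself handles borders by shifting the window, adds learned projections, and ships a CUDA kernel):

```python
import torch
import torch.nn.functional as F

def neighborhood_attention(x, k=7):
    """x: (B, C, H, W). Each pixel attends to its k*k spatial neighborhood."""
    B, C, H, W = x.shape
    pad = k // 2
    # gather the k*k neighbors of every pixel: (B, C*k*k, H*W)
    nb = F.unfold(F.pad(x, (pad, pad, pad, pad)), kernel_size=k)
    nb = nb.view(B, C, k * k, H * W)                 # keys/values per pixel
    q = x.view(B, C, 1, H * W)                       # queries
    attn = (q * nb).sum(1, keepdim=True) / C ** 0.5  # scaled dot products
    attn = attn.softmax(dim=2)                       # over the neighborhood
    out = (attn * nb).sum(2)                         # weighted sum of neighbors
    return out.view(B, C, H, W)
```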
🔥🔥FANs: Fully Attentional Networks🔥🔥
👉#Nvidia unveils the fully attentional networks (FANs)
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Efficient fully attentional design
✅Semantic seg. & object detection
✅Model/source code soon available!
More: https://bit.ly/3vtpITs
👨🏼‍🎨 Open-Source DALL·E 2 is out 👨🏼‍🎨
👉#Pytorch implementation of DALL-E 2, #OpenAI's latest text-to-image neural net.
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅SOTA for text-to-image generation
✅Source code/model under MIT License
✅"Medieval painting of wifi not working"
More: https://bit.ly/3vzsff6