🦆 GAN + Dense Map 🦆
👉CoordGAN: structure-texture disentangled GAN with dense correspondence map
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Novel coordinate space
✅Warping to learn coordinate
✅Encoder for structure representation
✅HQ structure/texture editable images
More: https://bit.ly/3DOlOaB
⚓Unified shape & non-rigid motion⚓
👉CaDeX: SOTA in both shape & non-rigid motion
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Canonical Deformation Coordinate Space
✅Shape + non-rigid motion representation
✅Factorization of def-homeomorphisms
✅Cycle consistency, topology & volume
✅SOTA in modelling deformable objects
More: https://bit.ly/3NM5NX1
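A toy sketch of why factorizing deformations through a canonical space gives cycle consistency for free. It assumes each frame has its own invertible map into a shared canonical space, so the deformation from frame i to frame j is "map i forward, then invert map j". The 1D affine maps and the names `AffineMap` and `deform` are hypothetical stand-ins for illustration, not the learned homeomorphisms from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AffineMap:
    """Toy invertible map from one frame's coordinates to the canonical space."""
    scale: float  # assumed non-zero so the map is invertible
    shift: float

    def forward(self, x: float) -> float:
        # frame coordinate -> canonical coordinate
        return self.scale * x + self.shift

    def inverse(self, y: float) -> float:
        # canonical coordinate -> frame coordinate
        return (y - self.shift) / self.scale


def deform(x: float, src: AffineMap, dst: AffineMap) -> float:
    """Move a point from the src frame to the dst frame via the canonical space."""
    return dst.inverse(src.forward(x))


# Three hypothetical frames, each with its own canonical map.
frame_a = AffineMap(scale=2.0, shift=1.0)
frame_b = AffineMap(scale=0.5, shift=-3.0)
frame_c = AffineMap(scale=4.0, shift=0.25)

x = 0.7
via_b = deform(deform(x, frame_a, frame_b), frame_b, frame_c)  # a -> b -> c
direct = deform(x, frame_a, frame_c)                           # a -> c
round_trip = deform(deform(x, frame_a, frame_b), frame_b, frame_a)  # a -> b -> a
```

Because every pairwise deformation is a composition of one forward map and one inverse map, going a -> b -> c matches a -> c, and a -> b -> a returns the starting point, up to floating-point error.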
📸 ~6 BILLION CLIP-filtered pairs 📸
👉A dataset 14x bigger than the previous largest openly accessible image-text dataset in the world.
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅2.3B English image-text pairs
✅2.2B from 100+ other languages
✅1.3B with no detected language
✅KNN index for quick search
More: https://bit.ly/3LFhKvT
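A minimal sketch of what a KNN lookup over image embeddings does, using brute-force cosine similarity. The toy 3-dimensional vectors, the file names, and the helpers `cosine` and `knn` are made up for illustration; a real index at billions-of-pairs scale would rely on approximate nearest-neighbour search rather than scanning every entry.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn(query, embeddings, k=2):
    """Return the k names whose embeddings are most similar to the query."""
    ranked = sorted(
        embeddings.items(),
        key=lambda item: cosine(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical toy embeddings (real CLIP embeddings have hundreds of dimensions).
toy_index = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.8, 0.3, 0.1],
    "car.jpg": [0.0, 0.2, 0.9],
}

# Stand-in for an embedded text query.
neighbours = knn([1.0, 0.0, 0.0], toy_index, k=2)
```

With this toy data the query ranks `cat.jpg` first and `dog.jpg` second, since both point mostly along the first axis while `car.jpg` is orthogonal to the query.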
🥮 PP-YOLOE: e-version of YOLO 🥮
👉 SOTA object detector up to 149+ FPS!
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Optimized PP-YOLOv2
✅S/M/L/XL for different scenarios
✅149+ FPS, with TensorRT & FP16
✅Source code & models available
More: https://bit.ly/3x454uy
🧙 HD synthesis with LDM 🧙
👉Low-cost diffusion models (DMs) via the latent space of powerful pretrained autoencoders
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Hi-res synthesis of megapixel images
✅Synthesis, inpainting, stochastic SR
✅Large, consistent images of ∼1024px
✅General conditioning via cross-attention
✅Code licensed under MIT License
More: https://bit.ly/3LIVOzS
🎩 SinNeRF: Single Image NeRF 🎩
👉Neural Radiance Field from a single view only
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅UT Austin + UIUC + UOregon + Picsart AI
✅"Looking only once" approach
✅Semi-supervised learning process
✅Geometry/semantic pseudo-labels
✅SOTA in novel-view synthesis
More: https://bit.ly/3ujMZqF
🔥 Transformer-based Tracking 🔥
👉Tracker via Transformer-based model prediction module
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Tracking by Transformer prediction
✅Extending model predictor for bounding boxes
✅SOTA on three public benchmarks
✅Code/models under GNU GPL 3.0 license
More: https://bit.ly/3ucYvUI
👗 In-The-Wild Virtual Try-On 👗
👉StyleGAN-based architecture for appearance flow estimation in VTON application
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Global appearance flow estimation
✅Robust to person/garment misalignment
✅"In-the-wild": person with natural poses
✅Code under CC BY-NC-SA 4.0 license
More: https://bit.ly/3LPR9wl
🎇DALL·E 2 just announced!🎇
👉DALL·E 2 to create realistic images and art from natural language
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅More realistic/accurate, 4x res.
✅Better caption matching
✅Not available yet, waiting list!
More: https://bit.ly/3j9v3bR
👋Forecasting interactions via attention👋
👉Predicting the hand motion trajectory and the future contact points on the next active object
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Object-Centric Transformer (OCT)
✅Self-attention Transformer mechanism
✅Framework to handle uncertainty
✅SOTA on Epic-Kitchens and EGTEA
More: https://bit.ly/3v3PpbI
🍇SmeLU: Smooth Activation Function🍇
👉Google unveils a new smooth activation function: easy to implement, cheap & less error-prone
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Smooth to mitigate irreproducibility
✅Cheap function, better than GELU/Swish
✅Slope rises from 0 to 1 through a quadratic middle region
✅SmeLU as convolution of ReLU with a box function
✅Best reproducibility-accuracy tradeoff
More: https://bit.ly/3xcskXm
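A minimal sketch of the two highlights about the function's shape, assuming the usual piecewise form with half-width `beta`: zero below `-beta`, identity above `beta`, and the quadratic `(x + beta)^2 / (4 * beta)` in between. The function names and the default `beta=1.0` are choices made here for illustration. The second function checks the "ReLU convolved with a box" view numerically by averaging ReLU over a window of width `2 * beta`.

```python
def smelu(x: float, beta: float = 1.0) -> float:
    """Piecewise SmeLU: flat, then quadratic, then linear."""
    if x <= -beta:
        return 0.0
    if x >= beta:
        return x
    # Quadratic bridge: its slope (x + beta) / (2 * beta) goes from 0 to 1.
    return (x + beta) ** 2 / (4.0 * beta)


def relu_box_average(x: float, beta: float = 1.0, steps: int = 2000) -> float:
    """Average of ReLU over [x - beta, x + beta], via the midpoint rule.

    This is ReLU convolved with a box of width 2 * beta and unit area.
    """
    width = 2.0 * beta
    h = width / steps
    total = 0.0
    for i in range(steps):
        u = x - beta + (i + 0.5) * h
        total += max(u, 0.0)
    return total * h / width


# The closed form and the numerical box average agree closely.
samples = [-1.5, -0.6, 0.0, 0.3, 0.9, 2.0]
max_gap = max(abs(smelu(s) - relu_box_average(s)) for s in samples)
```

The pieces join without a jump: at `x = beta` the quadratic evaluates to `beta`, and at `x = -beta` it evaluates to 0, so both the value and the slope are continuous.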
📍Hyper-Dense Landmarks at 150FPS📍
👉#Microsoft unveils the SOTA in dense landmarking + #3D reconstruction. MAGIC.
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Accurately predicts 10× as many landmarks as usual
✅Synthetic data, perfect annotations
✅NO appearance or lighting modelling, NO differentiable rendering
✅#3D @150+FPS with a single CPU thread
✅SOTA in monocular 3D reconstruction
More: https://bit.ly/37pQS40
☀️SunStage: Selfie with the Sun☀️
👉Accurate/tailored reconstruction of facial geometry/reflectance
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Novel personalized scanning
✅Disentanglement of scene params
✅Geometry, materials, lighting, poses
✅Photorealistic with a single selfie video
More: https://bit.ly/36W1Oqx
📫 Generative Neural Avatars 📫
👉3D shapes of people in a variety of garments with corresponding skinning weights
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅ETH + Uni-Tübingen + Max Planck
✅Animatable #3D human in garment
✅Directly from raw posed 3D scans
✅NO canonical, registration, manual weights
✅Geometric detail in clothing deformation
More: https://bit.ly/3M7mCdB
🗨️Conversational program synthesis🗨️
👉Conversational synthesis to translate English into executable code
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Conversational program synthesis
✅New multi-turn programming benchmark
✅Open custom library: JAXFORMER
✅Source code under BSD-3 license
More: https://bit.ly/3jjWWhk
🧯Long Video Diffusion Models🧯
👉#Google unveils a novel diffusion model for video generation
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Straightforward extension of 2D UNet
✅Longer videos via new conditional generation
✅SOTA in unconditional generation
More: https://bit.ly/35Y2rzg
🚙 AutoRF: #3D objects in-the-wild 🚙
👉From #Meta: #3D object from just a single in-the-wild image
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Novel view synthesis from in-the-wild
✅Normalized, object-centric representation
✅Disentangling shape, appearance & pose
✅Exploiting bounding boxes & panoptic segmentation
✅Shape/appearance properties for objects
More: https://bit.ly/3O4ONeQ
🌠GAN-based Darkest Dataset🌠
👉Berkeley + #Intel announce first photorealistic dataset under starlight (no moon, <0.001 lx)
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅"Darkest" dataset ever seen
✅Moonless, no external illumination
✅GAN-tuned physics-based model
✅Clips with dancing, volleyball, flags...
More: https://bit.ly/3LXxMkN
🤖Populating with digital humans🤖
👉ETHZ unveils GAMMA to populate the #3D scene with digital humans
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅GenerAtive Motion primitive MArkers
✅Realistic, controllable, infinite motions
✅Tree-based search to preserve quality
✅SOTA in realistic/controllable motion
More: https://bit.ly/3OgY4AG
🔥#AIwithPapers: we are ~2,000!🔥
💙💛 Simply amazing. Thank you all 💙💛
😈 Invite your friends -> https://news.1rj.ru/str/AI_DeepLearning
😼GARF: Gaussian Activated NeRF😼
👉GARF: Gaussian Activated Radiance Fields for Hi-Fi reconstruction/pose
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅NeRF from imperfect camera poses
✅NO hyper-parameter tuning/initialization
✅Theoretical insight on Gaussian activation
✅Unlocking NeRF for real-world application?
More: https://bit.ly/36bvdfU