AI with Papers - Artificial Intelligence & Deep Learning – Telegram
15.8K subscribers
All the AI with papers. Every day fresh updates about #DeepLearning #MachineLearning #LLM & #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#AI #chatGPT
🐍DS Unsupervised Video Decomposition🐍

👉Novel method to extract persistent elements of a scene

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Scene element as Deformable Sprite (DS)
Deformable Sprites by video auto-encoder
Canonical texture image for appearance
Non-rigid geom. transformation

More: https://bit.ly/37WV9w1
👍4🤯3🔥1🥰1👏1😱1
🥓 L-SVPE for Deep Deblurring 🥓

👉L-SVPE to deblur scenes while recovering high-freq details

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Learned Spatially Varying Pixel Exposures
Next-gen focal-plane sensor + DL
Deep conv decoder for motion deblurring
Superior results over non-optimized exp.

More: https://bit.ly/3uRYQMT
🤩7👍2🤔2🎉1
🧧Hyper-Fast Instance Segmentation🧧

👉Novel Temporally Efficient Vision Transformer (TeViT) for VIS

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Video instance segmentation transformer
Contextual-info at frame/instance level
Nearly convolution-free framework 🤷‍♂️
The new SOTA for VIS, ~70 FPS!
Code & models under MIT license

More: https://bit.ly/3rCMXIn
🔥10👍3👏1🤯1
📗Unified Scene Text/Layout Detection📗

👉World's first hierarchical scene text dataset + novel detection method

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Unified detection & geometric layout
Hierarchical annotations in natural scenes
Word, line, & paragraph level annotations
Source under CC Attribution Share Alike 4.0

More: https://bit.ly/3jRpezV
🔥3🤯21👍1
🙌 #Oculus' new Hand Tracking 🙌

👉Hands can move as naturally and intuitively in the #metaverse as they do in real life

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Hands2.0 powered by CV & ML
Tracking hand-over-hand interactions
Crossing hands, clapping, high-fives
Accurate thumbs-up gesture

More: https://bit.ly/3JXPvY2
🤯64👍2👏1
🎗️New SOTA in #3D human avatar🎗️

👉PHORHUM: photorealistic 3D human from mono-RGB

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Pixel-aligned method for 3D geometry
Unshaded surface color + illumination
Patch-based rendering losses for visible parts
Plausible color estimation for non-visible parts

More: https://bit.ly/3MkvBrA
🤯4👍2🥰21
📟 What's in your hands (#3D) ? 📟

👉Reconstructing hand-held objects (from single RGB) without knowing their 3D templates🤷‍♂️

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Hand is highly predictive of object shape
Reconstruction conditioned on hand articulation
Visual feats. / articulation-aware coords.
Code and models available!

More: https://bit.ly/3vuYn2a
👍9🤯2🥰1
🔋YODO: You Only Demonstrate Once🔋

👉Novel category-level manipulation learned in simulation from a single demonstration video🤯

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
One-shot IL, model-free 6D pose tracking
Demonstration by a single 3rd-person-view video
Manipulation including hi-precision tasks
Category-level Behavior Cloning
Attention for dynamic coords selection
Generalizability to novel unseen obj/env

More: https://bit.ly/3v0V4R4
🤯83👍2😱2🤩2👏1
👗 Dress Code for Virtual Try-On 👗

👉UniMORE (+ YOOX) unveils a novel dataset/approach for virtual try-on.

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Hi-Res paired front-view / full-body
Pixel-level Semantic-Aware Discriminator
9 SOTA VTON approaches / 3 baselines
New SOTA considering res. & garments

More: https://bit.ly/3xKXSUw
3👍3🔥1🤯1
🍃Deep Equilibrium for Optical Flow🍃

👉DEQ: converges faster, uses less memory, and is often more accurate

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Novel formulation of optical flow method
Compatible with prior modeling/data-related improvements
Sparse fixed-point correction for stability
Code/models under GNU Affero GPL v3.0

More: https://bit.ly/3v4fZmi
👍3🥰2🤯1
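
The deep equilibrium idea behind the post above can be illustrated with a toy example: instead of unrolling a fixed number of update steps, the output is defined as the fixed point z* = f(z*, x) of a single update function. The sketch below is only an assumption-laden illustration, not the paper's flow estimator. The `update` function, its weight `w`, and the plain iteration loop are hypothetical stand-ins; real DEQ models typically use a learned update and a black-box root solver.

```python
import math

def update(z: float, x: float, w: float = 0.5) -> float:
    """Toy update rule (hypothetical). With |w| < 1 it is a contraction in z."""
    return math.tanh(w * z + x)

def solve_fixed_point(x: float, w: float = 0.5,
                      tol: float = 1e-8, max_iter: int = 200):
    """Iterate z <- update(z, x) until successive iterates differ by less than tol.

    Returns the approximate fixed point and the number of steps used.
    """
    z = 0.0
    for step in range(1, max_iter + 1):
        z_next = update(z, x, w)
        if abs(z_next - z) < tol:
            return z_next, step
        z = z_next
    return z, max_iter

if __name__ == "__main__":
    z_star, steps = solve_fixed_point(0.3)
    # At the fixed point, applying the update again changes almost nothing.
    print(steps, abs(update(z_star, 0.3) - z_star))
```

Only the current iterate is kept during the forward solve, which gives a feel for why equilibrium models need less memory than storing every unrolled step.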
🌳Ultra High-Resolution Neural Saliency🌳

👉A novel ultra high-resolution saliency detector with dataset!

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Ultra Hi-Res Saliency Detection
5,920 pics at 4K-8K resolution
Pyramid Grafting Network
Cross-Model Grafting Module
AGL: Attention Guided Loss
Code/models under MIT

More: https://bit.ly/3MnU1Rf
6👍3🤯3🔥2🤩1
🪆StyleGAN-Human for fashion 🪆

👉A novel unconditional human generation model based on StyleGAN is out!

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
200,000+ labeled samples (pose/texture)
1024x512 StyleGAN-Human StyleGAN3
512x256 StyleGAN-Human StyleGAN1
Face model for downstream: InsetGAN
Source code and model available!

More: https://bit.ly/3xMg5B2
5👍4🔥3🤯1💩1
💀 OSSO: Skeletal Shape from Outside 💀

👉Anatomic skeleton of a person from 3D surface of body 🦴

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Max Planck + IMATI-CNR + INRIA
DXA images to obtain #3D shape
External body to internal skeleton

More: https://bit.ly/3v7Z5TQ
👍4🤯2🔥1😱1
🎷 Pix2Seq: object detection by #Google 🎷

👉A novel framework to perform object detection as a language modeling task

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Obj. detection as a lang-modeling task
BBs/labels -> seq. of discrete tokens
Encoder-decoder (one token at a time)
Code under Apache License 2.0

More: https://bit.ly/3F49PX3
👍8🤯3🔥1😱1🎉1🤩1
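
The "BBs/labels -> sequence of discrete tokens" step in the post above can be sketched as follows: each box coordinate is quantized into an integer bin, and the class label is appended as one more token. This is a minimal sketch, not the released Pix2Seq code. The bin count `NUM_BINS`, the `[ymin, xmin, ymax, xmax, class]` ordering, the class-token offset, and all function names are assumptions chosen for illustration.

```python
from typing import List, Tuple

NUM_BINS = 1000  # assumed number of coordinate bins (illustrative value)

def quantize(value: float, size: float, num_bins: int = NUM_BINS) -> int:
    """Map a pixel coordinate in [0, size] to an integer bin in [0, num_bins - 1]."""
    frac = min(max(value / size, 0.0), 1.0)
    return min(int(frac * num_bins), num_bins - 1)

def box_to_tokens(box: Tuple[float, float, float, float], label: int,
                  width: float, height: float,
                  num_bins: int = NUM_BINS) -> List[int]:
    """Encode (xmin, ymin, xmax, ymax) plus a class id as five discrete tokens."""
    xmin, ymin, xmax, ymax = box
    return [
        quantize(ymin, height, num_bins),
        quantize(xmin, width, num_bins),
        quantize(ymax, height, num_bins),
        quantize(xmax, width, num_bins),
        num_bins + label,  # class tokens are placed after the coordinate bins
    ]

def tokens_to_box(tokens: List[int], width: float, height: float,
                  num_bins: int = NUM_BINS):
    """Decode five tokens back to an approximate box (bin centers) and class id."""
    ymin, xmin, ymax, xmax, cls = tokens
    def center(t: int, size: float) -> float:
        return (t + 0.5) / num_bins * size
    box = (center(xmin, width), center(ymin, height),
           center(xmax, width), center(ymax, height))
    return box, cls - num_bins

if __name__ == "__main__":
    tokens = box_to_tokens((100, 50, 300, 200), label=3, width=400, height=200)
    print(tokens)
    print(tokens_to_box(tokens, width=400, height=200))
```

A decoder trained on such sequences can then emit one token at a time, as the highlights describe. The round trip is lossy by at most one bin width per coordinate.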
🌹 Generalizable Neural Performer 🌹

👉General neural framework to synthesize free-viewpoint images of arbitrary human performers

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Free-viewpoint synthesis of humans
Implicit Geometric Body Embedding
Screen-Space Occlusion-Aware Blending
GeneBody: 4M frames, multi-view cams

More: https://cutt.ly/SGcnQzn
👍5🔥1🤯1
🚌 Tire-defect inspection 🚌

👉Unsupervised detection of defects in tires using neural networks

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Impurity, same material as tire
Impurity, with different material
Damage by temp/pressure
Crack or etched material

More: https://bit.ly/37GX1JT
5👍3🤩1
🧋#4D Neural Fields🧋

👉4D N.F. visual representations from monocular RGB-D 🤯

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
4D scene completion (occlusions)
Scene completion in cluttered scenes
Novel #AI for contextual point clouds
Data, code, models under MIT license

More: https://cutt.ly/6GveKiJ
👍6🤯2🔥1🥰1
👔Largest dataset of human-object interactions👔

👉BEHAVE by Google: largest dataset of human-object interactions

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
8 subjects, 20 objects, 5 envs.
321 clips with 4 Kinect RGB-D
Masks and segmented point clouds
3D SMPL & mesh registration
Textured scan reconstructions

More: https://bit.ly/3Lx6NNo
👏5👍4🔥21😱1🤩1
🦴ENARF-GAN Neural Articulations🦴

👉Unsupervised method for 3D geometry-aware representation of articulated objects

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Novel efficient neural representation
Tri-planes deformation fields for training
Novel GAN for articulated representations
Controllable 3D from real unlabeled pic

More: https://bit.ly/3xYqedN
🤯3👍21🔥1🥰1
🖲️ HuMMan: 4D human dataset 🖲️

👉HuMMan: 4D dataset with 1000 humans, 400k sequences & 60M frames 🤯

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
RGB, pt-clouds, keypts, SMPL, texture
Mobile device in the sensor suite
500+ actions to cover movements

More: https://bit.ly/3vTRW8Z
🥰2😱2👍1🤯1
🔥Neighborhood Attention Transformer 🔥

👉A novel transformer for both image classification and downstream vision tasks

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Neighborhood Attention (NA)
Neighborhood Attention Transformer, NAT
Faster training/inference, good throughput
Checkpoints, train, #CUDA kernel available

More: https://bit.ly/3F5aVSo
🤯4👍3🔥1😱1
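
The Neighborhood Attention named in the post above restricts each token's attention to a small window of nearby tokens instead of the whole sequence. Below is a rough 1D sketch in plain Python, under several assumptions: the input vectors are used directly as queries, keys, and values (no learned projections), the window is shifted at the borders so it always holds `kernel_size` tokens, and the function name is hypothetical. The released NAT code works on 2D feature maps with dedicated CUDA kernels, so this only conveys the idea.

```python
import math
from typing import List

def neighborhood_attention_1d(x: List[List[float]],
                              kernel_size: int = 3) -> List[List[float]]:
    """Softmax attention where token i only attends to kernel_size nearby tokens."""
    n, d = len(x), len(x[0])
    if kernel_size % 2 == 0 or kernel_size > n:
        raise ValueError("kernel_size must be odd and at most the sequence length")
    half = kernel_size // 2
    out = []
    for i in range(n):
        # Shift the window at the borders so it always covers kernel_size tokens.
        start = min(max(i - half, 0), n - kernel_size)
        neigh = x[start:start + kernel_size]
        # Scaled dot-product scores between token i and its neighbors.
        scores = [sum(a * b for a, b in zip(x[i], v)) / math.sqrt(d) for v in neigh]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of neighbor vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, neigh)) for j in range(d)])
    return out

if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
    for row in neighborhood_attention_1d(tokens, kernel_size=3):
        print(row)
```

The cost per token depends on `kernel_size` rather than on the sequence length, which is the intuition behind the speed claims in the highlights.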