AI with Papers - Artificial Intelligence & Deep Learning – Telegram
15.8K subscribers
146 photos
260 videos
14 files
1.36K links
All the AI with papers. Fresh updates every day on #DeepLearning #MachineLearning #LLM & #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#AI #chatGPT
🎷 Pix2Seq: object detection by #Google 🎷

👉A novel framework to perform object detection as a language modeling task

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Obj. detection as a lang-modeling task
Bounding boxes/labels -> sequence of discrete tokens
Encoder-decoder (one token at a time)
Code under Apache License 2.0

More: https://bit.ly/3F49PX3
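
To make the "boxes/labels -> sequence of discrete tokens" idea concrete, here is a minimal sketch, not the official Pix2Seq code. It assumes each object is written as five tokens (ymin, xmin, ymax, xmax, class), that coordinates are quantized into `n_bins` integer bins, and that class and end-of-sequence ids sit right after the coordinate bins in one shared vocabulary. The function names, `n_bins`, `num_classes` and the vocabulary layout are illustrative assumptions.

```python
def quantize(coord, size, n_bins):
    """Map a pixel coordinate in [0, size] to an integer bin in [0, n_bins - 1]."""
    bin_id = int(coord * n_bins / size)
    return min(max(bin_id, 0), n_bins - 1)


def boxes_to_tokens(boxes, img_h, img_w, n_bins=1000, num_classes=80):
    """Flatten (ymin, xmin, ymax, xmax, class_id) boxes into one token list.

    Assumed vocabulary layout (hypothetical):
      [0, n_bins)                      -> quantized coordinates
      [n_bins, n_bins + num_classes)   -> class labels
      n_bins + num_classes             -> end-of-sequence token
    """
    eos = n_bins + num_classes
    tokens = []
    for ymin, xmin, ymax, xmax, class_id in boxes:
        tokens.extend([
            quantize(ymin, img_h, n_bins),
            quantize(xmin, img_w, n_bins),
            quantize(ymax, img_h, n_bins),
            quantize(xmax, img_w, n_bins),
            n_bins + class_id,
        ])
    tokens.append(eos)
    return tokens


# One box of class 3 in a 200x400 image, using 100 bins
print(boxes_to_tokens([(10, 20, 110, 220, 3)], img_h=200, img_w=400, n_bins=100))
# -> [5, 5, 55, 55, 103, 180]
```

An encoder-decoder model would then be trained to emit such a list one token at a time, conditioned on the image.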
👍8🤯3🔥1😱1🎉1🤩1
🌹 Generalizable Neural Performer 🌹

👉General neural framework to synthesize free-viewpoint images of arbitrary human performers

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Free-viewpoint synthesis of humans
Implicit Geometric Body Embedding
Screen-Space Occlusion-Aware Blending
GeneBody: 4M frames, multi-view cams

More: https://cutt.ly/SGcnQzn
👍5🔥1🤯1
🚌 Tire-defect inspection 🚌

👉Unsupervised detection of defects in tires using neural networks

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Impurity, same material as tire
Impurity, with different material
Damage by temp/pressure
Crack or etched material

More: https://bit.ly/37GX1JT
5👍3🤩1
🧋#4D Neural Fields🧋

👉4D N.F. visual representations from monocular RGB-D 🤯

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
4D scene completion (occlusions)
Scene completion in cluttered scenes
Novel #AI for contextual point clouds
Data, code, models under MIT license

More: https://cutt.ly/6GveKiJ
👍6🤯2🔥1🥰1
👔Largest dataset of human-object interactions 👔

👉BEHAVE by Google: largest dataset of human-object interactions

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
8 subjects, 20 objects, 5 envs.
321 clips captured with 4 Kinect RGB-D cameras
Masks and segmented point clouds
3D SMPL & mesh registration
Textured scan reconstructions

More: https://bit.ly/3Lx6NNo
👏5👍4🔥21😱1🤩1
🦴ENARF-GAN Neural Articulations🦴

👉Unsupervised method for 3D geometry-aware representation of articulated objects

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Novel efficient neural representation
Tri-plane deformation fields for training
Novel GAN for articulated representations
Controllable 3D from real unlabeled pics

More: https://bit.ly/3xYqedN
🤯3👍21🔥1🥰1
🖲️ HuMMan: 4D human dataset 🖲️

👉HuMMan: 4D dataset with 1000 humans, 400k sequences & 60M frames 🤯

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
RGB, pt-clouds, keypts, SMPL, texture
Mobile device in the sensor suite
500+ actions to cover movements

More: https://bit.ly/3vTRW8Z
🥰2😱2👍1🤯1
🔥Neighborhood Attention Transformer 🔥

👉A novel transformer for both image classification and downstream vision tasks

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Neighborhood Attention (NA)
Neighborhood Attention Transformer, NAT
Faster training/inference, good throughput
Checkpoints, train, #CUDA kernel available

More: https://bit.ly/3F5aVSo
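
A rough idea of what Neighborhood Attention does, as a minimal 1D sketch in plain Python (not the released #CUDA kernel): each token attends only to a fixed-size window of nearby tokens instead of the whole sequence. The border handling (shifting the window so every token still gets exactly `kernel` neighbors) is an assumption here, and learned projections and position biases are left out. All function names are illustrative.

```python
import math


def neighborhood(i, n, kernel):
    """Indices of the `kernel` tokens nearest to position i (window shifted at borders)."""
    start = min(max(i - kernel // 2, 0), n - kernel)
    return list(range(start, start + kernel))


def neighborhood_attention_1d(q, k, v, kernel=3):
    """Scaled dot-product attention restricted to each token's local neighborhood.

    q, k, v are lists of equal length holding per-token feature vectors (lists of floats).
    """
    n, d = len(q), len(q[0])
    assert n >= kernel, "sequence must be at least as long as the window"
    out = []
    for i in range(n):
        idx = neighborhood(i, n, kernel)
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d) for j in idx]
        top = max(scores)                      # subtract the max for a stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(w / total * v[j][c] for w, j in zip(weights, idx))
            for c in range(len(v[0]))
        ])
    return out
```

With a window of size `kernel`, the cost grows linearly with the number of tokens rather than quadratically, which is in line with the "faster training/inference" highlight.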
🤯4👍3🔥1😱1
🔥🔥FANs: Fully Attentional Networks🔥🔥

👉#Nvidia unveils the fully attentional networks (FANs)

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Efficient fully attentional design
Semantic seg. & object detection
Model/source code available soon!

More: https://bit.ly/3vtpITs
🔥7🤯3👍21
👨🏼‍🎨 Open-Source DALL·E 2 is out 👨🏼‍🎨

👉#Pytorch implementation of DALL-E 2, #OpenAI's latest text-to-image neural net.

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
SOTA for text-to-image generation
Source code/model under MIT License
"Medieval painting of wifi not working"

More: https://bit.ly/3vzsff6
🤯14👍6😁1
ViTPose: Transformer for Pose

👉ViTPose from ViTAE, ViT for human pose

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Plain/nonhierarchical ViT for pose
Deconv-layers after ViT for keypoints
Just the baseline is the new SOTA
Source code & models available soon!

More: https://bit.ly/3MJ0kz1
👍5🤯4🔥1🥰1
🧳 Unsupervised HD Motion Transfer 🧳

👉Novel e2e unsupervised motion transfer for image animation

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
TPS motion estimation + Dropout
Novel E2E unsupervised motion transfer
Optical flow + multi-res. occlusion mask
Code and models under MIT license

More: https://bit.ly/3MGNPns
🔥8👍6🤯42😱2
🚤 Neural Self-Calibration in the wild 🚤

👉 Learning algorithm to regress calibration params from in-the-wild clips

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Params purely from self-supervision
S.S. depth/pose learning as objective
Perspective, fisheye, catadioptric: no changes
SOTA results on EuRoC MAV dataset

More: https://bit.ly/3w1n6LB
👍8🤩2🔥1🥰1🤯1
🦅 ConDor: S.S. Canonicalization 🦅

👉Self-Supervised Canonicalization for full/partial 3D point clouds

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
RRC + Stanford + KAIST + Brown
On top of Tensor Field Networks (TFNs)
Unseen 3D -> equivariant canonical
Co-segmentation, NO supervision
Code and model under MIT license

More: https://bit.ly/3MNDyGa
🔥4👍1🤩1
🦀 Event-aided Direct Sparse Odometry 🦀

👉EDS: direct monocular visual odometry using events/frames

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Mono 6-DOF visual odometry + events
Direct photometric bundle adjustment
Camera motion tracking by sparse pixels
A new dataset with HQ events and frames

More: https://bit.ly/3s9FiBN
🔥5👍3🤯1😱1
🫀BlobGAN: Blob-Disentangled Scene🫀

👉Unsupervised, mid-level (blobs) generation of scenes

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Spatial, depth-ordered Gaussian blobs
Reaching for supervised level, and more
Source under BSD-2 "Simplified" License

More: https://bit.ly/3kRyGnj
🔥8👍1🥰1🤯1😱1
🦕E2EVE editor via pre-trained artist🦕

👉E2EVE generates a new version of the source image that resembles the "driver" one

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Blending regions by driver image
E2E cond-probability of the edits
S.S. augmenting in target domain
Implemented as SOTA transformer
Code/models available (soon)

More: https://bit.ly/3P9TDYW
🤯5👍2🤩21🔥1
🐶 Bringing pets in #metaverse 🐶

👉ARTEMIS: pipeline for generating articulated neural pets for virtual worlds

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
ARTiculated, appEarance, Mo-synthesIS
Motion control, animation & rendering
Neural-generated (NGI) animal engine
SOTA animal mocap + neural control

More: https://bit.ly/3LZSLDU
4👍2🥰2🤩1
😍Animated hand in 1972, damn romantic😍

👉Q: is #VR the technology that developed least in the last 30 years? 🤔

More: https://bit.ly/3snxNaq
👍73🤯1
⏏️Ensembling models for GAN training⏏️

👉Pretrained vision models improve GAN training. FID improves by 1.5 to 2×!

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
CV models as ensemble of discriminators
Improving GAN in limited / large-scale set
10k training samples match StyleGAN2 trained on 1.6M
Source code / models under MIT license

More: https://bit.ly/3wgUVsr
🤯6🔥2
🤯Cooperative Driving + AUTOCASTSIM🤯

👉COOPERNAUT: cross-vehicle perception for vision-based cooperative driving

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
UTexas + #Stanford + #Sony #AI
LiDAR encoded into compact point-based representations
Network-augmented simulator
Source code and models available

More: https://bit.ly/3sr5HLk
🔥6🤯3🥰1