🔥 Depth Anything V2 is out! 🔥
👉 Depth Anything V2 outperforms V1 in robustness and fine-grained detail. Trained on 595K synthetic labeled images and 62M+ real unlabeled images, it is the new SOTA in monocular depth estimation (MDE). Code & models available (minimal inference sketch below)💙
👉Review https://t.ly/QX9Nu
👉Paper arxiv.org/pdf/2406.09414
👉Project depth-anything-v2.github.io/
👉Repo github.com/DepthAnything/Depth-Anything-V2
👉Data huggingface.co/datasets/depth-anything/DA-2K
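👉For a quick local test, a minimal sketch in the spirit of the repo README (the checkpoint path, the config values for the 'vitl' variant, and the image name are placeholders/assumptions):

```python
# Minimal sketch, assuming the repo is cloned and a ViT-L checkpoint
# was downloaded to ./checkpoints (paths and configs are assumptions).
import cv2
import torch
from depth_anything_v2.dpt import DepthAnythingV2

model = DepthAnythingV2(encoder='vitl', features=256,
                        out_channels=[256, 512, 1024, 1024])
model.load_state_dict(torch.load('checkpoints/depth_anything_v2_vitl.pth',
                                 map_location='cpu'))
model.eval()

raw_img = cv2.imread('example.jpg')   # BGR image, any resolution
depth = model.infer_image(raw_img)    # HxW float32 raw depth map
```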
🪅 Anomaly Object Detection 🪅
👉The University of Edinburgh introduces a novel anomaly detection problem: identifying 'odd-looking' objects relative to the other instances in a multi-view scene. Code announced💙
👉Review https://t.ly/3dGHp
👉Paper arxiv.org/pdf/2406.20099
👉Repo https://lnkd.in/d9x6FpUq
🪩 MimicMotion: HQ Motion Generation 🪩
👉#Tencent open-sources MimicMotion, a novel controllable video generation framework that can generate HQ videos of arbitrary length by mimicking specific motion guidance. Source code available💙
👉Review https://t.ly/XFoin
👉Paper arxiv.org/pdf/2406.19680
👉Project https://lnkd.in/eW-CMg_C
👉Code https://lnkd.in/eZ6SC2bc
🪴 CAVIS: SOTA Context-Aware Segmentation🪴
👉DGIST unveils Context-Aware Video Instance Segmentation (CAVIS), a novel framework designed to enhance instance association by integrating contextual information adjacent to each object. It's the new SOTA on several benchmarks. Source code announced💙
👉Review https://t.ly/G5obN
👉Paper arxiv.org/pdf/2407.03010
👉Repo github.com/Seung-Hun-Lee/CAVIS
👉Project seung-hun-lee.github.io/projects/CAVIS
🔥 Segment Any 4D Gaussians 🔥
👉SA4D is a novel framework for segmenting anything in the #4D Gaussian world: HQ segmentation within seconds in 4D Gaussians, with removal, recoloring, composition, and rendering of HQ masks. Source code available by August 2024💙
👉Review https://t.ly/uw3FS
👉Paper https://arxiv.org/pdf/2407.04504
👉Project https://jsxzs.github.io/sa4d/
👉Repo https://github.com/hustvl/SA4D
🤖 CODERS: Stereo Detection, 6D & Shape 🤖
👉CODERS: a one-stage approach for Category-level Object Detection, pose Estimation and Reconstruction from Stereo images. Source code announced💙
👉Review https://t.ly/Xpizz
👉Paper https://lnkd.in/dr5ZxC46
👉Project xingyoujun.github.io/coders/
👉Repo (TBA)
🐸 Tracking Everything via Decomposition 🐸
👉Hefei unveils a novel decoupled representation that separates static scenes from dynamic objects in terms of motion and appearance, yielding more robust tracking through occlusions and deformations. Source code announced under MIT license💙
👉Review https://t.ly/OsFTO
👉Paper https://arxiv.org/pdf/2407.06531
👉Repo github.com/qianduoduolr/DecoMotion
🍾TAPVid-3D: benchmark for TAP-3D🍾
👉#Deepmind (+ University College London & Oxford) introduces TAPVid-3D, a new benchmark for evaluating long-range Tracking Any Point in 3D: 4,000+ real-world videos drawn from three different data sources, spanning a variety of object types, motion patterns, and indoor/outdoor environments. Data & code available, Apache 2.0💙
👉Review https://t.ly/SsptD
👉Paper arxiv.org/pdf/2407.05921
👉Project tapvid3d.github.io/
👉Code github.com/google-deepmind/tapnet/tree/main/tapnet/tapvid3d
🔥 940+ FPS Multi-Person Pose Estimation 🔥
👉RTMW (Real-Time Multi-person Whole-body pose estimation) is a series of high-performance models for 2D/3D whole-body pose estimation. Over 940 FPS on #GPU! Code & models available (quick-start sketch below)💙
👉Review https://t.ly/XkBmg
👉Paper arxiv.org/pdf/2407.08634
👉Repo github.com/open-mmlab/mmpose/tree/main/projects/rtmpose
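👉Quick-start sketch via mmpose's high-level inferencer. Assumptions: in your installed mmpose version the 'wholebody' alias resolves to an RTMPose/RTMW-style whole-body model, and 'demo.jpg' is a placeholder image:

```python
# Sketch only: the alias-to-model mapping depends on the mmpose release.
from mmpose.apis import MMPoseInferencer

inferencer = MMPoseInferencer('wholebody')       # downloads config + weights
result = next(inferencer('demo.jpg', show=False))
person0 = result['predictions'][0][0]            # first image, first person
print(len(person0['keypoints']), 'whole-body keypoints detected')
```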
🥥 OmniNOCS: largest 3D NOCS 🥥
👉OmniNOCS by #Google (+ Georgia Tech) is a unified NOCS (Normalized Object Coordinate Space) dataset containing data across different domains with 90+ object classes; the largest NOCS dataset to date. Data & code available under Apache 2.0💙
👉Review https://t.ly/xPgBn
👉Paper arxiv.org/pdf/2407.08711
👉Project https://omninocs.github.io/
👉Data github.com/google-deepmind/omninocs
💌 KineTy: Typography Diffusion 💌
👉GIST introduces novel, realistic kinetic typography generation driven by text, guiding video diffusion models to achieve visually pleasing text appearances. Repo to be released under Attribution-NC 4.0💙
👉Review https://t.ly/2FWo9
👉Paper arxiv.org/pdf/2407.10476
👉Project seonmip.github.io/kinety/
👉Repo github.com/SeonmiP/KineTy/tree/main
📈Gradient Boosting Reinforcement Learning📈
👉#Nvidia unveils GBRL, a framework that brings Gradient Boosting Trees (GBT) to the RL domain, adapting them to RL's unique challenges such as non-stationarity and the absence of predefined targets. Code released (toy illustration of the idea below)💙
👉Review https://t.ly/zv9pl
👉Paper https://arxiv.org/pdf/2407.08250
👉Code https://github.com/NVlabs/gbrl
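👉GBRL's own API is documented in the repo; as a toy illustration of the core idea only (boosted trees as the RL function approximator, here via generic fitted Q-iteration with scikit-learn, NOT the GBRL API):

```python
# Toy fitted Q-iteration with boosted trees, an illustration of the
# concept rather than NVIDIA's GBRL API. Transitions are synthetic.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n, gamma = 1000, 0.99
S = rng.normal(size=(n, 1))                         # states
A = rng.integers(0, 2, size=(n, 1)).astype(float)   # 2 discrete actions
R = -np.abs(S[:, 0]) + (A[:, 0] == (S[:, 0] > 0))   # toy reward signal
S2 = S + rng.normal(scale=0.1, size=(n, 1))         # next states

X = np.hstack([S, A])
q = GradientBoostingRegressor(n_estimators=50).fit(X, R)
for _ in range(5):                                   # Bellman backups
    q_next = np.maximum(
        q.predict(np.hstack([S2, np.zeros((n, 1))])),
        q.predict(np.hstack([S2, np.ones((n, 1))])))
    q = GradientBoostingRegressor(n_estimators=50).fit(X, R + gamma * q_next)
```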
Hi folks,
I need your help 🙏
👉 Could you tell me what you think about the duration of the hiring process for #AI roles? Any comment here will be appreciated :)
Vote here: https://t.ly/UMRXH
Thanks <3
🧿 Shape of Motion for 4D 🧿
👉 Google (+Berkeley) unveils a novel method capable of reconstructing generic dynamic scenes, featuring explicit, full-sequence-long 3D motion, from casually captured monocular videos. Impressive tracking capabilities. Source Code released 💙
👉Review https://t.ly/d9RsA
👉Project https://shape-of-motion.github.io/
👉Paper arxiv.org/pdf/2407.13764
👉Code github.com/vye16/shape-of-motion/
🎭 TRG: new SOTA 6DoF Head 🎭
👉ECE (Korea) unveils TRG, a novel landmark-based method for estimating 6DoF head pose that stands out for its explicit bidirectional interaction structure. Experiments on ARKitFace & BIWI confirm it's the new SOTA. Source code & models to be released💙
👉Review https://t.ly/lOIRA
👉Paper https://lnkd.in/dCWEwNyF
👉Code https://lnkd.in/dzRrwKBD
🏆Who's the REAL SOTA tracker in the world?🏆
👉The BofN meta-tracker outperforms existing SOTA trackers by a large margin on nine standard benchmarks (LaSOT, TrackingNet, GOT-10K, VOT2019, VOT2021, VOT2022, UAV123, OTB100, and WebUAV-3M). Source code available💙
👉Review https://t.ly/WB9AR
👉Paper https://arxiv.org/pdf/2407.15707
👉Code github.com/BasitAlawode/Best_of_N_Trackers
🐢 TAPTRv2: new SOTA for TAP 🐢
👉TAPTRv2: a Transformer-based approach built upon TAPTR for solving the Tracking Any Point (TAP) task. TAPTR borrows designs from DETR, formulating each tracking point as a point query so that well-studied operations from DETR-like algorithms can be leveraged. Source code of V1 available; V2 coming💙
👉Review https://t.ly/H84ae
👉Paper v1 https://lnkd.in/d4vD_6xx
👉Paper v2 https://lnkd.in/dE_TUzar
👉Project https://taptr.github.io/
👉Code https://lnkd.in/dgfs9Qdy
🧱 EAFormer: Scene Text Segmentation 🧱
👉A novel Edge-Aware Transformer to segment text more accurately, especially at the edges. FULL re-annotation of COCO_TS and MLT_S! Code coming; data available on 🤗
👉Review https://t.ly/0G2uX
👉Paper arxiv.org/pdf/2407.17020
👉Project hyangyu.github.io/EAFormer/
👉Data huggingface.co/datasets/HaiyangYu/TextSegmentation/tree/main
👽 Keypoint Promptable Re-ID 👽
👉KPR is a novel formulation of the ReID problem that explicitly complements the input BBox with a set of semantic keypoints indicating the intended target. Code, dataset and annotations coming soon💙
👉Review https://t.ly/vCXV_
👉Paper https://arxiv.org/pdf/2407.18112
👉Repo github.com/VlSomers/keypoint_promptable_reidentification
🎁 A guide for modern CV 🎁
👉In the last 18 months I received 1,100+ applications for research roles. Most applicants don't deeply know a few milestones in CV. Here's a short collection of mostly-free resources for spending a bit of good time this summer.
𝐁𝐨𝐨𝐤𝐬:
✅DL with Python https://t.ly/VjaVx
✅Python OOP https://t.ly/pTQRm
𝐕𝐢𝐝𝐞𝐨 𝐂𝐨𝐮𝐫𝐬𝐞𝐬:
✅Berkeley | Modern CV (2023) https://t.ly/AU7S3
𝐋𝐢𝐛𝐫𝐚𝐫𝐢𝐞𝐬:
✅PyTorch https://lnkd.in/dTvJbjAx
✅PyTorch Lightning https://lnkd.in/dAruPA6T
✅Albumentations https://albumentations.ai/ (tiny usage sketch after this list)
𝐏𝐚𝐩𝐞𝐫𝐬:
✅EfficientNet https://lnkd.in/dTsT44ae
✅ViT https://lnkd.in/dB5yKdaW
✅UNet https://lnkd.in/dnpKVa6T
✅DeepLabV3+ https://lnkd.in/dVvqkmPk
✅YOLOv1: https://lnkd.in/dQ9rs53B
✅YOLOv2: arxiv.org/abs/1612.08242
✅YOLOX: https://lnkd.in/d9ZtsF7g
✅SAM: https://arxiv.org/abs/2304.02643
👉More papers and the full list: https://t.ly/WAwAk
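👉As a taste of the tooling, a tiny Albumentations sketch (the canonical Compose pattern from the library docs; the random image stands in for a real training sample):

```python
import albumentations as A
import numpy as np

transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
])
image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
augmented = transform(image=image)['image']   # same shape, augmented pixels
```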
🪄 Diffusion Models for Transparency 🪄
👉MIT (+ #Google) unveils Alchemist, a novel method to control material attributes of objects, like roughness, metallic, albedo & transparency, in real images. Amazing work, but no code announced🥺
👉Review https://t.ly/U98_G
👉Paper arxiv.org/pdf/2312.02970
👉Project www.prafullsharma.net/alchemist/