🪴 CAVIS: SOTA Context-Aware Segmentation🪴
👉DGIST unveils Context-Aware Video Instance Segmentation (CAVIS), a novel framework designed to enhance instance association by integrating the contextual information adjacent to each object. It's the new SOTA on several benchmarks. Source Code announced💙
👉Review https://t.ly/G5obN
👉Paper arxiv.org/pdf/2407.03010
👉Repo github.com/Seung-Hun-Lee/CAVIS
👉Project seung-hun-lee.github.io/projects/CAVIS
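The core trick is easy to picture. Below, a minimal sketch (my own illustration, not the authors' code; the names and the ring-pooling choice are assumptions) of enriching each instance embedding with features pooled from a ring of context around its mask:
```python
# Illustrative sketch only, NOT the CAVIS implementation: pair each instance's
# masked-average embedding with one pooled from a dilated ring around its mask.
import torch
import torch.nn.functional as F

def context_aware_embeddings(feat, masks, ring=5):
    """feat: (C,H,W) backbone features; masks: (N,H,W) binary instance masks."""
    m = masks.float()
    # Dilate each mask via max-pooling, then subtract it to keep only the ring.
    dilated = F.max_pool2d(m[None], kernel_size=2 * ring + 1, stride=1, padding=ring)[0]
    ctx = (dilated - m).clamp(min=0)                                  # (N,H,W)
    def pool(w):                                                      # masked average pooling
        return (feat[None] * w[:, None]).sum((-2, -1)) / w.sum((-2, -1)).clamp(min=1)[:, None]
    return torch.cat([pool(m), pool(ctx)], dim=1)                     # (N, 2C)

feat = torch.randn(256, 64, 64)
masks = torch.rand(3, 64, 64) > 0.95
emb = context_aware_embeddings(feat, masks)   # (3, 512): object + surrounding context
```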
🔥 Segment Any 4D Gaussians 🔥
👉SA4D is a novel framework to segment anything in the #4D Gaussian world: HQ segmentation within seconds in 4D Gaussians, with the ability to remove, recolor, compose, and render HQ masks of anything. Source Code available within August 2024💙
👉Review https://t.ly/uw3FS
👉Paper https://arxiv.org/pdf/2407.04504
👉Project https://jsxzs.github.io/sa4d/
👉Repo https://github.com/hustvl/SA4D
🤖 CODERS: Stereo Detection, 6D & Shape 🤖
👉CODERS: a one-stage approach for Category-level Object Detection, pose Estimation and Reconstruction from Stereo images. Source Code announced💙
👉Review https://t.ly/Xpizz
👉Paper https://lnkd.in/dr5ZxC46
👉Project xingyoujun.github.io/coders/
👉Repo (TBA)
🐸 Tracking Everything via Decomposition 🐸
👉Hefei unveils a novel decoupled representation that divides static scenes and dynamic objects in terms of motion and appearance, for more robust tracking through occlusions and deformations. Source Code announced under MIT License💙
👉Review https://t.ly/OsFTO
👉Paper https://arxiv.org/pdf/2407.06531
👉Repo github.com/qianduoduolr/DecoMotion
🍾TAPVid-3D: benchmark for TAP-3D🍾
👉#Deepmind (+University College London & Oxford) introduces TAPVid-3D, a new benchmark for evaluating long-range Tracking Any Point in 3D: 4,000+ real-world videos from three different data sources, spanning a variety of object types, motion patterns, and indoor/outdoor environments. Data & Code available, Apache 2.0💙
👉Review https://t.ly/SsptD
👉Paper arxiv.org/pdf/2407.05921
👉Project tapvid3d.github.io/
👉Code github.com/google-deepmind/tapnet/tree/main/tapnet/tapvid3d
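For intuition, here is an illustrative 3D point-tracking accuracy metric in the spirit of the benchmark (not the official implementation; the thresholds below are made up, the real metrics live in the linked code):
```python
# Illustrative only: fraction of visible tracked points within a set of 3D
# distance thresholds, averaged over thresholds (thresholds are placeholders).
import numpy as np

def pct_within_3d(pred, gt, visible, thresholds=(0.05, 0.1, 0.2, 0.4, 0.8)):
    """pred, gt: (T, N, 3) trajectories in meters; visible: (T, N) bool mask."""
    err = np.linalg.norm(pred - gt, axis=-1)          # (T, N) Euclidean error
    err = err[visible]                                # score only visible points
    return np.mean([(err < t).mean() for t in thresholds])

pred = np.random.rand(24, 100, 3)
gt = pred + np.random.randn(24, 100, 3) * 0.05
print(pct_within_3d(pred, gt, np.ones((24, 100), bool)))
```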
🔥 940+ FPS Multi-Person Pose Estimation 🔥
👉RTMW (Real-Time Multi-person Whole-body pose estimation models) is a series of high-performance models for 2D/3D whole-body pose estimation. Over 940 FPS on #GPU! Code & models available💙
👉Review https://t.ly/XkBmg
👉Paper arxiv.org/pdf/2407.08634
👉Repo github.com/open-mmlab/mmpose/tree/main/projects/rtmpose
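A minimal way to try such models is MMPose's high-level inferencer; a sketch below, assuming mmpose >= 1.1 and that the RTMW config alias still exists under that name (check the linked repo for current names):
```python
# Sketch with MMPose's high-level inferencer; the RTMW config name below is an
# assumption, verify against the linked repo before running.
from mmpose.apis import MMPoseInferencer

inferencer = MMPoseInferencer(pose2d='rtmw-l_8xb320-270e_cocktail14-384x288')
result = next(inferencer('crowd.jpg', show=False))   # any image path works
for person in result['predictions'][0]:              # one entry per detected person
    print(len(person['keypoints']), 'whole-body keypoints')
```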
🥥 OmniNOCS: largest 3D NOCS 🥥
👉OmniNOCS by #Google (+Georgia) is a unified NOCS (Normalized Object Coordinate Space) dataset spanning multiple domains and 90+ object classes; the largest NOCS dataset to date. Data & Code available under Apache 2.0💙
👉Review https://t.ly/xPgBn
👉Paper arxiv.org/pdf/2407.08711
👉Project https://omninocs.github.io/
👉Data github.com/google-deepmind/omninocs
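Quick refresher on what a NOCS map encodes, per the standard NOCS convention (generic background, not OmniNOCS-specific code): every object point is normalized into a canonical unit cube, so each pixel of a NOCS map stores a 3D coordinate in [0, 1]^3:
```python
# Standard NOCS normalization: center the object, scale its tight bounding-box
# diagonal to 1, then shift into the unit cube.
import numpy as np

def to_nocs(points):
    """points: (N, 3) object-frame points -> normalized object coordinates."""
    lo, hi = points.min(0), points.max(0)
    center = (lo + hi) / 2
    scale = np.linalg.norm(hi - lo)          # tight-bbox diagonal -> 1
    return (points - center) / scale + 0.5   # centered, inside [0, 1]^3

pts = np.random.randn(1000, 3) * [0.2, 0.1, 0.4]
nocs = to_nocs(pts)
assert nocs.min() >= 0 and nocs.max() <= 1
```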
💌 KineTy: Typography Diffusion 💌
👉GIST introduces a novel method for realistic kinetic typography generation driven by text, guiding video diffusion models to achieve visually pleasing text appearances. Repo to be released under Attribution-NC 4.0💙
👉Review https://t.ly/2FWo9
👉Paper arxiv.org/pdf/2407.10476
👉Project seonmip.github.io/kinety/
👉Repo github.com/SeonmiP/KineTy/tree/main
📈Gradient Boosting Reinforcement Learning📈
👉#Nvidia unveils GBRL, a framework that brings the advantages of Gradient Boosting Trees to the RL domain, adapting them to challenges like non-stationarity and the absence of predefined targets. Code released💙
👉Review https://t.ly/zv9pl
👉Paper https://arxiv.org/pdf/2407.08250
👉Code https://github.com/NVlabs/gbrl
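To see the flavor of the idea (this is NOT the GBRL API; see the linked repo for the real library): gradient boosting fits regression trees stage-wise to a moving RL target, e.g. TD-style residuals. A toy sketch with scikit-learn:
```python
# Toy illustration of tree-based value estimation, not the GBRL library: fit
# regression trees stage-wise to residuals of a moving target.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
states = rng.uniform(-1, 1, size=(2000, 4))            # toy state features
returns = states[:, 0] + 0.1 * rng.standard_normal(2000)

trees, lr, value = [], 0.1, np.zeros(2000)
for _ in range(50):                                    # boosting iterations
    residual = returns - value                         # stand-in for a TD error
    t = DecisionTreeRegressor(max_depth=3).fit(states, residual)
    trees.append(t)
    value += lr * t.predict(states)

predict = lambda s: lr * sum(t.predict(s) for t in trees)
print(predict(states[:3]), returns[:3])
```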
Hi folks,
I need your help 🙏
👉 Could you share what you think about the duration of the hiring process for #AI roles? Any comment will be appreciated :)
Vote here: https://t.ly/UMRXH
Thanks <3
🧿 Shape of Motion for 4D 🧿
👉 Google (+Berkeley) unveils a novel method capable of reconstructing generic dynamic scenes, featuring explicit, full-sequence-long 3D motion, from casually captured monocular videos. Impressive tracking capabilities. Source Code released 💙
👉Review https://t.ly/d9RsA
👉Project https://shape-of-motion.github.io/
👉Paper arxiv.org/pdf/2407.13764
👉Code github.com/vye16/shape-of-motion/
🎭 TRG: new SOTA 6DoF Head 🎭
👉ECE (Korea) unveils TRG, a novel landmark-based method for 6DoF head pose estimation that stands out for its explicit bidirectional interaction structure. Experiments on ARKitFace & BIWI confirm it's the new SOTA. Source Code & Models to be released💙
👉Review https://t.ly/lOIRA
👉Paper https://lnkd.in/dCWEwNyF
👉Code https://lnkd.in/dzRrwKBD
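For context, the classic landmark-based 6DoF baseline that methods like TRG improve on is a PnP fit between 2D facial landmarks and a generic 3D face model; a sketch below (not TRG itself; the model points are rough, commonly used approximations in millimeters):
```python
# Classic landmark-based head pose via PnP, shown for background only.
import cv2
import numpy as np

model_3d = np.float32([                 # nose, chin, eye corners, mouth corners
    [0, 0, 0], [0, -330, -65], [-225, 170, -135],
    [225, 170, -135], [-150, -150, -125], [150, -150, -125]])
landmarks_2d = np.float32(              # matching image points (example values)
    [[359, 391], [399, 561], [337, 297], [513, 301], [345, 465], [453, 469]])
K = np.float32([[700, 0, 320], [0, 700, 240], [0, 0, 1]])   # pinhole intrinsics

ok, rvec, tvec = cv2.solvePnP(model_3d, landmarks_2d, K, None)
R, _ = cv2.Rodrigues(rvec)              # rotation matrix + translation = 6DoF pose
print(ok, tvec.ravel())
```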
🏆Who's the REAL SOTA tracker in the world?🏆
👉BofN meta-tracker outperforms, by a large margin, existing SOTA trackers on nine standard benchmarks (LaSOT, TrackingNet, GOT-10K, VOT2019, VOT2021, VOT2022, UAV123, OTB100, and WebUAV-3M). Source Code available💙
👉Review https://t.ly/WB9AR
👉Paper https://arxiv.org/pdf/2407.15707
👉Code github.com/BasitAlawode/Best_of_N_Trackers
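The idea in miniature (all names below are placeholders, not the authors' API): a meta-model inspects the first frames of a sequence and routes it to whichever of N base trackers it predicts will perform best:
```python
# Hypothetical sketch of the Best-of-N routing idea; every name here is a
# placeholder for illustration, not the paper's implementation.
from typing import Callable, Sequence

def best_of_n(frames, trackers: Sequence[Callable], meta_score: Callable):
    # meta_score(clip, i) ~ predicted quality of tracker i on this sequence
    best = max(range(len(trackers)), key=lambda i: meta_score(frames[:5], i))
    return trackers[best](frames)       # run only the selected tracker

frames = ["frame%d" % i for i in range(30)]
trackers = [lambda f: "boxes from tracker A", lambda f: "boxes from tracker B"]
meta_score = lambda clip, i: [0.3, 0.8][i]          # pretend classifier output
print(best_of_n(frames, trackers, meta_score))      # -> tracker B's output
```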
🐢 TAPTRv2: new SOTA for TAP 🐢
👉TAPTRv2: Transformer-based approach built upon TAPTR for solving the Tracking Any Point (TAP) task. TAPTR borrows designs from DETR and formulates each tracking point as a point query, making it possible to leverage well-studied operations in DETR-like algorithms. The Source Code of V1 is available, V2 coming💙
👉Review https://t.ly/H84ae
👉Paper v1 https://lnkd.in/d4vD_6xx
👉Paper v2 https://lnkd.in/dE_TUzar
👉Project https://taptr.github.io/
👉Code https://lnkd.in/dgfs9Qdy
🧱EAFormer: Scene Text-Segm.🧱
👉A novel Edge-Aware Transformer to segment text more accurately, especially at the edges. FULL re-annotation of COCO_TS and MLT_S! Code coming, data available on 🤗
👉Review https://t.ly/0G2uX
👉Paper arxiv.org/pdf/2407.17020
👉Project hyangyu.github.io/EAFormer/
👉Data huggingface.co/datasets/HaiyangYu/TextSegmentation/tree/main
👽 Keypoint Promptable Re-ID 👽
👉KPR is a novel formulation of the ReID problem that explicitly complements the input BBox with a set of semantic keypoints indicating the intended target. Code, dataset and annotations coming soon💙
👉Review https://t.ly/vCXV_
👉Paper https://arxiv.org/pdf/2407.18112
👉Repo github.com/VlSomers/keypoint_promptable_reidentification
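A hypothetical picture of what such a prompt carries (names are mine, not the repo's API): the usual bounding box plus positive keypoints on the intended person and negative ones on distractors or occluders:
```python
# Hypothetical data structure for illustration; not the repository's API.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class KeypointPrompt:
    bbox: Tuple[float, float, float, float]          # x1, y1, x2, y2
    positive: List[Tuple[float, float]] = field(default_factory=list)
    negative: List[Tuple[float, float]] = field(default_factory=list)

prompt = KeypointPrompt(bbox=(12, 30, 180, 420),
                        positive=[(96, 90)],          # on the target's head
                        negative=[(160, 200)])        # on an occluding person
```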
🎁 A guide for modern CV 🎁
👉In the last 18 months I received 1,100+ applications for research roles. Most applicants don't deeply know a few milestones in CV. Here is a short collection of mostly-free resources to spend a bit of good time on this summer (a tiny warm-up snippet follows the list).
𝐁𝐨𝐨𝐤𝐬:
✅DL with Python https://t.ly/VjaVx
✅Python OOP https://t.ly/pTQRm
𝐕𝐢𝐝𝐞𝐨 𝐂𝐨𝐮𝐫𝐬𝐞𝐬:
✅Berkeley | Modern CV (2023) https://t.ly/AU7S3
𝐋𝐢𝐛𝐫𝐚𝐫𝐢𝐞𝐬:
✅PyTorch https://lnkd.in/dTvJbjAx
✅PyTorch Lightning https://lnkd.in/dAruPA6T
✅Albumentations https://albumentations.ai/
𝐏𝐚𝐩𝐞𝐫𝐬:
✅EfficientNet https://lnkd.in/dTsT44ae
✅ViT https://lnkd.in/dB5yKdaW
✅UNet https://lnkd.in/dnpKVa6T
✅DeepLabV3+ https://lnkd.in/dVvqkmPk
✅YOLOv1: https://lnkd.in/dQ9rs53B
✅YOLOv2: arxiv.org/abs/1612.08242
✅YOLOX: https://lnkd.in/d9ZtsF7g
✅SAM: https://arxiv.org/abs/2304.02643
👉More papers and the full list: https://t.ly/WAwAk
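And the promised warm-up snippet, combining two of the tools above: an Albumentations pipeline feeding a torchvision backbone (the transform and model choices are just an example):
```python
# Warm-up example: Albumentations preprocessing into a torchvision model.
import albumentations as A
from albumentations.pytorch import ToTensorV2
import numpy as np
import torchvision

aug = A.Compose([
    A.Resize(224, 224),
    A.HorizontalFlip(p=0.5),
    A.Normalize(),          # ImageNet mean/std by default
    ToTensorV2(),
])

image = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
batch = aug(image=image)["image"].unsqueeze(0)        # (1, 3, 224, 224)
model = torchvision.models.efficientnet_b0(weights=None)
print(model(batch).shape)                             # (1, 1000) logits
```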
🪄 Diffusion Models for Transparency 🪄
👉MIT (+ #Google) unveils Alchemist, a novel method to control material attributes of objects, like roughness, metallic, albedo & transparency, in real images. Amazing work, but code not announced🥺
👉Review https://t.ly/U98_G
👉Paper arxiv.org/pdf/2312.02970
👉Project www.prafullsharma.net/alchemist/
🔥🔥 SAM v2 is out! 🔥🔥
👉#Meta announced SAM 2, a novel unified model for real-time promptable segmentation in images and videos. 6x faster than its predecessor, it's the new SOTA by a large margin. Source Code, Dataset, Models & Demo released under permissive licenses💙
👉Review https://t.ly/oovJZ
👉Paper https://t.ly/sCxMY
👉Demo https://sam2.metademolab.com
👉Project ai.meta.com/blog/segment-anything-2/
👉Models github.com/facebookresearch/segment-anything-2
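A minimal image-prediction sketch following the repo's README at release (checkpoint and config names may move; check the repo before running):
```python
# Image prediction with SAM 2, adapted from the release README.
import numpy as np
import torch
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

checkpoint = "./checkpoints/sam2_hiera_large.pt"   # from the repo's download script
model_cfg = "sam2_hiera_l.yaml"
predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint))

image = np.zeros((768, 1024, 3), dtype=np.uint8)   # your HxWx3 RGB frame here
with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    predictor.set_image(image)
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[512, 384]]),       # one positive click
        point_labels=np.array([1]))
```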
👋 Real-time Expressive Hands 👋
👉Zhejiang unveils XHand, a novel expressive hand avatar designed to comprehensively generate hand shape, appearance, and deformations in real-time. Source Code released (Apache 2.0) on Jul. 31st, 2024💙
👉Review https://t.ly/8obbB
👉Project https://lnkd.in/dRtVGe6i
👉Paper https://lnkd.in/daCx2iB7
👉Code https://lnkd.in/dZ9pgzug
🧪 Click-Attention Segmentation 🧪
👉An interesting patch-based click-attention algorithm with an affinity loss inspired by SASFormer. This novel approach decouples positive and negative clicks, guiding positive ones to focus on the target object and negative ones on the background. Code released under Apache💙
👉Review https://t.ly/tG05L
👉Paper https://arxiv.org/pdf/2408.06021
👉Code https://github.com/hahamyt/ClickAttention
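Background, for readers new to interactive segmentation (generic practice, not this paper's code): clicks are typically rasterized into positive/negative disk maps and stacked with the RGB image as extra input channels:
```python
# Generic click encoding for interactive segmentation, shown for background.
import numpy as np

def click_maps(h, w, clicks, radius=5):
    """clicks: list of (x, y, is_positive) -> (2, h, w) float maps."""
    maps = np.zeros((2, h, w), np.float32)
    yy, xx = np.mgrid[:h, :w]
    for x, y, pos in clicks:
        disk = (xx - x) ** 2 + (yy - y) ** 2 <= radius ** 2
        maps[0 if pos else 1][disk] = 1.0
    return maps

image = np.random.rand(3, 256, 256).astype(np.float32)
maps = click_maps(256, 256, [(100, 80, True), (200, 150, False)])
net_input = np.concatenate([image, maps])   # (5, 256, 256) network input
```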