🎁 A guide for modern CV 🎁
👉In the last 18 months I have received 1,100+ applications for research roles. Most applicants lack a deep understanding of a few CV milestones. Here is a short collection of mostly free resources to spend some good time with this summer.
𝐁𝐨𝐨𝐤𝐬:
✅DL with Python https://t.ly/VjaVx
✅Python OOP https://t.ly/pTQRm
𝐕𝐢𝐝𝐞𝐨 𝐂𝐨𝐮𝐫𝐬𝐞𝐬:
✅Berkeley | Modern CV (2023) https://t.ly/AU7S3
𝐋𝐢𝐛𝐫𝐚𝐫𝐢𝐞𝐬:
✅PyTorch https://lnkd.in/dTvJbjAx
✅PyTorch Lightning https://lnkd.in/dAruPA6T
✅Albumentations https://albumentations.ai/
𝐏𝐚𝐩𝐞𝐫𝐬:
✅EfficientNet https://lnkd.in/dTsT44ae
✅ViT https://lnkd.in/dB5yKdaW
✅UNet https://lnkd.in/dnpKVa6T
✅DeepLabV3+ https://lnkd.in/dVvqkmPk
✅YOLOv1: https://lnkd.in/dQ9rs53B
✅YOLOv2: arxiv.org/abs/1612.08242
✅YOLOX: https://lnkd.in/d9ZtsF7g
✅SAM: https://arxiv.org/abs/2304.02643
👉More papers and the full list: https://t.ly/WAwAk
🪄 Diffusion Models for Transparency 🪄
👉MIT (+ #Google) unveils Alchemist, a novel method to control material attributes of objects in real images, such as roughness, metallic, albedo & transparency. Amazing work, but no code announced🥺
👉Review https://t.ly/U98_G
👉Paper arxiv.org/pdf/2312.02970
👉Project www.prafullsharma.net/alchemist/
🔥🔥 SAM v2 is out! 🔥🔥
👉#Meta announced SAM 2, a new unified model for real-time promptable segmentation in images and videos. 6x faster than the original SAM on images, it's the new SOTA by a large margin. Source code, dataset, models & demo released under permissive licenses💙
👉Review https://t.ly/oovJZ
👉Paper https://t.ly/sCxMY
👉Demo https://sam2.metademolab.com
👉Project ai.meta.com/blog/segment-anything-2/
👉Models github.com/facebookresearch/segment-anything-2
👋 Real-time Expressive Hands 👋
👉Zhejiang unveils XHand, a novel expressive hand avatar designed to comprehensively generate hand shape, appearance, and deformations in real time. Source code released (Apache 2.0) on Jul. 31st, 2024💙
👉Review https://t.ly/8obbB
👉Project https://lnkd.in/dRtVGe6i
👉Paper https://lnkd.in/daCx2iB7
👉Code https://lnkd.in/dZ9pgzug
🧪 Click-Attention Segmentation 🧪
👉An interesting patch-based click attention algorithm for interactive image segmentation, plus an affinity loss inspired by SASFormer. This novel approach decouples positive and negative clicks, guiding positive ones to focus on the target object and negative ones on the background. Code released under the Apache license💙
👉Review https://t.ly/tG05L
👉Paper https://arxiv.org/pdf/2408.06021
👉Code https://github.com/hahamyt/ClickAttention
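To make the idea of decoupled clicks concrete, here is a minimal Python sketch that is NOT taken from the ClickAttention repository. It assumes clicks arrive as hypothetical `(x, y, is_positive)` pixel tuples and an assumed square patch size, and it simply routes positive and negative clicks into two separate patch-level masks, the kind of split input a patch-based click attention module could consume. The function name `clicks_to_patch_masks` and the data layout are illustrative assumptions, not the paper's API.

```python
from typing import List, Tuple

# Hypothetical click format (assumption): (x, y, is_positive) in pixel coordinates.
Click = Tuple[int, int, bool]
Mask = List[List[int]]


def clicks_to_patch_masks(
    clicks: List[Click], height: int, width: int, patch: int
) -> Tuple[Mask, Mask]:
    """Split clicks into separate positive/negative patch masks.

    Illustrative sketch only, not the ClickAttention implementation:
    each click marks the patch that contains it, and positive and
    negative clicks are written to different grids so they stay decoupled.
    """
    rows, cols = height // patch, width // patch
    positive = [[0] * cols for _ in range(rows)]
    negative = [[0] * cols for _ in range(rows)]
    for x, y, is_positive in clicks:
        # Clamp so clicks in a partial border patch map to the last full patch.
        r = min(y // patch, rows - 1)
        c = min(x // patch, cols - 1)
        target = positive if is_positive else negative
        target[r][c] = 1
    return positive, negative


if __name__ == "__main__":
    # A 64x64 image with 16-pixel patches gives a 4x4 patch grid.
    pos, neg = clicks_to_patch_masks([(10, 20, True), (60, 5, False)], 64, 64, 16)
    print(pos[1][0], neg[0][3])  # 1 1
```

Keeping two separate grids instead of one signed mask is just one simple way to let later layers treat positive and negative clicks differently; the actual attention and affinity-loss design should be taken from the linked paper and code.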
🏗️ #Adobe Instant TurboEdit 🏗️
👉Adobe unveils a novel real-time, text-based, disentangled method for editing real images, built upon 4-step SDXL Turbo: SOTA HQ image editing using ultra-fast few-step diffusion. No code announced, but it will likely end up in commercial tools.
👉Review https://t.ly/Na7-y
👉Paper https://lnkd.in/dVs9RcCK
👉Project https://lnkd.in/dGCqwh9Z
👉Code 😢
🦓 Zebra Detection & Pose 🦓
👉The first synthetic dataset usable for both detection and 2D pose estimation of zebras without any bridging strategies. Code, results, models, and the synthetic training/validation data, including 104K manually labeled images, are open-sourced💙
👉Review https://t.ly/HTEZZ
👉Paper https://lnkd.in/dQYT-fyq
👉Project https://lnkd.in/dAnNXgG3
👉Code https://lnkd.in/dhvU97xD
🦧Sapiens: SOTA ViTs for humans🦧
👉Meta unveils Sapiens, a family of models for human-centric vision tasks: 2D pose estimation, body-part segmentation, depth estimation, and surface normal prediction. Source code announced, coming soon💙
👉Review https://t.ly/GKQI0
👉Paper arxiv.org/pdf/2408.12569
👉Project rawalkhirodkar.github.io/sapiens
👉Code github.com/facebookresearch/sapiens
AI with Papers - Artificial Intelligence & Deep Learning, replying to: 🦧Sapiens: SOTA ViTs for humans🦧
🔥🔥🔥🔥🔥 SOURCE CODE IS OUT !!! 🔥🔥🔥🔥🔥
Thanks Danny for the info 🥇
🐺 Diffusion Game Engine 🐺
👉#Google unveils GameNGen: the first game engine powered entirely by a neural model (#AI), enabling real-time interaction with a complex environment over long trajectories at high quality. No code announced, but I love it 💙
👉Review https://t.ly/_WR5z
👉Paper https://lnkd.in/dZqgiqb9
👉Project https://lnkd.in/dJUd2Fr6
🫒 Omni Urban Scene Reconstruction 🫒
👉OmniRe is a novel holistic approach for efficiently reconstructing HD dynamic urban scenes from on-device logs. It can simulate the reconstructed scenarios, actors included, in real time (~60 Hz). Code released💙
👉Review https://t.ly/SXVPa
👉Paper arxiv.org/pdf/2408.16760
👉Project ziyc.github.io/omnire/
👉Code github.com/ziyc/drivestudio
💄Interactive Drag-based Editing💄
👉CSE unveils InstantDrag, a novel pipeline designed to improve editing interactivity and speed, taking only an image and a drag instruction as input. Source code announced, coming soon💙
👉Review https://t.ly/hy6SL
👉Paper arxiv.org/pdf/2409.08857
👉Project joonghyuk.com/instantdrag-web/
👉Code github.com/alex4727/InstantDrag
🌭Hand-Object Interaction Pretraining🌭
👉Berkeley unveils HOP, a novel approach for learning general robot manipulation priors from 3D hand-object interaction trajectories.
👉Review https://t.ly/FLqvJ
👉Paper https://arxiv.org/pdf/2409.08273
👉Project https://hgaurav2k.github.io/hop/
🧸Motion Instruction Fine-Tuning🧸
👉MotIF is a novel method that fine-tunes pre-trained VLMs so they can distinguish nuanced robotic motions with different shapes and semantic groundings. A work by MIT, Stanford, and CMU. Source code announced, coming soon💙
👉Review https://t.ly/iJ2UY
👉Paper https://arxiv.org/pdf/2409.10683
👉Project https://motif-1k.github.io/
👉Code coming
⚽ SoccerNet 2024 Results ⚽
👉SoccerNet is the annual video-understanding challenge for football, aiming to advance research across multiple football-related themes. The 2024 results are out!
👉Review https://t.ly/DUPgx
👉Paper arxiv.org/pdf/2409.10587
👉Repo github.com/SoccerNet
👉Project www.soccer-net.org/
🌏 JoyHallo: Mandarin Digital Human 🌏
👉JD Health tackles audio-driven video generation in Mandarin, a task complicated by the language’s intricate lip movements and the scarcity of HQ datasets. Impressive results (-> audio ON). Code & models available💙
👉Review https://t.ly/5NGDh
👉Paper arxiv.org/pdf/2409.13268
👉Project jdh-algo.github.io/JoyHallo/
👉Code github.com/jdh-algo/JoyHallo
🎢 Robo-quadruped Parkour🎢
👉LAAS-CNRS unveils a novel RL approach that lets a quadruped robot perform agile, parkour-like skills such as walking, climbing high steps, leaping over gaps, and crawling under obstacles. Data and code available💙
👉Review https://t.ly/-6VRm
👉Paper arxiv.org/pdf/2409.13678
👉Project gepetto.github.io/SoloParkour/
👉Code github.com/Gepetto/SoloParkour
🩰 Dressed Humans in the wild 🩰
👉ETH (+ #Microsoft) unveils ReLoo: novel high-quality 3D reconstruction of humans dressed in loose garments from monocular in-the-wild clips, with no prior assumptions about the garments. Source code announced, coming soon💙
👉Review https://t.ly/evgmN
👉Paper arxiv.org/pdf/2409.15269
👉Project moygcc.github.io/ReLoo/
👉Code github.com/eth-ait/ReLoo
🌾 New SOTA Edge Detection 🌾
👉CUP (+ ESPOCH) unveils NBED, the new SOTA for edge detection, with consistently superior performance across multiple benchmarks, even against models with huge computational cost and complex training. Source code released💙
👉Review https://t.ly/zUMcS
👉Paper arxiv.org/pdf/2409.14976
👉Code github.com/Li-yachuan/NBED
👩‍🦰 SOTA Gaussian Haircut 👩‍🦰
👉ETH et al. unveil Gaussian Haircut, the new SOTA in hair reconstruction via a dual representation (classic + 3D Gaussian). Code and model announced💙
👉Review https://t.ly/aiOjq
👉Paper arxiv.org/pdf/2409.14778
👉Project https://lnkd.in/dFRm2ycb
👉Repo https://lnkd.in/d5NWNkb5
🍇SPARK: Real-time Face Capture🍇
👉Technicolor Group unveils SPARK, a novel high-precision 3D face capture method that uses a collection of unconstrained videos of a subject as prior information. The new SOTA handles unseen pose, expression, and lighting. Impressive results. Code & model announced💙
👉Review https://t.ly/rZOgp
👉Paper arxiv.org/pdf/2409.07984
👉Project kelianb.github.io/SPARK/
👉Repo github.com/KelianB/SPARK/