🧤World-Space Ego 3D Hands🧤
👉Imperial College London unveils HaWoR, a novel world-space 3D hand motion estimation framework for egocentric videos. New SOTA on both camera pose estimation & hand motion reconstruction. Code under Attribution-NC-ND 4.0 International💙
👉Review https://t.ly/ozJn7
👉Paper arxiv.org/pdf/2501.02973
👉Project hawor-project.github.io/
👉Code github.com/ThunderVVV/HaWoR
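👉The bookkeeping behind any world-space formulation: compose each frame's estimated camera-to-world pose with the camera-frame hand prediction. A minimal NumPy sketch of that lifting step (shapes and names are illustrative assumptions, not HaWoR's API):
```python
# Hedged sketch (not HaWoR's code): lift per-frame hand joints predicted
# in camera coordinates into a shared world frame, given each egocentric
# frame's estimated camera-to-world pose.
import numpy as np

def to_world(joints_cam: np.ndarray, R_cw: np.ndarray, t_cw: np.ndarray) -> np.ndarray:
    """joints_cam: (21, 3) hand joints in the camera frame;
    R_cw: (3, 3) rotation and t_cw: (3,) translation of camera-to-world."""
    return joints_cam @ R_cw.T + t_cw

# Example: a 4-frame world-space trajectory with a camera moving forward.
rng = np.random.default_rng(0)
world_traj = [to_world(rng.standard_normal((21, 3)),   # per-frame prediction
                       np.eye(3),                       # placeholder rotation
                       np.array([0.0, 0.0, 0.1 * t]))   # placeholder translation
              for t in range(4)]
```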
🔥 "Nuclear" AI vs. Hyper-Cheap Inference 🔥
⭐ What do you expect in 2025 after the #Nvidia announcements at CES 2025? Feel free to comment :)
Anonymous Poll
24% — 🤲Portable training workstation
34% — ⚛️Nuclear energy for AI training
33% — 🖲️Cheaper inference-only devices
9% — 💰Cloud-intensive inference-only
⚽ FIFA 3D Human Pose ⚽
👉#FIFA WorldPose is a novel dataset for multi-person global pose estimation in the wild, featuring footage from the 2022 World Cup. 2.5M+ annotations, released 💙
👉Review https://t.ly/kvGVQ
👉Paper arxiv.org/pdf/2501.02771
👉Project https://lnkd.in/d5hFWpY2
👉Dataset https://lnkd.in/dAphJ9WA
🔥 Depth Any Camera (SOTA) 🔥
👉DAC is a novel and powerful zero-shot metric depth estimation framework that extends a perspective-trained model to effectively handle cameras with varying FoVs (including large fisheye & 360°). Code announced (not available yet)💙
👉Review https://t.ly/1qz4F
👉Paper arxiv.org/pdf/2501.02464
👉Project yuliangguo.github.io/depth-any-camera/
👉Repo github.com/yuliangguo/depth_any_camera
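👉For intuition, a common recipe in zero-shot metric depth (a simplified illustration of the genre, not necessarily DAC's exact pipeline) is to predict depth under a canonical camera and rescale by the real-to-canonical focal-length ratio:
```python
# Hedged sketch: canonical-camera metric depth rescaling. A model trained
# at a canonical focal length predicts depth under that convention; for a
# new camera, metric depth is recovered by rescaling. Values are placeholders.
import numpy as np

F_CANON = 1000.0  # canonical focal length in pixels (assumption)

def rescale_metric_depth(depth_canon: np.ndarray, f_real: float) -> np.ndarray:
    """depth_canon: (H, W) depth predicted under the canonical intrinsics."""
    return depth_canon * (f_real / F_CANON)

depth = rescale_metric_depth(np.ones((480, 640)), f_real=600.0)  # wide-FoV cam
```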
❤️🔥 Uncommon Objects in #3D ❤️🔥
👉#META releases uCO3D, a new object-centric dataset for 3D AI: the largest publicly available collection of HD object videos with 3D annotations, ensuring full-360° coverage. Code & data under CC-BY 4.0💙
👉Review https://t.ly/Z_tvA
👉Paper https://arxiv.org/pdf/2501.07574
👉Project https://uco3d.github.io/
👉Repo github.com/facebookresearch/uco3d
🏆Universal Detector-Free Match🏆
👉MatchAnything: a novel detector-free universal matcher for unseen real-world single- and cross-modality domains. The same weights handle everything. Code announced, to be released 💙
👉Review https://t.ly/sx92L
👉Paper https://lnkd.in/dWwRwGyY
👉Project https://lnkd.in/dCwb2Yte
👉Repo https://lnkd.in/dnUXYzQ5
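👉For context, the classical primitive that detector-free matchers replace with learned attention is dense descriptor matching; a minimal mutual-nearest-neighbour sketch (illustrative only, not MatchAnything's architecture):
```python
# Hedged sketch: mutual nearest-neighbour matching on dense descriptors.
import numpy as np

def mutual_nn_matches(desc_a: np.ndarray, desc_b: np.ndarray) -> np.ndarray:
    """desc_a: (N, D), desc_b: (M, D) L2-normalized descriptors.
    Returns (K, 2) index pairs that are each other's nearest neighbour."""
    sim = desc_a @ desc_b.T                  # cosine similarity (N, M)
    nn_ab = sim.argmax(axis=1)               # best b for each a
    nn_ba = sim.argmax(axis=0)               # best a for each b
    keep = nn_ba[nn_ab] == np.arange(len(desc_a))
    return np.stack([np.nonzero(keep)[0], nn_ab[keep]], axis=1)
```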
🆘 Help: Looking for Outstanding Speakers 🆘
👉Who would you suggest as a speaker for your ideal conference on AI (CV, LLM, RAG, ML, HW Optimization, AI & Space, etc.)? Only “hardcore” technical talks, nothing commercial at all. Please comment here with name, topic and affiliation (e.g., Paul Gascoigne, Computer Vision & Football, Scotland Team).
⭐Guaranteed tickets & more for the suggestions that will become invited speakers ;)
🧞‍♂️Omni-RGPT: SOTA MLLM Understanding🧞‍♂️
👉 #NVIDIA presents Omni-RGPT, an MLLM for region-level comprehension of both images & videos. New SOTA on image- and video-based commonsense reasoning.
👉Review https://t.ly/KHnQ7
👉Paper arxiv.org/pdf/2501.08326
👉Project miranheo.github.io/omni-rgpt/
👉Repo TBA soon
🔥 GAGA: Group Any Gaussians 🔥
👉GAGA is a framework that reconstructs and segments open-world 3D scenes by leveraging inconsistent 2D masks predicted by zero-shot segmentation models. Code available, recently updated💙
👉Review https://t.ly/Nk_jT
👉Paper www.gaga.gallery/static/pdf/Gaga.pdf
👉Project www.gaga.gallery/
👉Repo github.com/weijielyu/Gaga
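👉The core idea in one hedged sketch (not Gaga's actual implementation): project each Gaussian into every view and majority-vote over the 2D mask IDs:
```python
# Hedged sketch: assign each 3D Gaussian a group ID by projecting its
# center into every view and voting over (possibly inconsistent) 2D masks.
import numpy as np

def vote_group_ids(centers, cams, masks):
    """centers: (G, 3) Gaussian centers; cams: list of (K, R, t) per view;
    masks: list of (H, W) non-negative integer mask-ID maps."""
    votes = [[] for _ in range(len(centers))]
    for (K, R, t), mask in zip(cams, masks):
        pts = (centers @ R.T + t) @ K.T            # project to image plane
        uv = (pts[:, :2] / pts[:, 2:3]).astype(int)
        H, W = mask.shape
        for g, (u, v) in enumerate(uv):
            if pts[g, 2] > 0 and 0 <= v < H and 0 <= u < W:
                votes[g].append(mask[v, u])
        # NOTE: a real system must also resolve mask-ID correspondence
        # across views; one image's IDs are not globally consistent.
    return np.array([np.bincount(v).argmax() if v else -1 for v in votes])
```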
🎁Free Book: LLM Foundations🎁
👉A fully free book, just released on arXiv, outlining the basic concepts of #LLMs and related techniques, with a focus on foundational aspects.
✅Chapter 1: basics of pre-training
✅Chapter 2: gen-models & LLMs
✅Chapter 3: prompting methods
✅Chapter 4: alignment methods
👉If you have some background in ML and a working understanding of concepts like Transformers, the book will read smoothly. Even without that prior knowledge, it remains perfectly accessible, since each chapter is self-contained.
👉Review https://t.ly/9LGCa
👉Book https://lnkd.in/d3VkswZf
🏄‍♀️ GSTAR: Gaussian Surface Tracking 🏄‍♀️
👉ETH Zurich unveils GSTAR, a novel framework for photo-realistic rendering, surface reconstruction, and 3D tracking for dynamic scenes while handling topology changes. Code announced💙
👉Review https://t.ly/udpMq
👉Paper arxiv.org/pdf/2501.10283
👉Project chengwei-zheng.github.io/GSTAR/
👉Repo TBA
🧽 Diffusion Video Inpainting 🧽
👉#Alibaba unveils a technical report on DiffuEraser, a video inpainting model based on stable diffusion, designed to fill masked regions with greater detail and more coherent structures. Code & weights released under Apache💙
👉Review https://t.ly/7rEll
👉Paper arxiv.org/pdf/2501.10018
👉Project lixiaowen-xw.github.io/DiffuEraser-page/
👉Repo github.com/lixiaowen-xw/DiffuEraser
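👉For a feel of the underlying primitive, here is a naive per-frame baseline built on the standard Stable Diffusion inpainting pipeline from 🤗 diffusers; it flickers across time, which is exactly what DiffuEraser's video-specific design addresses (a hedged sketch, not DiffuEraser's API):
```python
# Hedged sketch: per-frame video inpainting via the stock SD inpainting
# pipeline. NOT DiffuEraser itself (which adds temporal priors).
import torch
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

def inpaint_frames(frames, masks, prompt="clean background"):
    """frames, masks: lists of PIL images; white mask regions get filled.
    Per-frame diffusion is temporally incoherent -- the problem that
    video-specific models like DiffuEraser are designed to solve."""
    return [pipe(prompt=prompt, image=f, mask_image=m).images[0]
            for f, m in zip(frames, masks)]
```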
🌈 #Nvidia Foundation ZS-Stereo 🌈
👉Nvidia unveils FoundationStereo, a foundation model for stereo depth estimation with strong zero-shot generalization, plus a large-scale synthetic training dataset (1M stereo pairs) featuring high diversity and photorealism. Code, model & dataset to be released💙
👉Review https://t.ly/rfBr5
👉Paper arxiv.org/pdf/2501.09898
👉Project nvlabs.github.io/FoundationStereo/
👉Repo github.com/NVlabs/FoundationStereo/tree/master
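👉A reminder of what a stereo network buys you: disparity converts to metric depth via the classic pinhole relation depth = f·B/d. A minimal sketch (f and B are placeholder values, not FoundationStereo's calibration):
```python
# Classic stereo geometry: depth = focal_length * baseline / disparity.
import numpy as np

def disparity_to_depth(disp: np.ndarray, focal_px: float, baseline_m: float):
    """disp: (H, W) disparity in pixels; returns depth in meters."""
    depth = np.full_like(disp, np.inf, dtype=np.float64)
    valid = disp > 0                    # zero disparity => infinite depth
    depth[valid] = focal_px * baseline_m / disp[valid]
    return depth

depth = disparity_to_depth(np.array([[32.0, 0.0]]), focal_px=720.0, baseline_m=0.12)
# 720 * 0.12 / 32 = 2.7 m for the first pixel
```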
🔥 [SOTA] Long-Video Depth Anything 🔥
👉ByteDance unveils Video Depth Anything: HQ, consistent depth estimation in SUPER-long videos (over several minutes) without sacrificing efficiency. Based on Depth Anything V2 with a novel efficient spatial-temporal head. Repo available under Apache 2.0💙
👉Review https://t.ly/Q4ZZd
👉Paper arxiv.org/pdf/2501.12375
👉Project https://lnkd.in/dKNwJzbM
👉Repo https://lnkd.in/ddfwwpCj
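👉To appreciate the problem it solves: per-frame monocular depth is only consistent up to scale & shift, and the classic post-hoc fix aligns neighbouring frames by least squares. A hedged sketch of that baseline (not Video Depth Anything's learned temporal head):
```python
# Hedged sketch: frame-to-frame scale & shift alignment of relative depth,
# a naive flicker-reduction baseline.
import numpy as np

def align_scale_shift(depth_t, depth_ref):
    """Solve min_{s,b} || s * depth_t + b - depth_ref ||^2 over all pixels."""
    x, y = depth_t.ravel(), depth_ref.ravel()
    A = np.stack([x, np.ones_like(x)], axis=1)
    (s, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    return s * depth_t + b
```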
🧵Time-Aware Pts-Tracking🧵
👉Chrono: a feature backbone specifically designed for point tracking, with built-in temporal awareness. Long-term temporal context enables precise prediction even without refinement stages. Code announced💙
👉Review https://t.ly/XAL7G
👉Paper arxiv.org/pdf/2501.12218
👉Project cvlab-kaist.github.io/Chrono/
👉Repo github.com/cvlab-kaist/Chrono
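👉The primitive such a backbone feeds, in a hedged sketch (illustrative, not Chrono's architecture): correlate a query point's feature against every frame's feature map and take the argmax:
```python
# Hedged sketch: correlation-based point tracking over per-frame features.
import numpy as np

def track_by_correlation(feats, query_xy):
    """feats: (T, H, W, D) per-frame feature maps; query_xy: integer (x, y)
    in frame 0. Returns (T, 2) estimated positions in every frame."""
    x0, y0 = query_xy
    q = feats[0, y0, x0]                      # query descriptor (D,)
    positions = []
    for t in range(feats.shape[0]):
        corr = feats[t] @ q                   # (H, W) similarity map
        y, x = np.unravel_index(corr.argmax(), corr.shape)
        positions.append((x, y))
    return np.array(positions)
```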
🎤EMO2: Audio-Driven Avatar🎤
👉Alibaba previews a novel audio-driven talking-head method capable of simultaneously generating highly expressive facial expressions and hand gestures. Turn your audio ON. Stunning results but no code 🥺
👉Review https://t.ly/x8slQ
👉Paper arxiv.org/pdf/2501.10687
👉Project humanaigc.github.io/emote-portrait-alive-2/
👉Repo 🥺
🦠A-Life with Foundation Models🦠
👉A super team unveils ASAL, a new paradigm for Artificial Life research, spanning a diverse range of ALife substrates: Boids, Particle Life, Game of Life, Lenia & Neural Cellular Automata. Code under Apache 2.0💙
👉Review https://t.ly/7SZ8A
👉Paper arxiv.org/pdf/2412.17799
👉Project http://pub.sakana.ai/asal/
👉Repo https://lnkd.in/dP5yxKtw
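👉One of the substrates named above, Conway's Game of Life, fits in a few lines (a generic toroidal implementation, not ASAL's code):
```python
# Game of Life on a wrap-around grid: a cell is born with exactly 3 live
# neighbours and survives with 2 or 3.
import numpy as np

def life_step(grid: np.ndarray) -> np.ndarray:
    """grid: (H, W) of 0/1 cells; returns the next generation."""
    n = sum(np.roll(np.roll(grid, dy, 0), dx, 1)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            if (dy, dx) != (0, 0))
    return ((n == 3) | ((grid == 1) & (n == 2))).astype(grid.dtype)

glider = np.zeros((8, 8), dtype=int)
glider[[0, 1, 2, 2, 2], [1, 2, 0, 1, 2]] = 1   # classic glider pattern
for _ in range(4):                             # glider moves one cell diagonally
    glider = life_step(glider)
```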
🔥 The code of DynOMo is out 🔥
👉DynOMo is a novel model that tracks any point in a dynamic scene over time through 3D reconstruction from monocular video: 2D and 3D point tracking from unposed monocular camera input.
👉Review https://t.ly/t5pCf
👉Paper https://lnkd.in/dwhzz4_t
👉Repo github.com/dvl-tum/DynOMo
👉Project https://lnkd.in/dMyku2HW
🪆SOTA Points Segmentation🪆
👉VGG Oxford unveils a novel loss to segment objects in videos based on their motion and NO other form of supervision! The network is trained using long-term point trajectories as a supervisory signal to complement optical flow. New SOTA!
👉Review https://t.ly/8Bsbt
👉Paper https://arxiv.org/pdf/2501.12392
👉Code https://github.com/karazijal/lrtl
👉Project www.robots.ox.ac.uk/~vgg/research/lrtl/
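👉The idea in a hedged sketch (not the paper's exact loss): points on the same long-term trajectory should receive the same mask prediction in every frame, so one can penalize the variance of mask logits sampled along each track:
```python
# Hedged sketch: trajectory-consistency loss for motion segmentation.
import torch

def trajectory_consistency_loss(logits, tracks):
    """logits: (T, H, W) per-frame mask logits; tracks: (N, T, 2) integer
    (x, y) point positions over time. Returns a scalar loss."""
    N, T, _ = tracks.shape
    x, y = tracks[..., 0], tracks[..., 1]
    t = torch.arange(T).expand(N, T)
    samples = logits[t, y, x]             # (N, T) logits along each track
    return samples.var(dim=1).mean()      # agreement over time => low variance
```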
🎨MatAnyone: Human Matting🎨
👉MatAnyone is a novel approach for human video matting that supports target assignment. Stable tracking in long videos even with complex/ambiguous backgrounds. Code & 🤗-Demo announced💙
👉Review https://t.ly/NVXsT
👉Paper arxiv.org/pdf/2501.14677
👉Project pq-yang.github.io/projects/MatAnyone
👉Repo TBA
🦕[SOTA] Visual Grounding VOS🦕
👉ReferDINO is the first end-to-end approach for adapting foundational visual grounding models to RVOS (referring video object segmentation). Code & models to be released soon💙
👉Review https://t.ly/SDFy9
👉Paper arxiv.org/pdf/2501.14607
👉Project isee-laboratory.github.io/ReferDINO/
👉Repo github.com/iSEE-Laboratory/ReferDINO