🐯UniAnimate-DiT: Human Animation🐯
👉UniAnimate-DiT is a novel and effective framework based on Wan2.1 for consistent human image animation. It fine-tunes the model with LoRA, reducing memory use while maintaining the original model’s generative skills. Training and inference code released💙
👉Review https://t.ly/1I50N
👉Paper https://arxiv.org/pdf/2504.11289
👉Repo https://github.com/ali-vilab/UniAnimate-DiT
🔥9😍4👍2👏2
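For readers new to LoRA, the memory saving mentioned in the UniAnimate-DiT post comes from training two small low-rank matrices while the original weights stay frozen. The sketch below is a generic pure-Python illustration, not code from the UniAnimate-DiT repo; the layer size, the rank, and the names `lora_param_count` and `lora_forward` are assumptions chosen for the example.

```python
def lora_param_count(d_in, d_out, rank):
    """Return (full, lora) trainable-parameter counts for one linear layer."""
    full = d_in * d_out            # fine-tuning the whole weight matrix W
    lora = rank * (d_in + d_out)   # training only A (rank x d_in) and B (d_out x rank)
    return full, lora


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, scale=1.0):
    """y = W x + scale * B (A x); W stays frozen, only A and B are trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]


# Hypothetical layer size and rank, for illustration only.
full, lora = lora_param_count(d_in=4096, d_out=4096, rank=16)
print(full, lora, full // lora)

# B starts at zero, so the adapted layer initially reproduces the frozen one.
W = [[1.0, 2.0], [3.0, 4.0]]
A = [[0.5, -0.5]]    # rank 1, d_in = 2
B = [[0.0], [0.0]]   # d_out = 2, rank 1
y = lora_forward(W, A, B, [1.0, 1.0])
print(y)
```

With these assumed sizes the low-rank pair has 128 times fewer trainable parameters than the full matrix, which is where the reduced memory for gradients and optimizer state comes from.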
🔥General attention-based object🔥
👉GATE3D is a novel framework designed specifically for generalized monocular 3D object detection via weak supervision. GATE3D effectively bridges domain gaps by employing consistency losses between 2D and 3D predictions.
👉Review https://t.ly/O7wqH
👉Paper https://lnkd.in/dc5VTUj9
👉Project https://lnkd.in/dzrt-qQV
🔥8👍3👏1🤯1😍1
🔍Event Blurry Super-Resolution🔍
👉USTC unveils Ev-DeblurVSR, a novel event-enhanced network that brings event signals into Blurry Video Super-Resolution (BVSR), the task of generating HR videos from low-resolution and blurry inputs. Pretrained models and test code released under Apache💙
👉Review https://t.ly/x6hRs
👉Paper https://lnkd.in/dzbkCJMh
👉Repo https://lnkd.in/dmvsc-yS
🔥20❤8🤯6🤩1😍1
🔥 #Apple Co-Motion is out! 🔥
👉Apple unveils a novel approach for detecting & tracking detailed 3D poses of multiple people from a single monocular stream. Temporally coherent predictions in crowded scenes with hard poses & occlusions. New SOTA, 10x faster! Code & Models released for research use only💙
👉Review https://t.ly/-86CO
👉Paper https://lnkd.in/dQsVGY7q
👉Repo https://lnkd.in/dh7j7N89
👍7🤣6❤5🔥2😍1
🧊TAP in Persistent 3D Geometry🧊
👉TAPIP3D is the novel SOTA for long-term 3D point tracking in mono-RGB/RGB-D videos. It represents videos as camera-stabilized spatio-temporal feature clouds, leveraging depth & camera motion to lift 2D video features into a 3D world space where camera motion is effectively canceled. Code under Apache💙
👉Review https://t.ly/oooMy
👉Paper https://lnkd.in/d8uqjdE4
👉Project https://tapip3d.github.io/
👉Repo https://lnkd.in/dsvHP_8u
🔥7❤2😍2👍1👏1
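The "lifting into a 3D world space" idea in the TAPIP3D post can be pictured with the standard pinhole camera model: back-project a pixel with its depth, then apply the camera-to-world pose. This is a minimal generic sketch, not TAPIP3D code; the intrinsics, poses, pixel values, and the function names `unproject` and `camera_to_world` are assumptions made up for the illustration.

```python
def unproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) with depth into camera coordinates."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return [x, y, depth]


def camera_to_world(p_cam, R, t):
    """Apply a camera-to-world rigid transform: p_world = R @ p_cam + t."""
    return [sum(R[i][j] * p_cam[j] for j in range(3)) + t[i] for i in range(3)]


# Hypothetical intrinsics and poses, for illustration only.
fx = fy = 500.0
cx, cy = 320.0, 240.0
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Frame 0: camera at the world origin, static point seen at the image centre.
p0 = camera_to_world(unproject(320.0, 240.0, 2.0, fx, fy, cx, cy),
                     identity, [0.0, 0.0, 0.0])

# Frame 1: camera moved 0.5 along +x, so the same point shifts left in the image.
p1 = camera_to_world(unproject(195.0, 240.0, 2.0, fx, fy, cx, cy),
                     identity, [0.5, 0.0, 0.0])

print(p0, p1)
```

The static point lands on different pixels in the two frames but on the same world coordinates, which is the sense in which camera motion is canceled once features live in world space.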
🦧 #Nvidia Describe Anything 🦧
👉Nvidia unveils Describe Anything Model (DAM), the new SOTA in generating detailed descriptions for user-specified regions in images/videos, marked by points, boxes, scribbles, or masks. Repo under Apache, Dataset available and live demo on 🤗
👉Review https://t.ly/la4JD
👉Paper https://lnkd.in/dZh82xtV
👉Project https://lnkd.in/dcv9V2ZF
👉Repo https://lnkd.in/dJB9Ehtb
🤗Demo https://lnkd.in/dXDb2MWU
🔥10👍5❤1
📍Moving Points -> Depth📍
👉KAIST & Adobe propose Seurat, a novel method that infers relative depth by examining the spatial relationships and temporal evolution of a set of tracked 2D trajectories (via off-the-shelf point tracking models). Repo & Demo to be released💙
👉Review https://t.ly/qA2P5
👉Paper https://lnkd.in/dpXDaQtM
👉Project https://lnkd.in/d9qWYsjP
👉Repo https://lnkd.in/dZEMDiJh
❤9🔥3👍1👏1
🌼SOTA Textured 3D-Guided VTON🌼
👉#ALIBABA unveils 3DV-TON, a novel diffusion model for HQ and temporally consistent video try-on. It generates animatable textured 3D meshes as explicit frame-level guidance, alleviating the issue of models over-focusing on appearance fidelity at the expense of motion coherence. Code & benchmark to be released💙
👉Review https://t.ly/0tjdC
👉Paper https://lnkd.in/dFseYSXz
👉Project https://lnkd.in/djtqzrzs
👉Repo TBA
🤯9👍7❤4🔥2👏1
🍏#Nvidia Dynamic Pose 🍏
👉Nvidia unveils DynPose-100K, the largest dataset of dynamic Internet videos annotated with camera poses. Dataset released under Nvidia license💙
👉Review https://t.ly/wrcb0
👉Paper https://lnkd.in/dycGjAyy
👉Project https://lnkd.in/dDZ2Ej_Q
🤗Data https://lnkd.in/d8yUSB7m
🔥4👍2❤1🤯1😍1
🔥 S3MOT: SOTA 3D MOT 🔥
👉S3MOT: Selective-State-Space model-based MOT that efficiently infers 3D motion and object associations from 2D images through three core components. New SOTA on KITTI with 76.86 HOTA at 31 FPS! Code & Weights to be released under MIT license💙
👉Review https://t.ly/H_JPv
👉Paper https://arxiv.org/pdf/2504.18068
👉Repo https://github.com/bytepioneerX/s3mot
🔥7😍2👍1
🔥 Diffusion Model <-> Depth 🔥
👉ETH & CMU show how to turn a single-image latent diffusion model (LDM) into the SOTA video depth estimator: video depth without video models. Repo released under Apache 2.0 and HF demo available💙
👉Review https://t.ly/sP9ma
👉Paper arxiv.org/pdf/2411.19189
👉Project rollingdepth.github.io/
👉Repo github.com/prs-eth/rollingdepth
🤗Demo huggingface.co/spaces/prs-eth/rollingdepth
❤12🔥6👍3👏1
🩷Dance vs. #ComputerVision🩷
👉Saint-Etienne University proposes a new 3D human body pose estimation pipeline for dance analysis. Project page w/ results and interactive demo released💙
👉Review https://t.ly/JEdM3
👉Paper arxiv.org/pdf/2505.07249
👉Project https://lnkd.in/dD5dsMv5
❤9👍1🔥1
🧞♀️GENMO: Generalist Human Motion 🧞♀️
👉#Nvidia presents GENMO, a unified Generalist Model for Human Motion that bridges motion estimation and generation in a single framework. Conditioning on videos, 2D keypoints, text, music, and 3D keyframes. No code at the moment🥲
👉Review https://t.ly/Q5T_Y
👉Paper https://lnkd.in/ds36BY49
👉Project https://lnkd.in/dAYHhuFU
🔥13❤3👍3😢1😍1
Dear friends,
I’m truly sorry for being away from the group for so long. I know: no updates so far while AI is running faster than the speed of light.
I’m going through a very difficult time in my life and I need some space to heal. This spare-time project (but important for a lot of people here) needs energy and commitment I don’t have right now. I’m sorry, be patient. I’ll be back.
Love u all,
Alessandro.
❤400👍28😢27
Hi everybody,
I took a few weeks to catch my breath from a lot of stuff: I dedicated all my mental energy to keep working and all my spare time to taking care of myself. Although I'm still not ok (BTW, my health was/is always good), I feel it's time to come back and support this wonderful community in this journey. I feel the responsibility of that, time to get in the ring.
I'm very sorry for being out so long, but sometimes life hits really hard. I got incredible support from unknown people from all around the world. It's amazing.
Thanks again, you rock!
Alessandro.
1❤203👍16🔥16👏5🍾3😢2💩1
🦖 DINOv3 is out 🦖
👉#Meta unveils DINOv3! A novel foundation model outperforming the previous SOTAs in computer vision. Code & weights released under DINOv3 License💙
👉Review https://t.ly/-S3ZL
👉Paper https://t.ly/ervOT
👉Project https://lnkd.in/dHFf3esd
👉Repo https://lnkd.in/dPxhDxAq
🤗HF https://lnkd.in/dWGudY2i
❤46🔥13👍2😍1🍾1
🤖 Impact of SuperHuman AI 🤖
👉The nonprofit AI Futures Project unveils a (dystopian) scenario about what super-AI might look like: a forecast running from today to bio-engineered human-like creatures. A fascinating speculation about the future with "slow-down" and "race" scenarios. Enjoy 💙
👉Review https://t.ly/EgmfJ
👉Project https://ai-2027.com/
❤7🔥2🤯2🤣1
🏓TOTNet: Occlusion-aware Tracking🏓
👉TOTNet: novel Temporal Occlusion Tracking Network that leverages 3D-convs, visibility-weighted loss, & occlusion augmentation to improve performance under occlusions. Code & Data under MIT💙
👉Review https://t.ly/Q0jAf
👉Paper https://lnkd.in/dUYsa-GC
👉Repo https://lnkd.in/d3QGUHYb
🔥10❤6👍1😍1
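The TOTNet post does not say how its visibility-weighted loss is defined, so the following is only a generic sketch of the idea: per-sample errors are averaged with a weight that depends on a visibility label, so occluded samples can count more. The labels, the weight values, and the function name `visibility_weighted_loss` are all hypothetical and not taken from the TOTNet paper or repo.

```python
def visibility_weighted_loss(errors, visibility, weights):
    """Weighted mean of per-sample errors, with one weight per visibility label."""
    total = sum(weights[v] * e for e, v in zip(errors, visibility))
    norm = sum(weights[v] for v in visibility)
    return total / norm


# Hypothetical labels and weights: occluded samples count three times as much.
weights = {"visible": 1.0, "occluded": 3.0}
errors = [0.2, 0.2, 1.0]
visibility = ["visible", "visible", "occluded"]

plain = sum(errors) / len(errors)
weighted = visibility_weighted_loss(errors, visibility, weights)
print(plain, weighted)
```

Compared with the plain mean, the weighted version is pulled toward the error on the occluded sample, which pushes training to care more about the hard cases.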
🔀Feed-Forward 4D video🔀
👉4DNeX is the first feed-forward framework for generating 4D scene representations from a single image by fine-tuning a diffusion model. It produces HQ dynamic point clouds and supports downstream tasks such as novel-view video synthesis with strong generalizability. Code/Data announced 💙
👉Review https://t.ly/SpkD-
👉Paper arxiv.org/pdf/2508.13154
👉Project https://4dnex.github.io/
👉Repo github.com/3DTopia/4DNeX
👉Data https://lnkd.in/dh4_3Ghf
👉Demo https://lnkd.in/dztyzwgg
❤10🔥7👍1
🌈DAViD: Synthetic Depth-Normal-Segmentation🌈
👉#Microsoft's DAViD: 100% synthetic dataset/models for human Depth, Normals & Segmentation. Dataset available, models & runtime under MIT💙
👉Review https://t.ly/-SlO_
👉Paper https://lnkd.in/eCmMXpTg
👉Project https://lnkd.in/eurCSWkm
👉Repo https://lnkd.in/e7PWFgP2
👍7❤6🔥3🤩1