🪬UHM: Authentic Hand by Phone🪬
👉 META unveils UHM, a novel high-fidelity 3D avatarization of your own hand (yes, yours). An adaptation pipeline fits the pre-trained UHM from a phone scan. Source code released 💙
👉Review https://t.ly/fU5rA
👉Paper https://lnkd.in/dyGaiAnq
👉Code https://lnkd.in/d9B_XFAA
🔥EfficientTrain++: Efficient Foundation Visual Backbone Training🔥
👉Tsinghua unveils EfficientTrain++, a simple, general, and surprisingly effective off-the-shelf approach to reducing the training time of various popular models (e.g., ResNet, ConvNeXt, DeiT, PVT, Swin, CSWin, and CAFormer). Up to 3.0× faster training on ImageNet-1K/22K without sacrificing accuracy. Source code released 💙
👉Review https://t.ly/D8ttv
👉Paper https://arxiv.org/pdf/2405.08768
👉Code https://github.com/LeapLabTHU/EfficientTrain
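The core idea, training on "easier" lower-resolution inputs early and ramping up to the full resolution only in later epochs, can be sketched as a simple stage schedule. This is a toy illustration: the sizes, stage count, and linear ramp are assumptions, not the paper's actual curriculum.

```python
def input_size_at(epoch, total_epochs, min_size=160, final_size=224, num_stages=4):
    """Toy easier-to-harder curriculum in the spirit of EfficientTrain++:
    early epochs use small (low-frequency-biased) crops, later epochs the
    full training resolution. All values here are illustrative."""
    # map the epoch to one of num_stages discrete curriculum stages
    stage = min(num_stages - 1, epoch * num_stages // total_epochs)
    # linearly interpolate the input size across stages
    return min_size + round(stage * (final_size - min_size) / (num_stages - 1))

# A training loop would resize each batch to input_size_at(epoch, ...),
# so early stages process fewer pixels per image and run faster.
```

The speed-up comes simply from the quadratic cost of resolution: a 160px stage touches roughly half the pixels of a 224px one.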
🫀 EchoTracker: Tracking Echocardiography🫀
👉EchoTracker: a two-fold coarse-to-fine model that tracks queried points on a tissue surface across ultrasound image sequences. Source code released 💙
👉Review https://t.ly/NyBe0
👉Paper https://arxiv.org/pdf/2405.08587
👉Code https://github.com/riponazad/echotracker/
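The "two-fold coarse-to-fine" idea, locating a point roughly first and then refining locally, can be illustrated with a minimal search over a matching-cost map. This is a hypothetical sketch: EchoTracker itself is a learned model, and the cost map and `stride` parameter are assumptions for illustration.

```python
def coarse_to_fine_track(cost, stride=4):
    """Toy two-stage point search: stage 1 scans a strided grid of the
    cost map, stage 2 exhaustively refines in a local window around the
    coarse winner. Lower cost = better match."""
    h, w = len(cost), len(cost[0])
    # coarse stage: best point on a strided grid (cheap, low resolution)
    coarse = min(((cost[y][x], (y, x)) for y in range(0, h, stride)
                  for x in range(0, w, stride)))[1]
    # fine stage: full search in a small window around the coarse winner
    y0, x0 = coarse
    cands = [(cost[y][x], (y, x))
             for y in range(max(0, y0 - stride), min(h, y0 + stride + 1))
             for x in range(max(0, x0 - stride), min(w, x0 + stride + 1))]
    return min(cands)[1]
```

The point of the two stages is cost: the coarse pass inspects only 1/stride² of the map, and the fine pass stays local.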
🦕 Grounding DINO 1.5 Pro/Edge 🦕
👉Grounding DINO 1.5: a suite of advanced open-set object detection models that pushes the "Edge" of open-set detection. Source code released under Apache 2.0 💙
👉Review https://t.ly/kS-og
👉Paper https://lnkd.in/dNakMge2
👉Code https://lnkd.in/djhnQmrm
⚽3D Shot Posture in Broadcast⚽
👉Nagoya University unveils 3DSP, soccer broadcast videos forming the most extensive sports image dataset with 2D pose annotations to date.
👉Review https://t.ly/IIMeZ
👉Paper https://arxiv.org/pdf/2405.12070
👉Code https://github.com/calvinyeungck/3D-Shot-Posture-Dataset/tree/master
🖼️ Diffusive Images that Sound 🖼️
👉The University of Michigan unveils a diffusion model able to generate spectrograms that look like images but can also be played as sound.
👉Review https://t.ly/ADtYM
👉Paper arxiv.org/pdf/2405.12221
👉Project ificl.github.io/images-that-sound
👉Code github.com/IFICL/images-that-sound
👚ViViD: Diffusion VTON👚
👉ViViD: a novel framework that employs powerful diffusion models to tackle video virtual try-on. Code announced but not released yet 😢
👉Review https://t.ly/h_SyP
👉Paper arxiv.org/pdf/2405.11794
👉Repo https://lnkd.in/dT4_bzPw
👉Project https://lnkd.in/dCK5ug4v
🍀OmniGlue: Foundation Matcher🍀
👉#Google OmniGlue from #CVPR24: the first learnable image matcher powered by foundation models. Impressive OOD results!
👉Review https://t.ly/ezaIc
👉Paper https://arxiv.org/pdf/2405.12979
👉Project hwjiang1510.github.io/OmniGlue/
👉Code https://github.com/google-research/omniglue/
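For context, the classical baseline that learnable matchers like OmniGlue improve on is mutual nearest-neighbor filtering over descriptor similarities. A minimal sketch of that baseline (not OmniGlue's actual method, which adds foundation-model guidance on top):

```python
def mutual_nn_matches(sim):
    """sim[i][j]: similarity of descriptor i (image A) to descriptor j
    (image B). Keep only pairs that are each other's best match - the
    classic cross-check filter used before learnable matchers."""
    # best match in B for every descriptor in A
    best_a = [max(range(len(row)), key=row.__getitem__) for row in sim]
    # best match in A for every descriptor in B (transpose the matrix)
    cols = list(zip(*sim))
    best_b = [max(range(len(col)), key=col.__getitem__) for col in cols]
    # mutual consistency: i -> j and j -> i must agree
    return [(i, j) for i, j in enumerate(best_a) if best_b[j] == i]
```

One-directional matches that fail the cross-check (typical of repetitive texture or occlusion) are dropped; learnable matchers aim to recover many of those instead.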
🔥 YOLOv10 is out 🔥
👉YOLOv10: novel real-time, end-to-end object detection. Code released under GNU AGPL v3.0 💙
👉Review https://shorturl.at/ZIHBh
👉Paper arxiv.org/pdf/2405.14458
👉Code https://github.com/THU-MIG/yolov10/
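YOLOv10's headline trick is NMS-free end-to-end inference via consistent dual (one-to-many plus one-to-one) label assignments. The post-processing step it removes is classic non-maximum suppression, sketched here in plain Python; the box format and threshold are illustrative assumptions, not YOLOv10 code.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(dets, thr=0.5):
    """dets: list of (score, box). Greedy NMS that one-to-many detection
    heads require at inference; YOLOv10's one-to-one assignment lets the
    model skip this step entirely."""
    keep, rest = [], sorted(dets, reverse=True)
    while rest:
        best = rest.pop(0)          # highest-scoring remaining detection
        keep.append(best)
        # suppress everything overlapping it too strongly
        rest = [d for d in rest if iou(best[1], d[1]) < thr]
    return keep
```

Dropping this loop removes a latency- and hyperparameter-sensitive stage, which is a large part of YOLOv10's end-to-end speed claim.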
⛈️Unsupervised Neuromorphic Motion⛈️
👉Western Sydney University unveils a novel unsupervised event-based motion segmentation algorithm, built on the #Prophesee Gen4 HD event camera.
👉Review https://t.ly/UZzIZ
👉Paper arxiv.org/pdf/2405.15209
👉Project samiarja.github.io/evairborne
👉Repo (empty) github.com/samiarja/ev_deep_motion_segmentation
🦓 Z.S. Diffusive Segmentation 🦓
👉KAUST (+MPI) announces the first zero-shot approach for Video Semantic Segmentation (VSS) based on pre-trained diffusion models. Source code released under MIT 💙
👉Review https://t.ly/v_64K
👉Paper arxiv.org/pdf/2405.16947
👉Project https://lnkd.in/dcSt4dQx
👉Code https://lnkd.in/dcZfM8F3
🪰 Dynamic Gaussian Fusion via 4D Motion Scaffolds 🪰
👉MoSca: a novel 4D Motion Scaffold framework to reconstruct and synthesize novel views of dynamic scenes from monocular in-the-wild videos!
👉Review https://t.ly/nSdEL
👉Paper arxiv.org/pdf/2405.17421
👉Code github.com/JiahuiLei/MoSca
👉Project https://lnkd.in/dkjMVcqZ
🧤Transformer-based 4D Hands🧤
👉4DHands is a novel and robust approach to recovering interactive hand meshes and their relative movement from monocular inputs. Authors: Beijing NU, Tsinghua & Lenovo. No code announced 😢
👉Review https://t.ly/wvG-l
👉Paper arxiv.org/pdf/2405.20330
👉Project 4dhands.github.io/
🎭New 2D Landmarks SOTA🎭
👉Flawless AI unveils FaceLift, a novel semi-supervised approach that learns 3D landmarks by directly lifting (visible) hand-labeled 2D landmarks, ensuring better definition alignment with no need for 3D landmark datasets. No code announced 🥹
👉Review https://t.ly/lew9a
👉Paper arxiv.org/pdf/2405.19646
👉Project davidcferman.github.io/FaceLift
🐳 MultiPly: in-the-wild Multi-People 🐳
👉MultiPly: a novel framework to reconstruct multiple people in 3D from monocular in-the-wild videos. The new SOTA on public datasets and in-the-wild footage. Source code announced, coming 💙
👉Review https://t.ly/_xjk_
👉Paper arxiv.org/pdf/2406.01595
👉Project eth-ait.github.io/MultiPly
👉Repo github.com/eth-ait/MultiPly
👹AI and the Everything in the Whole Wide World Benchmark👹
👉Last week Yann LeCun argued that LLMs will not reach human-level intelligence: current #deeplearning is not ready for "general AI", and a "radical alternative" is needed to create a "superintelligence".
👉Review https://t.ly/isdxM
👉News https://lnkd.in/dFraieZS
👉Paper https://lnkd.in/da-7PnVT
📞FacET: VideoCall Change Your Expression📞
👉Columbia University unveils FacET: discovering behavioral differences between conversing face-to-face (F2F) and on video calls (VCs).
👉Review https://t.ly/qsQmt
👉Paper arxiv.org/pdf/2406.00955
👉Project facet.cs.columbia.edu/
👉Repo (empty) github.com/stellargo/facet
🚙 UA-Track: Uncertainty-Aware MOT🚙
👉UA-Track: a novel uncertainty-aware 3D MOT framework that tackles the uncertainty problem from multiple aspects. Code announced, not released yet.
👉Review https://t.ly/RmVSV
👉Paper https://arxiv.org/pdf/2406.02147
👉Project https://liautoad.github.io/ua-track-website
🧊 Universal 6D Pose/Tracking 🧊
👉Omni6DPose: a novel dataset for 6D object pose with 1.5M+ annotations. Extra: GenPose++, the new SOTA in category-level 6D estimation/tracking thanks to two pivotal improvements.
👉Review https://t.ly/Ywgl1
👉Paper arxiv.org/pdf/2406.04316
👉Project https://lnkd.in/dHBvenhX
👉Lib https://lnkd.in/d8Yc-KFh
👗 SOTA Multi-Garment VTOn Editing 👗
👉#Google (+UWA) unveils M&M VTO, a novel mix-and-match virtual try-on that takes as input multiple garment images, a text description of the garment layout, and an image of a person. The new SOTA both qualitatively and quantitatively. Impressive results!
👉Review https://t.ly/66mLN
👉Paper arxiv.org/pdf/2406.04542
👉Project https://mmvto.github.io
👑 Kling AI vs. OpenAI Sora 👑
👉Kling: the ultimate Chinese text-to-video model, a rival to #OpenAI's Sora. No paper or technical details to check, but stunning results on the official site.
👉Review https://t.ly/870DQ
👉Paper ???
👉Project https://kling.kuaishou.com/