🔥Orient Anything V2 is out🔥
👉Orient Anything V2 is a foundation model for unified understanding of object 3D orientation and rotation from single or paired images. Repo under CC-BY-4.0💙
👉Review https://t.ly/Ht7Xd
👉Paper arxiv.org/pdf/2601.05573
👉Project orient-anythingv2.github.io/
👉Repo github.com/SpatialVision/Orient-Anything-V2
🫛Active Object Reconstruction🫛
👉ObjSplat (Beijing) autonomously plans viewpoints and progressively reconstructs an unknown object into a high-fidelity Gaussian model and a watertight mesh, enabling direct use in physics simulations. Tough paper and repo announced💙
👉Review https://t.ly/au6HE
👉Paper arxiv.org/pdf/2601.06997
👉Project li-yuetao.github.io/ObjSplat-page/
👉Repo https://github.com/Li-Yuetao/ObjSplat
In 2026, who should we keep an eye on?
Vote: https://www.linkedin.com/posts/visionarynet_ai-deeplearning-aiwithpapers-activity-7416886610795077632-qQeP/
👉Games Workshop (Warhammer) is banning the use of AI in creative and design processes to protect IP and human creativity. A decision that goes against the current hype of widespread AI adoption.
And what about your organization? I need your help👇
Vote: https://www.linkedin.com/posts/visionarynet_ai-activity-7417106327019196417-TpGL
💚Segment Anything Geometry💚
👉3AM (NYCU + #Nvidia) offers cross-view correspondence even under large viewpoint changes, cluttered scenes, and variations in capture conditions, enabling robust object tracking from both videos & casual multi-view images. Repo (coming) & Demo available💙
👉Review https://t.ly/olZwE
👉Paper https://arxiv.org/pdf/2601.08831
👉Project https://jayisaking.github.io/3AM-Page/
👉Repo https://github.com/jayisaking
👉Demo https://huggingface.co/spaces/nycu-cplab/3AM
🎇 Multi-target SAM3 🎇
👉SAM3-DMS is a novel training-free decoupled strategy for multi-target tracking that performs fine-grained memory selection on individual objects, yielding robust identity preservation and tracking stability. Repo under SAM License💙
👉Review https://t.ly/jJOAr
👉Paper https://arxiv.org/pdf/2601.09699
👉Repo https://github.com/FudanCVL/SAM3-DMS
🍿100M Video Action Dataset🍿
👉Action100M by META is a large-scale dataset w/ 1.2M instructional videos (14.6 years of duration), yielding O(100M) temporally localized segments with open-vocabulary action supervision and rich captions. Repo under FAIR NC Research License💙
👉Review https://t.ly/w5KXe
👉Paper arxiv.org/pdf/2601.10592
👉Repo github.com/facebookresearch/Action100M
💜Interactive Humanoid Generation💜
👉FlowAct-R1 by ByteDance is a novel framework that enables lifelike, responsive, and high-fidelity humanoid video generation for seamless real-time interaction. No code but impressive results (see video with audio) 💙
👉Review https://t.ly/aQhol
👉Paper arxiv.org/pdf/2601.10103
👉Project grisoon.github.io/FlowAct-R1/
💢3D Human Gen-Seg💢
👉CoMoVi takes an input image with a text description and generates a 3D human motion & video sequence synchronously within a single diffusion denoising loop. Repo & Dataset releasing💙
👉Review https://t.ly/khSkm
👉Paper arxiv.org/pdf/2601.10632
👉Project igl-hkust.github.io/CoMoVi/
👉Repo github.com/IGL-HKUST/CoMoVi
👉Data huggingface.co/datasets/AfterJourney/CoMoVi-Dataset
👹SOTA Part-level Generator👹
👉FrankenMotion is a novel text-to-motion model that learns to compose complex motions through hierarchical conditioning on part-, action- & sequence-level text, enabling fine-grained control over body parts & timing. Code, models & Dataset to be released💙
👉Review https://t.ly/leB_R
👉Paper arxiv.org/pdf/2601.10909
👉Project coral79.github.io/frankenmotion/
👉Repo github.com/Coral79/FrankenMotion-Code
💚 #META 3D Casual Captures 💚
👉#META unveils ShapeR, a novel approach for conditional 3D object shape generation from casually captured sequences. Impressive results. Repo under CC BY-NC 4.0💙
👉Review https://t.ly/j08sJ
👉Paper arxiv.org/pdf/2601.11514
👉Project facebookresearch.github.io/ShapeR/
👉Repo github.com/facebookresearch/ShapeR
💊Foundation Medical SAM3 💊
👉Medical SAM3 is a foundation model for universal prompt-driven medical image segmentation, built by fully fine-tuning SAM3 on large-scale, heterogeneous 2D/3D medical imaging datasets with paired segmentation masks and text prompts. Repo & Demo announced💙
👉Review https://t.ly/C6jcy
👉Paper https://arxiv.org/pdf/2601.10880
👉Project chongcongjiang.github.io/MedicalSAM3/#
👉Repo github.com/AIM-Research-Lab/Medical-SAM3
🦧Mask-Guided Matting🦧
👉VideoMaMa is a novel diffusion-based model that converts binary masks into continuous alpha mattes. Repo, Dataset & Demo available💙
👉Review https://t.ly/l_0f8
👉Paper arxiv.org/pdf/2601.14255
👉Project cvlab-kaist.github.io/VideoMaMa
👉Repo github.com/cvlab-kaist/VideoMaMa
👉Demo huggingface.co/spaces/SammyLim/VideoMaMa
💜MoRo: Human Motion💜
👉Masked modeling for human motion Recovery under Occlusions. Given a monocular video captured from a static camera, MoRo (by ETHZ & META) robustly reconstructs accurate, physically plausible human motion, even under challenging occlusions. Repo released💙
👉Review https://t.ly/kK_je
👉Paper arxiv.org/pdf/2601.16079
👉Project mikeqzy.github.io/MoRo/
👉Repo github.com/mikeqzy/MoRo
🔥 BBoxMaskPose v2 is fire 🔥
👉BBoxMaskPose v2 by ČVUT offers SOTA performance in detection, segmentation & 2D pose in crowded scenes. It enables 3D human reconstruction even in scenes with complex interactions. Code, Models & data available💙
👉Review https://t.ly/GkkDl
👉Paper arxiv.org/pdf/2601.15200
👉Project https://lnkd.in/dQ_3hxjC
👉Repo https://lnkd.in/dVqwD3jN
🦠Generalized-Scale Counting🦠
👉GeCo2 (Ljubljana) is a novel end-to-end SOTA few-shot counting method that explicitly addresses object-scale issues. Repo & Demo available💙
👉Review https://t.ly/2_7I8
👉Paper https://arxiv.org/pdf/2511.08048
👉Repo https://github.com/jerpelhan/GECO2
👉Demo huggingface.co/spaces/jerpelhan/GECO2-demo
🔥🔥Super-Hard Poll folks🔥🔥
👉 This dilemma is driving me crazy. Vote: https://www.linkedin.com/posts/visionarynet_activity-7421974594917588992-YNAG
(and of course comment here)
🌻MLLMs Fine Segmentation🌻
👉SimpleSeg: MLLMs with native pixel-level perception. Repo & Model available💙
👉Review https://t.ly/eVguh
👉Paper arxiv.org/pdf/2601.19228
👉Project simpleseg.github.io/
👉Repo github.com/songtianhui/SimpleSeg
🔥 DeepSeek-OCR 2 is out 🔥
👉DeepSeek-AI announced the new version of its powerful SOTA OCR model, built on a new architectural approach with the potential to achieve genuine 2D reasoning. Code & weights available💙
👉Review https://t.ly/gX4bX
👉Paper https://arxiv.org/pdf/2601.20552
👉Repo github.com/deepseek-ai/DeepSeek-OCR-2
📊 SOTA Style Transfer 📊
👉TeleAI unveils TeleStyle, a lightweight yet effective model for image/video stylization. Built upon Qwen-Image-Edit, TeleStyle leverages the base model’s robust capabilities in content preservation & style customization. Code & Model released💙
👉Review https://t.ly/viVR0
👉Paper arxiv.org/pdf/2601.20175
👉Project tele-ai.github.io/TeleStyle/
👉Repo github.com/Tele-AI/TeleStyle
🍑 Metric Anything is out 🍑
👉Metric Anything (Li Auto Inc.) is a simple and scalable pretraining framework that learns metric depth from noisy, diverse 3D sources without manually engineered prompts, camera-specific modeling, or task-specific architectures. Impressive. Code announced 💙
👉Review https://t.ly/54Ccr
👉Paper arxiv.org/pdf/2601.22054
👉Project metric-anything.github.io/metric-anything-io/
👉Repo github.com/metric-anything/metric-anything