🌻MLLMs Fine Segmentation🌻
👉SimpleSeg: MLLMs with native pixel-level perception. Repo & Model available💙
👉Review https://t.ly/eVguh
👉Paper arxiv.org/pdf/2601.19228
👉Project simpleseg.github.io/
👉Repo github.com/songtianhui/SimpleSeg
🔥 DeepSeek-OCR 2 is out 🔥
👉DeepSeek-AI announced the new version of its powerful SOTA OCR model. A new architectural approach with the potential to achieve genuine 2D reasoning. Code & weights💙
👉Review https://t.ly/gX4bX
👉Paper https://arxiv.org/pdf/2601.20552
👉Repo github.com/deepseek-ai/DeepSeek-OCR-2
📊 SOTA Style Transfer 📊
👉TeleAI unveils TeleStyle, a lightweight yet effective model for image/video stylization. Built upon Qwen-Image-Edit, TeleStyle leverages the base model’s robust capabilities in content preservation & style customization. Code & Model released💙
👉Review https://t.ly/viVR0
👉Paper arxiv.org/pdf/2601.20175
👉Project tele-ai.github.io/TeleStyle/
👉Repo github.com/Tele-AI/TeleStyle
🍑 Metric Anything is out 🍑
👉Metric Anything (Li Auto Inc.) is a simple and scalable pretraining framework that learns metric depth from noisy, diverse 3D sources without manually engineered prompts, camera-specific modeling, or task-specific architectures. Impressive. Code announced 💙
👉Review https://t.ly/54Ccr
👉Paper arxiv.org/pdf/2601.22054
👉Project metric-anything.github.io/metric-anything-io/
👉Repo github.com/metric-anything/metric-anything
🌈Segment Any Events by Language🌈
👉SEAL (by NUS) is the first Semantic-aware Segment Any Events framework that addresses Open-Vocabulary Event Instance Segmentation. Code announced💙
👉Review https://t.ly/1ZMF0
👉Paper https://arxiv.org/pdf/2601.23159
👉Project https://0nandon.github.io/SEAL/
👉Repo https://github.com/0nandon/SEAL
👉RAM prices skyrocketing
👉Me acting like a rich kid.
Let's talk: https://www.linkedin.com/posts/visionarynet_ai-ram-ddr5-activity-7424127924020072448-NbaO
🐮CoWTracker: Track-Warping🐮
👉CoWTracker (VGG + META) is a novel dense point tracker that eschews cost volumes in favor of warping. Code/Models under FAIR NC💙
👉Review https://t.ly/6bAn9
👉Paper https://arxiv.org/pdf/2602.04877
👉Project https://cowtracker.github.io/
👉Repo https://github.com/facebookresearch/cowtracker
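👉For readers new to the term: warping here means resampling one frame's features at positions displaced by a predicted motion field, instead of scoring many candidate matches as a cost volume does. The sketch below is a generic pure-Python backward warp with bilinear sampling, not CoWTracker's implementation. The names `bilinear_sample` and `warp`, the list-of-rows feature layout, and the border clamping are all assumptions made for illustration.

```python
import math


def bilinear_sample(feat, x, y):
    """Sample a 2D grid (list of rows) at fractional (x, y), clamped to the border."""
    h, w = len(feat), len(feat[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = (1 - ax) * feat[y0][x0] + ax * feat[y0][x1]
    bottom = (1 - ax) * feat[y1][x0] + ax * feat[y1][x1]
    return (1 - ay) * top + ay * bottom


def warp(feat, flow):
    """Backward warp: out[y][x] is feat sampled at (x + dx, y + dy)."""
    return [
        [bilinear_sample(feat, x + flow[y][x][0], y + flow[y][x][1])
         for x in range(len(feat[0]))]
        for y in range(len(feat))
    ]


# Toy 3x3 "feature map" and a hypothetical flow of one pixel to the right.
feat = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
flow = [[(1.0, 0.0)] * 3 for _ in range(3)]
print(warp(feat, flow)[0])  # [1.0, 2.0, 2.0]
```

Each output location costs a single sample here, whereas a cost volume stores a score for every candidate match. That is the general appeal of warping, independent of how the paper actually builds its tracker.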
🌈TrajVG Trajectory-Geometry🌈
👉TrajVG is a novel reconstruction framework that makes cross-frame 3D correspondence an explicit prediction by estimating camera-coordinate 3D trajectories. Code announced💙
👉Review https://t.ly/yVi01
👉Paper arxiv.org/pdf/2602.04439
👉Project xingy038.github.io/TrajVG/
👉Repo github.com/xingy038/TrajVG
🪙MOMENTUM #NeurIPS 2025 🪙
👉MOMENTUM by Google (H/T Huguens Jean, Ph.D.) is a production multimodal agent architecture built on the Google ADK. It orchestrates 22 specialized tools (including Gemini for reasoning, Imagen 4.0 for image generation, and Veo 3.1 for video synthesis). Code announced💙
👉Review https://t.ly/06h7Q
👉Paper https://momentum-project-page-232993426383.us-central1.run.app/momentum_paper.pdf
👉Project https://momentum-project-page-232993426383.us-central1.run.app/
👉Repo TBA
😶🌫️ SOTA Full-Head Synthesis 😶🌫️
👉HyPlaneHead is the new SOTA in full-head image synthesis, delivering HQ results with significantly fewer artifacts than existing 3D-aware models. Repo announced💙
👉Review https://t.ly/WYfP3
👉Paper arxiv.org/pdf/2509.16748
👉Project https://lhyfst.github.io/hyplanehead/
👉Repo github.com/lhyfst/HyPlaneHead
🍟 AnyTouch 2 is out 🍟
👉AnyTouch 2 is a general tactile representation learning framework for diverse optical tactile sensors that unifies object-level understanding with fine-grained, force-aware dynamic perception. Repo, Model & Data💙
👉Review https://t.ly/fP4dP
👉Paper https://arxiv.org/pdf/2602.09617
👉Project gewu-lab.github.io/AnyTouch2/
👉Repo github.com/GeWu-Lab/AnyTouch2
🍌 AGENT BANANA (SOTA) 🍌
👉Agent Banana is a novel SOTA agentic system for HD, native-resolution image editing through reasoning-based NL interaction, where each edit is context-aware, logically dependent, and locally precise. Code announced💙
👉Review https://t.ly/EXaCH
👉Paper https://arxiv.org/pdf/2602.09084
👉Project https://agent-banana.github.io/
👉Repo https://github.com/taco-group/agent-banana
🛠️ IndustryShapes 6D Pose 🛠️
👉IndustryShapes by NTUA is a new RGB-D dataset of industrial tools, designed for both instance-level and novel object 6D pose estimation. Dataset available💙
👉Review https://t.ly/KKcuH
👉Paper https://arxiv.org/pdf/2602.05555
👉Project https://pose-lab.github.io/IndustryShapes/
👉Dataset https://huggingface.co/datasets/POSE-Lab/IndustryShapes
🤖Generalized Human Tracking🤖
👉Beijing Institute of Technology & Humanoid Robotics Shanghai present a novel learning framework for general humanoid whole-body control. Impressive results in imitation.
👉Review https://t.ly/ucmuB
👉Paper arxiv.org/pdf/2601.23080
👉Project zeonsunlightyu.github.io/RGMT.github.io
🫧SurfPhase: 3D Interfacial Dynamics🫧
👉SurfPhase is a novel model for reconstructing 3D interfacial dynamics from sparse camera views. Repo/Dataset announced💙
👉Review https://t.ly/g2P5F
👉Paper https://arxiv.org/pdf/2602.11154
👉Project https://yuegao.me/SurfPhase/
👉Repo github.com/yuegao/SurfPhase
🪿Teaching AI to Draw Illusions🪿
👉Stroke of Surprise by NYCU is a novel generative framework that optimizes vector strokes to satisfy distinct semantic interpretations at different drawing stages. As strokes are progressively added, the sketch reveals a completely different subject. Code released💙
👉Review https://t.ly/98Oim
👉Paper https://lnkd.in/dTA7iuce
👉Project https://lnkd.in/dhTMGw23
👉Repo https://lnkd.in/deQyDGFu
🥝Conversational Segmentation🥝
👉CIS grounds abstract, intent-oriented concepts into pixel-accurate masks, reasoning about affordances, physics, and functional properties. Code/Demo released💙
👉Review https://t.ly/SsG57
👉Paper arxiv.org/pdf/2602.13195
👉Project glab-caltech.github.io/converseg/
👉Repo github.com/AadSah/ConverSeg
👉Demo glab-caltech.github.io/converseg/#interactive-demo
📲 Efficient VLMs 📲
👉CoPE-VideoLM is a codec-aware tokenization framework for VLMs that replaces dense RGB encoding with lightweight structured representations derived from codec primitives. Tokens -93% / time-to-first-token -86%! Code announced💙
👉Review https://t.ly/3_GqN
👉Paper https://arxiv.org/pdf/2602.13191
👉Project https://sayands.github.io/cope/
👉Repo TBA
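👉To put the headline figures in perspective, here is a quick back-of-the-envelope calculation. The baseline numbers (10,000 visual tokens, 2.0 s to first token) are hypothetical and are not taken from the paper. Only the 93% and 86% reductions come from the post, and the helper name `after_reduction` is made up for this sketch.

```python
def after_reduction(baseline, percent_saved):
    """Return what remains of `baseline` after saving `percent_saved` percent."""
    return baseline * (100 - percent_saved) / 100


# Hypothetical baseline, NOT from the paper: 10,000 tokens, 2.0 s time-to-first-token.
tokens_left = after_reduction(10_000, 93)  # tokens remaining after a 93% cut
ttft_left = after_reduction(2.0, 86)       # seconds remaining after an 86% cut
print(tokens_left, ttft_left)
```

Under these assumed baselines, a video that used to cost 10,000 tokens would need about 700, and a 2.0 s wait for the first token would drop to roughly 0.28 s.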