🍌 AGENT BANANA (SOTA) 🍌
👉Agent Banana is a novel SOTA agentic system for HD, native-resolution image editing through reasoning-based natural-language interaction, where each edit is context-aware, logically dependent, and locally precise. Code announced💙
👉Review https://t.ly/EXaCH
👉Paper https://arxiv.org/pdf/2602.09084
👉Project https://agent-banana.github.io/
👉Repo https://github.com/taco-group/agent-banana
🛠️ IndustryShapes 6D Pose 🛠️
👉IndustryShapes by NTUA is a new RGB-D dataset of industrial tools, designed for both instance-level and novel object 6D pose estimation. Dataset available💙
👉Review https://t.ly/KKcuH
👉Paper https://arxiv.org/pdf/2602.05555
👉Project https://pose-lab.github.io/IndustryShapes/
👉Dataset https://huggingface.co/datasets/POSE-Lab/IndustryShapes
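The "6D" in 6D pose estimation refers to the three rotational plus three translational degrees of freedom of a rigid object. A minimal, dataset-agnostic sketch of what such a pose label encodes (this is generic SE(3) math, not the IndustryShapes file format):

```python
import numpy as np

def pose_matrix(rotation: np.ndarray, translation: np.ndarray) -> np.ndarray:
    """Assemble a 4x4 rigid transform (SE(3)) from a 3x3 rotation and a 3-vector translation."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T

def transform_points(T: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Apply the 4x4 pose to an (N, 3) array of object-model points."""
    homo = np.hstack([pts, np.ones((pts.shape[0], 1))])  # lift to homogeneous coords
    return (T @ homo.T).T[:, :3]

# Example pose: 90-degree rotation about Z plus a translation (in metres)
Rz = np.array([[0.0, -1.0, 0.0],
               [1.0,  0.0, 0.0],
               [0.0,  0.0, 1.0]])
t = np.array([0.1, 0.0, 0.5])
T = pose_matrix(Rz, t)
corners = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(transform_points(T, corners))  # model points mapped into camera coordinates
```

A pose estimator's job is to recover `T` for each object instance from the RGB-D observation; instance-level methods assume a known object model, while novel-object methods must generalize past training instances.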
🤖Generalized Human Tracking🤖
👉Beijing Institute of Technology & Humanoid Robotics Shanghai present a novel learning framework for general humanoid whole-body control. Impressive results in imitation.
👉Review https://t.ly/ucmuB
👉Paper arxiv.org/pdf/2601.23080
👉Project zeonsunlightyu.github.io/RGMT.github.io
🫧SurfPhase: 3D Interfacial Dynamics🫧
👉SurfPhase is a novel model for reconstructing 3D interfacial dynamics from sparse camera views. Repo/Dataset announced💙
👉Review https://t.ly/g2P5F
👉Paper https://arxiv.org/pdf/2602.11154
👉Project https://yuegao.me/SurfPhase/
👉Repo github.com/yuegao/SurfPhase
🪿Teaching AI to draw illusions🪿
👉Stroke of Surprise by NYCU is a novel generative framework that optimizes vector strokes to satisfy distinct semantic interpretations at different drawing stages. As strokes are progressively added, the sketch reveals a completely different subject. Code released💙
👉Review https://t.ly/98Oim
👉Paper https://lnkd.in/dTA7iuce
👉Project https://lnkd.in/dhTMGw23
👉Repo https://lnkd.in/deQyDGFu
🥝Conversational Segmentation🥝
👉CIS grounds abstract, intent-oriented concepts into pixel-accurate masks, reasoning about affordances, physics, and functional properties. Code/Demo released💙
👉Review https://t.ly/SsG57
👉Paper arxiv.org/pdf/2602.13195
👉Project glab-caltech.github.io/converseg/
👉Repo github.com/AadSah/ConverSeg
👉Demo glab-caltech.github.io/converseg/#interactive-demo
📲 Efficient VLMs 📲
👉CoPE-VideoLM is a codec-aware tokenization framework for VLMs that replaces dense RGB encoding w/ light structured representations derived from codec primitives. Tokens -93% / time-to-first-token -86%! Code announced💙
👉Review https://t.ly/3_GqN
👉Paper https://arxiv.org/pdf/2602.13191
👉Project https://sayands.github.io/cope/
👉Repo TBA
🐙Dex4D: Task-Agnostic Track🐙
👉Dex4D by CMU is a novel approach that generalizes to unseen objects and poses, scene layouts, backgrounds, & task trajectories. Code under Apache 2.0💙
👉Review https://t.ly/ZGx9T
👉Paper arxiv.org/pdf/2602.15828
👉Project dex4d.github.io/
👉Sim github.com/Dex4D/Dex4D-Simulation
👉Vision github.com/Dex4D/Dex4D-Vision
👉HW https://github.com/Dex4D/Dex4D-Hardware
🚤Video Neural Compression🚤
👉TeCoNeRV: adapting INR hypernetworks to compress videos efficiently at higher resolutions. Impressive: +5.35dB PSNR, -36% bitrates & 1.5-3× faster. Code announced💙
👉Review https://t.ly/0AtCK
👉Paper arxiv.org/pdf/2602.16711
👉Project namithap10.github.io/teconerv/
👉Repo github.com/namithap10/TeCoNeRV/
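The headline "+5.35dB PSNR" uses the standard peak signal-to-noise ratio between the original and decoded frames. A minimal sketch of the metric itself (generic definition, not TeCoNeRV code; `peak=255` assumes 8-bit frames):

```python
import numpy as np

def psnr(reference: np.ndarray, reconstruction: np.ndarray, peak: float = 255.0) -> float:
    """PSNR in dB: 10 * log10(peak^2 / MSE). Higher means a closer reconstruction."""
    mse = np.mean((reference.astype(np.float64) - reconstruction.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)

frame = np.full((4, 4), 100.0)
decoded = frame + 2.0  # uniform error of 2 gray levels -> MSE = 4
print(round(psnr(frame, decoded), 2))  # -> 42.11
```

Because PSNR is logarithmic, a +5.35dB gain at the same bitrate is a substantial reduction in reconstruction error, which is why rate-distortion results are usually reported as PSNR-vs-bitrate curves.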
🔥New SOTA Planar Tracking🔥
👉WOFTSAM by the Visual Recognition Group (CTU) is a novel planar tracker that combines robust long-term segmentation by SAM2 with 8-degrees-of-freedom homography pose estimation. Repo under BY-NC-SA 4.0💙
👉Review https://t.ly/VUOe5
👉Paper https://lnkd.in/dZfc_DhQ
👉Repo https://lnkd.in/dAcneJGn
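The "8 degrees of freedom" are the entries of a 3x3 homography up to scale (9 minus 1). As a hedged illustration of what the tracker estimates per frame (a textbook DLT fit from point correspondences, not the WOFTSAM pipeline, which couples SAM2 masks with optical-flow tracking):

```python
import numpy as np

def fit_homography(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Estimate the 3x3 homography H mapping src -> dst (>= 4 point pairs) via DLT."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on the 9 entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows)
    _, _, vt = np.linalg.svd(A)        # null vector of A = flattened H, up to scale
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]                 # fix the scale: 8 free parameters remain

def apply_homography(H: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Project (N, 2) points through H with the perspective divide."""
    homo = np.hstack([pts, np.ones((len(pts), 1))])
    proj = (H @ homo.T).T
    return proj[:, :2] / proj[:, 2:3]

# A unit square shifted by (2, 3): the recovered H is a pure translation
src = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
dst = src + np.array([2.0, 3.0])
H = fit_homography(src, dst)
print(np.round(apply_homography(H, src), 6))
```

In a tracker, the correspondences would come from points matched inside the segmented planar region between consecutive frames, with a robust estimator (e.g. RANSAC) rejecting outliers.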
🫸 World-Grounded Hand-Obj🫸
👉WHOLE jointly reconstructs coherent hand and object motion in world space by guiding a generative motion prior. Code announced💙
👉Review https://t.ly/c5w8h
👉Paper https://arxiv.org/pdf/2602.22209
👉Project https://judyye.github.io/whole-www/
👉Repo TBA
🧱Solaris: generative #Minecraft🧱
👉NYU unveils Solaris, a multiplayer video world model in Minecraft, which generates consistent first-person observations for two players simultaneously. Impressive work. Repo & Dataset💙
👉Review https://t.ly/VrcrT
👉Paper https://arxiv.org/pdf/2602.22208
👉Project https://solaris-wm.github.io/
👉Repo https://github.com/solaris-wm/