🫙Universal Feature Up-Sampling🫙
👉AnyUp is a novel method for feature up-sampling that can be applied to ANY vision feature at ANY resolution, without encoder-specific training: an inference-time, feature-agnostic up-sampling architecture that improves up-sampling quality (guided-baseline sketch below). Repo under CC-4.0💙
👉Review https://t.ly/HvEw9
👉Paper https://arxiv.org/pdf/2510.12764
👉Project https://wimmerth.github.io/anyup/
👉Repo https://github.com/wimmerth/anyup
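For intuition, here is a classical, encoder-agnostic baseline of the kind AnyUp improves on: bilinearly lift the low-res features, then re-weight each neighborhood by image-guided (joint bilateral) similarity. A minimal PyTorch sketch; the function and its signature are our illustration, not the AnyUp architecture:
```python
import torch
import torch.nn.functional as F

def joint_bilateral_upsample(feats, guide, k=5, sigma_s=2.0, sigma_r=0.1):
    """Classical guided up-sampling baseline (NOT AnyUp): lift (B, C, h, w)
    features to the guide's (H, W), then re-weight each k x k neighborhood
    by spatial closeness and guide-color similarity."""
    B, C = feats.shape[:2]
    H, W = guide.shape[-2:]
    up = F.interpolate(feats, size=(H, W), mode="bilinear", align_corners=False)

    pad = k // 2
    f_patches = F.unfold(up, k, padding=pad).view(B, C, k * k, H * W)
    g_patches = F.unfold(guide, k, padding=pad).view(B, guide.shape[1], k * k, H * W)
    g_center = guide.view(B, guide.shape[1], 1, H * W)

    # Range weights from guide similarity, spatial weights from pixel offsets.
    range_w = torch.exp(-((g_patches - g_center) ** 2).sum(1) / (2 * sigma_r ** 2))
    ys, xs = torch.meshgrid(torch.arange(k) - pad, torch.arange(k) - pad, indexing="ij")
    spatial_w = torch.exp(-(ys ** 2 + xs ** 2).float() / (2 * sigma_s ** 2))
    w = range_w * spatial_w.reshape(1, k * k, 1)
    w = w / w.sum(1, keepdim=True).clamp_min(1e-8)

    return (f_patches * w.unsqueeze(1)).sum(2).view(B, C, H, W)

feats = torch.randn(1, 32, 16, 16)       # low-channel feature map for the demo
image = torch.rand(1, 3, 128, 128)       # high-res guide image
print(joint_bilateral_upsample(feats, image).shape)  # torch.Size([1, 32, 128, 128])
```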
🦄 City-Tour -> Simulation 🦄
👉UrbanVerse is a novel system to convert real-world urban scenes from city-tour videos into physics-aware, interactive simulation environments, enabling scalable robot learning in urban spaces with real-world generalization. Repo & Data announced 💙
👉Review https://t.ly/UvXNS
👉Paper https://arxiv.org/pdf/2510.15018
👉Project https://urbanverseproject.github.io/
👉Repo TBA
🌵All-in-One Dense Keypoints🌵
👉DeepDetect is a novel all-in-one dense keypoint detector that unifies the strengths of SIFT, ORB, BRISK, FAST, AGAST, Harris, Shi-Tomasi, Canny & Sobel in a single neural network (supervision sketch below). DAMN ROMANTIC. Repo under MIT💙
👉Review https://t.ly/VKGct
👉Paper https://arxiv.org/pdf/2510.17422
👉Repo https://github.com/saktx/DeepDetect
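The all-in-one idea can be pictured as distilling many classical detectors into a single network. A hypothetical sketch of fusing OpenCV's classical responses into one supervision heatmap (illustrative only, not the authors' training pipeline; Shi-Tomasi & Sobel would be folded in the same way):
```python
import cv2
import numpy as np

def fused_keypoint_map(gray: np.ndarray) -> np.ndarray:
    """Fuse classical detector responses on a uint8 grayscale image
    into one [0, 1] heatmap, a plausible pseudo ground truth."""
    h, w = gray.shape
    heat = np.zeros((h, w), np.float32)
    detectors = [cv2.SIFT_create(), cv2.ORB_create(), cv2.BRISK_create(),
                 cv2.FastFeatureDetector_create(), cv2.AgastFeatureDetector_create()]
    for det in detectors:
        for kp in det.detect(gray, None):
            x, y = int(kp.pt[0]), int(kp.pt[1])
            heat[min(y, h - 1), min(x, w - 1)] = 1.0
    # Dense corner/edge cues from Harris and Canny, normalized to [0, 1].
    harris = cv2.cornerHarris(np.float32(gray), 2, 3, 0.04)
    heat = np.maximum(heat, harris / max(float(harris.max()), 1e-8))
    heat = np.maximum(heat, cv2.Canny(gray, 100, 200).astype(np.float32) / 255.0)
    return heat

gray = np.random.randint(0, 256, (240, 320), np.uint8)   # dummy image
print(fused_keypoint_map(gray).shape)                     # (240, 320)
```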
🔥 SAM 2++: Track Anything 🔥
👉SAM 2++ is a novel unified model for tracking at any granularity, including masks, boxes, and points (toy prompt sketch below). Impressive results, but no code announced😢
👉Review https://t.ly/I392_
👉Paper https://arxiv.org/pdf/2510.18822
👉Project https://tracking-any-granularity.github.io/
👉Repo :(
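"Any granularity" means masks, boxes, and points must act as interchangeable track prompts. A toy sketch of normalizing all three to one representation (purely illustrative; the actual SAM 2++ prompt encoding is not public):
```python
import numpy as np

def prompt_to_box(prompt):
    """Reduce a track prompt to one (x0, y0, x1, y1) box.
    mask: 2-D bool array; box: 4-tuple; point: 2-tuple (expanded by a radius)."""
    if isinstance(prompt, np.ndarray) and prompt.ndim == 2:   # mask
        ys, xs = np.nonzero(prompt)
        return float(xs.min()), float(ys.min()), float(xs.max()), float(ys.max())
    if len(prompt) == 4:                                      # box, already canonical
        return tuple(map(float, prompt))
    (x, y), r = prompt, 4.0                                   # point -> small box
    return x - r, y - r, x + r, y + r

mask = np.zeros((64, 64), bool); mask[10:20, 30:40] = True
print(prompt_to_box(mask), prompt_to_box((5, 5, 9, 9)), prompt_to_box((12, 7)))
```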
👉UrbanVerse repo (pretty empty) now online: https://github.com/OatmealLiu/UrbanVerse
🏜️Omni Driving Models🏜️
👉OmniNWM is a unified panoramic navigation world model that advances autonomous driving by jointly generating multi-modal states (RGB, semantics, depth, 3D occupancy), enabling precise action control & facilitating closed-loop evaluation through occupancy-based dense rewards (toy reward sketch below). Repo under Apache 2.0💙
👉Review https://t.ly/ktXvz
👉Paper https://lnkd.in/eFKSZnrc
👉Project https://lnkd.in/eSDfccv8
👉Repo https://lnkd.in/efCSvjtp
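The occupancy-based dense reward can be illustrated with a toy bird's-eye-view version: score a planned ego trajectory by how much predicted occupancy it drives through. Grid layout, names, and reward shaping below are our assumptions, not the paper's exact formulation:
```python
import numpy as np

def occupancy_reward(occ, traj, voxel_size=0.5, origin=(-40.0, -40.0)):
    """occ: (X, Y) predicted BEV occupancy probabilities; traj: (T, 2) ego
    xy positions in meters. Returns a dense reward in [0, 1]: high when the
    trajectory stays in predicted free space."""
    idx = ((traj - np.asarray(origin)) / voxel_size).astype(int)
    idx = np.clip(idx, 0, np.array(occ.shape) - 1)     # clamp to the grid
    collision_p = occ[idx[:, 0], idx[:, 1]]            # occupancy along the path
    return float((1.0 - collision_p).mean())

occ = np.random.rand(160, 160)                          # dummy predicted occupancy
traj = np.linspace([-10.0, 0.0], [30.0, 4.0], num=20)   # straight-ish plan
print(occupancy_reward(occ, traj))
```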
🐠ITTO: Protocol for Dynamic Tracking🐠
👉ITTO by Caltech is a novel benchmark suite for evaluating and diagnosing tracking methods on complex, long-range motions (metric sketch below). Repo under CC BY-NC 4.0💙
👉Review https://t.ly/tN84a
👉Paper https://arxiv.org/pdf/2510.19819
👉Project https://glab-caltech.github.io/ITTO/
👉Repo https://github.com/ilonadem/itto
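For context, long-range point-tracking benchmarks usually report position accuracy at several pixel thresholds over visible points (TAP-Vid-style δ-accuracy); ITTO's exact protocol may differ. A minimal sketch:
```python
import numpy as np

def delta_accuracy(pred, gt, visible, thresholds=(1, 2, 4, 8, 16)):
    """pred, gt: (T, N, 2) trajectories in pixels; visible: (T, N) bool.
    Returns the fraction of visible points within each pixel threshold."""
    err = np.linalg.norm(pred - gt, axis=-1)[visible]   # errors at visible points
    return {t: float((err < t).mean()) for t in thresholds}

gt = np.random.rand(50, 30, 2) * 256                    # dummy ground truth
pred = gt + np.random.randn(50, 30, 2) * 3.0            # noisy predictions
print(delta_accuracy(pred, gt, np.ones((50, 30), bool)))
```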
🦗Character Mixing Generation🦗
👉MBZUAI unveils the first-ever video-gen system able to preserve character identity, behavior & original style while generating plausible interactions between characters that have never coexisted, from cartoons (We Bare Bears, Tom & Jerry) to realistic humans (Mr. Bean, Young Sheldon).
👉Review https://t.ly/tN84a
👉Paper https://lnkd.in/dhKMwukv
👉Project https://lnkd.in/dBkJs48h
👉Repo https://lnkd.in/dw_uzgAk
🧷Generative Point Tracking w/ FM🧷
👉Generative Point Tracker (GenPT) is a novel generative framework for point tracking that captures the multi-modality of plausible point trajectories (flow-matching sketch below). Repo under MIT💙
👉Review https://t.ly/MMFrt
👉Paper https://arxiv.org/pdf/2510.20951
👉Project https://mtesfaldet.net/genpt_projpage/
👉Repo https://github.com/tesfaldet/genpt
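The "FM" in the title is flow matching. A generic conditional flow-matching training step for short 16-point trajectories, just to show the objective (toy MLP; GenPT's video conditioning and architecture are far richer):
```python
import torch
import torch.nn as nn

T = 16                                                   # trajectory length
model = nn.Sequential(nn.Linear(2 * T + 1, 128), nn.ReLU(), nn.Linear(128, 2 * T))

def cfm_loss(x1):
    """x1: (B, T, 2) ground-truth point trajectories."""
    B = x1.shape[0]
    x1 = x1.reshape(B, -1)
    x0 = torch.randn_like(x1)                            # noise endpoint
    t = torch.rand(B, 1)
    xt = (1 - t) * x0 + t * x1                           # straight path sample
    target_v = x1 - x0                                   # its constant velocity
    pred_v = model(torch.cat([xt, t], dim=1))
    return ((pred_v - target_v) ** 2).mean()             # regress the velocity field

loss = cfm_loss(torch.randn(8, T, 2))
loss.backward()                                          # one training step's gradients
```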
🦄Unified Region-Level MLLM🦄
👉PixelRefer is a unified multimodal LLM framework that supports precise, region-specific understanding in both static images and dynamic videos, overcoming the holistic, scene-level bias of prior MLLMs. SOTA results. Demo, Repo & Dataset available💙
👉Review https://t.ly/WH4dQ
👉Paper https://arxiv.org/pdf/2510.23603
👉Project https://circleradon.github.io/PixelRefer
👉Repo https://github.com/alibaba-damo-academy/PixelRefer
🌱PlanarTrack: Large Planar Tracking🌱
👉PlanarTrack is a large-scale, high-quality, and challenging benchmark for planar tracking: 1,150 sequences with 733K+ frames, including 1,000 short-term & 150 long-term videos (corner-error sketch below). Repo & Dataset available💙
👉Review https://t.ly/mYNi7
👉Paper https://arxiv.org/pdf/2510.23368
👉Repo https://lnkd.in/edb3GMyT
👉Project https://lnkd.in/eC-hVB-U
👉Data https://lnkd.in/eew2j4tM
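Planar trackers are typically scored by how well the predicted homography maps the target's corners; a minimal sketch of that corner-error measure (PlanarTrack's exact protocol may differ):
```python
import numpy as np

def corner_error(H_pred, H_gt, corners):
    """corners: (4, 2) template corners; H_pred, H_gt: (3, 3) homographies.
    Returns the mean distance between predicted and ground-truth corners."""
    def warp(H, pts):
        pts_h = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
        return pts_h[:, :2] / pts_h[:, 2:3]
    return float(np.linalg.norm(warp(H_pred, corners) - warp(H_gt, corners), axis=1).mean())

corners = np.array([[0, 0], [100, 0], [100, 100], [0, 100]], float)
print(corner_error(np.eye(3), np.eye(3), corners))   # 0.0 for a perfect track
```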
👢Generative View Stitching 👢
👉GVS is a novel approach that enables collision-free, camera-guided video generation along predefined trajectories; a non-autoregressive alternative to video-length extrapolation. Full repo under MIT💙
👉Review https://t.ly/TiN_5
👉Paper https://arxiv.org/pdf/2510.24718
👉Project https://andrewsonga.github.io/gvs/
👉Repo https://github.com/andrewsonga/generative_view_stitching
🔪Tracking Object Transformations🔪
👉"Track Any State": tracking objects through transformations while detecting/describing state changes. Repo & Dataset available under MIT💙
👉Review https://t.ly/NPyW4
👉Paper https://lnkd.in/d4pA3bXJ
👉Project https://lnkd.in/dgbNfCuj
👉Repo https://lnkd.in/dtVWq2z7
👉"Track Any State": tracking objects through transformations while detecting/describing state changes. Repo & Dataset available under MIT💙
👉Review https://t.ly/NPyW4
👉Paper https://lnkd.in/d4pA3bXJ
👉Project https://lnkd.in/dgbNfCuj
👉Repo https://lnkd.in/dtVWq2z7
🎸Another BRIXEL in the Wall 🎸
👉BRIXEL allows the user to produce high-resolution feature maps using the DINOv3 backbone without requiring large amounts of compute. Repo released💙
👉Review https://t.ly/fZPwC
👉Paper https://arxiv.org/pdf/2511.05168
👉Repo https://github.com/alexanderlappe/BRIXEL
🐼Pixel-Dense Embedding🐼
👉FlowFeat is a novel high-resolution, multi-task feature representation that embeds a distribution of plausible apparent motions, or motion profiles (naive sketch below). Repo available💙
👉Review https://t.ly/aUx_U
👉Paper https://arxiv.org/pdf/2511.07696
👉Project https://tum-vision.github.io/flowfeat
👉Repo https://github.com/tum-vision/flowfeat
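A crude way to picture a "motion profile" is per-pixel statistics of optical flow across several frames; FlowFeat learns a compact embedding of such distributions. A naive stand-in (ours, not the method):
```python
import torch

def naive_motion_profile(flows: torch.Tensor) -> torch.Tensor:
    """flows: (K, 2, H, W) flow fields from one image to K other frames.
    Returns per-pixel mean and std of the K motions: (4, H, W)."""
    return torch.cat([flows.mean(0), flows.std(0)], dim=0)

flows = torch.randn(8, 2, 64, 64)          # dummy flows to 8 reference frames
print(naive_motion_profile(flows).shape)   # torch.Size([4, 64, 64])
```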
🚨 Announcement 🚨
I’ve received numerous reports of people blatantly copying my content on LinkedIn just to get a few likes.
Let me be very clear: I put a great deal of time and effort into reviewing papers and creating original, meaningful content. It’s disappointing to see professionals (some of whom are even members of this group or my connections) resorting to plagiarism instead of contributing their own ideas.
👉 Starting today, I’ll be removing these connections from LinkedIn and banning such individuals from this group.
📢 I also encourage everyone to report these cases whenever you come across them. Every single report helps stop this bad habit and keeps our community fair, respectful, and authentic.