🔬Intern-S1: SOTA MM-MoE 🔬
👉Intern-S1: a MM-MoE with 28B activated / 241B total parameters, continually pre-trained on 5T tokens, including 2.5T+ tokens from scientific domains. New SOTA on professional tasks such as molecular synthesis planning and reaction condition prediction. Models available under Apache 2.0💙 Usage sketch below👇
👉Review https://t.ly/3l5UW
👉Paper arxiv.org/pdf/2508.15763
👉Repo github.com/InternLM/Intern-S1
🤗HF huggingface.co/internlm/Intern-S1
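👉Usage sketch (hedged): the loading pattern below (AutoTokenizer + AutoModelForCausalLM with trust_remote_code) is assumed from prior InternLM releases, not verified against the Intern-S1 model card:

```python
# Hedged sketch: querying Intern-S1 via Hugging Face transformers.
# Assumption: the checkpoint follows the usual InternLM chat-template API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "internlm/Intern-S1"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # 241B-total MoE, ~28B parameters active per token
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Suggest reaction conditions for a Suzuki coupling."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```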
🫔ATLAS: SOTA Human Model🫔
👉#META presents ATLAS, a novel high-fidelity body model learned from 600k high-resolution scans captured with 240 synchronized cameras. Code announced, to be released💙
👉Review https://t.ly/0hHud
👉Paper arxiv.org/pdf/2508.15767
👉Project jindapark.github.io/projects/atlas/
👉Repo TBA
🧤Diffusive Hand from Signs🧤
👉LIGM + #NVIDIA unveil a novel generative model of 3D hand motions learned from sign-language data, covering motion characteristics such as handshapes, locations, and finger, hand & arm movements. Code, Models & Data to be released💙
👉Review https://t.ly/HonX_
👉Paper https://arxiv.org/pdf/2508.15902
👉Project https://imagine.enpc.fr/~leore.bensabath/HandMDM/
👉Data drive.google.com/drive/u/1/folders/1BLsu2hAqhAJ_gnGb9TNXW7MLiSuSEzEj
👉Repo TBA
🏎️ VROOM: F1 Reconstruction 🏎️
👉Berkeley unveils VROOM, the first attempt at reconstructing 3D models of #Formula1 circuits using only onboard camera footage from racecars, tackling extreme challenges from noise & speed. Repo released💙
👉Review https://t.ly/uuHdT
👉Paper arxiv.org/pdf/2508.17172
👉Repo github.com/yajatyadav/vroom
👉Project varun-bharadwaj.github.io/vroom/
🥶 OmniHuman-1.5 🥶
👉#ByteDance proposes a novel framework designed to generate character animations that are not only physically plausible but also semantically coherent and expressive, staying in sync with the speech's rhythm, prosody, and semantic content. Impressive results but no code 🥺
👉Review https://t.ly/CnRmX
👉Paper arxiv.org/pdf/2508.19209
👉Project omnihuman-lab.github.io/v1_5/
👉Repo 🥺
⚽SoccerNet 2025 results!⚽
👉The SoccerNet 2025 Challenges are the open benchmarking effort dedicated to advancing computer vision research in football video understanding. Repo available💙 Usage sketch below👇
👉Review https://t.ly/MfHKg
👉Paper https://arxiv.org/pdf/2508.19182
👉Project https://www.soccer-net.org/
👉Repo https://github.com/SoccerNet
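👉Usage sketch (hedged): the SoccerNet pip package has exposed this downloader API in earlier challenge editions; the 2025 task files may differ, so check github.com/SoccerNet:

```python
# pip install SoccerNet
# Hedged sketch: SoccerNetDownloader API as in earlier SoccerNet releases;
# file names for the 2025 challenges may differ.
from SoccerNet.Downloader import SoccerNetDownloader

downloader = SoccerNetDownloader(LocalDirectory="data/SoccerNet")
# Public annotations (e.g., action-spotting labels) for the standard splits:
downloader.downloadGames(files=["Labels-v2.json"], split=["train", "valid", "test"])
```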
🌹ROSE: Remove Objects & Effects🌹
👉ROSE removes an object's effects on its environment: shadows, reflections, lighting, translucency, and mirrors. Model, Demo & Dataset available via Hugging Face💙
👉Review https://t.ly/_KFM0
👉Paper https://lnkd.in/dNcTXQAE
👉Project https://lnkd.in/dFGmYT5h
👉Model https://lnkd.in/dhTT-VkN
👉Demo https://lnkd.in/dimgXZT6
👉Data https://lnkd.in/da7Jv667
🉐 Dress-up & Dance 🉐
👉Novel diffusion framework that generates high-quality 5-second, 24 FPS virtual try-on (VTON) videos at 1152×720 of a user wearing desired garments while moving in accordance with a given reference video. Impressive results but no repo🥺
👉Review https://t.ly/7NeTL
👉Paper arxiv.org/pdf/2508.21070
👉Project immortalco.github.io/DressAndDance/
👉Repo 🥺
🌈 Multi-View 3D Tracking 🌈
👉MVTracker is the first data-driven multi-view 3D point tracker, following arbitrary 3D points across multiple cameras. Repo available💙 Toy sketch below👇
👉Review https://t.ly/rISMR
👉Paper arxiv.org/pdf/2508.21060
👉Project https://lnkd.in/drHtAmRC
👉Repo https://lnkd.in/d4k8mg3B
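👉Toy sketch: any multi-view point tracker rests on triangulating a 3D point from its per-camera 2D observations; below, the classical DLT solution, not MVTracker's learned pipeline:

```python
import numpy as np

def triangulate_dlt(Ps, uvs):
    """Linear (DLT) triangulation of one 3D point from its 2D observations
    in several calibrated views -- the classical geometry that multi-view
    trackers build on, not MVTracker itself.
    Ps:  list of (3, 4) camera projection matrices
    uvs: list of (u, v) pixel coordinates, one per camera
    """
    A = []
    for P, (u, v) in zip(Ps, uvs):
        A.append(u * P[2] - P[0])   # each view adds two linear constraints
        A.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    X = Vt[-1]                      # null-space solution, homogeneous coords
    return X[:3] / X[3]
```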
❤️🔥PHD: Personalized 3D Humans❤️🔥
👉ETH & #Meta unveil PHD, a novel approach for personalized 3D human mesh recovery (HMR) and body fitting that leverages user-specific shape information. Code & models to be released💙
👉Review https://t.ly/IeRhH
👉Paper https://arxiv.org/pdf/2508.21257
👉Project https://phd-pose.github.io/
👉Repo TBA
🪴 Pixie: Physics from Pixels 🪴
👉UPenn + MIT unveil Pixie: a neural network that maps pretrained visual features (e.g., CLIP) to dense material fields of physical properties in a single forward pass, enabling real-time physics simulations. Repo & Dataset under MIT license💙 Toy sketch below👇
👉Review https://t.ly/1W0n5
👉Paper https://lnkd.in/dsHAHDqM
👉Project https://lnkd.in/dwrHRbRc
👉Repo https://lnkd.in/dy7bvjsK
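👉Toy sketch of the core idea (hypothetical feature and layer sizes, not the authors' architecture): a small head maps per-point pretrained visual features to physical material parameters in one forward pass:

```python
import torch
import torch.nn as nn

class MaterialHead(nn.Module):
    """Toy head: per-point CLIP-like features -> material parameters.
    Sizes and outputs are illustrative, not Pixie's actual design."""
    def __init__(self, feat_dim: int = 768):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.GELU(),
            nn.Linear(256, 3),  # e.g. log Young's modulus, Poisson ratio, density
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.mlp(feats)  # (N, 3): one material vector per point

feats = torch.randn(4096, 768)      # stand-in for dense features lifted to 3D
materials = MaterialHead()(feats)   # single forward pass -> physics-ready field
```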
🫛TMR: Few-Shot Template-matching🫛
👉POSTECH unveils TMR, a novel and simple template-matching detector for few-shot pattern detection, achieving strong (and SOTA) results on diverse datasets. A new dataset (RPINE) released, repo soon💙 Toy sketch below👇
👉Review https://t.ly/WWAcL
👉Paper https://lnkd.in/dJbSu5vk
👉Project https://lnkd.in/dwcDnHHQ
👉Repo https://lnkd.in/dp7aw8Cs
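👉Toy sketch of the generic core of template matching with deep features (a cosine-similarity heatmap, not TMR's exact formulation):

```python
import torch
import torch.nn.functional as F

def match_template(image_feats: torch.Tensor, template_feat: torch.Tensor) -> torch.Tensor:
    """Cosine-similarity heatmap between one template embedding and a dense
    feature map. image_feats: (C, H, W); template_feat: (C,)."""
    img = F.normalize(image_feats.flatten(1), dim=0)   # (C, H*W), unit columns
    tpl = F.normalize(template_feat, dim=0)            # (C,)
    return (tpl @ img).view(*image_feats.shape[1:])    # (H, W)

heat = match_template(torch.randn(256, 64, 64), torch.randn(256))
peak = (heat == heat.max()).nonzero()[0]   # crude detection: best-matching cell
```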
🧬 OpenVision 2 is out! 🧬
👉UCSC releases OpenVision2: a novel family of generative pretrained visual encoders that drops the text encoder and contrastive loss, training with caption-only supervision. Fully open, Apache 2.0💙 Toy sketch below👇
👉Review https://t.ly/Oma3w
👉Paper https://arxiv.org/pdf/2509.01644
👉Project https://ucsc-vlaa.github.io/OpenVision2/
👉Repo https://github.com/UCSC-VLAA/OpenVision
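👉Toy sketch of caption-only pretraining in spirit (not the authors' code): a visual encoder feeds an autoregressive text decoder, and the only loss is next-token cross-entropy on the caption. No text encoder, no contrastive pairing:

```python
import torch
import torch.nn as nn

class CaptionOnlyPretrainer(nn.Module):
    """Minimal caption-only setup: image patches -> decoder -> caption loss."""
    def __init__(self, vocab: int = 32000, d: int = 512):
        super().__init__()
        self.patchify = nn.Sequential(nn.Conv2d(3, d, 16, 16), nn.Flatten(2))  # ViT-style stand-in
        self.embed = nn.Embedding(vocab, d)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d, nhead=8, batch_first=True), num_layers=2)
        self.lm_head = nn.Linear(d, vocab)

    def forward(self, images, caption_ids):
        vis = self.patchify(images).transpose(1, 2)        # (B, N_patches, d)
        tgt = self.embed(caption_ids[:, :-1])              # teacher forcing
        mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
        out = self.decoder(tgt, vis, tgt_mask=mask)
        logits = self.lm_head(out)
        return nn.functional.cross_entropy(                # the ONLY training loss
            logits.reshape(-1, logits.size(-1)), caption_ids[:, 1:].reshape(-1))

loss = CaptionOnlyPretrainer()(torch.randn(2, 3, 224, 224),
                               torch.randint(0, 32000, (2, 16)))
loss.backward()
```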
🐉 #DoubleDragon with #AI 🐉
👉What would Double Dragon look like in real life? Each character has been transformed with #AI to capture their style, fighting spirit, and charisma, as if they had stepped right out of the game’s streets into the real world. AUDIO ON. Damn romantic💙
#artificialintelligence #machinelearning #ml #AI #deeplearning #computervision #AIwithPapers #metaverse #LLM
👉Post https://t.ly/0IpER
👉Channel http://www.youtube.com/@iaiaoh84
🍐 Promptable Human Mesh 🍐
👉PromptHMR is a promptable human pose/shape (HPS) estimation method that processes images with spatial or semantic prompts. It takes “side information” readily available from vision-language models or user input to improve the accuracy and robustness of 3D HPS. Code released💙
👉Review https://t.ly/zJ7S-
👉Paper arxiv.org/pdf/2504.06397
👉Project yufu-wang.github.io/phmr-page/
👉Repo github.com/yufu-wang/PromptHMR
🔥WebEyeTrack: real-time/web eye🔥
👉WebEyeTrack is a novel framework that integrates lightweight SOTA gaze-estimation models directly in the browser, bringing deep-learning gaze estimation to the web while explicitly accounting for head pose. Source code released under MIT license💙
👉Review https://t.ly/Xon9h
👉Paper https://arxiv.org/pdf/2508.19544
👉Project redforestai.github.io/WebEyeTrack/
👉Repo github.com/RedForestAi/WebEyeTrack
✂️ AI Open-Source Annotation ✂️
👉VisioFirm by TOELT is a fully open-source, AI-powered image annotation tool designed to accelerate labeling for computer vision tasks like object detection, oriented bounding boxes, and segmentation. Source code released under Apache 2.0💙
👉Review https://t.ly/MoMvv
👉Paper https://lnkd.in/dxTncSgv
👉Repo https://lnkd.in/dCWMXp3x
Friends,
I’ve just opened my IG account: https://www.instagram.com/aleferra.ig | Feel free to add me
What about posting stuff about AI on IG? Thoughts?
🖌️Real-Time Drag-Based Editing🖌️
👉The Visual AI Lab unveils Inpaint4Drag, a novel framework that decomposes drag-based editing into pixel-space bidirectional warping and inpainting, inspired by elastic object deformation. Demo and Code released (unknown license)💙 Toy sketch below👇
👉Review https://t.ly/H5nlR
👉Paper https://arxiv.org/pdf/2509.04582
👉Project https://visual-ai.github.io/inpaint4drag/
👉Repo https://github.com/Visual-AI/Inpaint4Drag
👉Demo https://colab.research.google.com/drive/1fzoyNzcJNZjM1_08FE9V2V20EQxGf4PH
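👉Toy sketch of the warping half of the decomposition (a smooth backward warp around the drag target; Inpaint4Drag's actual bidirectional warp and inpainting differ):

```python
import cv2
import numpy as np

def drag_warp(img: np.ndarray, src: tuple, dst: tuple, radius: int = 60) -> np.ndarray:
    """Toy pixel-space drag: pull content so the region at `src` lands on `dst`,
    with a linear falloff. A backward warp stretches the surroundings; a real
    pipeline would forward-warp and inpaint the revealed hole."""
    h, w = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    d = np.hypot(xs - dst[0], ys - dst[1])
    falloff = np.clip(1.0 - d / radius, 0.0, 1.0)   # 1 at dst, 0 beyond radius
    map_x = xs - falloff * (dst[0] - src[0])        # sample backwards toward src
    map_y = ys - falloff * (dst[1] - src[1])
    return cv2.remap(img, map_x, map_y, cv2.INTER_LINEAR)

img = np.zeros((256, 256, 3), np.uint8)
cv2.circle(img, (100, 128), 30, (255, 255, 255), -1)
out = drag_warp(img, src=(100, 128), dst=(160, 128))  # circle dragged right
```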
🩸Foundation Red Blood Cells🩸
👉RedDino from the University of Cagliari is a self-supervised foundation model designed for red blood cell (RBC) morphology analysis. Trained on 1.25M RBC images, it's the new SOTA in shape classification. Code & Models released under Apache 2.0💙 Usage sketch below👇
👉Review https://t.ly/uWAch
👉Paper arxiv.org/pdf/2508.08180
👉Code github.com/Snarci/RedDino
👉Models huggingface.co/collections/Snarcy/reddino-689a13e29241d2e5690202fc
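👉Usage sketch (hedged): RedDino is DINO-family, so a timm hf-hub load is plausible; the hub id below is a guess from the HF collection, so check the repo README for the exact call:

```python
# Hedged sketch: extracting RBC embeddings with a DINOv2-style backbone.
# "Snarcy/RedDino-large" is a hypothetical hub id; verify in the repo README.
import timm
import torch

model = timm.create_model("hf-hub:Snarcy/RedDino-large", pretrained=True)
model.eval()
cfg = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**cfg, is_training=False)

img = torch.randn(1, 3, 224, 224)   # stand-in for a preprocessed RBC crop
with torch.no_grad():
    emb = model(img)                # pooled embedding for shape classification
```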