🍇 Graph Neural Network in TF 🍇
👉#Google TensorFlow-GNN: novel library to build Graph Neural Networks on TensorFlow. Source Code released under Apache 2.0 license 💙
👉Review https://t.ly/TQfg-
👉Code github.com/tensorflow/gnn
👉Blog blog.research.google/2024/02/graph-neural-networks-in-tensorflow.html
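Libraries like TF-GNN build on the message-passing paradigm: each node updates its state from an aggregation of its neighbors' features. A minimal NumPy sketch of one mean-aggregation round — conceptual only, not the TF-GNN API:

```python
import numpy as np

def message_passing_step(h, edges, W_self, W_nbr):
    """One round of mean-aggregation message passing.

    h      : (N, D) node features
    edges  : list of (src, dst) index pairs
    W_self : (D, D) weight applied to a node's own state
    W_nbr  : (D, D) weight applied to aggregated neighbor messages
    """
    N, _ = h.shape
    agg = np.zeros_like(h)
    deg = np.zeros(N)
    for src, dst in edges:                    # accumulate messages along edges
        agg[dst] += h[src]
        deg[dst] += 1
    agg /= np.maximum(deg, 1)[:, None]        # mean over in-neighbors
    return np.maximum(h @ W_self + agg @ W_nbr, 0.0)  # ReLU update

# Toy graph: 3 nodes, edges 0 -> 1 and 1 -> 2
h = np.eye(3, 4)                              # (3 nodes, 4-dim features)
edges = [(0, 1), (1, 2)]
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
out = message_passing_step(h, edges, W, W)
print(out.shape)  # (3, 4)
```

Stacking several such rounds lets information flow across multi-hop neighborhoods, which is the core of any GNN.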
🆔 Magic-Me: ID-Specific Video 🆔
👉#ByteDance VCD: with just a few images of a specific identity, it can generate temporally consistent videos aligned with the given prompt
👉Review https://t.ly/qjJ2O
👉Paper arxiv.org/pdf/2402.09368.pdf
👉Project magic-me-webpage.github.io
👉Code github.com/Zhen-Dong/Magic-Me
🔥 Breaking: GEMINI 1.5 is out 🔥
👉Gemini 1.5 just announced: standard 128,000 token context window, up to 1 MILLION tokens via AI-Studio and #Vertex AI in private preview 🫠
👉Review https://t.ly/Vblvx
👉More: https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/#build-experiment
AI with Papers - Artificial Intelligence & Deep Learning
🈚 Seeing Through Occlusions 🈚
👉Novel NSF to see through occlusions, reflection suppression & shadow removal.
👉Review https://t.ly/5jcIG
👉Project https://light.princeton.edu/publication/nsf
👉Paper https://arxiv.org/pdf/2312.14235.pdf
👉Repo https://gi…
🔥 Seeing Through Occlusions: code is out 🔥
👉Repo: https://github.com/princeton-computational-imaging/NSF
☀️ One2Avatar: Pic -> 3D Avatar ☀️
👉#Google presents a new approach to generate animatable, photo-realistic avatars from just one (or a few) images. Impressive results.
👉Review https://t.ly/AS1oc
👉Paper arxiv.org/pdf/2402.11909.pdf
👉Project zhixuany.github.io/one2avatar_webpage/
🪟 BOG: Fine Geometric Views 🪟
👉 #Google (+Tübingen) unveils Binary Opacity Grids, a novel method to reconstruct triangle meshes from multi-view images able to capture fine geometric detail such as leaves, branches & grass. New SOTA, real-time on Google Pixel 8 Pro (and similar).
👉Review https://t.ly/E6T0W
👉Paper https://lnkd.in/dQEq3zy6
👉Project https://lnkd.in/dYYCadx9
👉Demo https://lnkd.in/d92R6QME
🦥Neuromorphic Video Binarization🦥
👉 The University of Hong Kong unveils the new SOTA in event-based neuromorphic binary reconstruction: stunning results on QR codes, barcodes & text. Real-time, CPU-only, up to 10,000 FPS!
👉Review https://t.ly/V-NFa
👉Paper arxiv.org/pdf/2402.12644.pdf
👉Project github.com/eleboss/EBR
🩻 Pose via Ray Diffusion 🩻
👉Novel distributed representation of camera pose that treats a camera as a bundle of rays. Naturally suited for set-level transformers, it's the new SOTA on camera pose estimation. Source code released 💙
👉Review https://t.ly/qBsFK
👉Paper arxiv.org/pdf/2402.14817.pdf
👉Project jasonyzhang.com/RayDiffusion
👉Code github.com/jasonyzhang/RayDiffusion
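The key idea is to parameterize a camera as a bundle of rays, typically in Plücker coordinates (unit direction plus moment). A NumPy sketch of that parameterization — my own construction for illustration, not the released code:

```python
import numpy as np

def pixel_rays_plucker(K, R, t, uv):
    """Represent a camera as a bundle of Plücker rays through given pixels.

    K   : (3, 3) intrinsics
    R, t: world-to-camera rotation (3, 3) and translation (3,)
    uv  : (P, 2) pixel coordinates
    Returns (P, 6): unit direction d and moment m = c x d for each ray.
    """
    c = -R.T @ t                                    # camera center in world coords
    p = np.hstack([uv, np.ones((len(uv), 1))])      # homogeneous pixel coords
    d = (R.T @ np.linalg.inv(K) @ p.T).T            # back-project to world-space rays
    d /= np.linalg.norm(d, axis=1, keepdims=True)   # unit directions
    m = np.cross(c, d)                              # moments encode ray position
    return np.hstack([d, m])

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
R, t = np.eye(3), np.array([0.0, 0.0, 1.0])
rays = pixel_rays_plucker(K, R, t, np.array([[320.0, 240.0], [0.0, 0.0]]))
print(rays.shape)  # (2, 6)
```

A useful sanity check on this representation: every moment is orthogonal to its direction (d · m = 0), which is exactly the Plücker constraint.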
🗃️ MATH-Vision Dataset 🗃️
👉MATH-V is a curated dataset of 3,040 HQ math problems with visual contexts, sourced from real math competitions. Dataset released 💙
👉Review https://t.ly/gmIAu
👉Paper arxiv.org/pdf/2402.14804.pdf
👉Project mathvision-cuhk.github.io/
👉Code github.com/mathvision-cuhk/MathVision
🫅FlowMDM: Human Composition🫅
👉FlowMDM: a diffusion-based approach capable of generating seamlessly continuous sequences of human motion from textual descriptions.
👉Review https://t.ly/pr2g_
👉Paper https://lnkd.in/daYRftdF
👉Project https://lnkd.in/dcRkv5Pc
👉Repo https://lnkd.in/dw-3JJks
🎷EMO: talking/singing Gen-AI 🎷
👉EMO: audio-driven portrait-video generation. Vocal avatar videos with expressive faces and varied head poses. Input: a single frame; video duration = length of the input audio
👉Review https://t.ly/4IYj5
👉Paper https://lnkd.in/dGPX2-Yc
👉Project https://lnkd.in/dyf6p_N3
👉Repo (empty) github.com/HumanAIGC/EMO
💌 Multi-LoRA Composition 💌
👉Two novel training-free image-composition methods, LoRA Switch and LoRA Composite, for integrating any number of elements into an image through multi-LoRA composition. Source Code released 💙
👉Review https://t.ly/GFy3Z
👉Paper arxiv.org/pdf/2402.16843.pdf
👉Code github.com/maszhongming/Multi-LoRA-Composition
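As described, LoRA Switch activates only one adapter at a time, rotating the active LoRA across denoising steps instead of merging all adapters at once. A conceptual NumPy sketch — variable names and the round-robin schedule are illustrative assumptions, not the released implementation:

```python
import numpy as np

def lora_weight(W, A, B, alpha=1.0):
    """Effective weight with a low-rank LoRA update: W + alpha * (B @ A)."""
    return W + alpha * (B @ A)

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))                       # frozen base weight
loras = [(rng.normal(size=(2, 8)),                # A: rank-2 down-projection
          rng.normal(size=(8, 2)))                # B: rank-2 up-projection
         for _ in range(3)]                       # three element/style LoRAs

# LoRA Switch (conceptually): exactly ONE LoRA is active per denoising step,
# cycling round-robin through the adapters.
for step in range(6):
    A, B = loras[step % len(loras)]               # pick this step's active LoRA
    W_step = lora_weight(W, A, B)
    # ... run one denoising step with W_step ...
```

With alpha = 0 the update vanishes and the base weight is recovered, which is what makes such adapters cheap to toggle per step.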
💥 MM-AU: Video Accident 💥
👉MM-AU - Multi-Modal Accident Understanding: 11,727 videos with temporally aligned descriptions, 2.23M+ bounding boxes, and 58,650 pairs of video-based accident reasons. Data & Code announced 💙
👉Review https://t.ly/a-jKI
👉Paper arxiv.org/pdf/2403.00436.pdf
👉Dataset http://www.lotvsmmau.net/MMAU/demo
🔥 SOTA: Stable Diffusion 3 is out! 🔥
👉Stable Diffusion 3 is the new SOTA in text-to-image generation (based on human preference evaluations). New Multimodal Diffusion Transformer (MMDiT) architecture uses separate sets of weights for image & language, improving text understanding/spelling capabilities. Weights & Source Code to be released 💙
👉Review https://t.ly/a1koo
👉Paper https://lnkd.in/d4i-9Bte
👉Blog https://lnkd.in/d-bEX-ww
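The separate-weights-per-modality idea can be sketched as a toy single-head joint-attention block: text and image tokens get their own projection matrices, but attention runs over the concatenated sequence. Purely illustrative — the real MMDiT adds multi-head attention, adaptive norms, and MLPs:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mmdit_joint_attention(txt, img, Wt, Wi):
    """Toy MMDiT-style block: per-modality QKV weights, joint attention."""
    qt, kt, vt = (txt @ Wt[k] for k in ("q", "k", "v"))   # text projections
    qi, ki, vi = (img @ Wi[k] for k in ("q", "k", "v"))   # image projections
    q = np.vstack([qt, qi])                               # concatenate sequences
    k = np.vstack([kt, ki])
    v = np.vstack([vt, vi])
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))        # every token attends to all
    out = attn @ v
    return out[: len(txt)], out[len(txt):]                # split back per modality

rng = np.random.default_rng(0)
D = 16
Wt = {k: rng.normal(size=(D, D)) for k in ("q", "k", "v")}
Wi = {k: rng.normal(size=(D, D)) for k in ("q", "k", "v")}
txt, img = rng.normal(size=(5, D)), rng.normal(size=(10, D))
t_out, i_out = mmdit_joint_attention(txt, img, Wt, Wi)
print(t_out.shape, i_out.shape)  # (5, 16) (10, 16)
```

The design point: each modality keeps weights tuned to its own statistics, while the joint attention still lets text tokens condition image tokens directly.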
🧵E-LoFTR: new Feats-Matching SOTA🧵
👉A novel LoFTR-inspired algorithm for efficiently producing semi-dense matches across images: up to 2.5× faster than LoFTR and superior to the previous SOTA pipeline (SuperPoint + LightGlue). Code announced.
👉Review https://t.ly/7SPmC
👉Paper https://arxiv.org/pdf/2403.04765.pdf
👉Project https://zju3dv.github.io/efficientloftr/
👉Repo https://github.com/zju3dv/efficientloftr
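Semi-dense matchers in the LoFTR family ultimately select correspondences via a mutual-consistency check on a feature-similarity matrix. A minimal mutual-nearest-neighbor sketch of that final step — illustrative, not the E-LoFTR code:

```python
import numpy as np

def mutual_nn_matches(desc_a, desc_b):
    """Match two sets of L2-normalized descriptors by mutual nearest neighbor."""
    sim = desc_a @ desc_b.T                      # cosine similarity matrix
    nn_ab = sim.argmax(axis=1)                   # best b for each a
    nn_ba = sim.argmax(axis=0)                   # best a for each b
    return [(i, j) for i, j in enumerate(nn_ab) if nn_ba[j] == i]

rng = np.random.default_rng(0)
desc_b = rng.normal(size=(50, 32))
desc_b /= np.linalg.norm(desc_b, axis=1, keepdims=True)
desc_a = desc_b[::-1] + 0.01 * rng.normal(size=(50, 32))  # noisy, reversed copy
desc_a /= np.linalg.norm(desc_a, axis=1, keepdims=True)
matches = mutual_nn_matches(desc_a, desc_b)
print(len(matches))  # recovers all 50 reversed pairs under mild noise
```

Requiring the match to win in both directions is what suppresses one-sided ambiguous correspondences.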
🦁StableDrag: Point-based Editing🦁
👉#Tencent unveils StableDrag, a novel point-based image editing framework via discriminative point tracking method + confidence-based latent enhancement strategy for motion supervision. Source Code announced but still no repo.
👉Review https://t.ly/eUI05
👉Paper https://lnkd.in/dz8-ymck
👉Project stabledrag.github.io/
🏛️ PIXART-Σ: 4K Generation 🏛️
👉PixArt-Σ is a novel Diffusion Transformer model (DiT) capable of directly generating images at 4K resolution. Authors: #Huawei, Dalian, HKU & HKUST. Demos available, code announced 💙
👉Review https://t.ly/Cm2Qh
👉Paper arxiv.org/pdf/2403.04692.pdf
👉Project pixart-alpha.github.io/PixArt-sigma-project/
👉Repo (empty) github.com/PixArt-alpha/PixArt-sigma
🤗-Demo https://huggingface.co/spaces/PixArt-alpha/PixArt-alpha
👺 Can GPT-4 play DOOM? 👺
👉Apparently yes, GPT-4 can play the game to a passable degree: it is able to manipulate doors, combat enemies, and perform pathing. Code (with licensing restrictions) released
👉Review https://t.ly/W8-0F
👉Paper https://lnkd.in/dmsB7bjA
👉Project https://lnkd.in/ddDPwjQB
🪖RT Humanoid from Head-Mounted Sensors🪖
👉#META (+CMU) announced SimXR, a method for controlling a simulated avatar from info obtained from AR/VR headsets
👉Review https://t.ly/Si2Mp
👉Paper arxiv.org/pdf/2403.06862.pdf
👉Project www.zhengyiluo.com/SimXR/
🏷️ Face Foundation Model 🏷️
👉Arc2Face, the first foundation model for human faces. Source Code released 💙
👉Review https://t.ly/MfAFI
👉Paper https://lnkd.in/dViE_tCd
👉Project https://lnkd.in/d4MHdEZK
👉Code https://lnkd.in/dv9ZtDfA
🪼FaceXFormer: Unified Face-Transformer🪼
👉FaceXFormer, the first unified transformer for facial analysis: face parsing, landmark detection, head-pose estimation, and attribute recognition (age, gender, race).
👉Review https://t.ly/MfAFI
👉Paper https://arxiv.org/pdf/2403.12960.pdf
👉Project kartik-3004.github.io/facexformer_web/
👉Code github.com/Kartik-3004/facexformer