ML Research Hub – Telegram
ML Research Hub
32.7K subscribers
3.96K photos
223 videos
23 files
4.26K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
OpenSeeD

A Simple Framework for Open-Vocabulary Segmentation and Detection

🖥 Github: https://github.com/idea-research/openseed

Paper: https://arxiv.org/abs/2303.08131v2

💨 Dataset: https://paperswithcode.com/dataset/objects365

https://news.1rj.ru/str/DataScienceT
Contrastive Semi-supervised Learning for Underwater Image Restoration via Reliable Bank

🖥 Github: https://github.com/huang-shirui/semi-uir

Paper: https://arxiv.org/abs/2303.09101v1

💨 Dataset: https://paperswithcode.com/dataset/uieb

🖥 GigaGAN - Pytorch

An implementation of GigaGAN, a new state-of-the-art GAN from Adobe.

https://github.com/lucidrains/gigagan-pytorch

Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation (CVPR 2023)

A novel Diffusion Audio-Gesture Transformer is devised to better attend to information from multiple modalities and to model long-term temporal dependencies.

🖥 Github: https://github.com/advocate99/diffgesture

Paper: https://arxiv.org/abs/2303.09119v1

💨 Dataset: https://paperswithcode.com/dataset/beat

⚜️ ViperGPT: Visual Inference via Python Execution for Reasoning

ViperGPT is a framework that uses code-generation models to compose vision-and-language models into subroutines that produce a result for any query.

🖥 Github: https://github.com/cvlab-columbia/viper

Paper: https://arxiv.org/pdf/2303.08128.pdf

💨 Project: https://paperswithcode.com/dataset/beat
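
To make the idea of composing vision modules through generated code more concrete, here is a toy sketch. Everything in it is an assumption for illustration: `find`, `count`, the fake `IMAGE`, and the hard-coded `GENERATED_PROGRAM` are hypothetical stand-ins, not the ViperGPT API. In the real framework a code-generation model would write the program from the query and the subroutines would call actual vision-and-language models.

```python
# Toy sketch only: the "vision modules" below are hypothetical stubs,
# not the ViperGPT API.

# A fake "image": a list of already-detected objects.
IMAGE = [
    {"label": "muffin", "x": 10},
    {"label": "muffin", "x": 40},
    {"label": "kid", "x": 25},
    {"label": "kid", "x": 70},
]


def find(image, label):
    """Stub detector: return every object carrying the given label."""
    return [obj for obj in image if obj["label"] == label]


def count(objects):
    """Stub counting module."""
    return len(objects)


# Stands in for the output of a code-generation model answering
# "How many muffins can each kid have?". Hard-coded here.
GENERATED_PROGRAM = '''
def execute_query(image):
    muffins = find(image, "muffin")
    kids = find(image, "kid")
    return count(muffins) // count(kids)
'''


def run_generated(program, image):
    """Run the generated program with only the stub modules in scope."""
    namespace = {"find": find, "count": count}
    exec(program, namespace)  # defines execute_query inside namespace
    return namespace["execute_query"](image)


answer = run_generated(GENERATED_PROGRAM, IMAGE)
print(answer)
```

The point of the design is that the query logic lives in ordinary Python, so each step (detection, counting, arithmetic) can be inspected separately. Executing model-written code with `exec` is unsafe outside a sandbox, so treat this purely as an illustration.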

🎥 Zero-1-to-3: Zero-shot One Image to 3D Object

Zero-1-to-3 is a framework for changing the camera viewpoint of an object given just a single RGB image.

🖥 Github: https://github.com/cvlab-columbia/zero123

🤗 Hugging Face: https://huggingface.co/spaces/cvlab/zero123-live

Paper: https://arxiv.org/abs/2303.11328v1

Dataset: https://zero123.cs.columbia.edu/

💨 Project: https://paperswithcode.com/dataset/beat

⭐️ Demo: https://huggingface.co/spaces/cvlab/zero123

MIT Introduction to Deep Learning - 2023

Starting soon! MIT Intro to DL is one of the most concise AI courses on the web, covering basic deep learning techniques, architectures, and applications.

The 2023 lectures start in just one day, on Jan 9th!

Link to register:
http://introtodeeplearning.com

The 2022 MIT Introduction to Deep Learning lectures can be found here:

https://m.youtube.com/playlist?list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI

Train your ControlNet with diffusers 🧨

ControlNet is a neural network structure that allows fine-grained control of diffusion models by adding extra conditions.

🤗 Hugging Face: https://huggingface.co/blog/train-your-controlnet#

🖥 Github: https://github.com/huggingface/blog/blob/main/train-your-controlnet.md

ControlNet training example: https://github.com/huggingface/diffusers/tree/main/examples/controlnet
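
A minimal sketch of how an extra condition can be attached to a frozen model without disturbing it at the start of training. It assumes the zero-initialized connection design described in the ControlNet paper, reduced here to scalars and plain lists. `frozen_block`, `trainable_copy`, `ZeroLinear`, and `controlled_block` are hypothetical names for illustration and are not part of the diffusers API.

```python
def frozen_block(x):
    """Stand-in for a locked (non-trainable) block of the base model."""
    return [2.0 * v + 1.0 for v in x]


def trainable_copy(x):
    """Stand-in for the trainable copy of that block (same initial weights)."""
    return [2.0 * v + 1.0 for v in x]


class ZeroLinear:
    """Scalar stand-in for a zero-initialized layer: weight and bias start at 0."""

    def __init__(self):
        self.weight = 0.0
        self.bias = 0.0

    def __call__(self, x):
        return [self.weight * v + self.bias for v in x]


def controlled_block(x, condition, zero_in, zero_out):
    """Frozen output plus a condition-dependent residual from the trainable copy."""
    # The condition enters the trainable copy through a zero-initialized layer.
    injected = [a + b for a, b in zip(x, zero_in(condition))]
    # The copy's output is added back through a second zero-initialized layer.
    residual = zero_out(trainable_copy(injected))
    return [a + b for a, b in zip(frozen_block(x), residual)]


x = [0.5, -1.0, 3.0]
condition = [1.0, 0.0, 1.0]  # e.g. a flattened edge map (made-up values)
zero_in, zero_out = ZeroLinear(), ZeroLinear()

# Before any training step the control branch contributes nothing.
at_init = controlled_block(x, condition, zero_in, zero_out)
print(at_init == frozen_block(x))

# Pretend training has moved the weights away from zero.
zero_in.weight = 1.0
zero_out.weight = 0.1
after_training = controlled_block(x, condition, zero_in, zero_out)
print(after_training != frozen_block(x))
```

Because both connecting layers start at zero, the combined model initially reproduces the base model exactly, and the condition only gains influence as those weights are learned.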

🔥 Fix the Noise: Disentangling Source Feature for Controllable Domain Translation

A new approach for high-quality domain translation with better controllability.

🖥 Github: https://github.com/LeeDongYeun/FixNoise

Paper: https://arxiv.org/abs/2303.11545v1

💨 Dataset: https://paperswithcode.com/dataset/metfaces

"A panda is playing guitar on times square"

Text2Video-Zero

Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators

Paper: https://arxiv.org/abs/2303.13439
Video Result: video result link
Source code: https://github.com/picsart-ai-research/text2video-zero

Conditional Image-to-Video Generation with Latent Flow Diffusion Models

A new approach to conditional image-to-video (cI2V) generation using novel latent flow diffusion models (LFDM), which synthesize an optical flow sequence in the latent space, based on the given condition, and use it to warp the given image.

🖥 Github: https://github.com/nihaomiao/cvpr23_lfdm

Paper: https://arxiv.org/abs/2303.13744v1

💨 Dataset: https://drive.google.com/file/d/1dRn1wl5TUaZJiiDpIQADt1JJ0_q36MVG/view?usp=share_link
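
The warping step mentioned above can be illustrated with a toy example. This is only a sketch under strong simplifying assumptions: it warps a tiny integer grid in pixel space with nearest-neighbour sampling, whereas LFDM works on latent features and obtains the flow from a diffusion model. The `warp` function and the hand-written flows are hypothetical and not taken from the repository.

```python
def warp(image, flow):
    """Backward warp: out[y][x] = image[y + dy][x + dx], clamped at the borders."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = flow[y][x]
            src_x = min(max(x + round(dx), 0), width - 1)
            src_y = min(max(y + round(dy), 0), height - 1)
            out[y][x] = image[src_y][src_x]
    return out


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

# Hand-written flow fields standing in for a generated flow sequence.
no_motion = [[(0, 0)] * 3 for _ in range(3)]
# Each output pixel samples one pixel to its left, so content shifts right.
shift_right = [[(-1, 0)] * 3 for _ in range(3)]

# One warped frame per flow field, all derived from the same input image.
frames = [warp(image, flow) for flow in (no_motion, shift_right)]
for frame in frames:
    print(frame)
```

Every frame is produced by re-sampling the single input image, which is why a flow sequence alone is enough to describe the motion in this simplified setting.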

Test of Time: Instilling Video-Language Models with a Sense of Time

GPT-5 will likely have video abilities, but will it have a sense of time? This #CVPR2023 paper by a University of Amsterdam student addresses that question, showing how to instil a sense of time into video-language foundation models.

Paper:
https://arxiv.org/abs/2301.02074

Code:
https://github.com/bpiyush/TestOfTime

Project Page:
https://bpiyush.github.io/testoftime-website/

WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research

The authors propose a three-stage processing pipeline for filtering noisy data and generating high-quality captions, in which ChatGPT is used to filter and transform raw descriptions automatically.

🖥 Github: https://github.com/xinhaomei/wavcaps

Paper: https://arxiv.org/abs/2303.17395v1

💨 Dataset: https://paperswithcode.com/dataset/sounddescs
