📢 Research Assistant Positions Available
The Robust and Interpretable Machine Learning (RIML) Lab and the Trustworthy and Secure Artificial Intelligence Lab (TSAIL) at the Computer Engineering Department of Sharif University of Technology are seeking highly motivated and talented research assistants to join our team. This collaborative project is jointly supervised by Dr. Rohban and Dr. Sadeghzadeh.
🔍 Position Overview
We are working on cutting-edge research in the field of generative models, with a focus on robustness, interpretability, and trustworthiness. As a research assistant, you will contribute to impactful projects at the intersection of theory and real-world applications.
🧠 Required Qualifications
- Solid background in machine learning, artificial intelligence, and generative models
- Hands-on experience with generative models and their practical applications
- Proficiency in Python and frameworks such as PyTorch
- Strong communication skills and the ability to work well in a collaborative research environment
📝 How to Apply
If you are interested in joining our team, please complete the application form and upload your CV using the following link:
👉 Application Form
📚 Suggested Background Reading
To better understand the context of our research, we recommend reviewing the following papers:
1. http://arxiv.org/abs/2410.15618
2. http://arxiv.org/abs/2305.10120
We look forward to your application!
💠 Compositional Learning Journal Club
Join us this week for an in-depth discussion on unlearning in deep generative models. We will explore recent breakthroughs and challenges, focusing on how these models handle unlearning tasks and where improvements can be made.
✅ This Week's Presentation:
🔹 Title: Categorical Reparameterization with Gumbel-Softmax
🔸 Presenter: Aryan Komaei
🌀 Abstract:
This paper addresses the challenge of using categorical variables in stochastic neural networks, which traditionally struggle with backpropagation due to non-differentiable sampling. The authors propose the Gumbel-Softmax distribution as a solution — a differentiable approximation of categorical variables that allows for efficient gradient-based optimization. The key benefit is that it can be smoothly annealed to behave like a true categorical distribution. The method outperforms previous gradient estimators in tasks like structured prediction and generative modeling, and also enables significant speedups in semi-supervised classification.
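For intuition, here is a minimal PyTorch sketch of the Gumbel-Softmax sampling step described above; the variable names and the fixed temperature are our own illustration, and PyTorch also ships an equivalent built-in, `torch.nn.functional.gumbel_softmax`.

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Differentiable, approximately one-hot sample from categorical logits."""
    u = torch.rand_like(logits)
    gumbel = -torch.log(-torch.log(u + 1e-20) + 1e-20)  # Gumbel(0, 1) noise
    # As tau -> 0 the output anneals toward a true one-hot categorical sample
    return F.softmax((logits + gumbel) / tau, dim=-1)

logits = torch.randn(4, 10, requires_grad=True)  # batch of 4, 10 categories
y = gumbel_softmax_sample(logits, tau=0.5)
y.sum().backward()  # gradients flow through the sampling step
```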
Session Details:
- 📅 Date: Tuesday
- 🕒 Time: 11:00 AM - 12:30 PM
- 🌐 Location: Online at vc.sharif.edu/ch/rohban
We look forward to your participation! ✌️
🔐 ML Security Journal Club
✅ This Week's Presentation:
🔹 Title: Jailbreaking Text-to-image Generative Models
🔸 Presenter: Arian Komaei
🌀 Abstract:
This paper introduces SneakyPrompt, an automated attack framework designed to bypass safety filters in text-to-image generative models like Stable Diffusion and DALL·E 2. These models are often equipped with safety filters to prevent the generation of harmful or NSFW (Not-Safe-for-Work) images. SneakyPrompt exploits these systems by using reinforcement learning to perturb blocked prompts in a way that circumvents the filters.
📄 Paper: SneakyPrompt: Jailbreaking Text-to-image Generative Models
Session Details:
- 📅 Date: Wednesday
- 🕒 Time: 5:00 - 6:00 PM
- 🌐 Location: Online at vc.sharif.edu/ch/rohban
We look forward to your participation! ✌️
🪢 Compositional Learning Journal Club
Join us this week for an in-depth discussion on Compositional Learning in the context of cutting-edge text-to-image generative models. We will explore recent breakthroughs and challenges, focusing on how these models handle compositional tasks and where improvements can be made.
🌟 This Week's Presentation:
📌 Title:
Direct Preference Optimization for Aligning Diffusion Models with Visually Consistent Samples
🎙️ Presenter: Mobina Poulaei
🧠 Abstract:
This work tackles a key challenge in diffusion models: the misalignment between generated images and their text prompts. While Direct Preference Optimization (DPO) has been used to improve alignment, it struggles with visual inconsistency between training samples. To address this, the authors propose D-Fusion, a method that creates visually consistent, DPO-trainable image pairs using mask-guided self-attention fusion. D-Fusion also preserves denoising trajectories necessary for optimization. Experiments show that it effectively improves prompt-image alignment across multiple reinforcement learning settings.
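As a rough illustration of the preference objective involved, here is a sketch in the style of the Diffusion-DPO loss, under our own simplifying assumptions; the paper's actual contribution, constructing visually consistent pairs via mask-guided self-attention fusion, is not shown, and the function names are ours.

```python
import torch
import torch.nn.functional as F

def diffusion_dpo_loss(err_w, err_l, ref_err_w, ref_err_l, beta: float = 0.1):
    """DPO-style loss over (preferred, dispreferred) image pairs.

    err_*     : per-sample denoising MSEs of the model being trained
    ref_err_* : the same quantities under a frozen reference model
    Lower denoising error roughly tracks higher likelihood, so the loss
    rewards the preferred sample's error dropping relative to the reference.
    """
    margin = (err_w - ref_err_w) - (err_l - ref_err_l)
    return -F.logsigmoid(-beta * margin).mean()

# e.g., err_w = mse(eps_pred_w, eps_w), with both pairs at a shared timestep t
loss = diffusion_dpo_loss(torch.rand(8), torch.rand(8), torch.rand(8), torch.rand(8))
```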
📄 Paper:
D-Fusion: Direct Preference Optimization for Aligning Diffusion Models with Visually Consistent Samples
Session Details:
- 📅 Date: Tuesday, August 5
- 🕒 Time: 11:00 AM - 12:00 PM
- 🌐 Location: Online at vc.sharif.edu/ch/rohban
We look forward to your participation! ✌️
🔐 ML Security Journal Club
✅ This Week's Presentation:
🔹 Title: Jailbreaking Text-to-image Generative Models
🔸 Presenter: Arian Komaei
🌀 Abstract:
This paper introduces GhostPrompt, an automated jailbreak framework targeting text-to-image (T2I) generation models to bypass integrated safety filters for not-safe-for-work (NSFW) content. Unlike previous token-level perturbation methods, GhostPrompt leverages large language models (LLMs) with multimodal feedback for semantic-level adversarial prompt generation. It combines Dynamic Optimization—an iterative feedback-driven process for generating aligned adversarial prompts—and Adaptive Safety Indicator Injection, which strategically embeds benign visual cues to evade image-level detection. The framework achieves a 99% bypass rate against ShieldLM-7B (up from 12.5% with SneakyPrompt), improves CLIP scores, reduces processing time, and generalizes to unseen models, including GPT-4.1 and DALL·E 3. The work reveals critical vulnerabilities in current multimodal safety systems and calls for further AI safety research under controlled-access protocols.
📄 Paper: GhostPrompt: Jailbreaking Text-to-image Generative Models based on Dynamic Optimization
Session Details:
- 📅 Date: Tuesday
- 🕒 Time: 6:30 - 7:30 PM
- 🌐 Location: Online at vc.sharif.edu/ch/rohban
We look forward to your participation! ✌️
🪢 Compositional Learning Journal Club
Join us this week for an in-depth discussion on Compositional Learning in the context of cutting-edge text-to-image generative models. We will explore recent breakthroughs and challenges, focusing on how these models handle compositional tasks and where improvements can be made.
🌟 This Week's Presentation:
📌 Title:
Fast Noise Initialization for Temporally Consistent Video Generation
🎙️ Presenter: Ali Aghayari
🧠 Abstract:
Video generation has advanced rapidly with diffusion models, but ensuring temporal consistency remains challenging. Existing methods like FreeInit address this by iteratively refining noise during inference, though at a significant computational cost. To overcome this, the authors introduce FastInit, a fast noise initialization method powered by a Video Noise Prediction Network (VNPNet). Given random noise and a text prompt, VNPNet produces refined noise in a single forward pass, eliminating the need for iteration. This approach greatly improves efficiency while maintaining high temporal consistency across frames. Trained on a large-scale dataset of text prompts and noise pairs, FastInit consistently enhances video quality in experiments with various text-to-video models. By offering both speed and stability, FastInit provides a practical solution for real-world video generation. The code and dataset will be released publicly.
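To make the single-forward-pass idea concrete, here is a schematic sketch; the `VNPNet` module below is a placeholder of our own, not the authors' architecture.

```python
import torch
import torch.nn as nn

class VNPNet(nn.Module):
    """Placeholder video noise prediction network: maps initial noise plus a
    text embedding to refined noise in one forward pass."""
    def __init__(self, noise_ch: int = 4, text_dim: int = 768):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, noise_ch)
        self.refine = nn.Conv3d(noise_ch, noise_ch, kernel_size=3, padding=1)

    def forward(self, noise: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # noise: (B, C, T, H, W); text_emb: (B, text_dim)
        cond = self.text_proj(text_emb)[:, :, None, None, None]
        return noise + self.refine(noise + cond)  # residual refinement

noise = torch.randn(1, 4, 16, 32, 32)   # 16 latent frames
text_emb = torch.randn(1, 768)          # pooled prompt embedding
refined = VNPNet()(noise, text_emb)     # one pass replaces FreeInit's iterations
```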
📄 Paper:
FastInit: Fast Noise Initialization for Temporally Consistent Video Generation
Session Details:
- 📅 Date: Tuesday, August 19
- 🕒 Time: 11:00 AM - 12:00 PM
- 🌐 Location: Online at vc.sharif.edu/ch/rohban
We look forward to your participation! ✌️
Call for Research Assistants in Large Language Model Projects
If you are familiar with Large Language Models (LLMs), you are invited to join our research projects as a research assistant. These projects focus on advanced topics in large language models.
The projects are conducted within the RIML Laboratory and may also be considered as undergraduate thesis projects, if applicable.
For an introduction to the topic, you can read:
Learning to Generate Research Idea with Dynamic Control
If you are interested, please complete the following form:
Registration Form
If you face any problems, contact @Moein_Salimi
جلسهی سی و یکم باشگاه مدلهای زبانی بزرگ
📚 موضوع: برآورد عدم قطعیت در شبکههای عمیق
سخنران: دکتر یاسین عباسی، پژوهشگر پیشین هوش مصنوعی در دیپمایند
زمان: چهارشنبه ۱۴۰۴/۰۶/۲۶، ساعت ۱۵:۰۰
لینک جلسه:
https://vc.sharif.edu/rohban
یوتیوب (ویدئو جلسهها)
توییتر
افزودن رویداد به تقویم گوگل
وبسایت ژورنالکلاب
از همه دعوت میکنیم که در این جلسه شرکت کنند.
#LLM_Club
@LLM_CLUB
🪢 Compositional Learning Journal Club
Join us this week for an in-depth discussion on Compositional Learning in the context of cutting-edge text-to-image generative models. We will explore recent breakthroughs and challenges, focusing on how these models handle compositional tasks and where improvements can be made.
🌟 This Week's Presentation:
📄 Paper:
Minority-Focused Text-to-Image Generation via Prompt Optimization
🧠 Abstract:
This paper introduces a new framework for improving the generation of minority samples with pretrained text-to-image diffusion models. Minority instances—defined as samples in low-density regions of text-conditioned data distributions—are valuable for applications like data augmentation and creative AI but are underrepresented in current models, which tend to focus on high-density regions. To address this imbalance, the authors propose an online prompt optimization method that preserves semantic content while guiding the emergence of desired properties. They further adapt this approach with a specialized likelihood-based objective to better capture minority features. Experimental results across multiple diffusion models show that the method substantially improves the quality and diversity of generated minority samples compared to existing techniques.
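As a toy sketch of what one online prompt-optimization step could look like under the likelihood-based view above: both hook functions (`log_likelihood_fn`, `semantic_sim_fn`) are hypothetical stand-ins of ours, not the paper's implementation.

```python
import torch

def minority_prompt_step(prompt_emb, log_likelihood_fn, semantic_sim_fn,
                         base_emb, lam: float = 0.5, lr: float = 1e-2):
    """One gradient step that pushes a soft prompt toward low-density
    (minority) regions while staying semantically close to the original."""
    loss = log_likelihood_fn(prompt_emb) - lam * semantic_sim_fn(prompt_emb, base_emb)
    grad, = torch.autograd.grad(loss, prompt_emb)
    return (prompt_emb - lr * grad).detach().requires_grad_(True)
```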
🎙️ Presenter: Amir Kasaei
Session Details:
- 📅 Date: Tuesday
- 🕒 Time: 4:00 PM - 5:30 PM
- 🌐 Location: Online at vc.sharif.edu/ch/rohban
We look forward to your participation! ✌️
🔐 ML Security Journal Club
✅ This Week's Presentation:
🔹 Title: Safe Generative AI Workshop @ NeurIPS 2024
🔸 Presenter: Arian Komaei
🌀 Abstract:
In the past two years, generative AI has been the major driving force behind the development of advanced AI products such as ChatGPT-4, AlphaFold, and Stable Diffusion. These technologies, while significantly improving productivity for many, have raised significant safety concerns, yet no workshop has focused on this topic in the past two years. This workshop, emphasizing AI safety concerns related to the use of generative AI, is therefore much needed by the community. Generative AI, including large language models, vision-language models, diffusion models, and many more, has significantly aided various aspects of both academia and industry. In scientific discovery, this aid encompasses experimental design, hypothesis formulation, theoretical reasoning, and observation organization. In commercial applications, generative models such as large language models and diffusion algorithms have changed the lifestyles and workflows of billions around the world. This workshop aims to convene experts from various fields to address these challenges and explore potential solutions.
Session Details:
- 📅 Date: Sunday
- 🕒 Time: 4:00 - 5:00 PM
- 🌐 Location: Online at vc.sharif.edu/ch/rohban
We look forward to your participation! ✌️
🪢 Compositional Learning Journal Club
Join us this week for an in-depth discussion on Compositional Learning in the context of cutting-edge text-to-image generative models. We will explore recent breakthroughs and challenges, focusing on how these models handle compositional tasks and where improvements can be made.
🌟 This Week's Presentation:
📌 Title:
Compositional Visual Reasoning: Why It Matters and What Holds Us Back
🧠 Abstract:
Compositional visual reasoning is a key challenge in multimodal AI, focusing on enabling machines to break down visual scenes into meaningful parts, connect them with concepts, and perform multi-step logical inference. In this session, we will introduce the foundations of visual reasoning and discuss why compositionality is crucial for achieving robustness, interpretability, and cognitive alignment in AI systems. We will also highlight major challenges, including hallucinations, difficulty in maintaining semantic fidelity, and the limitations of current reasoning strategies. The aim is to provide a clear picture of the problem space and motivate deeper exploration in future sessions.
📄 Paper:
Explain Before You Answer: A Survey on Compositional Visual Reasoning
🎙 Presenter: Amir Kasaei
Session Details:
- 📅 Date: Tuesday, September 23
- 🕒 Time: 4:00 PM - 5:30 PM
- 🌐 Location: Online at vc.sharif.edu/ch/rohban
We look forward to your participation! ✌️
🔐 ML Security Journal Club
✅ This Week's Presentation:
🔹 Title: Unlearning diffusion models
🔸 Presenter: Arian Komaei
🌀 Abstract:
This paper introduces Single Layer Unlearning Gradient (SLUG), a new method for removing unwanted information from trained models efficiently. Unlike traditional unlearning approaches that require costly updates across many layers, SLUG updates only one carefully chosen layer using a single gradient step. The method relies on layer importance and gradient alignment to identify the optimal layer, preserving model performance while unlearning targeted content. Experiments show that SLUG works effectively across models like CLIP, Stable Diffusion, and vision-language models, handling both concrete concepts (e.g., objects, identities) and abstract ones (e.g., artistic styles). Compared to existing approaches, SLUG achieves similar unlearning results but with much lower computational cost, making it a practical solution for efficient and precise targeted unlearning.
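For intuition, here is an illustrative sketch of the single-layer, single-step idea; the scoring rule below (forgetting-gradient norm discounted by alignment with the retention gradient) is our simplification, and the paper's exact importance and alignment criteria may differ.

```python
import torch

def single_layer_unlearning_step(model, forget_loss, retain_loss, lr: float = 1e-4):
    """Pick one parameter tensor whose gradient carries a strong forgetting
    signal with little conflict against retention, then update only it."""
    params = [p for p in model.parameters() if p.requires_grad]
    g_forget = torch.autograd.grad(forget_loss, params, retain_graph=True, allow_unused=True)
    g_retain = torch.autograd.grad(retain_loss, params, allow_unused=True)

    best_p, best_g, best_score = None, None, float("-inf")
    for p, gf, gr in zip(params, g_forget, g_retain):
        if gf is None:
            continue
        align = 0.0 if gr is None else torch.cosine_similarity(
            gf.flatten(), gr.flatten(), dim=0).item()
        score = gf.norm().item() * (1.0 - align)  # strong forget signal, low conflict
        if score > best_score:
            best_p, best_g, best_score = p, gf, score

    with torch.no_grad():
        best_p -= lr * best_g  # the single unlearning gradient step
```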
📄 Paper: Targeted Unlearning with Single Layer Unlearning Gradient
Session Details:
- 📅 Date: Sunday
- 🕒 Time: 4:00 - 5:00 PM
- 🌐 Location: Online at vc.sharif.edu/ch/rohban
We look forward to your participation! ✌️
جلسهی امروز متاسفانه برگزار نخواهد شد
سایر جلسات از طریق همین کانال اطلاع رسانی خواهد شد
سایر جلسات از طریق همین کانال اطلاع رسانی خواهد شد
🔐 ML Security Journal Club
✅ This Week's Presentation:
🔹 Title: Unlearning diffusion models
🔸 Presenter: Arian Komaei
🌀 Abstract:
This paper introduces Single Layer Unlearning Gradient (SLUG), a new method for removing unwanted information from trained models efficiently. Unlike traditional unlearning approaches that require costly updates across many layers, SLUG updates only one carefully chosen layer using a single gradient step. The method relies on layer importance and gradient alignment to identify the optimal layer, preserving model performance while unlearning targeted content. Experiments show that SLUG works effectively across models like CLIP, Stable Diffusion, and vision-language models, handling both concrete concepts (e.g., objects, identities) and abstract ones (e.g., artistic styles). Compared to existing approaches, SLUG achieves similar unlearning results but with much lower computational cost, making it a practical solution for efficient and precise targeted unlearning.
📄 Paper: Targeted Unlearning with Single Layer Unlearning Gradient
Session Details:
- 📅 Date: Sunday
- 🕒 Time: 4:00 - 5:00 PM
- 🌐 Location: Online at vc.sharif.edu/ch/rohban
We look forward to your participation! ✌️
Research Team Formation: ML Trustworthiness & Speech Language Models
We are currently forming a research team for a project in the field of ML Trustworthiness and Speech Language Models.
Our goal is to publish the outcomes of this research in top-tier machine learning conferences. Additionally, active team members who contribute meaningfully to the project will receive recommendation letters from faculty members.
If you are interested in these topics and have sufficient time to dedicate to research, please fill out the form below:
Form Link
To learn more about related works previously conducted in our lab, you can visit the following links:
• Dr. Mohammad Hossein Rohban – Google Scholar
• PatchGuard: Adversarially Robust Anomaly Detection and Localization through Vision Transformers (CVPR 2025)
_We look forward to collaborating with you!_
🚀 Open RA Positions – Reinforcement Learning (Generalization & Sample Efficiency)
We have a few Research Assistant (RA) openings on Generalization and Sample Efficiency in Reinforcement Learning (RL). Selected candidates will work directly with Dr. Rohban and the project supervisor.
The project focuses on improving RL agents’ generalization beyond training environments using contrastive learning. While the choice of positive/negative samples greatly impacts training (see: https://arxiv.org/abs/2102.10960), anchor selection remains an unexplored area (related works: https://arxiv.org/abs/2004.04136 and https://arxiv.org/abs/1511.05952).
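To make the contrastive-learning angle concrete, here is a minimal InfoNCE sketch over state embeddings in the spirit of CURL (the second link above); the encoder, augmentations, and shapes are our own illustration.

```python
import torch
import torch.nn.functional as F

def info_nce(anchors: torch.Tensor, positives: torch.Tensor, temperature: float = 0.1):
    """InfoNCE over a batch: row i of `positives` is the positive for
    anchor i; every other row serves as a negative."""
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    logits = a @ p.t() / temperature                   # (B, B) similarities
    labels = torch.arange(a.size(0), device=a.device)  # diagonal = positives
    return F.cross_entropy(logits, labels)

# e.g., anchors = encoder(aug1(obs)), positives = encoder(aug2(obs))
loss = info_nce(torch.randn(32, 128), torch.randn(32, 128))
```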
We’re looking for highly motivated researchers (B.Sc. or higher) with:
1️⃣ Strong background in Python and Git
2️⃣ Proficiency in Deep Learning & Reinforcement Learning (taken/audited both courses)
3️⃣ At least 3 months of prior research experience
4️⃣ Self-motivated, independent, and a quick learner
5️⃣ On-site presence in the lab, with weekly meetings with Dr. Rohban and regular reports to the project supervisor
🕘 Deadline: Wednesday, October 20th, 2025 – 9:00 AM (Tehran time)
📄 Apply here: https://forms.gle/88SfwtwZvQ2JCZ7X7
📢 Research Assistant Positions Available
The Robust and Interpretable Machine Learning (RIML) Lab and the Trustworthy and Secure Artificial Intelligence Lab (TSAIL) at the Computer Engineering Department of Sharif University of Technology are seeking highly motivated and talented research assistants to join our team. This collaborative project is jointly supervised by Dr. Rohban and Dr. Sadeghzadeh.
🔍 Position Overview
We are working on cutting-edge research in the field of generative models, with a focus on robustness, interpretability, and trustworthiness. As a research assistant, you will contribute to impactful projects at the intersection of theory and real-world applications.
🧠 Required Qualifications
- Solid background in machine learning, artificial intelligence, and generative models
- Hands-on experience with generative models and their practical applications
- Proficiency in Python and frameworks such as PyTorch
- Strong communication skills and the ability to work well in a collaborative research environment
📝 How to Apply
If you are interested in joining our team, please complete the application form and upload your CV using the following link:
👉 Application Form
📚 Suggested Background Reading
To better understand the context of our research, we recommend reviewing the following papers:
1. http://arxiv.org/abs/2410.15618
2. http://arxiv.org/abs/2305.10120
⚠️ Note 1: We do not accept applicants who currently hold a full-time job, or students who hold a part-time job.
⚠️ Note 2: The target of these projects is submission to ICML and ECCV at the end of this Shamsi year. Therefore, time is limited, and participants must have at least 20–30 hours of free time per week to dedicate to the projects.
We look forward to your application!
Call for Research Assistants in Large Language Model Projects
If you are familiar with Large Language Models (LLMs), you are invited to join our research projects as a research assistant.
This project focuses on abductive reasoning in LLMs and aims to prepare a submission for ACL 2026.
For an introduction to the topic, you can read:
GEAR: A General Evaluation Framework for Abductive Reasoning
If you are interested, please complete the following form:
Registration Form
🔘 Open Research Position: Hallucination Detection in Vision-Language Models (VLMs)
We are looking for motivated students to join our research project.
🔍 Project Description
VLMs suffer from hallucination issues, where responses are incorrect, misleading, or not grounded in the image content. This research focuses on detecting these hallucinations and distinguishing hallucinated responses from non-hallucinated ones.
🔹 Requirements
- Strong Python programming skills
- Knowledge of deep learning
- Familiarity with VLMs
- Hands-on experience with PyTorch
📌 Note: Filling out this form does not guarantee acceptance. Only shortlisted candidates will receive an email by Nov 23.
📅 Application Deadline: Nov 22, 2025 (Azar 1)
🔗 Apply here: Google Form
🔐 ML Security Journal Club
✅ This Week's Presentation:
🔹 Title: Unlearning diffusion models
🔸 Presenter: Arian Komaei
🌀 Abstract:
This paper digs into the messiness of “concept erasure” in diffusion models and shows just how fragile most erasure claims really are. The authors break down the erasure process into two fundamental mechanisms: (1) disrupting the model’s internal guidance so it tries not to produce a target concept, and (2) outright suppressing the unconditional probability of generating that concept at all. Then they put current erasure techniques under a microscope using a battery of independent probes—visual context manipulation, altered diffusion trajectories, classifier guidance tests, and inspection of substitute generations that emerge when the “erased” concept is supposedly gone. The verdict? Most methods barely scratch the surface. Models often smuggle the concept back in through alternative prompts, context cues, or trajectory tweaks. The paper’s evaluation suite exposes these failure modes and sets a much higher bar for claiming true erasure in diffusion models.
📄 Paper: When Are Concepts Erased From Diffusion Models?
Session Details:
- 📅 Date: Sunday
- 🕒 Time: 3:30 - 4:30 PM
- 🌐 Location: Online at vc.sharif.edu/ch/rohban
We look forward to your participation! ✌️
🪢 Compositional Learning Journal Club
Join us this week for an in-depth discussion on Compositional Learning for Visual Reasoning in modern vision–language models. We will explore recent breakthroughs and challenges, focusing on how these models perform compositional visual reasoning over complex scenes and where there is still room for improvement in robustness, faithfulness, and instruction following.
🌟 This Week's Presentation
📌 Title:
Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration
🧠 Abstract:
Multimodal Large Language Models (MLLMs) have recently shown strong potential in visual reasoning, especially when combined with test-time scaling techniques. However, most current approaches keep the visual input fixed and only explore different textual reasoning paths, which limits their ability to exploit rich visual details—particularly in high-resolution images with many fine-grained elements. In such settings, vision-level reasoning becomes crucial: models need to dynamically zoom into informative regions of the image to gather the evidence required for accurate decisions.
In this session, we will discuss ZoomEye, a training-free, model-agnostic tree search algorithm for vision-level reasoning. ZoomEye treats an image as a hierarchical tree, where each node is a region and child nodes correspond to zoomed-in sub-regions. By navigating this tree, MLLMs can simulate human-like zooming behavior, selectively focusing on task-relevant areas. Experiments on high-resolution benchmarks show that ZoomEye substantially boosts the performance of multiple MLLMs (e.g., InternVL2.5-8B gains over 15%–17% on HR-Bench) and even enables small 3–8B models to outperform larger systems such as GPT-4o.
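As a toy sketch of the tree search behind this kind of zooming (our simplification, not the paper's exact algorithm): regions are split into quadrants and the search greedily descends into the most promising one; `score` and `answer` stand for hypothetical MLLM wrappers.

```python
from dataclasses import dataclass

@dataclass
class ZoomNode:
    """An image region in normalized coordinates; children are zoom-ins."""
    x0: float
    y0: float
    x1: float
    y1: float

    def children(self):
        mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2  # quad-split
        return [ZoomNode(self.x0, self.y0, mx, my), ZoomNode(mx, self.y0, self.x1, my),
                ZoomNode(self.x0, my, mx, self.y1), ZoomNode(mx, my, self.x1, self.y1)]

def zoom_search(image, question, score, answer, max_depth: int = 3, threshold: float = 0.8):
    """Greedily zoom into the highest-scoring sub-region until the model is
    confident enough to answer from the current crop."""
    node = ZoomNode(0.0, 0.0, 1.0, 1.0)  # start from the whole image
    for _ in range(max_depth):
        if score(image, node, question) >= threshold:
            break
        node = max(node.children(), key=lambda c: score(image, c, question))
    return answer(image, node, question)
```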
📄 Paper:
ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration
🎙 Presenter: Amir Kasaei
Session Details:
- 📅 Date: Tuesday, November 25th
- 🕒 Time: 3:00 PM - 4:00 PM
- 🌐 Location: Online at vc.sharif.edu/ch/rohban
We look forward to your participation! ✌️