Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks
University of Illinois Urbana-Champaign, Lapis Labs (2024)
https://www.arxiv.org/abs/2401.17263
The paper presents an optimization algorithm, Robust Prompt Optimization (RPO), that produces a defensive suffix for the system prompt by optimizing against several optimization-based prompt attacks.
GCG, PAIR, and JBC were used to optimize the suffix.
Effectiveness was tested against AutoDAN, TAP (Tree of Attacks with Pruning), PAP (Persuasive Adversarial Prompts), and some few-shot jailbreak instructions.
Harmful instructions were taken from AdvBench.
Tested on Vicuna-13B, Llama-2-7B-Chat, Qwen-1.5-14B, Llama-2-13B-Chat, GPT-3.5-Turbo, and GPT-4.
The method showed results both on attacks used during optimization (unsurprisingly):
GPT-4: from 50% down to 6% (PAIR attack)
Llama-2: from 4% to 0% (full protection)
GCG attack: 0% ASR on all models (full protection)
and on unseen attacks (from HarmBench):
Vicuna: ASR reduced by 18% on average
Llama-2: reduced by 6.6%
GPT-3.5: reduced by 8.7%
GPT-4: reduced by 3.5%
At the same time, the method has minimal impact on ordinary use (checked on MMLU and MT-Bench; the drop is negligible).
In practice, the method amounts to:
- Only 20 extra tokens per request (a string of random-looking characters appended after the user input)
- Roughly 8x cheaper to optimize, compute-wise, than a GCG suffix for a single request
Example
<im_start>system
You are an assistant that is harmless and helpful.
<im_end>
<im_start>user
[USER REQUEST / JAILBREAK]
<im_end>
<im_start>system
[RPO SUFFIX ADDED HERE]
<im_end>
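A minimal sketch of how a pre-optimized suffix like this could be applied at inference time, mirroring the template above; the placeholder suffix value and helper name are illustrative, not the paper's actual tokens or code:

```python
# Minimal sketch: insert a pre-optimized RPO-style defensive suffix after the
# user turn, as in the chat template above. The suffix value is a placeholder,
# NOT the optimized tokens from the paper.
RPO_SUFFIX = "<20 optimized defensive tokens>"  # hypothetical placeholder

def build_messages(system_prompt: str, user_input: str) -> list[dict]:
    """Builds the conversation with the defensive suffix appended as a trailing system turn."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_input},
        {"role": "system", "content": RPO_SUFFIX},
    ]

messages = build_messages(
    "You are an assistant that is harmless and helpful.",
    "[USER REQUEST / JAILBREAK]",
)
```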
Defensive Prompt Patch: A Robust and Generalizable Defense of Large Language Models against Jailbreak Attacks
Hong-Kong Science University, Princeton University, IBM (2025)
https://arxiv.org/pdf/2405.20099
A method similar to RPO.
Defending Jailbreak Prompts via In-Context Adversarial Game (ICAG)
University of Notre Dame, INRIA, King Abdullah University of Science and Technology (2024)
https://aclanthology.org/2024.emnlp-main.1121.pdf
A method that iteratively generates a system prompt through a game between two LLMs (an attack agent and a defense agent).
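A rough sketch of such a loop under assumptions; `attack_llm`, `defense_llm`, and `is_jailbroken` are hypothetical callables supplied by the caller, not the paper's actual implementation:

```python
from typing import Callable

# Rough sketch of an in-context adversarial game between an attack agent and a
# defense agent. The callables are hypothetical LLM wrappers, not the paper's code.
def adversarial_game(
    system_prompt: str,
    seed_prompts: list[str],
    attack_llm: Callable[[str], str],
    defense_llm: Callable[[str], str],
    is_jailbroken: Callable[[str, str], bool],  # (system_prompt, attack) -> did it get through?
    rounds: int = 5,
) -> str:
    for _ in range(rounds):
        # Attack agent refines jailbreaks against the current defense.
        attacks = [attack_llm(f"Rewrite to bypass this defense:\n{system_prompt}\n{p}")
                   for p in seed_prompts]
        failures = [a for a in attacks if is_jailbroken(system_prompt, a)]
        if not failures:
            break  # the current defense holds against this round's attacks
        # Defense agent updates the system prompt using observed failures as in-context examples.
        system_prompt = defense_llm(
            "Improve this system prompt so it refuses the attacks below without "
            f"hurting benign requests:\n{system_prompt}\nAttacks:\n" + "\n".join(failures)
        )
    return system_prompt
```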
Forwarded from CyberSecurityTechnologies
Pickle_vulns.pdf
1.4 MB
#MLSecOps
"The Art of Hide and Seek: Making Pickle-Based Model Supply Chain Poisoning Stealthy Again", 2025.
// the first systematic disclosure of the pickle-based model poisoning surface from both the model loading and risky function perspectives
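For background on why pickle is such a poisoning surface: unpickling can execute arbitrary code via `__reduce__`. A minimal generic illustration (not one of the stealth techniques from the paper):

```python
import os
import pickle

class Payload:
    # pickle serializes this object as "call os.system(...)"; the call runs
    # at load time, so merely loading the "model" executes attacker code.
    def __reduce__(self):
        return (os.system, ("echo arbitrary code ran during model load",))

blob = pickle.dumps(Payload())
pickle.loads(blob)  # prints the message: deserialization == code execution
```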
"The Art of Hide and Seek: Making Pickle-Based Model Supply Chain Poisoning Stealthy Again", 2025.
// the first systematic disclosure of the picklebased model poisoning surface from both the model loading and risky function perspectives
Forwarded from CyberSecurityTechnologies
WFA.pdf
600.4 KB
#MLSecOps
#Red_Team_Tactics
"Web Fraud Attacks Against LLM-Driven Multi-Agent Systems", 2025.
]-> Examples of WFA (Repo)
// In this paper, we propose Web Fraud Attacks, a novel type of attack aiming at inducing MAS to visit malicious websites. We design 11 representative attack variants that encompass domain name tampering, link structure camouflage (sub-directory nesting, sub-domain grafting, parameter obfuscation, etc.), and other deceptive techniques tailored to exploit MAS's vulnerabilities in link validation
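To illustrate the class of weakness these variants target, here is a toy example of why a substring-based link check is fooled by sub-domain grafting while a hostname-based check is not; the actual attack prompts in the paper are different:

```python
from urllib.parse import urlparse

ALLOWED = "trusted-bank.com"

def naive_check(url: str) -> bool:
    # Substring check: passes for grafted sub-domains and look-alike URLs.
    return ALLOWED in url

def hostname_check(url: str) -> bool:
    # Stricter: exact host or a true sub-domain of the allowed domain.
    host = urlparse(url).hostname or ""
    return host == ALLOWED or host.endswith("." + ALLOWED)

evil = "https://trusted-bank.com.evil.example/login"   # sub-domain grafting
print(naive_check(evil))     # True  -> a naive agent would follow the link
print(hostname_check(evil))  # False
```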
An Automated Multi-Agent Framework for Reproducing CVEs
https://arxiv.org/pdf/2509.01835
Forwarded from CyberSecurityTechnologies
stt_reasoning.pdf
645.5 KB
#AIOps
#Offensive_security
"Guided Reasoning in LLM-Driven Penetration Testing Using Structured Attack Trees", COLM 2025.
]-> https://github.com/KatsuNK/stt-reasoning
// a guided reasoning pipeline for pentesting LLM agents that incorporates a deterministic task tree built from the MITRE ATT&CK Matrix, a proven penetration testing kill chain, to constrain the LLM’s reasoning process to explicitly defined tactics, techniques, and procedures
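A rough sketch of the core idea under assumptions (node names and the `pick` callable are illustrative; the real tree format lives in the linked repo): the LLM's next action is constrained to the children of the current ATT&CK-derived node.

```python
from typing import Callable

# Rough sketch: a deterministic task tree (a tiny hand-written fragment here)
# constrains which technique the LLM may choose next. Node names and the `pick`
# callable are illustrative, not the repository's actual schema.
TASK_TREE: dict[str, list[str]] = {
    "Reconnaissance": ["Active Scanning", "Search Open Technical Databases"],
    "Active Scanning": ["Vulnerability Scanning"],
    "Vulnerability Scanning": ["Initial Access"],
    "Initial Access": ["Exploit Public-Facing Application"],
    "Exploit Public-Facing Application": [],
}

def next_step(current: str, observations: str, pick: Callable[[str], str]) -> str:
    allowed = TASK_TREE.get(current, [])
    if not allowed:
        return "done"
    # The LLM reasons over observations but must answer with one of the allowed techniques.
    return pick(
        f"Current technique: {current}\nObservations: {observations}\n"
        f"Choose the next technique strictly from this list: {allowed}"
    )
```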
Forwarded from CyberSecurityTechnologies
AIJack.pdf
1001.6 KB
#tools
#MLSecOps
"Stealth by Conformity: Evading Robust Aggregation through Adaptive Poisoning", 2025.
]-> AIJack: Security and Privacy Risk Simulator for Machine Learning
// In this paper, we challenge this underlying assumption by showing that a model can be poisoned while keeping malicious updates within the main distribution
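A toy sketch of the general idea, keeping a malicious update inside the benign update distribution so norm/outlier-based robust aggregation accepts it; the paper's adaptive strategy is more elaborate:

```python
import numpy as np

def conforming_poison(benign_updates: np.ndarray, malicious_direction: np.ndarray) -> np.ndarray:
    """Toy sketch: scale and clip a malicious update so its norm and
    per-coordinate values stay within the range of benign client updates."""
    mean = benign_updates.mean(axis=0)
    std = benign_updates.std(axis=0)
    norm_cap = np.median(np.linalg.norm(benign_updates, axis=1))
    # Point in the malicious direction, but with a typical benign norm.
    poisoned = malicious_direction / (np.linalg.norm(malicious_direction) + 1e-12) * norm_cap
    # Keep each coordinate within ~2 standard deviations of the benign mean.
    return np.clip(poisoned, mean - 2 * std, mean + 2 * std)
```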
Forwarded from Machinelearning
Anthropic describes how to build tools for AI agents so that they are as useful, efficient, and reliable as possible. Particular emphasis is placed on using the agents themselves to prototype, test, and optimize the tools.
How to write effective tools for agents
- Build quick prototypes and immediately check how the agent works with them.
- Test on real scenarios, not abstract examples.
- Analyze logs and agent behavior to find errors and points of confusion.
- Avoid duplication: each tool should do one clearly defined job.
- Use understandable names and structures (`machinelearning_create_task`, `mla_list_users`).
- Return only the data that is needed; don't overload the response. Add filtering and pagination.
- Write descriptions that even someone outside the domain can understand: precise, unambiguous, with input and output examples (see the sketch after this list).
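A minimal sketch of a tool spec that follows these points (namespaced name, one job, filtering and pagination, a description with explicit examples); the schema shape is generic JSON-schema style, not tied to any particular SDK:

```python
# Minimal sketch of a tool definition following the checklist above.
mla_list_users = {
    "name": "mla_list_users",
    "description": (
        "List users in the workspace, optionally filtered by role, with pagination. "
        "Example input: {'role': 'admin', 'page': 1, 'page_size': 20}. "
        "Example output: {'users': [{'id': 'u1', 'name': 'Ada', 'role': 'admin'}], 'next_page': None}."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "role": {"type": "string", "enum": ["admin", "member"]},
            "page": {"type": "integer", "minimum": 1, "default": 1},
            "page_size": {"type": "integer", "minimum": 1, "maximum": 50, "default": 20},
        },
    },
}
```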
What this gives:
- Improves AI agents' ability to solve real-world tasks.
- Minimizes errors: misused tools, wasted tokens, redundant calls.
- Makes agent behavior more reliable and predictable.
- Simplifies scaling, i.e. adding new tools and tasks.
@ai_machinelearning_big_data
#Anthropic #claude #aiagents #ai
Forwarded from AISecHub
CyberSOCEval.pdf
6.5 MB
CyberSOCEval: Benchmarking LLMs Capabilities for Malware Analysis and Threat Intelligence Reasoning - https://ai.meta.com/research/publications/cybersoceval-benchmarking-llms-capabilities-for-malware-analysis-and-threat-intelligence-reasoning/
Today’s cyber defenders are overwhelmed by a deluge of security alerts, threat intelligence signals, and shifting business context, creating an urgent need for AI systems that can enhance operational security work. Despite the potential of Large Language Models (LLMs) to automate and scale Security Operations Center (SOC) operations, existing evaluations are incomplete in assessing the scenarios that matter most to real-world cyber defenders. This lack of informed evaluation has significant implications for both AI developers and those seeking to apply LLMs to SOC automation.
Without a clear understanding of how LLMs perform in real-world security scenarios, AI system developers lack a north star to guide their development efforts, and users are left without a reliable way to select the most effective models. Furthermore, malicious actors have begun using AI to scale cyber attacks, emphasizing the need for open source benchmarks to drive adoption and community-driven improvement among defenders and AI model developers.
To address this gap, we introduce CyberSOCEval, a new suite of open source benchmarks that are part of CyberSecEval 4. CyberSOCEval consists of benchmarks tailored to evaluate LLMs in two tasks: Malware Analysis and Threat Intelligence Reasoning, core defensive domains that have inadequate coverage in current security benchmarks. Our evaluations reveal that larger, more modern LLMs tend to perform better, confirming the training scaling laws paradigm. We also find that reasoning models leveraging test time scaling do not achieve the boost they do in areas like coding and math, suggesting that these models have not been trained to reason about cybersecurity analysis, and pointing to a key opportunity for improvement.
Finally, we find that current LLMs are far from saturating our evaluations, demonstrating that CyberSOCEval presents a significant hill to climb for AI developers to improve AI cyber defense capabilities.