The Role of AI Safety Benchmarks in Evaluating Systemic Risks in General-Purpose AI Models
The evaluation of systemic risks in General-Purpose AI (GPAI) models is a complex challenge that requires a multifaceted approach, extending beyond traditional capability assessments. This report analyses the role of AI safety benchmarks in identifying systemic risks in GPAI models, proposing a dual-trigger framework that combines capability triggers with safety benchmarks to provide a more comprehensive assessment of potential harms. The current landscape of safety benchmarks is still maturing, with various initiatives emerging to address specific systemic risk categories, such as those identified in the GPAI Code of Practice: cyber offence; chemical, biological, radiological and nuclear (CBRN) risks; harmful manipulation; and loss of control. A tiered evaluation strategy is recommended, applying more rigorous and costly safety evaluations only to models that meet a predefined capability threshold or are intended for deployment in high-risk domains, ensuring proportionality and efficient resource allocation. Ultimately, robust and standardised safety benchmarks are essential for accurately classifying GPAI models as GPAI models with systemic risk, and policy initiatives should incentivise their creation to enable more effective systemic risk identification.
Authors: @ifigeneialel, @tseligkas, @Georgios_cyber, @eu_eeas, Jamila Boutemeur, Kevin Foley, Jussi Leskinen, Jakub Otčenášek, Dominik Ziolek
Source: https://publications.jrc.ec.europa.eu/repository/bitstream/JRC143259/JRC143259_01.pdf
ENISA Threat Landscape 2025 - 14 AI Takeaways - https://www.enisa.europa.eu/sites/default/files/2025-10/ENISA%20Threat%20Landscape%202025_0.pdf
1️⃣ AI is reshaping the threat landscape - By early 2025, AI-assisted phishing dominated global social-engineering activity, exceeding four-fifths of observed campaigns. - https://www.group-ib.com/blog/the-dark-side-of-automation-and-rise-of-ai-agent/
2️⃣ LLMs for social engineering and tooling - Threat groups use commercial and jailbroken LLMs (e.g., WormGPT, EscapeGPT, FraudGPT) to scale persuasion and speed up malicious tool creation. - https://blogs.microsoft.com/on-the-issues/2025/02/27/disrupting-cybercrime-abusing-gen-ai/
3️⃣ State-aligned use of Gemini and ChatGPT plus persona building - China-, Iran-, and DPRK-linked actors employ Gemini and ChatGPT for research, recon, and evasion, while units like Famous Chollima generate convincing LinkedIn personas. - https://cloud.google.com/blog/topics/threat-intelligence/adversarial-misuse-generative-ai - @googlecloud
4️⃣ Standalone malicious AI systems emerge - Bespoke systems such as Xanthorox AI signal a shift from jailbreaks to locally hosted, customized AI tools that reduce detection risk. - https://destcert.com/resources/xanthorox-ai/ - @destcert
5️⃣ Deepfakes and synthetic media used broadly - AI-generated voice and video bolster vishing, impersonation, and even malware-related activity. - https://edition.cnn.com/2024/02/04/asia/deepfake-cfo-scam-hong-kong-intl-hnk - @cnni
6️⃣ AI as a lure via fake AI tool sites and installers - Impersonations of tools like Kling AI, Luma AI, Canva Dream Lab, and DeepSeek-R1 deliver malware, including payloads disguised as legitimate AI installers. - https://cloud.google.com/blog/topics/threat-intelligence/cybercriminals-weaponize-fake-ai-websites
7️⃣ Targeting the AI supply chain - Adversaries poison hosted ML models, ship trojanized PyPI packages, and abuse a “Rules File Backdoor” to insert hostile instructions into config files used by Cursor and GitHub Copilot. - https://www.reversinglabs.com/blog/malicious-attack-method-on-hosted-ml-models-now-targets-pypi
8️⃣ ‘Slopsquatting’ enters the lexicon - Attackers exploit LLM-hallucinated package or app names as AI becomes embedded in developer workflows. - https://socket.dev/blog/slopsquatting-how-ai-hallucinations-are-fueling-a-new-class-of-supply-chain-attacks
9️⃣ Subverting model behavior beyond mere misuse - Proofs of concept show how model objectives can be steered toward attacker goals rather than simply using AI tools “as is.” - https://www.anthropic.com/research/sleeper-agents-training-deceptive-llms-that-persist-through-safety-training
🔟 AI software and infrastructure vulnerabilities - Critical RCE issues in Langflow and Microsoft 365 Copilot, plus infra flaws like SSRF (CVE-2024-27564) tied to ChatGPT components, highlight ecosystem fragility. - https://www.aim.security/lp/aim-labs-echoleak-blogpost
1️⃣1️⃣ Malicious browser extensions and ecosystem risk - Compromised open-source repos and harmful browser add-ons spread infostealers and amplify supply-chain exposure. - https://www.malwarebytes.com/blog/news/2025/01/google-chrome-ai-extensions-deliver-info-stealing-malware-in-broad-attack
1️⃣2️⃣ AI in FIMI and hybrid operations - Information manipulation efforts increasingly lean on AI, with notable targeting of European public institutions. - https://www.eeas.europa.eu/eeas/foreign-information-manipulation-and-interference-fimi_en
1️⃣3️⃣ Election impact example in Romania (2024) - Intelligence findings linked coordinated, AI-driven influence operations and cyberattacks to an annulled first-round presidential result. - https://www.ccr.ro/comunicat-de-presa-6-decembrie-2024/
1️⃣4️⃣ Quantifying AI-driven phishing dominance - Between September 2024 and February 2025, AI factored into the vast majority of phishing emails, enabling volume and personalization at scale. - https://www.knowbe4.com/hubfs/Phishing-Threat-Trends-2025_Report.pdf?hsLang=en - @KnowBe4
Checklist - Secure implementation of an AI system
"The simplified checklist in appendix 1 could provide AI users, operators and developers with additional measures and recommendations to consider" - By ANSSI
Source: Building trust in AI through a cyber risk-based approach - https://cyber.gouv.fr/en/publications/building-trust-ai-through-cyber-risk-based-approach
"The simplified checklist in appendix 1 could provide AI users, operators and developers with additional measures and recommendations to consider" - By ANSSI
Source: Building trust in AI through a cyber risk-based approach - https://cyber.gouv.fr/en/publications/building-trust-ai-through-cyber-risk-based-approach
👍1👎1🔥1
NIST Cybersecurity Standards Training Dataset
https://huggingface.co/datasets/ethanolivertroy/nist-cybersecurity-training
Dataset Summary:
This dataset contains 523,706 training examples extracted from 568 NIST cybersecurity publications, including:
> FIPS (Federal Information Processing Standards)
> SP 800 series (Special Publications)
> IR (Interagency/Internal Reports)
> CSWP (Cybersecurity White Papers)
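A minimal sketch of loading this dataset with the Hugging Face datasets library, assuming a standard "train" split; check the dataset card for the actual split and column names.

```python
# Minimal sketch: load the NIST cybersecurity training dataset from Hugging Face.
# The split name and the way records are inspected are assumptions - check the
# dataset card at https://huggingface.co/datasets/ethanolivertroy/nist-cybersecurity-training.
from datasets import load_dataset

ds = load_dataset("ethanolivertroy/nist-cybersecurity-training", split="train")
print(ds)        # overall size and column names
print(ds[0])     # inspect one of the ~523k training examples
```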
Evaluating LLM agents on Cyber Threat Investigation - https://arxiv.org/pdf/2507.14201 | https://microsoft.com/en-us/security/blog/2025/10/14/microsoft-raises-the-bar-a-smarter-way-to-measure-ai-for-cybersecurity/ | https://github.com/microsoft/SecRL
We present ExCyTIn-Bench, the first benchmark to evaluate an LLM agent on the task of cyber threat investigation through security questions derived from investigation graphs.
Real-world security analysts must sift through a large number of heterogeneous alert signals and security logs, follow multi-hop chains of evidence, and compile an incident report. With the development of LLMs, building LLM-based agents for automated threat investigation is a promising direction.
To assist the development and evaluation of LLM agents, we construct a dataset from a controlled Azure tenant that covers 8 simulated real-world multi-step attacks, 57 log tables from Microsoft Sentinel and related services, and 589 automatically generated questions.
We leverage security logs extracted with expert-crafted detection logic to build threat investigation graphs, and then generate questions with LLMs using paired nodes on the graph, taking the start node as background context and the end node as the answer.
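A minimal sketch of the paired-node idea, not the ExCyTIn-Bench code: pick a start node as context, a reachable end node as the answer, and ask an LLM to write the question. The node attribute name and prompt wording are illustrative assumptions.

```python
# Sketch of question generation from an investigation graph (illustrative only).
# Node attribute "desc" and the prompt text are assumptions, not from the paper's code.
import networkx as nx

def make_question_pairs(graph: nx.DiGraph, llm):
    pairs = []
    for start in graph.nodes:
        for end in nx.descendants(graph, start):      # follow multi-hop chains of evidence
            context = graph.nodes[start].get("desc", "")
            answer = graph.nodes[end].get("desc", "")
            prompt = (
                "Given this investigation context:\n"
                f"{context}\n"
                "Write a security question whose correct answer is:\n"
                f"{answer}"
            )
            pairs.append({"question": llm(prompt), "answer": answer})
    return pairs
```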
#AI #CyberSecurity #LLMAgents #ThreatInvestigation #DigitalForensics #IncidentResponse #AutoGen #AG2 #MicrosoftSecurity #PennState #Tsinghua #Benchmark
Cybersecurity AI Weekly - Market Insights
https://medium.com/ai-security-hub/cybersecurity-ai-weekly-market-insights-c0d2b8ac7974
State of MCP Server Security 2025: 5,200 Servers, Credential Risks - https://astrix.security/learn/blog/state-of-mcp-server-security-2025/
We analyzed over 5,200 unique, open-source MCP server implementations to understand how they manage credentials and what this means for the security of the growing AI agent ecosystem.
- 88% of MCP servers need credentials to function
- 53% rely on static API keys and Personal Access Tokens (PAT)
- Only 8.5% use modern OAuth authentication
- 79% store API keys in basic environment variables
#MCP #ModelContextProtocol #AIAgents #AgentSecurity #CredentialSecurity #SecretsManagement #OAuth #APIKeys #PATs #SecretRotation #LeastPrivilege #AstrixSecurity
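To make the last finding concrete, here is an illustrative sketch (not Astrix's methodology) of flagging MCP server code that reads long-lived credentials from plain environment variables; the variable-name patterns are assumptions.

```python
# Illustrative scan for the "static key in an environment variable" pattern.
# The credential-name regex is a guess for demonstration, not Astrix's rule set.
import re
from pathlib import Path

STATIC_CRED_PATTERN = re.compile(r"(API_KEY|TOKEN|PAT|SECRET)", re.IGNORECASE)
ENV_READ_PATTERN = re.compile(r'os\.environ(?:\.get)?\(\s*["\']([^"\']+)["\']')

def flag_static_credentials(repo_root: str) -> list[str]:
    findings = []
    for path in Path(repo_root).rglob("*.py"):
        text = path.read_text(errors="ignore")
        for match in ENV_READ_PATTERN.finditer(text):
            if STATIC_CRED_PATTERN.search(match.group(1)):
                findings.append(f"{path}: reads {match.group(1)} from the environment")
    return findings
```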
Are autonomous AI agents a credible offensive threat yet, or mostly hype in practice? - https://pacebench.github.io/
The increasing autonomy of LLMs necessitates a rigorous evaluation of their potential to aid in cyber offense. Existing benchmarks often lack real-world complexity and are thus unable to accurately assess LLMs’ cybersecurity capabilities.
To address this gap, we introduce PACEbench, a practical AI cyber-exploitation benchmark built on the principles of realistic vulnerability difficulty, environmental complexity, and cyber defenses.
Specifically, PACEbench comprises four scenarios spanning single, blended, chained, and defense vulnerability exploitations. To handle these complex challenges, we propose PACEagent, a novel agent that emulates human penetration testers by supporting multi-phase reconnaissance, analysis, and exploitation.
Extensive experiments with seven frontier LLMs demonstrate that current models struggle with complex cyber scenarios, and none can bypass defenses.
These findings suggest that current models do not yet pose a generalized cyber offense threat. Nonetheless, our work provides a robust benchmark to guide the trustworthy development of future models.
Zicheng Liu, Lige Huang, Jie Zhang, Dongrui Liu, Yuan Tian, Jing Shao - @sjtu1896, @CAS__Science
Source: https://arxiv.org/pdf/2510.11688v1
PACEbench is an automated benchmarking and comparison platform for penetration-testing agents and security tools. It provides multi-category security tasks, automated environment provisioning, port conflict resolution with flag generation/injection, external agent integration protocols, and automated scoring with result aggregation.
Can Task-Based Access Control (TBAC) Become Risk-Adaptive Agentic AI? - https://arxiv.org/pdf/2510.11414
The paper proposes Task-Based Access Control for agentic AI that is both risk-adaptive and uncertainty-aware. Instead of granting static, role-based permissions, an LLM “judge” synthesizes a just-in-time policy for each task, computes a composite risk from the specific tools and resources requested, and estimates its own uncertainty.
A capability token encodes least-privilege access, while policy enforcement points and a human-in-the-loop path handle elevated-risk or high-uncertainty cases. The design emphasizes short-lived credentials, auditable rationales, and reuse/caching for repeated safe tasks.
Enterprises want autonomous agents to act across heterogeneous systems, but predefined RBAC/ABAC rules don’t cover novel, multi-step tasks. Naively letting an LLM choose steps can produce over-privileged plans, fail open under distribution shift, or succumb to prompt-injection. The problem is to authorize new tasks safely, minimizing both under-permission (blocking productivity) and over-permission (creating breach paths), while acknowledging model fallibility and the varying criticality of target systems.
Treat authorization for AI agents as a dynamic risk-management loop, not a static table. Make the planner account for resource criticality and quantify uncertainty; escalate when either is high. Bind decisions to short-lived, least-privilege tokens and enforce at multiple points. Pair automation with explainability and human review, and continuously calibrate risk scores and uncertainty estimates as environments and models change.
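A minimal sketch of that loop, not the paper's implementation: resource criticality scores, the judge's uncertainty estimate, the thresholds, and the token lifetime below are all illustrative assumptions.

```python
# Sketch of a risk-adaptive authorization decision (illustrative assumptions throughout).
from dataclasses import dataclass
import time

RESOURCE_RISK = {"crm.read": 0.2, "payroll.write": 0.9}   # hypothetical criticality scores

@dataclass
class CapabilityToken:
    scopes: list[str]
    expires_at: float          # short-lived, least-privilege credential

def authorize(task_scopes: list[str], judge_uncertainty: float,
              risk_threshold: float = 0.7, uncertainty_threshold: float = 0.3):
    # Unknown resources default to maximum risk rather than failing open.
    risk = max(RESOURCE_RISK.get(s, 1.0) for s in task_scopes)
    if risk > risk_threshold or judge_uncertainty > uncertainty_threshold:
        return None            # escalate to the human-in-the-loop path instead
    return CapabilityToken(scopes=task_scopes, expires_at=time.time() + 300)
```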
Charles Fleming, Ashish Kundu, @rkompella - @Cisco @CiscoSecure
References:
- TBAC - https://profsandhu.com/confrnc/ifip/i97tbac.pdf
- RBAC - https://csrc.nist.gov/csrc/media/projects/role-based-access-control/documents/sandhu96.pdf
#AccessControl #AgenticAI #LLMSecurity #ZeroTrust #RiskManagement #Uncertainty #TBAC #AITrust #SecurityArchitecture #EnterpriseSecurity #AIAgents #CiscoResearch
Your AI agent is already compromised and you dont even know it - 675 upvotes
https://www.reddit.com/r/AI_Agents/comments/1o7xuhf/your_ai_agent_is_already_compromised_and_you_dont/
AI-powered cybersecurity attack flow visualization tool using MITRE ATT&CK
An open-source React application that analyzes cybersecurity articles and generates interactive attack flow visualizations using the MITRE ATT&CK framework.
https://github.com/davidljohnson/flowviz
LLM Agents for Automated Web Vulnerability Reproduction: Are We There Yet?
Our evaluation of 20 LLM agents across 16 core dimensions on 3 representative CVEs identified three top performers (OpenHands, SWE-agent, and CAI) for testing on 80 real-world CVEs spanning 6 web technologies and 7 vulnerability types.
Results show agents handle simple library-based vulnerabilities well but consistently fail on complex service-based scenarios requiring multi-component environments and authentication. The gap between executing exploit code and triggering actual vulnerabilities reveals fundamental limitations in environmental adaptation.
These findings highlight the need for significant advances before automated vulnerability reproduction becomes practical. Current agents lack the robustness required for real-world deployment, showing high sensitivity to input guidance and poor performance under incomplete information.
Bin Liu, @YanjieZhao96, Guoai Xu, Haoyu Wang - @HIT_China, @2024_HUST
#LLMSecurity #Cybersecurity #VulnerabilityReproduction #LLMAgents #WebSecurity #BinaryAnalysis #SoK #CVE #AppSec #SecurityResearch #AgentSecurity #Benchmarking
OAuth for MCP - Emerging Enterprise Patterns for Agent Authorization - https://blog.gitguardian.com/oauth-for-mcp-emerging-enterprise-patterns-for-agent-authorization/
> OAuth 2.1 is a solid foundation for MCP, but the non-deterministic nature of agent interactions introduces sequence-level risks that classic request-level checks don't cover.
> Resource Indicators are mandatory and should be scoped aggressively; keep tokens short-lived and server-specific; never let credentials leak into LLM context.
> An emergent pattern at scale, many teams are adopting gateway-based authorization to centralize policy, transform tokens, and create strong audit boundaries.
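Building on the second point above, a hedged sketch of requesting a short-lived token bound to one MCP server via an RFC 8707 resource indicator; the token endpoint, client credentials, resource URL, and scope string are hypothetical placeholders.

```python
# Sketch: client-credentials token request scoped to a single MCP server.
# All parameter values shown are placeholders, not from the GitGuardian post.
import requests

def token_for_mcp_server(token_url: str, client_id: str, client_secret: str,
                         resource: str) -> str:
    resp = requests.post(token_url, data={
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "resource": resource,              # RFC 8707: bind the token to one MCP server
        "scope": "mcp:tools.read",         # hypothetical scope; keep it as narrow as possible
    }, timeout=10)
    resp.raise_for_status()
    # Keep the token short-lived and server-specific; never place it in LLM context.
    return resp.json()["access_token"]
```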
Deepfake Image and Video Detection - https://youtu.be/GPqL9_muXJA?si=4wBpPL_ZWLFaVsTb
Performing analysis of fake images and videos can be challenging considering the plethora of techniques that can be used to create a deepfake. In this session, we'll explore methods for identifying fake images and videos, whether they are AI-created, photoshopped, or GAN-generated media.
We'll then use this as the basis for a live demonstration walking through methods of exposing signs of alteration or AI generation, using more than a dozen techniques to expose these forgeries. We'll also highlight a free GPT tool for performing your own analysis. Finally, we'll provide additional resources and thoughts for the future of deepfake detection.
#DeepfakeDetection #MediaForensics #ImageForensics #VideoForensics #SyntheticMedia #AIForensics #GANDetection #OSINT #Misinformation #DEFCON