AISecHub
2503.01908v3.pdf
A Unified Red Teaming Framework against LLM Agents by Dynamically Hijacking Their Own Reasoning
This research introduces UDora, a unified red-teaming framework designed to evaluate and enhance the security of LLM agents by exposing their vulnerabilities to adversarial attacks. While our work aims to improve the robustness and reliability of LLM agents, thereby contributing positively to their safe deployment in real-world applications, it also highlights significant risks associated with their misuse.
The ability to manipulate LLM agents to perform unauthorized actions or access sensitive information underscores the urgent need for stringent security measures and ethical guidelines in the development and deployment of these technologies.
Source: https://arxiv.org/abs/2503.01908
This research introduces UDora, a unified red-teaming framework designed to evaluate and enhance the security of LLM agents by exposing their vulnerabilities to adversarial attacks. While our work aims to improve the robustness and reliability of LLM agents, thereby contributing positively to their safe deployment in real-world applications, it also highlights significant risks associated with their misuse.
The ability to manipulate LLM agents to perform unauthorized actions or access sensitive information underscores the urgent need for stringent security measures and ethical guidelines in the development and deployment of these technologies.
Source: https://arxiv.org/abs/2503.01908
❤1👨💻1
EchoGram: The Hidden Vulnerability Undermining AI Guardrails - https://hiddenlayer.com/innovation-hub/echogram-the-hidden-vulnerability-undermining-ai-guardrails/
HiddenLayer | Security for AI
EchoGram: The Hidden Vulnerability Undermining AI Guardrails
HiddenLayer unveils EchoGram, a new attack technique that manipulates AI guardrails protecting LLMs like GPT-4, Claude, and Gemini.
👏1
Hacking Gemini: A Multi-Layered Approach
https://buganizer.cc/hacking-gemini-a-multi-layered-approach-md
https://buganizer.cc/hacking-gemini-a-multi-layered-approach-md
🔥2🎄2👍1
This media is not supported in your browser
VIEW IN TELEGRAM
AI-Assisted Reverse Engineering with Ghidra - https://github.com/biniamf/ai-reverse-engineering
🔥1🦄1
From Deepfake Scams to Poisoned Chatbots: AI and Election Security in 2025
https://cetas.turing.ac.uk/publications/deepfake-scams-poisoned-chatbots
This year will see more than 100 national elections take place in countries ranging from Argentina and Canada to Singapore and Germany. They follow the so-called Super Election Year of 2024 – in which, with almost two billion voters heading to the polls, CETaS closely monitored various electoral campaigns around the world to understand how AI tools were contributing to the spread of false information and harmful content. These tools have rapidly evolved since then. So, what has changed about the threat they pose to election security?
This Expert Analysis explores new threat vectors that have emerged in elections throughout 2025 – building on the data collected in our three previous reports. Specifically, the article focuses on the ways that AI tools have been misused in recent elections for: illicit financial scams; attempts to manipulate results by reducing voter turnout; and the dissemination of even more sophisticated disinformation, which complicates debunking efforts. Additionally, the article highlights how such misuse is being exacerbated by the actions of some political parties, as well as by data-poisoning attacks on chatbots. Finally, it proposes a series of measures that could help protect elections in 2026 against AI-enabled information threats.
https://cetas.turing.ac.uk/publications/deepfake-scams-poisoned-chatbots
This year will see more than 100 national elections take place in countries ranging from Argentina and Canada to Singapore and Germany. They follow the so-called Super Election Year of 2024 – in which, with almost two billion voters heading to the polls, CETaS closely monitored various electoral campaigns around the world to understand how AI tools were contributing to the spread of false information and harmful content. These tools have rapidly evolved since then. So, what has changed about the threat they pose to election security?
This Expert Analysis explores new threat vectors that have emerged in elections throughout 2025 – building on the data collected in our three previous reports. Specifically, the article focuses on the ways that AI tools have been misused in recent elections for: illicit financial scams; attempts to manipulate results by reducing voter turnout; and the dissemination of even more sophisticated disinformation, which complicates debunking efforts. Additionally, the article highlights how such misuse is being exacerbated by the actions of some political parties, as well as by data-poisoning attacks on chatbots. Finally, it proposes a series of measures that could help protect elections in 2026 against AI-enabled information threats.
Centre for Emerging Technology and Security
From Deepfake Scams to Poisoned Chatbots: AI and Election Security in 2025
An Automated Framework for Strategy Discovery, Retrieval, and Evolution in LLM Jailbreak Attacks
In this paper, we propose ASTRA, a modular black-box automated jailbreak framework equipped with continuous learning capabilities. Through a closed-loop "attack–evaluate–distill–reuse" mechanism, ASTRA transforms every attack interaction into retrievable and transferable strategic knowledge.
Centered around three core modules, i.e., the Attack Designer, Strategy Extractor, and Strategy Storage & Retrieval, ASTRA achieves the self-accumulation and adaptive evolution of strategies without the need for jailbreak templates or internal model information.
Extensive experiments demonstrate that ASTRA significantly outperforms existing jailbreak attack methods in terms of both attack success rate and attack efficiency, and that its learned strategies exhibit exceptional cross-dataset and cross-model transferability. These findings not only validate ASTRA’s effectiveness but also highlight the severe challenges that current safety alignments face against adaptive, continuously evolving attacks. ASTRA serves as a critical tool for red teaming and provides new insights for the design of future large model safety defense mechanisms.
Source: https://arxiv.org/pdf/2511.02356
In this paper, we propose ASTRA, a modular black-box automated jailbreak framework equipped with continuous learning capabilities. Through a closed-loop "attack–evaluate–distill–reuse" mechanism, ASTRA transforms every attack interaction into retrievable and transferable strategic knowledge.
Centered around three core modules, i.e., the Attack Designer, Strategy Extractor, and Strategy Storage & Retrieval, ASTRA achieves the self-accumulation and adaptive evolution of strategies without the need for jailbreak templates or internal model information.
Extensive experiments demonstrate that ASTRA significantly outperforms existing jailbreak attack methods in terms of both attack success rate and attack efficiency, and that its learned strategies exhibit exceptional cross-dataset and cross-model transferability. These findings not only validate ASTRA’s effectiveness but also highlight the severe challenges that current safety alignments face against adaptive, continuously evolving attacks. ASTRA serves as a critical tool for red teaming and provides new insights for the design of future large model safety defense mechanisms.
Source: https://arxiv.org/pdf/2511.02356
❤2🔥1
ShadowRay 2.0: Attackers Turn AI Against Itself in Global Campaign that Hijacks AI Into Self-Propagating Botnet - https://www.oligo.security/blog/shadowray-2-0-attackers-turn-ai-against-itself-in-global-campaign-that-hijacks-ai-into-self-propagating-botnet
www.oligo.security
ShadowRay 2.0: Active Global Campaign Hijacks Ray AI Infrastructure Into Self-Propagating Botnet | Oligo Security
Oligo Security uncovers ShadowRay 2.0, an active global campaign exploiting Ray to hijack AI infrastructure and create a self-propagating botnet.
❤1🔥1
Using MCP for Debugging, Reversing, and Threat Analysis: Part 2 - https://whiteknightlabs.com/2025/11/18/using-mcp-for-debugging-reversing-and-threat-analysis-part-2/ by @AlanSguigna
In Part 1 of this article series, I demonstrated the configuration steps for using natural language processing in analyzing a Windows crash dump. In this blog, I dive far deeper, using vibe coding to extend the use of MCP for Windows kernel debugging.
In Part 1 of this article series, I demonstrated the configuration steps for using natural language processing in analyzing a Windows crash dump. In this blog, I dive far deeper, using vibe coding to extend the use of MCP for Windows kernel debugging.
White Knight Labs
Using MCP for Debugging, Reversing, and Threat Analysis: Part 2 | White Knight Labs
In Part 1 of this article series, I demonstrated the configuration steps for using natural language processing in analyzing a Windows crash dump. In this
Report_for_Serious_Incidents_under_the_AI_Act_GeneralPurpose_AI.pdf
174.2 KB
AI Act: Commission publishes a reporting template for serious incidents involving general-purpose AI models with systemic risk
The publication of this template promotes consistent and transparent reporting and helps providers demonstrate compliance with the commitments set out in Commitment 9 of the GPAI Code of Practice.
A glowing lightbulb with "AI" and a robot icon inside, surrounded by symbols of data, security, and analytics, is held in a person's hand.
AdobeStock © Supatman
Under Article 55 of the EU AI Act, these providers are required to report relevant information on serious incidents to the AI Office and, where appropriate, to national competent authorities. The Code of Practice for general-purpose AI (GPAI) further operationalises this obligation by specifying the information to be included in such reports as part of demonstrating compliance, while the Commission’s Guidelines for providers of general-purpose AI models help developers understand if they need to comply with the obligations under the AI Act.
Regarding high-risk AI systems, the Commission has already published draft guidance and a reporting template on serious AI incidents and is seeking stakeholders’ feedback until 7 November.
Source: https://digital-strategy.ec.europa.eu/en/library/ai-act-commission-publishes-reporting-template-serious-incidents-involving-general-purpose-ai
The publication of this template promotes consistent and transparent reporting and helps providers demonstrate compliance with the commitments set out in Commitment 9 of the GPAI Code of Practice.
A glowing lightbulb with "AI" and a robot icon inside, surrounded by symbols of data, security, and analytics, is held in a person's hand.
AdobeStock © Supatman
Under Article 55 of the EU AI Act, these providers are required to report relevant information on serious incidents to the AI Office and, where appropriate, to national competent authorities. The Code of Practice for general-purpose AI (GPAI) further operationalises this obligation by specifying the information to be included in such reports as part of demonstrating compliance, while the Commission’s Guidelines for providers of general-purpose AI models help developers understand if they need to comply with the obligations under the AI Act.
Regarding high-risk AI systems, the Commission has already published draft guidance and a reporting template on serious AI incidents and is seeking stakeholders’ feedback until 7 November.
Source: https://digital-strategy.ec.europa.eu/en/library/ai-act-commission-publishes-reporting-template-serious-incidents-involving-general-purpose-ai
👏1
New Benchmark for Evaluating LLMs on Patching Real-World Vulnerabilities
https://github.com/bytedance/PatchEval
https://github.com/bytedance/PatchEval
GitHub
GitHub - bytedance/PatchEval: PatchEval: A New Benchmark for Evaluating LLMs on Patching Real-World Vulnerabilities
PatchEval: A New Benchmark for Evaluating LLMs on Patching Real-World Vulnerabilities - bytedance/PatchEval
🔥1🫡1😎1
Software quality's collapse: How AI is accelerating decline
https://www.reversinglabs.com/blog/software-quality-collapse-ai-accelerate
https://www.reversinglabs.com/blog/software-quality-collapse-ai-accelerate
ReversingLabs
Software quality's collapse: How AI is accelerating decline | ReversingLabs
Development is in freefall toward software entropy and insecurity. Can spec-driven development help?
Sabotage Evaluations for Automated AI R&D
CTRL-ALT-DECEIT - AI systems are being deployed to automate and assist with software engineering in the real world, with tools such as GitHub CoPilot, Cursor, Deep Research, and Claude Code [1]. As these systems are deployed in high-stakes and safety-critical domains, their reliability becomes correspondingly important. Notably, the most capable AI systems may first be deployed inside AI companies to automate AI R&D - “potentially significantly accelerating AI development in an unpredictable way"
There is growing evidence that frontier and future AI systems may be misaligned with their developers or users and may intentionally act against human interests.
Misaligned AI systems may have incentives to sabotage efforts to evaluate their own dangerous capabilities, monitor their behaviour or make decisions about their deployment. It is therefore important to ensure that AI systems do not subvert human decision-making in high-stakes domains, like frontier AI R&D.
Source: https://arxiv.org/pdf/2511.09904
GitHub: https://github.com/TeunvdWeij/ctrl-alt-deceit
CTRL-ALT-DECEIT - AI systems are being deployed to automate and assist with software engineering in the real world, with tools such as GitHub CoPilot, Cursor, Deep Research, and Claude Code [1]. As these systems are deployed in high-stakes and safety-critical domains, their reliability becomes correspondingly important. Notably, the most capable AI systems may first be deployed inside AI companies to automate AI R&D - “potentially significantly accelerating AI development in an unpredictable way"
There is growing evidence that frontier and future AI systems may be misaligned with their developers or users and may intentionally act against human interests.
Misaligned AI systems may have incentives to sabotage efforts to evaluate their own dangerous capabilities, monitor their behaviour or make decisions about their deployment. It is therefore important to ensure that AI systems do not subvert human decision-making in high-stakes domains, like frontier AI R&D.
Source: https://arxiv.org/pdf/2511.09904
GitHub: https://github.com/TeunvdWeij/ctrl-alt-deceit
🔥1🤔1🤯1
Ollama Remote Code Execution: Securing the Code That Runs LLMs
https://www.sonarsource.com/blog/ollama-remote-code-execution-securing-the-code-that-runs-llms/
https://www.sonarsource.com/blog/ollama-remote-code-execution-securing-the-code-that-runs-llms/
Sonarsource
Ollama Remote Code Execution: Securing the Code That Runs LLMs
Our Vulnerability Researchers uncovered vulnerabilities in the code of Ollama, a popular tool to run LLMs locally. Dive into the details of how LLMs are implemented and what can go wrong.
❤2
The Rise of AI-Powered Scam Assembly Lines - https://www.trendmicro.com/vinfo/us/security/news/cybercrime-and-digital-threats/reimagining-fraud-operations-the-rise-of-ai-powered-scam-assembly-lines
Trendmicro
Reimagining Fraud Operations: The Rise of AI-Powered Scam Assembly Lines
Trend™ Research replicated an AI-powered scam assembly line to reveal how AI is eradicating the barrier for entry to running scams, making fraud easier to run, harder to detect, and effortless to scale.