AISecHub – Telegram
https://linktr.ee/aisechub managed by AISecHub. Sponsored by: innovguard.com
A Unified Red Teaming Framework against LLM Agents by Dynamically Hijacking Their Own Reasoning

This research introduces UDora, a unified red-teaming framework designed to evaluate and enhance the security of LLM agents by exposing their vulnerabilities to adversarial attacks. While our work aims to improve the robustness and reliability of LLM agents, thereby contributing positively to their safe deployment in real-world applications, it also highlights significant risks associated with their misuse.

The ability to manipulate LLM agents to perform unauthorized actions or access sensitive information underscores the urgent need for stringent security measures and ethical guidelines in the development and deployment of these technologies.

Source: https://arxiv.org/abs/2503.01908
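
As a rough illustration of the attack surface the paper targets (not UDora's actual method, which iteratively perturbs the agent's own reasoning trace), the sketch below shows how one might probe whether an adversarial string planted in a tool observation steers an agent toward an attacker-chosen action. `call_agent`, the target action name, and the helper signature are all hypothetical placeholders.

```python
# Minimal sketch: probing whether a string injected into a tool observation
# can hijack an agent's next action. All names are hypothetical helpers,
# not UDora's actual optimization procedure.
from typing import Callable, List

def hijack_probe(
    call_agent: Callable[[str], str],       # placeholder: returns the agent's next action as text
    benign_observation: str,
    injected_strings: List[str],
    target_action: str = "transfer_funds",  # attacker-chosen tool call (illustrative)
) -> List[str]:
    """Return the injected strings that steer the agent toward the target action."""
    successful = []
    for adv in injected_strings:
        # The adversarial text rides in on an otherwise benign tool result,
        # entering the agent's context through its own observation channel.
        observation = f"{benign_observation}\n{adv}"
        next_action = call_agent(observation)
        if target_action in next_action:
            successful.append(adv)
    return successful
```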
From Deepfake Scams to Poisoned Chatbots: AI and Election Security in 2025
https://cetas.turing.ac.uk/publications/deepfake-scams-poisoned-chatbots

This year will see more than 100 national elections take place in countries ranging from Argentina and Canada to Singapore and Germany. They follow the so-called Super Election Year of 2024 – in which, with almost two billion voters heading to the polls, CETaS closely monitored various electoral campaigns around the world to understand how AI tools were contributing to the spread of false information and harmful content. These tools have rapidly evolved since then. So, what has changed about the threat they pose to election security?

This Expert Analysis explores new threat vectors that have emerged in elections throughout 2025 – building on the data collected in our three previous reports. Specifically, the article focuses on the ways that AI tools have been misused in recent elections for: illicit financial scams; attempts to manipulate results by reducing voter turnout; and the dissemination of even more sophisticated disinformation, which complicates debunking efforts. Additionally, the article highlights how such misuse is being exacerbated by the actions of some political parties, as well as by data-poisoning attacks on chatbots. Finally, it proposes a series of measures that could help protect elections in 2026 against AI-enabled information threats.
An Automated Framework for Strategy Discovery, Retrieval, and Evolution in LLM Jailbreak Attacks

In this paper, we propose ASTRA, a modular black-box automated jailbreak framework equipped with continuous learning capabilities. Through a closed-loop "attack–evaluate–distill–reuse" mechanism, ASTRA transforms every attack interaction into retrievable and transferable strategic knowledge.

Centered around three core modules, i.e., the Attack Designer, Strategy Extractor, and Strategy Storage & Retrieval, ASTRA achieves the self-accumulation and adaptive evolution of strategies without the need for jailbreak templates or internal model information.

Extensive experiments demonstrate that ASTRA significantly outperforms existing jailbreak attack methods in terms of both attack success rate and attack efficiency, and that its learned strategies exhibit exceptional cross-dataset and cross-model transferability. These findings not only validate ASTRA’s effectiveness but also highlight the severe challenges that current safety alignments face against adaptive, continuously evolving attacks. ASTRA serves as a critical tool for red teaming and provides new insights for the design of future large model safety defense mechanisms.

Source: https://arxiv.org/pdf/2511.02356
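
The closed loop described in the abstract lends itself to a compact sketch. The following is only a schematic reconstruction of the "attack–evaluate–distill–reuse" cycle, with every class, method, and parameter name invented for illustration; it is not ASTRA's released code, and the retrieval step is deliberately simplified.

```python
# Schematic of the attack–evaluate–distill–reuse loop (illustrative only).
from dataclasses import dataclass, field
from typing import List

@dataclass
class StrategyStore:
    """Strategy Storage & Retrieval: keeps distilled strategies for reuse."""
    strategies: List[str] = field(default_factory=list)

    def retrieve(self, goal: str, k: int = 3) -> List[str]:
        # A real system would use embedding similarity; here we just return the newest k.
        return self.strategies[-k:]

    def add(self, strategy: str) -> None:
        self.strategies.append(strategy)

def astra_style_loop(goal, target_llm, attacker_llm, judge, store: StrategyStore, rounds: int = 5):
    """One self-accumulating black-box jailbreak cycle (hypothetical interfaces)."""
    for _ in range(rounds):
        hints = store.retrieve(goal)                           # reuse prior strategies
        prompt = attacker_llm.design_attack(goal, hints)       # Attack Designer
        response = target_llm.generate(prompt)                 # black-box query only
        if judge.is_jailbroken(goal, response):                # evaluate
            strategy = attacker_llm.distill(prompt, response)  # Strategy Extractor
            store.add(strategy)                                # distill -> reuse
            return prompt, response
    return None, None
```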
Using MCP for Debugging, Reversing, and Threat Analysis: Part 2 - https://whiteknightlabs.com/2025/11/18/using-mcp-for-debugging-reversing-and-threat-analysis-part-2/ by @AlanSguigna

In Part 1 of this article series, I demonstrated the configuration steps for using natural language processing in analyzing a Windows crash dump. In this blog, I dive far deeper, using vibe coding to extend the use of MCP for Windows kernel debugging.
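
For readers unfamiliar with MCP servers, a minimal tool in the spirit of the series might look like the sketch below. It assumes the official MCP Python SDK (`mcp` package, FastMCP interface) and a locally installed `cdb.exe`; the dump path, tool name, and debugger invocation are illustrative assumptions, not the author's code.

```python
# Minimal sketch: exposing a cdb/WinDbg command as an MCP tool so an LLM client
# can drive crash-dump analysis. Illustrative only, not the blog's implementation.
import subprocess
from mcp.server.fastmcp import FastMCP  # official MCP Python SDK

mcp = FastMCP("crashdump-debugger")

DUMP_PATH = r"C:\dumps\memory.dmp"  # assumed path to a crash dump

@mcp.tool()
def run_debugger_command(command: str) -> str:
    """Run a single cdb command (e.g. '!analyze -v') against the dump and return its output."""
    result = subprocess.run(
        ["cdb.exe", "-z", DUMP_PATH, "-c", f"{command}; q"],
        capture_output=True, text=True, timeout=300,
    )
    return result.stdout

if __name__ == "__main__":
    mcp.run()  # serve over stdio for an MCP-capable client
```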
Report_for_Serious_Incidents_under_the_AI_Act_GeneralPurpose_AI.pdf
AI Act: Commission publishes a reporting template for serious incidents involving general-purpose AI models with systemic risk

The publication of this template promotes consistent and transparent reporting and helps providers demonstrate compliance with Commitment 9 of the GPAI Code of Practice.

Under Article 55 of the EU AI Act, providers of general-purpose AI models with systemic risk are required to report relevant information on serious incidents to the AI Office and, where appropriate, to national competent authorities. The Code of Practice for general-purpose AI (GPAI) further operationalises this obligation by specifying the information to be included in such reports as part of demonstrating compliance, while the Commission’s Guidelines for providers of general-purpose AI models help developers understand whether they need to comply with the obligations under the AI Act.

Regarding high-risk AI systems, the Commission has already published draft guidance and a reporting template on serious AI incidents and is seeking stakeholders’ feedback until 7 November.

Source: https://digital-strategy.ec.europa.eu/en/library/ai-act-commission-publishes-reporting-template-serious-incidents-involving-general-purpose-ai
Sabotage Evaluations for Automated AI R&D

CTRL-ALT-DECEIT: AI systems are being deployed to automate and assist with software engineering in the real world, with tools such as GitHub Copilot, Cursor, Deep Research, and Claude Code [1]. As these systems are deployed in high-stakes and safety-critical domains, their reliability becomes correspondingly important. Notably, the most capable AI systems may first be deployed inside AI companies to automate AI R&D, “potentially significantly accelerating AI development in an unpredictable way”.

There is growing evidence that frontier and future AI systems may be misaligned with their developers or users and may intentionally act against human interests.

Misaligned AI systems may have incentives to sabotage efforts to evaluate their own dangerous capabilities, monitor their behaviour or make decisions about their deployment. It is therefore important to ensure that AI systems do not subvert human decision-making in high-stakes domains, like frontier AI R&D.

Source: https://arxiv.org/pdf/2511.09904
GitHub: https://github.com/TeunvdWeij/ctrl-alt-deceit