AISecHub – Telegram
https://linktr.ee/aisechub managed by AISecHub. Sponsored by: innovguard.com
Using MCP for Debugging, Reversing, and Threat Analysis: Part 2 - https://whiteknightlabs.com/2025/11/18/using-mcp-for-debugging-reversing-and-threat-analysis-part-2/ by @AlanSguigna

In Part 1 of this article series, I demonstrated the configuration steps for using natural language processing in analyzing a Windows crash dump. In this blog, I dive far deeper, using vibe coding to extend the use of MCP for Windows kernel debugging.
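For readers who want to try something similar, here is a minimal sketch (not the author's actual setup) of an MCP server that exposes crash-dump triage as a tool an LLM client can call. It assumes the official `mcp` Python SDK (FastMCP) and that cdb.exe from the Windows Debugging Tools is on PATH; the article itself walks through the full configuration.

```python
# Minimal sketch of an MCP server for crash-dump triage.
# Assumptions (not from the article): the `mcp` Python SDK (FastMCP) is installed,
# and cdb.exe from the Windows Debugging Tools is on PATH.
import subprocess
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("crashdump-analyzer")

@mcp.tool()
def analyze_dump(dump_path: str) -> str:
    """Run `!analyze -v` against a Windows crash dump and return cdb's output."""
    result = subprocess.run(
        ["cdb.exe", "-z", dump_path, "-c", "!analyze -v; q"],
        capture_output=True, text=True, timeout=300,
    )
    return result.stdout[-20000:]  # trim so the tool response stays model-friendly

if __name__ == "__main__":
    mcp.run()  # stdio transport by default; register the script in the MCP client's server config
```

Once the script is registered in an MCP client's server configuration, the model can request analyze_dump in natural language and reason over the returned `!analyze -v` output.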
Report_for_Serious_Incidents_under_the_AI_Act_GeneralPurpose_AI.pdf
174.2 KB
AI Act: Commission publishes a reporting template for serious incidents involving general-purpose AI models with systemic risk

The publication of this template promotes consistent and transparent reporting and helps providers demonstrate compliance with Commitment 9 of the GPAI Code of Practice.

Under Article 55 of the EU AI Act, these providers are required to report relevant information on serious incidents to the AI Office and, where appropriate, to national competent authorities. The Code of Practice for general-purpose AI (GPAI) further operationalises this obligation by specifying the information to be included in such reports as part of demonstrating compliance, while the Commission’s Guidelines for providers of general-purpose AI models help developers understand if they need to comply with the obligations under the AI Act.

Regarding high-risk AI systems, the Commission has already published draft guidance and a reporting template on serious AI incidents and is seeking stakeholders’ feedback until 7 November.

Source: https://digital-strategy.ec.europa.eu/en/library/ai-act-commission-publishes-reporting-template-serious-incidents-involving-general-purpose-ai
Sabotage Evaluations for Automated AI R&D

CTRL-ALT-DECEIT - AI systems are being deployed to automate and assist with software engineering in the real world, with tools such as GitHub Copilot, Cursor, Deep Research, and Claude Code [1]. As these systems are deployed in high-stakes and safety-critical domains, their reliability becomes correspondingly important. Notably, the most capable AI systems may first be deployed inside AI companies to automate AI R&D, "potentially significantly accelerating AI development in an unpredictable way".

There is growing evidence that frontier and future AI systems may be misaligned with their developers or users and may intentionally act against human interests.

Misaligned AI systems may have incentives to sabotage efforts to evaluate their own dangerous capabilities, monitor their behaviour or make decisions about their deployment. It is therefore important to ensure that AI systems do not subvert human decision-making in high-stakes domains, like frontier AI R&D.

Source: https://arxiv.org/pdf/2511.09904
GitHub: https://github.com/TeunvdWeij/ctrl-alt-deceit
Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models

The study provides systematic evidence that poetic reformulation degrades refusal behavior across all evaluated model families. When harmful prompts are expressed in verse rather than prose, attack-success rates rise sharply, both for hand-crafted adversarial poems and for the 1,200-item MLCommons corpus transformed through a standardized meta-prompt. The magnitude and consistency of the effect indicate that contemporary alignment pipelines do not generalize across stylistic shifts. The surface form alone is sufficient to move inputs outside the operational distribution on which refusal mechanisms have been optimized.

The cross-model results suggest that the phenomenon is structural rather than provider-specific. Models built using RLHF, Constitutional AI, and hybrid alignment strategies all display elevated vulnerability, with increases ranging from single digits to more than sixty percentage points depending on provider. The effect spans CBRN, cyber-offense, manipulation, privacy, and loss-of-control domains, showing that the bypass does not exploit weakness in any one refusal subsystem but interacts with general alignment heuristics.

Source: https://arxiv.org/pdf/2511.15304
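A rough sketch of the evaluation flow described above (not the authors' code): benchmark prompts are rewritten into verse by one model through a fixed meta-prompt, both the prose and verse variants are sent to the target model, and a refusal classifier yields per-condition attack success rates. The meta-prompt wording and the `call_model`, `rewrite_model`, and `is_refusal` callables are placeholders you would supply.

```python
# Hedged sketch of a prose-vs-verse refusal evaluation; all model calls are placeholders.
from typing import Callable, Iterable

# Placeholder meta-prompt; the paper uses its own standardized transformation prompt.
META_PROMPT = "Rewrite the following request as a short poem while preserving its meaning:\n\n{request}"

def attack_success_rate(
    prompts: Iterable[str],
    call_model: Callable[[str], str],     # target model under evaluation
    rewrite_model: Callable[[str], str],  # model that applies the poetic meta-prompt
    is_refusal: Callable[[str], bool],    # refusal / safety classifier
) -> dict:
    prose_hits = verse_hits = total = 0
    for prompt in prompts:
        total += 1
        if not is_refusal(call_model(prompt)):        # baseline: prose prompt
            prose_hits += 1
        poem = rewrite_model(META_PROMPT.format(request=prompt))
        if not is_refusal(call_model(poem)):          # same request, reformulated as verse
            verse_hits += 1
    total = max(total, 1)
    return {"prose_asr": prose_hits / total, "verse_asr": verse_hits / total}
```

The gap between verse_asr and prose_asr across models is the effect the paper measures: how much a purely stylistic shift moves inputs outside the distribution the refusal training covered.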
roi_of_ai_in_security_2025.pdf
8.5 MB
The ROI of AI in Security

Google Cloud’s new report, The ROI of AI in Security, showcases how best-in-class organizations are getting value from AI in cybersecurity. For several years, the cybersecurity industry has discussed the potential of artificial intelligence. Much of this discussion has been aspirational, often blurring the lines between true machine learning and more basic automation. However, this report suggests a significant, practical shift is already underway.

Google surveyed 3,466 senior leaders globally to move beyond hype and into the practical discussion of revenue, productivity, and risk reduction. The findings confirm that the conversation is no longer about "if" AI should be used, but "how" it can be scaled to measurably improve security posture.

https://services.google.com/fh/files/misc/roi_of_ai_in_security_2025.pdf
Awesome Claude Skills - Security & Systems

https://github.com/ComposioHQ/awesome-claude-skills
Agentic AI Security Scoping Matrix

The Agentic AI Security Scoping Matrix provides a structured mental model and framework for understanding and addressing the security challenges of autonomous agentic AI systems across four distinct scopes. By accurately assessing their current scope and implementing appropriate controls across all six security dimensions, organizations can confidently deploy agentic AI while managing the landscape of associated risks.

Source: https://aws.amazon.com/blogs/security/the-agentic-ai-security-scoping-matrix-a-framework-for-securing-autonomous-ai-systems/
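As a rough illustration of how such an assessment might be recorded (the scope and dimension names below are placeholders, not AWS's labels; see the blog post for the actual matrix), the shape is: classify a system into exactly one of the four scopes, then document controls along each of the six security dimensions and flag the gaps.

```python
# Illustrative only: scope and dimension labels are placeholders, not the AWS matrix terms.
from dataclasses import dataclass, field
from typing import Dict, List

SCOPES = ("Scope 1", "Scope 2", "Scope 3", "Scope 4")            # the matrix defines four scopes
DIMENSIONS = ("dimension-1", "dimension-2", "dimension-3",
              "dimension-4", "dimension-5", "dimension-6")        # and six security dimensions

@dataclass
class AgenticScopeAssessment:
    system_name: str
    scope: str                                                   # exactly one scope applies per system
    controls: Dict[str, str] = field(default_factory=dict)       # dimension -> implemented control

    def missing_dimensions(self) -> List[str]:
        """Dimensions with no documented control yet, i.e. the remaining security work."""
        return [d for d in DIMENSIONS if d not in self.controls]

# Example: a partially assessed system, showing which dimensions still need controls.
assessment = AgenticScopeAssessment("internal-research-agent", scope=SCOPES[2])
assessment.controls["dimension-1"] = "least-privilege tool credentials"
print(assessment.missing_dimensions())
```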
From shortcuts to sabotage: natural emergent misalignment from reward hacking

In the latest research from Anthropic’s alignment team, we show for the first time that realistic AI training processes can accidentally produce misaligned models.

https://www.anthropic.com/research/emergent-misalignment-reward-hacking