Academic papers, new techniques, benchmarks, and theoretical findings in AI/LLM security.
In Q2 2025, attackers exploited GPT-4.1 by embedding malicious hidden instructions within tool descriptions, a technique called tool poisoning (hiding harmful prompts inside the text that describes what a tool does). When the AI interacted with these poisoned tools, it unknowingly executed unauthorized actions and leaked sensitive data without the user's knowledge.
Fix: The source explicitly mentions these mitigations: implement strict validation and sanitization of tool descriptions, establish permissions and access controls for tool integrations, monitor AI behavior for anomalies during tool execution, and educate developers on secure integration practices. Developers must validate third-party tools and ensure descriptions are free of hidden prompts, and IT teams should audit AI tool integrations and monitor for unusual activity.
OWASP GenAI SecurityCyberRisk Alliance and OWASP (Open Worldwide Application Security Project, a non-profit focused on improving software security) announced a partnership to advance education in application security (protecting software from attacks) and AI security. The collaboration will involve creating shared content, hosting events, and conducting research initiatives together.
Generative AI (AI systems that create new text, code, or images) is a double-edged sword in cybersecurity, helping both defenders and attackers. The case study of a fictional insurance company shows how GenAI can be used to launch cyberattacks (malicious attempts to breach computer systems) and also to defend against them, creating a difficult choice for IT leaders about whether to use AI as a defensive tool or risk falling behind attackers who already have it.
As AI development has grown rapidly, organizations struggle with how to actually put responsible AI practices into action beyond just making promises about it. This article describes how two organizations created a five-phase process to embed responsibility pledges (formal commitments to use AI ethically) into their daily practices using a systems approach (treating responsibility as interconnected parts of the whole organization rather than isolated efforts).
AI agents (automated systems that can take actions based on AI decisions) are easy to build with modern tools, but they face several security threats. The OWASP Gen AI Security Project held a hackathon in New York where participants intentionally created insecure agents to identify common security problems.
As AI systems start connecting to real tools and databases through the Model Context Protocol (MCP, a system that lets AI models interact with external applications and data), new security risks appear that older security methods cannot fully handle. The OWASP GenAI Security Project has released research on how to secure MCP, offering defense-in-depth strategies (a layered security approach using multiple protective measures) to help developers build safer AI applications that can act independently in real time.
Version 4.9.0 is a release of the MITRE ATLAS framework, which documents attack techniques and defenses specific to AI systems. The update adds new attack methods like reverse shells (unauthorized remote access to a system), model corruption, and supply chain attacks targeting AI tools, while also updating existing security techniques and adding real-world case studies of AI-related security breaches.
Researchers created the Virology Capabilities Test (VCT), a benchmark measuring how well AI systems can solve complex virology lab problems, and found that leading AI models like OpenAI's o3 now outperform human experts in specialized virology knowledge. This is concerning because virology knowledge has dual-use potential, meaning the same capabilities that could help prevent disease could also be misused by bad actors to develop dangerous pathogens.
Fix: The authors recommend that highly dual-use virology capabilities should be excluded from publicly-available AI systems, and know-your-customer mechanisms (verification processes to confirm who customers are and what they'll use the technology for) could ensure these capabilities remain accessible only to researchers in institutions with appropriate safety protocols. As a result of the paper, xAI has added new safeguards to their systems.
CAIS AI Safety NewsletterThe OWASP Generative AI Security Project, an organization focused on application security, announced nine new corporate sponsors to support efforts in improving security for generative AI technologies. The sponsors, including companies like ByteDance and Trend Micro, represent increased investment and momentum in making AI systems more secure.
OWASP (Open Worldwide Application Security Project, a nonprofit that helps organizations secure their software) has renamed and promoted its OWASP Top 10 for LLM (large language model, an AI trained on massive amounts of text data) project to the OWASP Gen AI Security Project, expanding its focus from just listing AI vulnerabilities to providing broader guidance on governance, risk management, and compliance for generative AI systems. The project now includes over 600 experts from 18 countries and has published new resources like the Agentic AI Threats and Mitigations Guide (addressing security risks in autonomous AI systems) along with translations in six additional languages.
This content is a product navigation page for GitHub v4.8.0, listing features related to AI code creation, developer workflows, application security, and enterprise solutions. It does not contain technical information about a specific AI or LLM vulnerability, bug, or security issue.