Academic papers, new techniques, benchmarks, and theoretical findings in AI/LLM security.
This research proposes 2PCAutoDL, a system for automatically designing deep neural networks (DNNs, which are AI models with many layers) while keeping data and model designs private by splitting computations between two separate cloud servers. The system balances security and speed by using specialized protocols (step-by-step procedures) for different types of network layers, achieving significant speedups compared to existing approaches while maintaining similar model accuracy.
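Two-server designs like this typically rest on additive secret sharing: each server holds a random-looking share of every value, and neither can learn anything alone. The paper's protocols are far more involved; the sketch below only illustrates the underlying sharing idea (the modulus `Q` and all function names are invented for illustration, not taken from the paper):

```python
import random

Q = 2**61 - 1  # toy modulus for the sketch; real protocols choose this carefully

def share(x):
    # Split a secret into two additive shares, one per cloud server.
    # Each share alone is uniformly random and reveals nothing about x.
    r = random.randrange(Q)
    return r, (x - r) % Q

def reconstruct(a, b):
    # Only by combining both servers' shares is the secret recovered
    return (a + b) % Q

def add_shares(s1, s2):
    # Linear operations (e.g., parts of a DNN layer) can be computed
    # locally on shares, with no communication between the servers
    return (s1[0] + s2[0]) % Q, (s1[1] + s2[1]) % Q
```

Nonlinear layers (the expensive part that 2PCAutoDL's specialized protocols target) require interaction between the servers; only the linear structure shown here comes for free.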
Researchers discovered a side-channel attack (a method of extracting secret information by analyzing physical properties like power usage rather than breaking encryption directly) called PrivateCharger that can infer what a user is doing on their laptop by analyzing magnetic field signals from the laptop charger from a distance. The attack works with commercially available equipment, requires no physical access to the laptop, and achieved 84.6% accuracy at certain battery levels, revealing that everyday peripherals can leak private information in ways previously not considered.
OptiVersa-ECDSA is a new cryptographic protocol that improves threshold-ECDSA (a method where multiple parties must cooperate to sign blockchain transactions securely). The protocol uses novel techniques called verifiable secret-product sharing (VSPS, a way to distribute and verify secret values) to achieve 35-65% faster performance and 99% improvement in cheater identification compared to previous approaches, making it practical for real-time blockchain use.
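Threshold schemes of this kind build on secret sharing in which any t of n parties can reconstruct a signing key while fewer learn nothing. OptiVersa-ECDSA's VSPS machinery is much richer, but the classic Shamir construction underneath can be sketched as follows (the field prime and function names here are illustrative, not from the paper):

```python
import random

P = 2**127 - 1  # a Mersenne prime; arithmetic is over the field GF(P)

def make_shares(secret, t, n):
    # Random polynomial of degree t-1 with f(0) = secret; share i is (i, f(i))
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    return [(i, sum(c * pow(i, k, P) for k, c in enumerate(coeffs)) % P)
            for i in range(1, n + 1)]

def recover_secret(shares):
    # Lagrange interpolation at x = 0; needs at least t shares to be correct
    secret = 0
    for xi, yi in shares:
        num = den = 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret
```

The "verifiable" part of VSPS adds commitments so that parties can detect (and, per the paper, identify) cheaters who submit malformed shares; that layer is omitted here.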
Recommender systems (platforms that suggest products or services to users) are vulnerable to data poisoning attacks (malicious manipulation of the data the system learns from to make it behave incorrectly). This paper presents METT, a detection method that identifies these attacks even when they are carefully hidden or small-scale, using techniques like causality inference (analyzing cause-and-effect relationships in user behavior) and a disturbance tolerance mechanism (a way to distinguish real attack patterns from false alarms).
This research paper presents a new method for coverless image steganography (CIS, a technique to hide secret information inside images without visibly altering them), designed to resist black-box attacks (attacks where an attacker can't see how the system works, only its outputs). The method uses SIFT (Scale-Invariant Feature Transform, an algorithm that identifies distinctive points in images) to create a dataset and mapping structure that hides data more securely and with greater capacity than previous CIS methods.
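The defining trait of coverless steganography is that no image is ever modified: the sender instead selects natural images whose features already map to the desired bits. The paper's SIFT-based mapping is far more sophisticated; the sketch below only illustrates the select-by-feature-code idea, using a byte hash as a crude stand-in for SIFT features (all names are invented):

```python
import hashlib

BITS_PER_IMAGE = 4  # each cover image conveys this many message bits

def feature_hash(image_bytes: bytes) -> str:
    # Stand-in for a SIFT-derived feature code: hash the image bytes and
    # keep BITS_PER_IMAGE bits as the image's natural "code"
    digest = hashlib.sha256(image_bytes).digest()
    return format(digest[0] >> (8 - BITS_PER_IMAGE), f"0{BITS_PER_IMAGE}b")

def build_index(images):
    # Map every possible bit pattern to the images that naturally encode it
    index = {}
    for img in images:
        index.setdefault(feature_hash(img), []).append(img)
    return index

def hide(message_bits: str, index):
    # Select unmodified cover images whose codes spell out the message
    # (message length is assumed to be a multiple of BITS_PER_IMAGE)
    chunks = [message_bits[i:i + BITS_PER_IMAGE]
              for i in range(0, len(message_bits), BITS_PER_IMAGE)]
    return [index[c][0] for c in chunks]

def reveal(images):
    # The receiver recomputes each image's code and concatenates the bits
    return "".join(feature_hash(img) for img in images)
```

Because the transmitted images are untouched natural images, there is no embedding artifact for a steganalysis tool to detect, which is what gives CIS its black-box resistance.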
This paper proposes CQT-AKA, a security method for mobile devices that combines cancelable biometrics (fingerprints or facial features that can be regenerated if compromised) with quantum-resistant encryption (protection against future powerful computers) to securely exchange encryption keys between devices. The approach is more secure than traditional methods that rely on passwords or smart cards alone, and it works well on resource-limited devices because it requires less storage and computing power.
This research presents a self-supervised learning (SSL, a training method where an AI learns patterns from unlabeled data without human annotations) framework to help soft robots understand their own body position and movement. The key innovation is that the approach uses large amounts of unannotated data to train an initial model, then fine-tunes it with just a small set of labeled examples, requiring only about 5% of the annotated data that traditional supervised learning methods need while achieving better results.
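The pretrain-on-unlabeled, fine-tune-on-few-labels recipe can be illustrated with a deliberately tiny stand-in: the "pretext task" below merely learns feature statistics from unlabeled data, and fine-tuning fits a nearest-centroid classifier on a small labeled subset (everything here is illustrative; the paper trains a neural network on soft-robot proprioception data):

```python
import random

def pretrain(unlabeled):
    # Self-supervised "pretext" stage: learn per-feature mean/std from
    # unlabeled data only (a stand-in for representation learning)
    n, dims = len(unlabeled), len(unlabeled[0])
    means = [sum(x[d] for x in unlabeled) / n for d in range(dims)]
    stds = [(sum((x[d] - means[d]) ** 2 for x in unlabeled) / n) ** 0.5 or 1.0
            for d in range(dims)]
    return means, stds

def encode(x, stats):
    means, stds = stats
    return [(xi - m) / s for xi, m, s in zip(x, means, stds)]

def finetune(labeled, stats):
    # Fine-tune with a tiny labeled set: one centroid per class in the
    # learned (normalized) space
    groups = {}
    for x, y in labeled:
        groups.setdefault(y, []).append(encode(x, stats))
    return {y: [sum(c) / len(vs) for c in zip(*vs)] for y, vs in groups.items()}

def predict(x, stats, centroids):
    z = encode(x, stats)
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(z, centroids[y])))
```

The point mirrors the paper's finding in miniature: the unlabeled bulk of the data does most of the work, so only a small labeled fraction (their ~5%) is needed at the fine-tuning stage.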
Researchers studied how humans use two types of thinking (fast intuitive processing and slower logical reasoning) when looking at images, and tested whether AI systems like multimodal large language models (MLLMs, which process both text and images together) have similar abilities. They found that while MLLMs have improved at correcting intuitive errors, they still struggle with logical processing tasks that require deeper analysis, and segmentation models (AI systems that identify objects in images) make errors similar to human intuitive mistakes rather than using logical reasoning.
Researchers developed a new method for watermarking LLM outputs (adding hidden markers to prove ownership and track content) using a three-part system that works only through input prompts, without needing access to the model's internal parameters. The approach uses one AI to create watermarking instructions, another to generate marked outputs, and a third to detect the watermarks, making it work across different LLM types including both proprietary and open-source models.
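The three-role pipeline can be sketched with toy stand-ins for each AI: a keyed instruction generator, a generator that follows those instructions, and a detector that recomputes the key material. Nothing below is the paper's actual method; the word lists, thresholds, and function names are invented to show the shape of the prompt-only design:

```python
import hashlib

VOCAB = ["notably", "indeed", "moreover", "therefore",
         "however", "thus", "overall", "specifically"]

def make_instructions(key: str) -> list:
    # Role 1: derive a secret "green list" of marker words from the key
    # (stand-in for an LLM that writes watermarking instructions)
    h = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    words = [w for i, w in enumerate(VOCAB) if (h >> i) & 1]
    return words or [VOCAB[h % len(VOCAB)]]

def generate_marked(text: str, green: list) -> str:
    # Role 2: a generator prompted to prefer green-list words; crudely
    # simulated here by prepending one marker word per sentence
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    out = [f"{green[i % len(green)].capitalize()}, {s[0].lower()}{s[1:]}"
           for i, s in enumerate(sentences)]
    return ". ".join(out) + "."

def detect(text: str, key: str, threshold: float = 0.5) -> bool:
    # Role 3: the detector recomputes the green list from the key and
    # scores how many sentences contain a marker word
    green = set(make_instructions(key))
    sentences = [s for s in text.split(".") if s.strip()]
    hits = sum(any(w in s.lower() for w in green) for s in sentences)
    return hits / max(len(sentences), 1) >= threshold
```

Because every role operates purely on text in and text out, the same scheme works against any black-box LLM, which is the property the paper highlights for proprietary models.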
This research creates a benchmark and evaluation framework for online safety analysis of LLMs, which involves detecting unsafe outputs while the AI is generating text rather than after it finishes. The study tests various safety detection methods on different LLMs and finds that combining multiple methods together, called hybridization, can improve safety detection effectiveness. The work aims to help developers choose appropriate safety methods for their specific applications.
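"Online" here means scoring the output as it streams, token by token, and hybridization means pooling several weak detectors into one decision. A toy sketch of that control loop (the detectors, blocklist, weights, and threshold are all invented for illustration):

```python
BLOCKLIST = {"bomb", "exploit payload"}  # toy unsafe markers

def keyword_score(prefix: str) -> float:
    # Detector 1: flag any blocklisted phrase in the generated prefix
    return 1.0 if any(k in prefix.lower() for k in BLOCKLIST) else 0.0

def refusal_drift_score(prefix: str) -> float:
    # Detector 2 (toy proxy): flag text that starts compliant then pivots
    p = prefix.lower()
    return 0.7 if "sure" in p and "here's how" in p else 0.0

def hybrid_monitor(token_stream, detectors, weights, threshold=0.6):
    # Score the growing prefix after every token; halt generation early
    # the moment the weighted ensemble crosses the threshold.
    prefix = ""
    for tok in token_stream:
        prefix += tok
        score = sum(w * d(prefix) for d, w in zip(detectors, weights))
        if score >= threshold:
            return prefix, True   # flagged mid-generation
    return prefix, False
```

The benchmark's finding maps onto this structure directly: no single `d(prefix)` is reliable alone, but a weighted combination catches unsafe continuations earlier and more often.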
Deep neural networks (DNNs, AI models with multiple layers that learn patterns) are vulnerable to adversarial examples, which are inputs slightly modified to trick the model into making wrong predictions. This paper introduces a concept called the certified local transferable region, a mathematically guaranteed area around an input where a single small perturbation (adversarial attack) will fool the model, and proposes a method called RAOS (reverse attack oracle-based search) to measure how large these vulnerable areas are as a way to evaluate how robust neural networks truly are.
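Measuring the size of such a region amounts to searching for the smallest perturbation that changes the model's decision. RAOS itself is the paper's contribution and operates on real DNNs; the one-dimensional stand-in below only conveys the measurement idea (the classifier, step size, and function name are invented):

```python
def flip_radius(f, x, step=0.01, max_r=1.0):
    # Scan outward from x for the smallest radius at which perturbing the
    # input in either direction flips the classifier's decision (1-D toy).
    # A larger flip radius means the model is locally more robust at x.
    base = f(x)
    steps = int(round(max_r / step))
    for k in range(1, steps + 1):
        r = k * step
        if f(x + r) != base or f(x - r) != base:
            return r
    return None  # no flip found: locally robust within max_r
```

The paper's contribution is doing this kind of measurement with certified guarantees in high-dimensional input spaces, where naive scanning like this is infeasible.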
Researchers have developed a method to hide secret data inside large language models (AI systems trained on massive amounts of text) by encoding information into the model's parameters during training. The hidden data doesn't interfere with the model's normal functions like text classification or generation, but authorized users with a secret key can extract the concealed information, enabling covert communication. The method leverages the transformer architecture (the neural network design behind modern AI language models) and its self-attention mechanisms (components that help the model focus on relevant parts of the input) to achieve high capacity for hidden data while remaining undetectable.
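The paper embeds the payload during training, which is what keeps model utility intact. As a much cruder illustration of the key-controlled embed/extract pattern only, the sketch below flips the least-significant mantissa bits of key-selected parameters (this is not the paper's method; all names are invented):

```python
import hashlib
import random
import struct

def _positions(key: str, n_params: int, n_bits: int):
    # Key-derived pseudorandom choice of which parameters carry payload
    # bits; without the key, the carrier positions are unknown
    seed = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    return random.Random(seed).sample(range(n_params), n_bits)

def _set_lsb(value: float, bit: int) -> float:
    # Overwrite the least-significant mantissa bit of a float64 parameter
    u = struct.unpack("<Q", struct.pack("<d", value))[0]
    return struct.unpack("<d", struct.pack("<Q", (u & ~1) | bit))[0]

def _get_lsb(value: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", value))[0] & 1

def embed(params, bits, key):
    out = list(params)
    for pos, bit in zip(_positions(key, len(out), len(bits)), bits):
        out[pos] = _set_lsb(out[pos], bit)
    return out

def extract(params, n_bits, key):
    return [_get_lsb(params[p]) for p in _positions(key, len(params), n_bits)]
```

Each changed parameter moves by at most one unit in the last place (about one part in 2^52), which is why the model's behavior is essentially unaffected; the paper achieves the same end through training rather than bit manipulation.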
Differential privacy (DP, a mathematical technique that adds controlled randomness to data to protect individual privacy while keeping data useful) is a widely-used method for protecting sensitive information, but putting it into practice in real-world systems has proven difficult. Researchers analyzed 21 actual deployments of differential privacy by major companies and institutions over the last ten years to understand what works and what doesn't.
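The core mechanism most deployments build on is additive noise calibrated to a query's sensitivity. The Laplace mechanism for a counting query is the textbook construction (standard DP, not specific to the 21 surveyed deployments):

```python
import random

def laplace_mechanism(true_answer: float, sensitivity: float,
                      epsilon: float) -> float:
    # Laplace(0, b) noise with b = sensitivity / epsilon satisfies
    # epsilon-differential privacy: smaller epsilon means more noise and
    # more privacy; larger epsilon means less noise and more accuracy.
    b = sensitivity / epsilon
    # A Laplace variable is the difference of two independent exponentials
    noise = b * (random.expovariate(1.0) - random.expovariate(1.0))
    return true_answer + noise

# Example: a counting query ("how many records have condition X?") changes
# by at most 1 when any one person's data is added or removed, so its
# sensitivity is 1.
```

Much of the deployment difficulty the survey documents lives around this simple core: choosing epsilon, bounding sensitivity for realistic queries, and accounting for privacy budget across repeated releases.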
This research addresses model-stealing attacks against healthcare APIs (application programming interfaces, which are tools that let software systems communicate with each other), where attackers issue queries to copy or extract medical AI models. The authors propose a defense called adaptive teleportation, which modifies incoming queries so that attackers receive misleading information while legitimate users still get accurate results. The teleportation operations are generated from a formulated bi-level optimization target and follow the evolution trajectory described by Wasserstein gradient flows, pushing attacking queries across the decision boundary while constraining how far benign queries deviate. Validated on three healthcare prediction tasks (in-hospital mortality, bleed risk, and ischemic risk prediction), the defense suppressed the performance of cloned models significantly more effectively than existing approaches while maintaining comparable serving utility.
IEEE Xplore (Security & AI Journals)
OWASP's Agentic Security Initiative has created a taxonomy (a classification system for threats and their fixes) that is now being used in real developer tools like PENSAR, SPLX.AI Agentic Radar, and AI&ME to help teams build and test secure agentic AI systems (AI systems that can take actions autonomously). This taxonomy is also informing the development of OWASP's Top 10 for Agentic AI, a list of the most critical security risks in this area.
In Q2 2025, attackers exploited GPT-4.1 by embedding malicious hidden instructions within tool descriptions, a technique called tool poisoning (hiding harmful prompts inside the text that describes what a tool does). When the AI interacted with these poisoned tools, it unknowingly executed unauthorized actions and leaked sensitive data without the user's knowledge. Recommended mitigations include strict validation and sanitization of tool descriptions, permissions and access controls for tool integrations, monitoring AI behavior for anomalies during tool execution, and educating developers on secure integration practices: developers should validate third-party tools and ensure descriptions are free of hidden prompts, and IT teams should audit AI tool integrations and watch for unusual activity.
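The first mitigation, validating tool descriptions, can start as a simple screen of tool metadata for injection-style phrasing before a tool is registered. A minimal and necessarily incomplete sketch (the pattern list and function names are invented, and a real deployment would pair this with the other mitigations above):

```python
import re

SUSPICIOUS_PATTERNS = [
    r"ignore (all|previous|prior) instructions",
    r"do not (tell|inform|mention)",
    r"<\s*system\s*>",
    r"send .* to http",
]

def validate_tool_description(description: str) -> list:
    # Return the suspicious patterns the description matched; an empty
    # list means it passed this (necessarily incomplete) screen
    return [p for p in SUSPICIOUS_PATTERNS
            if re.search(p, description, re.IGNORECASE)]

def register_tool(name: str, description: str, registry: dict) -> None:
    # Refuse to register any tool whose description trips the screen
    findings = validate_tool_description(description)
    if findings:
        raise ValueError(f"tool {name!r} rejected: matched {findings}")
    registry[name] = description
```

Pattern matching alone cannot catch every paraphrase of a hidden prompt, which is why the source also calls for runtime anomaly monitoring and access controls as defense in depth.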
OWASP GenAI Security
CyberRisk Alliance and OWASP (Open Worldwide Application Security Project, a non-profit focused on improving software security) announced a partnership to advance education in application security (protecting software from attacks) and AI security. The collaboration will involve creating shared content, hosting events, and conducting research initiatives together.
As AI development has grown rapidly, organizations struggle with how to actually put responsible AI practices into action beyond just making promises about it. This article describes how two organizations created a five-phase process to embed responsibility pledges (formal commitments to use AI ethically) into their daily practices using a systems approach (treating responsibility as interconnected parts of the whole organization rather than isolated efforts).
Generative AI (AI systems that create new text, code, or images) is a double-edged sword in cybersecurity, helping both defenders and attackers. The case study of a fictional insurance company shows how GenAI can be used to launch cyberattacks (malicious attempts to breach computer systems) and also to defend against them, creating a difficult choice for IT leaders about whether to use AI as a defensive tool or risk falling behind attackers who already have it.