Academic papers, new techniques, benchmarks, and theoretical findings in AI/LLM security.
This academic survey examines agentic AI (AI systems that can independently plan and execute tasks to accomplish goals) in healthcare. It discusses both the potential benefits of using such systems in medical settings and the technical, ethical, and practical obstacles that need to be addressed, and it gives an overview of current research directions for developing safer and more effective autonomous AI agents in healthcare applications.
This academic survey examines hallucinations in large vision and language models, which are instances where AI systems generate false or nonsensical information that appears plausible. The 36-page paper, published in ACM Computing Surveys in October 2026, gives a comprehensive overview of research on this problem as it affects both language models (AI systems trained on text) and multimodal models (AI systems that process both images and text).
This academic survey examines harmful fine-tuning attacks (methods where attackers modify an AI model's training process to make it behave dangerously) and the defenses designed to stop them. The paper reviews different types of attacks, how they work, and various protection strategies researchers have developed to keep large language models safe from this threat.
This academic survey paper examines metrics, or measurement methods, used to evaluate privacy-preserving generative models (AI systems that create new data while protecting personal information). The paper provides a comprehensive overview of different ways researchers measure how well these models protect privacy while still functioning effectively.
SALT is a watermarking technique for diffusion models (AI systems that generate images by gradually removing noise from random data) that uses semantic guidance and adaptive latent space truncation to embed hidden ownership marks. The method aims to protect diffusion models from unauthorized use while maintaining the quality of generated images. This research addresses the need for better ownership verification and copyright protection in generative AI systems.
Researchers used OpenAI o3 Deep Research, an AI reasoning model, to re-analyze 376 previously unsolved rare genetic disease cases by connecting clinical data, genetic variants, and scientific literature into evidence-based explanations for human experts to review. After specialist evaluation and clinical confirmation, the AI-assisted workflow helped establish new diagnoses in 18 cases (4.8% additional diagnostic yield), with the model generating hypotheses rather than making medical decisions itself. This demonstrates how periodic AI-assisted reanalysis could help scale the process of solving rare disease cases as medical knowledge evolves.
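The reported yield can be checked with a short calculation. This is only a sketch of the arithmetic using the two figures quoted above; the function and variable names are illustrative and not taken from the study.

```python
# Figures quoted in the summary above; names are illustrative assumptions.
total_cases = 376    # previously unsolved cases that were re-analyzed
new_diagnoses = 18   # diagnoses established after specialist review


def diagnostic_yield(solved: int, total: int) -> float:
    """Return the additional diagnostic yield as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * solved / total


yield_pct = diagnostic_yield(new_diagnoses, total_cases)
print(f"{yield_pct:.1f}%")  # prints 4.8%
```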
This research paper presents a new security framework called TFSEA that combines feature selection (choosing which data points matter most), classification (sorting data into categories), and authentication (verifying user identity) to detect unauthorized access attempts in cloud computing environments. The paper proposes using this hybrid approach to improve how well systems can identify and prevent intrusions in cloud infrastructure.
This article examines how ransomware (malicious software that locks files and demands payment to unlock them) defense strategies need to change as generative AI (AI systems that create new content like text or code) becomes more common. The piece suggests that traditional security approaches may be less effective in an environment where AI is widely used.
This academic survey paper reviews methods for testing how well neural networks (AI systems trained to recognize patterns in data) perform when faced with unexpected or manipulated images. The paper examines various approaches researchers use to assess whether image recognition systems remain accurate and reliable under challenging conditions.
Researchers discovered that they can figure out what actions industrial robots are performing just by analyzing encrypted network traffic (data traveling across networks in scrambled form) without being able to read the actual messages. The study shows both practical attacks that successfully identified robot movements and theoretical limits on how much information can be extracted from this type of traffic. This reveals a security gap where encryption alone may not fully protect sensitive robot operations from being monitored.
This paper presents a new cryptographic method called certificateless lattice-based matchmaking encryption (CLLME) designed to secure data sharing on cloud platforms while meeting regulations like GDPR. CLLME provides post-quantum security (protection against future quantum computers), allows both senders and receivers to control who can access data, and includes a filtering mechanism to avoid decrypting irrelevant encrypted files. The researchers proved the method is mathematically secure and showed it works efficiently in real-world scenarios.
Thumbnail-preserving encryption (TPE, a method that keeps some visual information visible in encrypted images to balance usability and privacy) has a security weakness: existing approaches encrypt pixels, blocks, or channels separately, creating vulnerabilities. Researchers propose a new 'triple-chain architecture' that links encryption at three levels (pixels, blocks, and channels) so that any small change to an image causes completely different encryption results, making the system more secure while still maintaining TPE benefits.
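The basic thumbnail-preserving idea can be illustrated with a toy example. The sketch below is not the triple-chain architecture from the paper: it assumes a grayscale image stored as a list of rows, tiles whose size divides the image dimensions, and Python's non-cryptographic `random` module as a stand-in for a keyed cipher. It only shows that shuffling pixels inside each tile leaves every tile's average, and hence the thumbnail, unchanged.

```python
import random


def block_means(img, block):
    """Average of each block x block tile; together these form the thumbnail."""
    h, w = len(img), len(img[0])
    return [
        [
            sum(img[y][x]
                for y in range(by, by + block)
                for x in range(bx, bx + block)) / block ** 2
            for bx in range(0, w, block)
        ]
        for by in range(0, h, block)
    ]


def shuffle_within_blocks(img, block, key):
    """Permute pixels inside each tile with a keyed PRNG (toy, NOT secure)."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    rng = random.Random(key)  # assumption: stands in for a real keyed cipher
    for by in range(0, h, block):
        for bx in range(0, w, block):
            coords = [(y, x)
                      for y in range(by, by + block)
                      for x in range(bx, bx + block)]
            values = [img[y][x] for y, x in coords]
            rng.shuffle(values)
            for (y, x), v in zip(coords, values):
                out[y][x] = v
    return out


img = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [15, 25, 35, 45],
    [55, 65, 75, 85],
]
enc = shuffle_within_blocks(img, block=2, key="demo-key")
print(block_means(img, 2) == block_means(enc, 2))  # prints True
```

Because each tile here is processed independently, the toy has the weakness the summary describes: a change in one tile never affects any other tile.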
This research paper describes methods for making neural networks (AI models that learn patterns from data) more private by using fully homomorphic encryption (a type of encryption that lets computers perform calculations on encrypted data without decrypting it first). The work focuses on optimizing how these privacy-protecting neural networks search through and train on data while keeping information secure.
This research paper proposes a new cryptographic method for securing communication in IoT (Internet of Things) devices that is lightweight and preserves privacy. The scheme uses certificateless signcryption (a technique that combines digital signatures for authentication with encryption for confidentiality, without requiring traditional certificates) and designated-verifier privacy (meaning only a chosen recipient can verify that a message is authentic), designed to work efficiently on resource-constrained IoT devices.
This research paper describes a watermarking technique that allows AI model creators to prove they own their models without revealing the watermark during normal use. The watermark remains hidden when the model is deployed but becomes detectable when the model is updated, helping prevent unauthorized copying or theft of AI models.
Researchers demonstrated that large language models (AI systems trained on vast text data) can be used to generate attack strategies against industrial control systems (the computers that manage power plants, factories, and critical infrastructure). The study shows a concerning security risk where these powerful AI tools could be misused to help attackers plan harmful activities against systems that society depends on.
This academic publication examines security vulnerabilities in the mechanisms that deliver software updates to computers and systems. The article, published in June 2026, analyzes how attackers might exploit the update process itself to compromise systems, rather than targeting the software after it's already installed.
Researchers describe a method for creating hidden communication channels within networks by using hash-based filtering to disguise data inside normal-looking network traffic. This technique, called a covert channel (a hidden path for sending information that shouldn't be detectable), could allow attackers to secretly send data through systems without being noticed by security tools.
Existing model fingerprinting techniques (methods that create unique digital signatures to prove ownership of AI models) are vulnerable to false claim attacks, in which an attacker fraudulently claims ownership of a model they did not create. This paper introduces FIT-Print, a targeted fingerprinting approach that uses optimization to turn the fingerprint into a verifiable, targeted signature that resists such claims. Its two black-box methods, bit-wise FIT-ModelDiff and list-wise FIT-LIME, are reported to achieve a 100% defense success rate against false claim attacks while maintaining a 100% ownership verification rate.
Fix: The paper proposes FIT-Print, a targeted fingerprinting paradigm that 'actively counters false claim attacks' by leveraging 'optimization to transform the fingerprint into a verifiable, targeted signature.' Two specific black-box fingerprinting methods are introduced: 'bit-wise FIT-ModelDiff' which 'utilizes output distances' and 'list-wise FIT-LIME' which utilizes 'feature attributions as robust model signatures.' The framework demonstrated '100% defense success rate' against false claim attacks and '100% ownership verification rate.'
IEEE Xplore (Security & AI Journals)
This research proposes TAPGuard, a framework for detecting cascading threats in Trigger-Action Programming (TAP, a system where one event automatically triggers another action, commonly used in smart home devices). The framework uses large language models (AI systems trained on text) to understand the semantic meaning (the actual intent and meaning, not just the structure) of automation rules and identifies two types of threats: explicit ones from direct device interactions and implicit ones from rules sharing environmental variables that shouldn't interact. TAPGuard performs better than existing methods at catching these dangerous rule combinations.
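The two threat types can be illustrated with a small rule-graph check. This is a hypothetical sketch, not TAPGuard itself: the `Rule` fields, event names, and `find_cascades` helper are assumptions made for illustration, and the hand-written `affects` and `senses` labels stand in for the semantic understanding that the paper obtains from a large language model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    trigger: str                       # event that fires the rule
    action: str                        # event the rule causes
    affects: frozenset = frozenset()   # environment variables the action changes
    senses: frozenset = frozenset()    # environment variables the trigger reads


def find_cascades(rules):
    """Return (kind, first_rule, second_rule) for each ordered pair that can chain."""
    found = []
    for a in rules:
        for b in rules:
            if a is b:
                continue
            if a.action == b.trigger:
                # Explicit: one rule's action is directly another rule's trigger.
                found.append(("explicit", a.name, b.name))
            elif a.affects & b.senses:
                # Implicit: the rules interact only through a shared variable.
                found.append(("implicit", a.name, b.name))
    return found


rules = [
    Rule("heater_on", trigger="motion.detected", action="heater.on",
         affects=frozenset({"temperature"})),
    Rule("vent", trigger="temperature.high", action="window.open",
         senses=frozenset({"temperature"})),
    Rule("hvac_off", trigger="window.open", action="hvac.off"),
]

for cascade in find_cascades(rules):
    print(cascade)
# ('implicit', 'heater_on', 'vent')
# ('explicit', 'vent', 'hvac_off')
```

In this toy setup, motion turns on the heater, the rising temperature opens a window, and the open window switches off the HVAC: a chain no single rule author intended.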