Academic papers, new techniques, benchmarks, and theoretical findings in AI/LLM security.
FreqTransNet is a new deep learning watermarking model that combines convolutional modules, Transformer structures (neural networks that use self-attention to understand relationships between distant parts of data), and frequency-domain transformations (mathematical techniques that analyze images by breaking them into component frequencies) to embed invisible marks into images more robustly. The model outperforms existing watermarking methods, achieving better visual quality and maintaining over 97% accuracy in extracting watermarks even when images are attacked or modified.
This research paper presents RASE, a new method for collecting data from Internet of Things devices (connected sensors that gather information) while protecting people's privacy from disclosure attacks (attempts to infer a specific individual's data). RASE works in three steps: first adding noise (random false data) to real readings, then randomly scrambling who sent what data so senders can't be linked to receivers, and finally calculating approximate totals from the scrambled data.
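The three steps can be illustrated with a minimal sketch. It assumes Laplace noise, a uniform random shuffle, and a plain sum as the estimator; the summary does not specify RASE's actual noise distribution, shuffling protocol, or estimator, and all function names here are hypothetical.

```python
import random

def randomize(reading: float, scale: float, rng: random.Random) -> float:
    """Step 1 (assumed form): add zero-mean Laplace noise to a real reading."""
    # The difference of two exponentials with mean `scale` is Laplace(0, scale).
    return reading + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def shuffle_reports(reports, rng: random.Random):
    """Step 2 (assumed form): permute reports so order no longer reveals the sender."""
    shuffled = list(reports)
    rng.shuffle(shuffled)
    return shuffled

def estimate_sum(reports) -> float:
    """Step 3 (assumed form): the noise is zero-mean, so the sum stays unbiased."""
    return sum(reports)

rng = random.Random(0)
readings = [20.0 + (i % 5) for i in range(1000)]   # toy sensor values
noisy = [randomize(r, 1.0, rng) for r in readings]
shuffled = shuffle_reports(noisy, rng)
estimate = estimate_sum(shuffled)
true_total = sum(readings)
print(f"true total = {true_total}, estimate = {estimate:.1f}")
```

Each individual report is perturbed, but the per-report errors largely cancel in the total, which is why an approximate aggregate is still recoverable after shuffling.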
A Decentralized Randomness Beacon (DRB, a cryptographic tool that generates unpredictable, publicly verifiable randomness for distributed systems) is used in blockchain and distributed networks, but existing protocols like GRandLine and RandFlash expose participants' identities during leader election, risking privacy leaks. The paper proposes PADRE, a new privacy-aware DRB protocol that hides participant identities while maintaining security and efficiency, using a new cryptographic primitive called anonymous threshold verifiable random functions (ATVRF).
This research identifies how attackers can exploit non-control data in the Linux kernel's file system (the part of the OS that manages files and folders) to gain unauthorized access, even when control-flow integrity (a security technique that prevents attackers from hijacking program execution paths) is in place. The study developed a framework to automatically find vulnerable data objects and demonstrated 10 working attacks against the kernel with modern security protections enabled.
This paper surveys eXplainable AI (XAI, methods for making AI decisions understandable to humans) in visual recognition, which is increasingly important for safety-critical applications like autonomous driving and medical diagnostics. The survey organizes XAI approaches by intent, object, presentation, and methodology, and also examines how interpretability applies to Multimodal Large Language Models (AI systems that process and combine text, images, and other data types).
This research addresses negative transfer, which occurs when an AI model performs worse after trying to apply knowledge from one domain (a labeled dataset) to a different domain (an unlabeled dataset) due to significant differences between them. The study finds that a model's over-reliance on non-causal environmental features (irrelevant details that don't actually cause predictions) creates disagreement across domains, harming performance. The proposed solution, called RED (Reducing Environmental Disagreement), separates each sample into causal features (the truly relevant information) and non-causal environmental features, then reduces the disagreement between domains based on these environmental features.
Fix: The proposed solution is RED (Reducing Environmental Disagreement), which "disentangles each sample into domain-invariant causal features and domain-specific non-causal environmental features via adversarially training domain-specific environmental feature extractors in the opposite domains. Subsequently, RED estimates and reduces environmental disagreement based on domain-specific non-causal environmental features."
IEEE Xplore (Security & AI Journals)
Researchers developed PING (Positive-Incentive Noise Generator), a new method that adds carefully designed noise to protect private data in decentralized learning (where multiple computers train AI models together without sending raw data to a central server) while keeping the learning process efficient. The method uses network connections and lightweight encryption to create correlated noise (noise patterns that work together), and builds on this to create PP-DPIN, an algorithm that combines differential privacy (a mathematical technique for protecting individual data points) and information theory to ensure strong privacy guarantees for at least half the computers involved.
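One common way to build correlated noise in decentralized settings is pairwise cancellation: each pair of connected nodes shares a random value that one adds and the other subtracts. The sketch below illustrates that general idea only. It is an assumption, not PING's actual construction, which the summary does not describe; the function name, ring topology, and noise scale are hypothetical.

```python
import random

def pairwise_noise(n_nodes: int, edges, scale: float, rng: random.Random):
    """Give each node noise that cancels across every shared edge (assumed scheme)."""
    noise = [0.0] * n_nodes
    for i, j in edges:
        r = rng.gauss(0.0, scale)  # stands in for a value agreed over an encrypted link
        noise[i] += r              # node i adds the shared value
        noise[j] -= r              # node j subtracts the same value
    return noise

rng = random.Random(1)
values = [1.0, 2.0, 3.0, 4.0]                  # toy local model parameters
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]        # hypothetical communication graph
noise = pairwise_noise(len(values), ring, 5.0, rng)
masked = [v + z for v, z in zip(values, noise)]
print(masked, sum(masked))
```

Every masked value differs from its true value, yet the network-wide sum (and therefore the average) is preserved up to floating-point error, which is the sense in which the noise terms "work together".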
AI systems receive instructions from multiple sources (system policies, developers, users, and online data), and models must learn to prioritize the most trustworthy ones to stay safe. When models treat untrusted instructions as authoritative, they can be tricked into revealing private information, following harmful requests, or falling victim to prompt injection (malicious instructions hidden in input data). OpenAI's solution uses a clear instruction hierarchy (System > developer > user > tool) and trains models with IH-Challenge, a reinforcement learning dataset designed to teach models to follow high-priority instructions even when lower-priority ones conflict with them.
Fix: OpenAI's models are trained on a clear instruction hierarchy where System instructions have highest priority, followed by developer instructions, then user instructions, then tool outputs. The company also created IH-Challenge, a reinforcement learning training dataset that generates conversations with conflicting instructions where high-priority instructions are kept simple and objectively gradable, ensuring models learn to prioritize correctly without resorting to useless shortcuts like over-refusing benign requests.
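The priority ordering itself can be shown with a toy resolver. This is only an illustration of the System > developer > user > tool ordering; it is not how the models are trained or how they decide at inference time, and the `Instruction` type, its `key`/`value` fields, and `resolve` are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum

class Role(IntEnum):
    # Higher value means higher priority, matching System > developer > user > tool.
    TOOL = 0
    USER = 1
    DEVELOPER = 2
    SYSTEM = 3

@dataclass(frozen=True)
class Instruction:
    role: Role
    key: str    # hypothetical: the behavior the instruction governs
    value: str  # hypothetical: the requested setting for that behavior

def resolve(instructions):
    """For each key, keep the value from the highest-priority role (first wins on ties)."""
    chosen = {}
    for ins in instructions:
        current = chosen.get(ins.key)
        if current is None or ins.role > current.role:
            chosen[ins.key] = ins
    return {key: ins.value for key, ins in chosen.items()}

conversation = [
    Instruction(Role.SYSTEM, "reveal_secret", "never"),
    Instruction(Role.TOOL, "reveal_secret", "always"),   # injected via tool output
    Instruction(Role.USER, "language", "French"),
]
resolved = resolve(conversation)
print(resolved)
```

The conflicting tool-level instruction is ignored because a System instruction already governs the same behavior, while the non-conflicting user request is honored.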
OpenAI Blog
Researchers demonstrated a practical differential fault attack (an exploit that deliberately introduces errors into a system to extract secrets) against GEA-1 and GEA-2, the stream ciphers (algorithms that encrypt data bit-by-bit) used to protect GPRS (General Packet Radio Service, a mobile data standard) communications between phones and base stations. By identifying the exact location where faults occur in the cipher, attackers can recover the 64-bit secret keys in about 16 minutes on a standard laptop. Many current phones still support these outdated ciphers, making them vulnerable.
Multi-LLM collaborative systems (setups where multiple AI models work together on complex tasks) can be attacked through three new methods: Decision Poisoning Attack (injecting false instructions to manipulate system output), Indirect Echoleak Attack (extracting private information through model interactions), and Information Collision Attack (exploiting communication between models). While these collaborative systems offer flexibility and better reasoning, their internal communication channels create security and privacy vulnerabilities that attackers can exploit.
Game-theoretic DoS attacks (GDoS, attacks that exploit miners' financial incentives) can damage proof-of-work blockchains (like Bitcoin, which uses computational puzzles to secure transactions) even when attackers control less than 20% of the network's computing power. Rather than changing the blockchain protocol itself, researchers propose a cooperative defense where miners temporarily move their computing resources to larger mining pools during attacks to maintain their earnings and discourage attackers.
Fix: The source proposes a 'cooperative hash-power hopping mechanism in which miners temporarily reallocate hash power to larger pools when under attack to preserve expected payoffs and suppress attacker incentives.' Simulations show this strategy 'reduces attacker revenue gains by more than 20% and prevents throughput degradation across the entire attack range.' However, this is a theoretical proposal presented in a research paper, not an implemented or deployed mitigation in existing systems.
IEEE Xplore (Security & AI Journals)
SeVoAuth is a cloud-based voiceprint authentication system (a security method that recognizes users by their unique voice characteristics) designed to protect user privacy while defending against replay attacks (replaying a recorded voice), spoofing (faking a voice), and adversarial attacks (manipulating input to fool the system). The system stores a synthesized version of a user's voice in the cloud and uses hash functions (mathematical functions that transform data into fixed-size codes) to continuously change the verification targets during each login, making it difficult for attackers to reuse old voice recordings or tricks.
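Why a hash-rotated verification target defeats replay can be seen in a generic sketch. It assumes a SHA-256 chain and an HMAC response keyed by a byte string standing in for the enrolled voice template; SeVoAuth's real construction is not given in the summary, and every name below is hypothetical.

```python
import hashlib
import hmac

def next_target(current: bytes, session_nonce: bytes) -> bytes:
    """Advance the verification target for a new login (assumed hash chain)."""
    return hashlib.sha256(current + session_nonce).digest()

def respond(template: bytes, target: bytes) -> bytes:
    """Client response bound to the current target (HMAC is an assumption)."""
    return hmac.new(template, target, hashlib.sha256).digest()

def verify(template: bytes, target: bytes, response: bytes) -> bool:
    """Server check, using a constant-time comparison."""
    return hmac.compare_digest(respond(template, target), response)

template = b"stand-in for an enrolled voice template"
target_1 = next_target(b"initial-seed", b"login-1")
response_1 = respond(template, target_1)
fresh_ok = verify(template, target_1, response_1)

# The target moves on at the next login, so the captured response no longer matches.
target_2 = next_target(target_1, b"login-2")
replay_ok = verify(template, target_2, response_1)
print(fresh_ok, replay_ok)
```

Because each login is checked against a new target, a response recorded during one session is useless in the next.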
Phishing attacks are a form of social engineering (tricking people into revealing secrets by pretending to be trustworthy) that lure users to fake websites resembling real ones in order to steal sensitive information. Researchers created a new dataset with 31 attributes (measurable characteristics) derived from URLs and similarity features, then tested multiple machine learning algorithms (computer programs that learn patterns from data) on it to detect these attacks. The Logistic Regression method achieved 96.40% accuracy at detecting phishing, showing that this approach works well for protecting local systems in real-world situations.
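To make "attributes derived from URLs" concrete, here is a small feature extractor. The five features are common illustrative choices and are assumptions; the summary does not list the dataset's 31 attributes, and `url_features` is a hypothetical helper.

```python
import ipaddress
from urllib.parse import urlparse

def url_features(url: str) -> dict:
    """Extract a few illustrative URL attributes (not the paper's feature set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    try:
        ipaddress.ip_address(host)
        host_is_ip = True          # raw IP hosts are a frequently cited phishing signal
    except ValueError:
        host_is_ip = False
    return {
        "url_length": len(url),
        "dot_count": host.count("."),
        "has_at_symbol": "@" in url,   # "user@host" syntax can disguise the real host
        "host_is_ip": host_is_ip,
        "uses_https": parsed.scheme == "https",
    }

for example in ("https://example.com/", "http://192.168.0.1/login"):
    print(example, url_features(example))
```

Feature dictionaries like these would then be converted to numeric vectors and passed to a classifier such as logistic regression.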
AGFPS is a new attack method that steals system prompts (the hidden instructions that control how an LLM behaves) from deployed AI applications by using evolutionary optimization (a technique that mimics natural selection to find solutions) instead of gradient-based methods. The researchers demonstrated that their approach successfully extracted prompts 95.2% of the time and worked better than previous methods, highlighting serious security weaknesses in how LLMs are currently deployed.
Researchers studied how well different versions of major LLMs (like GPT, Llama, and Qwen) resist adversarial attacks, which are inputs designed to trick AI systems into making mistakes, ignoring safety guidelines, or producing false information. They found that newer versions of these models don't always become more resistant to these attacks, and that simply making models larger doesn't guarantee better security.
Researchers developed a new attack called Distribution Drift Learner (DDL) that can break through non-transferable learning (NTL, a method that prevents AI models from being adapted to new tasks to protect their intellectual property) by observing only the model's inputs and outputs. The attack works by manipulating how data is distributed across domains and reconstructing training samples, raising accuracy on protected models from 10% to 81% and exposing serious weaknesses in current model protection strategies.
Cybersecurity uses deception (deliberately creating fake systems or false information to trick attackers) alongside defense and detection, and generative AI makes it easier to create convincing decoys. However, there are currently no well-established methods to measure how well these deception tactics actually work.
This research addresses a problem in graph matching (a technique for finding correspondences between similar structures), where training data often contains incomplete or incorrect information. The authors propose a dual-expert framework that uses two different mathematical approaches (KB-QAP and L-QAP, which are optimization methods for assignment problems) working together through an align-fuse-refine pipeline to handle both missing keypoints from partial views and errors from mislabeled data.
A 2025 survey of 704 IT executives found that AI is now the top concern for IT management, ahead of cybersecurity and aligning IT with business goals. While most organizations are increasing IT salaries (90.5%), fewer are hiring new IT staff (54.2%), and cost control has dropped as a priority for measuring how well IT leaders perform.
A banking group implemented a retrieval-augmented AI-powered compliance assistant (a system where AI pulls in external compliance documents to answer questions) to help with regulatory requirements while maintaining human oversight. The article identifies key challenges with this approach, including authority illusion (over-trusting the AI's answers), unclear responsibility for decisions, loss of human judgment about context, and gaps in understanding how the system works, then proposes a four-phase framework to help organizations move from passive AI assistants toward systems where AI and humans reason together.