aisecwatch.com

Real-time AI security monitoring. Tracking AI-related vulnerabilities, safety and security incidents, privacy risks, research developments, and policy changes.

Maintained by

Truong (Jack) Luu

Information Systems Researcher

Research

Academic papers, new techniques, benchmarks, and theoretical findings in AI/LLM security.

691 items

FreqTransNet: A Frequency-Aware Transformer Network for Robust Image Watermarking

info | research | Peer-Reviewed
research
Mar 10, 2026

FreqTransNet is a new deep learning watermarking model that combines convolutional modules, Transformer structures (neural networks that use self-attention to understand relationships between distant parts of data), and frequency-domain transformations (mathematical techniques that analyze images by breaking them into component frequencies) to embed invisible marks into images more robustly. The model outperforms existing watermarking methods, achieving better visual quality and maintaining over 97% accuracy in extracting watermarks even when images are attacked or modified.

IEEE Xplore (Security & AI Journals)

RASE: Efficient Privacy-Preserving Data Aggregation Against Disclosure Attacks for IoTs

info | research | Peer-Reviewed
research

PADRE: Privacy-Aware Decentralized Randomness

info | research | Peer-Reviewed
research

Beyond Control: Exploring Novel File System Objects for Data-Only Attacks on Linux Systems

info | research | Peer-Reviewed
security

A Survey on Interpretability in Visual Recognition

info | research | Peer-Reviewed
research

Mitigating Negative Transfer via Reducing Environmental Disagreement

info | research | Peer-Reviewed
research

Privacy Preserving Decentralized Learning With Positive-Incentive Noise

info | research | Peer-Reviewed
security

Improving instruction hierarchy in frontier LLMs

info | research | Blog Research
safety

Practical Differential Fault Attacks on the GPRS Standard Ciphers

info | research | Peer-Reviewed
security

Cracks in Collaboration: Threat Models and Attacks on Multi-LLM Collaborative Systems

info | research | Peer-Reviewed
security

Defending PoW Blockchains Against Game-Theoretic DoS Attacks: A Rational Strategy Analysis

info | research | Peer-Reviewed
security

SeVoAuth: Secure Voiceprint Authentication With Hash-Based Feature Transformation

info | research | Peer-Reviewed
security

Evaluation of Phishing Attacks Targeting Local Systems Using an Attribute-Based Dataset and Machine Learning Methods

info | research | Peer-Reviewed
research

AGFPS: An Automated Gradient-Free Framework for Prompt Stealing

info | research | Peer-Reviewed
security

Robustness Over Time: Understanding Adversarial Examples’ Effectiveness on Longitudinal Versions of Large Language Models

info | research | Peer-Reviewed
security

Your Non-Transferable Learning is Fragile: Practical Breach of Protected Models

info | research | Peer-Reviewed
security

Beyond Guesswork: How to Measure What Makes Cyber Deception Work

info | research | Peer-Reviewed
security

Learning With Partial and Noisy Correspondence in Graph Matching

info | research | Peer-Reviewed
research

The 2025 SIM IT Issues and Trends Study

info | research | Peer-Reviewed
industry

The Evolution of AI Compliance Assistance from Reactive Support to Co-Agency

info | research | Peer-Reviewed
policy
Mar 10, 2026

This research paper presents RASE, a new method for collecting data from Internet of Things devices (connected sensors that gather information) while protecting people's privacy from disclosure attacks (attempts to figure out what specific individuals' data is). RASE works in three steps: first adding noise (random false data) to real readings, then randomly scrambling who sent what data so senders can't be linked to receivers, and finally calculating approximate totals from the scrambled data.

IEEE Xplore (Security & AI Journals)
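
The three RASE steps summarized above (add noise, scramble sender identity, compute an approximate total) can be illustrated with a minimal Python sketch. This is not the paper's protocol: the Gaussian noise, the function names, and the `scale` parameter are all assumptions chosen for illustration, and a real scheme would also need a secure shuffling mechanism rather than a local permutation.

```python
import random

def perturb(reading, scale, rng):
    """Step 1 (assumed Gaussian noise): add zero-mean noise to a real reading."""
    return reading + rng.gauss(0.0, scale)

def shuffle_reports(reports, rng):
    """Step 2: drop sender IDs and randomly permute the noisy values."""
    values = [value for _sender, value in reports]
    rng.shuffle(values)
    return values

def aggregate(values):
    """Step 3: compute an approximate total from the anonymized values."""
    return sum(values)

def collect(readings, scale=1.0, seed=None):
    """Run all three steps over a {sender_id: reading} mapping."""
    rng = random.Random(seed)
    reports = [(sender, perturb(value, scale, rng))
               for sender, value in readings.items()]
    return aggregate(shuffle_reports(reports, rng))

if __name__ == "__main__":
    # Hypothetical sensor readings, for illustration only.
    readings = {"sensor-1": 20.5, "sensor-2": 21.0, "sensor-3": 19.8}
    print("approximate total:", collect(readings, scale=0.5, seed=0))
```

Because the noise is zero-mean, the total stays close to the true sum while individual values, and their link to a sender, are obscured.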
Mar 10, 2026

A Decentralized Randomness Beacon (DRB, a cryptographic tool that generates unpredictable, publicly verifiable randomness for distributed systems) is used in blockchain and distributed networks, but existing protocols like GRandLine and RandFlash expose participants' identities during leader election, risking privacy leaks. The paper proposes PADRE, a new privacy-aware DRB protocol that hides participant identities while maintaining security and efficiency, using a new cryptographic primitive called anonymous threshold verifiable random functions (ATVRF).

IEEE Xplore (Security & AI Journals)
Mar 10, 2026

This research identifies how attackers can exploit non-control data in the Linux kernel's file system (the part of the OS that manages files and folders) to gain unauthorized access, even when control-flow integrity (a security technique that prevents attackers from hijacking program execution paths) is in place. The study developed a framework to automatically find vulnerable data objects and demonstrated 10 working attacks against the kernel with modern security protections enabled.

IEEE Xplore (Security & AI Journals)
Mar 10, 2026

This paper surveys eXplainable AI (XAI, methods for making AI decisions understandable to humans) in visual recognition, which is increasingly important for safety-critical applications like autonomous driving and medical diagnostics. The survey organizes XAI approaches by intent, object, presentation, and methodology, and also examines how interpretability applies to Multimodal Large Language Models (AI systems that process and combine text, images, and other data types).

IEEE Xplore (Security & AI Journals)
Mar 10, 2026

This research addresses negative transfer, which occurs when an AI model performs worse after applying knowledge from one domain (a labeled dataset) to a different domain (an unlabeled dataset) because the two differ significantly. The study finds that over-reliance on non-causal environmental features (irrelevant details that don't actually cause predictions) creates disagreement across domains, which harms performance. The proposed solution, called RED (Reducing Environmental Disagreement), separates each sample into causal features (the truly relevant information) and non-causal environmental features, then reduces the disagreement between domains based on these environmental features.

Fix: The proposed solution is RED (Reducing Environmental Disagreement), which "disentangles each sample into domain-invariant causal features and domain-specific non-causal environmental features via adversarially training domain-specific environmental feature extractors in the opposite domains. Subsequently, RED estimates and reduces environmental disagreement based on domain-specific non-causal environmental features."

IEEE Xplore (Security & AI Journals)
privacy
Mar 10, 2026

Researchers developed PING (Positive-Incentive Noise Generator), a new method that adds carefully designed noise to protect private data in decentralized learning (where multiple computers train AI models together without sending raw data to a central server) while keeping the learning process efficient. The method uses network connections and lightweight encryption to create correlated noise (noise patterns that work together), and builds on this to create PP-DPIN, an algorithm that combines differential privacy (a mathematical technique for protecting individual data points) and information theory to ensure strong privacy guarantees for at least half the computers involved.

IEEE Xplore (Security & AI Journals)
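
The summary of PING above does not say how its correlated noise is constructed. One common construction, shown here purely as an assumption, is pairwise noise that cancels: every pair of connected nodes shares a random value that one adds and the other subtracts, so each local value is masked but the network-wide sum is unchanged. The names `correlated_noise` and `mask_values` are hypothetical, and the sketch omits the encryption and differential-privacy accounting the summary mentions.

```python
import random

def correlated_noise(nodes, edges, scale=1.0, seed=None):
    """Assign each node noise built from values shared with its neighbors.

    For every edge (a, b), a shared random value is added at a and
    subtracted at b, so the noise terms cancel when summed over all nodes.
    """
    rng = random.Random(seed)
    noise = {node: 0.0 for node in nodes}
    for a, b in edges:
        shared = rng.gauss(0.0, scale)
        noise[a] += shared
        noise[b] -= shared
    return noise

def mask_values(values, noise):
    """Return each node's value with its noise term added."""
    return {node: values[node] + noise[node] for node in values}

if __name__ == "__main__":
    # Hypothetical four-node ring, for illustration only.
    nodes = ["a", "b", "c", "d"]
    edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
    values = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}
    masked = mask_values(values, correlated_noise(nodes, edges, scale=5.0, seed=1))
    print("masked values:", masked)
    print("true sum:", sum(values.values()), "masked sum:", sum(masked.values()))
```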
research
Mar 10, 2026

AI systems receive instructions from multiple sources (system policies, developers, users, and online data), and models must learn to prioritize the most trustworthy ones to stay safe. When models treat untrusted instructions as authoritative, they can be tricked into revealing private information, following harmful requests, or falling victim to prompt injection (malicious instructions hidden in input data). OpenAI's solution uses a clear instruction hierarchy (system > developer > user > tool) and trains models with IH-Challenge, a reinforcement learning dataset designed to teach models to follow high-priority instructions even when lower-priority ones conflict with them.

Fix: OpenAI's models are trained on a clear instruction hierarchy where System instructions have highest priority, followed by developer instructions, then user instructions, then tool outputs. The company also created IH-Challenge, a reinforcement learning training dataset that generates conversations with conflicting instructions where high-priority instructions are kept simple and objectively gradable, ensuring models learn to prioritize correctly without resorting to useless shortcuts like over-refusing benign requests.

OpenAI Blog
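
The priority order in the instruction hierarchy item above (system > developer > user > tool) can be made concrete with a small rule-based sketch. This is only an illustration of the ordering: the source describes training models to respect the hierarchy, not a lookup table, and the `Instruction` type, the `resolve` function, and the example keys are hypothetical.

```python
from dataclasses import dataclass

# Priority order from the summary: system > developer > user > tool.
PRIORITY = {"system": 3, "developer": 2, "user": 1, "tool": 0}

@dataclass(frozen=True)
class Instruction:
    source: str  # one of the keys in PRIORITY
    key: str     # hypothetical name of the behavior being set
    value: str   # requested setting

def resolve(instructions):
    """Return the winning value per key; higher-priority sources win conflicts.

    When two instructions share the same priority, the earlier one is kept.
    """
    winners = {}
    for ins in instructions:
        current = winners.get(ins.key)
        if current is None or PRIORITY[ins.source] > PRIORITY[current.source]:
            winners[ins.key] = ins
    return {key: ins.value for key, ins in winners.items()}

if __name__ == "__main__":
    conversation = [
        Instruction("system", "reveal_system_prompt", "no"),
        Instruction("user", "language", "French"),
        # A tool output that tries to override the system policy.
        Instruction("tool", "reveal_system_prompt", "yes"),
    ]
    print(resolve(conversation))
```

In this toy setup the injected tool instruction loses to the system policy, which is the behavior the hierarchy is meant to produce.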
Mar 9, 2026

Researchers demonstrated a practical differential fault attack (an exploit that deliberately introduces errors into a system to extract secrets) against GEA-1 and GEA-2, the stream ciphers (algorithms that encrypt data bit-by-bit) used to protect GPRS (General Packet Radio Service, a mobile data standard) communications between phones and base stations. By identifying the exact location where faults occur in the cipher, attackers can recover the 64-bit secret keys in about 16 minutes on a standard laptop. Many current phones still support these outdated ciphers, making them vulnerable.

IEEE Xplore (Security & AI Journals)
research
Mar 9, 2026

Multi-LLM collaborative systems (setups where multiple AI models work together on complex tasks) can be attacked through three new methods: Decision Poisoning Attack (injecting false instructions to manipulate system output), Indirect Echoleak Attack (extracting private information through model interactions), and Information Collision Attack (exploiting communication between models). While these collaborative systems offer flexibility and better reasoning, their internal communication channels create security and privacy vulnerabilities that attackers can exploit.

IEEE Xplore (Security & AI Journals)
Mar 9, 2026

Game-theoretic DoS attacks (GDoS, attacks that exploit miners' financial incentives) can damage proof-of-work blockchains (like Bitcoin, which uses computational puzzles to secure transactions) even when attackers control less than 20% of the network's computing power. Rather than changing the blockchain protocol itself, researchers propose a cooperative defense where miners temporarily move their computing resources to larger mining pools during attacks to maintain their earnings and discourage attackers.

Fix: The source proposes a 'cooperative hash-power hopping mechanism in which miners temporarily reallocate hash power to larger pools when under attack to preserve expected payoffs and suppress attacker incentives.' Simulations show this strategy 'reduces attacker revenue gains by more than 20% and prevents throughput degradation across the entire attack range.' However, this is a theoretical proposal presented in a research paper, not an implemented or deployed mitigation in existing systems.

IEEE Xplore (Security & AI Journals)
research
Mar 9, 2026

SeVoAuth is a cloud-based voiceprint authentication system (a security method that recognizes users by their unique voice characteristics) designed to protect user privacy while defending against replay attacks (replaying a recorded voice), spoofing (faking a voice), and adversarial attacks (manipulating input to fool the system). The system stores a synthesized version of a user's voice in the cloud and uses hash functions (mathematical functions that transform data into fixed-size codes) to continuously change the verification targets during each login, making it difficult for attackers to reuse old voice recordings or tricks.

IEEE Xplore (Security & AI Journals)
security
Mar 9, 2026

Phishing is a form of social engineering (tricking people into revealing secrets by pretending to be trustworthy) that lures users to fake websites resembling real ones in order to steal sensitive information. Researchers created a new dataset with 31 attributes (measurable characteristics) derived from URLs and similarity features, then tested multiple machine learning algorithms (computer programs that learn patterns from data) on it to detect these attacks. The Logistic Regression method achieved 96.40% accuracy at detecting phishing, showing that this approach works well for protecting local systems in real-world situations.

IEEE Xplore (Security & AI Journals)
research
Mar 9, 2026

AGFPS is a new attack method that steals system prompts (the hidden instructions that control how an LLM behaves) from deployed AI applications by using evolutionary optimization (a technique that mimics natural selection to find solutions) instead of gradient-based methods. The researchers demonstrated that their approach successfully extracted prompts 95.2% of the time and worked better than previous methods, highlighting serious security weaknesses in how LLMs are currently deployed.

IEEE Xplore (Security & AI Journals)
research
Mar 9, 2026

Researchers studied how well different versions of major LLMs (like GPT, Llama, and Qwen) resist adversarial attacks, which are inputs designed to trick AI systems into making mistakes, ignoring safety guidelines, or producing false information. They found that newer versions of these models don't always become more resistant to these attacks, and that simply making models larger doesn't guarantee better security.

IEEE Xplore (Security & AI Journals)
research
Mar 9, 2026

Researchers developed a new attack called Distribution Drift Learner (DDL) that can break through non-transferable learning (NTL, a method that prevents AI models from being adapted to new tasks to protect their intellectual property) by only observing the model's input and output responses. The attack works by manipulating how data is distributed across domains and reconstructing training samples. It raised accuracy on protected models from 10% to 81%, exposing serious weaknesses in current model protection strategies.

IEEE Xplore (Security & AI Journals)
research
Mar 9, 2026

Cybersecurity uses deception (deliberately creating fake systems or false information to trick attackers) alongside defense and detection, and generative AI makes it easier to create convincing decoys. However, there are currently no well-established methods to measure how well these deception tactics actually work.

IEEE Xplore (Security & AI Journals)
Mar 9, 2026

This research addresses a problem in graph matching (a technique for finding correspondences between similar structures), where training data often contains incomplete or incorrect information. The authors propose a dual-expert framework that uses two different mathematical approaches (KB-QAP and L-QAP, which are optimization methods for assignment problems) working together through an align-fuse-refine pipeline to handle both missing keypoints from partial views and errors from mislabeled data.

IEEE Xplore (Security & AI Journals)
Mar 6, 2026

A 2025 survey of 704 IT executives found that AI is now the top concern for IT management, ahead of cybersecurity and aligning IT with business goals. While most organizations are increasing IT salaries (90.5%), fewer are hiring new IT staff (54.2%), and cost control has dropped as a priority for measuring how well IT leaders perform.

AIS eLibrary (Journal of AIS, CAIS, etc.)
safety
Mar 6, 2026

A banking group implemented a retrieval-augmented AI-powered compliance assistant (a system where AI pulls in external compliance documents to answer questions) to help with regulatory requirements while maintaining human oversight. The article identifies key challenges with this approach, including authority illusion (over-trusting the AI's answers), unclear responsibility for decisions, loss of human judgment about context, and gaps in understanding how the system works, then proposes a four-phase framework to help organizations move from passive AI assistants toward systems where AI and humans reason together.

AIS eLibrary (Journal of AIS, CAIS, etc.)