aisecwatch.com

Real-time AI security monitoring. Tracking AI-related vulnerabilities, safety and security incidents, privacy risks, research developments, and policy changes.

Maintained by

Truong (Jack) Luu

Information Systems Researcher

Research

Academic papers, new techniques, benchmarks, and theoretical findings in AI/LLM security.

690 items

The Triple Calculus Model: The Case of Location Privacy in Smartphones

info | research | Peer-Reviewed
privacy
Mar 30, 2026

This research paper examines how smartphone users develop privacy concerns about location tracking through a 'triple calculus model' (a framework showing how people weigh risks and benefits of sharing location data). By studying 559 smartphone users, researchers found that users' sense of control over location sharing significantly influenced how they perceived both the risks and benefits of location disclosure, and that social influences and past experiences with privacy breaches also shaped their privacy concerns.

AIS eLibrary (Journal of AIS, CAIS, etc.)

Actual Self-disclosure to Anthropomorphic AI Chatbots: A Contextual Privacy Calculus Approach

info | research | Peer-Reviewed
research

Differentially Private Zeroth-Order Methods for Scalable Large Language Model Fine-Tuning

info | research | Peer-Reviewed
research

PromptGuard: Soft Prompt-Guided Unsafe Content Moderation for Text-to-Image Models

info | research | Peer-Reviewed
safety

Rethinking Frequency Modeling: Tail-Aware Dynamic Adversarial Training for Long-Tailed Robustness

info | research | Peer-Reviewed
research

RanDS: A Large-Scale Open Dataset of Raw Binaries and Extracted Features for Ransomware Research

info | research | Peer-Reviewed
research

One Trigger, Multiple Victims: Clean-Label Neighborhood Backdoor Attacks on Graph Neural Networks

info | research | Peer-Reviewed
security

GDetox: Purifying Backdoor Encoder in Graph Self-Supervised Learning via Knowledge Distillation

info | research | Peer-Reviewed
security

Component-Specific Prompt Tuning for Deepfake Detection

info | research | Peer-Reviewed
research

Software Security Is Your Highest Priority. Do Your Developers Know That?

info | research | Peer-Reviewed
security

PATD: Privacy-Preserving Auditing and Transparent Deduplication in UAV Cloud Storage

info | research | Peer-Reviewed
security

EIP: Efficient image protection scheme

info | research | Peer-Reviewed
security

PadNet: Defending Neural Networks Against Adversarial Examples

info | research | Peer-Reviewed
security

Filter, Obstruct, and Dilute: Defending Against Backdoor Attacks on Semi-Supervised Learning

info | research | Peer-Reviewed
security

Privacy-Preserving Multi-Modal Object Fusion for Connected Autonomous Vehicles: Resilience Against Malicious Third-Party Attacks

info | research | Peer-Reviewed
security

Propose and Rectify: A Forensics-Driven MLLM Framework for Image Manipulation Localization

info | research | Peer-Reviewed
research

Assessing and Improving DNN Robustness Against Adversarial Examples From the Perspective of Fully Connected Layers

info | research | Peer-Reviewed
research

SRAP: Robust and Transferable Self-Reversible Adversarial Patch for Image Privacy Protection

info | research | Peer-Reviewed
research

CLIP-ADA: CLIP-Guided Artifact-Invariant Generalizable Synthetic Image Detection

info | research | Peer-Reviewed
research

An efficient hierarchical secret sharing for privacy-preserving distributed gradient descent algorithm

info | research | Peer-Reviewed
security

Actual Self-disclosure to Anthropomorphic AI Chatbots: A Contextual Privacy Calculus Approach

privacy
Mar 30, 2026

This research studies how making AI chatbots seem more human-like (anthropomorphism) affects whether people actually share personal information with them. The study found that while human-like design can build trust and reduce worry about privacy, it can also create an "uncanny valley" effect (where something looks almost human but feels unsettling), and people's actual sharing behavior doesn't always follow what they say they intend to do.

AIS eLibrary (Journal of AIS, CAIS, etc.)

Differentially Private Zeroth-Order Methods for Scalable Large Language Model Fine-Tuning

privacy
Mar 30, 2026

This research proposes new methods for fine-tuning (customizing a trained AI model for specific tasks) large language models while protecting sensitive data using differential privacy (a technique that adds calibrated noise during computation so that results do not reveal any individual's data). The paper introduces DP-ZOSO and DP-ZOPO, which use zeroth-order gradient approximation (estimating gradients from loss values alone rather than computing them exactly) instead of traditional methods, making the process faster and more scalable while maintaining privacy protection.

IEEE Xplore (Security & AI Journals)
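
As a rough illustration of the two ideas named in the summary above (zeroth-order gradient approximation plus differential-privacy noise), the sketch below estimates a loss slope from two loss evaluations, clips it, and adds Gaussian noise before updating. It is a generic two-point estimator, not the paper's DP-ZOSO or DP-ZOPO algorithm. The function names, the toy quadratic loss, and the hyperparameters (`mu`, `clip`, `sigma`, `lr`) are all assumptions made for this example, and no privacy accounting is performed. A real differentially private method would clip per training example; this toy clips a single loss for brevity.

```python
import random


def zo_directional_grad(loss, params, direction, mu):
    """Two-point finite-difference estimate of the loss slope along `direction`."""
    plus = [p + mu * d for p, d in zip(params, direction)]
    minus = [p - mu * d for p, d in zip(params, direction)]
    return (loss(plus) - loss(minus)) / (2.0 * mu)


def dp_zo_step(loss, params, rng, mu=1e-3, clip=1.0, sigma=0.5, lr=0.05):
    """One noisy zeroth-order update (illustrative only; no privacy accounting)."""
    # Sample a random search direction instead of backpropagating.
    direction = [rng.gauss(0.0, 1.0) for _ in params]
    slope = zo_directional_grad(loss, params, direction, mu)
    # Clip the scalar estimate so a single step has bounded influence.
    slope = max(-clip, min(clip, slope))
    # Add Gaussian noise scaled to the clipping bound.
    noisy_slope = slope + rng.gauss(0.0, sigma * clip)
    return [p - lr * noisy_slope * d for p, d in zip(params, direction)]


def toy_loss(params):
    """Hypothetical stand-in for a model loss: a simple quadratic bowl."""
    return sum(p * p for p in params)
```

Because only a single scalar per step is clipped and noised, the noise does not grow with the number of parameters, which is one commonly cited reason zeroth-order methods are attractive for private fine-tuning of large models.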

PromptGuard: Soft Prompt-Guided Unsafe Content Moderation for Text-to-Image Models

research
Mar 30, 2026

Text-to-image models (AI systems that generate pictures from written descriptions) can be misused to create unsafe content like sexually explicit or violent images. PromptGuard is a new safety technique that uses a soft prompt (a special text input optimized for safety that works within the model's internal text processing layer) to moderate unsafe requests and prevent the generation of such content while still producing high-quality normal images.

Fix: The source describes PromptGuard as the solution itself rather than a patch or update. The technique works by optimizing a safety soft prompt that functions as an implicit system prompt within the text-to-image model's embedding space, with a divide-and-conquer strategy that optimizes category-specific soft prompts and combines them into holistic safety guidance. Code and dataset are available at https://t2i-promptguard.github.io/

IEEE Xplore (Security & AI Journals)

Rethinking Frequency Modeling: Tail-Aware Dynamic Adversarial Training for Long-Tailed Robustness

safety
Mar 30, 2026

This research addresses a problem where adversarial training (a method to make AI models resistant to adversarial attacks, which are carefully crafted inputs designed to fool the model) works poorly when training data is imbalanced, meaning some classes have many examples while others have very few. The authors propose Tail-Aware Dynamic Adversarial Training (TAD-AT), which improves robustness by adjusting the training loss, attack strategy, and weight averaging to account for which classes are most vulnerable to attacks, rather than just how many examples exist per class.

Fix: The proposed mitigation is Tail-Aware Dynamic Adversarial Training (TAD-AT), which consists of three components: (1) a training loss that incorporates frequency- and accuracy-aware regularization to emphasize learning for vulnerable classes, (2) an attack that adjusts perturbations based on class-wise vulnerability to encourage robust feature learning, and (3) a weight average that adaptively controls the decay rate across classes to improve robust generalization and training stability. Code is available at https://github.com/bookman233/TADAT.

IEEE Xplore (Security & AI Journals)

RanDS: A Large-Scale Open Dataset of Raw Binaries and Extracted Features for Ransomware Research

Mar 28, 2026

RanDS is a new large-scale dataset containing raw binary files (the compiled machine code of programs) and extracted features designed to help researchers study and detect ransomware (malicious software that encrypts victims' files and demands payment). This resource aims to support the development and testing of machine learning models that can identify ransomware threats more effectively.

Elsevier Security Journals

One Trigger, Multiple Victims: Clean-Label Neighborhood Backdoor Attacks on Graph Neural Networks

research
Mar 27, 2026

Researchers present a new backdoor attack (an attack that plants a hidden trigger during training so the model misbehaves whenever the trigger appears) on Graph Neural Networks, or GNNs (AI models designed to understand interconnected data). The attack attaches a single trigger node (a specially crafted fake data point) to a target node to trick the GNN into making wrong predictions not just on that node but also on its immediate neighbors, while remaining stealthy and achieving success rates above 95% even against existing defenses.

IEEE Xplore (Security & AI Journals)

GDetox: Purifying Backdoor Encoder in Graph Self-Supervised Learning via Knowledge Distillation

research
Mar 26, 2026

Graph Neural Networks (GNNs, AI systems designed to work with interconnected data structured as graphs) used in graph self-supervised learning (training without labeled data) can be secretly compromised by backdoor attacks (where hidden malicious behavior is embedded in the model). Researchers developed GDetox, a defense method that removes these backdoor features from compromised encoders (the parts of the model that learn to represent data) using knowledge distillation (a technique where a student model is trained to reproduce what a teacher model has learned), reducing the attack success rate to 4% while keeping the model's normal performance nearly unchanged.

Fix: GDetox purifies backdoored encoders in graph self-supervised learning by applying self-supervised distillation without requiring labeled data, combined with adversarial contrastive learning (a training method that improves model robustness by creating challenging examples) to enhance the teacher model and improve the final encoder performance.

IEEE Xplore (Security & AI Journals)

Component-Specific Prompt Tuning for Deepfake Detection

Mar 26, 2026

Deepfake technology can create fake facial images that are hard to distinguish from real ones, posing risks to privacy and security. This paper proposes a new detection method using Visual Language Models (VLMs, AI systems that understand both images and text) combined with component-specific prompt tuning (customizing input instructions to focus on specific facial parts like eyes and nose). The approach transforms deepfake detection into a Visual Question Answering task and uses a Q-Former module (a feature extraction component guided by instructions) to help the model identify forgery traces in local facial features, achieving better accuracy than existing methods.

IEEE Xplore (Security & AI Journals)

Software Security Is Your Highest Priority. Do Your Developers Know That?

Mar 26, 2026

Many software organizations claim to make security a priority, but they don't actually provide developers with the tools, training, or culture needed to build secure code. A global survey found significant gaps between what companies say about security and what they actually do to support developers in writing secure software.

IEEE Xplore (Security & AI Journals)

PATD: Privacy-Preserving Auditing and Transparent Deduplication in UAV Cloud Storage

Mar 26, 2026

This research paper addresses security and transparency challenges in cloud storage for UAV (unmanned aerial vehicle) data by proposing PATD, a system that combines privacy-preserving auditing with transparent deduplication. The paper identifies two main problems: verifying that outsourced data hasn't been corrupted or tampered with (without revealing the data itself), and ensuring that file deduplication (removing duplicate copies to save storage) is performed honestly and transparently by the cloud provider.

Elsevier Security Journals
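
To make the two notions in the PATD summary above concrete, the sketch below shows plain content-hash deduplication with a naive integrity check. It is not PATD's protocol: a privacy-preserving audit verifies integrity without revealing the data, whereas this toy re-hashes the stored bytes directly. The `DedupStore` class and its method names are hypothetical, introduced only for illustration.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical blobs are kept only once."""

    def __init__(self):
        self._blobs = {}  # SHA-256 hex digest -> stored bytes
        self._names = {}  # file name -> digest of its content

    def put(self, name, data):
        """Store `data` under `name`; return True if the content was a duplicate."""
        digest = hashlib.sha256(data).hexdigest()
        duplicate = digest in self._blobs
        if not duplicate:
            self._blobs[digest] = data
        self._names[name] = digest
        return duplicate

    def audit(self, name):
        """Re-hash the stored blob and compare it with the recorded digest."""
        digest = self._names[name]
        return hashlib.sha256(self._blobs[digest]).hexdigest() == digest

    def stored_blobs(self):
        """Number of distinct blobs physically kept."""
        return len(self._blobs)
```

In this sketch the client has to trust the return value of `put`; the transparency problem described in the summary is precisely that a cloud provider's deduplication claims need to be checkable rather than taken on faith.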

EIP: Efficient image protection scheme

Mar 25, 2026

This is a research paper proposing EIP, an efficient image protection scheme designed to safeguard images from unauthorized access or tampering. The paper was published in June 2026 in the Journal of Information Security and Applications by Haider, Sattar, Komninos, and Hayat. However, the provided content does not include details about how the scheme works or what specific security problem it addresses.

Elsevier Security Journals

PadNet: Defending Neural Networks Against Adversarial Examples

research
Mar 25, 2026

PadNet is a defense method designed to protect neural networks (AI models that learn patterns from data) against adversarial examples (specially crafted inputs that trick AI systems into making wrong predictions). The paper, published in an academic journal, presents techniques to make these AI systems more robust when facing such attacks.

ACM Digital Library (TOPS, DTRAP, CSUR)
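
For readers unfamiliar with adversarial examples, the sketch below crafts one against a tiny logistic-regression model using the fast gradient sign method (FGSM), a standard textbook attack. It illustrates the threat that defenses such as PadNet target; it does not show PadNet itself, whose mechanism the summary above does not describe. The toy weights, input, and `epsilon` are assumptions chosen for the example.

```python
import math


def predict(weights, bias, x):
    """Probability of class 1 under a logistic-regression model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def sign(value):
    """Return -1, 0, or 1 according to the sign of `value`."""
    return (value > 0) - (value < 0)


def fgsm(weights, bias, x, label, epsilon):
    """Shift every feature by epsilon in the direction that increases the loss."""
    p = predict(weights, bias, x)
    # For binary cross-entropy, d(loss)/d(x_i) = (p - label) * w_i.
    grad = [(p - label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


WEIGHTS, BIAS = [2.0, -1.0], 0.0  # hypothetical toy model
clean = [0.3, 0.1]                # classified as class 1 by the toy model
adversarial = fgsm(WEIGHTS, BIAS, clean, label=1, epsilon=0.3)
```

With these toy values the perturbation is bounded by `epsilon` per feature, yet it moves the model's prediction across the 0.5 decision threshold.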

Filter, Obstruct, and Dilute: Defending Against Backdoor Attacks on Semi-Supervised Learning

research
Mar 25, 2026

Semi-supervised learning (SSL, a training method where models learn from both labeled and unlabeled data) is vulnerable to backdoor attacks, where attackers can corrupt model predictions by poisoning a small portion of training data with hidden triggers. This paper reveals that SSL backdoor attacks are particularly dangerous because they exploit the pseudo-labeling mechanism (the process where the model assigns labels to unlabeled data) to create stronger trigger-target correlations than in supervised learning. The researchers propose Backdoor Invalidator (BI), a defense framework using complementary learning, trigger mix-up, and dual domain filtering to obstruct and filter backdoor influences during both feature learning and data processing.

Fix: The source presents Backdoor Invalidator (BI) as an explicit defense framework. According to the text, BI 'integrates three novel techniques: complementary learning, trigger mix-up, and dual domain filtering, which collectively obstruct, dilute, and filter the influence of backdoor attacks in both feature learning and data processing.' The framework is designed to 'significantly reduce the average attack success rate while maintaining comparable accuracy on clean data' and is described as 'practical deployable as a plug-in component.' Code implementing this defense is available at https://github.com/wxr99/Backdoor_Invalidator4SSL.

IEEE Xplore (Security & AI Journals)
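
Privacy-Preserving Multi-Modal Object Fusion for Connected Autonomous Vehicles: Resilience Against Malicious Third-Party Attacks

research
Mar 25, 2026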

Connected autonomous vehicles (CAVs) use multiple types of sensors, like LiDAR (light-based radar that creates 3D maps) and cameras, to understand their surroundings, and combining information from both sensors improves accuracy. However, this sensor fusion process can leak private information and relies on a third party to generate random numbers, which could be compromised by attackers. Researchers propose MPOF, a model that uses secure computation protocols (mathematical methods that let systems calculate results without exposing raw data) and sacrificial verification (a technique that detects when a third party behaves maliciously) to protect privacy while defending against attacks from that third party.

Fix: The source proposes the MPOF model with secure computation protocols that include sacrificial verification to detect malicious third-party behavior during random number generation. The paper states the protocols 'reduce computational overhead by five orders of magnitude' compared to methods using homomorphic encryption (encryption that allows calculations on encrypted data without decrypting it first), making the approach more practical for resource-constrained vehicles.

IEEE Xplore (Security & AI Journals)

Propose and Rectify: A Forensics-Driven MLLM Framework for Image Manipulation Localization

Mar 25, 2026

This research presents a new framework called Propose-Rectify that helps detect and locate image manipulations (alterations made to photos) by combining two approaches: first, a semantic reasoning stage uses a modified LLaVA model (a multimodal AI that understands both images and language) to identify suspicious regions, and second, a refinement stage uses specialized forensic analysis (technical methods that detect tampering traces) to validate and precisely locate the manipulated areas. The framework bridges the gap between AI understanding and forensic detection, achieving better accuracy than previous methods.

IEEE Xplore (Security & AI Journals)
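
Assessing and Improving DNN Robustness Against Adversarial Examples From the Perspective of Fully Connected Layers

security
Mar 25, 2026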

Deep neural networks (machine learning models with many layers that process information) are vulnerable to adversarial examples, which are inputs slightly modified to fool the AI into making wrong predictions. This paper proposes adding a redundant fully connected layer (a type of neural network component that connects all inputs to all outputs) with a special loss function to make these networks more robust against attacks while maintaining accuracy on normal inputs.

Fix: The source presents no deployed fix or patch. It proposes a research defense: a redundant fully connected layer, trained with a cosine similarity-based loss function, that can be added to existing models.

IEEE Xplore (Security & AI Journals)

SRAP: Robust and Transferable Self-Reversible Adversarial Patch for Image Privacy Protection

security
Mar 23, 2026

Researchers developed SRAP (Self-Reversible Adversarial Patch), a technique that creates adversarial patches (small, intentionally corrupted image regions designed to fool AI models) that protect privacy and can later be reversed to recover the original image. The method addresses two key weaknesses of existing adversarial patches: transferability (working across different AI models, with success rates of up to 90%) and robustness (resisting image processing and defensive techniques). It also demonstrates an 88% attack success rate against commercial AI services.

IEEE Xplore (Security & AI Journals)

CLIP-ADA: CLIP-Guided Artifact-Invariant Generalizable Synthetic Image Detection

Mar 23, 2026

This research paper presents CLIP-ADA, a method for detecting synthetic images (fake images created by AI generators) that works better across different types of generators and artifacts. The method analyzes how CLIP (a vision-language model that understands both images and text) processes images at different levels, then uses this understanding to train detectors that rely less on specific artifact patterns and more on general forensic features, achieving over 6% better accuracy on unseen synthetic images.

IEEE Xplore (Security & AI Journals)
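
An efficient hierarchical secret sharing for privacy-preserving distributed gradient descent algorithm

privacy
Mar 22, 2026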

This research paper describes a method for protecting privacy in distributed gradient descent (a technique where multiple computers work together to train AI models by each processing part of the data). The authors propose using hierarchical secret sharing (a cryptographic approach where information is split into pieces distributed across multiple parties, so no single party can see the complete data) to keep individual data private while still allowing the AI training process to work efficiently.

Elsevier Security Journals
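
The core idea of secret sharing for gradient aggregation can be sketched with the simplest variant, flat additive sharing over a prime field. This is a swapped-in illustration, not the hierarchical scheme the paper proposes. The modulus, the fixed-point scale, and all function names are assumptions made for the example, and everything runs in one process here, whereas in a real deployment each share would be held by a different party.

```python
import secrets

PRIME = 2**61 - 1  # field modulus (an assumption chosen for this example)
SCALE = 10**6      # fixed-point scale used to encode floats as integers


def share(value, n_parties):
    """Split an integer into n additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def encode(x):
    """Encode a float as a fixed-point field element."""
    return round(x * SCALE) % PRIME


def decode(v):
    """Decode a field element, mapping the upper half of the field to negatives."""
    if v > PRIME // 2:
        v -= PRIME
    return v / SCALE


def secure_sum(gradients, n_parties=3):
    """Simulate aggregating one scalar gradient per worker through shares."""
    per_worker = [share(encode(g), n_parties) for g in gradients]
    # Each simulated party adds up the shares it holds, one from every worker.
    party_totals = [sum(column) % PRIME for column in zip(*per_worker)]
    # Only the combined total is reconstructed, never an individual gradient.
    return decode(sum(party_totals) % PRIME)
```

Any single share is a uniformly random field element, so a party holding fewer than all shares of a value learns nothing about it; only the sum across workers is revealed.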