Academic papers, new techniques, benchmarks, and theoretical findings in AI/LLM security.
This research paper examines how smartphone users develop privacy concerns about location tracking through a 'triple calculus model' (a framework showing how people weigh risks and benefits of sharing location data). By studying 559 smartphone users, researchers found that users' sense of control over location sharing significantly influenced how they perceived both the risks and benefits of location disclosure, and that social influences and past experiences with privacy breaches also shaped their privacy concerns.
This research studies how making AI chatbots seem more human-like (anthropomorphism) affects whether people actually share personal information with them. The study found that while human-like design can build trust and reduce worry about privacy, it can also create an "uncanny valley" effect (where something looks almost human but feels unsettling), and people's actual sharing behavior doesn't always follow what they say they intend to do.
This research proposes new methods for fine-tuning large language models (customizing a trained AI model for specific tasks) while protecting sensitive data using differential privacy (a technique that adds calibrated random noise during training so the resulting model reveals little about any individual record). The paper introduces DP-ZOSO and DP-ZOPO, which use zeroth-order gradient approximation (estimating the update direction from loss values at slightly perturbed parameters, without computing exact gradients) instead of traditional methods, making the process faster and more scalable while maintaining privacy protection.
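The idea of a privatized zeroth-order update can be sketched as follows. This is a minimal illustration of a generic two-point estimator with clipping and Gaussian noise, not the DP-ZOSO or DP-ZOPO algorithms themselves; the function names, the hyperparameters (`mu`, `lr`, `clip`, `sigma`), and the choice to clip a single scalar are all assumptions, and no privacy accounting is performed.

```python
import random

def zo_directional_estimate(loss, theta, u, mu=1e-3):
    """Two-point estimate of the derivative of `loss` along direction `u`."""
    plus = [t + mu * ui for t, ui in zip(theta, u)]
    minus = [t - mu * ui for t, ui in zip(theta, u)]
    return (loss(plus) - loss(minus)) / (2.0 * mu)

def privatize(scalar, clip, sigma, rng):
    """Clip the scalar to [-clip, clip], then add Gaussian noise scaled to clip."""
    clipped = max(-clip, min(clip, scalar))
    return clipped + rng.gauss(0.0, sigma * clip)

def zo_private_step(loss, theta, rng, lr=0.05, clip=1.0, sigma=0.5, mu=1e-3):
    """One noisy zeroth-order update along a random Gaussian direction.

    Illustrative only: a real DP method clips per example and tracks
    the cumulative privacy budget, both of which are omitted here.
    """
    u = [rng.gauss(0.0, 1.0) for _ in theta]
    d = privatize(zo_directional_estimate(loss, theta, u, mu), clip, sigma, rng)
    return [t - lr * d * ui for t, ui in zip(theta, u)]

# Hypothetical usage on a toy quadratic loss.
quadratic = lambda x: sum(v * v for v in x)
theta = zo_private_step(quadratic, [1.0, 2.0], random.Random(0))
```

Only two loss evaluations are needed per step and no backpropagation, which is why zeroth-order methods are attractive for large models; the noise is added to one scalar rather than to a full gradient vector.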
Text-to-image models (AI systems that generate pictures from written descriptions) can be misused to create unsafe content like sexually explicit or violent images. PromptGuard is a new safety technique that uses a soft prompt (a learned sequence of embedding vectors, rather than readable text, that is optimized for safety and applied at the model's text-embedding layer) to moderate unsafe requests and prevent the generation of such content while still producing high-quality normal images.
Fix: PromptGuard is itself the proposed mitigation rather than a patch or update. The technique works by optimizing a safety soft prompt that functions as an implicit system prompt within the text-to-image model's embedding space, with a divide-and-conquer strategy that optimizes category-specific soft prompts and combines them into holistic safety guidance. Code and dataset are available at https://t2i-promptguard.github.io/
IEEE Xplore (Security & AI Journals)
This research addresses a problem where adversarial training (a method to make AI models resistant to adversarial attacks, which are carefully crafted inputs designed to fool the model) works poorly when training data is imbalanced, meaning some classes have many examples while others have very few. The authors propose Tail-Aware Dynamic Adversarial Training (TAD-AT), which improves robustness by adjusting the training loss, attack strategy, and weight averaging to account for which classes are most vulnerable to attacks, rather than just how many examples exist per class.
Fix: The proposed mitigation is Tail-Aware Dynamic Adversarial Training (TAD-AT), which consists of three components: (1) a training loss that incorporates frequency- and accuracy-aware regularization to emphasize learning for vulnerable classes, (2) an attack that adjusts perturbations based on class-wise vulnerability to encourage robust feature learning, and (3) a weight average that adaptively controls the decay rate across classes to improve robust generalization and training stability. Code is available at https://github.com/bookman233/TADAT.
IEEE Xplore (Security & AI Journals)
RanDS is a new large-scale dataset containing raw binary files (the compiled machine code of programs) and extracted features designed to help researchers study and detect ransomware (malicious software that encrypts victims' files and demands payment). This resource aims to support the development and testing of machine learning models that can identify ransomware threats more effectively.
Researchers discovered a new backdoor attack (an attack in which poisoned training data plants a hidden trigger, so the model misbehaves only when that trigger is present) on Graph Neural Networks, or GNNs (AI models designed to understand interconnected data). The attack uses a single trigger node (a specially crafted fake data point) attached to a target node to trick the GNN into making wrong predictions not just on that node, but also on its immediate neighbors, while remaining stealthy and achieving over 95% success rates even against existing defenses.
Graph Neural Networks (GNNs, AI systems designed to work with interconnected data structured as graphs) used in graph self-supervised learning (training without labeled data) can be secretly compromised by backdoor attacks (where hidden malicious behavior is embedded in the model). Researchers developed GDetox, a defense method that removes these backdoor features from compromised encoders (the parts of the model that learn to represent data) using knowledge distillation (a technique where a student model is trained to reproduce the outputs of a teacher model), reducing the attack success rate to 4% while keeping the model's normal performance nearly unchanged.
Fix: GDetox purifies backdoored encoders in graph self-supervised learning by applying self-supervised distillation without requiring labeled data, combined with adversarial contrastive learning (a training method that improves model robustness by creating challenging examples) to enhance the teacher model and improve the final encoder performance.
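A label-free distillation objective of the kind described above can be sketched as follows. The source does not give GDetox's actual loss, so the cosine-distance form, the function names, and the pairing of teacher and student embeddings per node are assumptions made for illustration only.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def distillation_loss(teacher_embs, student_embs):
    """Mean (1 - cosine similarity) over paired embeddings.

    Assumed objective: the student is pushed to match the teacher's
    representation of each input, and no class labels are involved.
    """
    pairs = list(zip(teacher_embs, student_embs))
    return sum(1.0 - cosine(t, s) for t, s in pairs) / len(pairs)

# Hypothetical usage: two nodes, 2-dimensional embeddings.
loss = distillation_loss([[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [1.0, 0.0]])
```

The loss is zero when every student embedding points in the same direction as its teacher counterpart, regardless of scale, and grows as the two representations diverge.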
IEEE Xplore (Security & AI Journals)
Deepfake technology can create fake facial images that are hard to distinguish from real ones, posing risks to privacy and security. This paper proposes a new detection method using Visual Language Models (VLMs, AI systems that understand both images and text) combined with component-specific prompt tuning (customizing input instructions to focus on specific facial parts like eyes and nose). The approach transforms deepfake detection into a Visual Question Answering task and uses a Q-Former module (a feature extraction component guided by instructions) to help the model identify forgery traces in local facial features, achieving better accuracy than existing methods.
Many software organizations claim to make security a priority, but they don't actually provide developers with the tools, training, or culture needed to build secure code. A global survey found significant gaps between what companies say about security and what they actually do to support developers in writing secure software.
This research paper addresses security and transparency challenges in cloud storage for UAV (unmanned aerial vehicle) data by proposing PATD, a system that combines privacy-preserving auditing with transparent deduplication. The paper identifies two main problems: verifying that outsourced data hasn't been corrupted or tampered with (without revealing the data itself), and ensuring that file deduplication (removing duplicate copies to save storage) is performed honestly and transparently by the cloud provider.
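The deduplication half of the problem can be illustrated with a toy content-addressed store. This is a minimal sketch, not the PATD protocol: the `DedupStore` class and its methods are hypothetical, and the integrity check shown here re-hashes the plaintext, so it is not privacy-preserving in the way the paper's auditing is described.

```python
import hashlib

class DedupStore:
    """Toy content-addressed store: identical files are kept only once."""

    def __init__(self):
        self.blobs = {}    # digest -> file bytes
        self.owners = {}   # digest -> set of user ids

    def upload(self, user, data):
        """Store `data` for `user`; report whether it was a duplicate."""
        digest = hashlib.sha256(data).hexdigest()
        is_duplicate = digest in self.blobs
        if not is_duplicate:
            self.blobs[digest] = data
        self.owners.setdefault(digest, set()).add(user)
        return digest, is_duplicate

    def owner_count(self, digest):
        """Number of users sharing one stored copy (what a provider could misreport)."""
        return len(self.owners.get(digest, ()))

    def verify(self, digest):
        """Naive integrity check: re-hash the stored bytes and compare."""
        return hashlib.sha256(self.blobs[digest]).hexdigest() == digest

# Hypothetical usage: two UAVs upload the same frame.
store = DedupStore()
digest, first_dup = store.upload("uav-1", b"frame")
_, second_dup = store.upload("uav-2", b"frame")
```

In this toy version the client must simply trust the returned duplicate flag and owner count; the transparency problem the paper identifies is precisely that an untrusted provider could lie about both.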
This is a research paper proposing EIP, an efficient image protection scheme designed to safeguard images from unauthorized access or tampering. The paper was published in June 2026 in the Journal of Information Security and Applications by Haider, Sattar, Komninos, and Hayat. However, the provided content does not include details about how the scheme works or what specific security problem it addresses.
PadNet is a defense method designed to protect neural networks (AI models that learn patterns from data) against adversarial examples (specially crafted inputs that trick AI systems into making wrong predictions). The paper, published in an academic journal, presents techniques to make these AI systems more robust when facing such attacks.
Semi-supervised learning (SSL, a training method where models learn from both labeled and unlabeled data) is vulnerable to backdoor attacks, where attackers can corrupt model predictions by poisoning a small portion of training data with hidden triggers. This paper reveals that SSL backdoor attacks are particularly dangerous because they exploit the pseudo-labeling mechanism (the process where the model assigns labels to unlabeled data) to create stronger trigger-target correlations than in supervised learning. The researchers propose Backdoor Invalidator (BI), a defense framework using complementary learning, trigger mix-up, and dual domain filtering to obstruct and filter backdoor influences during both feature learning and data processing.
Fix: The source presents Backdoor Invalidator (BI) as an explicit defense framework. According to the text, BI 'integrates three novel techniques: complementary learning, trigger mix-up, and dual domain filtering, which collectively obstruct, dilute, and filter the influence of backdoor attacks in both feature learning and data processing.' The framework is designed to 'significantly reduce the average attack success rate while maintaining comparable accuracy on clean data' and is described as 'practical deployable as a plug-in component.' Code implementing this defense is available at https://github.com/wxr99/Backdoor_Invalidator4SSL.
IEEE Xplore (Security & AI Journals)
Connected autonomous vehicles (CAVs) use multiple types of sensors, like LiDAR (a laser-based ranging sensor that builds 3D maps of the surroundings) and cameras, to understand their surroundings, and combining information from both sensors improves accuracy. However, this sensor fusion process can leak private information and relies on a third party to generate random numbers, which could be compromised by attackers. Researchers propose MPOF, a model that uses secure computation protocols (mathematical methods that let systems calculate results without exposing raw data) and sacrificial verification (a technique that detects when a third party behaves maliciously) to protect privacy while defending against attacks from that third party.
Fix: The source proposes the MPOF model with secure computation protocols that include sacrificial verification to detect malicious third-party behavior during random number generation. The paper states the protocols 'reduce computational overhead by five orders of magnitude' compared to methods using homomorphic encryption (encryption that allows calculations on encrypted data without decrypting it first), making the approach more practical for resource-constrained vehicles.
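One well-known form of sacrificial verification from the secure-computation literature checks a multiplication triple supplied by a third party by consuming ("sacrificing") a second triple. The sketch below shows that check on plaintext values; whether MPOF uses this exact check is an assumption, real protocols run it on secret shares rather than on values in the clear, and the modulus `P` and function names are illustrative.

```python
import random

P = 2**61 - 1  # prime modulus; an illustrative choice

def make_triple(rng, cheat=False):
    """Return (a, b, c) with c = a*b mod P, or a corrupted c when cheat=True."""
    a, b = rng.randrange(P), rng.randrange(P)
    c = (a * b + (1 if cheat else 0)) % P
    return a, b, c

def sacrifice_check(triple, spare, t):
    """Check `triple` by consuming `spare` under a random challenge `t`.

    For two correct triples the expression below is always 0 mod P.
    If exactly one of them is wrong, it is non-zero for every t != 0.
    """
    a, b, c = triple
    f, g, h = spare
    rho = (t * a - f) % P
    sigma = (b - g) % P
    return (t * c - h - sigma * f - rho * g - sigma * rho) % P == 0

# Hypothetical usage: the third party supplies triples, the parties pick t.
rng = random.Random(1)
t = rng.randrange(1, P)
honest_ok = sacrifice_check(make_triple(rng), make_triple(rng), t)
cheat_ok = sacrifice_check(make_triple(rng, cheat=True), make_triple(rng), t)
```

The spare triple is discarded after the check, which is the "sacrifice": half of the supplied randomness is spent purely on detecting a misbehaving supplier.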
IEEE Xplore (Security & AI Journals)
This research presents a new framework called Propose-Rectify that helps detect and locate image manipulations (alterations made to photos) by combining two approaches: first, a semantic reasoning stage uses a modified LLaVA model (a multimodal AI that understands both images and language) to identify suspicious regions, and second, a refinement stage uses specialized forensic analysis (technical methods that detect tampering traces) to validate and precisely locate the manipulated areas. The framework bridges the gap between AI understanding and forensic detection, achieving better accuracy than previous methods.
Deep neural networks (machine learning models with many layers that process information) are vulnerable to adversarial examples, which are inputs slightly modified to fool the AI into making wrong predictions. This paper proposes adding a redundant fully connected layer (a type of neural network component that connects all inputs to all outputs) with a special loss function to make these networks more robust against attacks while maintaining accuracy on normal inputs.
Fix: The source proposes a defense rather than a deployed fix or patch: a redundant fully connected layer with a cosine similarity-based loss function that can be added to existing models.
IEEE Xplore (Security & AI Journals)
Researchers developed SRAP (Self-Reversible Adversarial Patch), a technique that creates adversarial patches (small, intentionally corrupted image regions designed to fool AI models) that can be reversed back to the original image while protecting privacy. The method addresses two key weaknesses of existing adversarial patches: transferability (working across different AI models, achieving up to 90% success rate) and robustness (resisting image processing and defensive techniques), and demonstrates an 88% attack success rate against commercial AI services.
This research paper presents CLIP-ADA, a method for detecting synthetic images (fake images created by AI generators) that works better across different types of generators and artifacts. The method analyzes how CLIP (a vision-language model that understands both images and text) processes images at different levels, then uses this understanding to train detectors that rely less on specific artifact patterns and more on general forensic features, achieving over 6% better accuracy on unseen synthetic images.
This research paper describes a method for protecting privacy in distributed gradient descent (a technique where multiple computers work together to train AI models by each processing part of the data). The authors propose using hierarchical secret sharing (a cryptographic approach where information is split into pieces distributed across multiple parties, so no single party can see the complete data) to keep individual data private while still allowing the AI training process to work efficiently.
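The basic building block can be illustrated with plain additive secret sharing of gradient values. This is a minimal sketch under stated assumptions: it is not the paper's hierarchical scheme, gradients are treated as single scalars, and the modulus `P`, the fixed-point `SCALE`, and all function names are illustrative.

```python
import random

P = 2**61 - 1    # prime modulus; an illustrative choice
SCALE = 10**6    # fixed-point scale for encoding floats as integers

def encode(x):
    """Map a float to a residue mod P using fixed-point encoding."""
    return round(x * SCALE) % P

def decode(v):
    """Map a residue back to a float; residues above P//2 are negative."""
    signed = v - P if v > P // 2 else v
    return signed / SCALE

def share(value, n, rng):
    """Split `value` into n additive shares that sum to it mod P."""
    parts = [rng.randrange(P) for _ in range(n - 1)]
    parts.append((value - sum(parts)) % P)
    return parts

def aggregate(gradients, n_servers, rng):
    """Sum scalar gradients so that each server only sees random-looking shares."""
    per_server = [0] * n_servers
    for g in gradients:
        for i, s in enumerate(share(encode(g), n_servers, rng)):
            per_server[i] = (per_server[i] + s) % P
    # Only the combined per-server totals reveal the aggregate gradient.
    return decode(sum(per_server) % P)

# Hypothetical usage: three workers, three aggregation servers.
total = aggregate([0.5, -0.25, 1.0], 3, random.Random(0))
```

Each server's running total is uniformly random on its own, so an individual worker's gradient stays hidden unless all servers collude; only the final sum is reconstructed.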