Academic papers, new techniques, benchmarks, and theoretical findings in AI/LLM security.
Vertical split learning (VSL, a privacy method that divides an AI model between multiple clients and a server) has been found vulnerable to a new stealthy attack called TPA-VSL, where attackers manipulate the embedding model (the part that converts data into numerical vectors) to misclassify targeted samples without leaving obvious signs of poisoning. The attack uses diffusion models (AI systems that generate data by reversing a noise process) and special encoders to trick the system into mapping target data to wrong classes, achieving a 30% higher success rate than existing attacks.
MIDAS is a system for verifying that data stored in the cloud hasn't been lost or corrupted, designed specifically for mobile devices which have limited processing power and battery. The system offloads heavy computational work to edge nodes (intermediate servers between mobile devices and the cloud), allowing mobile devices to do only lightweight verification tasks while maintaining security and accountability.
Federated learning (a system where multiple parties train AI models together while keeping their data private) faces two main problems: model updates can leak sensitive information, and it's hard to detect poisoning attacks (when malicious participants deliberately corrupt the training process). ClusterGuard is a new secure aggregation protocol (a method for safely combining model updates from many participants) that uses clustering, masking techniques, and filtering mechanisms to protect privacy while detecting and resisting poisoning attacks, even when up to 20% of participants are malicious.
Fix: The source proposes ClusterGuard as the solution, which includes: (1) a Verifiable Random Function (VRF, a method to ensure fair and transparent grouping of participants) for client clustering, (2) key-homomorphic masking combined with verifiable secret sharing for secure aggregation within clusters, and (3) a dual filtering mechanism based on cosine similarity and norm to detect and resist poisoning attacks. The text notes that ClusterGuard comes in two variants, covering client-server and decentralized blockchain environments.
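To make the dual filtering idea more concrete, here is a minimal Python sketch of a filter that checks both the direction (cosine similarity) and the magnitude (norm) of each model update. It is an illustration only, not ClusterGuard's actual procedure: the function names, the thresholds `min_cosine` and `max_norm_ratio`, and the use of a single trusted `reference` update are all assumptions made for this example.

```python
import math

def l2_norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either vector is zero."""
    na, nb = l2_norm(a), l2_norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def dual_filter(updates, reference, min_cosine=0.0, max_norm_ratio=2.0):
    """Return indices of updates that pass both checks.

    Thresholds and the reference update are hypothetical choices for this
    sketch; the source does not specify how ClusterGuard sets them.
    """
    ref_norm = l2_norm(reference)
    kept = []
    for i, update in enumerate(updates):
        direction_ok = cosine_similarity(update, reference) >= min_cosine
        magnitude_ok = l2_norm(update) <= max_norm_ratio * ref_norm
        if direction_ok and magnitude_ok:
            kept.append(i)
    return kept

updates = [
    [1.0, 1.0],    # benign
    [0.9, 1.1],    # benign, slightly different
    [-1.0, -1.0],  # sign-flipped update: rejected by the cosine check
    [10.0, 10.0],  # scaled-up update: rejected by the norm check
]
reference = [1.0, 1.0]
kept = dual_filter(updates, reference)
print(kept)
```

Using two checks matters because each one misses an attack the other catches: a scaled update points in the right direction, and a sign-flipped update can have an unremarkable norm.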
IEEE Xplore (Security & AI Journals)
Researchers developed Urey-ML, a machine learning-based attack that can trick Apple's Ultra-Wideband (UWB, a wireless technology for precise distance measurement) systems into reporting false distances between devices. The attack works by exploiting two weaknesses: an unprotected message during key negotiation (the process of establishing secure communication) that allows the attacker to bypass encryption, and a reinforcement learning algorithm (a type of AI that learns by trial and error) that generates fake signals mimicking normal human movement to fool Apple's defense mechanism.
Researchers developed DUAP (Disentanglement-based Universal Adversarial Perturbation), a method to protect user speech privacy by adding subtle noise to audio that prevents Whisper, a multilingual speech recognition AI, from accurately transcribing what is said. The technique works across multiple languages and remains effective even when audio is compressed or played through speakers in real rooms, addressing privacy risks that earlier protection methods could not handle well in multilingual contexts.
This research examines how employees with different roles in organizations perceive people analytics (systems that collect and analyze worker behavioral data to improve efficiency), and discovers that their views are shaped by data ideologies, which are underlying beliefs and assumptions about data and its use. The study found that data ideologies influence whether employees actually use these technologies in practice, operating through three mechanisms: moderation (limiting use), confirmation (supporting existing beliefs), and modulation (adjusting how technologies are applied). Understanding these different ideologies is important for successfully implementing workplace data collection systems.
This research proposes a new method called DP-QAM (Differentially Private Quadrature Amplitude Modulation) to solve privacy and communication problems in federated analytics (a system where multiple devices analyze data together without sending raw data to a central server). The method takes advantage of natural errors that occur during data compression and wireless transmission to add extra privacy protection, while balancing privacy, communication efficiency, and accuracy.
AdaParse is a framework that can identify the specific settings (hyperparameters, which are configuration values that control how a model behaves) used to create AI-generated images by analyzing those images in detail. Unlike older methods that use a single general fingerprint (a characteristic pattern), AdaParse creates customized fingerprints for each image, allowing it to distinguish between images made with different settings across many different generative models (AI systems that create images).
This research addresses security challenges in Internet of Things (IoT) devices by improving radio frequency fingerprint identification (RFFI, a method that uniquely identifies devices based on their wireless signal characteristics) using federated learning (a distributed AI training approach where data stays on local devices rather than being sent to a central server). The paper proposes a feature alignment strategy to handle non-IID data (data that is not independent and identically distributed, so each receiver sees a different data distribution), which arises when receivers differ in hardware and environmental conditions, and demonstrates that the approach achieves 90.83% identification accuracy with improved stability compared to existing federated learning methods.
Fix: The paper proposes a feature alignment strategy based on federated learning that guides each client (receiver) to learn aligned intermediate feature representations during local training, effectively mitigating the adverse impact of distribution shifts on model generalization in heterogeneous wireless environments.
This research addresses vulnerabilities in Federated Learning (FL, a system where multiple computers train an AI model together without sharing their raw data), which faces attacks from malicious participants and privacy leaks from gradient updates (the numerical adjustments that improve the model). The authors propose a new method combining homomorphic encryption (a way to perform calculations on encrypted data without decrypting it) and dimension compression (reducing the size of data while keeping important relationships intact) to protect privacy and defend against Byzantine attacks (when malicious actors send corrupted data to sabotage the system) while reducing computational costs by 25 to 35 times.
Large vision-language models (LVLMs, which are AIs that understand both images and text) can be attacked using simple visual transformations, such as rotations or color changes, that fool them into giving wrong answers. Researchers found that combining multiple harmful transformations can make these attacks more effective, and they can be optimized using gradient approximation (a mathematical technique to find the best attack parameters). This research highlights a previously overlooked safety risk in how well LVLMs resist these kinds of adversarial attacks (attempts to trick AI systems).
Large Language Models (LLMs, AI systems trained on massive amounts of text) used in task-oriented dialogue systems (AI assistants designed to help users complete specific goals like booking travel) can accidentally memorize and leak sensitive training data, including personal information like phone numbers and complete travel schedules. Researchers demonstrated new attack techniques that can extract thousands of pieces of training data from these systems with over 70% accuracy in the best cases. The paper identifies factors that influence how much data LLMs memorize in dialogue systems but does not propose specific fixes.
QuEST is a new framework that makes backdoor attacks (hidden malicious behaviors injected into AI models) more stealthy and efficient when models undergo quantization (compressing models to use less memory and computation). The framework uses special training techniques and parameter sharing to hide the attack from detection systems while reducing the computational resources needed to carry out the attack.
Researchers discovered a new attack called Lure that targets generative language models (GLMs, which are AI systems that generate text) during the fine-tuning process (when developers customize an open-source model with their own data). By hiding malicious code in the source code of an open-source model, attackers can trick a fine-tuned model into remembering and later revealing the proprietary data used to customize it through specially crafted prompts (input text designed to trigger specific outputs).
This research paper addresses security risks from synthetic tabular data (AI-generated fake datasets) by proposing PBC-TabFip, a fingerprinting framework that embeds hidden identifiers into synthetic data to detect unauthorized copying and identify who leaked it. The framework uses diffusion models (AI systems that generate data by gradually refining random noise) and Tardos codes (a mathematical scheme for tracking which user leaked protected content) to protect synthetic tables even when primary keys (unique identifiers for database rows) are missing or altered, and to resist collusion attacks (when multiple users combine their copies to remove the fingerprint).
Fix: The source proposes 'PBC-TabFip' as the solution: a framework that 'readily incorporates with symmetric Tardos codes of arbitrary alphabet sizes' to enable fingerprinting of synthetic tabular data generated by diffusion models. The paper also proposes specific schemes including 'binary TabFip and TabFip+, quaternary TabFip* and TabFip+*' that use 'Bit Matching (BM) and Valid Bit Matching (VBM) mechanisms' to identify malicious users. According to the authors, 'TabFip with Tardos codes identifies at least one of the colluders with 100% probability and without detecting innocent against two types of collusion attack.'
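The source names Bit Matching (BM) and Valid Bit Matching (VBM) without defining them, so the following Python sketch shows only the general idea of matching-based tracing: compare the fingerprint recovered from a leaked table against each user's assigned codeword and rank users by agreement. Everything here is hypothetical, including the codebook, the function names, and the convention that `None` marks a bit that could not be recovered and is skipped. It is not the paper's BM or VBM mechanism and does not implement Tardos code generation or its accusation scores.

```python
def match_score(extracted, codeword):
    """Count positions where a recovered bit equals the user's assigned bit.

    Positions with extracted value None (unrecoverable, an assumption of this
    sketch) are ignored rather than counted as mismatches.
    """
    return sum(
        1 for e, c in zip(extracted, codeword)
        if e is not None and e == c
    )

def most_suspicious(extracted, codebook):
    """Return the user with the highest match score, plus all scores."""
    scores = {user: match_score(extracted, cw) for user, cw in codebook.items()}
    suspect = max(scores, key=scores.get)
    return suspect, scores

# Hypothetical 8-bit codewords assigned to three recipients of the table.
codebook = {
    "alice": [0, 1, 1, 0, 1, 0, 0, 1],
    "bob":   [1, 1, 0, 0, 1, 1, 0, 0],
    "carol": [0, 0, 1, 1, 0, 0, 1, 1],
}

# Fingerprint recovered from a leaked copy; two bits were destroyed.
extracted = [1, 1, 0, None, 1, 1, None, 0]

suspect, scores = most_suspicious(extracted, codebook)
print(suspect, scores)
```

A plain highest-score rule like this offers no protection against colluders who mix their copies; that is the gap collusion-resistant codes such as Tardos codes are designed to close.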
Diffusion-based image editing systems (AI tools that modify images based on text descriptions) can be manipulated maliciously, and while adding imperceptible perturbations (tiny, invisible changes) to images helps protect against this, existing defenses don't work well across different models. This paper proposes TDAE, a system that combines image and text-based defenses to create images that are harder to maliciously edit, even when attacked by unfamiliar editing models.
Bitcoin is shifting from system rewards to transaction fees (payments users include with their transactions) to incentivize miners, but this creates a 'mining gap' where miners turn off their equipment when fees are too low, weakening Bitcoin's security. This paper identifies this as an 'egoistic dilemma' where both users and miners act selfishly, and proposes an incentive mechanism based on zero-determinant theory (a game theory approach) to solve the problem.
This research proposes a new authentication and key agreement (AKA, a process where devices verify each other's identity and create shared secret keys for secure communication) scheme for VANETs (vehicular ad hoc networks, where cars communicate directly with each other without central infrastructure). The scheme uses a consortium blockchain (a shared, distributed ledger controlled by a group of organizations rather than one central authority) to work in asynchronous environments, where messages may arrive out of order or with delays, and employs lightweight cryptographic techniques (mathematical methods that require less computing power) to reduce system overhead.
This paper introduces CipheRAG, a system that helps large language models (LLMs) safely use external knowledge sources while protecting sensitive data. The system balances two competing needs: keeping data private while still retrieving information quickly, which existing approaches struggle to do because cryptography-based methods are slow while faster methods leak more information.
Adversarial SQL injection (SQLi), in which attackers repeatedly modify their payloads based on feedback from a Web Application Firewall (WAF) until they bypass it, has become a serious threat, with automated tools like AdvSQLi and GPTFuzzer making it easier to find payloads that evade detection. The paper proposes a hybrid defense system combining a Character-Level CNN (a neural network that analyzes attack payloads character by character to find harmful patterns) and Reinforcement Learning (a type of AI training that learns through trial and feedback) to detect these advanced attacks, showing that this approach can catch malicious patterns even when attackers try to disguise their payloads.
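As a rough illustration of what "character by character" means in practice, the Python sketch below turns a payload into the fixed-length sequence of integer IDs that a character-level CNN would typically take as input. The alphabet, the padding and unknown-character IDs, and the `max_len` value are assumptions chosen for this example; the source does not describe the paper's actual preprocessing.

```python
import string

# Hypothetical vocabulary: printable ASCII, with 0 reserved for padding
# and 1 reserved for characters outside the alphabet.
ALPHABET = string.printable
PAD_ID, UNK_ID = 0, 1
CHAR_TO_ID = {ch: i + 2 for i, ch in enumerate(ALPHABET)}

def encode_payload(payload, max_len=32):
    """Map a payload to a fixed-length list of character IDs.

    Longer payloads are truncated to max_len; shorter ones are right-padded
    with PAD_ID so that every input has the same shape.
    """
    ids = [CHAR_TO_ID.get(ch, UNK_ID) for ch in payload[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))

plain = encode_payload("' OR 1=1 --")
obfuscated = encode_payload("'/**/oR/**/1=1--")

print(plain)
print(obfuscated)
```

Because the model sees raw characters rather than whole tokens, comment insertion and case changes alter the input sequence without removing the underlying characters, which is one reason character-level models are often considered for obfuscated payloads.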