Academic papers, new techniques, benchmarks, and theoretical findings in AI/LLM security.
BlindU is a method that allows users to remove their data's influence from trained AI models while keeping that data hidden from the server. Instead of uploading raw data to the server (which creates privacy risks), BlindU lets users create compressed versions of their data locally, and the server performs the removal process only on these compressed versions, making it practical for federated learning (a distributed training setup where data stays on users' devices).
Fix: BlindU implements unlearning through several stated mechanisms: (1) 'the user locally generates privacy-preserving representations, and the server performs unlearning solely on these representations and their labels', (2) use of an information bottleneck mechanism that 'learns representations that distort maximum task-irrelevant information from inputs', (3) 'two dedicated unlearning modules tailored explicitly for IB-based models and uses a multiple gradient descent algorithm to balance forgetting and utility retaining', and (4) 'a noise-free differential privacy masking method to deal with the raw erasing data before compressing' for additional privacy protection.
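Sketch: the "multiple gradient descent algorithm" in mechanism (3) can be illustrated for the two-objective case, where a closed-form solution exists. This is a minimal example of the generic technique, not BlindU's implementation. The function name and the plain-list gradient representation are assumptions made for illustration.

```python
def min_norm_direction(g_forget, g_retain):
    """Two-objective multiple gradient descent step (illustrative only).

    Returns the weight alpha in [0, 1] and the combined vector
    alpha * g_forget + (1 - alpha) * g_retain with the smallest norm.
    Stepping against this vector does not increase either objective
    to first order.
    """
    diff = [f - r for f, r in zip(g_forget, g_retain)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        # Identical gradients: any weighting gives the same direction.
        return 0.5, list(g_forget)
    # Closed-form minimiser of ||alpha*g_forget + (1-alpha)*g_retain||^2,
    # clipped to the valid range [0, 1].
    alpha = sum((r - f) * r for f, r in zip(g_forget, g_retain)) / denom
    alpha = min(1.0, max(0.0, alpha))
    direction = [alpha * f + (1 - alpha) * r for f, r in zip(g_forget, g_retain)]
    return alpha, direction


if __name__ == "__main__":
    # Hypothetical gradients of a forgetting loss and a utility loss.
    print(min_norm_direction([1.0, 0.0], [0.0, 1.0]))
```

When the two gradients conflict, the weight shifts toward whichever gradient is smaller, so neither forgetting nor utility dominates the update.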
IEEE Xplore (Security & AI Journals)
Deep learning models used for MRI reconstruction (creating medical images from incomplete data) can fail when faced with unexpected situations like noise, different imaging settings, or unseen medical conditions. This paper proposes RODIO, a method that uses diffusion models (AI systems that gradually refine noisy data into clear images) as "purifiers" to make MRI reconstruction systems more reliable, and shows it works better than existing robustification techniques like adversarial training (deliberately exposing models to bad inputs during training to make them stronger).
Fix: The paper proposes RODIO as the solution: using pretrained diffusion models as purifiers to improve robustness by fine-tuning on purified examples, which eliminates the need for adversarial training's complex optimization process. The authors state their approach demonstrates adaptability across multiple deep learning MRI reconstruction models, compatibility with accelerated diffusion samplers, robustness to data with unseen lesions, and effectiveness with unsupervised generative reconstructors.
IEEE Xplore (Security & AI Journals)
Split Learning (SL) is a distributed learning framework designed to preserve privacy while reducing computational load, but researchers discovered a new attack called SLeak that allows a server adversary to steal client data and models. The attack works by exploiting information in the smashed data (intermediate data passed between clients and server) and server model to build a substitute client that mimics the target client's behavior, without needing strong privacy assumptions or much auxiliary data. The study shows SLeak is more effective than previous attacks across different datasets and scenarios.
Unlearnable examples are private data with protective noise added to prevent AI models from learning useful information from them, but this paper shows that data augmentation (a common technique that creates variations of training data to improve model performance) can undo this protection and restore learnability, raising model accuracy from 21.3% to 66.1%. The researchers propose Armor, a defense framework that adds protective noise while accounting for data augmentation effects, using a surrogate model (a practice model used to simulate the real training process) and smart augmentation selection to keep private data unlearnable even after augmentation is applied.
Fix: The paper proposes Armor, a defense framework that works by: (1) designing a non-local module-assisted surrogate model to better capture the effect of data augmentation, (2) using a surrogate augmentation selection strategy that maximizes distribution alignment between augmented and non-augmented samples to choose the optimal augmentation strategy for each class, and (3) using a dynamic step size adjustment algorithm to enhance the defensive noise generation process. The authors state that 'Armor can preserve the unlearnability of protected private data under data augmentation' and plan to open-source the code upon publication.
IEEE Xplore (Security & AI Journals)
This research addresses how to identify whether one machine learning model is derived from another model through modification techniques (adjusting or fine-tuning an existing model rather than training from scratch), and how to measure how much two models differ from each other. The authors propose a method that determines lineage (derivative relationships) by checking if two models' parameters exist in the same local optimum of the loss landscape (the mathematical space of possible model configurations), and measure closeness by analyzing how their decision boundaries (the lines or surfaces that separate different predictions) differ from each other.
Out-of-distribution (OoD) inputs, meaning inputs that don't match what an AI was trained on, can cause object detection systems to make overconfident wrong predictions on objects they shouldn't recognize. This paper reveals that popular benchmark datasets used to test OoD detection have quality problems, where up to 13% of test objects are mislabeled, making current methods appear better than they really are. The authors propose a new training-time approach where object detectors are fine-tuned using carefully created OoD training data that looks similar to normal objects, which reduces false detections by 91% in YOLO models.
Fix: The paper introduces a training-time mitigation paradigm where 'we fine-tune the detector using a carefully synthesized OoD dataset that semantically resembles in-distribution objects.' This approach 'shapes a defensive decision boundary by suppressing objectness on OoD objects' and achieves 'a 91% reduction in hallucination error of a YOLO model on BDD-100 K.' The methodology is shown to work across multiple detection architectures including YOLO, Faster R-CNN, and RT-DETR.
IEEE Xplore (Security & AI Journals)
This article presents a control method for multiple fixed-wing UAVs (unmanned aerial vehicles, or drones) that need to fly together in formation while avoiding collisions and handling unpredictable disturbances. The approach uses reinforcement learning (a type of AI that learns by trial and error) combined with control barrier functions (mathematical tools that enforce safety constraints) to create a system that keeps the UAVs safe and stable while optimizing their performance.
Hallucinations (instances where Large Language Models generate false or misleading content) are a safety problem for AI applications. The paper introduces UQLM, a Python package that uses uncertainty quantification (UQ, a statistical technique for measuring how confident a model is in its answer) to detect when an LLM is likely hallucinating by assigning confidence scores between 0 and 1 to responses.
Fix: The source describes UQLM as 'an off-the-shelf solution for UQ-based hallucination detection that can be easily integrated to enhance the reliability of LLM outputs.' No specific implementation steps, code examples, or version details are provided in the source text.
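Sketch: since the source gives no implementation details, the following is not UQLM's API. It is a minimal, generic illustration of one way a confidence score between 0 and 1 could be derived: sample several answers to the same prompt and measure how often they agree with the original answer. The function names and the exact-match comparison are assumptions chosen for simplicity.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def consistency_score(answer, sampled_answers):
    """Fraction of sampled answers that match the original answer.

    A score near 1 means the model answers consistently; a score near 0
    suggests the original answer may be unreliable.
    """
    if not sampled_answers:
        raise ValueError("need at least one sampled answer")
    target = normalize(answer)
    matches = sum(1 for s in sampled_answers if normalize(s) == target)
    return matches / len(sampled_answers)


if __name__ == "__main__":
    # Hypothetical samples for the prompt "What is the capital of France?"
    print(consistency_score("Paris", ["paris", " Paris ", "Lyon", "PARIS"]))
```

Exact matching only suits short factual answers; longer responses would need a semantic similarity measure in its place.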
JMLR (Journal of Machine Learning Research)
This research paper studies diffusion models, a type of AI used to generate images and audio, as a statistical method for density estimation (learning the probability distribution of data). The authors show that when data has a factorizable structure (meaning it can be broken into independent low-dimensional components, like in Bayesian networks), diffusion models can efficiently learn this structure and achieve optimal performance using a specially designed sparse neural network architecture (one where most connections between neurons are inactive).
This research studies how to predict whether borrowers on micro-lending platforms (small-loan services) will default (fail to repay their loans) by examining their call activity and social media behavior. The study analyzed over 154,000 loans from Indonesian platforms and found that frequent calls and stable calling patterns suggest lower default risk, while frequent social media activity and stable social media patterns actually indicate higher default risk. These findings suggest that micro-lending platforms could improve their credit assessment models (systems for deciding who gets loans) by combining both types of behavioral data.
This research studied what makes knowledge workers (people whose jobs involve handling information) want to use ChatGPT at work, using technology affordance and constraints theory (a framework explaining how tools enable certain actions while limiting others). The study found that ChatGPT's benefits like automation, information quality, and productivity boost adoption, but concerns about risk and lack of regulation reduce it. Personal innovativeness (how open someone is to new ideas) and supportive workplace culture help workers embrace ChatGPT despite their concerns.
Hypergraph Neural Networks (HGNNs, which are AI models that learn from data where connections can link multiple items together instead of just pairs) can be weakened by structural attacks that corrupt their connections and reduce accuracy. HGNN Shield is a defense framework with two main components: Hyperedge-Dependent Estimation (which assesses how important each connection is within the network) and High-Order Shield (which detects and removes harmful connections before the AI processes data). Experiments show the framework improves performance by an average of 9.33% compared to existing defenses.
Fix: The HGNN Shield defense framework addresses the vulnerability through two modules: (1) Hyperedge-Dependent Estimation (HDE) that 'prioritizes vertex dependencies within hyperedges and adapts traditional connectivity measures to hypergraphs, facilitating precise structural modifications,' and (2) High-Order Shield (HOS) positioned before convolutional layers, which 'consists of three submodules: Hyperpath Cut, Hyperpath Link, and Hyperpath Refine' that 'collectively detect, disconnect, and refine adversarial connections, ensuring robust message propagation.'
IEEE Xplore (Security & AI Journals)
This paper addresses source-free domain adaptation (SFDA, a technique that adapts AI models to new datasets without accessing the original training data) for time-series data, such as sensor readings or activity logs. The authors argue that existing methods lack interpretability and may learn spurious patterns, so they propose PrEPoA, a framework that evaluates which parts of the time-series data the model considers important before fine-tuning it on the target domain. They demonstrate their approach works better than existing methods across five different real-world datasets.
This research addresses machine unlearning in neural IR (information retrieval, the technology that ranks search results), a task termed neural machine unranking (NuMuR), which selectively removes data from AI systems for privacy compliance. The authors propose CoCoL (contrastive and consistent loss, a method with two complementary training objectives). It uses a contrastive loss to reduce relevance scores on forgotten data while preserving performance on shared data, plus a consistent loss to maintain accuracy on retained data. The paper demonstrates effective data removal across multiple neural ranking models.
Fix: The proposed solution is CoCoL, a dual-objective framework comprising: 1) a contrastive loss that reduces relevance scores on forget sets while maintaining performance on entangled samples, and 2) a consistent loss that preserves accuracy on the retain set. According to the paper, CoCoL achieves substantial forgetting with minimal retention and generalization performance loss.
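Sketch: the paper's exact loss formulas are not given in this summary, so the code below is only a hypothetical illustration of how a two-part objective of this shape might be combined. The hinge form, the squared-error form, the margin, and the weight are all assumptions, not CoCoL's definitions.

```python
def contrastive_term(forget_scores, entangled_scores, margin=1.0):
    """Hinge penalty that pushes forget-set scores below entangled-sample scores.

    Assumed form: the penalty is zero once every forget score sits at least
    `margin` below every entangled score.
    """
    pairs = [(f, e) for f in forget_scores for e in entangled_scores]
    return sum(max(0.0, margin + f - e) for f, e in pairs) / len(pairs)


def consistent_term(new_retain_scores, old_retain_scores):
    """Mean squared drift of retain-set scores from the original model's scores."""
    n = len(new_retain_scores)
    return sum((a - b) ** 2 for a, b in zip(new_retain_scores, old_retain_scores)) / n


def unranking_loss(forget, entangled, new_retain, old_retain, weight=1.0):
    """Combined objective: forget via the hinge, retain via the consistency term."""
    return contrastive_term(forget, entangled) + weight * consistent_term(new_retain, old_retain)


if __name__ == "__main__":
    # Hypothetical relevance scores from a neural ranker.
    print(unranking_loss([3.0], [2.0], [1.0], [0.0], weight=0.5))
```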
IEEE Xplore (Security & AI Journals)
Elderly people are increasingly using digital technology for communication and information access, but their limited cybersecurity knowledge makes them attractive targets for cybercriminals. The article examines common cybercrimes targeting seniors, the specific vulnerabilities that put them at risk, and existing approaches to reduce these dangers.
Generative AI (systems that create new text, images, or other content) is transforming many industries but raises ethical concerns like data privacy (protecting personal information), bias (unfair treatment of certain groups), transparency (being open about how the AI works), and accountability (responsibility for the AI's actions). Researchers propose a trust framework based on transparency, fairness, accountability, and privacy to help ensure generative AI is developed and used responsibly.
Large language models (LLMs, AI systems trained on huge amounts of text to generate human-like responses) can now mimic not just general human language but also unusual, individual-specific human behaviors. This ability could lead to LLMs being used more widely in research studies and potentially reduce the role of actual humans, which raises concerns about AI alignment (ensuring AI systems behave in ways humans intend and approve of) and how this technology affects society.
Cyberbullying on social media is a growing problem that harms people's mental health, and traditional methods to stop it are no longer effective. This study examines how artificial intelligence can help protect online communities from cyberbullying by exploring different AI technologies, their uses, and the challenges involved. The goal is to understand how AI might create safer online environments.
This research addresses a problem in federated learning (a method where multiple computers train an AI model together without sharing raw data) combined with adversarial training (a technique that makes AI models resistant to intentionally tricky inputs). The authors found that simply combining these two approaches causes the model's accuracy to drop because adversarial training increases differences in the data across different computers, making the federated learning less effective. They propose SFAT (Slack Federated Adversarial Training), which uses a relaxation mechanism to adjust how the computers combine their learning results, reducing the harmful effects of data differences and improving overall performance.
Federated Learning (FL, a method where multiple computers train an AI model together without sharing raw data) can leak private information through gradient inversion attacks (GIA, techniques that reconstruct sensitive data from the mathematical updates used in training). This paper reviews three types of GIA methods and finds that while optimization-based GIA is most practical, generation-based and analytics-based GIA have significant limitations, and proposes a three-stage defense pipeline for FL frameworks.
Fix: The source mentions 'a three-stage defense pipeline to users when designing FL frameworks and protocols for better privacy protection,' but does not explicitly describe what this pipeline contains or how to implement it.
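Sketch: to illustrate why shared gradients can leak inputs, the example below shows the simplest analytic case, a single fully connected layer with a bias, trained on one sample. This is a textbook illustration rather than any method from the reviewed paper. The function names are hypothetical, and the single-sample assumption is exactly the kind of restriction that limits analytic attacks in practice.

```python
def linear_layer_grads(x, upstream):
    """Gradients of a loss w.r.t. W and b for y = W x + b on one sample.

    `upstream` is dLoss/dy. Then dLoss/dW[i][j] = upstream[i] * x[j]
    and dLoss/db[i] = upstream[i].
    """
    grad_w = [[u * xj for xj in x] for u in upstream]
    grad_b = list(upstream)
    return grad_w, grad_b


def invert_input(grad_w, grad_b, eps=1e-12):
    """Recover x from the shared gradients by dividing a weight-gradient row
    by the matching bias gradient (any row with a non-zero bias gradient works)."""
    for row, gb in zip(grad_w, grad_b):
        if abs(gb) > eps:
            return [gw / gb for gw in row]
    raise ValueError("all bias gradients are zero; input not recoverable")


if __name__ == "__main__":
    private_x = [0.5, -2.0, 3.0]           # hypothetical client input
    gw, gb = linear_layer_grads(private_x, [0.0, 4.0])
    print(invert_input(gw, gb))            # reconstructs private_x
```

With batches larger than one, the gradients are sums over samples and this division no longer isolates a single input.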
IEEE Xplore (Security & AI Journals)