aisecwatch.com

Real-time AI security monitoring. Tracking AI-related vulnerabilities, safety and security incidents, privacy risks, research developments, and policy changes.

Maintained by

Truong (Jack) Luu

Information Systems Researcher

Research

Academic papers, new techniques, benchmarks, and theoretical findings in AI/LLM security.

691 items

BlindU: Blind Machine Unlearning Without Revealing Erasing Data

info, research, Peer-Reviewed
research, privacy
Jan 15, 2026

BlindU is a method that allows users to remove their data's influence from trained AI models while keeping that data hidden from the server. Instead of uploading raw data to the server (which creates privacy risks), BlindU lets users create compressed versions of their data locally, and the server performs the removal process only on these compressed versions, making it practical for federated learning (a distributed training setup where data stays on users' devices).

Fix: BlindU implements unlearning through several stated mechanisms: (1) 'the user locally generates privacy-preserving representations, and the server performs unlearning solely on these representations and their labels', (2) use of an information bottleneck mechanism that 'learns representations that distort maximum task-irrelevant information from inputs', (3) 'two dedicated unlearning modules tailored explicitly for IB-based models and uses a multiple gradient descent algorithm to balance forgetting and utility retaining', and (4) 'a noise-free differential privacy masking method to deal with the raw erasing data before compressing' for additional privacy protection.
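The balancing step in item (3) can be illustrated with the two-objective case of multiple gradient descent, which has a closed-form solution. The sketch below is a generic illustration rather than BlindU's implementation: the function name and the toy gradient vectors are assumptions, and the real method operates on model gradients, not short lists.

```python
def min_norm_direction(g_forget, g_retain):
    """Two-objective multiple gradient descent (closed form).

    Returns the minimum-norm point in the convex hull of the two
    gradients, plus the mixing coefficient gamma applied to g_forget.
    Names and inputs are hypothetical, for illustration only.
    """
    diff = [a - b for a, b in zip(g_forget, g_retain)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        # Identical gradients: any mix gives the same direction.
        return list(g_forget), 0.5
    # gamma minimizes ||gamma * g_forget + (1 - gamma) * g_retain||^2,
    # then is clipped to [0, 1] to stay inside the convex hull.
    gamma = sum((b - a) * b for a, b in zip(g_forget, g_retain)) / denom
    gamma = min(1.0, max(0.0, gamma))
    direction = [gamma * a + (1.0 - gamma) * b
                 for a, b in zip(g_forget, g_retain)]
    return direction, gamma


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    d, gamma = min_norm_direction([1.0, 0.0], [0.0, 1.0])
    print(d, gamma)
```

A step along the negative of the returned direction does not increase either objective to first order, which is the sense in which forgetting and utility are "balanced" here.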

IEEE Xplore (Security & AI Journals)

Robust Physics-Based Deep MRI Reconstruction via Diffusion Purification

info, research, Peer-Reviewed
research

SLeak: Multi-Target Privacy Stealing Attack Against Split Learning

info, research, Peer-Reviewed
security

Armor: Shielding Unlearnable Examples Against Data Augmentation

info, research, Peer-Reviewed
security

Model Lineage Analysis: Determination and Closeness Measurement

info, research, Peer-Reviewed
research

Revisiting Out-of-Distribution Detection in Real-Time Object Detection: From Benchmark Pitfalls to a New Mitigation Paradigm

info, research, Peer-Reviewed
research

Reinforcement Learning-Based Optimal Formation Tracking for UAVs With Safety Constraints

info, research, Peer-Reviewed
research

UQLM: A Python Package for Uncertainty Quantification in Large Language Models

info, research, Peer-Reviewed
research

Nonparametric Estimation of a Factorizable Density using Diffusion Models

info, research, Peer-Reviewed
research

Dialing for Dollars or Defaulting Online? Assessing Borrower Risk through Call Activity and Social Media Engagement in Microfinance

info, research, Peer-Reviewed
research

Adoption of ChatGPT in Organizations: Technology Affordance and Constraints Theory Perspective

info, research, Peer-Reviewed
research

HGNN Shield: Defending Hypergraph Neural Networks Against High-Order Structure Attack

info, research, Peer-Reviewed
security

Source-Free Time-Series Domain Adaptation With Prior Evaluation of Model Salience

info, research, Peer-Reviewed
research

Neural Machine Unranking

info, research, Peer-Reviewed
research

Cybersecurity Challenges for the Elderly: Vulnerabilities and Risks

info, research, Peer-Reviewed
security

Generative Artificial Intelligence: Ethical Challenges and Trust Mechanisms

info, research, Peer-Reviewed
research

Large Language Models in Human Subject Research, and the Presence of Idiosyncratic Human Behaviors

info, research, Peer-Reviewed
research

The Impact of Artificial Intelligence in Protecting the Online Social Community From Cyberbullying

info, research, Peer-Reviewed
research

Slack Federated Adversarial Training

info, research, Peer-Reviewed
research

Exploring the Vulnerabilities of Federated Learning: A Deep Dive Into Gradient Inversion Attacks

info, research, Peer-Reviewed
security
safety
Jan 14, 2026

Deep learning models used for MRI reconstruction (creating medical images from incomplete data) can fail when faced with unexpected situations like noise, different imaging settings, or unseen medical conditions. This paper proposes RODIO, a method that uses diffusion models (AI systems that gradually refine noisy data into clear images) as "purifiers" to make MRI reconstruction systems more reliable, and shows it works better than existing robustification techniques like adversarial training (deliberately exposing models to bad inputs during training to make them stronger).

Fix: The paper proposes RODIO as the solution: using pretrained diffusion models as purifiers to improve robustness by fine-tuning on purified examples, which eliminates the need for adversarial training's complex optimization process. The authors state their approach demonstrates adaptability across multiple deep learning MRI reconstruction models, compatibility with accelerated diffusion samplers, robustness to data with unseen lesions, and effectiveness with unsupervised generative reconstructors.

IEEE Xplore (Security & AI Journals)
research
Jan 14, 2026

Split Learning (SL) is a distributed learning framework designed to preserve privacy while reducing computational load, but researchers discovered a new attack called SLeak that allows a server adversary to steal client data and models. The attack works by exploiting information in the smashed data (intermediate data passed between clients and server) and server model to build a substitute client that mimics the target client's behavior, without needing strong privacy assumptions or much auxiliary data. The study shows SLeak is more effective than previous attacks across different datasets and scenarios.

IEEE Xplore (Security & AI Journals)
privacy
Jan 12, 2026

Unlearnable examples are private data with protective noise added so that AI models cannot learn useful information from them, but this paper shows that data augmentation (a common technique that creates variations of training data to improve model performance) can undo this protection, raising the accuracy of models trained on the protected data from 21.3% to 66.1%. The researchers propose Armor, a defense framework that adds protective noise while accounting for data augmentation effects, using a surrogate model (a practice model used to simulate the real training process) and smart augmentation selection to keep private data unlearnable even after augmentation is applied.

Fix: The paper proposes Armor, a defense framework that works by: (1) designing a non-local module-assisted surrogate model to better capture the effect of data augmentation, (2) using a surrogate augmentation selection strategy that maximizes distribution alignment between augmented and non-augmented samples to choose the optimal augmentation strategy for each class, and (3) using a dynamic step size adjustment algorithm to enhance the defensive noise generation process. The authors state that 'Armor can preserve the unlearnability of protected private data under data augmentation' and plan to open-source the code upon publication.

IEEE Xplore (Security & AI Journals)
Jan 12, 2026

This research addresses how to identify whether one machine learning model is derived from another model through modification techniques (adjusting or fine-tuning an existing model rather than training from scratch), and how to measure how much two models differ from each other. The authors propose a method that determines lineage (derivative relationships) by checking if two models' parameters exist in the same local optimum of the loss landscape (the mathematical space of possible model configurations), and measure closeness by analyzing how their decision boundaries (the lines or surfaces that separate different predictions) differ from each other.

IEEE Xplore (Security & AI Journals)
safety
Jan 5, 2026

Out-of-distribution (OoD, inputs that don't match what an AI was trained on) objects can cause object detection systems to make overconfident wrong predictions on objects they shouldn't recognize, which is the failure that OoD detection is meant to catch. This paper reveals that popular benchmark datasets used to test OoD detection have quality problems, where up to 13% of test objects are mislabeled, making current methods appear better than they really are. The authors propose a new training-time approach where object detectors are fine-tuned using carefully created OoD training data that looks similar to normal objects, which reduces false detections by 91% in YOLO models.

Fix: The paper introduces a training-time mitigation paradigm where 'we fine-tune the detector using a carefully synthesized OoD dataset that semantically resembles in-distribution objects.' This approach 'shapes a defensive decision boundary by suppressing objectness on OoD objects' and achieves 'a 91% reduction in hallucination error of a YOLO model on BDD-100K.' The methodology is shown to work across multiple detection architectures including YOLO, Faster R-CNN, and RT-DETR.

IEEE Xplore (Security & AI Journals)
Jan 1, 2026

This article presents a control method for multiple fixed-wing UAVs (unmanned aerial vehicles, or drones) that need to fly together in formation while avoiding collisions and handling unpredictable disturbances. The approach uses reinforcement learning (a type of AI that learns by trial and error) combined with control barrier functions (mathematical tools that enforce safety constraints) to create a system that keeps the UAVs safe and stable while optimizing their performance.
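How a control barrier function enforces a safety constraint can be shown on a toy problem. The sketch below is not the paper's controller: it assumes a scalar separation distance whose rate of change is commanded directly, a constant nominal command standing in for a learned policy, and hypothetical parameter values.

```python
def cbf_filter(u_nominal, distance, d_min, alpha):
    """Minimally modify u_nominal so the barrier condition holds.

    Toy model (an assumption): d' = u, with barrier h(d) = d - d_min.
    The condition h' >= -alpha * h becomes u >= -alpha * (d - d_min),
    so the closest safe command is a simple max().
    """
    lower_bound = -alpha * (distance - d_min)
    return max(u_nominal, lower_bound)


def simulate(d0, d_min, u_nominal, alpha=1.0, dt=0.1, steps=100):
    """Euler simulation of the separation distance under the filter."""
    d = d0
    trajectory = [d]
    for _ in range(steps):
        u = cbf_filter(u_nominal, d, d_min, alpha)
        d = d + dt * u
        trajectory.append(d)
    return trajectory


if __name__ == "__main__":
    # The nominal command keeps closing the gap; the filter slows it down
    # as the distance approaches the minimum allowed separation.
    traj = simulate(d0=5.0, d_min=1.0, u_nominal=-1.0)
    print(min(traj))
```

With the Euler step used here, the distance stays at or above `d_min` as long as `alpha * dt <= 1`, because the barrier value shrinks by at most a factor of `1 - alpha * dt` per step.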

IEEE Xplore (Security & AI Journals)
safety
Dec 31, 2025

Hallucinations (instances where Large Language Models generate false or misleading content) are a safety problem for AI applications. The paper introduces UQLM, a Python package that uses uncertainty quantification (UQ, a statistical technique for measuring how confident a model is in its answer) to detect when an LLM is likely hallucinating by assigning confidence scores between 0 and 1 to responses.

Fix: The source describes UQLM as 'an off-the-shelf solution for UQ-based hallucination detection that can be easily integrated to enhance the reliability of LLM outputs.' No specific implementation steps, code examples, or version details are provided in the source text.
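One generic way to obtain a confidence score between 0 and 1 is to sample several responses to the same prompt and measure how often they agree with the original answer. The sketch below illustrates that idea only; it does not use or reproduce UQLM's API, and the function names and the exact-match agreement rule are assumptions.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial variations still match."""
    return " ".join(text.lower().split())


def consistency_confidence(answer, sampled_answers):
    """Fraction of sampled answers that match the original answer.

    Hypothetical helper: a score near 1 means the model answers
    consistently, a score near 0 suggests a possible hallucination.
    """
    if not sampled_answers:
        raise ValueError("need at least one sampled answer")
    target = normalize(answer)
    matches = sum(1 for s in sampled_answers if normalize(s) == target)
    return matches / len(sampled_answers)


if __name__ == "__main__":
    samples = ["paris", "Paris ", "Lyon", "PARIS"]
    print(consistency_confidence("Paris", samples))
```

Exact matching is only suitable for short factual answers; longer responses would need a semantic similarity measure in place of string equality.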

JMLR (Journal of Machine Learning Research)
Dec 31, 2025

This research paper studies diffusion models, a type of AI used to generate images and audio, as a statistical method for density estimation (learning the probability distribution of data). The authors show that when data has a factorizable structure (meaning it can be broken into independent low-dimensional components, like in Bayesian networks), diffusion models can efficiently learn this structure and achieve optimal performance using a specially designed sparse neural network architecture (one where most connections between neurons are inactive).

JMLR (Journal of Machine Learning Research)
Dec 31, 2025

This research studies how to predict whether borrowers on micro-lending platforms (small-loan services) will default (fail to repay their loans) by examining their call activity and social media behavior. The study analyzed over 154,000 loans from Indonesian platforms and found that frequent calls and stable calling patterns suggest lower default risk, while frequent social media activity and stable social media patterns actually indicate higher default risk. These findings suggest that micro-lending platforms could improve their credit assessment models (systems for deciding who gets loans) by combining both types of behavioral data.

AIS eLibrary (Journal of AIS, CAIS, etc.)
Dec 31, 2025

This research studied what makes knowledge workers (people whose jobs involve handling information) want to use ChatGPT at work, using technology affordance and constraints theory (a framework explaining how tools enable certain actions while limiting others). The study found that ChatGPT's benefits like automation, information quality, and productivity boost adoption, but concerns about risk and lack of regulation reduce it. Personal innovativeness (how open someone is to new ideas) and supportive workplace culture help workers embrace ChatGPT despite their concerns.

AIS eLibrary (Journal of AIS, CAIS, etc.)
research
Dec 26, 2025

Hypergraph Neural Networks (HGNNs, which are AI models that learn from data where connections can link multiple items together instead of just pairs) can be weakened by structural attacks that corrupt their connections and reduce accuracy. HGNN Shield is a defense framework with two main components: Hyperedge-Dependent Estimation (which assesses how important each connection is within the network) and High-Order Shield (which detects and removes harmful connections before the AI processes data). Experiments show the framework improves performance by an average of 9.33% compared to existing defenses.

Fix: The HGNN Shield defense framework addresses the vulnerability through two modules: (1) Hyperedge-Dependent Estimation (HDE) that 'prioritizes vertex dependencies within hyperedges and adapts traditional connectivity measures to hypergraphs, facilitating precise structural modifications,' and (2) High-Order Shield (HOS) positioned before convolutional layers, which 'consists of three submodules: Hyperpath Cut, Hyperpath Link, and Hyperpath Refine' that 'collectively detect, disconnect, and refine adversarial connections, ensuring robust message propagation.'

IEEE Xplore (Security & AI Journals)
Dec 24, 2025

This paper addresses source-free domain adaptation (SFDA, a technique that adapts AI models to new datasets without accessing the original training data) for time-series data, such as sensor readings or activity logs. The authors argue that existing methods lack interpretability and may learn spurious patterns, so they propose PrEPoA, a framework that evaluates which parts of the time-series data the model considers important before fine-tuning it on the target domain. They demonstrate their approach works better than existing methods across five different real-world datasets.

IEEE Xplore (Security & AI Journals)
privacy
Dec 23, 2025

This research addresses machine unlearning in neural IR (information retrieval, the technology that ranks search results), a process called neural machine unranking (NuMuR) that selectively removes data from AI systems for privacy compliance. The authors propose CoCoL (contrastive and consistent loss, a method with two complementary training objectives), which uses a contrastive loss to reduce relevance scores on forgotten data while preserving performance on shared data, plus a consistent loss to maintain accuracy on retained data, demonstrating effective data removal across multiple neural ranking models.

Fix: The proposed solution is CoCoL, a dual-objective framework comprising: 1) a contrastive loss that reduces relevance scores on forget sets while maintaining performance on entangled samples, and 2) a consistent loss that preserves accuracy on the retain set. According to the paper, CoCoL achieves substantial forgetting with minimal loss of retention and generalization performance.
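The dual-objective structure can be sketched as a sum of two terms. The summary does not give the exact loss formulas, so the hinge-style forgetting term, the squared-error consistency term, and all names and default values below are illustrative stand-ins rather than CoCoL's actual definitions.

```python
def contrastive_forget_loss(forget_scores, entangled_scores, margin=1.0):
    """Penalize forget-set relevance scores that are not at least
    `margin` below the mean score of entangled (shared) samples.
    Hypothetical form, chosen only for illustration."""
    anchor = sum(entangled_scores) / len(entangled_scores)
    penalties = [max(0.0, s - anchor + margin) for s in forget_scores]
    return sum(penalties) / len(penalties)


def consistent_loss(new_retain_scores, old_retain_scores):
    """Mean squared difference between the unlearned model's scores and
    the original model's scores on the retain set (also an assumption)."""
    pairs = list(zip(new_retain_scores, old_retain_scores))
    return sum((n - o) ** 2 for n, o in pairs) / len(pairs)


def total_loss(forget_scores, entangled_scores,
               new_retain_scores, old_retain_scores,
               margin=1.0, weight=1.0):
    """Combined objective: forgetting term plus weighted consistency term."""
    forget_term = contrastive_forget_loss(forget_scores, entangled_scores, margin)
    keep_term = consistent_loss(new_retain_scores, old_retain_scores)
    return forget_term + weight * keep_term


if __name__ == "__main__":
    print(total_loss([3.0], [2.0], [1.0, 2.0], [1.0, 1.0], weight=2.0))
```

The first term falls to zero once forgotten items score well below entangled ones, while the second term stays at zero only if rankings on retained data are unchanged.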

IEEE Xplore (Security & AI Journals)
Dec 22, 2025

Elderly people are increasingly using digital technology for communication and information access, but their limited cybersecurity knowledge makes them attractive targets for cybercriminals. The article examines common cybercrimes targeting seniors, the specific vulnerabilities that put them at risk, and existing approaches to reduce these dangers.

IEEE Xplore (Security & AI Journals)
safety
Dec 22, 2025

Generative AI (systems that create new text, images, or other content) is transforming many industries but raises ethical concerns like data privacy (protecting personal information), bias (unfair treatment of certain groups), transparency (being open about how the AI works), and accountability (responsibility for the AI's actions). Researchers propose a trust framework based on transparency, fairness, accountability, and privacy to help ensure generative AI is developed and used responsibly.

IEEE Xplore (Security & AI Journals)
safety
Dec 22, 2025

Large language models (LLMs, AI systems trained on huge amounts of text to generate human-like responses) can now mimic not just general human language but also unusual, individual-specific human behaviors. This ability could lead to LLMs being used more widely in research studies and potentially reduce the role of actual humans, which raises concerns about AI alignment (ensuring AI systems behave in ways humans intend and approve of) and how this technology affects society.

IEEE Xplore (Security & AI Journals)
safety
Dec 22, 2025

Cyberbullying on social media is a growing problem that harms people's mental health, and traditional methods to stop it are no longer effective. This study examines how artificial intelligence can help protect online communities from cyberbullying by exploring different AI technologies, their uses, and the challenges involved. The goal is to understand how AI might create safer online environments.

IEEE Xplore (Security & AI Journals)
security
Dec 22, 2025

This research addresses a problem in federated learning (a method where multiple computers train an AI model together without sharing raw data) combined with adversarial training (a technique that makes AI models resistant to intentionally tricky inputs). The authors found that simply combining these two approaches causes the model's accuracy to drop because adversarial training increases differences in the data across different computers, making the federated learning less effective. They propose SFAT (Slack Federated Adversarial Training), which uses a relaxation mechanism to adjust how the computers combine their learning results, reducing the harmful effects of data differences and improving overall performance.
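A relaxed aggregation rule of this kind can be sketched as a reweighted average of client updates. The summary does not specify the mechanism, so the rule below (favoring the clients with the smallest adversarial training loss by a slack factor `alpha`) is one plausible reading, and every name and parameter is an assumption.

```python
def slack_weights(losses, alpha, k):
    """Give the k clients with the smallest loss weight (1 + alpha) and
    the rest (1 - alpha), then normalize so the weights sum to 1.
    With alpha = 0 this reduces to plain uniform averaging."""
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    favored = set(order[:k])
    raw = [(1.0 + alpha) if i in favored else (1.0 - alpha)
           for i in range(len(losses))]
    total = sum(raw)
    return [w / total for w in raw]


def aggregate(client_params, weights):
    """Weighted average of client parameter vectors (lists of floats)."""
    dim = len(client_params[0])
    return [sum(w * params[j] for w, params in zip(weights, client_params))
            for j in range(dim)]


if __name__ == "__main__":
    losses = [0.2, 0.9, 0.5]  # hypothetical per-client adversarial losses
    weights = slack_weights(losses, alpha=0.5, k=1)
    print(weights)
    print(aggregate([[1.0], [0.0], [0.0]], weights))
```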

IEEE Xplore (Security & AI Journals)
research
Dec 22, 2025

Federated Learning (FL, a method where multiple computers train an AI model together without sharing raw data) can leak private information through gradient inversion attacks (GIA, techniques that reconstruct sensitive data from the mathematical updates used in training). This paper reviews three types of GIA methods and finds that while optimization-based GIA is most practical, generation-based and analytics-based GIA have significant limitations, and proposes a three-stage defense pipeline for FL frameworks.

Fix: The source mentions 'a three-stage defense pipeline to users when designing FL frameworks and protocols for better privacy protection,' but does not explicitly describe what this pipeline contains or how to implement it.
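Why shared gradients can leak inputs is easiest to see in the simplest analytic case: for a single fully connected layer with a bias, each row of the weight gradient is the input scaled by the matching bias gradient, so dividing one by the other recovers the input exactly. The sketch below is a toy illustration of that fact, not a method from the paper; the squared-error loss and all values are assumptions.

```python
def linear_layer_gradients(W, b, x, target):
    """Gradients of L = 0.5 * ||W x + b - target||^2 with respect to W and b."""
    y = [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]
    err = [yi - ti for yi, ti in zip(y, target)]  # dL/dy
    grad_W = [[e * xj for xj in x] for e in err]  # outer product err * x^T
    grad_b = err                                  # dL/db equals dL/dy
    return grad_W, grad_b


def invert_input(grad_W, grad_b, eps=1e-12):
    """Recover x from the gradients using any row with a nonzero bias gradient."""
    for row, gb in zip(grad_W, grad_b):
        if abs(gb) > eps:
            return [g / gb for g in row]
    raise ValueError("all bias gradients are zero; input cannot be recovered")


if __name__ == "__main__":
    W = [[0.5, -1.0, 2.0], [1.0, 0.0, 0.5]]
    b = [0.1, -0.2]
    private_x = [0.3, -0.7, 1.2]  # the client's private input
    grad_W, grad_b = linear_layer_gradients(W, b, private_x, [1.0, 0.0])
    print(invert_input(grad_W, grad_b))  # approximately private_x
```

Deeper networks and batched updates break this exact relationship, which is where the optimization-based and generation-based attacks surveyed in the paper come in.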

IEEE Xplore (Security & AI Journals)