Academic papers, new techniques, benchmarks, and theoretical findings in AI/LLM security.
MITRE ATLAS version 5.2.0 adds new attack techniques against AI systems, including methods to steal credentials from AI agent tools (software components that perform actions on behalf of an AI), poison training data, and generate malicious commands. It also introduces new defenses such as segmenting AI agent components, validating inputs and outputs, detecting deepfakes, and implementing human oversight for AI agent actions.
Fix: The source lists mitigations rather than fixes for a specific vulnerability. Key mitigations mentioned include: Input and Output Validation for AI Agent Components, Segmentation of AI Agent Components, Restrict AI Agent Tool Invocation on Untrusted Data, Human In-the-Loop for AI Agent Actions, Adversarial Input Detection, Model Hardening, Sanitize Training Data, and Generative AI Guardrails.
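The first mitigation on that list, Input and Output Validation for AI Agent Components, can be illustrated with a minimal sketch. Everything here is illustrative rather than taken from ATLAS: the allowlist, the credential regex, and the `validated_tool_call` wrapper are assumptions, showing only the general shape of validating tool inputs and redacting credential-like output before it reaches the model.

```python
import re

# Illustrative allowlist and secret pattern -- not defined by ATLAS itself.
ALLOWED_TOOLS = {"search", "calculator"}
SECRET_PATTERN = re.compile(r"api[_-]?key|password|AKIA[0-9A-Z]{16}", re.I)

def validated_tool_call(tool_name, args, tool_fn):
    """Wrap an agent tool with input and output validation."""
    if tool_name not in ALLOWED_TOOLS:
        raise ValueError(f"tool {tool_name!r} is not allowlisted")
    if any(SECRET_PATTERN.search(str(v)) for v in args.values()):
        raise ValueError("credential-like content in tool input")
    output = str(tool_fn(**args))
    # Redact credential-like strings before the output reaches the model.
    return SECRET_PATTERN.sub("[REDACTED]", output)
```

A real deployment would validate against per-tool schemas and audit-log rejected calls; this sketch only shows where the input and output checks sit relative to the tool invocation.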
MITRE ATLAS Releases
Large language models face four main types of adversarial threats: privacy breaches (exposing sensitive data the model learned), integrity compromises (corrupting the model's outputs or training data), adversarial misuse (using the model for harmful purposes), and availability disruptions (making the model unavailable or slow). The article organizes these threats by their attackers' goals to help understand the landscape of vulnerabilities in LLMs.
This short story examines privacy risks that arise when companies are bought and sold, particularly concerning AI digital twins (AI models that replicate a specific person's behavior and knowledge) and the problems that occur when organizations fail to threat model (identify and plan for potential security risks in) major changes to their systems and technology. The story raises ethical questions about these scenarios.
Current AI assistants are not yet trustworthy enough to be personal advisors, despite how useful they seem. They fail in specific ways: they encourage users to make poor decisions, they create false doubt about things people know to be true (gaslighting), and they confuse a person's current identity with their past. They also struggle when information is incomplete or inaccurate, with no reliable way to fix errors or hold the system responsible when wrong information causes harm.
Federated learning (collaborative model training where participants share only gradients, not raw data) is vulnerable to gradient inversion attacks, where adversaries reconstruct sensitive training data from the shared gradients. The paper proposes Gradient Dropout, a defense that randomly scales some gradient components and replaces others with Gaussian noise (random numerical values) to disrupt reconstruction attempts while maintaining model accuracy.
Fix: Gradient Dropout is applied as a defense mechanism: it perturbs gradients by randomly scaling a subset of components and replacing the remainder with Gaussian noise, applied across all layers of the model. According to the source, this approach yields less than 2% accuracy reduction relative to baseline while significantly impeding reconstruction attacks.
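The perturbation described above is simple enough to sketch. This is a hedged reconstruction from the summary, not the paper's reference implementation; the parameter names (`keep_ratio`, `scale_range`, `noise_std`) and their defaults are assumptions.

```python
import numpy as np

def gradient_dropout(grad, keep_ratio=0.5, scale_range=(0.5, 1.5),
                     noise_std=0.01, rng=None):
    """Randomly rescale a subset of gradient components and replace the
    remainder with Gaussian noise before the gradient is shared."""
    if rng is None:
        rng = np.random.default_rng()
    flat = grad.astype(float).ravel()
    keep = rng.random(flat.size) < keep_ratio            # components to keep (rescaled)
    flat[keep] *= rng.uniform(*scale_range, keep.sum())  # random per-component scaling
    flat[~keep] = rng.normal(0.0, noise_std, (~keep).sum())  # the rest become pure noise
    return flat.reshape(grad.shape)
```

The shared gradient then no longer contains the exact per-sample signal that gradient inversion attacks invert, while the kept-and-rescaled subset preserves enough signal for training to proceed.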
IEEE Xplore (Security & AI Journals)
Concept drift (when data patterns change over time due to evolving attacks or environments) is a major problem for machine learning models used in cybersecurity, since frequent retraining is expensive and retrained models are hard to interpret. DriftTrace is a new system that detects concept drift at the sample level (individual data points) using a contrastive learning-based autoencoder (a type of neural network that learns patterns without needing lots of labeled examples), explains which features caused the drift using feature selection, and adapts to drift by rebalancing training data. The system was tested on malware and network intrusion datasets and achieved strong results, outperforming existing approaches.
Fix: DriftTrace addresses concept drift through three mechanisms: (1) detecting drift at the sample level using a contrastive learning-based autoencoder without requiring extensive labeling, (2) employing a greedy feature selection strategy to explain which input features are relevant to drift detection decisions, and (3) leveraging sample interpolation techniques to handle data imbalance during adaptation to the drift.
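Mechanism (1) can be illustrated with a simplified stand-in: flag each individual sample whose autoencoder reconstruction error exceeds a threshold calibrated on in-distribution reference data. DriftTrace's actual detector uses a contrastive learning objective; here `reconstruct` is any placeholder reconstruction function, and the percentile threshold is an assumption.

```python
import numpy as np

class DriftDetector:
    """Sample-level drift detection via reconstruction error (simplified
    stand-in for DriftTrace's contrastive autoencoder)."""

    def __init__(self, reconstruct, reference, percentile=99.0):
        self.reconstruct = reconstruct
        # Calibrate the threshold on known in-distribution data.
        self.threshold = np.percentile(self._errors(reference), percentile)

    def _errors(self, x):
        # Per-sample mean squared reconstruction error.
        return np.mean((x - self.reconstruct(x)) ** 2, axis=1)

    def is_drifted(self, x):
        # True for each sample whose error exceeds the calibrated threshold.
        return self._errors(x) > self.threshold
```

Operating at the per-sample level, as here, is what lets the system point to specific drifted inputs rather than only raising a dataset-wide alarm.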
IEEE Xplore (Security & AI Journals)
This research addresses authentication risks in wireless IoT devices by proposing LiteNP-Net, a lightweight neural network for physical layer authentication (PLA, a security method that verifies device identity using unique wireless channel characteristics). The approach combines hypothesis testing theory with deep learning to create a system that works effectively even without detailed prior knowledge of wireless channel properties, and testing showed it performs better than existing methods in real-world Wi-Fi environments.
This research introduces DeSA, a protocol for secure aggregation (a privacy technique that protects individual data while combining results) in federated learning (a machine learning approach where multiple devices train a shared model without sending raw data to a central server) across decentralized device-to-device networks. The protocol addresses challenges in zero-trust networks (environments where no participant is automatically trusted) by using zero-knowledge proofs (cryptographic methods that verify information is correct without revealing the information itself) to verify model training, protecting against Byzantine attacks (attacks where malicious nodes send false information to disrupt the system), and employing a one-time masking method to maintain privacy while allowing model aggregation.
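The one-time masking step lends itself to a small illustration. Below is the textbook pairwise-masking idea behind secure aggregation, not DeSA's concrete protocol: its zero-knowledge proofs and Byzantine protections are omitted, and all names are illustrative. Each pair of parties derives a shared mask that one adds and the other subtracts, so masks cancel in the aggregate sum while individual updates stay hidden.

```python
import numpy as np

def masked_updates(updates, seed=0):
    """Pairwise one-time masking sketch: party i adds mask m_ij and party j
    subtracts it, for every pair i < j, so sum(masked) == sum(updates)."""
    n = len(updates)
    masked = [u.astype(float).copy() for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            # In practice each pair derives this mask from a shared secret
            # (e.g. a key agreement); a seeded RNG stands in for that here.
            rng = np.random.default_rng(hash((seed, i, j)) % (2**32))
            m = rng.normal(size=updates[0].shape)
            masked[i] += m
            masked[j] -= m
    return masked
```

Because each mask appears exactly once with a plus sign and once with a minus sign, the server learns only the aggregate, which is the property DeSA builds on for decentralized device-to-device networks.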
Subgraph Federated Learning (FL, a system where pieces of a graph are distributed across multiple devices to protect data privacy) is vulnerable to backdoor attacks (hidden malicious functions that cause a model to behave incorrectly when triggered). Researchers developed BEEF, an attack method that uses adversarial perturbations (carefully crafted small changes to input data that fool the model) as hidden triggers while keeping the model's internal parameters unchanged, making the attack harder to detect than existing methods.
Mobile super apps (large platforms that host smaller third-party applications, called miniapps, which share the same underlying services) create new security risks because multiple apps can access shared resources and data. Researchers studied how these ecosystems work, identified security vulnerabilities and potential abuses, and developed recommendations to make super app platforms safer while keeping them easy to use.
This research analyzes how discussions about Generative AI spread across different industries (like media, healthcare, and finance) in the six months after ChatGPT's release, using social media data and innovation theory. The study found that different industries had different concerns: media and marketing focused on content generation with positive views, while healthcare and finance were more cautious and focused on analysis. Misinformation was the biggest concern overall, and the research showed that emotional reactions (sentiment) were the main factor driving how quickly information about AI spread between people.
Generative artificial intelligence (GAI, AI systems that create new text, images, or code) is significantly changing how information systems are taught in universities. IS educators are discussing both the benefits and risks of GAI, including concerns about academic integrity (students using AI to cheat), and they are developing recommendations for how to responsibly teach with and about GAI in the classroom.
This research paper analyzes how companies that invest in digital technologies, including AI, affect their greenhouse gas emissions and natural resource use. The study found that companies investing in these technologies tend to reduce their emissions and consume fewer natural resources, suggesting that digital tools can help address environmental challenges.
This paper addresses white-box attacks (scenarios where attackers can see all the inner workings of an encryption system and control the computer it runs on), which are harder to defend against than black-box attacks (where attackers cannot see the implementation). The authors propose a new method to protect symmetric encryption algorithms that use substitution-permutation networks (a common encryption structure that substitutes and rearranges data) by adding secret components to lookup tables, making the encryption stronger without changing the final encrypted message.
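The "secret components added to lookup tables" idea resembles the classic table-encoding construction used in white-box cryptography, sketched below with toy 4-bit S-boxes. This is a generic illustration of that construction, not the paper's specific method; all function names are ours.

```python
import random

def random_bijection(n, rng):
    """A secret encoding: a random permutation of {0, ..., n-1} plus its inverse."""
    perm = list(range(n))
    rng.shuffle(perm)
    inverse = [0] * n
    for x, y in enumerate(perm):
        inverse[y] = x
    return perm, inverse

def encode_tables(sbox1, sbox2, rng):
    """Merge a secret bijection g into two chained S-box lookup tables:
    t1 outputs g-encoded values and t2 strips the encoding, so the
    intermediate value never appears in the clear, yet the composition
    t2[t1[x]] still equals sbox2[sbox1[x]] for every input x."""
    g, g_inv = random_bijection(len(sbox1), rng)
    t1 = [g[sbox1[x]] for x in range(len(sbox1))]
    t2 = [sbox2[g_inv[x]] for x in range(len(sbox2))]
    return t1, t2
```

An attacker dumping `t1` and `t2` from memory sees only encoded tables, while the final output of the chain, and hence the ciphertext, is unchanged.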
Unlearnable examples are protective noises added to private data to prevent AI models from learning useful information from them, but this paper shows that data augmentation (a common technique that creates variations of training data to improve model performance) can undo this protection, raising model accuracy on the protected data from 21.3% to 66.1%. The researchers propose Armor, a defense framework that adds protective noise while accounting for data augmentation effects, using a surrogate model (a practice model used to simulate the real training process) and smart augmentation selection to keep private data unlearnable even after augmentation is applied.
Fix: The paper proposes Armor, a defense framework that works by: (1) designing a non-local module-assisted surrogate model to better capture the effect of data augmentation, (2) using a surrogate augmentation selection strategy that maximizes distribution alignment between augmented and non-augmented samples to choose the optimal augmentation strategy for each class, and (3) using a dynamic step size adjustment algorithm to enhance the defensive noise generation process. The authors state that 'Armor can preserve the unlearnability of protected private data under data augmentation' and plan to open-source the code upon publication.
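The core intuition, optimizing the protective noise against augmented views of the data rather than the raw data, can be shown with a toy sketch. This is emphatically not Armor's algorithm (no non-local surrogate module, no per-class augmentation selection, no dynamic step sizing): it uses a linear surrogate `w` and linear augmentations `augs`, all of which are assumptions for illustration.

```python
import numpy as np

def augmentation_aware_noise(x, w, augs, steps=200, lr=0.1, eps=0.05, rng=None):
    """Find a bounded perturbation delta minimizing a linear surrogate's
    fitting error in expectation over sampled augmentations, so the
    protective noise is not undone when the data is augmented."""
    if rng is None:
        rng = np.random.default_rng()
    delta = np.zeros_like(x, dtype=float)
    for _ in range(steps):
        a = augs[rng.integers(len(augs))]   # sample one augmentation per step
        score = w @ (a @ (x + delta))       # surrogate output on the augmented view
        grad = (a.T @ w) * score            # gradient of 0.5 * score**2 w.r.t. delta
        delta = np.clip(delta - lr * grad, -eps, eps)  # bounded error-minimizing step
    return delta
```

Optimizing over random augmented views is the same expectation-over-transformations idea that makes the noise survive augmentation, which is the failure mode the paper identifies in earlier unlearnable-example schemes.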
IEEE Xplore (Security & AI Journals)
This research paper studies diffusion models, a type of AI used to generate images and audio, as a statistical method for density estimation (learning the probability distribution of data). The authors show that when data has a factorizable structure (meaning it can be broken into independent low-dimensional components, like in Bayesian networks), diffusion models can efficiently learn this structure and achieve optimal performance using a specially designed sparse neural network architecture (one where most connections between neurons are inactive).
Hallucinations (instances where Large Language Models generate false or misleading content) are a safety problem for AI applications. The paper introduces UQLM, a Python package that uses uncertainty quantification (UQ, a statistical technique for measuring how confident a model is in its answer) to detect when an LLM is likely hallucinating by assigning confidence scores between 0 and 1 to responses.
Fix: The source describes UQLM as 'an off-the-shelf solution for UQ-based hallucination detection that can be easily integrated to enhance the reliability of LLM outputs.' No specific implementation steps, code examples, or version details are provided in the source text.
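Since the source gives no API details, here is a generic black-box UQ heuristic, answer consistency across repeated sampling, to make the idea of a 0-to-1 confidence score concrete. This is not UQLM's interface; `sample_response` and the agreement-ratio scoring are illustrative assumptions.

```python
from collections import Counter

def consistency_confidence(sample_response, n=5):
    """Sample several responses to the same prompt and use answer
    agreement as a confidence score between 0 and 1: low agreement
    suggests the model is uncertain and may be hallucinating."""
    answers = [sample_response() for _ in range(n)]
    best, count = Counter(answers).most_common(1)[0]
    return best, count / n
```

A downstream application can then gate on the score, for example escalating to a human reviewer whenever confidence falls below a chosen threshold.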
JMLR (Journal of Machine Learning Research)
This research studies how to predict whether borrowers on micro-lending platforms (small-loan services) will default (fail to repay their loans) by examining their call activity and social media behavior. The study analyzed over 154,000 loans from Indonesian platforms and found that frequent calls and stable calling patterns suggest lower default risk, while frequent social media activity and stable social media patterns actually indicate higher default risk. These findings suggest that micro-lending platforms could improve their credit assessment models (systems for deciding who gets loans) by combining both types of behavioral data.
This research studied what makes knowledge workers (people whose jobs involve handling information) want to use ChatGPT at work, using technology affordance and constraints theory (a framework explaining how tools enable certain actions while limiting others). The study found that ChatGPT's benefits like automation, information quality, and productivity boost adoption, but concerns about risk and lack of regulation reduce it. Personal innovativeness (how open someone is to new ideas) and supportive workplace culture help workers embrace ChatGPT despite their concerns.
Hypergraph Neural Networks (HGNNs, which are AI models that learn from data where connections can link multiple items together instead of just pairs) can be weakened by structural attacks that corrupt their connections and reduce accuracy. HGNN Shield is a defense framework with two main components: Hyperedge-Dependent Estimation (which assesses how important each connection is within the network) and High-Order Shield (which detects and removes harmful connections before the AI processes data). Experiments show the framework improves performance by an average of 9.33% compared to existing defenses.
Fix: The HGNN Shield defense framework addresses the vulnerability through two modules: (1) Hyperedge-Dependent Estimation (HDE) that 'prioritizes vertex dependencies within hyperedges and adapts traditional connectivity measures to hypergraphs, facilitating precise structural modifications,' and (2) High-Order Shield (HOS) positioned before convolutional layers, which 'consists of three submodules: Hyperpath Cut, Hyperpath Link, and Hyperpath Refine' that 'collectively detect, disconnect, and refine adversarial connections, ensuring robust message propagation.'
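The detect-and-disconnect step can be approximated with a simple heuristic stand-in: drop hyperedges whose member vertices have low average pairwise feature similarity, on the assumption that adversarial connections tend to link dissimilar vertices. This is not the paper's actual scoring; the cosine-coherence measure and threshold below are ours.

```python
import numpy as np

def filter_hyperedges(features, hyperedges, min_coherence=0.2):
    """Keep only hyperedges whose member vertices are mutually similar
    (mean off-diagonal cosine similarity >= min_coherence)."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    kept = []
    for edge in hyperedges:
        sub = normed[list(edge)]
        sims = sub @ sub.T                     # pairwise cosine similarities
        n = len(edge)
        coherence = (sims.sum() - n) / (n * (n - 1))  # mean excluding the diagonal
        if coherence >= min_coherence:
            kept.append(edge)
    return kept
```

HGNN Shield's HOS module goes further, refining connections rather than only cutting them, but the placement is the same: filtering happens before the convolutional layers so corrupted structure never drives message propagation.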
IEEE Xplore (Security & AI Journals)