Academic papers, new techniques, benchmarks, and theoretical findings in AI/LLM security.
This paper presents Co-AttenDWG, a new method for detecting offensive content by jointly analyzing text and images. The approach uses co-attention (a technique where two types of data pay attention to each other simultaneously), dimension-wise gating (a mechanism that selectively emphasizes important features, one feature dimension at a time), and expert fusion (combining predictions from multiple specialized models) to better understand how text and visual information relate to each other.
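To make co-attention and dimension-wise gating concrete, here is a minimal pure-Python sketch. It is not the Co-AttenDWG architecture: the parameter-free gate, the function names, and the toy 2-dimensional features are all assumptions chosen for illustration, and a real model would learn its attention and gate weights.

```python
import math

def softmax(row):
    """Turn raw scores into weights that sum to 1."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(queries, memory):
    """For each query vector, return a softmax-weighted average of the memory vectors."""
    dim = len(memory[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, m) for m in memory])
        out.append([sum(w * m[d] for w, m in zip(weights, memory)) for d in range(dim)])
    return out

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gate(original, attended):
    """Dimension-wise gate: each feature dimension gets its own mixing weight.

    The gate formula (sigmoid of the elementwise product) is a hypothetical
    stand-in for a learned gate.
    """
    mixed = []
    for o_vec, a_vec in zip(original, attended):
        gates = [sigmoid(o * a) for o, a in zip(o_vec, a_vec)]
        mixed.append([g * a + (1.0 - g) * o for g, o, a in zip(gates, o_vec, a_vec)])
    return mixed

def co_attend(text, image):
    text_ctx = attend(text, image)    # text tokens look at image regions
    image_ctx = attend(image, text)   # image regions look at text tokens
    return gate(text, text_ctx), gate(image, image_ctx)

# Toy features: three text tokens and two image regions.
text = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image = [[0.5, 0.5], [1.0, -1.0]]
fused_text, fused_image = co_attend(text, image)
print(fused_text)
print(fused_image)
```

Each fused vector keeps the shape of its input, and every dimension is a blend of the original feature and the feature gathered from the other modality.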
This research presents a federated learning (FL, a technique where multiple parties train an AI model together without sharing raw data) approach for learning the structure of dynamic Bayesian networks (DBN, a statistical model that represents relationships between variables over time) from distributed time series data. The method addresses challenges like data privacy and heterogeneity (when different parties' data follows different patterns), and provides mathematical proof that the approach reliably converges to good results, which hadn't been formally guaranteed before in this setting.
ATLAS Data v5.0.0 introduces a new "Technique Maturity" field that categorizes AI attack techniques based on evidence level, ranging from feasible (proven in research) to realized (used in actual attacks). The release adds 11 new techniques covering AI agent attacks like context poisoning (injecting false information into an AI system's memory), credential theft from AI configurations, and prompt injection (tricking an AI by hiding malicious instructions in its input), plus updates to existing techniques and case studies.
This article describes a maritime operational technology shipboard testbed, which is a controlled testing environment on a ship that mimics real maritime systems. The testbed allows cybersecurity researchers and professionals to safely study cyberattacks and test defensive strategies without risking actual ships or critical systems.
This research addresses challenges in asynchronous federated learning (AFL, a distributed machine learning approach where multiple devices train a model on their own data without sending raw data to a central server, and where the server incorporates each update as it arrives rather than waiting for all devices), specifically when devices have different types of objective functions and uneven data. The authors propose two main improvements: a staleness-aware aggregation mechanism (a method that reduces the influence of outdated updates from slower devices) and a dynamic learning rate schedule (an adaptive parameter that adjusts training speed based on how delayed each device's updates are). Both aim to improve model accuracy and stability in real-world environments where devices have different computing power and network speeds.
Fix: The source explicitly proposes two solutions: (1) "a staleness-aware aggregation mechanism that penalizes outdated updates, ensuring fresher data have a more significant influence on the global model," and (2) "a dynamic learning rate schedule that adapts to client staleness and heterogeneity, improving stability and convergence." The authors demonstrate practical implementation using "PyTorch and Python's asyncio library."
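The two ideas can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' implementation: the polynomial decay, the `mix` and `alpha` parameters, and the inverse learning rate schedule are hypothetical choices, and models are represented as flat lists of floats instead of PyTorch tensors.

```python
def staleness_weight(staleness, alpha=0.5, mix=0.5):
    """Assumed polynomial decay: older updates get a smaller share of the global model."""
    return mix * (1.0 + staleness) ** (-alpha)

def client_learning_rate(base_lr, staleness):
    """Assumed schedule: clients whose updates arrive later take smaller steps."""
    return base_lr / (1.0 + staleness)

def apply_update(global_model, client_model, staleness):
    """Blend one client's model into the global model as soon as it arrives."""
    w = staleness_weight(staleness)
    return [(1.0 - w) * g + w * c for g, c in zip(global_model, client_model)]

global_model = [0.0, 0.0]
client_model = [1.0, 1.0]

# staleness = number of global updates applied since the client fetched its copy
fresh = apply_update(global_model, client_model, staleness=0)
stale = apply_update(global_model, client_model, staleness=8)
print(fresh, stale)
print(client_learning_rate(0.1, staleness=4))
```

The same client update moves the global model less when it arrives eight rounds late than when it arrives immediately, which is the effect the staleness penalty is meant to have.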
IEEE Xplore (Security & AI Journals)
This research presents LipVor, an algorithm that mathematically verifies whether a trained neural network (a computer model with interconnected nodes that learns patterns) follows partial monotonicity constraints, meaning the output never decreases (or never increases) as selected inputs increase. The method works by testing the network at specific points and using mathematical properties to guarantee the network behaves correctly across its entire domain, potentially allowing neural networks to be used in critical applications like credit scoring where trustworthiness and predictable behavior are required.
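The general idea of turning finitely many point checks into a guarantee over a whole domain can be shown with a one-dimensional toy. This is not LipVor itself: it assumes the derivative is available in closed form and that a Lipschitz constant for that derivative is known, and the function name and tolerance are hypothetical. If the derivative equals d > 0 at a point and cannot change faster than `lipschitz` per unit of input, it stays non-negative for a distance of d / lipschitz.

```python
def certify_increasing(deriv, lipschitz, lo, hi, tol=1e-9, max_steps=10_000):
    """Try to prove deriv(x) >= 0 on [lo, hi] from point evaluations.

    Returns True when the whole interval is covered by certified segments,
    and False when no certificate could be produced.
    """
    x = lo
    for _ in range(max_steps):
        d = deriv(x)
        if d <= tol:
            return False  # no positive margin here, so nothing can be certified
        # The derivative drops by at most lipschitz * distance, so it stays
        # non-negative on the segment [x, x + d / lipschitz].
        x += d / lipschitz
        if x >= hi:
            return True
    return False  # ran out of budget before covering the interval

# f(x) = x^3 + x has derivative 3x^2 + 1, whose slope is at most 6 on [-1, 1].
print(certify_increasing(lambda x: 3 * x * x + 1, 6.0, -1.0, 1.0))

# f(x) = x^3 - x has derivative 3x^2 - 1, which goes negative near 0.
print(certify_increasing(lambda x: 3 * x * x - 1, 6.0, -1.0, 1.0))
```

The first call should produce a certificate after a handful of evaluations, while the second should decline because the certified segments shrink toward the point where the derivative reaches zero.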
Researchers developed TabExtractor, a tool that can steal tabular models (AI systems trained on spreadsheet-like data) without needing access to the original training data or knowing how the model was built. The attack works by creating synthetic data samples and using a special neural network architecture called a contrastive tabular transformer (CTT, a type of AI that learns by comparing similar and different examples) to reverse-engineer a clone of the victim model that performs almost as well as the original. This research shows that tabular models face serious security risks from extraction attacks.
Researchers discovered a type of backdoor attack (hidden malicious behavior planted in an AI system that activates only when a specific trigger appears) on multiagent reinforcement learning systems, where one adversary agent uses its actions to trigger hidden failures in other agents' decision-making policies. Unlike previous attacks that assumed unrealistic direct control over what victims observe, this attack is more practical because it works through normal agent interactions in partially observable environments (where agents cannot always see what others are doing). The researchers developed a training method to help adversary agents efficiently trigger these backdoors with minimal suspicious actions.
This article describes BMMA-GPT, a biometric authentication system that uses multiple forms of identification (like fingerprints and facial recognition) together with mathematical optimization to improve security and speed. The system uses a dual-threshold approach (two decision points to verify identity) and can be tailored to different organizational needs, achieving high accuracy while keeping verification time under 1.5 seconds.
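One common way to read a dual-threshold design is sketched below. The summary does not give BMMA-GPT's actual decision rule, so everything here is an assumption: the weighted-average fusion, the threshold values, the modality weights, and the "escalate" outcome for scores that fall between the two thresholds.

```python
def fuse_scores(scores, weights):
    """Weighted average of per-modality match scores, each in [0, 1]."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight

def decide(score, reject_below=0.4, accept_above=0.8):
    """Two decision points: clear accept, clear reject, or ask for more evidence."""
    if score >= accept_above:
        return "accept"
    if score < reject_below:
        return "reject"
    return "escalate"

# Hypothetical weights that an organization could tune to its own risk tolerance.
weights = {"fingerprint": 0.6, "face": 0.4}

strong = fuse_scores({"fingerprint": 0.9, "face": 0.7}, weights)
unsure = fuse_scores({"fingerprint": 0.6, "face": 0.5}, weights)
weak = fuse_scores({"fingerprint": 0.2, "face": 0.3}, weights)
print(decide(strong), decide(unsure), decide(weak))
```

Moving the two thresholds apart trades convenience for security: a wider middle band sends more users to a second check but reduces both false accepts and false rejects.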
Machine unlearning allows AI models to forget the effects of specific training samples, but verifying that this actually happened is difficult. Existing checks (like backdoor attacks or membership inference attacks, which test if a model remembers data by trying to extract or manipulate it) can be fooled by a dishonest model provider who simply retrains the model to pass the test rather than truly unlearning. This paper proposes IndirectVerify, a formal verification method that uses pairs of connected samples: trigger samples that are unlearned, and reaction samples that should be affected by that unlearning. The pairs are constructed with intentional perturbations (small changes to training data) to create indirect evidence that unlearning actually occurred, making it harder to fake.
This research addresses privacy risks in decentralized optimization (where multiple networked computers work together to solve a problem without a central coordinator) by proposing ZS-DDAPush, an algorithm that adds mathematical noise structures to protect sensitive node information during communication. The key innovation is that ZS-DDAPush achieves privacy protection while maintaining the accuracy and efficiency of the optimization process, avoiding the typical trade-offs seen in other privacy methods like differential privacy (adding statistical noise to protect individual data) or encryption (scrambling data so only authorized parties can read it).
AI systems used for important decisions often rely on empirical risk minimization (ERM, a training method that reduces prediction errors on known data) to build models, but these systems can suffer from unintentional bias, lack of transparency, and other risks. The EU has established Ethics Guidelines requiring trustworthy AI to meet seven key requirements, yet current ERM-based design prioritizes accuracy over trustworthiness. This article argues that developers need to balance four core objectives when designing AI systems: fairness (not discriminating against groups), privacy (protecting user data), robustness (resisting intentional attacks like fake news), and explainability (being transparent about how decisions are made).
This research proposes a new method for deploying cyber deception (defensive tricks to confuse attackers) in networks by combining deep reinforcement learning (a type of AI that learns by trial and error) with game theory that accounts for time delays. The method uses an algorithm called proximal policy optimization (PPO, a technique for training AI to make optimal decisions) to figure out where and when to place deception resources, and tests show it outperforms existing approaches in handling complex network attacks.
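The core of PPO is its clipped surrogate objective, which limits how far a single update can move the policy. The sketch below shows that standard per-sample formula only. How the paper encodes deception placement as states, actions, and rewards is not described in the summary, so none of that is modeled here.

```python
def clipped_surrogate(ratio, advantage, epsilon=0.2):
    """PPO clipped objective for one sample.

    ratio     : new_policy_prob / old_policy_prob for the action taken
    advantage : how much better the action was than expected
    epsilon   : clip range (0.2 is a commonly used default)
    """
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    # Taking the minimum removes any incentive to push the ratio outside the clip range.
    return min(ratio * advantage, clipped_ratio * advantage)

# A good action whose probability already rose by 50%: the gain is capped at 1.2x.
print(clipped_surrogate(1.5, 2.0))

# A bad action whose probability already fell by 50%: no extra credit for pushing further.
print(clipped_surrogate(0.5, -1.0))
```

Training maximizes the average of this quantity over sampled actions, so large policy swings earn no additional reward.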
This research presents a new method for generating counterfactual explanations (minimal changes needed to flip an AI model's prediction), which are a type of explainable AI that helps users understand why models make specific decisions. The approach combines physics concepts like energy minimization and simulated annealing (an optimization technique inspired by metallurgy) to find the smallest, most realistic modifications needed to change a model's output, with applications tested in cybersecurity for Internet of Things devices (networked physical devices like sensors and cameras).
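A minimal sketch of simulated annealing used to search for a counterfactual follows. It is not the paper's method: the linear stand-in classifier, the use of Euclidean distance as the "energy", the step size, and the cooling schedule are all assumptions. To keep the result valid by construction, the search starts from a point that already has the flipped label and only accepts moves that keep that label.

```python
import math
import random

def predict(x):
    """Hypothetical stand-in classifier: flags a device when a weighted score exceeds 1."""
    return 1 if 0.8 * x[0] + 0.6 * x[1] > 1.0 else 0

def distance(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def anneal_counterfactual(query, start, steps=2000, temp=1.0, cooling=0.995, seed=0):
    """Shrink a known flipped example toward the query while keeping its label."""
    rng = random.Random(seed)
    target = predict(start)
    if target == predict(query):
        raise ValueError("start must already have the flipped label")
    current, best = list(start), list(start)
    for _ in range(steps):
        temp *= cooling
        candidate = [v + rng.gauss(0.0, 0.1) for v in current]
        if predict(candidate) != target:
            continue  # the label would flip back, so this move is not allowed
        delta = distance(candidate, query) - distance(current, query)
        # Always accept improvements; accept worse moves with a probability
        # that falls as the temperature cools.
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            current = candidate
            if distance(current, query) < distance(best, query):
                best = list(current)
    return best

query = [0.2, 0.3]   # classified as 0 (benign)
start = [2.0, 2.0]   # classified as 1 (flagged)
counterfactual = anneal_counterfactual(query, start)
print(counterfactual, predict(counterfactual), distance(counterfactual, query))
```

The returned point answers the question "what is a nearby input that the model would have flagged instead?", which is the kind of explanation counterfactual methods aim to give.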
This research paper proposes a new cryptographic method for secure data sharing in Internet of Vehicles (IoV, a system where vehicles communicate with each other and road infrastructure). The method uses Certificateless Signcryption (CLSC, a technique that encrypts data and verifies its authenticity without requiring traditional certificates) to allow one sender to securely share customized data with multiple specific receivers while keeping it hidden from others, even across different geographic regions. The proposed approach reduces computational complexity and includes privacy protections through pseudonym generation (creating temporary aliases that hide a vehicle's real identity).
This paper presents DynMD, a new machine learning model that uses Graph Neural Networks (GNNs, which are AI systems that analyze connected data points and their relationships) to detect malware by analyzing streaming behavioral data (information about what a program does over time). Unlike previous approaches that miss how malware behaviors connect over time, DynMD uses an energy-based method to better understand malware patterns and can detect threats 3.81 to 5.33 times faster than existing systems.
Mujaz is a system that uses natural language processing (NLP, the field of AI that helps computers understand human language) to automatically clean up and summarize vulnerability descriptions found in public databases. The system was trained on a collection of carefully labeled vulnerability summaries and uses pre-trained language models (AI systems trained on large amounts of text) to create clearer, more consistent descriptions that help developers and organizations understand and patch security issues more effectively.
This paper describes a new watermarking technique (a method to embed hidden ownership markers into AI models) that remains stable when models are fine-tuned (adjusted to perform new tasks) across different domains. The researchers propose a system that automatically adjusts synthetic training samples and watermark embedding based on the specific data, using out-of-distribution awareness (detecting when data differs significantly from expected patterns) to keep the watermark robust while maintaining the model's performance on its actual task.
Researchers developed BPDA, a method for finding security vulnerabilities in embedded firmware (software that runs on devices like routers and IoT devices) by tracking how user input flows through code to reach dangerous functions called sinks. The method is faster and more accurate than existing tools, discovering 163 real vulnerabilities including 34 previously unknown ones when tested on firmware from major manufacturers.
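Tracking input from sources to sinks can be viewed as a reachability search over a data-flow graph. The sketch below shows only that generic idea with a breadth-first search. It is not BPDA's algorithm, and the graph, function names, and source and sink sets are invented for illustration.

```python
from collections import deque

def find_tainted_sinks(flows, sources, sinks):
    """Return {sink: path} for every sink that user-controlled data can reach."""
    paths = {}
    queue = deque((s, [s]) for s in sources)
    seen = set(sources)
    while queue:
        node, path = queue.popleft()
        if node in sinks:
            paths[node] = path  # record how the input got here
            continue
        for nxt in flows.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [nxt]))
    return paths

# Hypothetical data-flow edges recovered from a firmware binary.
flows = {
    "recv": ["parse_request"],
    "parse_request": ["get_param", "log_request"],
    "get_param": ["build_command"],
    "build_command": ["system"],
    "read_config": ["strcpy"],
}

result = find_tainted_sinks(flows, sources={"recv"}, sinks={"system", "strcpy"})
print(result)
```

In this toy graph, network input reaches `system` through four hops, while `strcpy` is fed only by a configuration file and is not reported. Real analyses also need to model sanitization and pointer aliasing, which this sketch ignores.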