Simbian, a self-improving SecOps company, has officially announced the formation of the Simbian Research Lab. At the same time, the company introduced the Simbian Cyber Defense Benchmark, a new framework designed to evaluate how effectively large language models (LLMs) can detect MITRE ATT&CK chains in complex, real-world scenarios.
The Simbian Research Lab developed the benchmark to address a major gap in cybersecurity testing: while many frontier AI models excel at identifying and exploiting software vulnerabilities, they still struggle significantly when applied to cyber defense tasks. None of the 11 models tested achieved a passing score. Anthropic Claude Opus 4.6, the top performer, detected an average of only 46% of attack evidence per MITRE tactic, and every model failed to identify entire categories of attacks, underscoring critical limitations in current AI capabilities.
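The headline figure reads as a per-tactic evidence recall, averaged across MITRE ATT&CK tactics. The article does not publish Simbian's exact scoring schema, so the following is a minimal illustrative sketch of how such a metric could be computed; the function name, tactic labels, and evidence IDs are all invented:

```python
# Hypothetical illustration of the headline metric: for each MITRE ATT&CK
# tactic, compute the fraction of ground-truth evidence items the model
# surfaced, then average those fractions across tactics. Field names and
# data are invented; this is not Simbian's published schema.

def per_tactic_evidence_recall(ground_truth: dict[str, set[str]],
                               detected: dict[str, set[str]]) -> float:
    """Average, over tactics, of |detected ∩ truth| / |truth|."""
    recalls = []
    for tactic, truth in ground_truth.items():
        found = detected.get(tactic, set())
        recalls.append(len(found & truth) / len(truth))
    return sum(recalls) / len(recalls)

truth = {
    "credential-access": {"ev1", "ev2", "ev3"},
    "lateral-movement":  {"ev4", "ev5"},
}
found = {
    "credential-access": {"ev1", "ev3"},
    # Lateral movement is missed entirely here, mirroring the report that
    # every model failed to identify whole categories of attacks.
}
print(f"{per_tactic_evidence_recall(truth, found):.0%}")  # 33%
```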
Simbian designed the Cyber Defense Benchmark to simulate advanced, realistic attack scenarios. Where traditional benchmarks rely on static questions about cyberattacks, this one feeds real attack telemetry into an agentic investigation framework, requiring models to actively analyze and respond to evolving threats. The result is a more accurate measure of how LLMs perform in real-world security operations.
The research tested models from major AI providers, including Anthropic, OpenAI, and Google, alongside leading open-weight models from Alibaba, Minimax, DeepSeek, and Moonshot AI. Each model operated within a simple ReAct loop and was tasked with identifying attackers and their tactics (a sketch of such a loop appears below). Although Anthropic Opus 4.6 detected three times more threat indicators than Google Gemini 3 Flash, it did so at nearly 100 times the cost, raising concerns about efficiency and scalability.
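The article says only that the models ran in a "simple ReAct loop," so the sketch below shows the general pattern rather than Simbian's harness: the model alternates reasoning and tool calls until it commits to a final answer. The tool names, prompt wording, and `call_llm` placeholder are assumptions for illustration:

```python
# Minimal sketch of a ReAct-style investigation loop, assuming a generic
# chat-completion API and hypothetical telemetry tools. Everything here
# (tool names, prompts, call_llm) is illustrative, not Simbian's design.

def call_llm(messages: list[dict]) -> str:
    """Placeholder for any chat-completion API call."""
    raise NotImplementedError

TOOLS = {
    "search_logs": lambda query: f"(log lines matching {query!r})",
    "get_process_tree": lambda host: f"(process tree for {host})",
}

def investigate(alert: str, max_steps: int = 10) -> str:
    messages = [{
        "role": "system",
        "content": "You are a SOC analyst. Interleave Thought/Action/"
                   "Observation steps. Finish with 'Final Answer:' naming "
                   "the attacker tactics observed.",
    }, {"role": "user", "content": alert}]
    for _ in range(max_steps):
        reply = call_llm(messages)
        messages.append({"role": "assistant", "content": reply})
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip()
        # Naive action parsing: expects lines like "Action: tool(arg)".
        if "Action:" in reply:
            action = reply.split("Action:", 1)[1].strip()
            name, _, arg = action.partition("(")
            tool = TOOLS.get(name.strip())
            obs = tool(arg.rstrip(")")) if tool else f"unknown tool {name!r}"
            messages.append({"role": "user", "content": f"Observation: {obs}"})
    return "investigation did not converge"
```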
“Our research shows you can’t throw an LLM dart in the dark and expect to hit the cyber defense bullseye,” said Ambuj Kumar, Founder and CEO of Simbian. “The same frontier models that perform strongly during cyberattacks struggle on the defense side. Defense is fundamentally harder than offense as it requires reasoning across noisy, partial evidence rather than executing against a known target. The LLMs must be accompanied by outside intelligence in the form of a sophisticated harness. Simbian has been able to get 95% accuracy in production enterprise environments on cyber defense SecOps following some of these techniques.”
Industry experts have also recognized the benchmark's importance. “We know the large models can do amazing things, but can we measure their efficacy in analyzing machine logs for security events?” said Richard Stiennon, Chief Research Analyst at the cybersecurity industry analyst firm IT-Harvest. “This benchmark answers that question. In contrast to existing AI security benchmarks, this benchmark was designed to be difficult to game. It uses real telemetry rather than curated questions, mutates context to prevent memorization, enforces deterministic scoring against ground truth, and tracks detection cost alongside accuracy.”
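Two of the properties Stiennon lists, deterministic scoring against ground truth and cost tracked alongside accuracy, are straightforward to make concrete. The sketch below is a hedged illustration under assumed data structures and invented token prices; the benchmark's actual scoring rules are not disclosed in the article:

```python
# Hedged sketch of deterministic scoring with cost tracking, in the spirit
# of the properties quoted above. The RunResult fields and pricing inputs
# are assumptions, not Simbian's published interface.

from dataclasses import dataclass

@dataclass(frozen=True)
class RunResult:
    detected_evidence: frozenset[str]  # evidence IDs the model cited
    input_tokens: int
    output_tokens: int

def score(run: RunResult,
          ground_truth: frozenset[str],
          usd_per_input_tok: float,
          usd_per_output_tok: float) -> tuple[float, float]:
    """Return (recall against ground truth, dollar cost of the run).

    Scoring is deterministic: the same run and ground truth always yield
    the same numbers, with no LLM judge whose verdict can drift.
    """
    recall = len(run.detected_evidence & ground_truth) / len(ground_truth)
    cost = (run.input_tokens * usd_per_input_tok
            + run.output_tokens * usd_per_output_tok)
    return recall, cost
```

Pairing the two numbers is what surfaces trade-offs like the one reported above, where a threefold gain in detected indicators came at roughly 100 times the cost.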
The introduction of the Simbian Cyber Defense Benchmark marks a significant step toward advancing AI-driven security operations. By focusing on realistic attack detection and measurable outcomes, Simbian aims to push the boundaries of what LLMs can achieve in defending modern digital environments.