Critical vulnerabilities uncovered in Ollama are raising fresh concerns around the security of enterprise AI infrastructure after researchers revealed flaws capable of exposing sensitive memory data and enabling persistent code execution on Windows systems.

The most serious issue, tracked as CVE-2026-7482 and nicknamed “Bleeding Llama,” could allow attackers to retrieve information directly from Ollama server memory without authentication. Researchers estimate the flaw may affect more than 300,000 internet-exposed servers globally.

For organizations rapidly deploying local AI infrastructure and self-hosted large language models, the disclosure highlights how quickly AI systems are becoming part of the enterprise attack surface, often before governance and security controls are fully established.

What Researchers Discovered

Researchers from Cyera disclosed the issue after identifying a critical out-of-bounds read vulnerability affecting Ollama’s GGUF model loader. The flaw impacts Ollama versions earlier than 0.17.1 and carries a CVSS severity score of 9.1.

According to the researchers, attackers could exploit the vulnerability by uploading a specially crafted GGUF model file through the /api/create endpoint and then extracting leaked process memory through the /api/push API.
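
For defenders, the simplest first step in triage is version-based. The short Python sketch below queries Ollama's standard /api/version endpoint and flags builds older than the patched 0.17.1 release; the endpoint and default port are standard Ollama behavior, while the host, timeout, and version-comparison logic are illustrative assumptions rather than part of the researchers' tooling.

```python
# Minimal exposure-triage sketch: query Ollama's /api/version endpoint and
# flag builds older than the patched 0.17.1 release. The endpoint and default
# port are standard Ollama behavior; everything else here is illustrative.
import json
import urllib.request

PATCHED = (0, 17, 1)  # first release reported as fixed

def parse_version(raw: str) -> tuple:
    # Normalize a version string like "0.16.3" into a comparable tuple,
    # ignoring any pre-release suffix such as "-rc1".
    return tuple(int(p) for p in raw.split("-")[0].split(".")[:3])

def check_host(host: str, port: int = 11434) -> None:
    url = f"http://{host}:{port}/api/version"
    with urllib.request.urlopen(url, timeout=5) as resp:
        version = json.load(resp)["version"]
    status = "patched" if parse_version(version) >= PATCHED else "VULNERABLE"
    print(f"{host}: Ollama {version} -> {status}")

if __name__ == "__main__":
    check_host("127.0.0.1")  # replace with hosts from your own inventory
```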

Potentially Exposed Enterprise Data

The exposed memory may contain:

  • API keys
  • Environment variables
  • Proprietary prompts
  • User conversations
  • Internal code snippets
  • Sensitive enterprise workflow data

Researchers also uncovered two additional Windows-related vulnerabilities tied to Ollama’s update mechanism:

  • CVE-2026-42248 — missing signature verification
  • CVE-2026-42249 — path traversal flaw

When chained together, the two issues could allow persistent code execution each time a user logs into a Windows system. According to the researchers, both vulnerabilities remained unpatched in affected versions when the 90-day disclosure period expired.
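
Full technical details of the two Windows flaws have not been published, but the pairing describes two well-understood missing controls. The sketch below is a hypothetical illustration of those controls in Python, not Ollama's actual updater code: verifying a downloaded artifact before applying it, and refusing file paths that escape the install directory. All names and values in it are assumed.

```python
# Hypothetical illustration of the two controls the CVEs describe as missing
# from the Windows updater. None of these names or values come from Ollama's
# actual code.
import hashlib
from pathlib import Path

def verify_digest(artifact: Path, expected_sha256: str) -> None:
    # CVE-2026-42248 (missing signature verification): at minimum, pin and
    # check a digest; a production updater would verify a publisher signature.
    actual = hashlib.sha256(artifact.read_bytes()).hexdigest()
    if actual != expected_sha256:
        raise ValueError(f"update rejected: digest mismatch ({actual})")

def safe_target_path(install_dir: Path, member_name: str) -> Path:
    # CVE-2026-42249 (path traversal): resolve the destination and confirm it
    # stays inside the install directory, so a name like
    # "..\\..\\Startup\\evil.exe" is rejected before anything is written.
    target = (install_dir / member_name).resolve()
    if not target.is_relative_to(install_dir.resolve()):
        raise ValueError(f"update rejected: traversal in {member_name!r}")
    return target
```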

Why The Disclosure Matters Beyond Ollama

The disclosure reflects a broader shift taking place across enterprise cybersecurity as AI systems increasingly become operational infrastructure.

Organizations are moving quickly to deploy local LLM frameworks, AI copilots, and self-hosted inference environments because they offer greater flexibility, local processing, and more direct control over enterprise data. However, many of these deployments are happening without the hardened controls traditionally applied to production systems.

That gap is becoming more visible as AI systems begin handling sensitive internal workflows, proprietary business data, customer information, API credentials, and developer secrets.

AI Memory Exposure Creates A Different Kind Of Risk

Unlike traditional infrastructure breaches, attacks targeting AI inference memory can expose the operational logic of the business itself, including prompts, workflows, embedded automation logic, and internal decision-making processes.

The incident also reflects how AI adoption inside enterprises is frequently evolving faster than governance frameworks around it. In many organizations, security teams are now being asked to secure AI systems that originated through developer experimentation and decentralized adoption rather than formal enterprise deployment strategies.

Enterprise Security Teams Are Facing A Visibility Problem

Many enterprises initially viewed self-hosted AI deployments as safer alternatives to cloud-based AI services because they reduced reliance on third-party infrastructure providers.

This incident challenges that assumption by showing how exposed inference systems can quickly become high-value attack targets when authentication, segmentation, and runtime visibility controls are weak or missing.
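
That makes unauthenticated network exposure the first condition worth verifying. The Python sketch below is a minimal example of such a check under stated assumptions: port 11434 is Ollama's default, /api/tags is its standard model-listing endpoint, and the host inventory is hypothetical. It simply confirms whether the service answers an anonymous request from an outside vantage point.

```python
# Minimal sketch of the exposure check this incident argues for: confirm from
# an external vantage point whether an inference endpoint answers requests
# that carry no credentials.
import socket
import urllib.request

def is_exposed(host: str, port: int = 11434) -> bool:
    # Test raw reachability first, then whether the API responds anonymously,
    # the condition that made CVE-2026-7482 remotely exploitable on
    # internet-facing servers.
    try:
        with socket.create_connection((host, port), timeout=3):
            pass
        urllib.request.urlopen(f"http://{host}:{port}/api/tags", timeout=3)
        return True  # reachable and answered an anonymous request
    except OSError:
        return False

if __name__ == "__main__":
    for host in ["10.0.4.17", "10.0.4.18"]:  # hypothetical inventory
        print(host, "EXPOSED" if is_exposed(host) else "not exposed")
```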

Why Security Leaders Are Paying Attention

For many CISOs, the immediate challenge is visibility. AI infrastructure often stretches across developer environments, research systems, APIs, model repositories, and production workloads simultaneously, creating operational blind spots that traditional security programs were never designed to manage at scale.

This matters particularly for SOC teams, platform engineering leaders, DevSecOps groups, and cloud security teams now being tasked with securing AI systems that may have entered the organization through decentralized experimentation rather than structured deployment programs.

The disclosure is also likely to accelerate investment in AI runtime monitoring, workload segmentation, API security, secure model lifecycle management, and continuous exposure management strategies aimed at protecting inference environments before they become deeply embedded into critical business operations.

AI Infrastructure Is Becoming Core Enterprise Infrastructure

The Ollama vulnerabilities are another indication that enterprise AI adoption is entering a new phase where inference infrastructure is no longer treated as an experimental innovation layer operating at the edge of the business.

These systems are increasingly becoming operational platforms that process sensitive data, business logic, and enterprise decision-making at scale.

That transition is reshaping how organizations think about infrastructure risk, not just around cloud environments and endpoints but around the AI systems increasingly sitting at the center of modern enterprise operations.

Research and Intelligence Sources: Cyera



