Virtana Introduces AI Observability for Nutanix Platforms

Virtana has introduced its latest solution, AI Factory Observability, specifically designed for Nutanix environments. With this launch, the company aims to enhance monitoring capabilities across Nutanix Cloud Infrastructure and Nutanix Enterprise AI, helping organizations better manage increasingly complex AI operations.

The platform gives infrastructure and platform teams a unified operational view of AI workloads and the systems supporting them. It connects token-level service data with GPU utilization, storage, orchestration layers, and overall infrastructure performance within Nutanix deployments. As a result, teams can see how their AI systems behave in real time.

Notably, this launch arrives at a time when enterprises are transitioning from developing AI models to running large-scale production environments. Consequently, the demand for efficient infrastructure management has grown significantly. According to Virtana’s research, nearly 75% of enterprises experience double-digit AI job failure rates. Furthermore, more than half of these failures stem from infrastructure bottlenecks, which not only reduce throughput but also increase cost per token and limit concurrent workload capacity.
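The link between throughput and cost per token can be illustrated with simple arithmetic. The sketch below is not taken from Virtana's research or product; the function name, the GPU hourly price, and the throughput figures are hypothetical values chosen only to show that, for a fixed GPU spend, halving throughput doubles the cost per token.

```python
def cost_per_million_tokens(gpu_hourly_cost: float,
                            gpu_count: int,
                            tokens_per_second: float) -> float:
    """Infrastructure cost of serving one million tokens.

    All inputs are illustrative assumptions, not measured figures.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    cluster_cost_per_hour = gpu_hourly_cost * gpu_count
    return cluster_cost_per_hour / tokens_per_hour * 1_000_000


# Hypothetical cluster: 8 GPUs at 2.00 per GPU-hour.
healthy = cost_per_million_tokens(2.0, 8, tokens_per_second=10_000)
# Same cluster when a bottleneck halves throughput.
degraded = cost_per_million_tokens(2.0, 8, tokens_per_second=5_000)

print(f"healthy:  {healthy:.4f} per million tokens")
print(f"degraded: {degraded:.4f} per million tokens")
```

Because the cluster cost is fixed while the token count falls, any bottleneck that lowers throughput raises the cost per token in the same proportion.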

Operational Focus on Agentic AI

In particular, Virtana has tailored this solution to address the operational complexities of agentic AI systems. These systems continuously adapt their use of computing resources, making visibility across both infrastructure and application layers essential. Therefore, organizations using Nutanix environments require comprehensive monitoring that extends beyond basic infrastructure insights.

The platform covers a wide range of components, including Nutanix AHV, Nutanix Enterprise AI, Kubernetes orchestration, Nvidia GPU clusters, and distributed AI workflows. By doing so, it enables teams to identify inefficiencies, resource contention, and performance issues before they escalate.

Additionally, the system offers real-time GPU telemetry, tracking metrics such as utilization, memory usage, power consumption, temperature, and overall health. It also detects idle or underutilized GPUs and correlates workloads with GPU consumption across both training and inference processes. At the same time, token-level visibility allows organizations to measure throughput, latency, and resource demand with greater accuracy.
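As a rough illustration of what detecting idle or underutilized GPUs can involve, the sketch below averages utilization samples per GPU and flags those that stay below a threshold. It does not use or represent Virtana's or Nutanix's APIs; the `GpuSample` type, the `find_underutilized` function, the threshold, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class GpuSample:
    """One telemetry reading for one GPU (hypothetical schema)."""
    gpu_id: str
    utilization_pct: float


def find_underutilized(samples: list[GpuSample],
                       util_threshold: float = 10.0,
                       min_samples: int = 3) -> list[str]:
    """Return IDs of GPUs whose mean utilization is below the threshold.

    GPUs with fewer than min_samples readings are skipped so that a
    single low reading does not trigger a flag.
    """
    readings: dict[str, list[float]] = {}
    for sample in samples:
        readings.setdefault(sample.gpu_id, []).append(sample.utilization_pct)
    return sorted(
        gpu_id
        for gpu_id, values in readings.items()
        if len(values) >= min_samples and mean(values) < util_threshold
    )


# Made-up readings: gpu-0 is nearly idle, gpu-1 is busy.
samples = [
    GpuSample("gpu-0", 2.0), GpuSample("gpu-0", 3.0), GpuSample("gpu-0", 1.0),
    GpuSample("gpu-1", 90.0), GpuSample("gpu-1", 85.0), GpuSample("gpu-1", 95.0),
]
print(find_underutilized(samples))
```

A production system would also need to correlate these readings with the workloads scheduled on each GPU, which this sketch leaves out.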

Another key capability is proactive risk detection. The platform is designed to identify thermal, power, and reliability issues early, so teams can address them before they disrupt production AI services. It also analyzes performance across multi-node and multi-GPU environments, which are critical for supporting large-scale agentic workloads.

Industry Perspective on Growing Complexity

Luke Congdon, Vice President of Product Management at Nutanix, highlighted the increasing importance of visibility as AI systems become more sophisticated in production environments.

“As enterprises adopt the Nutanix Agentic AI platform to build and run intelligent, distributed AI systems, understanding how those workloads behave across infrastructure and services becomes critical,” he said. “Virtana’s extension of observability into Nutanix Enterprise AI helps provide that visibility, enabling organizations to operate AI factories with greater performance, efficiency, and control.”

Shifting Challenges in AI Deployment

Meanwhile, Paul Appleby, Chief Executive Officer of Virtana, emphasized that organizations have already overcome the initial challenge of deploying AI infrastructure. The real difficulty now lies in managing dynamic, distributed AI systems at scale.

“Enterprises have proven they can stand up AI infrastructure,” he said. “The challenge now is operating agentic AI environments where systems reason, adapt, and act across distributed resources. These are dynamic systems that demand full-stack visibility and control to optimize GPU utilization, manage cost efficiency, and support thousands of concurrent agents with the performance and governance required for production at scale.”

In addition, Virtana pointed out that Nutanix Enterprise AI plays a critical role in these ecosystems. Since it hosts models, agent services, and enterprise AI workflows, it becomes a central point for observing inference performance, infrastructure contention, and system reliability in a unified manner.

Reinforcing this view, Amitkumar Rathi, Chief Product Officer at Virtana, explained the need for deeper operational insight.

“AI workloads are no longer static. They are increasingly agentic, continuously adapting how they consume infrastructure,” he said. “By extending AI Factory Observability into Nutanix Enterprise AI, we give organizations end-to-end visibility and control across the layer where AI services are built and operated, while connecting that activity back to the infrastructure supporting it. Platform teams can manage performance, reliability, and cost with greater precision, and data teams gain the operational context required to run AI in production with confidence.”
