A new class of vulnerabilities known as “Comment and Control” prompt injection attacks has been discovered in AI-powered developer tools, exposing critical security risks in modern software workflows. Researchers led by Aonan Guan at Johns Hopkins University revealed that widely used AI agents including Claude Code Security Review, Google Gemini CLI Action, and GitHub Copilot Agent can be manipulated using standard GitHub communication channels.
Unlike traditional attacks, this technique does not rely on external infrastructure. Instead, attackers embed malicious instructions directly into pull request titles, issue descriptions, or comments. Because these AI agents are designed to analyze repository content, they unknowingly ingest these instructions as part of their workflow. As a result, they fail to distinguish between legitimate system prompts and attacker-controlled input.
Consequently, once the AI agent processes the manipulated content, it may execute unauthorized commands using the permissions of the GitHub Actions environment. This can lead to the exposure of sensitive data, including API keys, environment variables, and access tokens posing a serious risk to development pipelines.
The research highlights how each platform is affected differently. For instance, Claude Code Security Review integrates pull request data directly into its prompts without proper sanitization. Attackers can inject commands such as system-level queries, which the agent executes. In some cases, extracted credentials are then included in public pull request comments or logged within workflows. This vulnerability was rated critical, with a CVSS score of 9.4, and has since been partially mitigated.
Similarly, the Google Gemini CLI Action is vulnerable through issue-based workflows. Attackers can append deceptive instructions such as a fake “Trusted Content Section” to override built-in safety mechanisms. As a result, the system may output sensitive data like API keys directly into public repositories.
Meanwhile, the GitHub Copilot Agent demonstrates an even more advanced exploitation method. In this case, attackers hide malicious instructions inside invisible HTML comments within issues. When assigned to the agent, it processes the hidden payload and executes commands such as system process enumeration. Notably, attackers can then encode the output to bypass detection mechanisms and exfiltrate sensitive data through standard repository actions like commits or pull requests.
Furthermore, this technique successfully bypasses multiple layers of security. It circumvents environment filtering by accessing parent process variables, evades secret scanning through encoding methods, and avoids network restrictions by using trusted platforms like GitHub itself for data exfiltration.
At its core, the issue stems from a fundamental architectural challenge. AI agents require access to powerful tools and sensitive credentials to function effectively. However, they must also process untrusted, user-generated input as part of normal development workflows. Therefore, this overlap creates an inherent security gap that attackers can exploit.
Ultimately, the findings underscore a critical need for stronger safeguards in AI-driven development environments. Until organizations separate untrusted input processing from privileged execution contexts, these systems will remain vulnerable to indirect prompt injection attacks regardless of existing model-level defenses.
Recommended Cyber Technology News:
- Cato Networks Launches Enterprise Browser for Unified Zero Trust Security
- LACOE Investigates Potential Data Breach Incident
- Elisa and Valmet Strengthen Cybersecurity and Network Collaboration
To participate in our interviews, please write to our CyberTech Media Room at info@intentamplify.com
