
llm-app-security-reviewer

Review LLM, generative AI, RAG, agent, prompt, embedding, vector database, MCP, and tool-calling application changes for security risks including prompt injection, data exfiltration, unsafe tool permissions, sensitive data leakage, retrieval boundary failures, insecure model-output trust, weak guardrails, secrets exposure, prompt/completion logging, and compliance issues. Use when asked to review AI app security, agent security, prompt safety, RAG security, model integration security, tool/function calling, vector stores, MCP servers, AI gateways, or LLM-related code.

experimental
IDE: codex
Version: 1.0.0
Owner: jnishan5
Tags: llm-security, ai-security, rag, prompt-injection, agents, tool-calling, mcp, security-review, privacy

LLM App Security Reviewer

Use this skill to review AI application changes for security, privacy, and trust-boundary failures. Stay in review mode unless the user asks for implementation.

Workflow

  1. Identify the AI surface: prompts, system/developer instructions, model clients, RAG retrievers, embedding pipelines, vector stores, tools/functions, agent loops, MCP servers, gateways, evals, telemetry, and policy config.
  2. Map trust boundaries: user input, retrieved documents, model output, tool output, tenant/member context, credentials, network/file/database access, and human approval steps.
  3. Read changed files first. Then run targeted searches or the bundled scanner to find additional evidence.
  4. If repository context is available, run:
     python /path/to/llm-app-security-reviewer/scripts/scan_llm_app_risks.py . --json /tmp/llm-app-security-scan.json
  5. Treat scanner hits as leads, not findings. Validate each issue in context before reporting it.
  6. Review against the focus areas below, then lead the final response with findings ordered by severity and grounded in file/line references.
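When the bundled scanner is not available, the "targeted searches" in the workflow above could be approximated with a small script along these lines. This is a minimal sketch, not the bundled scanner: the pattern names, the regexes, and the helpers `find_leads` and `scan_tree` are all hypothetical and chosen only for illustration. Like scanner hits, its matches are leads to validate in context, not findings.

```python
import re
from pathlib import Path

# Hypothetical lead patterns; illustrative only, not the bundled scanner's rules.
LEAD_PATTERNS = {
    "model-output-exec": re.compile(r"\b(eval|exec)\s*\("),
    "prompt-logging": re.compile(
        r"log\w*\.(debug|info)\(.*(prompt|completion)", re.IGNORECASE
    ),
    "hardcoded-key": re.compile(
        r"(api[_-]?key|secret)\s*=\s*['\"][^'\"]+['\"]", re.IGNORECASE
    ),
}


def find_leads(text, path="<memory>"):
    """Return (path, line_number, label, line) tuples for lines matching any pattern."""
    leads = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in LEAD_PATTERNS.items():
            if pattern.search(line):
                leads.append((path, lineno, label, line.strip()))
    return leads


def scan_tree(root, suffixes=(".py",)):
    """Collect leads from every file under root whose suffix is in suffixes."""
    leads = []
    for file in sorted(Path(root).rglob("*")):
        if file.is_file() and file.suffix in suffixes:
            leads.extend(find_leads(file.read_text(errors="ignore"), str(file)))
    return leads
```

Each tuple carries a file path and line number, so a validated lead can be cited directly in the final report.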

Review Focus

  • Prompt injection: untrusted text can override system intent, hide instructions, change tool use, or leak hidden context.
  • RAG data leakage: retrieval lacks user/tenant/role filters, sends excessive context, exposes raw source documents, or indexes sensitive data without lifecycle controls.
  • Tool and function calling: model-selected tools can read, write, execute, browse, mutate state, call internal services, or access secrets without allowlists, schema validation, authorization, dry-run modes, or approval gates.
  • Model-output trust: LLM output is treated as executable code, SQL, HTML, policy, entitlement, identity proof, medical/financial decisioning, or compliance evidence without deterministic validation.
  • Sensitive data handling: prompts, completions, embeddings, traces, analytics, eval datasets, cache entries, and audit logs contain PHI, PII, credentials, access tokens, or internal confidential data.
  • Agent control flow: loops lack budgets, termination checks, scoped memory, replay protection, tenant isolation, or clear human escalation.
  • Provider and deployment config: external model endpoints, data retention settings, regional controls, model version drift, and fallback providers are not documented or governed.
  • Tests and evals: no regression tests for prompt injection, retrieval isolation, tool authorization, refusal boundaries, or sensitive-data leakage.
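Two of the controls named above, tenant-scoped retrieval and gated tool dispatch, can be sketched as follows. This is an illustrative pattern to compare reviewed code against, not a reference implementation: `Chunk`, `retrieve_for_tenant`, `TOOL_POLICY`, `ToolCallRejected`, and `authorize_tool_call` are hypothetical names, and a real system would also check the caller's role and enforce filtering inside the vector store query itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    tenant_id: str
    text: str


def retrieve_for_tenant(chunks, tenant_id, limit=3):
    """Drop chunks from other tenants before anything reaches the prompt."""
    allowed = [c for c in chunks if c.tenant_id == tenant_id]
    return allowed[:limit]


# Hypothetical tool registry: name -> (required argument types, needs human approval).
TOOL_POLICY = {
    "search_docs": ({"query": str}, False),
    "delete_record": ({"record_id": int}, True),
}


class ToolCallRejected(Exception):
    """Raised when a model-proposed tool call fails a deterministic check."""


def authorize_tool_call(name, args, approved=False):
    """Validate a model-proposed tool call; raise instead of executing on any failure."""
    if name not in TOOL_POLICY:
        raise ToolCallRejected(f"tool not allowlisted: {name}")
    schema, needs_approval = TOOL_POLICY[name]
    if set(args) != set(schema):
        raise ToolCallRejected(f"unexpected arguments for {name}")
    for key, expected in schema.items():
        if not isinstance(args[key], expected):
            raise ToolCallRejected(f"bad type for {key}")
    if needs_approval and not approved:
        raise ToolCallRejected(f"{name} requires human approval")
    return name, dict(args)
```

The design choice worth checking in review is that every decision here is deterministic and made outside the model: the model can only propose a tool name and arguments, and the allowlist, schema check, and approval gate decide whether the call proceeds.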

Severity Guide

  • Critical: likely cross-tenant data exposure, credential exfiltration, arbitrary code execution, unauthorized mutation, or PHI/PII disclosure to an unapproved system.
  • High: exploitable prompt/tool/RAG bypass with realistic attacker control, missing authorization on AI-mediated actions, or persistent sensitive-data leakage.
  • Medium: weak guardrails, incomplete validation, excessive context sharing, missing tests, or risky logging with limited exposure.
  • Low: hardening gaps, documentation gaps, observability gaps, or defense-in-depth improvements.

Output

  • Lead with findings first, ordered by severity.
  • Include file and line references when possible.
  • For each finding, explain the attack path, impact, and concrete remediation.
  • Separate confirmed issues from "needs verification" items.
  • If no issues are found, say so clearly and list meaningful residual risks or test gaps.
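The output rules above could be captured in a small data structure and renderer like the one below. It is a sketch under stated assumptions: the `Finding` fields, `SEVERITY_ORDER`, and `render_report` are hypothetical names, and the text layout is illustrative rather than the shape defined in references/report-template.md.

```python
from dataclasses import dataclass

# Severity labels follow the Severity Guide; the numeric ranks are an assumption.
SEVERITY_ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}


@dataclass
class Finding:
    severity: str
    title: str
    location: str  # file and line reference, when available
    attack_path: str
    impact: str
    remediation: str
    confirmed: bool = True  # False marks a "needs verification" item


def _format(f):
    """Format one finding with its attack path, impact, and remediation."""
    return (
        f"  [{f.severity}] {f.title} ({f.location})\n"
        f"    Attack path: {f.attack_path}\n"
        f"    Impact: {f.impact}\n"
        f"    Remediation: {f.remediation}"
    )


def render_report(findings, residual_risks=()):
    """Render findings first, ordered by severity, with unverified items kept separate."""
    ordered = sorted(findings, key=lambda f: SEVERITY_ORDER[f.severity])
    confirmed = [f for f in ordered if f.confirmed]
    unverified = [f for f in ordered if not f.confirmed]
    lines = []
    if confirmed:
        lines.append("Findings")
        lines.extend(_format(f) for f in confirmed)
    else:
        lines.append("No confirmed issues found.")
    if unverified:
        lines.append("Needs verification")
        lines.extend(_format(f) for f in unverified)
    if residual_risks:
        lines.append("Residual risks and test gaps")
        lines.extend(f"  - {risk}" for risk in residual_risks)
    return "\n".join(lines)
```

Calling `render_report([])` with a list of residual risks produces the explicit no-findings statement followed by those risks, which matches the last rule in the list.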

Reference Files

  • references/checklist.md - Detailed review checklist and grep targets.
  • references/rag-security.md - RAG and vector-store security review guide.
  • references/tool-calling-security.md - Tool/function/agent/MCP security review guide.
  • references/report-template.md - Finding and no-finding report shapes.
