llm-app-security-reviewer

Review LLM, generative AI, RAG, agent, prompt, embedding, vector database, MCP, and tool-calling application changes for security risks including prompt injection, data exfiltration, unsafe tool permissions, sensitive data leakage, retrieval boundary failures, insecure model-output trust, weak guardrails, secrets exposure, prompt/completion logging, and compliance issues. Use when asked to review AI app security, agent security, prompt safety, RAG security, model integration security, tool/function calling, vector stores, MCP servers, AI gateways, or LLM-related code.

experimental

IDE: codex

Version: 1.0.0

Owner: jnishan5

Tags: llm-security, ai-security, rag, prompt-injection, agents, tool-calling, mcp, security-review, privacy

LLM App Security Reviewer

Use this skill to review AI application changes for security, privacy, and trust-boundary failures. Stay in review mode unless the user asks for implementation.

Workflow

Identify the AI surface: prompts, system/developer instructions, model clients, RAG retrievers, embedding pipelines, vector stores, tools/functions, agent loops, MCP servers, gateways, evals, telemetry, and policy config.
Map trust boundaries: user input, retrieved documents, model output, tool output, tenant/member context, credentials, network/file/database access, and human approval steps.
Read changed files first. Then run targeted searches or the bundled scanner to find additional evidence.
If repository context is available, run:

python /path/to/llm-app-security-reviewer/scripts/scan_llm_app_risks.py . --json /tmp/llm-app-security-scan.json

Treat scanner hits as leads, not findings. Validate each issue in context before reporting it.
Review against the focus areas below, then lead the final response with findings ordered by severity and grounded in file/line references.
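The "targeted searches" step can be approximated with a small script when the bundled scanner is unavailable. The sketch below is an assumption for illustration only: it is not the bundled `scan_llm_app_risks.py`, and the `LEAD_PATTERNS` table and `find_leads` helper are hypothetical names with deliberately narrow patterns. As with scanner hits, anything it returns is a lead to validate in context, not a finding.

```python
import re
from pathlib import Path

# Hypothetical lead patterns (assumptions); tune them to the codebase under review.
LEAD_PATTERNS = {
    "dynamic-exec": re.compile(r"\b(eval|exec)\s*\("),
    "shell-call": re.compile(r"\bsubprocess\.|\bos\.system\s*\("),
    "prompt-logging": re.compile(
        r"log\w*\.\w+\(.*\b(prompt|completion)\b", re.IGNORECASE
    ),
}


def find_leads(root, suffixes=(".py",)):
    """Return (path, line_number, label, text) tuples for lines matching a pattern."""
    leads = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        except OSError:
            continue  # unreadable files are skipped, not treated as clean
        for number, text in enumerate(lines, start=1):
            for label, pattern in LEAD_PATTERNS.items():
                if pattern.search(text):
                    leads.append((str(path), number, label, text.strip()))
    return leads


if __name__ == "__main__":
    for path, number, label, text in find_leads("."):
        print(f"{path}:{number}: [{label}] {text}")
```

Each printed line carries a file and line reference, which makes it easy to open the surrounding code and decide whether the lead is a real trust-boundary problem.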

Review Focus

Prompt injection: untrusted text can override system intent, hide instructions, change tool use, or leak hidden context.
RAG data leakage: retrieval lacks user/tenant/role filters, sends excessive context, exposes raw source documents, or indexes sensitive data without lifecycle controls.
Tool and function calling: model-selected tools can read, write, execute, browse, mutate state, call internal services, or access secrets without allowlists, schema validation, authorization, dry-run modes, or approval gates.
Model-output trust: LLM output is treated as executable code, SQL, HTML, policy, entitlement, identity proof, a medical or financial decision, or compliance evidence without deterministic validation.
Sensitive data handling: prompts, completions, embeddings, traces, analytics, eval datasets, cache entries, and audit logs contain PHI, PII, credentials, access tokens, or internal confidential data.
Agent control flow: loops lack budgets, termination checks, scoped memory, replay protection, tenant isolation, or clear human escalation.
Provider and deployment config: external model endpoints, data retention settings, regional controls, model version drift, and fallback providers are not documented or governed.
Tests and evals: no regression tests for prompt injection, retrieval isolation, tool authorization, refusal boundaries, or sensitive-data leakage.
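As a reference point for the tool and function calling item above, the following is a minimal sketch of what a deterministic gate in front of model-selected tools might look like. Everything in it is hypothetical: the `ToolPolicy` type, the `TOOL_POLICIES` registry, the tool names, and the role names are assumptions chosen for illustration, not part of any real framework. During review, the absence of an equivalent check between the model's proposed call and its execution is the thing to look for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolPolicy:
    required_args: dict           # argument name -> expected Python type
    allowed_roles: frozenset      # caller roles permitted to invoke the tool
    needs_approval: bool = False  # True for state-changing tools


# Hypothetical registry (assumption): any tool not listed here is denied.
TOOL_POLICIES = {
    "search_docs": ToolPolicy({"query": str}, frozenset({"member", "admin"})),
    "delete_record": ToolPolicy(
        {"record_id": int}, frozenset({"admin"}), needs_approval=True
    ),
}


def authorize_tool_call(name, args, caller_role, approved=False):
    """Return (allowed, reason) for a model-proposed tool call."""
    policy = TOOL_POLICIES.get(name)
    if policy is None:
        return False, "tool not on allowlist"
    if caller_role not in policy.allowed_roles:
        return False, "caller role not authorized"
    if set(args) != set(policy.required_args):
        return False, "unexpected or missing arguments"
    for key, expected in policy.required_args.items():
        # Exact type match, so a bool is not accepted where an int is expected.
        if type(args[key]) is not expected:
            return False, f"argument {key!r} has wrong type"
    if policy.needs_approval and not approved:
        return False, "human approval required"
    return True, "ok"
```

The key design choice is that authorization depends on the caller's role and an explicit approval flag supplied by the application, never on anything the model says about itself.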

Severity Guide

Critical: likely cross-tenant data exposure, credential exfiltration, arbitrary code execution, unauthorized mutation, or PHI/PII disclosure to an unapproved system.
High: exploitable prompt/tool/RAG bypass with realistic attacker control, missing authorization on AI-mediated actions, or persistent sensitive-data leakage.
Medium: weak guardrails, incomplete validation, excessive context sharing, missing tests, or risky logging with limited exposure.
Low: hardening gaps, documentation gaps, observability gaps, or defense-in-depth improvements.

Output

Lead with findings, ordered by severity.
Include file and line references when possible.
For each finding, explain the attack path, impact, and concrete remediation.
Separate confirmed issues from "needs verification" items.
If no issues are found, say so clearly and list meaningful residual risks or test gaps.
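One way to picture these output rules is as a small data structure plus a renderer. The sketch below is illustrative only: the `Finding` fields, the `render_report` helper, and the text layout are assumptions and do not reproduce references/report-template.md. It orders findings by the severity levels from the guide above, separates confirmed issues from items that need verification, and falls back to a clear no-findings statement with residual risks.

```python
from dataclasses import dataclass

# Severity levels from the guide, most severe first.
SEVERITY_ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}


@dataclass
class Finding:
    severity: str
    title: str
    location: str      # e.g. "app/agent.py:42"; empty string if unknown
    attack_path: str
    impact: str
    remediation: str
    confirmed: bool = True  # False means "needs verification"


def render_report(findings, residual_risks=()):
    """Render findings first, ordered by severity, as plain text."""
    if not findings:
        lines = ["No issues found."]
        lines += [f"Residual risk: {risk}" for risk in residual_risks]
        return "\n".join(lines)
    ordered = sorted(findings, key=lambda f: SEVERITY_ORDER[f.severity])
    lines = []
    for heading, wanted in (("Confirmed findings", True), ("Needs verification", False)):
        group = [f for f in ordered if f.confirmed is wanted]
        if not group:
            continue
        lines.append(heading)
        for f in group:
            where = f" ({f.location})" if f.location else ""
            lines.append(f"[{f.severity}] {f.title}{where}")
            lines.append(f"  Attack path: {f.attack_path}")
            lines.append(f"  Impact: {f.impact}")
            lines.append(f"  Remediation: {f.remediation}")
    return "\n".join(lines)
```

An unknown severity label raises a `KeyError` rather than being silently sorted last, which keeps mislabeled findings from slipping to the bottom of a report.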

Reference Files

references/checklist.md - Detailed review checklist and grep targets.
references/rag-security.md - RAG and vector-store security review guide.
references/tool-calling-security.md - Tool/function/agent/MCP security review guide.
references/report-template.md - Finding and no-finding report shapes.
