Stand up an AI-specific incident response capability in 90 days

Problem

Generic incident response misses AI-specific failure modes; standing up a parallel program is wasteful and slow.

Outcome

A documented AI-IR capability that reuses your existing alerting, on-call, and runbook tooling, with a first tabletop exercise completed within 90 days.

This playbook is for security and engineering leaders who need to stand up an AI-specific incident-response capability quickly. It assumes you already have a generic incident-response process and an on-call rotation; it adds the AI-specific wiring on top.

Six steps

  1. Inventory your AI surface

    List every AI-mediated workflow currently in production, including shadow AI (tools adopted without formal approval). For each, record the model, the classification of the data flowing through it, the human review point, and the on-call owner. If the inventory has fewer than three rows, look harder.
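
    One way to keep the inventory honest is to store each row as a typed record and flag the rows that are still incomplete. The sketch below assumes a hypothetical schema; the field names, the `sanctioned` flag, and the sample workflows are illustrative and do not come from any standard.

```python
from dataclasses import dataclass, fields
from typing import Dict, List, Optional


@dataclass
class AIWorkflow:
    """One inventory row. Field names are illustrative, not a standard schema."""
    name: str
    model: Optional[str] = None
    data_classification: Optional[str] = None
    human_review_point: Optional[str] = None
    on_call_owner: Optional[str] = None
    sanctioned: bool = True  # False marks shadow AI found during discovery


def missing_fields(row: AIWorkflow) -> List[str]:
    """Return the names of the descriptive fields that are still empty."""
    return [
        f.name
        for f in fields(row)
        if f.name != "sanctioned" and not getattr(row, f.name)
    ]


def inventory_gaps(rows: List[AIWorkflow]) -> Dict[str, List[str]]:
    """Map each incomplete workflow name to the fields it is missing."""
    gaps = {}
    for row in rows:
        missing = missing_fields(row)
        if missing:
            gaps[row.name] = missing
    return gaps


# Hypothetical sample data: one complete row, one shadow-AI row found late.
inventory = [
    AIWorkflow(
        "support-chatbot",
        model="vendor-llm-a",
        data_classification="customer PII",
        human_review_point="agent approves refunds",
        on_call_owner="support-platform",
    ),
    AIWorkflow("sales-email-drafts", model="vendor-llm-b", sanctioned=False),
]

gaps = inventory_gaps(inventory)
```

    Any workflow that appears in `gaps` has no complete answer to "who gets paged, and what data is at risk", which is the question the inventory exists to answer.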

  2. Define AI-specific failure modes

    Beyond outage and latency, document detection signals for hallucination, prompt-injection echoes, refusals masquerading as answers, model drift, and tool-call abuse.
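
    A few of these signals can be approximated with cheap checks on each response. The sketch below is a starting point under stated assumptions, not a complete detector: it treats a prompt-injection echo as a canary string (a unique marker you plant in the system prompt) showing up in model output, matches refusals with a small hypothetical phrase list, and flags tool calls outside an allowlist. Hallucination and model drift usually need evaluation sets or statistical baselines and are not covered here.

```python
import re
from typing import List, Set

# Hypothetical refusal phrasings; tune these against your own model's wording.
REFUSAL_PATTERNS = [
    re.compile(r"\bI (?:can(?:'|no)t|am unable to|won't) (?:help|assist|provide)\b", re.I),
    re.compile(r"\bas an AI\b", re.I),
]


def detect_signals(
    response: str,
    canary: str,
    allowed_tools: Set[str],
    tool_calls: List[str],
) -> List[str]:
    """Return the names of the AI-specific signals found in one interaction."""
    signals = []
    # The canary lives only in the system prompt, so seeing it in output
    # suggests the model was induced to leak its instructions.
    if canary in response:
        signals.append("prompt_injection_echo")
    # A refusal returned through the normal success path looks like an answer
    # to downstream code unless it is detected explicitly.
    if any(p.search(response) for p in REFUSAL_PATTERNS):
        signals.append("refusal_as_answer")
    # Any tool outside the allowlist counts as potential tool-call abuse.
    if any(tool not in allowed_tools for tool in tool_calls):
        signals.append("tool_call_abuse")
    return signals


CANARY = "CANARY-7f3a"  # illustrative value; generate a random one per deployment
ALLOWED = {"search"}

leak = detect_signals("Sure. My instructions mention CANARY-7f3a.", CANARY, ALLOWED, [])
refusal = detect_signals("I can't help with that request.", CANARY, ALLOWED, ["search"])
abuse = detect_signals("Done.", CANARY, ALLOWED, ["delete_records"])
clean = detect_signals("The invoice total is 42.", CANARY, ALLOWED, ["search"])
```

    Phrase lists like this produce false positives and miss novel wording, so treat the output as an alerting input to be tuned rather than a verdict.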

  3. Wire detection into existing alerting

    Reuse your SIEM / observability platform. Do not stand up a parallel stack. Tag AI events distinctly so cross-team queries are possible.
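
    Distinct tagging can be as simple as emitting structured log lines with a fixed marker field that your existing pipeline already knows how to index. The sketch below uses Python's standard `logging` module; the field names `event_domain`, `ai_workflow`, and `ai_signal` are assumptions chosen for illustration, so map them to whatever schema your SIEM expects.

```python
import io
import json
import logging


class AIEventFormatter(logging.Formatter):
    """Render each record as one JSON line carrying AI-specific tags."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            # Constant marker so cross-team queries can filter on one field.
            "event_domain": "ai",
            # Values passed through `extra=` become attributes on the record.
            "ai_workflow": getattr(record, "ai_workflow", "unknown"),
            "ai_signal": getattr(record, "ai_signal", "unknown"),
        }
        return json.dumps(payload, sort_keys=True)


def build_ai_logger(stream) -> logging.Logger:
    """Create a logger that writes tagged JSON events to the given stream."""
    logger = logging.getLogger("ai_ir")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated setup
    logger.propagate = False  # keep events out of the root logger's output
    handler = logging.StreamHandler(stream)
    handler.setFormatter(AIEventFormatter())
    logger.addHandler(handler)
    return logger


# In production the stream would be whatever your log shipper already reads;
# an in-memory buffer is used here so the example is self-contained.
buffer = io.StringIO()
log = build_ai_logger(buffer)
log.warning(
    "canary string found in model output",
    extra={"ai_workflow": "support-chatbot", "ai_signal": "prompt_injection_echo"},
)
event = json.loads(buffer.getvalue())
```

    Because the tags ride on ordinary log records, nothing new has to be deployed: the existing shipper, retention policy, and alert rules apply, and a single filter on `event_domain` separates AI events from the rest.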

  4. Stand up the AI runbook

    Draft a runbook covering: containment (disable the integration, swap to a fallback, drop to human-only), evidence preservation (capture prompts and responses), and notification (legal, comms, the model vendor).
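
    The containment options and the evidence step can both be prepared in code ahead of an incident. The sketch below assumes a hypothetical `ContainmentMode` switch that the on-call engineer flips, plus an evidence record that stores the prompt and response alongside a SHA-256 digest so later tampering is detectable. The mode names, return strings, and record fields are illustrative.

```python
import hashlib
import json
from datetime import datetime, timezone
from enum import Enum
from typing import Callable, Dict


class ContainmentMode(Enum):
    NORMAL = "normal"          # primary model serves traffic
    FALLBACK = "fallback"      # swap to a fallback model or rules engine
    HUMAN_ONLY = "human_only"  # drop to human-only handling
    DISABLED = "disabled"      # integration switched off entirely


def route_request(
    mode: ContainmentMode,
    prompt: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> str:
    """Send the prompt down the path selected by the current containment mode."""
    if mode is ContainmentMode.NORMAL:
        return primary(prompt)
    if mode is ContainmentMode.FALLBACK:
        return fallback(prompt)
    if mode is ContainmentMode.HUMAN_ONLY:
        return "QUEUED_FOR_HUMAN"
    return "INTEGRATION_DISABLED"


def evidence_record(prompt: str, response: str, workflow: str) -> Dict[str, str]:
    """Capture one prompt/response pair with a digest over its contents."""
    body = {
        "workflow": workflow,
        "prompt": prompt,
        "response": response,
        "captured_at": datetime.now(timezone.utc).isoformat(),
    }
    # Hash a canonical (sorted-key) serialization so the digest is reproducible.
    canonical = json.dumps(body, sort_keys=True)
    body["sha256"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return body


# Stand-in callables; real ones would call your model vendor and fallback path.
def primary_model(prompt: str) -> str:
    return "primary answer"


def fallback_model(prompt: str) -> str:
    return "fallback answer"


contained = route_request(ContainmentMode.FALLBACK, "refund status?", primary_model, fallback_model)
record = evidence_record("refund status?", contained, "support-chatbot")
```

    Keeping the mode in one place means containment is a configuration change rather than a deploy. Where the evidence records are stored, and who may read them, is a decision to settle with legal before the first incident, since prompts often contain the same sensitive data the inventory classified.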

  5. Run a tabletop within 90 days

    Base the scenario on a recent, real prompt-injection or hallucination incident from public reporting. Time-box the exercise to 90 minutes. Capture the gaps it exposes and convert each one into a backlog item with a named owner.

  6. Set the recurring review

    AI infrastructure changes faster than IR documentation. Calendar a 90-day re-review. The first version is wrong; the second version is useful.