AI Change, Drift & Incident Review Questionnaire

Who this questionnaire is for
Operational teams, AI platform owners, incident response leads, SREs, and governance teams reviewing live systems or post-incident behaviour.

What it assesses
Operational risk in deployed AI systems — including drift detection, incident handling, rollback authority, escalation paths, evidence retention, and post-incident learning.

How it helps
This questionnaire functions as a live operational review tool. It supports decisions to continue, pause, roll back, or re-approve AI systems after changes, incidents, or observed drift. Outputs are designed to be recorded, shared, and referenced as part of operational governance.

Best used when

  • Reviewing incidents or near-misses
  • Assessing readiness after model or data changes
  • Determining whether systems can safely remain in production

AI Change, Drift & Incident Review

The questionnaire scores automatically as you click. The top fields are optional; only the questions affect scoring.

Optional fields: these improve your governance record, but are not required for scoring.

1) Change Classification & Re-approval Triggers

Is there a clear classification of the change (minor vs material), with a documented decision and owner?

Change gating
Material changes should trigger re-evaluation and possibly re-approval.

Are re-approval triggers defined (and actually used) for the system?

Release control
Examples: new model, new retrieval domains, new tools/actions, new user groups.
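
For illustration, a minimal sketch of triggers encoded so they are actually used rather than only documented; the trigger names mirror the examples above and are otherwise assumptions:

    # Hypothetical trigger set mirroring the examples above. Any change that
    # touches one of these is classified as material and routed to re-approval.
    MATERIAL_TRIGGERS = {
        "new_model",
        "new_retrieval_domain",
        "new_tool_or_action",
        "new_user_group",
    }

    def classify_change(change_tags: set[str]) -> str:
        """Return 'material' if any re-approval trigger fired, else 'minor'."""
        return "material" if change_tags & MATERIAL_TRIGGERS else "minor"

    # Example: swapping the model and adding a tool is a material change.
    assert classify_change({"new_model", "new_tool_or_action"}) == "material"
    assert classify_change({"prompt_typo_fix"}) == "minor"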

2) Drift Monitoring & Post-Deployment Signals

Do you measure drift with concrete signals (quality, safety, grounding, latency/cost) on an ongoing basis?

Monitoring
Drift includes retrieval/policy/prompt drift and user behaviour drift — not only model drift.

Are thresholds defined for “stop / rollback / investigate” and connected to operational actions?

Actionability
Monitoring without action thresholds is not governance.
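
As an illustration of thresholds connected to operational actions, the sketch below gives each drift signal an explicit breach action; the signal names and limits are placeholder assumptions, not recommended values:

    from dataclasses import dataclass

    @dataclass
    class DriftRule:
        signal: str       # metric name, e.g. "grounding_rate"
        direction: str    # "min": breach when value falls below; "max": when above
        threshold: float
        action: str       # "investigate", "rollback", or "stop"

    # Hypothetical rules for a deployed assistant (values are placeholders).
    RULES = [
        DriftRule("grounding_rate", "min", 0.90, "investigate"),
        DriftRule("safety_flag_rate", "max", 0.02, "rollback"),
        DriftRule("p95_latency_ms", "max", 4000, "stop"),
    ]

    def triggered_actions(metrics: dict[str, float]) -> list[str]:
        """Map current metric values to the operational actions they trigger."""
        actions = []
        for r in RULES:
            value = metrics.get(r.signal)
            if value is None:
                continue  # a missing signal is itself worth flagging in practice
            breached = value < r.threshold if r.direction == "min" else value > r.threshold
            if breached:
                actions.append(r.action)
        return actions

    # Example: degraded grounding plus a latency spike.
    print(triggered_actions({"grounding_rate": 0.84, "p95_latency_ms": 5200}))
    # -> ['investigate', 'stop']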

3) Incident Response & Escalation

Is there an incident playbook specific to this AI system, including escalation paths and authority?

Incident response
Include who can pause the system and how evidence is preserved.

After incidents/near-misses, do you complete a post-incident review with tracked follow-through?

Learning loop
A governance system that cannot learn from incidents will repeat them.

4) Evidence, Logging & Auditability

Can you reconstruct what the system did for a given event (inputs, context, outputs, tool actions)?

Evidence
Logging should be privacy-aware and access controlled.
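
One way to make reconstruction concrete is a per-event record keyed by a single event ID; a sketch, with field names as assumptions and sensitive fields redacted per policy:

    from dataclasses import dataclass, field

    @dataclass
    class EventRecord:
        event_id: str                 # single key used to reconstruct the event
        timestamp: str                # ISO 8601
        model_version: str            # exact model version in effect
        prompt_version: str           # exact prompt/policy version in effect
        inputs: str                   # redacted or hashed where policy requires
        context_refs: list[str] = field(default_factory=list)   # retrieved docs, memory
        tool_actions: list[dict] = field(default_factory=list)  # name, args, result
        output: str = ""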

If the system uses retrieval (RAG), do you capture retrieval evidence (sources, top-k, scores) per response?

RAG
Governed RAG requires evidence of what was retrieved.
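
The same kind of record can carry per-response retrieval evidence; a sketch assuming a simple source/rank/score shape:

    from dataclasses import dataclass

    @dataclass
    class RetrievalEvidence:
        source_id: str    # document or chunk identifier
        rank: int         # position in the top-k results
        score: float      # retriever similarity score

    # Attached per response: which sources were retrieved, in what order,
    # with what scores, so grounding can be audited after the fact.
    evidence = [
        RetrievalEvidence("kb/returns-policy#3", rank=1, score=0.91),
        RetrievalEvidence("kb/shipping-faq#7", rank=2, score=0.74),
    ]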

5) Access, Permissions & Operational Boundaries

Are access permissions and tool/action boundaries documented and enforced (least privilege)?

Permissions
Over-permissioning is a common governance failure.
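
Least privilege is easiest to verify when the boundary is enforced in code, not only documented; a deny-by-default sketch, with allowlist contents as assumptions:

    # Hypothetical per-tool allowlist: the system may only call listed tools,
    # and only within the stated argument constraints.
    ALLOWED_TOOLS = {
        "search_kb": {"max_results": 10},
        "create_ticket": {"queues": {"support"}},  # no access to other queues
    }

    def authorize(tool: str, args: dict) -> bool:
        """Deny by default; permit only allowlisted tools within their bounds."""
        policy = ALLOWED_TOOLS.get(tool)
        if policy is None:
            return False
        if tool == "search_kb":
            return args.get("max_results", 0) <= policy["max_results"]
        if tool == "create_ticket":
            return args.get("queue") in policy["queues"]
        return False

    assert authorize("create_ticket", {"queue": "support"})
    assert not authorize("delete_records", {})  # not on the allowlist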

Are operational limits documented (rate limits, cost caps, latency SLOs) with escalation when exceeded?

Ops limits
Operational limits prevent silent failure modes.
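
Limits only prevent silent failure when a breach produces a visible escalation; a sketch with placeholder caps:

    import logging

    # Placeholder caps; real values belong in the system's documented limits.
    LIMITS = {
        "requests_per_minute": 600,
        "daily_cost_usd": 250.0,
        "p95_latency_ms": 3000,
    }

    def check_limits(usage: dict[str, float]) -> None:
        """Escalate (rather than silently clamp) when any limit is exceeded."""
        for name, cap in LIMITS.items():
            value = usage.get(name, 0.0)
            if value > cap:
                # In production this would page an owner and open a ticket.
                logging.warning("limit exceeded: %s=%.1f (cap %.1f)", name, value, cap)

    check_limits({"daily_cost_usd": 312.4})  # logs a warning for the cost cap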

6) Decision & Required Actions

Can you justify “continue / monitor / re-approve / pause / rollback” with documented owners?

Decision
A governance review should end in an explicit decision and ownership.
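
The explicit decision and its owner can be captured as a small record; a sketch with illustrative field names:

    from dataclasses import dataclass, field

    @dataclass
    class GovernanceDecision:
        decision: str                 # "continue", "monitor", "re-approve",
                                      # "pause", or "rollback"
        rationale: str                # why, with references to the evidence
        owner: str                    # named person accountable for the decision
        remediation_tickets: list[str] = field(default_factory=list)

    record = GovernanceDecision(
        decision="monitor",
        rationale="Grounding rate dipped below threshold once; no user impact.",
        owner="ai-platform-lead",
        remediation_tickets=["OPS-1234"],
    )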

Are remediation actions tracked (ticketed) and verified before the next major release/change?

Follow-through
Governance fails when remediation is not executed.