Risk Assessment Worksheets
Seven cloud-agnostic workbooks that take a healthcare AI system through 84 structured questions
Overview
What these worksheets cover
The set spans seven pillars of healthcare AI risk: clinical safety and efficacy, data privacy and compliance, governance and accountability, operational resilience, security and threat protection, fairness and equity, and transparency and explainability. Each pillar is a standalone Excel workbook with 12 questions, 84 across the set, written with clinical and regulatory practice in mind.
Every question states why it matters, rates the inherent risk if it goes unaddressed, and gives best-practice guidance, so a reviewer can tell what a strong control looks like instead of guessing. Behind each answer sits a formula: the answer (Yes, No, or Partial) is combined with the question's inherent risk to produce a risk level, a score from 0 to 3, and a recommendation. A High-risk control left unaddressed is flagged for governance escalation within 30 days; a control that is fully in place scores zero and simply needs periodic verification.
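As a rough illustration of how such a formula could behave, here is a minimal Python sketch. The workbooks' actual cell formulas are not reproduced here. The weights, the handling of Partial, the "None" level, and the names `score_response` and `Result` are all assumptions; only the 0 to 3 range, the zero score for a control in place, and the remediation wording come from the worksheets.

```python
from dataclasses import dataclass

# Assumed weights: the worksheets state only that scores run from 0 to 3 and
# that a control fully in place scores 0. Everything in between is illustrative.
INHERENT_WEIGHT = {"Low": 1, "Medium": 2, "High": 3}
LEVEL_BY_SCORE = {0: "None", 1: "Low", 2: "Medium", 3: "High"}
RECOMMENDATION = {
    "None": "Control in place; verify periodically.",
    "Low": "Improve within six months.",
    "Medium": "Add to the risk register; remediate within 90 days.",
    "High": "Escalate to governance; remediate within 30 days.",
}


@dataclass(frozen=True)
class Result:
    score: int
    level: str
    recommendation: str


def score_response(answer: str, inherent_risk: str) -> Result:
    """Combine a Yes/No/Partial answer with the question's inherent risk."""
    weight = INHERENT_WEIGHT[inherent_risk]
    if answer == "Yes":
        score = 0
    elif answer == "No":
        score = weight
    elif answer == "Partial":
        # Assumption: a partial control drops the risk one step, never to zero.
        score = max(1, weight - 1)
    else:
        raise ValueError(f"unexpected answer: {answer!r}")
    level = LEVEL_BY_SCORE[score]
    return Result(score, level, RECOMMENDATION[level])


print(score_response("No", "High"))
print(score_response("Partial", "High"))
print(score_response("Yes", "High"))
```

Under these assumed weights, a High-risk question answered No scores 3 and carries the escalation recommendation, while the same question answered Yes scores 0.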
Inside each file
Risk Assessment
Twelve questions, each with why it matters, the inherent risk if left unaddressed, and best-practice guidance. You record Yes, No, or Partial, plus evidence and a remediation plan.
Scoring Guide
Defines the answer and risk levels and rolls your responses into an overall posture band: Strong, Acceptable, Moderate, Elevated, or Critical.
NIST AI RMF Mapping
Traces every question to specific NIST AI RMF 1.0 subcategories, so framework alignment is documented rather than asserted.
Dashboard
Aggregates the question-level risk scores into a per-pillar summary you can bring to a governance review.
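To make the Dashboard's roll-up concrete, the sketch below groups question-level scores by the pillar prefix in each question ID (CS, DP, and so on). The summary fields (`questions`, `total`, `worst`) and the function name `summarize_by_pillar` are hypothetical; the workbook's own dashboard may report different figures.

```python
from collections import defaultdict


def summarize_by_pillar(scores: dict[str, int]) -> dict[str, dict[str, int]]:
    """Roll 0-3 question scores up into one summary row per pillar prefix."""
    summary = defaultdict(lambda: {"questions": 0, "total": 0, "worst": 0})
    for question_id, score in scores.items():
        pillar = question_id.split("-")[0]  # "CS-02" -> "CS"
        row = summary[pillar]
        row["questions"] += 1
        row["total"] += score
        row["worst"] = max(row["worst"], score)
    return dict(summary)


# Illustrative scores only, not real assessment data.
example = {"CS-01": 0, "CS-02": 3, "DP-01": 2}
for pillar, row in summarize_by_pillar(example).items():
    print(pillar, row)
```

Keeping the worst single score next to the total keeps one High-risk gap visible even when the rest of the pillar scores well.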
Seven Pillars
The worksheets
Run a full pass across all seven before a first deployment. Each file stands on its own, so you can also pull a single pillar into a focused review.
Clinical Safety & Efficacy
CS-01 to CS-12
Checks that a model meets clinical safety standards and keeps producing validated outcomes once it is in front of patients. Questions press on whether the system was validated against ground truth representative of your actual population, whether the distinction between advisory and autonomous use is written down, and whether clinicians ever lose the ability to override a recommendation.
- Validation against representative clinical ground truth
- Advisory vs autonomous decision authority
- Confidence scores and uncertainty shown to clinicians
- Human override on every recommendation
- Edge-case testing for rare and underrepresented cohorts
- Outcome drift monitoring and decommissioning criteria
Data Privacy & Compliance
DP-01 to DP-12
Walks the full lifecycle of protected health information, from how it is encrypted and classified to how it leaves the building. It asks whether keys are organization-managed rather than provider-default, whether de-identification happens before data reaches a training pipeline, and whether model outputs that point to an identifiable patient are themselves treated as PHI.
- Encryption at rest with organization-managed keys
- TLS 1.2+ between every AI component
- De-identification before model training
- Business Associate Agreements with all PHI processors
- Data residency and retention enforcement
- Patient access, correction, and deletion handling
AI Governance & Accountability
AG-01 to AG-12
Looks at who actually owns AI decisions inside the organization. It covers whether a body with real authority signs off on deployments, whether a model registry tracks version, owner, and approval history, and whether retraining runs through change management instead of landing in production unannounced. Patient notification and a working escalation path are part of the same picture.
- Governance committee with deployment authority
- Mandatory pre-production risk assessment
- Model registry with version and approval history
- Defined roles across the model lifecycle
- Change management for updates and retraining
- Escalation path for raised concerns
Operational Resilience
OR-01 to OR-12
Tests whether clinical work keeps moving when the AI does not. It asks for an availability target tied to the clinical workflow, documented manual fallback for when the system is down, and an architecture without a single point of failure. Deployment safety shows up here too, through canary or blue-green rollouts and an automated rollback when a new model drops below threshold.
- Availability SLA/SLO aligned to clinical need
- Documented manual fallback procedures
- No single point of failure in the architecture
- Health, latency, and error-rate alerting
- Tested disaster recovery for models and data
- Automated rollback on performance degradation
Security & Threat Protection
ST-01 to ST-12
Focuses on the attack surface that traditional security reviews tend to miss. It covers adversarial testing for evasion, poisoning, and model inversion, prompt injection and jailbreak defense for LLM-based systems, segmentation between training, inference, and clinical data, and verification that the model supply chain has not been tampered with before anything ships.
- Authentication, authorization, and rate limiting on endpoints
- Adversarial testing for evasion, poisoning, inversion
- Prompt injection and jailbreak defense
- Segmentation of training, inference, and data stores
- Image and dependency vulnerability scanning
- Supply chain integrity verification
Fairness & Equity
FE-01 to FE-12
Examines whether the system performs evenly across the people it serves and whether it could quietly widen existing disparities. It asks for performance broken out by race, ethnicity, gender, and age, checks for proxy discrimination where correlated features stand in for protected attributes, and looks for the fairness metrics your clinical stakeholders actually agreed to, such as equalized odds or calibration.
- Performance disparity testing across demographics
- Training data representative of the served population
- Proxy discrimination evaluation
- Accessibility for patients with disabilities
- Language diversity in the patient population
- Agreed fairness metrics and emergent-bias monitoring
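For teams that settle on equalized odds as an agreed metric, the sketch below shows one common way to measure it: compare true-positive and false-positive rates across demographic groups and report the largest gap in each. The function names and the toy data are illustrative assumptions, not part of the worksheets, and any acceptable gap threshold is for your clinical stakeholders to set.

```python
def _rates(y_true, y_pred):
    """Return (true-positive rate, false-positive rate) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("each group needs both positive and negative cases")
    return tp / (tp + fn), fp / (fp + tn)


def equalized_odds_gaps(y_true, y_pred, groups):
    """Largest between-group difference in TPR and in FPR."""
    per_group = {}
    for g in set(groups):
        idx = [i for i, value in enumerate(groups) if value == g]
        per_group[g] = _rates([y_true[i] for i in idx], [y_pred[i] for i in idx])
    tprs = [tpr for tpr, _ in per_group.values()]
    fprs = [fpr for _, fpr in per_group.values()]
    return max(tprs) - min(tprs), max(fprs) - min(fprs)


# Toy data for two hypothetical cohorts, "A" and "B".
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
tpr_gap, fpr_gap = equalized_odds_gaps(y_true, y_pred, groups)
print(f"TPR gap: {tpr_gap:.2f}, FPR gap: {fpr_gap:.2f}")
```

A gap of zero on both rates would mean the model errs at the same rate for every group; larger gaps are the disparities the FE questions ask you to document.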
Transparency & Explainability
TE-01 to TE-12Asks whether the people relying on the system can actually understand and trace it. It covers explanations clinicians can act on, a complete audit trail from input through inference to output, and honest documentation of purpose and failure modes. It also reaches into third-party and foundation model risk, staff training, and the energy footprint of training and inference.
- Decision logic explainable to clinicians
- End-to-end auditable inference trail
- Documented purpose, scope, and failure modes
- Third-party and foundation model monitoring
- Staff training on capabilities and limits
- Environmental impact assessment
Workflow
Running an assessment
Answer each question with your team
Sit clinical, technical, compliance, and security people at the same table. Mark each question Yes, No, or Partial, and capture the evidence behind the answer.
Let the workbook score it
The workbook computes a risk level and a score from 0 to 3 for each response, weighted by the inherent risk of the question. A High-risk question answered No surfaces as a critical item; a control that is fully in place scores zero.
Read your posture band
The Scoring Guide turns the counts of open items at each risk level into a posture band: any High-risk gap pushes you into Elevated or Critical, while zero High and zero Medium items reads as Strong.
Remediate on the stated clock
High items carry a 30-day clock and a governance escalation, Medium items go into the risk register with a 90-day clock, and Low items get a six-month improvement window.
Reassess on cadence
Run a full pass before first deployment, review open High and Medium items quarterly, reassess annually, and trigger a fresh review after any incident or regulatory change.
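The posture and remediation steps above can be sketched in a few lines of Python. Only two posture rules are stated in the worksheets (any High-risk gap means Elevated or Critical; zero High and zero Medium means Strong), so the cut-offs `CRITICAL_HIGH_COUNT` and `ACCEPTABLE_MEDIUM_COUNT` below are placeholder assumptions, as is treating six months as 180 days. Check the Scoring Guide sheet for the real thresholds.

```python
from datetime import date, timedelta

# Placeholder thresholds; the Scoring Guide defines the real ones.
CRITICAL_HIGH_COUNT = 3       # assumed: this many High gaps reads as Critical
ACCEPTABLE_MEDIUM_COUNT = 2   # assumed: up to this many Medium gaps is Acceptable

# Remediation clocks from the workflow; 180 days approximates six months.
REMEDIATION_DAYS = {"High": 30, "Medium": 90, "Low": 180}


def posture_band(high: int, medium: int) -> str:
    """Map counts of open High and Medium items to a posture band."""
    if high > 0:
        return "Critical" if high >= CRITICAL_HIGH_COUNT else "Elevated"
    if medium == 0:
        return "Strong"
    return "Acceptable" if medium <= ACCEPTABLE_MEDIUM_COUNT else "Moderate"


def remediation_due(level: str, found_on: date) -> date:
    """Date by which an item at the given risk level should be remediated."""
    return found_on + timedelta(days=REMEDIATION_DAYS[level])


print(posture_band(high=0, medium=0))
print(posture_band(high=1, medium=4))
print(remediation_due("High", date(2025, 1, 1)))
```

Computing the due date at the moment an item is recorded gives the quarterly review of open High and Medium items a concrete list of what is overdue.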
Regulatory Coverage
What the questions align to
The NIST mapping sheet is the primary traceability mechanism. The question content also reflects requirements drawn from a wider set of healthcare and AI governance regimes, so a single assessment supports several compliance conversations at once.