Comprehensive AI Failure Mode Index

BRITE Institute is building a structured database of healthcare AI failure modes. The database will translate evidence about known and potential failures into practical tools that help AI developers build safer products and help healthcare organizations understand, evaluate, and manage the risks of adopting them.

What is this project about?

Artificial intelligence can fail in healthcare in many ways. A system may omit a critical allergy, fabricate a clinical fact, fail to recognize an urgent condition, perform poorly for a particular patient population, or present uncertain information with unjustified confidence. Harm may also arise even when the underlying model works as designed—for example, when an AI product is poorly integrated into clinical workflows or clinicians rely too heavily on its recommendations.

Information about these risks is currently scattered across academic studies, regulatory reports, safety evaluations, incident reports, technical documentation, and individual organizations. There is no widely used system that translates this evidence into practical, product-specific safety guidance.

BRITE Institute is developing a structured, searchable database that documents:

  • The nature of each failure mode
  • The conditions under which it may occur
  • Its technical, human, workflow, and organizational causes
  • The patients or clinical settings most likely to be affected
  • The potential severity and detectability of harm
  • Technical measures that can prevent or identify the failure
  • Human and organizational safeguards
  • Relevant testing and monitoring methods
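
As a rough illustration of how one record covering the items above might be laid out, here is a minimal Python sketch. It is not BRITE Institute's actual schema: the class name, every field name, and the detectability scale are assumptions made for this example, and the sample values only paraphrase the omission entry from the sample table on this page.

```python
from dataclasses import dataclass, field


@dataclass
class FailureModeEntry:
    """Hypothetical record layout; every field name here is an assumption."""

    name: str                     # the nature of the failure mode
    conditions: list[str]         # conditions under which it may occur
    causes: dict[str, list[str]]  # keys: technical, human, workflow, organizational
    affected: list[str]           # patients or clinical settings most likely affected
    severity: int                 # 1 (negligible) to 10 (death or catastrophic harm)
    detectability: str            # assumed scale: "low", "medium", or "high"
    technical_measures: list[str] = field(default_factory=list)
    safeguards: list[str] = field(default_factory=list)
    testing_methods: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject scores outside the 1-10 severity scale.
        if not 1 <= self.severity <= 10:
            raise ValueError("severity must be between 1 and 10")


# Example values paraphrase the omission entry in the sample table;
# the grouping of causes and the detectability rating are illustrative.
omission = FailureModeEntry(
    name="Critical information omitted from a patient-record summary",
    conditions=["fragmented records", "missing source data"],
    causes={"technical": ["context-window limits", "failed document retrieval", "poor parsing"]},
    affected=["clinical handoffs", "chart review"],
    severity=9,
    detectability="low",
    technical_measures=["required-field checks", "retrieval validation"],
    safeguards=["require verification of high-risk information"],
    testing_methods=["source-to-summary completeness testing"],
)
```

Keeping causes grouped by category is one possible design choice; it would let a search return, for example, only the workflow causes of a failure mode.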

The database will support two primary goals.

First, it will help AI developers generate customized, curated safety checklists for specific products. Rather than relying on a generic list of AI risks, developers will be able to identify the failure modes most relevant to their product’s intended use, model architecture, data sources, users, patient populations, and clinical environment.

Second, it will help hospital procurement teams, clinicians, and healthcare leaders understand and plan for the risks of acquiring and implementing AI products. Users will be able to identify relevant risks, ask vendors more precise questions, evaluate proposed safeguards, and develop implementation plans that protect patients throughout the product lifecycle.

Sample Failure-Mode Entries

The final database will contain substantially more detail. The table below demonstrates the types of information that may be included.

```html

Sample Healthcare AI Failure-Mode Database

The table below illustrates how the database will connect healthcare AI failure modes with their causes, visible warning signs, patient consequences, and potential safeguards.

<table>
  <caption>Sample healthcare AI failure modes, affected products, causes, consequences, safeguards, and severity scores.</caption>
  <thead>
    <tr>
      <th>Failure Mode</th>
      <th>What the Error Looks Like to the End User</th>
      <th>Healthcare AI Most Affected</th>
      <th>Example Causes</th>
      <th>Potential Consequences</th>
      <th>Technical Prevention and Detection</th>
      <th>Human and Organizational Prevention</th>
      <th>Severity</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Critical information omitted from a patient-record summary</td>
      <td>The summary appears complete and professionally written, but excludes an allergy, medication, abnormal result, diagnosis, or other clinically important fact contained in the source record.</td>
      <td>Patient-record summarizers; clinical documentation tools; RAG systems; chart-review assistants; handoff tools</td>
      <td>Context-window limits; failed document retrieval; poor parsing; incorrect prioritization; fragmented records; missing source data</td>
      <td>Contraindicated treatment, medication error, delayed diagnosis, or failure to act on an urgent finding</td>
      <td>Required-field checks; source-to-summary completeness testing; retrieval validation; omission-detection models; alerts for missing records; direct links to supporting source text</td>
      <td>Require verification of high-risk information; train users not to treat summaries as complete records; establish procedures for incomplete inputs; conduct periodic omission audits</td>
      <td>9/10</td>
    </tr>
    <tr>
      <td>Fabricated clinical fact or unsupported recommendation</td>
      <td>The system confidently presents a diagnosis, test result, citation, treatment, or patient detail that is not supported by the available evidence.</td>
      <td>Generative clinical assistants; medical chatbots; documentation tools; RAG systems; clinical decision-support tools</td>
      <td>Model hallucination; weak grounding; ambiguous prompts; retrieval failure; overgeneralization; inadequate output constraints</td>
      <td>Incorrect diagnosis, inappropriate treatment, unnecessary testing, medication error, or delayed care</td>
      <td>Retrieval-grounded generation; citation requirements; constrained output formats; factual-consistency checks; confidence calibration; refusal when evidence is insufficient</td>
      <td>Require clinician review before action; train users to verify claims; prohibit autonomous use for specified high-risk decisions; provide clear reporting pathways</td>
      <td>9/10</td>
    </tr>
    <tr>
      <td>Failure to recognize an urgent or deteriorating condition</td>
      <td>The system gives a routine recommendation, low-risk score, or reassuring summary despite evidence that the patient may require immediate escalation.</td>
      <td>Triage systems; early-warning systems; diagnostic models; emergency-department decision support; remote-monitoring tools</td>
      <td>Incomplete or delayed inputs; poor sensitivity; threshold errors; distribution shift; missing vital signs; inadequate validation of rare conditions</td>
      <td>Delayed escalation, delayed treatment, permanent injury, or death</td>
      <td>High-sensitivity testing; escalation rules; real-time data-quality checks; false-negative monitoring; out-of-distribution detection; conservative thresholds for high-risk conditions</td>
      <td>Preserve independent clinical escalation pathways; prohibit use as the sole screening method; train staff to override the system; audit missed and delayed escalations</td>
      <td>10/10</td>
    </tr>
    <tr>
      <td>Unequal performance across patient populations</td>
      <td>The product appears reliable overall but gives less accurate, less complete, or less appropriate outputs for particular demographic, linguistic, disability, or clinical groups.</td>
      <td>Diagnostic imaging AI; predictive models; risk scores; speech and language tools; dermatology AI; triage systems</td>
      <td>Unrepresentative training data; biased labels; proxy variables; small subgroup samples; insufficient subgroup validation</td>
      <td>Misdiagnosis, delayed treatment, inappropriate risk classification, or unequal access to care</td>
      <td>Stratified validation; subgroup performance thresholds; dataset audits; fairness monitoring; recalibration; post-deployment outcome analysis</td>
      <td>Include diverse clinical reviewers; restrict use in insufficiently validated populations; monitor outcomes by subgroup; establish escalation and reporting procedures</td>
      <td>9/10</td>
    </tr>
    <tr>
      <td>Automation bias or excessive reliance on an AI recommendation</td>
      <td>A clinician follows an AI recommendation even when the patient’s condition, source data, or other clinical evidence suggests that the output may be incorrect.</td>
      <td>Clinical decision support; diagnostic systems; triage tools; treatment-recommendation systems; predictive alerts; generative assistants</td>
      <td>Overconfident language; authoritative interface design; poor uncertainty communication; alert fatigue; inadequate training; unclear accountability</td>
      <td>Failure to correct an AI error, delayed intervention, inappropriate treatment, or loss of independent clinical judgment</td>
      <td>Calibrated confidence displays; visible supporting evidence; disagreement alerts; friction for high-risk actions; alternative recommendations; uncertainty warnings</td>
      <td>Train users on limitations and override procedures; require independent review for high-risk decisions; define accountability; conduct competency assessments and case reviews</td>
      <td>8/10</td>
    </tr>
  </tbody>
</table>

Severity scale: 1 represents negligible potential patient impact; 10 represents a failure capable of causing death or catastrophic patient harm. Scores represent the plausible severity of the outcome, not the probability that the failure will occur. Actual severity will depend on the clinical setting, patient population, likelihood of detection, and available safeguards.

```

Severity will depend on the product, clinical context, patient population, likelihood of detection, and availability of safeguards. The database will therefore distinguish between the existence of a failure mode and the level of risk it creates in a particular implementation.
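
One way to picture the difference between a failure mode existing and the risk it creates in a given deployment is the risk priority number (RPN) used in failure mode and effects analysis (FMEA). This page does not say the database will use FMEA, so the sketch below borrows that technique purely for illustration. The function name and the occurrence and detection ratings are hypothetical; only the severity score of 9 comes from the sample table.

```python
def risk_priority_number(severity: int, occurrence: int, detection: int) -> int:
    """FMEA-style score: each factor is rated 1-10 and the three are multiplied.

    A higher detection rating means the failure is HARDER to detect.
    """
    factors = (("severity", severity), ("occurrence", occurrence), ("detection", detection))
    for name, value in factors:
        if not 1 <= value <= 10:
            raise ValueError(f"{name} must be between 1 and 10")
    return severity * occurrence * detection


# The same failure mode (severity 9) in two hypothetical deployments.
# Occurrence and detection ratings are invented for illustration.
with_omission_audits = risk_priority_number(severity=9, occurrence=3, detection=2)
without_omission_audits = risk_priority_number(severity=9, occurrence=3, detection=8)

print(with_omission_audits, without_omission_audits)  # 54 216
```

In this toy comparison the failure mode and its severity are identical, yet the deployment without audits scores four times higher because failures there are less likely to be caught.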

Why is this important?

Healthcare organizations are adopting AI products under significant pressure to improve efficiency, reduce administrative burden, and address workforce shortages. However, procurement teams and clinicians may lack a systematic way to determine how an AI product could fail, which patients could be harmed, and whether the vendor’s safeguards are adequate.

This creates risks for patients. A failure that appears minor in a demonstration may become dangerous when the product is used repeatedly, connected to incomplete data, deployed in a high-pressure environment, or relied upon by clinicians who do not understand its limitations. Because AI systems can operate at scale, a single design or implementation weakness may affect hundreds or thousands of patients.

The database is intended to make patient protection a routine part of both product development and healthcare procurement. It will help organizations move beyond broad questions such as “Is this AI accurate?” and instead ask:

  • What specific failures are possible?
  • Which patients are most vulnerable?
  • How severe could the resulting harm be?
  • How was the product tested for these failures?
  • What safeguards must be in place before deployment?
  • How will failures and near misses be detected after implementation?
  • What responsibilities remain with clinicians and healthcare organizations?

A structured failure-mode database may also make procurement faster and more efficient. Procurement teams often spend substantial time independently identifying risks, developing vendor questions, seeking internal expertise, and determining appropriate implementation controls. A curated framework can reduce duplication, standardize reviews, and help teams focus rapidly on the risks most relevant to a particular product.

Faster procurement should not mean weaker scrutiny. By giving purchasers a clearer and more consistent evaluation process, the database can help healthcare organizations make well-informed decisions more quickly while maintaining patient safety as the central criterion.

For developers, identifying relevant failure modes early can reduce costly redesign, prevent recurrent safety problems, and clarify testing requirements before a product reaches patients. For hospitals, early identification of implementation risks can prevent unsafe deployment, unexpected workflow disruption, and reliance on controls that are inadequate in real clinical settings.

Where can it be applied?

1. AI Development

Developers will be able to generate customized safety checklists based on a product’s purpose, users, data, clinical setting, and level of autonomy. These checklists can guide requirements development, model testing, interface design, documentation, deployment planning, and post-market monitoring.
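
The sketch below shows, under simplifying assumptions, how a customized checklist could be derived from database entries. It matches on product type only, whereas the description above also mentions users, data, clinical setting, and level of autonomy. The `Entry` type, the `build_checklist` function, and the tag strings are hypothetical; the entry names, severity scores, and checks are taken from the sample table.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """Hypothetical, trimmed-down database entry."""
    name: str
    affected_products: frozenset[str]
    severity: int
    technical_checks: tuple[str, ...]


ENTRIES = [
    Entry("Critical information omitted from a patient-record summary",
          frozenset({"patient-record summarizers", "rag systems"}), 9,
          ("required-field checks", "retrieval validation")),
    Entry("Fabricated clinical fact or unsupported recommendation",
          frozenset({"medical chatbots", "rag systems"}), 9,
          ("citation requirements", "factual-consistency checks")),
    Entry("Failure to recognize an urgent or deteriorating condition",
          frozenset({"triage systems", "early-warning systems"}), 10,
          ("escalation rules", "false-negative monitoring")),
    Entry("Unequal performance across patient populations",
          frozenset({"triage systems", "dermatology ai"}), 9,
          ("stratified validation", "fairness monitoring")),
]


def build_checklist(product_type: str, entries: list[Entry] = ENTRIES) -> list[str]:
    """Return checklist items for one product type, most severe failure modes first."""
    relevant = [e for e in entries if product_type in e.affected_products]
    relevant.sort(key=lambda e: e.severity, reverse=True)  # stable sort keeps ties in order
    return [
        f"[{e.severity}/10] {e.name}: {check}"
        for e in relevant
        for check in e.technical_checks
    ]


for item in build_checklist("triage systems"):
    print(item)
```

For a triage product this prints four items, beginning with the two checks for the 10/10 failure mode. A fuller version would filter on several product attributes at once rather than a single tag.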

2. Healthcare Procurement and Vendor Evaluation

Hospital procurement teams, clinicians, information-technology leaders, and safety officers can use the database to identify product-specific risks, develop vendor questions, compare products, evaluate safety claims, and determine which controls must be established before implementation.

3. Clinical Training and Education

Failure-mode examples can be converted into educational materials that teach clinicians how AI systems may fail, how to recognize unreliable outputs, when independent verification is required, and how to report suspected failures or near misses.

4. Safety Benchmarks

The database can support standardized benchmarks that test whether AI products are robust against known healthcare failure modes. These benchmarks may evaluate factors such as completeness, factual accuracy, uncertainty communication, subgroup performance, clinical prioritization, and resistance to misleading or incomplete inputs.

5. Red-Teaming

In a subsequent project, BRITE Institute will use the database to inform a healthcare AI red-teaming system. Documented failure modes will be translated into test scenarios that deliberately challenge AI products, record their responses, and score the effectiveness of their safeguards.
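
To make the idea concrete, the following is a minimal sketch of how a documented failure mode might become an automated test scenario. It is not the planned red-teaming system: the `Scenario` type, the `run_scenarios` function, the stand-in summarizer, and the patient record are all invented for illustration, and plain substring matching is a deliberately crude stand-in for a real completeness check.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Scenario:
    """Hypothetical test case derived from one failure mode."""
    name: str
    failure_mode: str
    source_record: str
    required_facts: tuple[str, ...]  # facts the output must preserve


def run_scenarios(system: Callable[[str], str], scenarios: list[Scenario]) -> dict[str, bool]:
    """Feed each scenario to the system and record whether all required facts survive."""
    results = {}
    for scenario in scenarios:
        output = system(scenario.source_record).lower()
        results[scenario.name] = all(fact.lower() in output for fact in scenario.required_facts)
    return results


def truncating_summarizer(record: str) -> str:
    """Stand-in for a product under test: keeps only the first sentence."""
    return record.split(". ")[0] + "."


SCENARIOS = [
    Scenario(
        name="omitted-allergy",
        failure_mode="Critical information omitted from a patient-record summary",
        source_record=(
            "Patient admitted with community-acquired pneumonia. "
            "Documented allergy to penicillin. Currently taking warfarin."
        ),
        required_facts=("penicillin", "warfarin"),
    ),
]

results = run_scenarios(truncating_summarizer, SCENARIOS)
pass_rate = sum(results.values()) / len(results)
print(results, pass_rate)
```

Here the stand-in summarizer drops the allergy and the medication, so the scenario fails and the pass rate is zero. A system that returned the record unchanged would pass.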

6. Incident Investigation

Healthcare organizations can compare incidents and near misses with known failure patterns. This can help investigators identify technical, clinical, human, workflow, and organizational causes and select interventions that address the underlying problem rather than only the immediate error.

What are the results?

This project is currently in development.

Research You Can Rely On

Did you know that many research findings are manipulated—or even outright false? Some estimates suggest that up to 90% of published research may be unreliable. Meanwhile, more than $167 billion in taxpayer money is spent annually on research and development.

At BRITE Institute, we believe research should do more than just look credible. It should be credible. That’s why we go above and beyond typical standards with rigorous practices that ensure honesty, transparency, and accuracy at every step. Below are just some of the ways we safeguard the integrity of our work:


Frequently Asked Questions

What does BRITE Institute do?

BRITE Institute is a research and development nonprofit organization dedicated to advancing the science of risk. We conduct both basic and applied research. We also develop tools and technologies to improve risk management.

Is BRITE Institute a 501(c)(3) organization?

Yes, BRITE Institute is proud to be recognized as a 501(c)(3) nonprofit organization. All donations to BRITE Institute are tax deductible.

What kind of research does BRITE Institute do?

Our research includes basic studies for understanding complex system risks and applied studies for developing effective risk management technologies.

Why should we trust BRITE Institute?

As a public charity, we believe we need to go above and beyond to earn and keep your trust. We have adopted a four-pillar framework that goes well beyond what the law requires. Our four pillars of integrity are independent audits, transparency, expert oversight, and compliance. These pillars guide our operations and are central to maintaining the highest standards of integrity and effectiveness in our work. You can read more about our governance here.

How can I donate to BRITE Institute?

Donations are vital to our mission and operations. To support us financially, you can visit our website's donation page. Your contribution is greatly appreciated, and we take seriously our responsibility to spend funds wisely.

Is there a way I can support BRITE Institute if I cannot afford to make a donation?

There are many ways to support BRITE Institute, including volunteering, supporting us on social media, and more. Visit our support page to learn more!

How can I contact BRITE Institute?

We welcome your queries and interest. You can reach out to us via email at info@briteinstitute.org or through our website's contact page.

Where are you located?

BRITE Institute's headquarters is in Arizona, but we are a remote team with members across the USA and around the world. You can find more detailed information about our operations here and state-specific donation disclosures here.