Locally validate prior to integration
Why does this matter?
AI products are sensitive to the context in which they are developed and trained. To confirm an AI product's performance, evaluate it in the context where it will be integrated, using the organization's own data, technical infrastructure, and end users. Without local evaluation, the margin for error is high and performance claims cannot be made reliably. The evaluation process must allow enough time to expose the product to conditions in the integration context that could shift the model's performance.
How to do this?
Step 1: Establish a standard for desired performance
- Conduct a literature review.
- Reach out informally to prior adopters and gather their insights.
- Establish metrics for desired performance (see the guide 'define performance targets' for instructions) that reflect local expectations, concerns, and needs, and use them as the benchmark for the output of the local evaluation process (a sketch of this comparison follows this list).
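To make this comparison concrete, here is a minimal sketch in Python, assuming scikit-learn is available. The metric names, target values, and decision threshold are hypothetical illustrations, not targets prescribed by this guide.

```python
# A minimal sketch of comparing local evaluation output against the
# Step 1 performance targets. Metric names, target values, and the
# decision threshold are hypothetical; substitute your organization's own.
from sklearn.metrics import precision_score, recall_score, roc_auc_score

TARGETS = {"auroc": 0.80, "precision": 0.30, "recall": 0.70}  # hypothetical

def evaluate_against_targets(y_true, y_score, threshold=0.5):
    """Compute local metrics and flag any that miss the pre-set targets."""
    y_pred = [int(s >= threshold) for s in y_score]
    observed = {
        "auroc": roc_auc_score(y_true, y_score),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
    }
    return {name: {"observed": round(val, 3),
                   "target": TARGETS[name],
                   "meets_target": val >= TARGETS[name]}
            for name, val in observed.items()}
```

Pre-registering the targets before the local evaluation begins keeps the comparison honest: the benchmark cannot drift to match whatever the product happens to achieve.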
Step 2: Conduct the validation process with a clinical champion and clinical end users
- Test the product in silent mode on prospective data. This requires integrating the AI product into the technical infrastructure and data source systems in a way that does not expose frontline clinicians to AI product outputs; a sketch of a silent-mode scoring job appears after the quote below. For the ethical considerations involved, review the paper 'A Research Ethics Framework for the Clinical Translation of Healthcare Machine Learning.'
- Have clinicians use the product outputs as they would in practice to identify problems and think through workflow integration. Have clinician beta-testers assess AI product accuracy and actionability:
  - Assess AI product accuracy by gathering information about the following:
    - Do AI product outputs provide valuable information about patient status?
    - Are patients identified by the AI product appropriate to receive additional scrutiny and attention?
  - Assess AI product actionability by gathering information about the following:
    - Are there gaps in care for patients identified by the AI product?
    - Are there clinical actions to consider for patients identified by the AI product?
- Evaluate the product against the performance metrics established in Step 1.
“…Before we deploy a model we always run it in what we call silent mode for a while. Sometimes that is a year or two years, before we unleash it into the wild to the clinicians, and…”
Technical Leader
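As one illustration of what silent mode can look like technically, the sketch below assumes a scikit-learn-style model object; `load_new_encounters`, the encounter fields, and the log destination are hypothetical placeholders for an organization's own infrastructure.

```python
# A minimal sketch of a silent-mode scoring job: the model scores live,
# prospective data, and the results are written to an evaluation-only log
# rather than surfaced anywhere in the clinical workflow.
import json
from datetime import datetime, timezone

def run_silent_mode_batch(model, load_new_encounters, log_path="silent_mode.jsonl"):
    """Score new encounters and log predictions for retrospective review only."""
    with open(log_path, "a") as log:
        for enc in load_new_encounters():  # prospective data since the last run
            # Hypothetical scikit-learn-style interface; adapt to your model.
            score = model.predict_proba([enc["features"]])[0][1]
            log.write(json.dumps({
                "encounter_id": enc["id"],
                "score": float(score),
                "scored_at": datetime.now(timezone.utc).isoformat(),
            }) + "\n")
    # No alert, flag, or chart element is shown to clinicians; the log is
    # later joined with observed outcomes and scored against Step 1 targets.
```

The key design point is that outputs land only in an evaluation store, to be joined later with observed outcomes, never in a clinician-facing surface.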
Step 3: Adapt workflow design
- During the clinical validation described in Step 2, confirm the workflow designed during the 'design and test workflow for clinicians' stage of the AI adoption cycle.
- Specifically, revisit the following questions related to the 5 Rights of CDS with clinician beta-testers who are now familiar with the AI product outputs (a sketch for recording the answers follows this list):
  - The right person: Who in the healthcare delivery setting is best equipped to act on the information generated by the AI product?
  - The right information: What information, beyond the output of the AI product, does the user need to take appropriate action?
  - The right time/point in the workflow: When is the best time to present new information generated by the AI product to improve decision-making and the care pathway?
  - The right context: What should the user be doing when they are presented with the new information generated by the AI product?
  - The right channel: How should the user receive the new information generated by an AI product?
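One way to keep these workflow decisions explicit and reviewable is to record the answers as a structured deployment configuration. The sketch below is one possible shape in Python; the class name and every field value are hypothetical examples, not recommendations from this guide.

```python
# A minimal sketch of recording the 5 Rights answers as a structured,
# versionable deployment config. All field values are hypothetical.
from dataclasses import dataclass

@dataclass
class FiveRightsConfig:
    right_person: str       # who is best equipped to act on the output
    right_information: str  # supporting data shown alongside the output
    right_time: str         # point in the workflow when it is presented
    right_context: str      # what the user is doing at that moment
    right_channel: str      # how the output is delivered

deployment = FiveRightsConfig(
    right_person="charge nurse on the inpatient unit",
    right_information="risk score plus the top three contributing vitals",
    right_time="at shift handoff, before rounding",
    right_context="reviewing the unit's patient list",
    right_channel="worklist column in the EHR, not an interruptive alert",
)
```

Writing the answers down in one artifact makes it easy to revisit them with beta-testers as the workflow evolves.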
Step 4: Test specifically on subgroups in your organization’s patient population
- Identify subgroups of the population that are historically marginalized and face the greatest barriers to accessing high-quality healthcare.
- Test the product on underserved or at-risk groups and on intersections of patient characteristics across groups (see the stratified-evaluation sketch after this list).
- Test the product in settings within the organization that are lower-resource or have limited access to centralized, specialized services.
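Here is a minimal sketch of such stratified testing, assuming pandas and scikit-learn; the column names, grouping variables, and minimum-group-size cutoff are hypothetical.

```python
# A minimal sketch of stratified evaluation: recompute the Step 1 metrics
# within each subgroup (and intersections of groups) rather than only overall.
import pandas as pd
from sklearn.metrics import roc_auc_score

def subgroup_auroc(df, group_cols, label_col="outcome", score_col="score", min_n=50):
    """AUROC per subgroup; groups too small to score reliably are flagged."""
    rows = []
    for keys, grp in df.groupby(group_cols):
        if len(grp) < min_n or grp[label_col].nunique() < 2:
            rows.append({"group": keys, "n": len(grp), "auroc": None})
        else:
            rows.append({"group": keys, "n": len(grp),
                         "auroc": roc_auc_score(grp[label_col], grp[score_col])})
    return pd.DataFrame(rows)

# Example intersectional slices, e.g. race/ethnicity by insurance type:
# subgroup_auroc(eval_df, ["race_ethnicity", "insurance"])
```

Flagging small groups instead of scoring them avoids reporting unstable estimates for rare intersections of characteristics.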
Step 5: Do not rush the process
- Healthcare delivery organizations often take 6 to 12 months to validate an AI product to a satisfactory standard before workflow integration. Premature integration can jeopardize stakeholder trust, not only in the product at hand but also in enterprise-wide AI initiatives.
- Ensure that the validation is peer-reviewed or otherwise externally verified to confirm that the methodology was appropriate and the results are accurate.