Define performance targets
Why does this matter?
Identifying and communicating the appropriate measures of success for an integration aligns developers with user and organizational goals. However, determining what success looks like and how to measure it day to day can be challenging. Defining measures of how well the integration is going helps identify potential problems early and enables transparent, objective communication with stakeholders. These measures provide a relative sense of improvement or deterioration, so it is also important to set targets that establish an absolute sense of how well the integration is progressing.
How to do this?
At this stage of the process, the organizational priorities in need of a solution (identifying and prioritizing a problem) and the needs of end-users who will interact with the solution (defining AI product specifications) have been identified and defined, along with the intended use of the AI solution (review the guide on defining the role of AI). The next step is to break down those needs into specific, testable metrics that can determine the success or failure of the AI product integration. These metrics will help developers prioritize their efforts and provide stakeholders with a clear understanding of the integration’s progress and status. To ensure an unbiased assessment of the solution’s effectiveness, the success of the AI solution (the combination of the AI product and its use in the healthcare delivery setting, including user experience and workflow of use) should be compared to the safety and quality of clinical care before integration. Here, pre-integration refers to the phases of AI product development that come before the product is rolled out for use in the healthcare delivery workflow and embedded into the technical/IT infrastructure. Additionally, these metrics will serve as thresholds (a threshold is a value that separates data into two groups or classes based on some criterion) for determining when to suspend, review, or decommission the AI product post-integration (review the guide on determining if updating or decommissioning is necessary).
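The idea of comparing post-integration performance against a pre-integration baseline and against action thresholds can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the `MetricTarget` class, the `evaluate` function, the action labels, and every numeric value are hypothetical, and the sketch assumes a metric where higher values are better.

```python
from dataclasses import dataclass


@dataclass
class MetricTarget:
    """A hypothetical performance target for one metric (higher is better)."""
    name: str
    baseline: float       # value measured in clinical care before integration
    review_below: float   # trigger a review if the observed value drops below this
    suspend_below: float  # suspend the AI product if the observed value drops below this


def evaluate(target: MetricTarget, observed: float) -> str:
    """Map an observed post-integration value to an action label."""
    if observed < target.suspend_below:
        return "suspend"
    if observed < target.review_below:
        return "review"
    if observed < target.baseline:
        # Above the review threshold, but worse than pre-integration care.
        return "monitor"
    return "on_target"


# Illustrative numbers only; real thresholds must be agreed with stakeholders.
sensitivity_target = MetricTarget(
    name="sensitivity", baseline=0.80, review_below=0.75, suspend_below=0.65
)

for observed in (0.85, 0.78, 0.70, 0.60):
    print(f"{sensitivity_target.name}={observed:.2f} -> "
          f"{evaluate(sensitivity_target, observed)}")
```

Keeping the baseline and the thresholds in one record makes it explicit which values were agreed on and lets the same check run at every reporting interval.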
Step 1: Identify specific and testable metrics
- Analyze the pre-identified organizational and end-user needs (review key decision points on identifying and prioritizing a problem and defining AI product specifications), and categorize them by likeness, such as “model performance,” “software performance,” “security,” etc.
- The ISO/IEC 25010 standard for systems and software quality and/or the IEC 62304 standard for medical device software can be used to categorize system and software requirements and evaluations, and can aid in identifying standard testing activities.
- Here are some example requirement categories to identify performance metrics and testing methods:
Requirements | Definition | Example Metrics | Testing Methods |
---|---|---|---|
Model Performance | Effectiveness, accuracy, and reliability of the AI model or algorithm in fulfilling its intended tasks within the clinical or healthcare context. | Sensitivity (recall, true positive rate), Specificity (true negative rate), Area Under the ROC Curve (AUC-ROC), F1 Score, Precision (positive predictive value). | Model validation, cross-validation, holdout testing, confusion matrix analysis (a confusion matrix is a table used to evaluate a classification model’s performance in predicting a binary outcome by counting the true positives, true negatives, false positives, and false negatives between predicted and actual labels), bias testing, explainability testing. |
Software Performance | Efficiency and responsiveness of the software in processing tasks and delivering results, and overall performance of the software components and their interactions. | Inference time, throughput, model latency, response time, resource utilization, scalability. | Unit testing, integration testing, software/system testing, performance testing, load testing, stress testing. |
Clinical | Agreement with expected clinical and healthcare outcomes. | Alignment to clinical goals and impact on patient outcomes. | User acceptance testing, clinical outcome studies, user feedback analysis. |
Usability | Quality of users’ interactions with the AI-based medical software. | Clinician satisfaction, user error rates, ease of use. | Usability testing, A/B testing, user experience (UX) evaluation, focus groups. |
Interoperability | Compatibility of the AI-based medical software with existing healthcare systems and infrastructure. | Compatibility with electronic health record (EHR) systems, integration with imaging modalities. | Integration testing, interface testing. |
Compliance & Regulation | Compliance of the AI-based medical software with applicable healthcare regulations and standards. | Compliance with HIPAA, FDA regulatory requirements, GDPR (if applicable). | Regulatory compliance audit, privacy impact assessment. |
Data Quality | Accuracy and relevance of data used by the AI-based medical software; more generally, the degree to which data is accurate, complete, consistent, and relevant to the intended use. | Representativeness of training data, data accuracy, data completeness. | Data quality assessment, data validation testing. |
Safety and Security | Safe and secure operation of the software, covering potential harm to patients and protection against unauthorized access, data breaches, and cyber threats. | Number of identified safety risks and mitigations, adherence to cybersecurity standards, detection of adversarial attacks, incident response time. | Safety risk assessment, hazard analysis, cybersecurity testing, incident response simulation. |
Environmental & Operational | Factors necessary for IT to operationalize and deploy AI-based medical software. | IT capacity, hardware compatibility, infrastructure readiness. | IT environment assessment, operational readiness testing. |
Business | Business objectives and outcomes. | Reduction in diagnostic time, cost savings. | Business outcome analysis, return on investment (ROI) analysis (ROI is the ratio of the net profit or loss from an investment to its cost, expressed as a percentage that allows comparison of different investment choices). |
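
Several of the model performance metrics in the table follow directly from the four counts in a confusion matrix. The sketch below shows the standard formulas in Python; the function name `classification_metrics` and the example counts are illustrative assumptions, not values from any real evaluation.

```python
import math


def _safe_div(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute common binary classification metrics from confusion matrix counts."""
    return {
        "sensitivity": _safe_div(tp, tp + fn),          # recall, true positive rate
        "specificity": _safe_div(tn, tn + fp),          # true negative rate
        "precision": _safe_div(tp, tp + fp),            # positive predictive value
        "f1": _safe_div(2 * tp, 2 * tp + fp + fn),      # harmonic mean of precision and recall
        "false_negative_rate": _safe_div(fn, tp + fn),  # 1 - sensitivity
    }


# Made-up counts for illustration only.
metrics = classification_metrics(tp=80, fp=30, tn=870, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

AUC-ROC is not included because it requires the model's continuous scores rather than a single confusion matrix.
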
Step 2: Prioritize the metrics
- Rank metrics by their potential impact on the project’s success, typically starting with requirements related to user satisfaction.
Step 3: Communicate the metrics
- Share and refine metrics with all stakeholders, including solution developers, to ensure clear and transparent communication and alignment on successful integration.
- Reach an agreement on reporting intervals throughout the integration process.
Some strategic considerations may change the complexity of the performance definitions. For example, some metrics may be amenable to frequent reporting, while others can only be assessed as part of an audit. Refer to audit AI solutions and work environment for steps on how to conduct an audit.
“The false negative rate, in particular, is important. I think people get very concerned that it doesn’t have certain statistical properties, but in practice, it really closely measures the lived reality of patients that you’re interested in. When you’re thinking about equity, it’s like, who needs something, and who isn’t getting it.”
Bias Key Informant
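
The equity point in the quote above can be made concrete by reporting the false negative rate separately for each patient subgroup. The following Python sketch assumes records of the form `(group, actual, predicted)`; the function name, the group labels "A" and "B", and the data are all synthetic and purely illustrative.

```python
from collections import defaultdict


def false_negative_rate_by_group(records):
    """Return {group: FNR}, where FNR = missed positives / all actual positives.

    `records` is an iterable of (group, actual, predicted) tuples with
    boolean `actual` and `predicted` values. Groups with no actual
    positives are omitted because their rate is undefined.
    """
    positives = defaultdict(int)
    misses = defaultdict(int)
    for group, actual, predicted in records:
        if actual:
            positives[group] += 1
            if not predicted:
                misses[group] += 1
    return {group: misses[group] / positives[group] for group in positives}


# Synthetic example: 4 actual positives per group, plus one negative each.
records = [
    ("A", True, True), ("A", True, True), ("A", True, True), ("A", True, False),
    ("A", False, False),
    ("B", True, True), ("B", True, True), ("B", True, False), ("B", True, False),
    ("B", False, True),
]

rates = false_negative_rate_by_group(records)
for group, rate in sorted(rates.items()):
    print(f"group {group}: false negative rate = {rate:.2f}")
```

A gap between groups, as in this synthetic data, indicates that one group of patients who need something is missed more often than another, which is the kind of signal a bias review would examine.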