Audit AI solutions and work environment

Why does this matter?

Audits are independent evaluations of model performance. They make it possible to investigate how well the deployed system’s behavior matches articulated performance expectations. Internal quality assurance and monitoring are carried out by internal actors as part of quality control. Auditing has a slightly different purpose: it serves as an independent validation of the model for accountability. Even when audits are done by internal teams, those teams should operate completely separately from the engineering team that built the AI tools so that they can provide independent oversight of the built solution.

The audit process requires performance metrics to be defined upstream. Typically, audits are conducted either by an internal team independent of the engineering development team or by an external team hired and given access to the system to conduct its own independent evaluation. For audits to be meaningful, there must be clear, articulated standards of performance: the audit process should allow both for the articulation of these expectations and for the measurement of actual performance against them.

Audits help build real processes of accountability. Performance issues are often not visible to the development team (e.g., a local deployment on a population that differs in unexpected ways from the test population), so audits are needed to reveal unexpected failures and to enforce external or internal oversight of deployments. Examining performance and outcomes in depth requires too many resources to be done continuously, but it must be done at some frequency to give meaningful reassurance that the tool is still delivering the presumed benefits. AI tools (see guide monitor AI performance) and the context in which they are deployed (see guide monitor work environment) influence each other over time, so a breadth of expertise and data must be assembled to perform effective audits.

How to do this?

Step 1: Make sure the AI system is auditable 

  • Before anything can happen, the system needs to be auditable in the first place. 
  • Model developers need to ensure that the system is adequately documented and that the code, data, and other key details of the system can be made directly available to known auditors. Note that full source code may not be required to conduct an audit. At a minimum, auditors need to be able to run the AI product on command under various conditions and with varying sets of inputs.
  • Key stakeholders, including clinicians and technical experts involved in AI product development and integration, should also make themselves available to be interviewed by auditors.
  • For third-party vendor products, whether the product can be audited by an internal or hired team of auditors should be queried during procurement, and auditability should be set as a condition for further consideration.
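
The minimum requirement above, running the product on command with varying sets of inputs, can be illustrated with a small harness. This is a minimal sketch under stated assumptions, not a prescribed audit method: the `predict` callable, the cohort names, the toy data, and the 0.80 sensitivity expectation are all hypothetical placeholders. A real audit would call the product's actual inference interface and use the performance expectations agreed upstream.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical case format: (input features, true label where 1 = positive).
Case = Tuple[dict, int]


def sensitivity(predict: Callable[[dict], int], cases: List[Case]) -> float:
    """Fraction of truly positive cases that the model flags as positive."""
    positives = [features for features, label in cases if label == 1]
    if not positives:
        raise ValueError("cohort contains no positive cases")
    hits = sum(1 for features in positives if predict(features) == 1)
    return hits / len(positives)


def run_audit(
    predict: Callable[[dict], int],
    cohorts: Dict[str, List[Case]],
    expected_min: float,
) -> Dict[str, dict]:
    """Run the model on each cohort and compare against the stated expectation."""
    report = {}
    for name, cases in cohorts.items():
        value = sensitivity(predict, cases)
        report[name] = {
            "sensitivity": value,
            "meets_expectation": value >= expected_min,
        }
    return report


# Stand-in model for illustration only: flags a case when a made-up score exceeds 0.5.
def toy_predict(features: dict) -> int:
    return 1 if features["score"] > 0.5 else 0


# Invented data: a test population and a local population that differs from it.
cohorts = {
    "test_population": [({"score": 0.9}, 1), ({"score": 0.7}, 1), ({"score": 0.2}, 0)],
    "local_population": [
        ({"score": 0.6}, 1),
        ({"score": 0.4}, 1),
        ({"score": 0.3}, 1),
        ({"score": 0.1}, 0),
    ],
}

report = run_audit(toy_predict, cohorts, expected_min=0.80)
for name, result in report.items():
    print(name, result)
```

Because the harness only needs a callable, it works without access to full source code, which matches the minimum level of access described above. Evaluating each cohort separately is one way to surface a local population on which the product falls short of expectations.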

Step 2: Identify independent auditing services

  • Identify an external partner or an internal team. An internal team should be independent of the engineering team within the organization.
  • Given the proprietary nature of many AI products and the commercial value of these products, strict confidentiality agreements should be put in place to restrict the use of information gathered during the audit by the auditing service. The auditing service should not be commercializing its own AI product that competes with the system being audited.
  • The auditing team should have access to the organizational leadership to understand internal expectations, principle statements, or guidelines and should be able to communicate findings across settings and stakeholders. 
  • The auditing team should also be familiar with external compliance expectations, regulations, and any pre-existing standards, such as ISO 13485, that the product is expected to adhere to.
  • This team is preferably interdisciplinary and comfortable with a combination of potential audit methods.

Step 3: Facilitate the execution of the audit

  • If appropriate or needed, the developer team should facilitate the execution of the audit.
  • Note that audit methodology can vary – including quantitative and qualitative methods – and audits can examine performance at various points in the life cycle. 

“We are trying to look at periodic review processes where we go in and evaluate, here’s what the algorithm was supposed to do, here’s what we’re seeing it’s actually doing. Based on that, do we either need to adjust it or remove it or keep going?”

Technical Leader

Step 4: Review audit reports yielded from the process

  • Ensure that the reports are considered by the relevant organizational stakeholders in a position to take appropriate actions in response to auditor recommendations. 
  • Create a plan to map out these recommendations and track these updates over time. 
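
One lightweight way to map out recommendations and track them over time is a small record per recommendation. The sketch below is illustrative only: the `Recommendation` schema, the status values, and the example entries are assumptions, and many organizations would use an existing issue tracker or quality management system instead.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Recommendation:
    """One auditor recommendation and its follow-up state (hypothetical schema)."""

    summary: str
    owner: str
    due: date
    status: str = "open"  # assumed states: "open", "in_progress", "closed"
    history: List[str] = field(default_factory=list)

    def update(self, status: str, note: str, on: date) -> None:
        """Record a status change so progress can be reviewed later."""
        self.status = status
        self.history.append(f"{on.isoformat()}: {status} - {note}")


def overdue(recommendations: List[Recommendation], today: date) -> List[Recommendation]:
    """Return recommendations that are past due and not yet closed."""
    return [r for r in recommendations if r.status != "closed" and r.due < today]


# Invented example entries.
items = [
    Recommendation("Recalibrate threshold for local population", "data science lead", date(2024, 3, 1)),
    Recommendation("Document training data sources", "vendor liaison", date(2024, 6, 1)),
]
items[1].update("closed", "documentation delivered", date(2024, 4, 15))

late = overdue(items, today=date(2024, 5, 1))
print([r.summary for r in late])
```

Assigning each recommendation an owner and a due date makes it clear which stakeholder is in a position to act, and the history gives later audits a record of what was changed in response.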

Step 5: Create a plan for follow-up assessment and audit work

  • Commit to an auditing frequency defined by the clinical use case, the AI-enabled intervention, and the associated risk and types of drift predicted by clinical, technical, and operational experts/end-users.
  • Set criteria for extra audits outside the regular frequency, based on certain unanticipated changes, for example, updated ICD codes or new imaging equipment.
  • Refer to an audit-triggering standard that is unaffected by internal changes in the healthcare delivery organization. For example, audits should be triggered by biological outcome measures, purely technical measures, or a clinical/operational outcome benchmark derived from historic or external data. This guards against missing AI performance drops concealed by improvements in other components of an AI-enabled healthcare intervention.
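
The three kinds of trigger described above (regular frequency, unanticipated changes, and a fixed benchmark) can be combined in a single check. The following is a minimal sketch with assumed inputs: the function name, event labels, interval, rates, and tolerance are hypothetical, and it assumes an outcome measure where higher is better.

```python
from datetime import date, timedelta
from typing import List, Set


def audit_reasons(
    last_audit: date,
    today: date,
    interval_days: int,
    observed_events: Set[str],
    trigger_events: Set[str],
    outcome_rate: float,
    benchmark_rate: float,
    tolerance: float,
) -> List[str]:
    """Return the reasons, if any, that an audit should be started now."""
    reasons = []
    # Regular frequency agreed for this use case.
    if today - last_audit >= timedelta(days=interval_days):
        reasons.append("scheduled interval elapsed")
    # Extra audits for pre-agreed kinds of unanticipated change.
    for event in sorted(observed_events & trigger_events):
        reasons.append(f"unanticipated change: {event}")
    # Fixed benchmark from historic or external data (assumes higher is better).
    if outcome_rate < benchmark_rate - tolerance:
        reasons.append("outcome below fixed benchmark")
    return reasons


# Invented example values.
reasons = audit_reasons(
    last_audit=date(2024, 1, 1),
    today=date(2024, 4, 1),
    interval_days=180,
    observed_events={"icd_code_update", "staff_training"},
    trigger_events={"icd_code_update", "new_imaging_equipment"},
    outcome_rate=0.78,
    benchmark_rate=0.85,
    tolerance=0.05,
)
print(reasons)
```

The benchmark is passed in as a fixed value rather than recomputed from recent internal data, so improvements elsewhere in the intervention cannot shift the standard and mask a drop in AI performance.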


Liu, Xiaoxuan, et al. “The medical algorithmic audit.” The Lancet Digital Health (2022).
