TrendNCart


Strategies for creation of data reserve and stress testing of medical AI products


Evolution of the stress tests

As MAIs are constantly improving, stress tests and the data reserve should evolve with them. Stress test administrators should keep a record of the test results, tracking in particular the cases that MAIs analyse incorrectly. Such cases should be retained and marked so that administrators can double-check their ground truth, and they can be classified into difficulty levels according to how many MAIs analysed them incorrectly. Future test sets should then include challenging cases from each difficulty level, selected at random.
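The bookkeeping described above could be sketched roughly as follows. This is only an illustration: the function names, the three level labels and the thresholds (no failures, fewer than half of the MAIs failing, half or more failing) are assumptions made for the example and are not specified in the text.

```python
import random
from collections import defaultdict


def difficulty_level(n_failed, n_models):
    """Map the share of MAIs that failed a case to a level.

    The cut-offs below are hypothetical; a real test programme would
    choose its own.
    """
    frac = n_failed / n_models
    if frac == 0:
        return "easy"
    if frac < 0.5:
        return "moderate"
    return "hard"


def sample_by_difficulty(failure_counts, n_models, per_level, seed=None):
    """Randomly pick up to `per_level` recorded cases from each level.

    failure_counts maps a case ID to the number of MAIs that analysed
    that case incorrectly in earlier tests.
    """
    rng = random.Random(seed)
    by_level = defaultdict(list)
    for case_id, n_failed in failure_counts.items():
        by_level[difficulty_level(n_failed, n_models)].append(case_id)

    selected = {}
    for level, cases in by_level.items():
        k = min(per_level, len(cases))
        # Sort first so that a fixed seed gives a reproducible draw.
        selected[level] = rng.sample(sorted(cases), k)
    return selected
```

Drawing the cases at random within each level means that successive test sets differ while every difficulty level remains represented.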

The stress test data set should be sufficiently large for two reasons. The first is to provide enough statistical power for assessing the improvement in an MAI, for example, establishing whether a newer MAI is significantly better than its previous version. The second is to prevent the contents of the tests from being deduced by the test takers, whether the MAIs themselves or their developers. Data in the reserve should be periodically examined and updated, with some data retired and new data added. Retired data, however, should not be made available to developers and researchers: access to even a portion of the data reserve may help an AI model score better in future tests, but such an improvement in performance is not trustworthy.
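One possible way to test whether a newer MAI is significantly better than its predecessor, when both are run on the same cases, is an exact McNemar test on the cases where the two versions disagree. The text does not prescribe a particular test, so the choice of McNemar's test and the function name below are assumptions for illustration.

```python
from math import comb


def mcnemar_exact(old_correct, new_correct):
    """Two-sided exact McNemar test for paired per-case correctness.

    old_correct and new_correct are equal-length sequences of booleans,
    one entry per test case. Returns (b, c, p_value), where b counts
    cases only the old version got right and c counts cases only the
    new version got right.
    """
    if len(old_correct) != len(new_correct):
        raise ValueError("both versions must be scored on the same cases")

    b = sum(1 for o, n in zip(old_correct, new_correct) if o and not n)
    c = sum(1 for o, n in zip(old_correct, new_correct) if n and not o)
    n_discordant = b + c
    if n_discordant == 0:
        return b, c, 1.0  # the versions never disagree

    # Under the null hypothesis each discordant case is a fair coin flip.
    k = min(b, c)
    tail = sum(comb(n_discordant, i) for i in range(k + 1)) / 2 ** n_discordant
    return b, c, min(1.0, 2 * tail)
```

Only the discordant cases carry information in this test, which is one reason a small test set has little power: if the two versions disagree on only a handful of cases, even a consistent improvement cannot reach significance.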

As MAIs are either general-purpose or tailor-made for a clinical specialty, their stress tests will differ. For a general-purpose MAI, the tests should cover as many clinical conditions as possible, yet it is important to avoid bias in the distribution of the different conditions. For an MAI targeting a clinical specialty, it is important that its tests include not only patients with the disease of interest, but also healthy controls, patients with other conditions that form part of the differential diagnosis, and patients with the targeted disease who also have comorbidities.
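For a specialty MAI, the composition requirement above could be checked with something like the following sketch. The group labels and the function name are hypothetical and chosen only to mirror the four patient groups named in the text.

```python
from collections import Counter

# Hypothetical labels for the four groups a specialty test set should cover.
REQUIRED_GROUPS = frozenset({
    "target_disease",
    "healthy_control",
    "differential_diagnosis",
    "target_with_comorbidity",
})


def missing_groups(case_groups, required=REQUIRED_GROUPS, min_per_group=1):
    """Return the required groups with fewer than `min_per_group` cases.

    case_groups is a sequence holding one group label per test case.
    """
    counts = Counter(case_groups)
    return sorted(g for g in required if counts[g] < min_per_group)
```

An empty result means every required group is represented at least `min_per_group` times in the candidate test set.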

Being highly non-linear, an MAI may experience substantial changes in its performance after adjustments to its design, fine-tuning of hyperparameters or the addition of training data. Under stress tests, it is important that the performance of an MAI after upgrades is comprehensively assessed. This includes, but is not limited to, checking that the upgraded MAI achieves equal or better sensitivity, specificity and test-retest reliability.
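A minimal sketch of such a post-upgrade check for a binary finding is given below. It covers sensitivity and specificity only; test-retest reliability would need repeated runs and is left out. The function names are assumptions made for the example.

```python
def sensitivity_specificity(y_true, y_pred):
    """Compute (sensitivity, specificity) for binary labels.

    Truthy values mean the condition is present (y_true) or was
    flagged by the MAI (y_pred).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def upgrade_regressions(y_true, old_pred, new_pred):
    """List the metrics on which the upgraded MAI scores below the old one."""
    old_sens, old_spec = sensitivity_specificity(y_true, old_pred)
    new_sens, new_spec = sensitivity_specificity(y_true, new_pred)
    regressions = []
    if new_sens < old_sens:
        regressions.append("sensitivity")
    if new_spec < old_spec:
        regressions.append("specificity")
    return regressions
```

Checking each metric separately matters because an upgrade can raise one while lowering the other, for instance by flagging more cases overall.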

Stress tests also evolve with the stakeholder landscape, which changes over time. Many factors influence the stakeholder landscape, including but not limited to the legal framework for MAI adoption, regulation on safety of MAI products, data privacy, liability and reimbursement policies. Ethical guidance is another important factor influencing the stakeholder landscape as it addresses bias, fairness, transparency and accountability in MAI development and use. When such factors change, it is likely that our requirements and design for sound stress tests need to change in response.

