To include or not to include? A prescription from the pharmacy on how to use active learning-assisted screening in systematic reviews
Background
Systematic reviews are essential for evidence-based decision-making, but the screening stage is often labor-intensive and susceptible to human error. Machine learning (ML) approaches, including active learning (AL), have increasingly been used to support title and abstract screening. One such approach is the SAFE procedure, which has been proposed to guide the use of AL-assisted screening in systematic reviews. However, evidence on how well this procedure performs in large, heterogeneous datasets generated by broad search strategies remains limited. This study therefore evaluates the effectiveness and reliability of AL-assisted screening with particular focus on the SAFE procedure. Specifically, it examines the comprehensiveness and necessity of the recommended SAFE procedure, assesses the influence of different labeling strategies, and investigates whether AL-assisted screening can help reduce manual screening errors.
Methods
Screening of four large, heterogeneous datasets from medication management systematic reviews was simulated using ASReview. The datasets ranged from 3,475 to 16,218 records, of which 0.08% to 1% were included in the final systematic review. Our simulations systematically varied all parameters defined by the SAFE procedure. Recall versus sampling behavior was analyzed, with a focus on how parameter choices affected the retrieval of records included after full-text review and the reduction in the number of records to be screened.
Results
AL-assisted screening can reduce the number of records to screen by almost 90% without increasing the risk of missing relevant records compared with manual screening. For three of the four datasets, the best performance was achieved with the SAFE procedure combined with the elas-u4 and elas-h3 models and full-text labeling. Under these conditions, ASReview identified all studies included after full-text review and reduced the screening workload by 89–90%. In practical terms, screening only 10–11% of the original records was sufficient to identify all final included studies in these datasets. In the remaining dataset (16,218 records; 0.6% included at title/abstract screening and 0.08% included after full-text review), this parameter combination identified 87% of the studies ultimately included after full-text review. For this dataset, the best performance, identifying all studies included after full-text review while reducing the screening workload by 90%, was achieved when using the SAFE procedure with the simpler Naive Bayes model, the TF-IDF feature extractor, and title/abstract labeling.
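The relationship between recall and workload reduction reported above can be illustrated with a minimal Python sketch. It assumes a screening order is available as a list of 0/1 relevance labels in the sequence in which records were presented to the reviewer. The function names `records_needed` and `workload_reduction` and the toy data are hypothetical, chosen for illustration only; they are not part of ASReview or the SAFE procedure, and the numbers are not taken from the study.

```python
from math import ceil


def records_needed(screening_order, target_recall=1.0):
    """Return how many records must be screened, in the given order,
    to reach the target recall. Labels are 1 (relevant) or 0 (irrelevant)."""
    total_relevant = sum(screening_order)
    if total_relevant == 0:
        raise ValueError("screening_order contains no relevant records")
    # Number of relevant records that must be found to meet the target.
    needed = ceil(target_recall * total_relevant)
    found = 0
    for position, label in enumerate(screening_order, start=1):
        found += label
        if found >= needed:
            return position
    return len(screening_order)


def workload_reduction(screening_order, target_recall=1.0):
    """Fraction of records that did not need to be screened."""
    screened = records_needed(screening_order, target_recall)
    return 1 - screened / len(screening_order)


# Toy example (made-up data): 20 records, 3 relevant, all found within
# the first 6 records presented.
toy_order = [1, 0, 1, 0, 0, 1] + [0] * 14
screened = records_needed(toy_order)
reduction = workload_reduction(toy_order)
print(f"screened {screened} of {len(toy_order)} records; "
      f"workload reduction {reduction:.0%}")
```

In this toy case, full recall is reached after 6 of 20 records, a 70% reduction. By the same arithmetic, a workload reduction of 89–90% at full recall corresponds to screening 10–11% of the records.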
Conclusions
AL-assisted screening can safely and effectively reduce the workload needed to screen the large, heterogeneous datasets common in medication management systematic reviews. We recommend the modified SAFE procedure using full-text labels and the elas-u4 and elas-h3 models. If the estimated proportion of records included after full-text review is very low, it may be more appropriate to use the original SAFE procedure with title/abstract labeling.