Model performance
We use several key metrics to evaluate model performance:
Accuracy: the percentage of patients the model predicted correctly.
False positives: the percentage of patients who were predicted to become compliant but were ultimately non-compliant.
False negatives: the percentage of patients who were predicted to become non-compliant but ultimately achieved compliance.
Area under the curve (AUC): the AUC, also called AUROC (area under the receiver operating characteristic curve), measures how well a machine-learning model separates the two outcomes; the higher the AUC, the better the model ranks patients who reach compliance above those who do not. A classifier that randomly labels patients as compliant or non-compliant would have an AUC of 0.5, while a classifier with perfectly accurate predictions would have an AUC of 1.0.
Brier score: the Brier score measures the accuracy of predicted probabilities. It is the mean squared difference between each predicted probability and the actual outcome (1 for compliant, 0 for non-compliant), so the lower a model’s Brier score, the more accurate its probabilities.
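The metric definitions above can be sketched in a few lines of plain Python. This is only an illustration, not the production evaluation code: the function names, the toy data, and the 0.5 threshold used to turn a probability into a compliant/non-compliant prediction are all assumptions made for the example.

```python
def confusion_rates(y_true, y_prob, threshold=0.5):
    """Accuracy, false positives and false negatives as percentages of all patients.

    threshold is an assumed cut-off; the document does not state one.
    """
    correct = false_pos = false_neg = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == label:
            correct += 1
        elif predicted == 1:
            false_pos += 1  # predicted compliant, actually non-compliant
        else:
            false_neg += 1  # predicted non-compliant, actually compliant
    n = len(y_true)
    return {
        "accuracy": 100 * correct / n,
        "false_positives": 100 * false_pos / n,
        "false_negatives": 100 * false_neg / n,
    }


def auc(y_true, y_prob):
    """Share of (compliant, non-compliant) pairs ranked correctly; ties count half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Toy data: 1 = reached compliance, 0 = did not (hypothetical values).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.1, 0.95]

rates = confusion_rates(y_true, y_prob)
print(rates, auc(y_true, y_prob), brier_score(y_true, y_prob))
```

Note that accuracy, false positives, and false negatives depend on the chosen threshold, whereas the AUC and the Brier score are computed from the probabilities directly.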
By analyzing these metrics, our developers improve the accuracy of our 90-Day model.* Because we provide HMEs with probabilities rather than class labels, the Brier score below is the most important metric: it directly measures the accuracy of those probabilities.
The following table shows that model performance is comparable on the validation dataset and the test dataset, which suggests that the model makes helpful predictions on new patient data. In addition to the evaluations in the table, we monitor the model’s performance to ensure we achieve the intended outcomes.
| Metric | Validation dataset | Test dataset |
|---|---|---|
| Accuracy | 86.93% | 86.85% |
| False positives | 6.06% | 6.12% |
| False negatives | 7.01% | 7.03% |
| AUC (range 0 to 1) | 0.9463 | 0.9458 |
| Brier score (range 0 to 1) | 0.092 | 0.093 |
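As a quick sanity check on the table above, accuracy, false positives, and false negatives are all percentages of the same patients, so each column should add up to 100%. The short sketch below confirms this with the published figures; the dictionary layout and function name are assumptions chosen for the example.

```python
from decimal import Decimal

# Classification percentages copied from the table above.
table = {
    "validation": {"accuracy": "86.93", "false_positives": "6.06", "false_negatives": "7.01"},
    "test": {"accuracy": "86.85", "false_positives": "6.12", "false_negatives": "7.03"},
}


def total_percentage(column):
    """Sum one column exactly, using Decimal to avoid float rounding."""
    return sum(Decimal(value) for value in column.values())


totals = {name: total_percentage(column) for name, column in table.items()}
print(totals)
```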
Through validation and testing, we noted no significant changes in model performance across age, gender, or HME. Demographic information such as ethnicity and disease or condition is not available in the dataset used to train and evaluate the model.
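A subgroup comparison of this kind can be sketched by computing the same metric separately for each group and comparing the results. The example below does this for the Brier score; the record fields (`gender`, `prob`, `compliant`), the function name, and the sample values are hypothetical and are not taken from the actual dataset or evaluation pipeline.

```python
from collections import defaultdict


def brier_by_group(records, group_key):
    """Brier score computed separately for each value of group_key."""
    squared_errors = defaultdict(float)
    counts = defaultdict(int)
    for record in records:
        group = record[group_key]
        squared_errors[group] += (record["prob"] - record["compliant"]) ** 2
        counts[group] += 1
    return {group: squared_errors[group] / counts[group] for group in counts}


# Hypothetical records: predicted probability and actual outcome per patient.
records = [
    {"gender": "F", "prob": 0.9, "compliant": 1},
    {"gender": "F", "prob": 0.2, "compliant": 0},
    {"gender": "M", "prob": 0.7, "compliant": 1},
    {"gender": "M", "prob": 0.4, "compliant": 0},
]

scores = brier_by_group(records, "gender")
print(scores)
```

The same function can be applied to any other grouping field, such as an age band or an HME identifier, if the records carry one.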
Note: As described in the model output section, it’s important to understand that model performance varies by days since setup, and our model performs best toward the end of the 90-day window. Our dataset shows that patients who reach compliance most often do so early in that window.
Model limitations
As the table indicates, the model is not 100% accurate at making compliance predictions. It will occasionally predict a compliance probability that is too low for a patient who goes on to achieve compliance (a false negative) or too high for a patient who does not (a false positive). However, the high accuracy, high AUC (0.9463), and low Brier score (0.092) demonstrate that the model generally performs well and gives HMEs a useful tool to guide patient outreach.
*Any prediction is subject to multiple factors and there is no guarantee that any patient will reach compliance on any particular date or under every circumstance.