What Changed? Interpretable Model Comparison

Abstract

We consider the problem of distinguishing two machine learning (ML) models built for the same task in a human-interpretable way. As models can fail or succeed in different ways, classical accuracy metrics may mask crucial qualitative differences. This problem arises in a few contexts. In business applications with periodically retrained models, an updated model may deviate from its predecessor for some segments without a change in overall accuracy. In automated ML systems, where several ML pipelines are generated, the top pipelines have comparable accuracy but may have more subtle differences. We present a method for interpretable comparison of binary classification models by approximating them with Boolean decision rules. We introduce stabilization conditions that allow for the two rule sets to be more directly comparable. A method is proposed to compare two rule sets based on their statistical and semantic similarity by solving assignment problems and highlighting changes. An empirical evaluation on several benchmark datasets illustrates the insights that may be obtained and shows that artificially induced changes can be reliably recovered by our method.

Publication
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, $$IJCAI-21$$