Consider a person who applies for a loan with a financial company, but the application is rejected by a machine learning algorithm that the company uses to determine who receives a loan. How would you explain the decision made by the algorithm to this person? One option is to provide them with a list of features that contributed to the algorithm’s decision, such as income and credit score. Many current explanation methods provide this information by either analyzing the algorithm’s properties or approximating it with a simpler, interpretable model.
However, these explanations do not help this person decide what to do next to increase their chances of getting the loan in the future. In particular, changing the features that are most important for the prediction may not actually change the decision, and in some cases, important features may be impossible to change, such as age. A similar argument applies when algorithms are used to support decision-makers in scenarios such as screening job applicants, determining health insurance eligibility, or disbursing government aid.
Therefore, it is equally important to show alternative feature inputs that would have received a favorable outcome from the algorithm. Such alternative examples are known as counterfactual explanations since they explain an algorithm by reasoning about a hypothetical input. In effect, they help a person answer the “what-if” question: What would have happened in an alternative counterfactual world where some of my features had been different?
To address this question, our team of researchers proposes a method for generating multiple diverse counterfactuals, one that accounts for both their usefulness and the relative ease of acting on them. A paper detailing our research, entitled “Explaining Machine Learning Classifiers through Diverse Counterfactual Explanations,” will be presented at the ACM Conference on Fairness, Accountability, and Transparency (ACM FAT* 2020) in Barcelona, Spain. We have also released an open-source library, Diverse Counterfactual Explanations (DiCE), which implements our framework for generating counterfactual explanations. While it is easy to generate a single counterfactual, the main challenge, and the overarching goal of our method, is to generate multiple useful ones. This research was done in collaboration with Chenhao Tan from the University of Colorado Boulder and also includes Ramaravind Mothilal, who is currently an SCAI Center Fellow at Microsoft Research India.
The challenge: Generating multiple counterfactuals that are useful to users and system builders
Specifically, a counterfactual explanation is a perturbation of the original feature input that causes the machine learning model to produce a different decision. Such explanations are certainly useful to a person facing the decision, but they are also useful to system builders and evaluators in debugging the algorithm. In this sense, counterfactual examples are similar to adversarial examples, except that problematic examples are identified not only by proximity to the original input but also by various domain-dependent restrictions, such as changes to sensitive attributes that should not affect the outcome. Perhaps their biggest benefit is that they are always faithful to the original algorithm—following the counterfactual explanation will lead to the desired outcome, as long as the model stays the same.
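To make the perturbation idea concrete, here is a minimal, self-contained sketch of a counterfactual search over a toy loan classifier. The classifier, feature grid, and cost function below are invented for illustration; this is not the optimization method from the paper or the DiCE library’s API.

```python
# Toy counterfactual search: find small perturbations of mutable features
# that flip a classifier's decision, while holding immutable features fixed.
from itertools import product

def approve_loan(income, credit_score, age):
    """Stand-in classifier: approve if a weighted score clears a threshold."""
    return 0.5 * (income / 1000) + 0.5 * (credit_score / 10) >= 60

def counterfactuals(income, credit_score, age, max_results=3):
    """Grid-search perturbations to the mutable features (income, credit
    score), keeping the immutable feature (age) fixed, and return the
    closest inputs that flip a rejection into an approval."""
    assert not approve_loan(income, credit_score, age)
    candidates = []
    for d_inc, d_cs in product(range(0, 50001, 5000), range(0, 201, 20)):
        if approve_loan(income + d_inc, credit_score + d_cs, age):
            # Cost = number of grid steps changed (a crude proxy for how
            # hard the change is to act on); smaller is better.
            cost = d_inc // 5000 + d_cs // 20
            candidates.append((cost, income + d_inc, credit_score + d_cs))
    candidates.sort()
    return [(inc, cs) for _, inc, cs in candidates[:max_results]]

# A rejected applicant: the nearest counterfactuals show which feature
# changes would have produced an approval.
cfs = counterfactuals(income=40000, credit_score=600, age=25)
```

Holding age fixed in the search illustrates the domain restriction mentioned above: a useful counterfactual should not ask the person to change an immutable or sensitive attribute.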