At Featurespace, we strive to identify and interpret biases in our fraud prediction systems, to ensure we make the world not only a safer place to transact, but a fairer one. Our modeling guidelines and best practices impose strict controls over feature schemas, search for spurious associations and ensure that sensitive information is only used for reporting and bias evaluation, unless otherwise justified. We commonly follow lengthy iterative processes whereby score distributions, as well as true and false positive rates, are scrutinized over subsets of the data defined across potentially sensitive partitions. Our current strategy is to build interpretability and explainability tools that help our internal data science teams and subject matter experts make informed modeling decisions, better tailored to each of our customers’ preferences.
To facilitate these fairness analyses, we draw motivation from pragmatic approaches to addressing the three primary fairness criteria: independence, separation and sufficiency.
Independence amounts to ensuring that (conditional) statistical demographic parity, often referred to as group fairness, is satisfied. In simple terms, our fraud models should yield the same score for any two financial transactions associated with different entities if the only differentiating factor in the transaction details and entity profile is the value of a protected attribute. To this end, simple statistical techniques, applicable at different steps in the model construction process, are commonly used to enforce and verify the independence property; a minimal distributional check is sketched below.
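As an illustration, the sketch below compares model score distributions across a hypothetical protected partition using a two-sample Kolmogorov–Smirnov test. All data, group names and thresholds are synthetic assumptions for illustration, not Featurespace internals.

```python
# A minimal sketch of an independence (demographic parity) check:
# compare score distributions across two values of a hypothetical
# protected attribute. Data and names are illustrative.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(seed=0)

# Illustrative scores for two sub-populations; in practice these would
# come from the fraud model evaluated on held-out transactions.
scores_group_a = rng.beta(2, 8, size=5_000)
scores_group_b = rng.beta(2, 8, size=5_000)

# Mean score gap: a coarse demographic parity measure.
gap = abs(scores_group_a.mean() - scores_group_b.mean())
print(f"mean score gap: {gap:.4f}")

# KS test: a small statistic / large p-value suggests the score
# distributions are indistinguishable across the protected partition.
stat, p_value = ks_2samp(scores_group_a, scores_group_b)
print(f"KS statistic: {stat:.4f}, p-value: {p_value:.3f}")
```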
In binary classification settings associated with fraud prevention, achieving separation requires that receiver operating characteristic (ROC) profiles of fraud models be invariant to changes in sensitive information. Here, true and false positive rates for fraud detection, as well as their negative counterparts, must be consistent across sub-populations segregated by regional characteristics or financial activity profile. In practice, this is commonly achieved through post-processing, i.e., after model design; a per-group check is sketched after this paragraph.
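The following sketch shows one way such a separation (equalized odds) check might look: computing true and false positive rates per sub-population at a fixed alert threshold. The group labels, prevalence and threshold are illustrative assumptions on synthetic data.

```python
# A minimal sketch of a separation (equalized odds) check: compute true
# and false positive rates per sub-population at a fixed alert threshold.
import numpy as np

def tpr_fpr(y_true, scores, threshold):
    """True/false positive rates of thresholded scores against labels."""
    alerts = scores >= threshold
    tpr = alerts[y_true == 1].mean()  # fraud correctly alerted
    fpr = alerts[y_true == 0].mean()  # genuine activity wrongly alerted
    return tpr, fpr

rng = np.random.default_rng(seed=1)
n = 10_000
y_true = rng.binomial(1, 0.02, size=n)           # ~2% fraud prevalence
group = rng.choice(["region_a", "region_b"], n)  # protected partition
# Synthetic scores that are informative about the label.
scores = np.clip(0.3 * y_true + rng.beta(2, 8, n), 0, 1)

# Separation holds when both rates agree across the sub-populations.
for g in ("region_a", "region_b"):
    mask = group == g
    tpr, fpr = tpr_fpr(y_true[mask], scores[mask], threshold=0.5)
    print(f"{g}: TPR={tpr:.3f}, FPR={fpr:.3f}")
```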
Commonly, the performance of a fraud prevention model will differ across geographical regions, merchant types or customer profiles. For instance, prevention models within merchant consortia are often better calibrated to the large institutions within them. Thus, separation is usually achieved through intervention: deliberately worsening the ROC curve profiles associated with well-performing groups, so that performance falls in line with the lowest common denominator. In lay terms, this amounts to positive discrimination through the benefit of the doubt: randomly chosen entities that would otherwise score highly for potential fraud are treated leniently. In our consortia example, the prevention model would deliberately classify as genuine financial activity a small number of transactions associated with fraudulent customer behavior at the large institutions. Such intervention in a fraud system’s behavior is controversial and comes at a significant cost. However, it is a cost that encourages model builders to improve performance for the worst-served groups, and longer-term interventions are generally targeted towards collecting more diverse observations that can contribute to fairer and more efficient model designs in the future. A sketch of such an intervention follows below.
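A minimal sketch of this kind of intervention, assuming synthetic per-group true positive rates, is to randomly suppress alerts in the better-served group until its detection rate matches the worst one. The group names, rates and suppression rule below are illustrative assumptions, not Featurespace's production mechanism.

```python
# A minimal sketch of the post-processing intervention described above:
# randomly suppress alerts in the better-served group so its expected
# true positive rate falls in line with the worst-performing group.
import numpy as np

rng = np.random.default_rng(seed=2)

# Per-group TPRs measured on a validation set (illustrative numbers).
tpr = {"large_institution": 0.80, "small_institution": 0.60}
target_tpr = min(tpr.values())  # the lowest common denominator

def suppress_alerts(alerts, current_tpr, target_tpr, rng):
    """Randomly downgrade alerts so the expected TPR matches the target."""
    keep_prob = target_tpr / current_tpr  # fraction of alerts to retain
    keep = rng.random(alerts.shape) < keep_prob
    return alerts & keep

# Alerts raised on fraudulent transactions in the advantaged group.
alerts = rng.random(10_000) < tpr["large_institution"]
adjusted = suppress_alerts(alerts, tpr["large_institution"], target_tpr, rng)
print(f"TPR before: {alerts.mean():.3f}, after: {adjusted.mean():.3f}")
```

Note that suppressing alerts also lowers the false positive rate for that group; a complete separation repair would need to bring false positive rates into line as well, for instance via group-specific thresholds.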
Sufficiency is a property generally satisfied by well-trained, off-the-shelf machine learning systems without the need for external intervention. Here, scores from your fraud prevention system should be truly representative of the likelihood of malicious activity associated with transactions, all protected attributes considered. It is paramount to avoid over-fitting, data leakage and similar mistakes during model training, since sufficiency follows from good generalization of your model’s predictive performance to newly exposed data. The property can be visually inspected through traditional ‘goodness of fit’ tests, such as Hosmer–Lemeshow tests, applied separately across the protected populations in your data sample. We generally expect a fraud model to satisfy sufficiency even when protected attributes are absent during training, provided these are predictable from correlated proxy variables. However, we note that imposing corrections on your model to ensure demographic parity or equality of opportunity (by satisfying independence or separation) will be an impediment to constructing sufficient models.
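As a sketch of how such an inspection might be automated, the snippet below bins held-out scores into deciles and compares the observed fraud rate against the mean predicted score in each bin, per protected partition, in the spirit of a Hosmer–Lemeshow table. The data and group names are synthetic assumptions.

```python
# A minimal sketch of a per-group calibration (sufficiency) check in the
# spirit of a Hosmer-Lemeshow table: bin scores into deciles and compare
# observed fraud rates with mean predicted scores in each bin.
import numpy as np

rng = np.random.default_rng(seed=3)
n = 20_000
scores = rng.beta(2, 8, size=n)       # predicted fraud probabilities
y_true = rng.binomial(1, scores)      # labels consistent with the scores
group = rng.choice(["profile_a", "profile_b"], n)

def calibration_table(scores, y_true, n_bins=10):
    """Expected vs. observed fraud rate per score-decile bin."""
    edges = np.quantile(scores, np.linspace(0, 1, n_bins + 1))
    bins = np.clip(np.digitize(scores, edges[1:-1]), 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = bins == b
        rows.append((scores[mask].mean(), y_true[mask].mean()))
    return rows

# Sufficiency holds when expected and observed rates agree within
# every bin, for every protected partition.
for g in ("profile_a", "profile_b"):
    mask = group == g
    print(f"-- {g} (expected vs. observed fraud rate per decile) --")
    for expected, observed in calibration_table(scores[mask], y_true[mask]):
        print(f"expected={expected:.3f}  observed={observed:.3f}")
```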
Featurespace’s approach to ensuring model fairness is designed to serve our own mission and to align with global efforts towards financial inclusion and fairness. We actively address all three fairness criteria within our model development and aim to make our models and their outputs as transparent as possible. We know that transparency in machine learning models is crucial to meeting the explainability and interpretability requirements around model governance that financial regulators are beginning to explore, and we participate in recommendations and commentary processes to facilitate the development of regulatory frameworks. By emphasizing model fairness in our research and development, we believe we can accelerate the adoption of machine learning for fraud prevention and anti-money laundering, ultimately making the world a safer place to transact.
Read more about Model Governance for Anti Money Laundering from Featurespace: https://www.featurespace.com/newsroom/model-governance-anti-money-laundering/