Navigating the complexities of machine learning fairness

Automated decision-making systems can reproduce or even amplify existing biases that discriminate against certain individuals or minority groups. Overcoming this challenge, however, is far from easy.

Irene Unceta

In early 2018, a group of researchers from MIT conducted the “Moral Machine Experiment”. The experiment, which is still available online, sought to understand public attitudes towards ethical decisions made by self-driving cars by letting users work through these moral dilemmas themselves.

Participants were presented with various scenarios in which a self-driving car had to make a split-second decision, such as whether to sacrifice the lives of passengers or pedestrians, or whether to spare the young or the elderly.

The results, published in Nature, showed a wide range of opinions on what constitutes fair and ethical decision-making, demonstrating the difficulty in establishing a universal definition of fairness, a challenge that has gained increasing attention in recent years. 

Moral Machine Experiment
What should the self-driving car do? Source:

As automated decision-making systems become more ubiquitous in our everyday lives, concerns grow about their potential impact on society. One of the most critical issues being debated is their ability to perpetuate or even amplify existing biases, resulting in discriminatory practices against certain individuals or minority groups. This concern is known as the fairness problem.

How to measure fairness? 

Fairness refers to the equity or justice of a decision-making system. It is related to the system's bias and its potential for discrimination. Evaluating this potential, however, is not as simple as one might think: fairness has no one-size-fits-all definition, and decisions about fairness often involve trade-offs. What may seem fair in one context may not be so in another. The question, then, is how we decide what is fair in the realm of automated decision-making.

From a technical perspective, fairness is generally evaluated based on the identification of one or more protected or sensitive attributes, such as race, ethnicity, gender, age, physical or mental abilities, or sexual orientation. An automated system is considered fair if its predictions are independent of these protected attributes, or if any statistical differences observed across them can be reasonably justified. In practice, however, evaluating this independence is not straightforward.

Three approaches to the fairness problem 

Let's illustrate this through a simple example. We want to evaluate if candidates are qualified for an educational grant. To train a machine learning model for this task, we need to gather historical data X from past grant calls.  

Suppose we have records from the last 4 grant calls where each candidate’s GPA and socio-economic status are known. The grant outcome is also known and is represented by a target variable Y with two values: granted and not granted. The graphic below shows this dataset. Each point represents a candidate, and the color indicates the grant outcome.


We can use these data to train our model and denote its predictions with the letter R. Like Y, this variable takes two values: eligible or not eligible. The performance of the model depends on the agreement between the actual labels Y and the predicted outcomes R for each candidate in the dataset.

Finally, we introduce a protected attribute identified by the letter A, which separates candidates based on their gender. For simplicity, we consider gender in binary terms and define two populations: men and women. With this example, we can examine different approaches to ensuring fairness in the resulting predictive system. 

In summary, we work with three variables:

  • Y = Outcome: granted or not granted 
  • R = Prediction: eligible or not eligible 
  • A = Protected attribute: man or woman 
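To make the notation concrete, the three variables can be held as parallel lists with one entry per candidate. The sketch below is only illustrative: the six records are made up and are not the dataset shown in the graphics, and the helper names `accuracy` and `selection_rate` are our own.

```python
# Hypothetical toy records (NOT the article's dataset): one entry per candidate.
Y = ["granted", "not granted", "granted", "not granted", "granted", "not granted"]
R = ["eligible", "not eligible", "not eligible", "eligible", "eligible", "not eligible"]
A = ["man", "man", "man", "woman", "woman", "woman"]


def accuracy(y, r):
    """Share of candidates whose prediction agrees with the actual outcome."""
    agree = sum((yi == "granted") == (ri == "eligible") for yi, ri in zip(y, r))
    return agree / len(y)


def selection_rate(r, a, group):
    """Share of one group that the model classifies as eligible."""
    preds = [ri for ri, ai in zip(r, a) if ai == group]
    return sum(ri == "eligible" for ri in preds) / len(preds)


print("accuracy:", accuracy(Y, R))
print("eligible share, men:", selection_rate(R, A, "man"))
print("eligible share, women:", selection_rate(R, A, "woman"))
```

In this made-up sample, 4 of the 6 predictions agree with the outcomes, and the model marks one in three men and two in three women as eligible.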

1. Omitting sensitive information

One intuitive approach would be to exclude information about the gender of the candidates from the training set. This should ensure that the predictions R made by the model are not based on the protected attribute A, since the model has no access to this information.  

The next graphic shows the decision boundary learned by the model using this approach. In the simplest case, we consider a linear model that separates the feature space in two regions. The red shaded region includes all the data points classified by the model as eligible, while the blue shaded region includes those individuals considered by the model to be not eligible for the grant. 


Observe that there are only 4 data points where the predictions R made by the model don’t agree with the original labels Y. These are the points which either lie in the red shaded region while being colored in blue, or lie in the blue shaded region while being colored in red. In light of these results, we can consider the model to be reasonably accurate, as it correctly classifies 80% of the individuals.

Let’s now evaluate its impact on the two populations defined by the protected attribute A. Remember that these data were not accessible by the model at the time of training. The graphic below shows the same decision boundary as before. This time, individuals are represented by either circles or crosses based on their gender.

We can see that there are more crosses than circles in the red region; conversely, there are more circles than crosses in the blue region. The model is not based on the protected attribute A, yet it disproportionately affects the two populations. This shows that unawareness is no guarantee of fairness in a decision-making system.  
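One common way this happens is through a proxy: a feature that stays in the training set and happens to be correlated with the protected attribute. The sketch below uses made-up numbers and a hand-written threshold rule standing in for the trained model. The feature name `status_score`, the threshold value and the eight records are all assumptions for illustration, not taken from the dataset in the graphics.

```python
# Hypothetical candidates: (status_score, gender). The rule never reads gender,
# but in this made-up sample gender is correlated with status_score.
candidates = [
    (0.9, "woman"), (0.8, "woman"), (0.7, "woman"), (0.4, "woman"),
    (0.6, "man"), (0.3, "man"), (0.2, "man"), (0.1, "man"),
]

THRESHOLD = 0.5  # assumed cut-off standing in for a learned decision boundary


def predict(status_score):
    """Gender-blind rule: only the score is consulted."""
    return "eligible" if status_score >= THRESHOLD else "not eligible"


def eligible_share(group):
    """Share of a group classified as eligible by the gender-blind rule."""
    scores = [s for s, g in candidates if g == group]
    return sum(predict(s) == "eligible" for s in scores) / len(scores)


print(eligible_share("woman"))  # 0.75
print(eligible_share("man"))    # 0.25
```

Even though `predict` never sees the gender column, the two groups end up with very different eligibility rates, because the score it does see carries that information indirectly.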

2. Forcing equal eligibility

Another approach to ensure fairness is by forcing the model to learn a decision boundary such that the probability of being classified as eligible is the same for the two populations defined by the protected attribute. In simpler terms, the proportion of crosses and circles in the red shaded region must be equal.  
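In the notation introduced earlier, this requirement can be written as follows. In the fairness literature it is commonly called demographic (or statistical) parity; that name is not used elsewhere in this article and is added here only for orientation.

```latex
\[
P(R = \text{eligible} \mid A = \text{man}) \;=\; P(R = \text{eligible} \mid A = \text{woman})
\]
```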

This time our graphic shows the decision boundary learned by the model based on this approach. The model classifies the same percentage of individuals as eligible from both populations.  


Following our original definition of fairness, we can conclude that this model’s predictions are effectively independent of the protected attribute. It appears our problem is solved. 

Before coming to a conclusion, however, let’s evaluate the model’s performance. The model misclassifies 5 individuals who were not granted the scholarship as eligible, and 4 individuals who were granted the scholarship as not eligible. This corresponds to an accuracy of 55%, a value significantly lower than our earlier baseline.  

Despite this, we may still choose to accept this solution as it ensures equal treatment for male and female candidates... Or does it? Upon closer examination, we find that the performance of the model varies between the two populations.  

Among the men, the model misclassifies 2 who were not granted the scholarship as eligible and 1 who was granted the scholarship as not eligible. Among the women, it misclassifies 3 individuals who were not granted the scholarship as eligible, and 3 who were granted the scholarship as not eligible. The result is an accuracy of 70% for men and 40% for women.
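The per-group figures can be checked with a few lines of arithmetic. The sketch below is a minimal reconstruction, not the author's code: the confusion counts are inferred from the totals and accuracies quoted in this article (10 candidates per group, of whom 5 men and 6 women were actually granted), and the key names `tp`, `fp`, `fn`, `tn` are our own shorthand.

```python
# Confusion counts for the second model, inferred from the article's figures.
# tp: granted and classified eligible      fp: not granted but classified eligible
# fn: granted but classified not eligible  tn: not granted and classified not eligible
counts = {
    "men":   {"tp": 4, "fp": 2, "fn": 1, "tn": 3},
    "women": {"tp": 3, "fp": 3, "fn": 3, "tn": 1},
}


def accuracy(c):
    """Correct predictions divided by all predictions."""
    return (c["tp"] + c["tn"]) / sum(c.values())


per_group = {group: accuracy(c) for group, c in counts.items()}

# Pool both groups to recover the overall accuracy.
total = {k: sum(c[k] for c in counts.values()) for k in ("tp", "fp", "fn", "tn")}
overall = accuracy(total)

print(per_group)  # {'men': 0.7, 'women': 0.4}
print(overall)    # 0.55
```

Both groups have 6 candidates classified as eligible, so the equal-eligibility constraint is met, yet the errors are distributed very differently between them.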

Our solution, despite our best efforts, still does not treat both populations equally. Even in the more optimistic case, where we focus only on the individuals that were classified as eligible by the model, we see a mismatch in accuracy between the two populations: about 67% of the men who were classified as eligible were actually granted the scholarship, while this number drops to 50% for women.
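The quantity compared here is the share of correct predictions among candidates classified as eligible, often called precision or positive predictive value (a standard term, not one used in this article). Assuming the counts implied by the figures above, namely 6 candidates classified as eligible in each group, of whom 4 men and 3 women were actually granted the scholarship, it works out as:

```latex
\[
P(Y = \text{granted} \mid R = \text{eligible},\, A = \text{man}) = \frac{4}{6} \approx 0.67,
\qquad
P(Y = \text{granted} \mid R = \text{eligible},\, A = \text{woman}) = \frac{3}{6} = 0.50
\]
```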

3. Forcing equal performance

In a final effort, let’s require that these last values be equal for both populations. That is, let’s train a model that shows equal predictive performance for both men and women.

The graphic below shows such a model. Note that all the data points in the red shaded region are now correctly classified. In other words, 100% of the women classified by the model as eligible were originally granted the scholarship, and likewise 100% of the men classified by the model as eligible were originally granted the scholarship.


If we consider the data points in the blue shaded region, those classified by the model as not eligible, some differences emerge between the two populations. The model misclassifies 3 women who were granted the scholarship as not eligible, and 2 men who were granted the scholarship as not eligible.

This means that, among the individuals classified as not eligible, the model has an accuracy of 57% for the women and 71% for the men. Hence, while the model may have the same predictive performance over those individuals classified as eligible, it doesn’t over those classified as not eligible.
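These two views of the third model can be reproduced side by side. The sketch below is a reconstruction under stated assumptions: the counts are inferred from the percentages quoted in the text (3 candidates per group classified as eligible, all correctly, and 7 per group classified as not eligible), and the function names `ppv` and `npv` (positive and negative predictive value) are standard terms that the article itself does not use.

```python
from fractions import Fraction

# Confusion counts for the third model, inferred from the article's figures.
counts = {
    "men":   {"tp": 3, "fp": 0, "fn": 2, "tn": 5},
    "women": {"tp": 3, "fp": 0, "fn": 3, "tn": 4},
}


def ppv(c):
    """Share of 'eligible' predictions that were actually granted."""
    return Fraction(c["tp"], c["tp"] + c["fp"])


def npv(c):
    """Share of 'not eligible' predictions that were actually not granted."""
    return Fraction(c["tn"], c["tn"] + c["fn"])


for group, c in counts.items():
    print(group, float(ppv(c)), round(float(npv(c)), 2))
# men 1.0 0.71
# women 1.0 0.57
```

Equalising one of the two quantities leaves the other free to differ, which is exactly the gap described above.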

Moreover, if we look at the true distribution of the labels Y for both populations, we can observe that among the 10 men in the dataset 5 were originally granted the scholarship (50%), whereas among the 10 women in the dataset 6 were originally granted the scholarship (60%). In classifying the same number of women and men as eligible, the model therefore assumes that both populations had an equal chance of being eligible for the grant, whereas the data appear to say otherwise.
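Written in the article's notation, the mismatch is between the rate at which each group was actually granted the scholarship and the rate at which the model marks it as eligible. The eligibility rate of 3 in 10 used below is an inference from the percentages quoted for this model, not a number stated in the text.

```latex
\[
P(Y = \text{granted} \mid A = \text{man}) = \frac{5}{10}, \qquad
P(Y = \text{granted} \mid A = \text{woman}) = \frac{6}{10},
\]
\[
P(R = \text{eligible} \mid A = \text{man}) = P(R = \text{eligible} \mid A = \text{woman}) = \frac{3}{10}
\]
```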

Once again, we find ourselves needing to identify clearly what fairness means for this model in order to ensure it is enforced in practice.

An open problem 

This simple example hopefully illustrates that the question of fairness in automated decision-making is a complex, multifaceted problem. What may seem fair in one context may not be so in another.  

Fairness is generally evaluated by requiring independence of one or more protected attributes, such as race, ethnicity, gender, or physical abilities. In practice, however, this criterion can be open to interpretation and needs to be refined and specified for each context.

Moreover, note that the three solutions presented above perform differently. The first model, which was trained without access to the protected attribute, had an accuracy of 80%. The second model, where we forced the percentage of men and women classified as eligible to be equal, had an accuracy of 55%. Finally, the third model, where we required the predictive performance for the individuals classified as eligible to be the same for both populations, misclassified 5 of the 20 candidates (3 women and 2 men), which corresponds to an accuracy of 75%.

Once we introduce some form of fairness requirement at the time of training, the overall accuracy of the model drops as a result. Other attempts at imposing fairness criteria on the model will similarly tend to decrease performance.

In conclusion, trade-offs must be made between fairness and accuracy. Determining what is fair therefore remains a subjective and ongoing challenge, one that calls for reflection and for exploring different perspectives on what should be considered fair, and when.

All written content is licensed under a Creative Commons Attribution 4.0 International license.