If an algorithm has been used to make a decision that concerns us, we have the right to know about it. We can’t simply give up and say: “This is artificial intelligence and only artificial intelligence knows what to do,” says Agata Foryciarz, a computer scientist at Stanford University, in conversation with Monika Redzisz

Monika Redzisz: You study algorithmic bias and track down biased algorithms.

Agata Foryciarz*: Yes, I’m a member of a biomedical informatics research group at the Stanford University School of Medicine. I analyze the limitations of artificial intelligence using, as a case study, an algorithm that cardiologists in the United States have relied on for seven years when prescribing statins. My job is to establish which situations and which people the algorithm favors and which it does not. It is a calculator that estimates the probability that a patient will develop cardiovascular problems over the next ten years. Unfortunately, it does not assess that risk in the same way for everyone. For example, it flags potential health problems much more frequently for women than for men.

Does that mean women are prescribed the medication more often?

That’s what you would expect. Assuming that doctors follow the guidelines rigorously, we can presume that women are prescribed the drugs more often. Working on this case is very rewarding: because the algorithm has been in use for a long time, we have managed to gather a lot of data. Most AI tools are much younger and do not offer that much information.

It is said that artificial intelligence systems should be transparent. Great Britain intends to impose penalties on companies that use algorithms if they are not able to explain the decisions made by the computer. On the other hand, some claim that those systems are black boxes and that it is impossible to explain their decisions.

As far as that issue is concerned, two things get intertwined, which often leads to misunderstandings. You can look at algorithmic errors from two perspectives. On the one hand, computer scientists would like to find the specific causes of a result generated by an algorithm; for example, they want to know why the wrong medicine was proposed to a patient. That is explainability. But what often matters more to society is whether such a situation could have been predicted, whether it may happen again, and whether measures have been taken to prevent it in the future. That can be achieved with sufficiently diligent audits. Understanding the mechanism that caused an error is not enough and, to be honest, it is not always necessary. It is just like with medicines: we sometimes don’t know which biological mechanism makes them effective. However, if a medicine is properly tested, we can safely assume that it is going to work. Explainability is not always the most important thing.

An algorithm has limitations that result from historical data. The only thing it can do is repeat the past

In my opinion it matters if we want to fix the algorithm. Knowing that we have a number of documented cases in which women are prescribed medication more often than men, shouldn’t we go back to the algorithm and make some adjustments to it?

We can either correct the algorithm or conclude that it is not suitable for making such decisions. That is what happened with an algorithm used by the court system to decide how much someone should pay to be released on bail. Sociologists who have analyzed it argue that this is not the right way to make such decisions.

What do you think might have caused the error in the algorithm you study?

It could have been an insufficient amount of data, poor data quality, or the use of inadequate statistical models, as recent studies have suggested, but it is very difficult to pinpoint the source of such errors.

Why is it so difficult? The algorithms used to diagnose skin cancer proved ineffective for dark-skinned people because they were trained on people with fair complexions. There the reason is very clear.

Sometimes it is easy, but not always. A good example is a model recently described in an article in “Science”. It analyzed an algorithm used by insurance companies to predict healthcare needs. An insurance company tries to offer a medical scheme to the sickest patients, i.e. to those who will need additional consultations, visits and tests or, in other words, to those who will generate the highest costs.


At first glance, there is no discrimination. The algorithm selects a proportionate number of men and women from various ethnic groups. But the algorithm is designed to forecast the costs a patient would generate next year, not their diseases. The assumption seems logical: people who are sicker will generate higher costs.

Only when we check how well the results correspond to patients’ actual medical condition several months later does it turn out that there is a huge difference between the white and black people who were included in the scheme. Statistically speaking, the white people to whom the scheme was offered are healthier, which means that if you are black you have to be much sicker than a white person to qualify for the scheme. And why is that? Because the American healthcare system spends less money on black patients. So it is clear that if we forecast costs and use that forecast as a proxy for people’s medical condition, we will always get a biased result. It would be hard to draw that conclusion from statistical analysis alone, without trying to understand the social context in which the algorithm is used.
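A small simulation can make this mechanism concrete. The sketch below is hypothetical and is not based on the data from the “Science” study; its only assumption is that the system spends less per unit of illness on one group, so ranking people by predicted cost sets a higher illness bar for that group:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical population: two groups of equal size.
group = rng.integers(0, 2, size=n)            # 0 = group A, 1 = group B
illness = rng.poisson(lam=3.0, size=n)        # number of active conditions

# Illustrative assumption: less is spent per condition on group B.
spend_per_condition = np.where(group == 0, 1000.0, 700.0)
cost = illness * spend_per_condition + rng.normal(0, 500, size=n)

# "Algorithm": enroll the top 10 percent by cost (standing in for predicted cost).
threshold = np.quantile(cost, 0.90)
enrolled = cost >= threshold

# How sick are the enrolled people in each group?
for g, name in [(0, "group A"), (1, "group B")]:
    mean_conditions = illness[enrolled & (group == g)].mean()
    print(f"{name}: mean active conditions among enrolled = {mean_conditions:.2f}")
```

In this toy setup, the enrolled members of the lower-spending group come out visibly sicker, and switching the target from cost to the number of active conditions removes that particular gap, which is the “change the objective” fix discussed later in the interview.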

Did the algorithm know who was white and who was black?

It had no data about race, but such information is correlated with other data. The algorithm can infer it from an address, for example. Secondly, the structure of the data itself reveals the group division. Imagine, for example, that we only have data on how tall people are. Because men are, on average, somewhat taller than women, then relying solely on height and on the Gaussian curve [Editor’s note: the most important theoretical probability distribution in statistics, describing situations where most cases are close to the average], we can deduce that these points here are probably men and those points there are most likely women.

The same goes for data concerning people from different ethnic groups, especially in the US, where inequalities in access to education, healthcare and so on are huge due to historical discrimination. That has to be reflected in the data. The algorithm doesn’t need a field that explicitly says a person is black in order to generate different results for that person, results which can later serve as a pretext for discriminating against them. The algorithm has limitations which result, among other things, from the available historical data. The only thing it can do is repeat the past.
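Here is a minimal sketch of the height example above, with illustrative (not exact) means and standard deviations; it simply applies Bayes’ rule to two Gaussian curves to guess the group from a single correlated feature:

```python
import math

def gaussian_pdf(x, mean, sd):
    """Density of a normal (Gaussian) distribution at x."""
    return math.exp(-((x - mean) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))

def p_man_given_height(h, prior_man=0.5):
    """Bayes' rule over two Gaussians: which curve is this height more consistent with?"""
    # Illustrative parameters only: men ~ N(177 cm, 7), women ~ N(164 cm, 7).
    p_m = gaussian_pdf(h, 177, 7) * prior_man
    p_w = gaussian_pdf(h, 164, 7) * (1 - prior_man)
    return p_m / (p_m + p_w)

for h in (160, 170, 180, 190):
    print(f"{h} cm -> P(man | height) = {p_man_given_height(h):.2f}")
```

The same logic applies to an address, a shopping history or any other feature correlated with group membership: no explicit label is needed for the model to behave differently across groups.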

Can it be fixed?

First of all, we have to take testing more seriously. If we test diligently, we can foresee what types of errors may occur and whether they will disproportionately affect discriminated groups, regardless of whether we are talking about predicting recidivism or the risk of contracting a disease. We should also demand that detailed documentation of such systems be published, so that non-governmental organizations and researchers can carry out audits and identify errors before anyone is harmed.

And maybe we should feed the algorithms false data? Maybe it would be better to deliberately distort history in order to fix them?

Indeed, one way to address such problems could be to replace the data, although I don’t think providing false data would be a good idea. Changing the objective could also be a solution. If, instead of forecasting treatment costs, we try to predict the number of active diseases, that specific type of bias in the algorithm will disappear (although other biases may emerge).

And what about the justice system? What about judges who use algorithms to predict who will reoffend and who will not? How can you change the objective in that situation?

You need a broader perspective. The model for predicting recidivism was designed as a solution to a very important problem in the United States. The US has the highest incarceration rate in the world, with about 2.3 million people behind bars. Black offenders go to jail five times more often than white ones, although black people make up less than 13 percent of society. The system is extremely unjust, and that is rooted in history: during the period of racial segregation black people were arrested even for petty offenses. The reason was racism, but also the fact that, historically, prisoners were cheap labor in mines and other dangerous workplaces.

“An algorithmic error”, “that’s not our fault”, “we don’t know what happened” – things we hear most often from people wanting to wriggle their way out of a problem

Black people are still arrested disproportionately more often than white people for the same offenses. On top of that, there is the problem of the long time spent in detention before trial. Such people could be released on bail, but what can be done if they can’t afford it? One of the ideas was the algorithm we have just talked about: let’s see who can be released, who we can trust.

Are Americans OK with that?

Many scientists and human rights organizations are against the use of algorithms in such situations. They believe it would be better to reform the judicial and penitentiary systems and to influence the decisions made by the police. There are also those who argue that the algorithm could be used differently, for instance to predict whether somebody will appear in court after being released. Instead of holding a person in detention, wouldn’t it be better to send them a reminder? Wouldn’t it be better to help them get to the court from where they live, or to provide childcare? These are the most common reasons for failing to appear in court. It’s about changing the way we think: instead of penalizing people, we should try to help them.

How accurate did the algorithmic forecasts turn out to be?

The actual human stories were first published in 2016. That article launched the debate on algorithmic bias. Many analyses were conducted over the next four years. One of them looked at what happened when a judge made a decision based on the algorithmic forecast and what happened when the decision was made independently.

Do American courts still use that tool?

Yes, in many states.

Another well-known story is about women being discriminated against in Amazon’s recruitment process.

They never used that algorithm; it was only in the test phase. The story made headlines, which is very good, but that particular tool did not affect anyone’s life.

On the other hand, there are many HR companies that record videos of job interviews and feed them to an algorithm that is supposed to determine what the candidates’ personalities are like and whether they will be good workers. Although there is no reliable scientific evidence that this kind of information can be predicted statistically, such tools are used and they influence real decisions about people’s lives.

But we know, as Michał Kosiński from Stanford has demonstrated, that an algorithm based on the analysis of a human face or on Facebook likes can learn a great deal about a person. I wouldn’t be at all surprised if a recorded job interview could provide a lot of information about a candidate. Whether that approach is ethical or not is a different matter.

Those are two separate issues. Even if the system were 100 percent accurate, should we still use it? It is used without the consent of the people being analyzed. First of all, it is unethical.

But there is also a problem with how those algorithms work. It’s no secret that facial recognition algorithms work best for white men and struggle most with black women. This case is similar to our insurance example: if my objective, as an insurer, is to cut costs, I don’t care whether I cut costs a tiny bit for many people or by a lot for a single person. The average will be the same, and my algorithm will work fine for me. But it makes a great difference to the individual affected by my calculations. Most metrics used to evaluate algorithms compute an average. That’s how machine learning systems work: they try to minimize error on average.

So if we don’t analyze in detail how algorithms behave in different contexts and for different groups, and if audits are not conducted on an ongoing basis, individual errors and abuses will keep occurring, even though they don’t necessarily show up in the average.
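A per-group audit is a standard way to look beyond the average. The sketch below uses made-up labels and accuracy rates, purely to show how a reassuring overall number can hide a much worse result for a minority group:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Hypothetical audit data: a group attribute, true labels and model predictions.
group = rng.choice(["majority", "minority"], size=n, p=[0.9, 0.1])
y_true = rng.integers(0, 2, size=n)

# Illustrative assumption: the model is right 95% of the time on the majority
# group and only 70% of the time on the minority group.
correct_prob = np.where(group == "majority", 0.95, 0.70)
correct = rng.random(n) < correct_prob
y_pred = np.where(correct, y_true, 1 - y_true)

print(f"overall accuracy: {(y_pred == y_true).mean():.3f}")   # looks fine on average
for g in ("majority", "minority"):
    mask = group == g
    print(f"{g:>9} accuracy: {(y_pred == y_true)[mask].mean():.3f}")
```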

Łukasz Kidziński told me about an insurance company algorithm which, based on an address, can calculate the value of a policy. We can easily imagine a situation in which that would be unfair.

Exactly. Nowadays everyone is hyped about algorithms and wants to use them for everything. But first we need to ask whether we really want to launch a given system at all, and whether there are non-algorithmic solutions that would be more reliable and cheaper.

Can we even stop this process at this point? These tools are being implemented on a mass scale…

I don’t know. But I do know that medical experts prefer to err on the side of caution. That’s one of the main reasons I am now working at a medical school. In medicine, safety standards are scrupulously observed. I am interested in the questions people ask before algorithms are deployed and in how those algorithms are analyzed.

Summing up, I understand that many companies could provide us with documentation on how their algorithms work, but they won’t, hiding behind the black box argument.

Yes, it’s an abuse. “An algorithmic error”, “that’s not our fault”, “we don’t know what happened” – these are the things we hear too often from people trying to wriggle out of a problem. But when something suits them, such as an algorithm that automates part of their work, then, all of a sudden, everything is perfectly clear. They are happy to provide all the documents and are open to a discussion about what went wrong and why.

So you, as a computer scientist, are saying that we don’t always have to trust what we are told and that in some cases we should demand that everything be explained to us. Am I right?

Yes.

We have heard many declarations that everything should be transparent and fair. In practice, however, algorithms that remain a mystery to us are used everywhere. What’s more, we don’t even know where or when. We have no idea which bank, which court or which employer is using algorithms to vet us. How can we get that information?

Polish employment offices used an algorithm to divide the unemployed into three groups, which then affected their futures. After a long struggle, the Panopticon Foundation managed to get access to it. Citizens have the right to know that an algorithm was used to make a decision in their case, especially when the decision concerns access to services or products. This should be obvious, but it is not. We can’t simply give up and say: “this is artificial intelligence and only artificial intelligence knows what to do”. That is all window dressing, all pretending that a statistical model is somehow autonomous and genuinely “intelligent”. The truth is that it is just a model designed by humans, which someone decided to use in a specific context. We should also be entitled to appeal against such a decision.

How do America and Europe compare in that regard? Is awareness greater on the Old Continent or on the other side of the Atlantic?

Institutionally speaking, definitely in Europe. I’m counting on Europe. In America, lobbying by big tech companies is so powerful that it is very difficult to imagine effective regulation at the national level.


*Agata Foryciarz is a PhD student in computer science at Stanford University, where she studies machine learning methods used to support human decision making and leads the Computer Science and Civil Society group. In Poland she collaborates with the Panopticon Foundation.


Read the Polish version of this text HERE