Replacing Humans with Machines is a Terrible Idea

Is this what the public agency of the future looks like? What seems like a dystopian building complex is actually just a fancy construction tarp in Bratislava, but it makes quite the impression.

This is an opinion piece summarizing my thoughts on why putting AI in public institutions to replace people will not solve any problems and might introduce many new ones. You may not agree with everything, but whether you do or don't, I'll be happy to hear what you think.

Nearly every day, a politician, industry leader, or someone else in power sings the gospel of AI productivity gains. AI, they say, will make things better, faster, stronger – look how much it has already done! It streamlines redundant tasks, codes for us, writes and edits our reports, ads, and press statements; there's nothing it can't do. All of this might be true, and all of it might just be automation running its course. But the talk gets delicate when the same people propose that AI could increase productivity in public institutions – specifically, in institutions that exist to interact with and support people, like job agencies, social security, and healthcare institutions.

The idea of “let’s make the public sector cost less money” already existed before AI became a thing, and it has a fancy name: New Public Management. Here's the pitch – in the interest of a lean state, we should organize public institutions as if they had to compete on the market. Market forces make private companies better, faster, stronger, so why should the same not work in the public sector? The idea surfaced in the 90s and has not gone away. In fact, many public institutions were overhauled with this very idea in mind and re-opened with a big neon sign flashing “now more efficient”.

This is to say, even if you've never heard the term, new public management is all around. Just as AI is all around. And, surprise, it's a match: AI is touted as the ultimate tool to usher in a new age of public sector efficiency. I think this is misguided. In the following, I will explain why, focusing on three key issues: the way public institutions measure success, the danger of algorithmic imprints, and the legacy of flawed structures.

Public institutions’ performance metrics are skewed beyond belief

When talking to people who work in public institutions, you notice that many of them care about the people they work with. Who does not care about the people they work with are algorithms. Who cares sometimes, and sometimes not, are the policy-makers who decide the direction of public sector bodies. A policy-maker might think: these institutions are spending money on people who do not actually need or deserve it, and besides, they don't deliver enough throughput per day. This message is packaged and ribboned in the phrase “They need to be more efficient.”

Remember: efficiency means minimizing the resources spent for a particular outcome. Now let's apply this lens to how public institutions work, taking a job agency as an example. What is the job of a job agency? The intuitive answer is: bringing people into work. Framing the agency's work under the lens of efficiency thus means optimizing a function that takes job counsellors' time as input and produces new hours worked or vacant positions filled as output. This view creates a convenient statistical relationship that can be modelled and put into algorithms, such as a logistic regression. The algorithm can then predict performance in terms of efficiency and even provide decision-making assistance to maximize this metric. [6]
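To make this concrete, here is a minimal sketch of what such a model could look like – a hypothetical logistic regression that predicts whether a job-seeker will be placed, and an “efficiency” score derived from it. All feature names, data, and numbers are invented for illustration; they are not taken from any real agency system.

```python
# Hypothetical sketch: predict "placed in a job" from administrative
# features, then rank job-seekers by predicted placement per counsellor hour.
# All feature names and data are invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000

# Invented administrative features.
X = np.column_stack([
    rng.integers(18, 64, n),   # age
    rng.integers(8, 18, n),    # years of education
    rng.integers(0, 60, n),    # months unemployed
    rng.uniform(0, 20, n),     # counsellor hours spent so far
])
# Invented outcome: 1 = placed in a job within the observation window.
y = rng.integers(0, 2, n)

model = LogisticRegression(max_iter=1000).fit(X, y)

# "Efficiency" as this lens sees it: predicted placement probability per
# counsellor hour. Maximizing this number rewards routing time away from
# anyone the model scores as a hard case.
placement_prob = model.predict_proba(X)[:, 1]
efficiency_score = placement_prob / (X[:, 3] + 1)
print(efficiency_score[:5])
```

The point of the sketch is not the model itself but the objective: once “placement per hour spent” is the number being optimized, everything that does not move that number becomes invisible to the system.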

What does applying efficiency as a performance metric mean in practice? It means that Harald, an older guy with a knee issue and only eight years of school in his education record, will instantly be rerouted to a facility that acts as the oubliette for non-employables. This course of action minimizes counsellors' time spent on a cumbersome hardcore case and shifts his subpar record to the statistics of another agency. You can see that the optimal sample of job-seekers for an agency run under the goal of efficiency would be a population of well-educated, compliant, flexible mid-twenties who can be left to their own devices with barely any intervention. Efficiency would quickly approach 90+%.

But in reality, a job agency neither deals with an exclusive clientele of malleable young professionals, nor is its only job to be the stepping stone to easily reachable employment. While the willing twenty-somethings exist, the agency's clientele is much more likely to be composed of people who have experienced hardships and cannot simply return to work. They might have been fired, traumatized by bad leadership, bullied by colleagues, or ground down by caring for a family member, and in turn they might have developed anxiety issues, nagging self-consciousness, or become physically or mentally ill. If you think this is an exaggeration: I have seen all of these issues in different facets and colours while conducting interviews with people at job agencies – it's not far-fetched. People who have trouble staying in employment are rarely lazy; they have serious, real human issues and are in need of support.

Trying to frame this support as a measure of efficiency is somewhat laughable. Everyone can agree that agencies should not burn money to no effect, but here comes the catch: the effect of personal counselling on job-seekers is, in itself, an unobservable property [5]. We can only approximate it through measures like job placement rate, hours of counselling spent, and the job-seekers' reports and statements. What we cannot easily observe is psychological comfort, reassurance, satisfaction, perception of self – aspects that are equally if not more important when talking about social support.

This means that algorithms – by design – are not used to maximize or even improve the factors that make up human connection and experience, but to maximize key performance indicators that can be quantified. This is why using algorithms to supplant interpersonal work in an effort to improve efficiency, consciously or unconsciously, de-humanizes the agency's actual performance goal: letting people support people.

Every gauge should say what it measures. Picture by Eric Prouzet.

Algorithms leave gigantic and very resistant imprints

Let's take one step back, to the point where the algorithm is not yet implemented in our job agency. The idea has just entered the discussion and there are arguments on both sides: if we implement it, it could raise the agency's efficiency and give it some fighting ground in next year's funding negotiations with the city. If it doesn't work out, we treat it as a pilot phase and simply dismantle it again. What could go wrong?

This line of reasoning forgets that many organizations, and public agencies in particular, are not nimble little sailing boats that can take sharp turns, but lumbering cruise ships that are responsible for a host of people and for which every move costs a lot of money. Taking a metaphorical turn to see if there's a shortcut through that archipelago will be allowed exactly once, and if it turns out to be a bad call, neither captain nor crew will be granted any free action for a long time.

What this means is that, once the decision to deploy an algorithmic system has been made, there is absolutely no incentive to truthfully examine whether it's a blessing or a curse. Money has been spent, personnel have been trained, computers have been set up, job-seekers have been informed, everything has been checked by the legal team from top to bottom. How high, do you think, are the chances that any agency will roll this back unless it is forced to?

So, one could say, let's put it into a real test phase, one that allows us to dismantle the whole thing again if it doesn't turn out right. Only two counsellors will use the thing, and only a prototype of it; job-seekers will be asked if they are OK with being rated by a machine, decisions are non-binding, and all data is stored and processed locally, so there's no hassle with the GDPR. After two months the test is evaluated. The presentation is nice enough, but the results are somewhat… mixed? Counsellor A, whose name is Jonathan, says that he used it in the beginning but that the predictions were way off, so he began to ignore it. Counsellor B, whose name is Monika, says that for her clients the predictions were pretty much on point and she started to use the algorithm a lot. As a result – are those numbers true? – the number of job-seekers she counselled rose by 40%, wow, that's quite a lot! So it's a win-win: when the predictions are wrong, the counsellor takes over, and when they are not, everything is better, faster, stronger. And anyway, it was only a prototype; maybe the predictions will become even better. The manager is already halfway done drafting the email commissioning development of the full version.

Even if we leave aside that for counsellors like Jonathan this solution would mean an increased workload with less time to check the system's outputs [1] – what if both counsellors had had Jonathan's experience? Then the system might go back to development or be cancelled, and we would never hear about it. Such cases might exist, and they would be prime examples of what successful early-stage testing looks like. But first, this outcome presupposes a lot of requirements (there is a testing phase, the test results are honest and unbiased, the algorithm performs badly, and there's someone who actually calls deployment off), and second, next time the algorithm might work slightly better, a counsellor might vouch for it, and then we are back to square one. And who knows, maybe this time the algorithm actually does a good job – but the point is this: systems that are meant to automate human work are not something you put in and take out on a whim. When they come, they leave imprints [4], and changing them becomes more difficult the further their implementation has progressed.

Algorithms will not fix what the prior structure screwed up

In one of my interviews, I spoke to a participant who is knowledgeable about law and AI regulation. When I asked what our Harald could do once he was classified by the system, he said that Harald wouldn't stand a chance, because “cruelty is the point of the system.” The phrase is a reference to a book of the same name that dissects Trump politics, and while the situation might not always be as dire as it currently is in the US, the point is worth keeping in mind.

AI systems, as you know, are often sold as a benevolent measure, a productivity boost, and the next big step towards digitization. But AI systems simply enforce the policy of the public institutions they are implemented in. As we saw above, if the job agency pursues a policy of sanctioning people who do not work, then that is what the AI system will do – better, faster, stronger. It is for this reason that, whenever you study the screwups of public AI systems, you will find that they trace directly back to issues that are deeply rooted in the institutional structure and the policy decisions governing it. While the first and second arguments assumed that using an algorithm to replace humans was an honest mistake, this argument assumes that its use is part of a policy that does not look favourably upon long-term job-seekers.

Let's take an example – and this one has a particular pain to it. In my interviews, I spoke to many job-seekers to find out how they would feel about an AI system that predicted their employment chance, and what they would need to know to fully grasp what's going on in such a system. Many of these participants told me that they thought the system was a wonderful idea, since they would finally get an unbiased and precise picture of their employment chance. The system gave them hope. Hope placed in a system that could be put in place expressly with the intention to enforce sanctions and rationalize cost-cutting at their expense. Quite easily, too: by adopting the restriction that only women can have a duty of care for family and ranking them lower for it, and by giving everybody from outside the EU an automatic malus [2, 6]. Such a system might be the very opposite of individual support and counsel. And yet, in some cases, even after I explained what could happen, the job-seekers still vouched for the system, saying it couldn't be worse than the counsellors they had met. They vouched for the human mulcher in the hope that it wouldn't mulch so hard. How badly does a public agency need to screw up to get people to say things like that?
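To illustrate how little it takes to encode such a policy, here is a deliberately crude, hypothetical scoring rule with hand-set maluses. The weights, feature names, and function are invented for illustration and are not taken from the AMAS system analysed in [2, 6]; the mechanism, however, is the same: group membership alone lowers the predicted “employment chance”.

```python
# A deliberately crude, hypothetical scoring rule. The weights are invented;
# the point is that a single line of configuration can encode a group-level malus.
def employment_chance_score(base_score: float,
                            is_woman_with_care_duties: bool,
                            is_non_eu_citizen: bool) -> float:
    score = base_score
    if is_woman_with_care_duties:
        score -= 0.15   # care obligations counted against women only
    if is_non_eu_citizen:
        score -= 0.10   # automatic malus for non-EU citizenship
    return max(0.0, min(1.0, score))

# Two people with identical qualifications end up in different tiers.
print(employment_chance_score(0.55, False, False))  # 0.55
print(employment_chance_score(0.55, True, True))    # 0.30
```

Nothing about such a rule is technically sophisticated; the bias sits in a few coefficients that reflect a policy choice, not a property of the people being scored.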

For every AI system that is deployed in public institutions, we need to ask ourselves: which structures will this system inherit, and whose tune will it dance to? AI deployment is guided by political interest [3], just like everything else, and thinking that AI would un-bias and de-politicize any sphere of public interest is an illusion. To make the point clear: say an institution is charged with the dismantling of other public institutions, just as we currently observe in the US, and presents a new AI system titled the Human Resource Upgrader – how much hope would you place in this system to upgrade the human resources of anything? It's not about what's on the label, it's about what's in the can.

This is why it is important to know what’s going on in the system and how you can raise objections against it: There’s a good chance that an AI system is not acting in your interest. If cruelty is the point of the system, then it will inflict harm. Knowing who is responsible for this harm, and, most importantly, not trusting the AI to have your wellbeing in mind, is crucial to prevent algorithmic fallout.

Coincidentally, we have published some research on making it easier for people to know what’s going on in an AI system. If you’re interested in that, check out these two papers:

Thanks for reading! Here are some more references if you want to dive in deeper:

[1] Allhutter, Doris, et al. “Algorithmic profiling of job seekers in Austria: How austerity politics are made effective.” Frontiers in big data 3 (2020): 5. https://doi.org/10.3389/fdata.2020.00005

[2] Allhutter, Doris, et al. “Der AMS-Algorithmus. Eine soziotechnische Analyse des Arbeitsmarktchancen-Assistenz-Systems (AMAS). Endbericht.” (2020). https://doi.org/10.1553/ita-pb-2020-02

[3] Crawford, Kate. The atlas of AI: Power, politics, and the planetary costs of artificial intelligence. Yale University Press, 2021.

[4] Ehsan, Upol, et al. “The algorithmic imprint.” Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 2022. https://doi.org/10.1145/3531146.3533186

[5] Jacobs, Abigail Z., and Hanna Wallach. “Measurement and fairness.” Proceedings of the 2021 ACM conference on fairness, accountability, and transparency. 2021. https://doi.org/10.1145/3442188.3445901

[6] Lopez, Paola. “Reinforcing intersectional inequality via the AMS algorithm in Austria.” Critical issues in science, technology and society studies: Conference proceedings of the STS conference Graz. 2019. https://doi.org/10.3217/978-3-85125-668-0-16




If you found this useful, please cite this as:

Schmude, Timothée (Apr 2025). Replacing Humans with Machines is a Terrible Idea. https://timothee-schmude.github.io/.

or as a BibTeX entry:

@article{schmude2025replacing-humans-with-machines-is-a-terrible-idea,
  title   = {Replacing Humans with Machines is a Terrible Idea},
  author  = {Schmude, Timothée},
  year    = {2025},
  month   = {Apr},
  url     = {https://timothee-schmude.github.io//blog/2025/replacing-humans/}
}


