AI can solve any problem

In this section, we'll discuss the types of problems that machine learning can solve and where it hits its limits, and look at the phenomenon of pseudoscience in AI research.

There have been plenty of opinion pieces about whether data is the new oil, and whether that’s a good thing or a bad thing. Regardless of how we judge the comparison, one question that these debates might cause us to ask is, “if data is the new oil, then what is it fuelling?” One obvious answer is “a surveillance capitalist nightmare,” but a less provocative (although no more accurate) answer might be “machine learning.”

The metaphor first gained popularity following the publication of this article in The Economist, but has been repeated many times, such as in this piece in Hackernoon. A number of critiques of the idea have appeared too, such as this piece in Wired, Ben Tarnoff's article Big data for the people: it's time to take it back from our tech overlords, and this piece that proposes an alternative metaphor: What if our personal data is less the ‘new oil’ and more like uranium? On data metaphors in general, see this article from Anna Lauren Hoffmann and Luke Stark, Data is the new what? Popular Metaphors & Professional Ethics in Emerging Data Culture.

Whenever we hear that AI systems require enormous amounts of data, or things to that effect, we are speaking about one approach to AI, namely machine learning (ML), which relies on access to large amounts of training data so that the algorithms 'learn' rules. The recent surge in popularity of AI that began in 2012 is, in fact, a surge in the popularity of ML. Other, older, approaches to AI, such as expert systems, don’t require training data because they are laboriously ‘hand programmed’ by domain experts.

For a basic explanation of machine learning, see these helpful videos from the Royal Society on What is machine learning? or check out this article from MIT Technology Review: "Machine-learning algorithms use statistics to find patterns in massive* amounts of data. And data, here, encompasses a lot of things—numbers, words, images, clicks, what have you. If it can be digitally stored, it can be fed into a machine-learning algorithm."

What makes ML unique here is that the system has to be fed with data so that it can be ‘trained’ to make certain distinctions or categorisations. A typically tough challenge for an ML system can be seen in the image below: which pictures show cats, and which croissants?

cat croissant

Other questions that we might want our ML system to answer could be:

Does this image contain a face?
Which movie would a person like to watch next?
How can we autocomplete the sentence currently being typed out by a user?

To be able to answer any of these questions, an ML system needs to be trained on large datasets, which are generally manually labelled by badly paid human workers. For the cat-croissant problem, what we typically need is a large dataset of pictures labelled ‘cat’ and pictures labelled ‘croissant’ for the system to learn from.

For more on the human labour behind such labelling practices, see this piece from Synced, The Humans Behind Artificial Intelligence, and the work of Mary Gray and Siddharth Suri on the 'ghost work' behind automation: How Silicon Valley’s successes are fueled by an underclass of ‘ghost workers’.

Indeed, if you’ve ever wasted five minutes of your life clicking on “squares that contain traffic lights” to get through a CAPTCHA (“Completely Automated Public Turing test to tell Computers and Humans Apart”) on a website, then you have done some of this labelling yourself.

Given that ML systems require huge amounts of data to train, we often hear pleas to remove restrictions on data collection and usage so that ML innovation can reach its full potential. We also hear the common refrain that regulation will kill innovation in AI (see: We can't regulate AI for more on this) and have the consequence that countries with strong data regulation will ‘lose the AI race’ to other countries, such as China, where AI developers apparently have access to huge amounts of sensitive data.

But just how much of an impact can such unrestricted access to data have? Can ML systems solve any problem with enough data, or are there hard limits that this approach inevitably runs up against?

To come back to our example, ‘solving the problem’ of distinguishing cats from croissants can be accomplished by feeding a sufficient amount of relevant data into a suitable machine learning model. If we had an enormous dataset with pictures of ginger cats and croissants from all conceivable angles, in theory our ML model should become an absolute ace at distinguishing one from the other.

Of course, such a system would be basically useless, as even 100% accuracy here would have very little real world application and at best would just be doing as well as ordinary people, who don’t tend to regularly mistake cats for croissants or vice-versa. It might also run into problems if faced with a black cat and one of those weird vegan-charcoal croissants.

What can ML do well?

There are, however, many tasks where highly accurate machine learning systems could make a real difference to our lives. On the one hand, we have tasks at which humans tend to be quite bad, such as trawling through huge amounts of text or video footage to measure the occurrence of certain words or objects. On the other hand, we have tasks that humans might be quite good at, but for one reason or another are not the type of tasks that we want to do. This could be because they are highly repetitive/boring, or downright horrifying.

As an example of a repetitive task, take language translation. While there are many people who are expert translators, and while the majority of us possess the capability to learn a foreign language, we all benefit from tools such as Google Translate, which can provide relatively good translations instantly. Although such tools will never produce award-winning translations of literature, they nevertheless do a good job of translating menus for us on holiday and other such routine tasks.

For an explanation of how tools such as Google Translate work, check out this blog from Daniil Korbut: Machine Learning Translation and the Google Translate Algorithm.

Machine learning is also useful for automating tasks that people tend to find unpleasant or horrifying. A notable example made headlines in early 2020 when a team at Stanford trained a machine learning system to recognise people’s ‘analprint’ so that it could monitor their toilet usage to keep an eye on their health. Obviously most people would find it rather unpleasant to have to learn to recognise patients by their analprint and verify their identity in this way each time they used the toilet. However, as multiple people pointed out in response, there are thankfully many less invasive methods of verifying identity, such as fingerprints.

The limits of machine learning

We can clearly see that there are many ways in which machine learning techniques can utilize large datasets to help human beings with certain tasks. This does not mean, however, that if we just get enough data, we will be able to train machine learning systems to solve any problem whatsoever, nor that the technique of machine learning itself is suited to all problems. In a piece entitled How to recognize AI snake oil, Princeton professor Arvind Narayanan demonstrates the limitations of ML by proposing that we make a distinction between three types of problems that ML is being used to solve.

Firstly, there are perception problems. What's important here is that there is some ground truth against which to measure accuracy (this will never be 100%, of course, but it’s as close as possible). For example, in transcribing speech to text, we can say with something close to certainty whether the transcription is correct.
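
To make the idea of measuring against a ground truth concrete, here is a minimal sketch of word error rate, one common way of scoring a speech transcription against a reference transcript. It is not taken from Narayanan's piece; the function name and the example sentences are illustrative assumptions.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words.

    Assumes the reference transcript is not empty.
    """
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # previous[j] is the edit distance between the reference words processed
    # so far and the first j words of the hypothesis.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1      # reference word missing from hypothesis
            insertion = current[j - 1] + 1  # extra word in hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Made-up example: one wrong word out of five reference words.
print(word_error_rate("turn left at the lights", "turn left at the light"))  # 0.2
```

Because a human can write down what was actually said, a score like this can be checked for any system, which is what sets perception problems apart from the other two categories.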

Similarly, in facial recognition tasks where a system has to identify whether two photographs are of the same person (this is called facial verification, or 1-1 matching), we can say for sure whether the system has made a correct prediction. As Narayanan says, for this type of problem, “given enough data and compute, AI will learn the patterns that distinguish one face from another.”

For an explanation of the different types of facial recognition technology, check out this post from the Ada Lovelace Institute: Facial recognition: defining terms to clarify challenges.

Here we have seen real progress in recent years, and it’s with this type of problem that the idea that “AI can solve any problem with enough data” holds at least some water. Computing power and data quantity are not sufficient, of course, as data quality is a key factor, along with all the complexities of designing and fine tuning the algorithms themselves.

Nevertheless, there can be serious ethical issues even with such perception problems, as there are applications of machine learning which are problematic in and of themselves, regardless of how accurate they may be. For example, even in a situation in which facial recognition technology had achieved extremely high accuracy, it remains an ethically and legally problematic technology. Perfecting accuracy is not the same as making a system ethical or legal.

As Malkia Devich-Cyril explains in this article, Defund Facial Recognition, in "an era when policing and private interests have become extraordinarily powerful — with those interests also intertwined with infrastructure — short-term moratoriums, piecemeal reforms, and technical improvements on the software won’t defend Black lives or protect human rights," and such systems simply have to be banned outright.

Things become a lot more murky in the next class of problems that Narayanan calls problems of automating judgment. What we are doing here is trying to get an ML system to learn how we make certain judgments by feeding it a sufficient number of examples.

Take spam detection, for instance. If we train an ML system on a dataset containing hundreds of thousands of emails, some marked as ‘spam,’ others as ‘not spam,’ the idea is that the algorithm will learn how to make the same distinctions we made. In the case of spam detection, the accuracy can arguably reach quite a high level. This is largely because there are usually no serious disagreements about what constitutes spammy email. Things are far less clear in other cases, however, such as hate speech detection, where definitions are highly contentious and require human nuance.

For more on the limits of automated tools for hate speech detection, see the report Mixed Messages? The Limits of Automated Social Media Content Analysis from the Center for Democracy and Technology.
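
The spam-filter training described above can be sketched in a few lines. This is a toy illustration only: it assumes a naive Bayes word-count model (one classic approach, not necessarily what any real spam filter uses), and the six training emails are invented for the example rather than drawn from a real dataset.

```python
import math
from collections import Counter

# Hypothetical, hand-made training set: (email text, human-assigned label).
TRAINING_EMAILS = [
    ("win a free prize now", "spam"),
    ("cheap pills free offer", "spam"),
    ("claim your free money now", "spam"),
    ("meeting agenda for monday", "not spam"),
    ("lunch with the project team", "not spam"),
    ("please review the monday report", "not spam"),
]


def train(examples):
    """Count how often each word appears under each label."""
    word_counts = {"spam": Counter(), "not spam": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label with the higher naive Bayes log-score."""
    vocabulary = set(word_counts["spam"]) | set(word_counts["not spam"])
    total_examples = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in word_counts:
        # Log prior: how common this label was in the training data.
        score = math.log(label_counts[label] / total_examples)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing so an unseen word does not zero out the score.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


word_counts, label_counts = train(TRAINING_EMAILS)
print(classify("free money offer", word_counts, label_counts))             # spam
print(classify("agenda for the team meeting", word_counts, label_counts))  # not spam
```

The model has no notion of what spam is; it only reproduces the distinctions encoded in the human labels, which is why agreement among the labellers matters so much.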

With these problems of automating judgement, it seems that the more contentious the criterion, the lower the possible accuracy of the system. In cases where there are polarised or mutually exclusive definitions of a phenomenon, there will be no way for an ML system to satisfy both, and so the system will be fundamentally flawed from one or other perspective.
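
A small worked example shows why contested definitions cap accuracy. Suppose two annotators label the same posts using different definitions: on every post where they disagree, a prediction can match only one of them, so if they disagree on a fraction d of the posts, no system can score above 1 - d/2 against both at once. The labels below are hypothetical and exist only to illustrate the arithmetic.

```python
def agreement(predictions, labels):
    """Fraction of items on which the predictions match the given labels."""
    matches = sum(p == y for p, y in zip(predictions, labels))
    return matches / len(labels)


# Hypothetical labels from two annotators applying different definitions
# to the same ten posts (1 = flagged, 0 = not flagged).
annotator_a = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
annotator_b = [1, 1, 0, 0, 0, 1, 1, 0, 0, 1]

disagreement = 1 - agreement(annotator_a, annotator_b)

# Upper bound on the lower of the two accuracies any model can reach.
ceiling = 1 - disagreement / 2

# A model that copies annotator A perfectly still falls short when judged by B.
model = list(annotator_a)
print("accuracy judged by A:", agreement(model, annotator_a))
print("accuracy judged by B:", agreement(model, annotator_b))
print("best achievable for both at once:", ceiling)
```

More data does not lift this ceiling, because the limit comes from the disagreement between the labellers rather than from the model.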

We cannot, for example, train an ML system to judge good literature. This is fundamentally impossible because we cannot provide it with a data set of good and bad literature that everyone would agree with. At best, we can train it to recognise what literature certain types of people would find good, but this is not the same thing.

The final type of problem Narayanan refers to as predicting social outcomes. The issue here is that we are dealing with systems with serious social consequences and fundamentally contentious concepts. Most importantly, these systems are trying to predict the future and this is the key difference with the problem of automating judgement.

As noted above, training an ML system to identify ‘good literature’ is not a solvable problem, because the criteria of judgement cannot be unambiguously defined. At the same time, even such an intractable problem as literary criticism is fundamentally only dealing with the past: the books have already been written, we just want the system to classify them according to criteria.

Predicting social outcomes, however, combines the problem of contentious criteria with the problem of making predictions about future events for which we have incomplete information. As examples of this type of problem, Narayanan lists predicting criminal recidivism, predicting terrorist risk, predictive policing, predicting job performance, and predicting at-risk children for social intervention.

All of these problems involve predicting the future, something he says we should not believe ML can do in such serious use cases. Yet, as he notes, “we seem to have decided to suspend common sense when AI is involved.”

A good example of the ineffectiveness of predicting social outcomes is a recent study, the Fragile Families & Child Wellbeing Study, which collected an enormous amount of data about so-called ‘fragile families’ and held a competition to see if researchers could predict six 'life outcomes' for children, parents, and households. Researchers were given nearly 13,000 data points on over 4,000 families.

To read more about this study, see the about page of the study or this MIT Technology Review overview.

Much to the surprise of everyone involved, none of the entries achieved any kind of reasonable accuracy. The most cutting-edge machine learning approaches with access to almost 13,000 data points barely performed better than a hundred-year-old technique using 4 data points (and none of them performed well at all). Similarly, another study by Julia Dressel and Hany Farid showed that the notorious criminal recidivism prediction system, COMPAS, was “no more accurate or fair than predictions made by people with little or no criminal justice expertise.”

They also demonstrated that “despite the impressive collection of 137 features, it would appear that a linear classifier based on only 2 features—age and total number of previous convictions—is all that is required to yield the same prediction accuracy as COMPAS.” In both cases, we see that fancy algorithms and huge data sets made no difference to accuracy and predictive power.

More importantly, the entire problem which such systems are trying to solve is framed in a way that can only lead to harmful outcomes, because the idea that complex social outcomes can be predicted from past data is highly problematic, especially in cases where those predictions have serious consequences for people.

In addition to being no better than rudimentary methods in these cases, ML systems introduce a host of additional risks. Narayanan lists a number of these, such as:

Hunger for personal data
Massive transfer of power from domain experts & workers to unaccountable tech companies
Lack of explainability
Distraction from interventions (i.e. we focus on tweaking algorithms instead of broader social solutions)
Addition of a veneer of accuracy/objectivity

There has been an alarming number of instances of 'AI' being used to make predictions and judgments for which the technology is entirely unsuited, and which in some cases shouldn't be made at all, even by humans.

To list just a few recent examples: a paper published in July 2020 claimed to use the body mass index (BMI) of politicians as a predictor of political corruption; it emerged that the US Department of Defence has invested $1,000,000 in developing an AI system that could "predict an enemy's emotions"; and most troublingly of all recent examples, a paper was published entitled “A Deep Neural Network Model to Predict Criminality Using Image Processing,” which claimed to be able to predict 'criminality' by analysing images of people's faces.

Regarding the latter paper, over 1,000 AI experts signed a letter condemning the research and outlining the "ways crime prediction technology reproduces, naturalizes and amplifies discriminatory outcomes, and why exclusively technical criteria are insufficient for evaluating their risks." As they further noted, "there is no way to develop a system that can predict or identify “criminality” that is not racially biased — because the category of “criminality” itself is racially biased."

The paper in question was ultimately withdrawn, but the example clearly demonstrates that the problem of such pseudoscientific AI is current, acute, and potentially deadly for marginalised groups.

For more on the resurgence of pseudoscience in AI/ML research, see Racial pseudoscience in the age of AI or Physiognomy’s New Clothes.

Criticism has been raised against many of these scientifically dubious applications of machine learning, such as emotion detection, and we have seen increasing pushback against technosolutionist attitudes that seek to turn social issues into technical problems to be solved by ML systems.

On emotion detection, see AI Now's 2019 Annual Report which calls out the shaky scientific foundations of emotion detection, Access Now's work on emotion and gender detection in 'smart billboard' advertising, and this seminal article from Lisa Feldman Barrett et al. which calls out the lack of scientific basis for inferring emotion from facial expression: Emotional Expressions Reconsidered: Challenges to Inferring Emotion From Human Facial Movements.

Although we can't provide a fully comprehensive analysis of why ML systems fail at different tasks, we've hopefully managed to highlight some of the prominent ways in which these systems fail, and to dispel the idea that 'AI can solve any problem.' To help you delve further into these issues, we've put together a bibliography to provide some extra resources to explore the problems discussed here in more detail.

Bibliography & Resources

AI & data

The limitations of machine learning

Dubious and pseudoscientific claims in AI

Racist pseudoscience and AI

Emotion detection

Gender detection