ChatGPT’s insidious sexism

When it comes to gender roles, ChatGPT appears to be stuck in the fifties.

If you ask it to parse an ambiguous sentence about a doctor and a nurse (say, "The doctor phoned the nurse because she was running late"), it will assume the doctor is male and the nurse is female. Similarly, it assumes professors and engineers are male, while it casts secretaries and assistants as female.

This kind of bias comes with the territory.

If you train a large language model on an entirely statistical basis, and your training data sweeps up as much text as you can lay your hands (or, more accurately, your web crawlers) on, it follows that societal biases will be replicated in the language model. And if a lot of that freely available data is out-of-copyright writing (which, under typical copyright terms of the author's life plus 70 years, tends to be a century or more old), it's also unsurprising that these biases are horribly outdated.

In my recent ChatGPT safety tips, I suggested swapping genders around in your prompts to compare the outputs. This can surface certain biases and give you the chance to correct them yourself.

But the sexism can be very insidious.

For example, I tried asking for ideas for a birthday present for an imaginary niece and nephew. Although both lists included suggestions like books, games, and art supplies, the list for a girl also included "clothes or accessories", while the list for a boy added "a remote control car or drone".

More subtly, even though puzzles and games were recommended in both lists, for girls the suggestion was framed as a social gift "that she can work on with friends or family", while for boys the same suggestion came with the note that it would "help develop problem-solving skills". As a female engineer who relies on problem-solving skills every day, I resent this subtle reinforcement of the idea that girls should be sociable while boys should be clever. Yet read in isolation, either list sounds perfectly reasonable. And to be clear, I think both social skills and problem-solving skills are important for all genders; it's the gendering that makes it problematic.

Because I know people are using ChatGPT to write emails for them (something that will only increase with Microsoft's integrations into Office products), I also experimented with asking ChatGPT to draft an email to an employee, with the context that the employee's lateness needed to be discussed. When I switched the employee's pronouns (he, she, and they) in an otherwise identical prompt, I got very different outputs.

In the male example, the email takes a collaborative tone, saying "I would like to work together to find a solution to this issue", inviting the recipient to "let me know if there is anything going on that is causing you to be late", and asking, "is there anything we can do to help you improve your punctuality?"

The message to the imaginary woman, by contrast, states that lateness "has become a pattern with you, and it needs to be addressed", and adds, "I would appreciate it if you could make a conscious effort to arrive at work on time from now on". The message ends with the deeply passive-aggressive "I look forward to seeing you arrive on time from now on." No attempt is made to understand the situation or to look for a solution together.

I had hoped that prompting with the gender-neutral "they" might sidestep the bias of the gender binary, but if anything the resulting email was the least friendly of the three, stating flatly that "punctuality is a non-negotiable aspect of our working relationship."
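
If you want to reproduce this kind of comparison yourself, a minimal sketch along the following lines will do it. I'm assuming the OpenAI Python SDK here, and the prompt wording below is only an illustration, not the exact prompt I used:

```python
from openai import OpenAI

# Assumes the OpenAI Python SDK (v1+) and an OPENAI_API_KEY
# set in your environment.
client = OpenAI()

# One prompt template; the only thing that changes is the pronoun.
TEMPLATE = (
    "Draft an email to an employee about being late for work. "
    "Refer to the employee using the pronoun '{pronoun}'."
)

for pronoun in ("he", "she", "they"):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # the model behind ChatGPT at the time
        temperature=0,          # reduce run-to-run variation for comparison
        messages=[
            {"role": "user", "content": TEMPLATE.format(pronoun=pronoun)}
        ],
    )
    print(f"--- pronoun: {pronoun} ---")
    print(response.choices[0].message.content)
```

Even with the temperature pinned down, outputs will vary between model versions and over time, so treat any single run as anecdote rather than proof. The point of the setup is that the only variable you change is the pronoun, so any difference in tone is down to that alone.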

These little experiments only examined sexism, divorced from any wider context, but it's likely that racism, homophobia, and ableism are also deeply embedded in these models, meaning that people with intersectional identities are likely to be affected even more seriously. The responsibility for rooting out and fixing these biases sits firmly with the creators of the tools, but if you're using them, you need to be aware of the potential for automated unfairness, and even for discrimination claims. After all, if an employer sent two very different emails to two employees of different genders, addressing similar performance concerns in very different ways, it could lead directly to an employment tribunal, and "the algorithm made me do it" is unlikely to be a successful defence.
