Would you bother writing a blog post if it wasn’t going to be read by humans?
Since ChatGPT emerged into the limelight, I’m seeing a lot of people talking about whether this kind of technology is going to replace copywriters — a topic which I’ll doubtless come back to another day.
But I’m also seeing a largely unchallenged assumption that it’s going to replace traditional web searches, and possibly quite soon.
After all, Microsoft has already integrated ChatGPT into a version of Bing, and we know Google is also looking into similar technology. It makes sense for them, and it sounds good for users. As a user, why would you want to scroll through a list of web pages looking for the answer to your question, if an AI system can absorb the whole of the internet and then just tell you?
But I’ve yet to see a discussion of what this particular application might mean for the copywriting and content marketing ecosystem.
Of course, copywriting for marketing purposes has existed since well before the internet, in the form of ‘advertorial’ content in magazines and newspapers, but I don’t think it’s an exaggeration to credit Google, and the dawn of effective search algorithms, with the recent explosion in the importance of content marketing. Google search results reward quality, informative, novel content. This encourages people to create more content — because good content ranks highly and generates traffic.
Suddenly, copywriting went from a fairly niche skill to something every small business owner needed to either hire in or master themselves.
The current search ecosystem is a symbiotic relationship, where both parties benefit. Google rewards good content, ‘paying’ with traffic. And in return, people are incentivised to write quality web pages, feeding Google’s algorithms with more and more pages to ingest, and improving the quality of search results. Aside from a few shady practices, most of search engine optimisation (SEO) has been a battle to gain higher ranking by increasing quality.
Search based on ChatGPT, though, doesn’t link back to its sources. In its current form, it can’t. Responses are generated purely as strings of statistically likely words. These models don’t keep a record of where they learned that those words go well together, so although common phrases are more likely to be retained and regurgitated as ‘facts’, the model will never be able to say exactly why.
(Aside: the only approach I’ve seen so far for a version of chat-based search that cites its sources generates a response in the usual (statistical) way and then searches the web for an existing page that says basically the same thing. In human psychology this is called confirmation bias, and is generally acknowledged to be a bad idea.)
This kind of chat-based search fundamentally breaks the symbiotic ecosystem of search and content.

[Image shows (left) two arrows forming a loop between Google search and content, and (right) a one-way arrow from content to chat-based search. Caption reads “comparison of current search and chat-based search.”]
Under chat-based search, instead of great content leading to higher search rankings and more visitors, great content gets swallowed up (along with the mediocre and terrible content) by the AI training monster, to be later regurgitated in the form of a statistical model that uses your work and doesn’t give you any credit for it.
This has a few potential implications.
By offering up answers without linking to sources, chat-based search reduces the incentive for people to write good content in the first place. It undermines the purpose of content marketing, which aims to attract visitors by means of authoritative content, and could reduce the size of the market for copywriting. (Would I be writing this, if I didn’t hope that people might come and read it?)
Writers and content owners may decide they don’t want their text to be used as training data for a system that doesn’t link back or give credit. We’re already seeing lawsuits around similar issues in generative imagery. Methods (like ‘robots.txt’ files) already exist to tell search engines which pages they should or shouldn’t index, and I can imagine something similar being devised to flag which content can legally be used for AI training. Paywalls and passwords could also be used to restrict content to human readers.
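To make that concrete: the robots.txt convention already lets site owners name specific crawlers and tell them what to leave alone, so an opt-out from AI training could plausibly reuse the same syntax. A sketch, using standard robots.txt directives but a purely hypothetical user-agent name for an AI-training crawler:

```
# Block a hypothetical AI-training crawler from the whole site
# (the user-agent name is illustrative, not a real bot)
User-agent: AITrainingBot
Disallow: /

# Traditional search crawlers remain welcome
User-agent: Googlebot
Allow: /
```

Whether AI companies would honour such a flag is, of course, a separate question — robots.txt compliance has always been voluntary.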
And yet, without the explosion in content which has been incentivised by traditional search, there wouldn’t be the huge volume of published language which has been critical to the success of these new language models. If people stop writing as much good-quality content, or find ways to block these tools from using their words as training data, the next generation of these models won’t have as much new material to learn from. Couple this with the increasing use of ChatGPT to write content, and it’s possible that future generations of this technology will be trained at least in part on their own earlier output. ChatGPT already carries a warning that it has “limited knowledge” of events after 2021.
In order for chat-based search to become and remain successful, ongoing access to quality, updated content will need to be negotiated. It seems unlikely this will happen without some kind of payment, either in links and traffic, or in cold hard cash.
