HarperCollins just changed the landscape for AI in publishing: Here’s what authors need to know

HarperCollins recently inked a deal with an artificial intelligence (AI) technology company to allow backlist books (that is, published works that have been out for more than a year) to be used for AI training purposes, with the author’s express permission. The details of the deal are still a bit unclear, but so far it appears that only select non-fiction titles are up for use, and the exact AI company in question has not been disclosed to the press.

Many people who work in publishing, myself included, were anticipating that publishers would make a move regarding AI. There has been much controversy ever since the launch of ChatGPT and other AI models regarding the copyright and integrity of published books, articles, IPs, and so on, and authors have been growing concerned and looking to their agents and publishers for answers.

There had also been talk of HarperCollins expressly asking for AI-related clauses in its publishing contracts, and of agents actively having conversations with Big Five publishers regarding AI usage and protection. But with this new licensing deal, HarperCollins appears to be the first major publishing house to take action on the matter of AI licensing in publishing.

So far, this has raised eyebrows among authors, agents, and rival publishing companies alike. I know many people like to view traditional publishing as one monolithic thing, but in reality, it’s an array of different companies that all share the same goal of making money and churning out bestsellers, often with very different approaches to doing so.

Not all publishing companies are on the same page about AI licensing; in fact, representatives from Simon & Schuster have already come out and said there have been no talks between them and AI tech companies. Penguin Random House has been discouraging other companies from using its titles for AI training, so it’s clear that there are differing opinions on how AI should be used, and on whether it’s viable for the future of publishing. So, let’s dive into exactly what makes AI so controversial.

What’s the controversy with AI in publishing?

Whether we like it or not, AI is likely here to stay. So, we may as well start figuring out how to coexist with it and put up parameters on how to use AI in a way that’s effective and fair for everybody. AI has benefits too, of course, and plenty of authors already use it when writing books; consider how many manuscripts are edited with proofreading apps like Hemingway or Grammarly, which use AI technology to spot grammatical errors.

But the deep concern with AI mainly lies with its dependency on already copyrighted and licensed material, which is exactly why tech companies are now seeking permission from publishers, production companies, and institutions to use their materials for AI training.

OpenAI, the company behind ChatGPT, has already faced numerous lawsuits from notable authors, including Game of Thrones author George R. R. Martin and Amos Decker series author David Baldacci, claiming that their books have been unlawfully used for AI training.

There is also the concern of users pasting entire book excerpts into prompts and asking AI to interpret or rewrite those books; the AI then uses this content in its responses and thereby breaks copyright law simply by doing what it was designed to do. Somehow, someway, unauthorized copies of books are finding their way into these LLMs, leading dozens of plaintiffs to file claims against AI companies. And given the precedent set by Hachette v. Internet Archive this past year, I’d venture a guess that these outcomes will not bode well for the AI companies.

I haven’t even mentioned the matter of authors who wish to publish AI-generated books built from their prompt responses and be compensated for the work. This obviously brings up a plethora of questions regarding how much ‘authorship’ actually went into AI-generated content, what the royalty share looks like, and whether or not such books should be recognized in the same light as authentic human-written literature.

The same debate is happening in art and music circles where AI artists are fighting to have their content licensed and recognized by the general public. I have my own opinions on the topic, but I figured I’d just add that all of these matters have been and will continue to be debated for as long as AI continues to rise in relevance over the coming years.

AI is also, in general, a massive energy waster: reports state that, on average, a ChatGPT query uses about ten times as much energy as a Google search, and some estimate that a year’s worth of ChatGPT queries uses enough energy to power nine houses. How did electrical companies even greenlight this?

All this said, AI is a relatively new landscape, and there aren’t set laws and regulations in place just yet that dictate how it all works. It’s a similar feeling to the late ’90s and early 2000s, when e-book and audiobook royalty rates were being set for the first time and the publishing landscape was going digital.

I’ve already written a couple of blog posts about ChatGPT usage and its role in the writers’ and actors’ guild strikes back in 2023. I have a feeling this will not be the last time I talk about the lasting effects of AI on the arts and entertainment industries.

How will this affect authors?

Authors are incentivized with a $2,500 flat payment that will not be counted against their advance. Reports I’ve seen describe the deal as a three-year license, and the rate is non-negotiable. Authors can either consent to it or not, but what is slightly alarming to me is the messaging HarperCollins has used in this instance. So far, authors have been contacted through third-party/agent emails, as in the case of author Daniel Kibblesmith, who shared screenshots on Bluesky that verify the details above, and the response has not been favorable.

Authors view this as AI attempting to replace them, and as publishing companies selling out to AI with very little reward for the author. Some claim the deal infringes on authors’ rights.

Kibblesmith’s agent does mention that there are certain protections and limits on what AI can do with the book, including limits on word-for-word usage, but obviously we don’t know what the exact clauses entail yet. None of my clients have been offered this deal, and I have yet to be approached by publishers on the subject.

So, while the initial responses may be slightly overblown, it’s a realistic long-term concern that certain publishers are already all-hands-on-deck with the new AI revolution. I do worry that this may result in yet another strike caused by AI, this time from the Authors Guild, especially given the legal battles already in play.

What is AI going to do with these books?

All that’s known for certain is what HarperCollins representatives have reported about the deal, which is that the AI company will use these select non-fiction titles to “improve model quality and performance.” We can infer what this means in a couple of ways, but most broadly, it’s so the AI can give more accurate information in its model responses.

If you’ve used any version of ChatGPT – which I assume all of us have at this point – then you’ve probably noticed that it needs constant fact-checking and correction. ChatGPT works by essentially analyzing millions of articles, blogs, datasets, and other sources of information, largely drawn from the publicly browsable web, and generating a general summary of that information in its response.

To give more accurate responses, it needs more data, information, and knowledge, and quite frankly there is no better source for that than full-length books. But there are obvious copyright concerns, ethical concerns, and question marks about how this new HarperCollins deal will turn out, and whether it will set dangerous precedents moving forward.
