

September 28, 2023

ASA response to use of Australian books to train AI

After months of concern and speculation, it was horrifying to have it confirmed yesterday that many Australian authors' books are included in a dataset used to train artificial intelligence without the authors' permission. It confirms what we had suspected: that books from pirate sites have been used to train AI.

AI tools have been developed without transparency, and this opacity has made it impossible to know which – if any – Australian books have been ingested by AI developers. However, since The Atlantic published a search tool that allows authors to search for their books in one of the datasets used to train generative-AI systems, we've heard from many dismayed authors who had no idea their works had been used without permission.

While authors are reeling from large-scale unauthorised use of their work, generative AI products have already been released onto the market and monetised. Authors are rightly outraged. The fact is that this technology relies upon books, journals, essays, and scripts written by authors, as well as images and artwork created by artists, yet permission was not sought and compensation was not paid.

The dataset made searchable yesterday is known as the “Books3” corpus and we understand it contains 183,000 books, downloaded from pirate sources. Due to the lack of transparency from tech companies regarding their datasets, it is unclear whether the Books3 dataset has been used by OpenAI, against which the Authors Guild has filed a class action for copyright infringement. OpenAI has referred in a public paper to two “internet-based books corpora” called Books1 and Books2. There are now author class action suits pending against OpenAI, Meta, and Google.

ASA CEO Olivia Lanchester says, “This issue is one of basic fairness. The developers of AI chose to help themselves to authors’ property in order to build powerful, revolutionary software. They didn’t ask permission. The inescapable message to authors and artists is that while your work has been essential in developing our product, we’re not prepared to pay you for it. Tech companies will charge the end user of their products but will not pay for the labour that enabled it. It’s a supply chain that stops short of the primary producer. Like paying the supermarket for your fruit and vegetables without any of that revenue going back to the farmers who grew the produce.” 

“I know the argument will be made that AI services are so valuable to the public that any means are justified. But turning a blind eye to the legitimate rights of copyright owners threatens to diminish already-precarious creative careers. Where does that approach get us? The enrichment of a few powerful companies at the cost of thousands of individual creators. This is not how a fair market functions. Writers and artists are real people who bring us joy, give our lives meaning and deserve dignity and fair payment for their very real work. 

“To be clear, we are not anti-tech. We support emerging technologies, but we feel there has been a missed opportunity to develop artificial intelligence ethically: with transparency, permission and payment, unlocking new opportunities for our creative industries. Instead, authors and artists are being locked out of the AI boom. It's not too late to turn this around and move to appropriate licensing.”

The ASA will write to AI companies to express our serious concerns and to request action, and will continue advocating on behalf of authors to the Federal Government.

Read here for the Authors Guild’s advice on the steps you can take if you’ve found your books have been included in the Books3 dataset. 

We stand with the US Authors Guild, author organisations around the world and the thousands of individual authors protesting this industrial-scale appropriation of their works.

We will be using ongoing member feedback to inform our advocacy on this issue. If you would like to share any feedback or experiences please contact Lucy Hayward, Marketing & Communications Manager: [email protected].