On September 25, an article published by The Atlantic, titled “These 183,000 Books Are Fueling the Biggest Fight in Publishing and Tech,” caught my eye. Toward the end of the summer, Alex Reisner, the writer of that piece, had published a series of articles that revealed that a data set of books, called Books3, was among those being used by companies to train their generative artificial intelligence systems. In order to sound like humans, these systems need to be fed text written by humans.
The authors of the books included in this data set received no compensation and no royalties for the use of copyrighted works. As Reisner writes: “These authors spent years thinking, researching, imagining, and writing, and had no idea that their books were being used to train machines that could one day replace them. Meanwhile, the people building and training these machines stand to profit enormously.” Several writers in the United States have launched lawsuits claiming that this amounts to copyright infringement.
Reisner’s latest article includes a search tool: any author can check whether their books are being used without their consent. Seeing that a book I wrote was included in the data set felt not only like a violation of rights but also a plundering, both personal and existential. Reisner sums it up perfectly in the first instalment of the series: “The future promised by AI is written with stolen words.”
Hundreds of titles by Canadian authors, including Alice Munro, Austin Clarke, Leonard Cohen, and Miriam Toews, were listed in Books3. We reached out to some of them to ask how they felt about one or several of their books being included.—Harley Rustad, author of Big Lonely Doug and Lost in the Valley of Death
“I was frankly shocked to see a book of mine as part of this data set. It feels invasive. That said, I am already used to this violation—having found out my work is being photocopied for course packages at major universities in Canada to ‘train’ students. Like the AI corporations, these universities do not remunerate writers, nor do they pay into copyright-protecting organizations like Access Copyright that collect and disseminate funds to authors for use of their work. The key difference here is that at universities, my work is ostensibly being used to teach writing, critical thinking, and communication skills to humans, while at the corporations that train AI, it is being used to create software designed to eliminate writing jobs, mimic and replace human interaction, and create greater profits for corporations. A student may find something to like in my work and buy future books. An AI will not.”—George Murray, author of several books of poetry, including Whiteout
“The whole point of novels, plays, poems, and essays is to increase our humanity, not reduce it. That comes from one human being reaching out to another. It’s both a creative and a social act. Introducing AI short-circuits that, because AI-generated writing is neither social nor humanly created. So it’s not a question of whether AI generates good or bad writing. After all, there’s plenty of human-crafted writing out there that’s terrible. But it’s still human, it’s still one person’s attempt to say something about the human condition. I’d rather a bad human poem than a robot’s sterling ode.”—Yann Martel, author of several books, including Life of Pi and The High Mountains of Portugal
“Discovering that pirated versions of my novel All the Broken Things were stolen to provide content in the development of generative-AI systems makes me sick to my stomach. The outrage of my intellectual property being pillaged without my consent—for the purpose of, ultimately, Meta et al.’s capital gain—is infuriating and deeply upsetting. Canadian writers, publishers, and readers need to stand up against this egregious and unfair usage.”—Kathryn Kuitenbrouwer, author of several books, including All the Broken Things and Wait Softly Brother
“I think books and their authors are inseparable. And the beauty of the relationship between the writer and their work and their readers is what makes novels part of the history of humanity. To steal an author’s voice, which they created out of the ecstatic and punishing moments of their life, is akin to stealing their soul. It’s monstrous.”—Heather O’Neill, author of several books, including When We Lost Our Heads and The Lonely Hearts Hotel
“Big tech, again and again, shows itself to be an industry that moves with entitlement and lack of care. While what we’re calling AI is developing into a useful tool, companies that build their programs out of stolen parts can’t be trusted to share the technology in a way that respects the rules and rights of people. The violation of rights goes far beyond writers. Tech companies have been mining us for years with varying degrees of consent (location tracking, facial recognition, and numerous other incidents of using others’ work to train generative AI)—and I hope writers will use this opportunity to join the many ongoing fights to ensure that use of these developing technologies is equitable, both in their creation and their execution.”—Michael Melgaard, author of Not That Kind of Place and Pallbearing
“When I first learned that my books appear in the Books3 data set, I had two strong reactions. The first: I made it! The second: Money? But these are lizard-brain writer reactions (which I do not disavow). In more reflective terms: when I was working on my second novel, Beggar’s Feast, Michael Meyler’s Dictionary of Sri Lankan English was an important sourcebook. Years later, looking something up in the online version of the dictionary, I came across a new citation of the word ‘talipot,’ which the compilers cited as occurring in Beggar’s Feast. I drew on a source and in turn contributed to the source for others to use, as part of a living dynamism that makes possible the growth, evolution, and spread of stories and language. By comparison, the AI system version feels like dead-letter one-way voracity.”—Randy Boyagoda, author of several books, including Governor of the Northern Province and Beggar’s Feast
“As Cormac McCarthy said, ‘Books are made out of books.’ It’s true—ideas feed each other—and the system of exchange works when it involves consent, credit, and compensation. Technology is always changing, but those things stay the same. It’s disheartening to see them ignored in this case.”—Claire Cameron, author of The Last Neanderthal and The Bear
“I asked ChatGPT how I should feel about it: ‘Ultimately, how Pasha Malla feels about being included in the “Books3” dataset would depend on various factors, including the terms of inclusion, the reputation and purpose of the dataset, and how it aligns with his professional and personal objectives as an author.’”—Pasha Malla, author of several books, including The Withdrawal Method and People Park
“Learning that my books are on an AI data set made me think about how much I value fiction as a meeting with another consciousness. The idea of a bot-generated novel—it’s monstrous. A novel without intentionality or insight or novelty—that unsettles me more than any breach of my copyright. The degrading of our culture with more crap content.”—Joan Thomas, author of several books, including Wild Hope and Five Wives
“The question of what to do about AI art—of how to understand that provocation and how to find solidarity with other artists—is the subject of my recent novel, Do You Remember Being Born? But, for me, the question posed by Books3, as well as other similar training sets (most of which are closed, i.e., secret), isn’t about artificial intelligence at all. It’s about the covert manipulation of my work, and my peers’, for other people’s profit. How do we throw sand in the gears of this new form of exploitation? And, if we can’t, what steps can we take to protect each other from the avarice of those who would grind us up into feather meal? Writing isn’t threatened by large language models—and some artists will find ways to use them in astonishing work. The real adversary is the same old cast of robbers; we may just need to get more creative to oppose them.”—Sean Michaels, author of Us Conductors and Do You Remember Being Born?
“To begin with, I felt an echoey sense of dread. Imagine, all that work—all that meaning—shattered into atoms of code. Don’t we already ‘sound like people’? Must we really innovate ourselves to smithereens? Remember the holodeck? I always pitied those poor star-trekking bastards, kidding themselves they were making love in a meadow while they hurtled through the loneliest reaches of space. The dread ebbed away—in its place, a blessing that caught me off guard. I felt grateful. For writers and for readers. For all that remains steadfastly anti-artificial in my life: real love, real meadows, real art.”—Alissa York, author of Effigy and Far Cry
“I was disappointed but not surprised to learn that one of my books was found in this data set. Sadly, it seems that nowadays all artistic content is vulnerable to exploitation in the digital realm. I think it’s a good reminder for Indigenous writers to be careful with our people’s stories, especially those that have survived and still persevere in the oral tradition. I believe many of our cultural truths and sagas should remain only in the spoken realm so they’re not abused by outside forces beyond our control. We should definitely keep writing our experiences, but we need to be particularly mindful going forward about what we publish because of how our stories could be manipulated.”—Waubgeshig Rice, author of several books, including Moon of the Crusted Snow and Moon of the Turning Leaves
“Finding my books on the Books3 data set was disappointing and disorienting: writing is how I’ve made my life, artistically, and—this is important—practically too. I’ve been watching the quick turn to AI with growing concern for over a year—all the ways it could threaten our literacy as a culture but also the ways it threatens my personal way of making a living. Books and writing are how I pay my mortgage, my children’s tuition, my grocery bill. To see my work so cavalierly stolen and used, without my consent, by corporations eager only to increase their own profits, is frankly terrifying.”—Elisabeth de Mariaffi, author of The Retreat and The Devil You Know
“I am honoured to discover that Eunoia appears within the data set of Books3—a list of 183,000 books, all among ‘the Elect,’ now gone to Heaven and used to train the minds of our futurist machines (which, like any of our children, do not need our permission to become literate).”—Christian Bök, whose books include Eunoia and The Xenotext: Book 1
“The thing I’ve never seen anyone write or talk about is that it’s not only writers and authors involved here. It’s everyone. If AI gets access to your email, then it’s got you forever. It will know how you actually speak, how you actually write, who you like, what you really think of people, your food and porn and travel preferences. Everything. This can actually be a comfort, because people in your life can speak with you long after you’re dead. So it’s a trade-off.”—Douglas Coupland, author of several books, including Girlfriend in a Coma and JPod
“To me, this particular case isn’t about technology—it’s about capitalism and labour. A lot of writers I know enjoy playing with AI, and I’ve played with it myself. There are many exciting possibilities for the intersections of art and technology, and it’s not like writers are sitting around using quill pens dipped in ink, refusing to recognize the rapidly changing conditions of our times. But the data set here was gathered without informed consent. The fact that our information infrastructure is clustered in the hands of a few companies, largely without transparency and based on short-term profit making, should concern not just writers but also readers, citizens. We turn online all the time to make sense of the world, and we’re pretty trusting, overall, about what we find there and the mechanisms by which it came to be available. I’m not sure any of us think about that as much as we should.”—Alix Ohlin, author of several books, including Dual Citizens and The Missing Person
“As unsettling, alienating, and violating as this feels, we benefit from, and now expect, this kind of high-speed agglomerating, sorting, and rearranging technology every day. That AI is conceived and enabled by brilliant, ambitious, but immature men bodes ill for our civilization. That it can put us all out of a job is a real possibility, though this disemployment will likely occur in ways far more subtle than the infinite mimicry promised by LLaMA. It’s hard to know how to feel about something so difficult to meaningfully imagine, but I recognize that it is inherently uncontrollable. In this way, AI is the logical cousin to twenty-first-century climate disruption: while we may have unleashed it, no one is ready for it, and our efforts to rein it in will manifest as perpetual catch-up: lunging to grab its tail before it disappears around the next corner.”—John Vaillant, author of several books, including Fire Weather and The Golden Spruce
“AI induces a sense of helplessness in me. I feel angry about the ravages of climate change and the burgeoning of fake news. ExxonMobil has earned my fury. Donald Trump has earned my fury. But AI, by its very facelessness, defeats anger. It’s hard to feel anything about an algorithm except a numb dismay. Then I remind myself: AI systems are thieves. I realize the systems don’t care about the quality of my prose and the depth (or shallowness) of my insights. But an impersonal motive doesn’t excuse the theft of language. AI cannot strip writers of imagination. What it can do is scrape away hope.”—Mark Abley, author of several books, including Spoken Here and The Prodigal Tongue
“I recall a quote by Trotsky that the goal of communism was to free people to become poets. I realize that the communist project went badly. In the ideal version of AI reality, what is being promised writers and readers in exchange for the lives we lead now? At the level of pure vanity, I am flattered that my books are included in the data set that seems designed to make me redundant. On an aesthetic level, I recoil at the terms Pile and Books3. Big Tech companies are so rich. I know the rich also steal. If they had paid the licence fees, the people who oppose what they are doing would still, fundamentally, oppose it. The theft just makes it gross. Is the prospect of this technology worse than the technology we already have? Will my children, who prefer TikTok to books, be more likely to read books by AI? Is this not perhaps a battle in a greater war that we are already losing?”—David Bezmozgis, author of several books, including Free World and Natasha
“A book is a consciousness scored on the page. It is the result of years of living and thinking and experimenting and dreaming. And now, for me and so many other authors, it is just another neuron firing in the techno-capitalist mind, stolen from creators for the profit of a select few. They are soul eaters. As authors, we are solitary by nature and vocation, but this might prove our collective downfall in the era of AI. Why aren’t we striking alongside screenwriters and actors, given the mutual threat to our livelihood, our reason for being? Where is our Fran Drescher to lead the charge?”—Kate Harris, author of Lands of Lost Borders