Indexing FAQ

What is an index?

A good subject index:

is a structured analysis of a text, presented in alphabetical order at the back of a book.
reflects a book's content accurately, consistently, and concisely.
uses terminology tailored to different readers, allowing them to find relevant information even if they're unfamiliar with a discipline's jargon.
groups related discussions and provides multiple access points to those discussions.
includes only substantive discussions in a text.
excludes subjects mentioned only in passing, about which the text offers no additional information.
is the result of thousands of decisions by a trained professional whose primary job is to facilitate book–reader interaction.

A usable subject index is not:

the result of an automated process.
a list of all words in a book. (A list of all words in a book, in alphabetical order, is called a concordance. A concordance is rarely useful.)
a probability distribution of word co-occurrence. (See “Can AI index my book?” below.)

Indexes are important tools for readers and researchers, and including a usable index speaks to the professionalism of a book as a whole.

Books without good indexes frustrate readers. They do not stay on reference shelves for long, no matter how well written.

And they are unlikely to be cited by other writers or researchers, because their content is not easily accessible.

What does an indexer do?

An indexer:

reads a book closely and analyzes every piece of content for importance, then shapes that analysis into an ordered structure, all while keeping the needs of different readers in mind.
typically needs 2–3 weeks to create a single index.
uses software for computer-friendly tasks such as alphabetization and tracking page numbers. However, the majority of indexing work is intellectual and intensive.
often has an advanced degree in library science and/or professional proofreading or copyediting experience.

Can AI index my book?

In short, no. Large language models (LLMs) can do some impressive things. However, they do not analyze text and then organize access points so readers might find what they're looking for.

LLMs estimate the statistical probability that words will appear in a given context. And that is the limit of their capacity, no matter how big their datasets.

Because LLMs can pull some words out of a text, their output may at first glance resemble an index. However, this resemblance falls apart as soon as you try to use the index as a tool.

For instance:

LLMs underindex, routinely omitting 60–80% of indexable content.
LLMs hallucinate terms not in the book. They also hallucinate page numbers.
LLMs often provide comically long (read: not helpful) lists of undifferentiated page numbers.
The memory capacity of LLMs is limited to one chapter at a time, so they cannot make connections between chapters or consistently index overarching topics.
If you upload your whole book at once to an LLM, it will tend to skip large chunks (i.e., all those pesky chapters in the middle).

BUYER BEWARE: Marketing materials for LLMs that claim to create good indexes are misleading and do not reflect their actual indexing capacity.

For more technical information about using LLMs for indexing, please see ASI's 2025 white paper by Elizabeth Bartmess and Michele R. Combs.

***

I do not use large language models (LLMs) to generate indexes, nor do I upload or share client documents with LLMs at any point during the indexing process.

***