- Word doc, docx, and rtf formats
- HTML coding
- Tab-delimited files
- Comma-delimited files
- Ebook formats
- Standalone indexing tools
- Embedded indexing tools (InDesign, Word)
- Tagging tools (a form of embedding with placeholder tags)
We can work with you to determine which formats you need, and help figure out when in the process the indexing should be created.
Which software package and technique are used depends on variables such as budget, eventual re-usability of the source material, translation needs, time constraints, media used to publish the material, file sizes and transfer issues, and individual preferences. All of these terms can sound confusing, so here's an overview of what software can accomplish in indexing.
Standalone indexing tools, usually used for back-of-the-book indexes, allow indexers to work from page-numbered galleys in PDF form. The indexing is completely separate from the published material. The index is supplied to the publisher as a Word file, which is then inserted into the book as a last chapter. Nothing is interactive. Wright Information uses CINDEX for these tasks, which allows formatting of print-ready Word files and the generation of a database of entries for any translation purposes.
Embedded indexing tools allow indexing codes to be embedded in the electronic text of a book or file, and allow the index's locators to be updated as the text changes.
When edits, additions, or deletions are made in a chapter, the text rolls from page to page in the files. An index entry embedded near the word "The" on page 4 (in the above example) will wind up on page 5 (in the following example) if enough new text is added before it on page 3 to force "The" onto page 5. Because the codes flow with the text, the index can be regenerated at any time to match new text arrangements. The codes know that the index entry near "The" is now on page 5.
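The way embedded codes travel with the text can be sketched in a few lines of Python. This is only an illustration, not how InDesign or Word store their codes: the `{XE:...}` marker syntax, the `build_index` function, and the fixed five-words-per-page layout are all assumptions made up for the example.

```python
import re
from collections import defaultdict

# Hypothetical marker syntax for this sketch only: {XE:entry text}
MARKER = re.compile(r"\{XE:([^}]*)\}")

def build_index(text, words_per_page=5):
    """Paginate by word count and return {entry: [page numbers]}."""
    index = defaultdict(list)
    words_seen = 0
    # Tokens are either whole markers or ordinary words.
    for token in re.findall(r"\{XE:[^}]*\}|[^\s{]+", text):
        match = MARKER.fullmatch(token)
        if match:
            # A marker takes the page of the word that follows it.
            page = words_seen // words_per_page + 1
            if page not in index[match.group(1)]:
                index[match.group(1)].append(page)
        else:
            words_seen += 1
    return dict(index)

original = "one two three four five six seven {XE:the} The end"
edited = "new new new new " + original  # text added before the marker

print(build_index(original))  # {'the': [2]}
print(build_index(edited))    # {'the': [3]}
```

Nothing about the marker changed between the two runs. Only the text in front of it grew, and regenerating the index picked up the new page number.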
When inserting codes, or "embedding" codes, indexers must work in the same files as the publishers.
InDesign embedded codes:
InDesign's codes are invisible until you open up the Index palette or the Index Marker dialog box. Word's codes can have their visibility switched on or off. (For more about InDesign-specific ebook indexing, click HERE.)
Word's embedded codes:
Each embedded indexing tool has its own characteristics, coding mechanisms, and foibles. Some can import and export text with indexing intact; some cannot. We can recommend processes that work with each program.
A sample of index entries for a tagged text system:
A sample of text tagged in preparation for a tagged indexing system:
Ebook indexing tools: Indexes in ebooks are startlingly primitive at this point in ebook development. To answer this need, the American Society for Indexing's Digital Trends Task Force has focused on education and on creating standards for indexes in ebooks. The vision is that search in ebooks can integrate with indexing, and that the indexing can inform the search, making it better and more productive. We feel the user should still be able to browse the index when needed, but a dead chapter in the back of an ebook does no one any good.
With the PDF format, active indexes are easily generated if your indexing is embedded into the files in a tool like InDesign. Once the entries are embedded, you can output a PDF and the index links will be active. Word files are not quite so easily converted, but using a tool like Sonar Activate, you can convert a Word-generated index output to PDF quite easily.
With the other formats, such as epub or mobi, your layout tool may not output the index entries needed to make the index active once it goes into an ebook format. Older versions of InDesign, for instance, do not output index entries when generating the epub format. Current versions of InDesign do. Check our page on InDesign.
We are happy to work with you to find a way to activate the index in your ebook.
Web indexing software aids in building HTML web indexes. Wright Information uses a variety of proprietary tools, as needed by the client, to build metadata sets, Web-based indexes, and compiled scripted web indexes. More and more web sites are including an A-Z index to help users find information. These indexes do not link to every document on a web site, but rather to portal pages. By portal page, we mean a page that is the main location for a governmental department, or the lead-in page for a body of knowledge. By linking to portals, the web index does not need to track ever-changing documents, and can survive the updating that is necessary for web site information.
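As a rough illustration of an A-Z web index built from portal pages, here is a minimal Python sketch. It is not one of the proprietary tools mentioned above. The function name `az_index_html`, the sample labels, and the URLs are hypothetical, and the sketch assumes every label is non-empty.

```python
from html import escape
from itertools import groupby

def az_index_html(portals):
    """portals: list of (label, url) pairs. Returns an HTML fragment."""
    entries = sorted(portals, key=lambda p: p[0].casefold())
    parts = []
    # One heading per initial letter, with the portal links beneath it.
    for letter, group in groupby(entries, key=lambda p: p[0][0].upper()):
        parts.append(f'<h2 id="{escape(letter)}">{escape(letter)}</h2>')
        parts.append("<ul>")
        for label, url in group:
            parts.append(f'  <li><a href="{escape(url)}">{escape(label)}</a></li>')
        parts.append("</ul>")
    return "\n".join(parts)

# Hypothetical portal pages: stable landing pages, not individual documents.
portals = [
    ("Parks Department", "/parks/"),
    ("Building Permits", "/permits/"),
    ("Public Health", "/health/"),
]
page = az_index_html(portals)
print(page)
```

Because each entry points at a portal rather than a document, the list only needs to change when a department or subject area is added or retired.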
Other software tools:
Taxonomy, thesaurus, and controlled-vocabulary tools aid in building controlled languages and sets of keywords for metadata and web sites. These are tools to help indexers and taggers choose specific language for describing and tagging a document. Taxonomy and thesaurus tools also help visualize the relationships between broader terms, narrower terms, and related terms.
Folksonomies, tag clouds, and tagging tools vary in nearly every application and web site in which they are used. A folksonomy is a list of labels or tags that users generate. A tag can be a label for a picture, a title for a file, or a category for a blog posting. When several people combine their tags, the results can be displayed as a folksonomy. Usually these labels are displayed in a tag cloud, allowing a reader to easily see which term has the most information, and what other words are being used.
Folksonomies work best when administrators can merge similar words and edit the vocabulary. Folksonomies are most powerful for personal use when you can retain your own tags that have meaning for you. The ideal software solution would allow merging terms at a broad level while also letting users keep their own data sets. Drop-down or automatic-fill boxes for tags can suggest already-approved tags to help with consistency. At some point, all folksonomies become chaotic, and control and cleanup should be done to make them usable again.
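The idea of merging terms at a broad level while leaving each user's own tags alone might look something like the following Python sketch. The `MERGE` table, the function names, and the sample users and tags are invented for illustration. A real tagging system would keep this data in its own storage.

```python
from collections import Counter

# Hypothetical administrator-maintained table: variant -> preferred tag
MERGE = {"cats": "cat", "kitty": "cat", "dogs": "dog"}

def normalize(tag):
    """Lowercase a tag and map known variants to the preferred form."""
    tag = tag.strip().lower()
    return MERGE.get(tag, tag)

def tag_cloud(user_tags):
    """user_tags: {user: [tags]}. Returns merged tag counts for display."""
    counts = Counter()
    for tags in user_tags.values():
        counts.update(normalize(t) for t in tags)
    return counts

# Each user's personal tags are stored exactly as typed.
user_tags = {
    "ana": ["Cats", "travel"],
    "ben": ["kitty", "dogs"],
    "cy": ["cat"],
}
cloud = tag_cloud(user_tags)
print(cloud.most_common())
```

The counts feed the shared tag cloud, while `user_tags` is never modified, so "Cats" and "kitty" still mean what their owners intended.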
Keywording is used primarily in online materials, abstracts and other areas. It can be hard-coded jumps, similar to HTML jumps, or it can be inserted as embedded coding and compiled into a list by the software.
Automated indexing software builds a concordance, or a word list, from processed files. Although the manufacturers often claim these packages build indexes, the actual results are a list of words and phrases, sometimes useful in the beginning stages of building an index. Usability tests of these packages have shown that the word lists omit many key ideas and phrases, and cannot fine-tune terminology for easy retrieval, or build the needed hierarchies of ideas that professional indexing can. Free-text search, also produced automatically by software, is useful in some environments, but tests have shown the retrieval is much higher with a human-generated index. Wright Information owns software that will generate concordances, but doesn't use it for a finished index.
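To show what a concordance is and is not, here is a small Python sketch of one. It is a simplified stand-in for commercial packages, not a model of any particular product. The stopword list, the `concordance` function, and the two sample pages are made up for the example.

```python
import re
from collections import defaultdict

# A tiny, hypothetical stopword list; real tools use much longer ones.
STOPWORDS = {"the", "of", "in", "with", "a", "and"}

def concordance(pages):
    """pages: list of page texts, page 1 first. Returns {word: [pages]}."""
    result = defaultdict(list)
    for number, text in enumerate(pages, start=1):
        # Record each distinct word once per page.
        for word in set(re.findall(r"[a-z']+", text.lower())):
            if word not in STOPWORDS:
                result[word].append(number)
    return dict(sorted(result.items()))

# Invented sample text for illustration.
pages = [
    "The finches of the islands differ in beak shape.",
    "Beak shape varies with the food supply.",
]
words = concordance(pages)
print(words["beak"])          # [1, 2]
print("adaptation" in words)  # False
```

The list records every word that appears, but a reader looking under "adaptation" finds nothing, because that heading is an idea a human indexer would supply rather than a word on the page.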
Abstracting and citation-control software aids in building abstracts with associated keywords.
AI (artificial intelligence) tools are not yet capable of creating an index, but can create text that looks like an index, with page numbers and references. We don't know when AI will actually be able to index intelligently. Always check an AI-created file against the source material and the specific edition of the book it claims to have indexed, both for correct page numbers and for hallucinations. An example we created from Darwin's Origin of Species included an entry for Carl Sagan, which was not in the book's text.
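A first-pass check of this kind can be partly automated. The Python sketch below is a rough aid under stated assumptions, not a substitute for reading the index against the book. The `check_index` function, the sample page text, and the sample entries are all invented for illustration. It only looks for the literal term on the cited page, so it catches invented names and wrong page numbers but says nothing about conceptual entries.

```python
def check_index(entries, pages):
    """entries: {term: [page numbers]}; pages: {page number: page text}.
    Returns a list of (term, page, problem) tuples."""
    problems = []
    for term, locators in entries.items():
        for page in locators:
            text = pages.get(page)
            if text is None:
                problems.append((term, page, "page not in this edition"))
            elif term.lower() not in text.lower():
                problems.append((term, page, "term not found on page"))
    return problems

# Invented page text standing in for the edition being checked.
pages = {
    1: "Domestic pigeon breeds descend from the rock pigeon.",
    2: "Variation under nature is discussed in this chapter.",
}
# Invented entries standing in for an AI-generated index.
entries = {"rock pigeon": [1], "Sagan": [2], "variation": [9]}

for problem in check_index(entries, pages):
    print(problem)
```

Anything the check flags still needs a person to look at it, and a clean result only means the words are present, not that the entry is useful.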
Contact Wright Information at jancw@wrightinformation.com.