Google maps and millions of books

It was only a matter of time: Google has added overview maps for full-view books in Google Book Search.

Google is not the first organization to employ geoparsing technologies and autogenerated maps in the interface to a digital library, but it is certainly the biggest media darling to do so. Consequently, and because of the prominent role Google plays in web search, earth visualization and ongoing mass digitization efforts, the average person is likely to be introduced to this class of information interaction via Google’s new feature.

But what good is it? Will it get better? Why should humanists care?

I’m contemplating a series of posts offering some idiosyncratic answers to these questions … but first off, let’s just focus on what it does …

What does it do?

According to David Petrou’s post on the Inside Google Book Search blog, they’ve

begun to animate the static information found in books by organizing a sample of locations from them on an interactive Google Map, with snippets of text from the book, and links to the actual pages where the locations are mentioned

I’ve cavalierly swiped his explanatory screenshot, which shows such a “sample of locations” from the 1888 Illustrated New York: The Metropolis of To-Day:

Screenshot from Google About: Illustrated New York

One finds this feature on the About this Book page associated with individual works. Note that not every book gets a map overview. Petrou writes:

When our automatic techniques determine that there are a good number of quality locations from a book to show you, you’ll find a map
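Petrou doesn’t spell out what counts as “a good number of quality locations.” Purely as a hypothetical illustration, the decision to show a map, and the choice of a “sample of locations” with snippets and page links, might be sketched in Python like this. The record layout, the thresholds and the function name `map_overview` are all assumptions for illustration, not anything Google has described:

```python
from dataclasses import dataclass

@dataclass
class Mention:
    """One geoparsed placename occurrence (hypothetical record layout)."""
    place: str         # resolved place identifier, e.g. a gazetteer name
    lat: float
    lon: float
    page: int          # page on which the placename occurs
    snippet: str       # surrounding text to show in the map popup
    confidence: float  # geoparser's confidence in the resolution, 0..1

# Hypothetical thresholds; Google has not published its criteria.
MIN_CONFIDENCE = 0.8
MIN_DISTINCT_PLACES = 5
SAMPLE_SIZE = 10

def map_overview(mentions):
    """Return up to SAMPLE_SIZE markers (place, lat, lon, snippet, pages),
    or None if too few distinct high-confidence places justify a map."""
    good = [m for m in mentions if m.confidence >= MIN_CONFIDENCE]
    by_place = {}
    for m in good:
        by_place.setdefault(m.place, []).append(m)
    if len(by_place) < MIN_DISTINCT_PLACES:
        return None  # not "a good number of quality locations": no map
    # Sample the places mentioned most often.
    ranked = sorted(by_place.items(), key=lambda kv: len(kv[1]), reverse=True)
    markers = []
    for place, ms in ranked[:SAMPLE_SIZE]:
        first = ms[0]
        pages = sorted({m.page for m in ms})  # links to every page mentioning it
        markers.append((place, first.lat, first.lon, first.snippet, pages))
    return markers
```

Ranking places by mention count is just one plausible way to choose the sample; the geoparser’s confidence, or how central a place is to the book, would serve as well.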

What doesn’t it do?

Off the top of my head, a few interesting possibilities:

  • Comprehensive map of all places in the work
  • Page-by-page or passage-by-passage maps to provide context and explanation
    • See the comment on Perseus, below
    • Pull a copy of the Landmark Thucydides off a shelf somewhere to see a good example in print
  • Spatial browse and search to discover works of potential interest
    • The Alexandria Digital Library Geospatial Network Search is just one example of this for a spatial-oriented collection
    • Linda Hill argued in 2004 that such an interface to all kinds of collections is needed (see: Linda L. Hill, “Georeferencing in Digital Libraries,” D-Lib Magazine 10:5 (May 2004)).
    • This feature is related to a question Joe Francica raised yesterday, “When Can We Issue Semantic Spatial Queries Using Web Search Engines?” In this case, though, the availability of data is not really the issue: if you think you’ve successfully geoparsed the document, then you can let me run a semantic spatial query against it. Whether your geoparsing is accurate is another question.
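To make the spatial-search idea a little more concrete: once works have been geoparsed, the simplest spatial query is “which works mention a place inside this box?” The Python below is a hypothetical, brute-force sketch; the `index` layout and the name `works_in_box` are invented for illustration, a real system would use a spatial index such as an R-tree, and the answers are only as good as the geoparsing behind them.

```python
from typing import NamedTuple

class Place(NamedTuple):
    name: str
    lat: float
    lon: float

def works_in_box(index, south, west, north, east):
    """Return sorted titles of works mentioning at least one place in the box.

    `index` (hypothetical) maps a work's title to the Places geoparsed from it.
    A box with west > east is taken to cross the 180-degree meridian.
    """
    def inside(p):
        if not (south <= p.lat <= north):
            return False
        if west <= east:
            return west <= p.lon <= east
        return p.lon >= west or p.lon <= east  # box wraps around the antimeridian

    return sorted(title for title, places in index.items()
                  if any(inside(p) for p in places))

# Approximate coordinates, for illustration only.
index = {
    "Illustrated New York": [Place("New York", 40.71, -74.01)],
    "History of the Peloponnesian War": [Place("Athens", 37.98, 23.73),
                                         Place("Sparta", 37.07, 22.43)],
    "Roman history": [Place("Rome", 41.90, 12.50), Place("Carthage", 36.85, 10.32)],
}
```

A box around the Mediterranean, `works_in_box(index, 30, -10, 46, 40)`, picks out the Thucydides and the Roman history but not the guide to New York.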

How does it work?

I don’t know. There may be pigeons involved. But I’m willing to make some general comments on methods anyway …

Before you can map something, you’ve got to know where it is. When presented with a plain text, humans apply their existing knowledge of the world and of the text’s subject matter to understand the proper nouns they encounter in the work. Some of these are personal names, some geographical, some are titles of organizations or objects. If we’re successful in identifying the proper nouns in a work (making a list of them, separate from all the other text), and then in distinguishing the placenames from the other types of proper nouns, then we can turn to an atlas, gazetteer or other geographic reference work to determine the corresponding locations.
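Here is a deliberately naive Python sketch of those first steps, under some loud assumptions: “proper nouns” are just runs of capitalized words, a tiny hand-made gazetteer (hypothetical entries, approximate coordinates) stands in for the atlas, and membership in that gazetteer is the only test separating placenames from other proper nouns. Real geoparsers are far more sophisticated.

```python
import re

# Toy gazetteer (hypothetical entries; coordinates approximate):
# each placename maps to every place that bears it.
GAZETTEER = {
    "Rome": [("Rome, Italy", 41.90, 12.50),
             ("Rome, Georgia, USA", 34.26, -85.16)],
    "Carthage": [("Carthage (ancient), near Tunis", 36.85, 10.32),
                 ("Carthage, Illinois, USA", 40.41, -91.14)],
    "Tunis": [("Tunis, Tunisia", 36.81, 10.18)],
}

# Step 1: candidate proper nouns = runs of capitalized words ("New York").
PROPER_NOUN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")

def candidate_places(text):
    """Return (name, candidate_locations) for each proper noun in the gazetteer."""
    found = []
    for match in PROPER_NOUN.finditer(text):
        name = match.group()
        # Step 2: crude filter separating placenames from personal names etc.
        if name in GAZETTEER:
            # Step 3: look up every place that bears this name.
            found.append((name, GAZETTEER[name]))
    return found
```

For “Scipio sailed from Rome to Carthage.” this keeps “Rome” and “Carthage” and drops “Scipio”, but each survivor still comes back with more than one candidate location.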

We may discover, at this stage, that we have further disambiguation work to do. Placenames are not unique identifiers. A single string of characters, say, “Rome” can represent the name of more than one place (the capital of Italy or one of its many namesakes). We have to apply more contextual knowledge, or even do research, to handle this sort of disambiguation task. That’s how I know, when reading a book on Roman history, that “Carthage” means an ancient Phoenician city in the vicinity of modern Tunis in North Africa, not a small town in Illinois.
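One way to hand a computer some of that contextual knowledge is to let the unambiguous placenames in a text vote on where the ambiguous ones are. The Python below is a simple version of that idea, offered as an assumption for illustration rather than a description of what Google or Perseus actually does: names with a single gazetteer entry act as anchors, and each ambiguous name resolves to the candidate nearest the anchors’ average position. Averaging raw longitudes misbehaves near the 180-degree meridian; a real system would need to be more careful.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(min(1.0, math.sqrt(a)))  # clamp rounding error

def disambiguate(mentions):
    """Resolve each (name, candidates) pair to one (label, lat, lon) candidate.

    Hypothetical heuristic: unambiguous names anchor the text geographically,
    and each ambiguous name takes the candidate closest to the anchors' centroid.
    """
    anchors = [cands[0] for _, cands in mentions if len(cands) == 1]
    if not anchors:
        # No context to go on; assume the gazetteer lists the likeliest place first.
        return {name: cands[0] for name, cands in mentions}
    lat0 = sum(a[1] for a in anchors) / len(anchors)
    lon0 = sum(a[2] for a in anchors) / len(anchors)
    return {name: min(cands, key=lambda c: haversine_km(c[1], c[2], lat0, lon0))
            for name, cands in mentions}
```

In a passage that also mentions Tunis, “Carthage” resolves to the ancient city; in one that mentions Quincy, Illinois, it resolves to the small town in Illinois.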

Geoparsing software attempts to instruct a computer in performing these functions automatically. The placename links and automatic maps in Perseus depend upon geoparsing software and an associated gazetteer that process the texts in advance (see: David A. Smith and Gregory Crane, “Disambiguating geographic names in a historical digital library,” in Proceedings of ECDL, Darmstadt, 4-9 September 2001, pp. 127-136). For more of the gory details from the MetaCarta perspective, you might have a look at Joe Francica, “MetaCarta, Inc. – Geographical Text Searching,” Directions Magazine, 11 March 2004.

What’s next?

In my next post on this subject, I’ll try to tackle the questions:

  • What good is it?
  • What should humanists be doing about or with it?

About Tom Elliott

Associate Director for Digital Programs and Senior Research Scholar, Institute for the Study of the Ancient World, New York University
This entry was posted in General, Tools.

One Response to Google maps and millions of books

  1. Note that Gutenkarte has passage-by-passage mapping context too: open a book and hit the browse link to get to a page like the first chapter of History of the Peloponnesian War.

    There are at least 56 places in the world called Rome — MetaCarta Maps can show you that.

    One of the other things that MetaCarta Maps can show you is information about water shortages around the world, which I think is part of what you describe in the third item of your “What doesn’t it do?” list. Although this is not Gutenkarte, the ability to combine free-text search with geographic information is useful, and it is something MetaCarta does well.

    So, although Google may have the largest collection, it’s good to know that there are still improvements to be made, and that there are other tools that provide at least some of the way forward in that regard.

    (Disclaimer: MetaCarta is my employer, and I work in MetaCarta Labs, the origin of things like Gutenkarte, MetaCarta Maps, and the MetaCarta Web Services. So I’m obviously a bit biased.)
