A Yahoo Research tool mines news archives for meaning, illuminating past, present, and even future events.
"The slick visualization allows users to discover unexpected relationships between entities at particular points in time, for example, between Slobodan Milosevic and Saddam Hussein," says Tunkelang. Refining a search for the term "Yugoslavia" with the two leaders' names shows that Hussein at first appears only as a point of comparison in coverage of the Serbian leader, but that the two later become directly linked, with stories reporting arms deals between them.
Although Time Explorer currently works only with archived news, it could also be used to explore new coverage and put it in context, says Matthews. "It would be tough to update in real time, but it could certainly be done daily, and I think that would be useful for sure."
He says the service would be best deployed as a tool that works from the topics in a breaking story. A person reading a news report about, say, Medicaid would find it useful to see the history of coverage on the topic, as well as the predictions made about its future, says Matthews. "It's like a related-articles feature, but focused in the future." He and his colleagues are working to add more up-to-date news sources, as well as content from blogs and other sites, to Time Explorer's scope.
The Times has digitized and made searchable its content going back to 1851, yet today's search technologies and interfaces are not up to the task of making such large collections explorable, says Evan Sandhaus, a member of the New York Times Research and Development Labs who oversaw the release of the article archive in late 2008.
"We can say, 'show me all the articles about Barack Obama,' but we don't have a database that can tell us when he was born, or how many books he wrote," says Sandhaus, who adds that tools developed to process the meaning of news articles could have wider uses. "That resource will not only help the research community move the needle for our company but for any company with a large-scale data-management problem."
With most organizations harboring millions of text documents, from e-mails to reports, smarter tools to handle them would likely be popular, Matthews says. "In theory, the underlying algorithms should work on anything, perhaps with a little tweaking."