Usability Views Article Details

From Babel to Knowledge - Data Mining Large Digital Collections (20 Mar 2006)
For the past three years I have been experimenting with how to provide such end-user tools, that is, tools that harness the power of vast electronic collections while hiding much of their complicated technical plumbing. In particular, I have made extensive use of the application programming interfaces (APIs) that the leading search engines provide so programmers can query their databases directly (from server to server, without using their web interfaces). In addition, I have explored how one might extract information from large digital collections, from the well-curated lexicographic database WordNet to the democratic (and poorly curated) online reference work Wikipedia. While processing these digital corpuses is currently an imperfect science, even now useful tools can be created by combining various collections and methods for searching and analyzing them. More importantly, these nascent services suggest a future in which information can be gleaned from, and sense can be made out of, even imperfect digital libraries of enormous scale. A brief examination of two approaches to data mining large digital collections hints at this future, while also providing some lessons about how to get there.

Article URL: http://www.dlib.org/dlib/march06/cohen/03cohen.html

Read 108 more articles from D-Lib Magazine sorted by date, popularity, or title.

This site is a labour of love built by Chris McEvoy



