The Cultural Heritage Language Technologies Consortium (17 May 2005)
The primary impact of our work in a traditional field has been the creation of a citations database of lexicographic 'slips' for a new intermediate-level Greek-English lexicon that is currently being written at Cambridge University. The database is the main repository of source material for the new dictionary; it provides a key-word-in-context display for every occurrence of each word in the corpus, along with English translations from the Perseus Corpus where available. These passages are presented in chronological order and are accompanied by an author-by-author frequency summary. Links are also provided to the Online Edition of the Liddell, Scott, Jones Greek-English Lexicon (LSJ), which integrates statistical information about the word, including comparative frequency data, word collocation information, and an automatically extracted list of words with similar definitions. One of the primary problems faced by lexicographers is information overload: the word grapho (to write), for example, appears more than 2,000 times in the corpus, and writing a definition that accounts for all of its senses and nuances can be a daunting task. While we have done some experiments in automatic categorization, we have also addressed this problem by reintegrating existing lexicographic resources such as earlier dictionaries. When constructing the citation database, our program mines citations from the existing LSJ, flags these passages, and presents them separately from the other citations, in the order in which they appear in the Lexicon.
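The organization of a slip view described above can be illustrated with a minimal sketch. It is not the consortium's actual program: the `Citation` record, its field names, the `lsj_rank` field (assumed to hold the position of a passage within the LSJ entry, already extracted by some earlier step), and the functions `kwic` and `build_slip_view` are all hypothetical, chosen only to show one way the three pieces (key-word-in-context lines, chronological ordering with LSJ citations set apart, and an author-by-author frequency summary) might fit together.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Citation:
    """One occurrence of a headword in the corpus (hypothetical record layout)."""
    author: str
    year: int                       # approximate date; negative means BCE (assumed convention)
    ref: str                        # passage reference string
    tokens: List[str]               # tokenized passage text
    hit: int                        # index of the headword within tokens
    lsj_rank: Optional[int] = None  # position in the LSJ entry, or None if LSJ does not cite it


def kwic(c: Citation, width: int = 3) -> str:
    """Format a key-word-in-context line with up to `width` tokens on each side."""
    left = " ".join(c.tokens[max(0, c.hit - width):c.hit])
    right = " ".join(c.tokens[c.hit + 1:c.hit + 1 + width])
    return f"{left} [{c.tokens[c.hit]}] {right}".strip()


def build_slip_view(citations: List[Citation]):
    """Split citations into LSJ-cited and other passages and count hits per author.

    LSJ-cited passages keep the order of the lexicon entry; all remaining
    passages are sorted chronologically.
    """
    lsj = sorted((c for c in citations if c.lsj_rank is not None),
                 key=lambda c: c.lsj_rank)
    others = sorted((c for c in citations if c.lsj_rank is None),
                    key=lambda c: c.year)
    per_author = Counter(c.author for c in citations)
    return lsj, others, per_author
```

Keeping the LSJ-cited passages in their own list, rather than sorting everything by date, reflects the goal stated above: the lexicographer sees first the passages an earlier dictionary judged significant, in that dictionary's order, before working through the full chronological run.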
