Toward a Metadata Generation Framework (09 Nov 2004)
From the outset of the project, we worried that our metadata was not clean enough to support name authority work. For example, we did not have names in separate metadata elements. As mentioned earlier, we needed to extract them from a free-form statement of responsibility. Additionally, many of the fields, such as publication date and publication location, contained malformed data. Fixing these problems before completing the ANAC work was not practical given the size of the collection and our resource constraints. In spite of these problems, ANAC performed reasonably well.
From the beginning, we never expected ANAC to replace human effort entirely; rather, it could serve as a valuable complement to it. ANAC is fast enough to run while the cataloger works on other parts of the record. For example, we could easily create a graphical interface that, given a name and a Levy record, displays the best matches along with ANAC's level of confidence. Alternatively, ANAC might be run ahead of time to annotate the Levy metadata with possible matches. A follow-up study examining this type of integration would provide a valuable point of comparison.
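The "run ahead of time" option amounts to ranking candidate LC records by ANAC's confidence and storing the top few alongside each name. The following is a minimal sketch under stated assumptions, not ANAC's actual interface: it assumes ANAC has already produced a match probability for each candidate LC authority record, and the record layout, the `candidate_matches` field, and the `k` and `min_confidence` parameters are all hypothetical.

```python
from typing import Any, Dict, List, Tuple


def top_matches(
    candidate_scores: Dict[str, float],
    k: int = 3,
    min_confidence: float = 0.0,
) -> List[Tuple[str, float]]:
    """Return up to k (lc_record_id, confidence) pairs, best first.

    candidate_scores is assumed to map a (hypothetical) LC authority
    record ID to the match probability a classifier assigned to it.
    """
    # Sort by descending confidence; break ties by ID so output is stable.
    ranked = sorted(candidate_scores.items(), key=lambda item: (-item[1], item[0]))
    # Drop weak candidates, then keep at most k for display.
    return [(rid, p) for rid, p in ranked if p >= min_confidence][:k]


def annotate_record(
    levy_record: Dict[str, Any],
    name: str,
    candidate_scores: Dict[str, float],
    k: int = 3,
    min_confidence: float = 0.1,
) -> Dict[str, Any]:
    """Return a copy of levy_record with ranked LC candidates attached for name.

    The 'candidate_matches' field is a hypothetical annotation, not part
    of the Levy metadata; the input record is left unmodified.
    """
    annotated = dict(levy_record)
    matches = dict(levy_record.get("candidate_matches", {}))
    matches[name] = top_matches(candidate_scores, k, min_confidence)
    annotated["candidate_matches"] = matches
    return annotated
```

A cataloger-facing interface could then read `candidate_matches` and show each name's ranked candidates with their confidence values, rather than recomputing matches while the cataloger works. Keeping the original record untouched makes it easy to discard or regenerate the annotations.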
Using a smaller training set created earlier in the project (not the 2,000-record evaluation set), we noticed several names that had been classified as not having an LC record. On closer examination, however, the best match for these names seemed quite likely to be correct. As a follow-up effort to characterize human cataloging error, we could identify names that ANAC assigns a high probability of matching a given LC record but that were assigned to the "without an LC record" class. A cataloger could then investigate these records more carefully.
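The proposed error-characterization step is essentially a filter over disagreements between the human label and ANAC's best match. A minimal sketch follows, assuming that for each name we have the cataloger's class assignment and ANAC's highest-scoring LC candidate with its probability; the `NameAssignment` fields and the 0.9 threshold are hypothetical choices for illustration, not values from the study.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class NameAssignment:
    record_id: str      # hypothetical Levy record identifier
    name: str           # name extracted from the statement of responsibility
    human_has_lc: bool  # False if the cataloger chose "without an LC record"
    best_lc_id: str     # ANAC's highest-scoring LC candidate (hypothetical ID)
    best_prob: float    # probability ANAC assigned to that candidate


def flag_for_review(
    assignments: Iterable[NameAssignment],
    threshold: float = 0.9,
) -> List[NameAssignment]:
    """Return names labeled "without an LC record" that ANAC matches confidently."""
    flagged = [
        a for a in assignments
        if not a.human_has_lc and a.best_prob >= threshold
    ]
    # Most confident disagreements first, so a cataloger reviews the
    # likeliest human errors before the borderline cases.
    flagged.sort(key=lambda a: a.best_prob, reverse=True)
    return flagged
```

The threshold trades review effort against coverage: raising it yields fewer, more likely errors, while lowering it surfaces more candidates for a cataloger to check.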
Article URL: http://www.dlib.org/dlib/november04/choudhury/11choudhury.html