Usability Views Article Details

TREC: Improving Information Access through Evaluation (01 Nov 2005)
One objection to test collections, dating back to the Cranfield tests, is the use of relevance judgments as the basis for evaluation. Relevance is known to be highly idiosyncratic, and critics question how an evaluation methodology can rest on such an unstable foundation. An experiment using the TREC-4 and TREC-6 retrieval results investigated the effect of changing relevance assessors on system comparisons. The experiment demonstrated that the absolute scores for evaluation measures did change when different relevance assessors were used, but the relative scores between runs did not. That is, if system A evaluated as better than system B using one set of judgments, then system A almost always evaluated as better than system B using a second set of judgments (the exception being runs that scored so similarly to one another that they should be deemed equivalent). This stability held for different evaluation measures and for different kinds of assessors, and was independent of whether a judgment was based on a single assessor's opinion or the consensus opinion of a majority of assessors.
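The finding can be illustrated with a small sketch (hypothetical data and a simple precision-at-k measure, not the article's actual runs or metrics): two retrieval runs are scored against two different assessors' judgment sets, and although the absolute scores shift with the assessor, the better run stays ahead.

```python
# Hypothetical sketch: relative ordering of two retrieval runs is stable
# across two sets of relevance judgments, even though absolute scores differ.

def precision_at_k(ranking, relevant, k=10):
    """Fraction of the top-k retrieved documents judged relevant."""
    top = ranking[:k]
    return sum(1 for doc in top if doc in relevant) / k

# Two runs (ranked document IDs) and two assessors' relevant-document sets.
run_a = ["d1", "d2", "d3", "d4", "d5"]
run_b = ["d5", "d4", "d2", "d9", "d8"]
assessor_1 = {"d1", "d2", "d3"}        # first assessor's judgments
assessor_2 = {"d1", "d2", "d4", "d7"}  # second assessor disagrees in part

for judgments in (assessor_1, assessor_2):
    score_a = precision_at_k(run_a, judgments, k=5)
    score_b = precision_at_k(run_b, judgments, k=5)
    # Absolute scores change with the assessor; the ordering A > B does not.
    print(score_a, score_b, score_a > score_b)
```

Under the first assessor the scores are 0.6 vs 0.2; under the second they become 0.6 vs 0.4, yet run A outscores run B both times, which is the stability property the experiment reports.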
From the American Society for Information Science.

This site is a labour of love built by Chris McEvoy