A Large-Scale Study of the Evolution of Web Pages (31 Mar 2004)
We found that web pages that change usually change either in their markup alone or in trivial ways. Moreover, we found a strong relationship between a document's top-level domain and its frequency of change, whereas the relationship between top-level domain and degree of change is much weaker.
To our great surprise, we found that document size is another strong predictor of both frequency and degree of change. Moreover, one might expect that any change to a small document would be a significant one, by virtue of small documents having fewer words, so that any word change affects a significant fraction of the shingles. Contrary to that intuition, we found that large documents change more often and more extensively than smaller ones.
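The intuition about small documents can be made concrete with a short sketch. It assumes that a shingle is a run of w consecutive words and that degree of change is the fraction of a document's shingles that disappear after an edit. The window size w=5, the synthetic documents, and the helper names (`shingles`, `changed_fraction`, `make_doc`, `replace_middle_word`) are illustrative assumptions, not details taken from the study.

```python
def shingles(words, w=5):
    """Return the set of contiguous w-word sequences (shingles) in a word list."""
    if len(words) < w:
        # A document shorter than the window yields at most one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def changed_fraction(old_words, new_words, w=5):
    """Fraction of the old document's shingles that are absent from the new one."""
    old_set = shingles(old_words, w)
    if not old_set:
        return 0.0
    return len(old_set - shingles(new_words, w)) / len(old_set)


def make_doc(n_words):
    """Synthetic document made of n distinct tokens (assumption for illustration)."""
    return [f"w{i}" for i in range(n_words)]


def replace_middle_word(words):
    """Return a copy of the document with its middle word replaced."""
    edited = list(words)
    edited[len(edited) // 2] = "CHANGED"
    return edited


small = make_doc(10)
large = make_doc(1000)

# The same one-word edit is applied to both documents.
small_frac = changed_fraction(small, replace_middle_word(small))
large_frac = changed_fraction(large, replace_middle_word(large))

print(f"small document: {small_frac:.3f} of shingles changed")
print(f"large document: {large_frac:.4f} of shingles changed")
```

Under these assumptions a single word change invalidates 5 of the 6 shingles in the 10-word document (about 0.83) but only 5 of the 996 shingles in the 1000-word document (about 0.005). That is the expectation the study's finding contradicts: in the measured data, large documents changed more often and more extensively than small ones.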
Article URL: http://research.microsoft.com/aboutmsr/labs/siliconvalley/pubs/p97-fetterly/p97-fetterly.html