Wednesday, December 25, 2019

The Internet Archive Preserves The Live Web

The Internet Archive preserves the live web by saving dated snapshots of websites, which can be browsed or searched for various reasons. Its objective is to save the whole web without favoring a specific language, domain, or geographical location. Because archiving matters, so does checking its coverage. In this paper, we try to determine how well Arabic websites are archived and indexed, and whether the number of archived and indexed websites is affected by country code top-level domain, geographic location, creation date, and depth. We also crawled for Arabic hyperlinks and checked whether they were archived and indexed. We sampled 15,092 unique URIs from three different Arabic website directories: DMOZ ... Third, we found that websites with an Arabic country code top-level domain or an Arabic geographical Internet Protocol location were archived and indexed more than websites without them. Fourth, we found that only 34% of the URIs with a depth greater than one were indexed.

1. INTRODUCTION

Arabic is the fourth largest language on the Internet, after English, Chinese, and Spanish. The number of Arabic Internet users has grown rapidly: in 2009, 17% of Arabic speakers used the Internet, and by 2013 almost 36% did. Now, almost 135.6 million Arabic users are on the Internet [1]. As the Web grows quickly while pages disappear, the need for archiving is becoming more important [3]. Web pages need to be preserved for future cultural data mining. Archiving the Internet is becoming very important, and a number of institutions have created archives to preserve websites. However, to our knowledge, no Arabic institute specializes in collecting and preserving Arabic web pages.

This growth in Arabic web users calls for a statistical overview of the Arabic websites on the live web: to what extent they exist, how many of them are archived, and how well they are archived. To answer those
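Two of the factors named in the abstract, country code top-level domain and depth, can be read off a URI directly. The following is a minimal sketch, not the paper's code. It assumes that depth means the number of non-empty path segments, and `ARABIC_CCTLDS` is only an illustrative subset of Arab-country ccTLDs; the function names `uri_depth` and `has_arabic_cctld` are hypothetical.

```python
from urllib.parse import urlsplit

# Illustrative subset of ccTLDs of Arab countries (an assumption for this
# sketch; the paper's own list is not given in the text).
ARABIC_CCTLDS = {"sa", "eg", "ae", "jo", "kw", "qa", "bh", "om", "lb", "tn", "dz", "ma"}


def uri_depth(uri: str) -> int:
    """Count the non-empty path segments (assumed definition of depth)."""
    path = urlsplit(uri).path
    return len([segment for segment in path.split("/") if segment])


def has_arabic_cctld(uri: str) -> bool:
    """Check whether the host's last label is in the illustrative ccTLD set."""
    host = urlsplit(uri).hostname or ""
    return host.rsplit(".", 1)[-1] in ARABIC_CCTLDS
```

Under these assumptions, a site's front page has depth 0, and a URI such as `http://example.com.sa/news/today.html` has depth 2 and counts as having an Arabic ccTLD. Both functions expect a URI that includes a scheme such as `http://`.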
