Founded in the third century BC, the Library of Alexandria was the largest museum and library of the ancient world, a center of knowledge and scholarship. What plays that role in the Internet age? The answer is the Internet Archive, the Library of Alexandria of the Internet age.
Information on the Internet, or more precisely digitized information, is not as easy to preserve as physical media and can disappear in an instant. Web sites are a good example: many sites from the early days of the Internet are now gone. The Internet Archive is a nonprofit organization founded in 1996 by Brewster Kahle to preserve these disappearing Web sites.
Brewster Kahle studied computer science at MIT and devised WAIS (Wide Area Information Servers), a text retrieval system. He sold it for $15 million in 1995, which gave him the means to start backing up the Internet on his own in 1996. The project, called the Internet Archive, is likened to the Library of Alexandria, which once held the largest collection of books in the world. The Internet Archive has set itself the goal of making all knowledge accessible from anywhere. Kahle personally collected over 10 billion web pages in six years.
Of course, there have been incidents. In 2007, the US Federal Bureau of Investigation (FBI) sent a letter demanding that the Archive hand over a user's name, address, and Web site usage records. The Internet Archive responded by filing a lawsuit, arguing that it was a library recognized by the state of California. As a result, the FBI withdrew its demand and agreed to make part of the documents public.
A library suing the government is unusual, but the relationship between the FBI and the Internet Archive has not deteriorated since the incident. On the contrary, the Internet Archive provides services such as Web archiving and book scanning to the US Library of Congress and the US National Library, and its Wayback Machine is used by the Patent Office.
Through the Wayback Machine, the Internet Archive offers access to archived copies of past web pages, as well as millions of e-books, TV shows, movies, music, documents, and software. For example, you can see what the Yahoo site looked like in 1996, or track down a valuable recording.
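The retrieval service described above is publicly reachable as the Wayback Machine at web.archive.org. As a minimal sketch, the following builds the address of a 1996 Yahoo snapshot. It assumes the commonly used URL pattern web.archive.org/web/<timestamp>/<url> and the archive.org/wayback/available lookup endpoint; the function names are illustrative and not part of any official client, and no network request is made.

```python
from datetime import datetime
from urllib.parse import urlencode

# Assumed public URL patterns of the Wayback Machine.
WAYBACK_PREFIX = "https://web.archive.org/web"
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def wayback_timestamp(moment: datetime) -> str:
    """Format a datetime as the 14-digit YYYYMMDDhhmmss stamp used in snapshot URLs."""
    return moment.strftime("%Y%m%d%H%M%S")


def snapshot_url(page_url: str, moment: datetime) -> str:
    """Build a link asking for the capture of page_url closest to the given moment."""
    return f"{WAYBACK_PREFIX}/{wayback_timestamp(moment)}/{page_url}"


def availability_query(page_url: str, moment: datetime) -> str:
    """Build a lookup URL that asks whether a capture near the given moment exists."""
    params = urlencode({"url": page_url, "timestamp": wayback_timestamp(moment)})
    return f"{AVAILABILITY_ENDPOINT}?{params}"


if __name__ == "__main__":
    december_1996 = datetime(1996, 12, 1)
    print(snapshot_url("yahoo.com", december_1996))
    print(availability_query("yahoo.com", december_1996))
```

Opening the first printed address in a browser should redirect to the capture nearest to the requested date, if one exists.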
The Internet Archive headquarters has been located in San Francisco, California, near the Presidio, since 2009. The building itself was built in 1923, and there is a chapel on the second floor. It differs from an ordinary church, of course, in that server racks line the walls. Each server costs $60,000, and a machine made up of 10 units is equipped with 368TB of storage.
The building is lined with servers that hold 22 years of Internet history, including the billions of pages and images the Internet Archive has collected. There are as many as 1.8 billion web pages on the Internet, and that number doubles every two to five years. The average web page lasts about 100 days, and thousands of pages are forgotten within five minutes of being posted. The Internet Archive's mission is to preserve this vast number of Web pages.
By 2018, the Internet Archive had recorded 338 billion Web page captures, and the amount of data it holds had reached 40PB. Considering that the total reached 10PB in October 2012, the collection has quadrupled in a few years. Of the 40PB, 63% can be retrieved through the Wayback Machine.
A figure like 40PB may be too large to picture. It is only slightly less than all the text humanity has written from the invention of writing to the present day. The combined text of the Library of Congress, the largest library in the United States, is said to be 28TB. That is less than 0.1% of the data the Internet Archive holds.
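The comparison above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal units (1PB = 1,000TB); the function names are illustrative, and the 10PB figure is the earlier total mentioned in the preceding paragraph.

```python
TB_PER_PB = 1000  # assumption: decimal units, 1 PB = 1,000 TB


def share_percent(part_tb: float, whole_pb: float) -> float:
    """Return part_tb as a percentage of a total given in petabytes."""
    return part_tb / (whole_pb * TB_PER_PB) * 100


def growth_factor(earlier_pb: float, later_pb: float) -> float:
    """Return how many times larger the later total is than the earlier one."""
    return later_pb / earlier_pb


if __name__ == "__main__":
    # 28TB of library text against 40PB of archive data.
    print(f"Library text share: {share_percent(28, 40):.2f}%")
    # Growth from 10PB to 40PB.
    print(f"Growth factor: {growth_factor(10, 40):.0f}x")
```

With these inputs the share comes out at 0.07%, which is consistent with the "less than 0.1%" claim. Using binary units (1PB = 1,024TB) instead would give a slightly smaller share and would not change that conclusion.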
The Internet Archive collects copies of 7,000 Web pages each week. It saves the state of a web page at a set frequency, so the archive accumulates the page's contents as they were at specific points in time. For example, the Wayback Machine can retrieve 187,000 snapshots of the CNN web page taken over 18 years. Some 500 million new pages are stored in the Internet Archive each week. Twenty million Wikipedia items, twenty million tweets, and hundreds of news articles are stored every week.
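To get a feel for these rates, the figures quoted above can be converted into per-day and per-second averages. This is a rough sketch: it assumes captures are spread evenly over time, which real crawling does not guarantee, and the function names are illustrative.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60
DAYS_PER_YEAR = 365.25  # average year length, including leap years


def snapshots_per_day(total_snapshots: int, years: float) -> float:
    """Average number of snapshots per day over the given number of years."""
    return total_snapshots / (years * DAYS_PER_YEAR)


def pages_per_second(pages_per_week: int) -> float:
    """Average number of pages stored per second for a given weekly total."""
    return pages_per_week / SECONDS_PER_WEEK


if __name__ == "__main__":
    # 187,000 CNN snapshots over 18 years.
    print(f"CNN snapshots per day: {snapshots_per_day(187_000, 18):.1f}")
    # 500 million new pages per week.
    print(f"New pages per second: {pages_per_second(500_000_000):.0f}")
```

Under these assumptions the CNN page was captured roughly 28 times a day on average, and 500 million pages a week works out to a little over 800 pages every second.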
All of this tremendous work at the Internet Archive is done on a nonprofit basis. Operating costs such as technology and software development, servers, and the systems that run the crawler bots all depend on donations. The Internet Archive not only collects and stores data but also tries to address ethical issues related to the history of the Internet.
Of course, even by a simple calculation, the Internet is growing at a rate of 70TB per second. No matter how many servers you have, it is impossible to capture everything in the Internet Archive. In addition, personal data such as email or cloud storage is not something the Internet Archive collects. For this reason, the Internet Archive decides which web pages to back up according to priority, and contributed material is included as well. How thoroughly a particular website is stored depends on how often it is accessed; sites that meet this criterion include YouTube, Wikipedia, Reddit, and Twitter. The Archive also targets government, NGO, and news sites around the world. It carries out its backups on these principles in cooperation with 600 experts and partners.
Web page retrieval through the Wayback Machine can be an important tool in an era when fake news spreads easily. Because the original information is preserved in the Internet Archive, it can help determine whether a claim is true or false. In fact, following President Trump's election in November 2016, the Internet Archive announced plans to keep a copy of the data it has collected in a country outside the reach of the US government.
The ancient Library of Alexandria was built with the goal of collecting literature from all over the world. The goal of the Internet Archive is to become a second Library of Alexandria. Obviously, it will not be able to store all the data in the world, but it will be the largest Library of Alexandria of the new age. The Internet Archive can be found at archive.org.