1. Introduction
The development of new media such as the Internet and the World Wide Web confronts libraries with the challenge of identifying, cataloguing and preserving Internet resources. As with traditional resources, the National and University Library in Zagreb is responsible for collecting, describing, storing and providing access to web resources as an integral part of the national cultural heritage. (For more detail see Legal deposit copy.)
Web resources differ from traditional publications in many respects: frequent changes of location, content and size, a short and unpredictable life cycle on the Internet, and so on. To preserve these resources for posterity, the National and University Library in Zagreb, in cooperation with the University Computing Centre (Srce), created in 2003 a system for archiving Internet content, the Croatian Web Archive.
1.1. Types of content on the Internet
In addition to traditional publications like books, journals, newspapers and articles, new types of content appear on the web, e.g. official websites of public administration bodies, web pages (web sites) of companies, enterprises, organisations, trusts, associations, individuals, events, clubs, scientific and research projects and meetings, as well as portals, databases, e-newspapers, e-zines, forums, chats, online conferences, mailing lists, electronic mail, newsletters, video and audio clips, exhibitions, interactive maps, search engines, software, computer games, web art, blogs, wikis, e-learning, web shops and online communities. The Library selects some of this content for storage in the Croatian Web Archive, according to the criteria listed below.
1.2. Definitions
Archiving. A process of cataloguing, harvesting and providing access to web resources.
Harvester (gatherer, crawler). A robot used for harvesting web resources.
Integrating resource. A bibliographic resource that is added to or changed by means of updates that are integrated into the whole and do not remain discrete, e.g. updating web sites.
Legal deposit copy. A legal provision by which publishers and producers have to deliver a fixed number of copies of each publication to a library or a similar institution.
Publisher. The person or corporate body with the financial and/or administrative responsibility for the release of a publication. Anyone who publishes content on the web is considered a publisher.
Web archive. A system enabling long-term storage, protection and access to electronic resources published on the Internet.
Web publication (web resource, online publication). An electronic document made available to the public via the Internet.
Web site (web page, Internet page). Location on the World Wide Web; a set of web pages identified by a unique URL that make up a whole.
2. General and specific selection criteria for cataloguing and archiving of web resources
2.1. General criteria
The same general criteria are applied to printed and web resources:
2.2. Specific criteria
3. Selection of publications for Croatian Web Archive
Included are journals, books, articles, web sites of institutions, associations, clubs, scientific and research projects, e-zines, e-newspapers, portals, selected personal pages, personal, collective and thematic blogs as sources of information on contemporary culture and economic, social and political trends, blogs with significant influence in public life whose authors write under their real names, and selected forums that conform to the criteria given above.
Excluded are search engines, games, advertising pages, pages of companies and businesses, preliminary versions of publications, mailing lists, chats, resources distributed exclusively by e-mail, resources on intranets, most personal pages, blogs and forums, and pages that contain links to texts from other resources.
Digitised resources in digital collections of other institutions and other web archives are not archived.
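The inclusion and exclusion lists above could be sketched as a simple pre-selection step that a librarian's judgement then completes. This is only an illustration, not the Library's actual procedure: the category labels, the function name preselect and the digitised_elsewhere flag are all hypothetical. Personal pages, blogs and forums are sent to manual review because the text includes only selected ones and excludes most.

```python
from enum import Enum


class Decision(Enum):
    INCLUDE = "include"
    EXCLUDE = "exclude"
    REVIEW = "review"  # left to a librarian's judgement


# Hypothetical labels for the resource types named in the lists above.
INCLUDED_TYPES = {
    "journal", "book", "article", "institution_site", "association_site",
    "club_site", "research_project_site", "e-zine", "e-newspaper", "portal",
}
EXCLUDED_TYPES = {
    "search_engine", "game", "advertising_page", "company_page",
    "preliminary_version", "mailing_list", "chat", "email_only_resource",
    "intranet_resource", "link_collection",
}


def preselect(resource_type: str, digitised_elsewhere: bool = False) -> Decision:
    """Return a first-pass decision for one web resource.

    digitised_elsewhere (an assumed flag) marks digitised resources held in
    the digital collections of other institutions or other web archives.
    """
    if digitised_elsewhere:
        return Decision.EXCLUDE
    if resource_type in EXCLUDED_TYPES:
        return Decision.EXCLUDE
    if resource_type in INCLUDED_TYPES:
        return Decision.INCLUDE
    # Personal pages, blogs, forums and any unlisted type need manual review.
    return Decision.REVIEW


if __name__ == "__main__":
    for kind in ("journal", "search_engine", "blog"):
        print(kind, preselect(kind).value)
```

Anything the sketch marks for review would still be assessed against the general and specific criteria by a person; the code only separates the clear cases from the ones that need selection.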
Archiving frequency
The Library determines the archiving frequency according to the importance of the publication for the broader community, the significance of content and technical changes (e.g. a new design of web pages) and the actual updating frequency (e.g. new periodical issues on the Internet).
Newspapers published on paper that also appear on the Internet as integrating resources are archived occasionally.
Not every change of content of every publication is archived.