Selection criteria

1. Introduction

The development of new media such as the Internet and the World Wide Web challenges libraries to identify, catalogue and preserve Internet resources. As with traditional resources, the National and University Library in Zagreb is responsible for collecting, describing, storing and providing access to web resources as an integral part of the national cultural heritage. (For more detail see Legal deposit copy.)
Web resources differ from traditional publications in many respects: frequent changes of location, content and size, a short and unpredictable life cycle on the Internet, and so on. To preserve these resources for posterity, the National and University Library in Zagreb, in cooperation with the University Computing Centre (Srce), created in 2003 a system for archiving Internet content: the Croatian Web Archive.

1.1. Types of contents on the Internet

In addition to traditional publications such as books, journals, newspapers and articles, new types of content appear on the web, e.g. official websites of public administration bodies, web sites of companies, enterprises, organisations, trusts, associations, individuals, events, clubs, scientific and research projects and meetings, as well as portals, databases, e-newspapers, e-zines, forums, chats, online conferences, mailing lists, electronic mail, newsletters, video and audio clips, exhibitions, interactive maps, search engines, software, computer games, web art, blogs, wikis, e-learning, web shops and online communities. The Library selects some of this content for storage in the Croatian Web Archive, according to the criteria listed below.

1.2. Definitions

Archiving. A process of cataloguing, harvesting and providing access to web resources.

Harvester (gatherer, crawler). A robot used for harvesting web resources.

Integrating resource. A bibliographic resource that is added to or changed by means of updates that are integrated into the whole and do not remain discrete, e.g. updating web sites.

Legal deposit copy. A legal provision by which publishers and producers have to deliver a fixed number of copies of each publication to a library or a similar institution.

Publisher. The person or corporate body with the financial and/or administrative responsibility for the release of a publication. Everybody who publishes content on the web is considered to be a publisher.

Web archive. A system enabling long-term storage, protection and access to electronic resources published on the Internet.

Web publication (web resource, online publication). An electronic document made available to the public via the Internet.

Web site (web page, Internet page). Location on the World Wide Web; a set of web pages identified by a unique URL that make up a whole.

2. General and specific selection criteria for cataloguing and archiving of web resources

2.1. General criteria

The same general criteria apply to printed and web resources:

  1. Works by Croatian authors published in Croatia and abroad
  2. Works about Croatia and Croatians, regardless of place of publication and authorship
  3. Works on Croatia
  4. Works published in Croatia, i.e.:
    1. Place of publication on the resource indicates that it was created in Croatia
    2. Publisher's residence in Croatia
    3. Author's residence in Croatia

2.2. Specific criteria

  1. Content
    Refers to the presentation of data such as title and publisher/issuing body responsible for content and creation of publication, design and arrangement of menus and data, regularity of updating. (For more detail see Recommendations for the presentation of web resources).
  2. Publication structure
    Refers to the presentation of data such as title and publisher/issuing body responsible for content and creation of publication, design and arrangement of menus and data, regularity of updating.
  3. Publisher
    All content published on the Internet is considered to be published, and everybody who publishes is a publisher. These can be publishers in the traditional sense of the word, as well as authors of personal web pages. Reputation and reliability of the publisher are important selection criteria.
  4. Domain
    Resources that are originally published on the .hr domain are primarily selected for the web archive. Resources on other domains (.com, .net, .info, .org) may be selected if they meet other selection criteria.
  5. Format
    The data format in which the resource was published, for example a word processor document (Word), portable document format (PDF) or hypertext mark-up language (HTML). The Library collects, stores and provides access to publications in their original format. If the resource is published in several different formats, the Library selects the format that can be stored in its original form, in order to keep the integrity and authenticity of the resource (appearance, design, search mode) and to secure the readability of the data. Standard formats have priority.
  6. Uniqueness
    Priority is given to publications that exist only on the Internet. Web publications, e.g. books and journals, are frequently copies of originals published on paper or on portable digital media (CD, DVD). Online versions of such publications are archived selectively.
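
The domain criterion (item 4 above) lends itself to a small illustration. The following is only a sketch under stated assumptions, not the Library's actual tooling: the function name `domain_priority` and the labels "primary" and "conditional" are hypothetical, introduced here for illustration. Resources under .hr are marked as primary candidates; resources on any other domain remain candidates only if they meet the other selection criteria.

```python
from urllib.parse import urlsplit


def domain_priority(url: str) -> str:
    """Classify a URL by the domain criterion (hypothetical helper).

    Returns "primary" for hosts under .hr and "conditional" for all
    other hosts, which may still be selected on other criteria.
    """
    # hostname is lower-cased by urlsplit; it is None if the URL has no host
    host = (urlsplit(url).hostname or "").rstrip(".")
    if not host:
        raise ValueError(f"URL has no host part: {url!r}")
    if host == "hr" or host.endswith(".hr"):
        return "primary"
    return "conditional"


if __name__ == "__main__":
    for candidate in ("https://www.nsk.hr/", "http://example.com/journal"):
        print(candidate, "->", domain_priority(candidate))
```

The check uses the host name rather than a substring search, so a host such as example.hr.com is not mistaken for a .hr resource.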

3. Selection of publications for the Croatian Web Archive

Included are journals, books, articles, web sites of institutions, associations, clubs, scientific and research projects, e-zines, e-newspapers, portals, selected personal pages, personal, collective and thematic blogs that serve as sources of information on contemporary culture and on economic, social and political trends, blogs with significant influence on public life whose authors write under their real names, and selected forums that conform to the criteria given above.

Excluded are search engines, games, advertising pages, pages of companies and businesses, preliminary versions of publications, mailing lists, chats, resources distributed exclusively by e-mail, resources on intranets, most personal pages, blogs and forums, and pages that contain links to texts from other resources.
Digitised resources in digital collections of other institutions and other web archives are not archived.
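
The inclusion and exclusion lists above could be expressed as a simple pre-selection step. The sketch below is an illustration only: the type labels, the set names and the function `preselect` are hypothetical shorthand for the categories named in this section, and sending unknown types to manual review is an assumption, not a rule stated by the Library. Personal pages, blogs and forums appear on both lists ("selected" versus "most"), so they are routed to review rather than decided automatically.

```python
# Hypothetical shorthand labels for the resource types named in this section.
INCLUDED = {
    "journal", "book", "article", "institution site", "association site",
    "club site", "research project site", "e-zine", "e-newspaper", "portal",
}
EXCLUDED = {
    "search engine", "game", "advertising page", "company page",
    "preliminary version", "mailing list", "chat", "e-mail-only resource",
    "intranet resource", "link collection",
}
# Types that are included only selectively and need a librarian's judgement.
SELECTIVE = {"personal page", "blog", "forum"}


def preselect(resource_type: str) -> str:
    """Return "include", "exclude" or "review" for a resource type label."""
    label = resource_type.strip().lower()
    if label in INCLUDED:
        return "include"
    if label in EXCLUDED:
        return "exclude"
    # Selective types and unknown types both go to manual review
    # (the handling of unknown types is an assumption of this sketch).
    return "review"


if __name__ == "__main__":
    for label in ("portal", "search engine", "blog"):
        print(label, "->", preselect(label))
```

A resource that passes this step would still have to meet the general and specific criteria described earlier.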

Archiving frequency

The Library determines the archiving frequency according to the importance of the publication for the broader community, the significance of content and technical changes (e.g. a new design of the web pages), and the actual updating frequency (e.g. new issues of a periodical appearing on the Internet).
Newspapers published on paper that also appear on the Internet as integrating resources are archived occasionally.
Not every change of content of every publication is archived.