Selection criteria

The development of new media such as the Internet and the World Wide Web challenges libraries to identify, catalogue and preserve Internet resources. As with traditional resources, the National and University Library in Zagreb is responsible for collecting, describing, storing and providing access to web resources as an integral part of the national cultural heritage.
Web resources differ from traditional publications in many respects: frequent changes of location, content and size, a short and unpredictable life cycle on the Internet, and so on. To preserve these resources for posterity, the National and University Library in Zagreb, in cooperation with the University Computing Centre (Srce), created in 2004 a system for archiving Internet content, the Croatian Web Archive.

Definitions

Archiving. A process of cataloguing, crawling and providing access to web resources.

Crawler (harvester, gatherer). A robot used for crawling web resources.

Integrating resource. A bibliographic resource that is added to or changed by means of updates that are integrated into the whole and do not remain discrete, e.g. updating web sites.

Legal deposit copy. A legal provision by which publishers and producers have to deliver a fixed number of copies of each publication to a library or a similar institution.

Publisher. The person or corporate body with the financial and/or administrative responsibility for the release of a publication. Everybody who publishes contents on the web is considered to be a publisher.

Web archive. A system enabling long-term storage, protection and access to electronic resources published on the Internet.

Web publication (web resource, online publication). An electronic document made available to the public via the Internet.

Web site (web page, Internet page). Location on the World Wide Web; a set of web pages identified by a unique URL that make up a whole.

General and specific selection criteria for cataloguing and archiving of web resources

General criteria

The same general criteria apply to printed and web resources:

  1. Works by Croatian authors published in Croatia and abroad
  2. Works about Croatia and Croatians, regardless of place of publication and authorship
  3. Works on Croatia
  4. Works published in Croatia, i.e.:
    1. Place of publication on the resource indicates that it was created in Croatia
    2. Publisher’s residence in Croatia
    3. Author’s residence in Croatia

Specific criteria

  1. Content
    Refers to the presentation of data such as the title and the publisher or issuing body responsible for the content and creation of the publication, the design and arrangement of menus and data, and the regularity of updating. (For more detail, see Recommendations for the presentation of web resources.)
  2. Publication structure
    Refers to the presentation of data such as the title and the publisher or issuing body responsible for the content and creation of the publication, the design and arrangement of menus and data, and the regularity of updating.
  3. Publisher
    All content published on the Internet is considered to be published, and everybody who publishes is a publisher. These can be publishers in the traditional sense of the word, as well as authors of personal web pages.
  4. Domain
    Resources that are originally published on the .hr domain are primarily selected for the web archive. Resources on other domains (.com, .net, .info, .org) may be selected if they meet other selection criteria.
  5. Format
    The data format in which the resource was published, for example word processor (Word), portable document format (PDF) or hypertext mark-up language (HTML). The Library collects, stores and provides access to publications in their original format. If the resource is published in several different formats, the Library selects the format that can be stored in its original form, in order to keep the integrity and authenticity of the resource (appearance, design, search mode) and to secure the readability of the data. Standard formats have priority.
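
The domain criterion above can be illustrated with a short sketch. It is not the Library's actual tooling; the function name domain_priority and the labels "primary" and "conditional" are hypothetical and were chosen only for this example. It assumes full URLs that include a scheme, and it treats any host ending in .hr as primary, leaving every other domain to be judged against the remaining criteria.

```python
from urllib.parse import urlsplit

# Hypothetical labels for this illustration; not terms used by the Library.
PRIMARY = "primary"          # resource is on the .hr domain
CONDITIONAL = "conditional"  # other domain; must meet the other selection criteria


def domain_priority(url: str) -> str:
    """Classify a URL by the domain criterion only."""
    # hostname is None when the URL has no scheme or network location.
    host = urlsplit(url).hostname or ""
    # Drop the trailing dot of a fully qualified name before comparing.
    host = host.rstrip(".")
    if host.endswith(".hr"):
        return PRIMARY
    return CONDITIONAL


if __name__ == "__main__":
    for candidate in ("https://example.hr/novosti", "https://example.org/croatia"):
        print(candidate, domain_priority(candidate))
```

A resource labelled "conditional" is not rejected; it would still have to be checked by a cataloguer against the general and specific criteria.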

Selection of publications for Croatian Web Archive

The following are catalogued and archived: news portals, thematic portals, portals, web sites of institutions, associations, clubs, and scientific and research projects, journals, books, selected personal pages, personal, collective and thematic blogs as sources of information on contemporary culture and on economic, social and political trends, blogs with significant influence in public life whose authors write under their real names, and selected forums that conform to the criteria given above.

The following are not catalogued: search engines, games, advertising pages, pages of companies and businesses, preliminary versions of publications, mailing lists, chat, resources distributed exclusively by e-mail, resources on intranets, most personal pages, blogs and forums, and pages that only contain links to texts from other resources.
Digitised resources in the digital collections of other institutions and in other web archives are also not catalogued.

The National and University Library in Zagreb (NSK), in collaboration with the University of Zagreb University Computing Centre (Srce), crawls the national web domain (.hr) once a year. In addition, NSK periodically crawls web sites related to topics and events of national importance.