During domain and thematic crawls our crawler is assigned the IP address 188.8.131.52 and it identifies itself as Mozilla/5.0 (compatible; heritrix/3.4.x; +http://haw.nsk.hr/faq). During selective archiving the crawler is assigned the IP address 184.108.40.206 and it identifies itself as Mozilla/5.0 (compatible; SrceDAMP/4.2.2; +http://haw.nsk.hr/faq). The crawler has been set to ignore the robots.txt protocol.
Please configure your website to allow our crawler access to its content.
Please complete the Registration Form; archiving and terms of access are agreed on a case-by-case basis with each publisher individually.
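For site administrators who want to make sure the crawler is not blocked, the check below is a minimal sketch of how a request could be recognized from the user-agent strings quoted above. The function name `is_haw_crawler` and the decision to match on the `heritrix/` and `SrceDAMP/` tokens plus the FAQ URL, rather than on the full string, are assumptions made for illustration; they are not part of any HAW specification.

```python
import re

# Tokens taken from the two user-agent strings quoted above.
# Matching on the product token instead of the full string is an assumption,
# chosen so that version changes (e.g. "3.4.x", "4.2.2") still match.
_HAW_AGENT = re.compile(r"\b(heritrix|SrceDAMP)/", re.IGNORECASE)
_HAW_INFO_URL = "+http://haw.nsk.hr/faq"


def is_haw_crawler(user_agent: str) -> bool:
    """Return True if the user-agent string looks like a HAW crawler.

    Hypothetical helper: it only inspects the User-Agent header value.
    """
    if not user_agent:
        return False
    return _HAW_INFO_URL in user_agent and bool(_HAW_AGENT.search(user_agent))


if __name__ == "__main__":
    sample = "Mozilla/5.0 (compatible; heritrix/3.4.x; +http://haw.nsk.hr/faq)"
    print(is_haw_crawler(sample))
```

A user-agent string can be set freely by any client, so a check like this is suitable for exempting the crawler from blocking rules or rate limits, not for granting access to protected content.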
The library determines the frequency of archiving according to the priority of the resource for the wider user community and the significance of changes in its content and technical details. Very large databases (over 500 MB) are archived less frequently.
A caption reading "archived copy" is placed at the top of each archived copy. The archived copy is published at an Internet address starting with http://haw.nsk.hr/arhiva.
Search engines index the Croatian Web Archive homepage, not archived copies of publications.
Using the Registration Form, online publishers and authors of web resources notify the NSK of the existence of a resource on the web. By doing so, the publisher meets the legal deposit requirement for the resource, and the resource can subsequently be processed and archived.
For assignment of the ISSN, ISBN, or ISMN identifiers, it is necessary to send a request to the relevant offices in Croatia.
Title, author or editor, name and seat of the publisher, and date of publishing on the web should be indicated on the cover, or in the impressum.
Prior to cataloging and archiving in HAW, the resource must already be published on the Internet. Web resources are not assigned CIP – a full catalogue record is made immediately.
A system for collecting and archiving legal deposit copies of web resources. The HAW was established in September 2004.
Crawl quality is directly determined by the technology used when creating a web homepage.
A collection of selected web resources pertaining to institutions, associations, clubs, events, scientific projects, bodies of public administration, e-newspapers, portals, journals, books, articles.
Content posted online prior to 2004, pages with exclusively commercial content, web pages of companies and businesses, pages under construction, resources distributed exclusively via e-mail, personal pages, computer games, and resources or parts of resources that cannot be archived because of the way they were created.