Harvesting CSW services

This harvester will connect to a remote CSW server and retrieve metadata records that match the query parameters specified.

Adding a CSW harvester

To create a CSW harvester, go to Admin console > Harvesting and select Harvest from > CSW.

Provide the following information:

  • Identification

    • Node name and logo: A unique name for the harvester and, optionally, a logo to assign to the harvester.
    • Group: Group which owns the harvested records. Only the catalog administrator or users with the profile UserAdmin of this group can manage the harvester.
    • User: User who owns the harvested records.
  • Schedule: Scheduling options to execute the harvester. If disabled, the harvester must be run manually from the harvester page. If enabled, a scheduling expression using cron syntax should be configured (See examples).

  • Configure connection to OGC CSW 2.0.2

    • Service URL: The URL of the capabilities document of the CSW server to be harvested, e.g. http://geonetwork-site.com/srv/eng/csw?service=CSW&request=GetCapabilities&version=2.0.2. This document is used to discover the service endpoints to call when querying and retrieving metadata.
    • Remote authentication: If checked, provide credentials for basic HTTP authentication on the CSW server.
    • API Key authentication:
      Optionally, provide an API Key for authentication and specify the header name.
      • API Key value: The API key or token string as provided by the CSW service.
      • API Key header name: The HTTP header to use (default is Authorization).
        If both Basic Auth and API Key are configured, both will be sent in requests as headers. This supports servers that require or accept multiple authentication schemes.
    • Search filter: (Optional) Define the search criteria to restrict the records to harvest.
    • Search options:
      • Sort by: The sort order in which results are retrieved. 'identifier:A' sorts by UUID in ascending alphabetical order. Any CSW queryable can be combined with A (ascending) or D (descending) to set the ordering.
      • Output Schema: The metadata standard in which to request the metadata records from the CSW server.
      • Distributed search: Enables distributed search on the remote server (if the remote server supports it). When this option is enabled, the remote catalog cascades the search to the federated CSW servers that it has configured.
  • Configure response processing for CSW

    • Action on UUID collision: When the harvester finds a UUID that already belongs to a record collected by another method (another harvester, importer, dashboard editor, ...), choose whether the record is skipped (default), overridden, or given a new UUID.
    • Validate records before import: Defines the criteria to reject metadata that is invalid according to XML structure (XSD) and validation rules (schematron).
      • Accept all metadata without validation.
      • Accept metadata that are XSD valid.
      • Accept metadata that are XSD and schematron valid.
    • Check for duplicate resources based on the resource identifier: If checked, ignores metadata with a resource identifier (gmd:identificationInfo/*/gmd:citation/gmd:CI_Citation/gmd:identifier/*/gmd:code/gco:CharacterString) that is assigned to another metadata record in the catalog. It only applies to records in ISO19139 or ISO profiles.
    • XPath filter: (Optional) When a record is retrieved from the remote server, an XPath expression is evaluated to accept or discard the record.
    • XSL transformation to apply: (Optional) The referenced XSL transform will be applied to each metadata record before it is added to GeoNetwork.
    • Batch edits: (Optional) Allows updating harvested records using XPath syntax. It can be used to add, replace, or delete elements.
    • Category: (Optional) A GeoNetwork category to assign to each metadata record.
  • Privileges: Assign privileges to harvested metadata.
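
The connection settings above (Service URL, Remote authentication, and API Key authentication) can be illustrated with a small Python sketch that assembles the GetCapabilities URL and the authentication headers a client might send. This is not GeoNetwork's implementation. The function name `build_capabilities_request`, the `X-API-Key` header, and the credentials are hypothetical, and raising an error when both schemes target the `Authorization` header is an assumption of this sketch rather than documented harvester behaviour.

```python
import base64
from urllib.parse import urlencode


def build_capabilities_request(base_url, username=None, password=None,
                               api_key=None, api_key_header="Authorization"):
    """Return (url, headers) for a CSW 2.0.2 GetCapabilities request.

    Nothing is sent over the network; the function only assembles the pieces.
    """
    # Same query parameters as the example Service URL in this page.
    query = urlencode({
        "service": "CSW",
        "request": "GetCapabilities",
        "version": "2.0.2",
    })
    url = f"{base_url}?{query}"

    headers = {}

    # Basic HTTP authentication: base64("user:password") in the Authorization header.
    if username is not None and password is not None:
        token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
        headers["Authorization"] = "Basic " + token.decode("ascii")

    # API key: sent as-is under the configured header name.
    if api_key is not None:
        already_set = {name.lower() for name in headers}
        if api_key_header.lower() in already_set:
            # Assumption of this sketch: refuse to overwrite the Basic credentials.
            raise ValueError(
                f"Header {api_key_header!r} is already used by Basic authentication"
            )
        headers[api_key_header] = api_key

    return url, headers


if __name__ == "__main__":
    url, headers = build_capabilities_request(
        "http://geonetwork-site.com/srv/eng/csw",
        username="harvester",        # hypothetical credentials
        password="secret",
        api_key="abc123",            # hypothetical API key
        api_key_header="X-API-Key",  # hypothetical header name
    )
    print(url)
    for name in headers:
        print(name)
```

When both schemes are configured, giving the API key its own header name keeps it from competing with the Basic credentials for the `Authorization` header, which is why the usage example does not rely on the default.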