Skip to content

SFTP Server Harvesting

This harvester collects metadata as XML files from a SFTP server.

Adding a SFTP Server Harvester

To create a SFTP server harvester, navigate to Admin console > Harvesting and select Harvest from > SFTP server:

Provide the following information:

  • Identification

    • Node name and logo: Specify a unique name for the harvester and, optionally, assign a logo.
    • Group: Select the group that will own the harvested records. Only the catalog administrator or users with the UserAdmin profile for this group can manage the harvester.
    • User: Indicate the user who will own the harvested records.
  • Schedule: Configure scheduling options for executing the harvester. If disabled, the harvester must be run manually from the harvester page. If enabled, provide a scheduling expression using cron syntax.

  • SFTP Connection Configuration

    • SFTP host address: Enter the SFTP host name or IP address (omit the 'sftp://' prefix).
    • SFTP port: Specify the port to connect to (usually 22).
    • SFTP home directory: Enter the directory path on the SFTP server containing the metadata (XML files) to be harvested. Use '/' to start from the root directory of the connection.
    • Recurse subfolders: If checked, the harvester will recursively process metadata in all folders within the specified home directory, including subfolders. If unchecked, only items in the specified directory will be harvested.
    • Username: Provide the username for connecting to the SFTP server.
    • Use SSH key: If checked, a private/public key pair will be generated to connect to the SFTP server. Using an SSH key is a more secure authentication method than a password and is recommended.
      • SSH key type: Choose the algorithm for SSH key creation: RSA (4096 bits) or ECDSA.
    • Password: Enter the password for connecting to the SFTP server. This is only required if Use SSH key is unchecked.
  • Response Processing Configuration

    • Action on UUID collision: Specify what to do when a harvested record shares a UUID with a record collected by another method (e.g., another harvester, importer, dashboard editor). You can choose to skip the record (default) or update it.
    • Update catalog record only if file was updated: Only update the catalog record if the source file has changed.
    • Keep local even if deleted at source: If checked, metadata records that have already been harvested will be retained even if they are deleted from the specified directory on the SFTP server.
    • Validate records before import: Set the criteria for rejecting invalid metadata based on XML structure (XSD) and validation rules (schematron):
      • Accept all metadata without validation.
      • Accept metadata that are XSD valid.
      • Accept metadata that are both XSD and schematron valid.
    • XSL transformation to apply: (Optional) Reference an XSL transform to apply to each metadata record before it is added to GeoNetwork.
    • Batch edits: (Optional) Update harvested records using XPath syntax. You can add, replace, or delete elements.
    • Category: (Optional) Assign a GeoNetwork category to each metadata record.
  • Privileges: Assign privileges to the harvested metadata.