Introduction

Synapse NSRL is a Storm service for locally indexing data from the National Software Reference Library (NSRL) Reference Data Set (RDS). The service adds Storm commands for querying the NSRL data and ingesting it into a Cortex. Hash nodes created by the service are also tagged with #rep.nsrl.

Synapse NSRL supports indexing the full Modern RDS as well as the Android RDS and iOS RDS. Synapse NSRL also supports indexing the SHA1-to-SHA256 hash file that is provided outside of the RDS ISO.

The source NSRL RDS data is distributed as ISO 9660 image files, so appropriate file system access is required to mount the images and make them available to the service for indexing. For details, see Updating the Index.

For more information about Storm services, visit the Storm service documentation.

Indexing

The NSRL RDS data is provided as individual file-detail records, keyed by a SHA1 hash and, optionally, an MD5 hash. The SHA1 is used as the primary index; if a record includes an MD5, a secondary index from the MD5 to the SHA1 is created.

If the SHA1-to-SHA256 hash file is provided, an additional secondary index is created from the SHA256 to the SHA1 hash.

Although the entire detailed record is stored, currently only the filename is returned in the query response along with the hashset.
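The relationship between the primary and secondary indexes can be pictured with a minimal in-memory sketch. This is not the service's storage code: the HashIndex class, its method names, and the record layout are hypothetical and only illustrate how an MD5 or SHA256 lookup resolves to the SHA1-keyed record.

```python
from typing import Optional


class HashIndex:
    """Toy model: SHA1 is the primary key; MD5 and SHA256 point back to it."""

    def __init__(self) -> None:
        self.by_sha1 = {}         # primary index: SHA1 -> record
        self.md5_to_sha1 = {}     # secondary index: MD5 -> SHA1
        self.sha256_to_sha1 = {}  # secondary index: SHA256 -> SHA1 (hash map file)

    def add_record(self, sha1: str, filename: str, md5: Optional[str] = None) -> None:
        sha1 = sha1.lower()
        record = {"sha1": sha1, "md5": md5.lower() if md5 else None, "filename": filename}
        # Keep the first record seen for a SHA1 (a simplified form of deconfliction).
        self.by_sha1.setdefault(sha1, record)
        if md5:
            self.md5_to_sha1[md5.lower()] = sha1

    def add_hashmap_entry(self, sha1: str, sha256: str) -> None:
        self.sha256_to_sha1[sha256.lower()] = sha1.lower()

    def lookup(self, hashval: str) -> Optional[dict]:
        """Resolve a SHA1, MD5, or SHA256 to the stored record, if any."""
        h = hashval.lower()
        if h in self.by_sha1:
            return self.by_sha1[h]
        sha1 = self.md5_to_sha1.get(h) or self.sha256_to_sha1.get(h)
        return self.by_sha1.get(sha1) if sha1 else None
```

In this model every lookup ends at the SHA1-keyed record, which is why a SHA256 can only be resolved once the hash map file has been loaded.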

Note

Starting in v2.3.0 the service will track offsets of records added to the service. This enables getting storage statistics via the nsrl.stats command.

If updating from a prior version, and the service contains data, a one-time load will automatically be run in the background at startup. New update requests will wait for this to finish before proceeding. For example, a 125GB service data directory with 276M records will take approximately 2 hours to finish.
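For planning purposes, the single reference point above can be turned into a rough estimate. The sketch below assumes that load time scales linearly with record count, which this document does not state; actual times will depend on hardware and storage, and the function name is hypothetical.

```python
# Reference point quoted above: 276M records took roughly 2 hours.
REF_RECORDS = 276_000_000
REF_HOURS = 2.0


def estimate_load_hours(records: int) -> float:
    """Linear extrapolation from the single reference point (an assumption)."""
    return records / REF_RECORDS * REF_HOURS


def implied_records_per_second() -> float:
    """Throughput implied by the reference point."""
    return REF_RECORDS / (REF_HOURS * 3600)
```

The reference point implies a rate on the order of tens of thousands of records per second, so a service holding half as many records would be expected to finish in about an hour under the same assumption.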

Configuration

Synapse NSRL implements the Synapse Cell class and as such can be configured much like other Cell implementations. Details on the general configuration options can be found in the Cell configuration section of the Synapse documentation.

The Synapse NSRL service provides two additional configuration options beyond the general Cell configuration:

  • axon: Telepath URL to the Axon for storing files used to add records.

  • mntpnt: Mount point for RDS ISO images used when calling updateRdsByIso.
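The environment variable names used in the docker-compose.yml example in this section (SYN_NSRL_MNTPNT and SYN_NSRL_AUTH_PASSWD) appear to follow a simple pattern: a SYN_NSRL prefix, followed by the option name in upper case. The helper below is hypothetical and only reproduces that observed pattern; the name it produces for the axon option is inferred, not confirmed by this document.

```python
def option_to_envvar(option: str, prefix: str = "SYN_NSRL") -> str:
    """Map a configuration option name to the env var pattern seen in the example.

    Assumption: ':' separators in option names become '_' and the name is upper-cased.
    """
    return f"{prefix}_{option.replace(':', '_').upper()}"


def envvars_for(options: dict) -> dict:
    """Build an environment mapping from option names to string values."""
    return {option_to_envvar(name): str(value) for name, value in options.items()}
```

For example, envvars_for({"mntpnt": "/vertex/mnt"}) yields the SYN_NSRL_MNTPNT entry used in the compose file.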

Because Synapse NSRL indexes the data locally, the service should be deployed with sufficient storage, and storage use should be monitored as indexing progresses. For example, updating the index with one set of RDS files will require approximately 100GB. The additional storage required when updating an existing index with a new set of RDS files will vary depending on the amount of new data included by NSRL, since records will be deconflicted with those that already exist in the index.
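A quick check of free space on the storage volume before starting an update can catch an undersized deployment early. The sketch below uses only the Python standard library; the function name is hypothetical, and the 100GB default simply mirrors the approximate figure quoted above.

```python
import shutil

GB = 10 ** 9  # decimal gigabytes, matching the approximate figure in the text


def has_free_space(path: str, required_gb: float = 100.0) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * GB
```

The path would typically be the host directory mapped to the service's persistent storage, for example the /path/to/storage placeholder used in the compose file.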

When deploying as a Docker container, the service requires access to the ISO file mount points (and optionally the hashmap location). For example, these paths could be mapped in the docker-compose.yml file as:

---
version: '3'

services:

    nsrl00:

        image: vertexproject/synapse-nsrl:v2.1.0

        volumes:
            # Map in the host mount point
            - /path/to/mnt:/vertex/mnt
            # Map in the host hashmap file location
            - /path/to/hashmap:/vertex/hashmap
            # Map in a persistent storage directory
            - /path/to/storage:/vertex/storage

        environment:
            # Specify log level (indexing progress logs are level=INFO)
            - SYN_LOG_LEVEL=INFO
            # Specify the mount point
            - SYN_NSRL_MNTPNT=/vertex/mnt
            # Set a default password for the root user
            - SYN_NSRL_AUTH_PASSWD=secretsauce

        ports:
            # Default https port
            - "4443:4443"
            # Default telepath port
            - "27492:27492"
...

Once the service has been deployed, add a service user and add the service to the Cortex, as described in the Storm service documentation.

Updating the Index

After the service is deployed the Telepath APIs can be used to update the index.

  1. Download the RDS ISO images from Current RDS Hash Sets.

    • The supported images are:

      • Modern RDS (microcomputer applications from 2000 to present)

      • Android RDS

      • iOS RDS

  2. Create mount points and mount each of the images, for example, from a terminal:

    mkdir -p /path/to/mnt/{RDS_modern,RDS_android,RDS_ios}
    mount -o loop "/path/to/RDS_modern.iso" "/path/to/mnt/RDS_modern"
    mount -o loop "/path/to/RDS_android.iso" "/path/to/mnt/RDS_android"
    mount -o loop "/path/to/RDS_ios.iso" "/path/to/mnt/RDS_ios"
    
  3. Download the SHA1-to-SHA256 hash map from Non-RDS Hash Sets.

  4. Open a shell in the service container and execute the update commands:

    $ docker exec -it nsrl00 /bin/bash
    # python
    >>> import synapse.telepath as s_telepath
    >>> prx = s_telepath.openurl('cell:///vertex/storage')
    >>> # Example shown for a Docker container configured as shown in the Configuration section
    >>> prx.updateHashmapByFp("/vertex/hashmap/rds241-sha256.zip")
    {...<results>...}
    >>> prx.updateRdsByIso("RDS_modern")
    {...<results>...}
    >>> prx.updateRdsByIso("RDS_android")
    {...<results>...}
    >>> prx.updateRdsByIso("RDS_ios")
    {...<results>...}
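Before running the update calls in step 4, it can help to confirm that each image from step 2 is actually mounted, since an empty, unmounted directory would otherwise only be discovered partway through an update. The following is a minimal sketch using the standard library; the function name is hypothetical, and the directory names are the ones used in the mount example above.

```python
import os

RDS_NAMES = ("RDS_modern", "RDS_android", "RDS_ios")


def unmounted(mount_root: str, names=RDS_NAMES) -> list:
    """Return the names whose directory under `mount_root` is not a mount point."""
    return [
        name for name in names
        if not os.path.ismount(os.path.join(mount_root, name))
    ]
```

Called on the host as unmounted("/path/to/mnt"), an empty list indicates that all three images are mounted. Inside a container, bind-mounted subdirectories may not be reported as separate mount points, so the check is most reliable on the host.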
    

If a Docker container is not used for deployment, or it is preferred to execute the updates remotely, a terminal with a Python environment that has Synapse installed can be used. In this case, the connection string would be:

>>> prx = s_telepath.openurl('tcp://<svcuser>:<passwd>@<svcip>:<svcport>')
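For unattended runs, the same calls can be placed in a script rather than typed interactively. The sketch below assumes the asynchronous Telepath client pattern (async with await s_telepath.openurl(...)) and assumes that updateHashmapByFp and updateRdsByIso can be awaited and return their results directly; the helper names plan_updates, run_updates, and main are hypothetical.

```python
import asyncio

RDS_NAMES = ("RDS_modern", "RDS_android", "RDS_ios")


def plan_updates(hashmap_fp=None, rds_names=RDS_NAMES):
    """Return the ordered (method, argument) calls, mirroring the interactive session."""
    calls = []
    if hashmap_fp:
        calls.append(("updateHashmapByFp", hashmap_fp))
    calls.extend(("updateRdsByIso", name) for name in rds_names)
    return calls


async def run_updates(url, calls):
    """Open a Telepath proxy to the service and run each planned call in order."""
    # Imported here so the planning helper can be used without Synapse installed.
    import synapse.telepath as s_telepath

    async with await s_telepath.openurl(url) as prx:
        for method, arg in calls:
            # Assumption: each update API is awaitable and returns a result summary.
            result = await getattr(prx, method)(arg)
            print(method, arg, result)


def main(url, hashmap_fp=None):
    """Entry point: `url` is the Telepath URL; `hashmap_fp` is a path as seen by the service."""
    asyncio.run(run_updates(url, plan_updates(hashmap_fp)))
```

Running the updates sequentially keeps the log output easy to follow, and a failure in one call stops the script before later images are attempted.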

Note

Updating the index from the ISO files can take many hours, so a reliable connection to the service is required.

Parsing errors on individual record lines will be logged and skipped.
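The "log and skip" behavior can be illustrated with a small sketch. The record layout here (comma-separated SHA1, MD5, and filename) is hypothetical and much simpler than the real RDS files, and the function is not the service's parser; it only shows how one bad line can be reported without aborting the whole load.

```python
import csv
import logging

logger = logging.getLogger("nsrl.sketch")


def parse_records(lines):
    """Yield (sha1, md5, filename) tuples; log and skip lines that fail to parse."""
    for lineno, row in enumerate(csv.reader(lines), start=1):
        try:
            sha1, md5, filename = row  # ValueError if the field count is wrong
            if len(sha1) != 40:
                raise ValueError("SHA1 must be 40 hex characters")
            int(sha1, 16)  # ValueError if the value is not hexadecimal
        except ValueError as exc:
            logger.warning("skipping line %d: %s", lineno, exc)
            continue
        # An empty MD5 field is treated as "no MD5 available".
        yield sha1.lower(), (md5.lower() or None), filename
```

Because the function is a generator, valid records are produced as they are read, and the count of skipped lines can be recovered from the log.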

Docker Containers

The Synapse NSRL service is available as a Docker container from Docker Hub, under the vertexproject/synapse-nsrl image name used in the Configuration example.

Note

Tagged images are available on Docker Hub and correspond to the software releases listed in the changelog. The master tag is the latest development release. A generic major version tag is also available, representing the latest release on a given major version. For example, the v2.x.x tag represents the most current release for the v2.x.x release line. You can use specific tagged versions, or a major version specifier, depending on your chosen deployment strategy.
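When auditing deployment files, it can be useful to tell at a glance which kind of tag an image reference uses. The classifier below is a hypothetical helper that encodes only the three tag forms described above; the category labels are made up for illustration.

```python
import re


def classify_tag(tag: str) -> str:
    """Classify an image tag using the three forms described in the note."""
    if tag == "master":
        return "development"   # latest development release
    if re.fullmatch(r"v\d+\.x\.x", tag):
        return "major-line"    # tracks the latest release of a major version
    if re.fullmatch(r"v\d+\.\d+\.\d+", tag):
        return "release"       # pinned to one specific release
    return "unknown"
```

A pinned release tag gives reproducible deployments, while a major-line tag picks up new releases on the next image pull.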