feed
The Synapse feed tool is a way to ingest data exported from one Cortex into another Cortex. Users should be familiar with both the Synapse data model (Data Model Objects et al.) and Synapse concepts such as packed nodes in order to use and understand the feed tool effectively.
Syntax
The feed tool is executed from an operating system command shell. The command usage is as follows (line is wrapped for readability):
usage: synapse.tools.feed [-h] (--cortex CORTEX | --test) [--debug] [--format FORMAT] [--modules MODULES]
[--chunksize CHUNKSIZE] [--offset OFFSET] [files ...]
Where:
- -h displays detailed help and these command line options.
- CORTEX specifies the telepath URL to the Cortex where the data should be ingested.
- --test means to perform the ingest against a temporary, local Cortex instead of a live Cortex, for testing or validation. When using a temporary Cortex, you do not need to provide a path.
- --debug specifies to drop into an interactive prompt to inspect the state of the Cortex post-ingest.
- FORMAT specifies the format of the input files. Currently, only the value “syn.nodes” is supported. This is also the default value.
- MODULES specifies a path to a Synapse CoreModule class that will be loaded into the temporary Cortex. This option has no effect if the --test option is not specified.
- CHUNKSIZE specifies how many lines or chunks of data to read at a time from the given files. Defaults to 1000 if not specified.
- OFFSET specifies how many chunks of data to skip over (starting at the beginning).
- files is a series of file paths containing data to load into the Cortex (or temporary Cortex). Every file must be either json-serialized data, msgpack-serialized data, yaml-serialized data, or a json lines file. The files do not have to all be of the same type.
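To make the CHUNKSIZE option more concrete, the following minimal sketch shows what reading data "a chunk at a time" generally looks like. The `chunks` helper and the sample list are hypothetical and written only for illustration; this is not the feed tool's own code, and it does not model the OFFSET option.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunks(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        # Pull up to `size` items without loading the whole input into memory.
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# Stand-in for the lines of an input file.
lines = ["node-0", "node-1", "node-2", "node-3", "node-4"]

for chunk in chunks(lines, 2):
    print(len(chunk), chunk)
```

With a chunk size of 2, the five sample items are handled as two full chunks followed by one final chunk holding the single remaining item.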
Ingest Examples - Overview
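The example below uses the feed tool to verify a file of packed nodes in a temporary Cortex and then load it into a live Cortex.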
Ingest Example 1
This example demonstrates loading a set of nodes via the feed tool with the “syn.nodes” format option. The nodes are of a variety of types and are encoded in JSON Lines (jsonl) format.
JSONL File:
The jsonl file (testnodes.jsonl) contains a list of nodes in their packed form. Each line in the file corresponds to a single node, with all of the properties, tags, and nodedata on the node encoded in a JSON-friendly format.
[["it:reveng:function", "9710579930d831abd88acff1f2ecd04f"], {"iden": "508204ebc73709faa161ba8c111aec323f63a78a84495694f317feb067f41802", "tags": {"my": [null, null], "my.cool": [null, null], "my.cool.tag": [null, null]}, "props": {".created": 1625069466909, "description": "An example function"}, "tagprops": {}, "nodedata": {}, "path": {}}]
[["inet:ipv4", 386412289], {"iden": "d6270ca2dc592cd0e8edf8c73000f80b63df4bcd601c9a631d8c68666fdda5ae", "tags": {"my": [null, null], "my.cool": [null, null], "my.cool.tag": [null, null]}, "props": {".created": 1625069584577, "type": "unicast"}, "tagprops": {}, "nodedata": {}, "path": {}}]
[["inet:url", "https://synapse.docs.vertex.link/en/latest/synapse/userguide.html#userguide"], {"iden": "dba0a280fc1f8cf317dffa137df0e1761b6f94cacbf56523809d4f17d8263840", "tags": {"my": [null, null], "my.cool": [null, null], "my.cool.tag": [null, null]}, "props": {".created": 1625069758843, "proto": "https", "path": "/en/latest/synapse/userguide.html#userguide", "params": "", "fqdn": "synapse.docs.vertex.link", "port": 443, "base": "https://synapse.docs.vertex.link/en/latest/synapse/userguide.html#userguide"}, "tagprops": {}, "nodedata": {}, "path": {}}]
[["file:bytes", "sha256:ffd19426d3f020996c482255b92a547a2f63afcfc11b45a98fb3fb5be69dd75c"], {"iden": "137fd16d2caab221e7580be63c149f83a11dd11f10f078d9f582fedef9b57ad5", "tags": {"my": [null, null], "my.cool": [null, null], "my.cool.tag": [null, null]}, "props": {".created": 1625070470041, "sha256": "ffd19426d3f020996c482255b92a547a2f63afcfc11b45a98fb3fb5be69dd75c", "md5": "be1bb5ab2057d69fb6d0a9d0684168fe", "sha1": "57d13f1fa2322058dc80e5d6d768546b47238fcd", "size": 16}, "tagprops": {}, "nodedata": {}, "path": {}}]
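Before handing a file like this to the feed tool, it can help to confirm that every line has the shape shown above: a two-element list holding a `[form, value]` pair followed by a dictionary of node details. The sketch below is one way to do that using only the Python standard library. The function name `check_packed_nodes` is hypothetical, the sample lines are abbreviated versions of the nodes above, and the check covers the line structure only, not validity against the Synapse data model.

```python
import json
from collections import Counter
from typing import Iterable


def check_packed_nodes(lines: Iterable[str]) -> Counter:
    """Tally forms across JSONL packed nodes; raise ValueError on a malformed line."""
    forms: Counter = Counter()
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        try:
            node = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: not valid JSON") from exc
        # A packed node is expected to be [[form, value], {details}].
        if not (isinstance(node, list) and len(node) == 2):
            raise ValueError(f"line {lineno}: expected a two-element list")
        ndef, info = node
        if not (isinstance(ndef, list) and len(ndef) == 2 and isinstance(ndef[0], str)):
            raise ValueError(f"line {lineno}: expected a [form, value] pair")
        if not isinstance(info, dict):
            raise ValueError(f"line {lineno}: expected a dictionary of node details")
        forms[ndef[0]] += 1
    return forms


# Abbreviated sample lines; a real file would be opened and passed in instead.
sample = [
    '[["inet:ipv4", 386412289], {"tags": {"my.cool.tag": [null, null]}, "props": {"type": "unicast"}}]',
    '[["file:bytes", "sha256:ffd19426d3f020996c482255b92a547a2f63afcfc11b45a98fb3fb5be69dd75c"], {"props": {"size": 16}}]',
]

for form, count in sorted(check_packed_nodes(sample).items()):
    print(f"{form}: {count}")
```

Because the function accepts any iterable of strings, an open file object for testnodes.jsonl can be passed directly in place of `sample`.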
Verifying the Data:
Typically, users will want to double check the data they have before loading it into a production Cortex. The feed tool allows us to perform an ingest of our nodes file against an empty, ephemeral Cortex, so that we can check what nodes get created before adding them to a production Cortex. To load testnodes.jsonl into an ephemeral Cortex and drop into a prompt to explore the ingested nodes, run:
python -m synapse.tools.feed --test --debug testnodes.jsonl
Assuming the command completed with no errors, we should now have a cmdr prompt connected to our test Cortex:
cli>
From this prompt we can issue Storm commands to interact with and validate the nodes that were just ingested. For example:
cli> storm #my.cool.tag
it:reveng:function=9710579930d831abd88acff1f2ecd04f
        .created = 2021/06/30 19:46:31.810
        :description = An example function
        #my.cool.tag
inet:ipv4=23.8.47.1
        .created = 2021/06/30 19:46:31.810
        :type = unicast
        #my.cool.tag
inet:url=https://synapse.docs.vertex.link/en/latest/synapse/userguide.html#userguide
        .created = 2021/06/30 19:46:31.810
        :base = https://synapse.docs.vertex.link/en/latest/synapse/userguide.html#userguide
        :fqdn = synapse.docs.vertex.link
        :params =
        :path = /en/latest/synapse/userguide.html#userguide
        :port = 443
        :proto = https
        #my.cool.tag
file:bytes=sha256:ffd19426d3f020996c482255b92a547a2f63afcfc11b45a98fb3fb5be69dd75c
        .created = 2021/06/30 19:46:31.810
        :md5 = be1bb5ab2057d69fb6d0a9d0684168fe
        :sha1 = 57d13f1fa2322058dc80e5d6d768546b47238fcd
        :sha256 = ffd19426d3f020996c482255b92a547a2f63afcfc11b45a98fb3fb5be69dd75c
        :size = 16
        #my.cool.tag
complete. 4 nodes in 16 ms (250/sec).
Loading the Data:
Once we’ve inspected and verified the data is acceptable for loading, we can point the feed tool to the Cortex we want to load the nodes into, and the same nodes should be added.
python -m synapse.tools.feed --cortex "aha://cortex..." testnodes.jsonl
However, suppose that after inspecting the data we find that the it:reveng:function and inet:ipv4 nodes are not allowed in the production Cortex, but the inet:url and file:bytes nodes are. We can skip these two nodes by using a combination of the chunksize and offset parameters:
python -m synapse.tools.feed --cortex "aha://cortex..." testnodes.jsonl --chunksize 1 --offset 1
Here, the chunksize parameter tells the feed tool to read one line at a time from the file and process it before reading the next line, and the offset parameter tells the feed tool to skip all lines up to and including line 1 (counting from zero, so lines 0 and 1) when attempting to add nodes, and to add nodes only once it has read lines 2 and beyond.
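Skipping by offset depends on the unwanted nodes sitting at the start of the file. A different approach, not part of the feed tool itself, is to filter the jsonl file by form before loading it. The sketch below assumes the packed-node line layout shown earlier. The function name `drop_forms` is hypothetical, and the sample lines are abbreviated versions of the nodes in testnodes.jsonl.

```python
import json
from typing import Iterable, Iterator, Set


def drop_forms(lines: Iterable[str], blocked: Set[str]) -> Iterator[str]:
    """Yield only the JSONL lines whose node form is not in `blocked`."""
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        # Each line is [[form, value], {details}]; only the form is needed here.
        (form, _valu), _info = json.loads(line)
        if form not in blocked:
            yield line.rstrip("\n")


# Abbreviated sample lines standing in for the contents of testnodes.jsonl.
sample = [
    '[["it:reveng:function", "9710579930d831abd88acff1f2ecd04f"], {"props": {"description": "An example function"}}]',
    '[["inet:ipv4", 386412289], {"props": {"type": "unicast"}}]',
    '[["inet:url", "https://synapse.docs.vertex.link/en/latest/synapse/userguide.html#userguide"], {"props": {}}]',
]

# The forms that are not allowed in the production Cortex in this example.
blocked = {"it:reveng:function", "inet:ipv4"}

for kept in drop_forms(sample, blocked):
    print(kept)
```

The kept lines could then be written to a new file and passed to the feed tool in place of the original. Unlike an offset, this does not rely on the order of the lines in the file.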