Storm Reference - Automation

Background

Synapse is designed to facilitate large-scale analysis over disparate data sources with speed and efficiency. Many of the features that support this analysis are built into Synapse’s architecture, from performance-optimized indexing and storage to an extensible data model that allows you to reason over data in a structured manner.

Synapse also supports large-scale analysis through the use of automation. Synapse’s automation features are available through the Storm runtime and include:

  • Triggers and Cron
  • Macros
  • Dmons

All forms of automation in Synapse utilize the Storm query language. That is, regardless of the type of automation used, the result is the execution of a predefined Storm query. This means that anything that can be written in Storm can be automated, from the simple and straightforward to the highly complex. Actions performed via automation are limited only by imagination and Storm proficiency. Some automation is fairly simple (“if X occurs, do Y” or “once a week, update Z”). However, automation can take advantage of all available Storm features, including subqueries, variables, libraries, control flow logic, and so on.

Considerations

This section is not meant to be a detailed guide on implementing automation. However, a few areas are included here for consideration when utilizing automation in your environment.

Permissions.

The ability to create, modify, or delete the various types of automation depends on a user having the appropriate permissions to do so. If you assign permissions for these tasks in your environment, keep the following in mind:

  • Triggers, cron jobs, and dmons execute in the context of (with the permissions of) the user who creates them.
  • Macros execute in the context of the user who calls them for execution, regardless of who created them.

The Storm query executed by a given automation task cannot perform tasks that it does not have appropriate permissions to perform. This leads to some straightforward (and some less straightforward) situations. For example:

  • A user with higher privileges can write a macro that can be called and executed by a user with lower privileges. However, if the lower-privileged user does not have permissions to perform actions encoded in the macro, execution by the lower-privileged user will fail (i.e., with an AuthDeny error).
  • A user with higher privileges can write a trigger that calls a macro. Because the macro was called by the trigger, the macro executes with the permissions of the user who wrote the trigger. As triggers are event-driven (see below), this means it is possible for a lower-privileged user to cause an event that fires the higher-privileged trigger that calls the higher-privileged macro. Both will execute as written even though the execution was “caused” by a lower-privileged user.
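As a minimal sketch of the second case, assume a higher-privileged user runs both of the commands below. The macro name tag.review and the tag #review.pending are hypothetical and are used only for illustration.

```storm
/* Hypothetical macro: apply a review tag to any inbound node. */
macro.set tag.review ${ [ +#review.pending ] }

/* Hypothetical trigger: call the macro whenever an inet:fqdn node is created. */
trigger.add node:add --form inet:fqdn --query { | macro.exec tag.review }
```

If a lower-privileged user who is not allowed to apply tags later creates an inet:fqdn node, the trigger still fires and the macro still applies the tag, because both run with the permissions of the user who created the trigger.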

Use Cases.

Organizations can implement automation in any way they see fit, and at any scale. Some automation may be enterprise-wide, where it is used to support an organization’s overall mission or analysis efforts. Other automation may be put in place by individual analysts to support their own research efforts, either on an ongoing or temporary basis. For example:

  • an analyst may write a cron job to execute a long-running data collection and analysis task during off-peak hours, or may create a macro as a “shortcut” to make it easier to store and execute a particular Storm query on demand.
  • an organization may implement a set of automation to track malicious indicators within Synapse and orchestrate the integration of those indicators with security devices or SIEM systems.

Architecture.

There are varying approaches for “how” to write and implement automation. For example:

  • The various types of automation can be kept entirely separate and independent from one another, each executing their own Storm code and performing different tasks. Alternately, automation can be somewhat centralized (say in macros), where other types of automation execute minimal Storm queries whose purpose is to call the more extensive Storm stored in macros.
  • Automation can be written as many small, individual elements that each perform a relatively simple task, but which are designed to work together like building blocks to orchestrate larger-scale operations. In contrast, automation can be implemented using fewer elements that perform larger, more unified tasks using more complex Storm to carry out multiple operations.

Each approach has its pros and cons; there is no single “right” way, and what works best in your environment or for a particular task will depend on your needs (and possibly some trial and error).
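The “centralized” approach can be sketched as follows, assuming a macro named enrich (such as the one shown in the Macros examples) already exists. The trigger and the cron job each contain only enough Storm to call the macro; the schedule and the tag are illustrative choices, not recommendations.

```storm
/* Event-driven entry point: enrich a node as soon as it is tagged #osint. */
trigger.add tag:add --tag osint --query { | macro.exec enrich }

/* Scheduled entry point: re-run the same macro monthly over tagged FQDNs. */
cron.add --day 15 { inet:fqdn#osint | macro.exec enrich }
```

With this layout, a change to the enrichment logic is made once (in the macro) instead of in every trigger and cron job that uses it.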

Governance / Management.

Where multiple users have the ability to create automation tasks, it is possible for them to create duplicative or even conflicting automation. This is less likely to occur (or less likely to be potentially harmful) when users create automation to support their personal workflow, but can present problems when multiple users have permissions to both create automation and perform actions within Synapse that have “global” effects. For example:

  • Tags typically represent assessments about data. This can include things like whether an IP address is a sinkhole or whether an indicator is associated with a particular threat group. Organizations may wish to consider under what circumstances these assessments should be applied via automation. (For organizations not comfortable with fully automating certain analytical decisions, automation can be used to tag something “for review” to ensure a human in the loop.)
  • Automation is particularly useful for “enriching” indicators (such as domains or file hashes) by ingesting data into Synapse from various third-party sources. In some cases third-party access may be subject to query limits or may incur per-query (or per-result) usage costs. Organizations may wish to consider how to balance the convenience of automation with potential operational or financial impacts.

Organizations should plan whether or how to coordinate and deconflict automation where appropriate.

Review and Testing.

Automation in Synapse provides significant advantages. It relieves analysts from performing tedious work, freeing them to focus on more detailed analysis and complex tasks. It also allows you to scale your analytical operations by limiting the amount of work that must be performed manually. However, badly written automation or automation that contains logical errors also allows you to “make errors at machine speed”. We strongly encourage testing automation in a development environment before implementing it on a production system.

Nodes In and Nodes Out.

In some cases automation can be used to “chain” operations (e.g., both triggers and cron can call macros) or may be executed inline with other Storm operations (a macro can be called in the middle of a Storm query where the query expects to perform further operations on the nodes returned by the macro). In these cases, both the automation itself and any Storm executed after the automation may fail if the inbound nodes are not what is expected by the query.

Users should keep the Storm Operating Concepts in mind when writing automation.

Triggers and Cron

Triggers and cron are similar in terms of permissions and management. Both triggers and cron:

  • Support permissions. Synapse uses permissions to determine who can create, modify, and delete triggers and cron jobs. Triggers and cron jobs execute with the permissions of the user who creates them. (This is fairly intuitive for cron, but may be less so in the case of triggers.) Conversely, a trigger or cron job can only perform actions that its creator has permission to perform.
  • Are introspectable. Triggers and cron jobs are created as run-time nodes (“runt nodes”) and can be lifted, filtered, and pivoted across just like other elements of the Synapse data model (see the Storm Reference - Model Introspection guide for details).
  • Are view-specific. Synapse allows the optional segregation of data in a Cortex into multiple layers (Layer) that can be “stacked” to provide a unified View of data to users. Triggers and cron jobs are specific to a view. This means that they are not carried over (for example) if a user forks (Fork) a view. See the layer and view commands in the Storm Reference - Storm Commands guide for additional detail on working with views and layers.
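Because triggers and cron jobs are represented as runt nodes, they can be examined with ordinary Storm. The queries below are a small sketch that lifts the nodes and then filters on their secondary properties; the filter values are only examples.

```storm
/* Lift all triggers, then keep only those that fire on tag:add events. */
syn:trigger +:cond=tag:add

/* Lift all cron jobs, then keep only those whose Storm calls a macro. */
syn:cron +:storm~="macro.exec"
```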

Note

A simple Synapse implementation consists of a single Cortex with a single layer and a single view.

Triggers

Triggers are event-driven. As their name implies, they trigger (“fire”) their associated Storm query when specific events occur in a Cortex. As noted above, a trigger executes with the permissions of the user who created it; the trigger can only perform actions that the user is allowed to perform.

Triggers can fire on the following events:

  • Adding a node (node:add)
  • Deleting a node (node:del)
  • Setting a property (prop:set)
  • Adding a tag to a node (tag:add)
  • Deleting a tag from a node (tag:del)

Each event requires an object (a form, property, or tag) to act upon - that is, if you write a trigger to fire on a node:add event, you must specify the type of node (form) associated with the event. Similarly, if a trigger should fire on a tag:del event, you must specify the tag whose removal fires the trigger.

tag:add and tag:del events can take an optional form, allowing you to specify that a trigger should fire only when a given tag is added to (or removed from) nodes of a specific form, as opposed to nodes of any form.
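The difference between the two scopes can be sketched with a tag:del trigger, using the same option style as the tag:add examples later in this section. The tags and the query are hypothetical.

```storm
/* Fires when #malicious is removed from a node of any form. */
trigger.add tag:del --tag malicious --query { [ +#review ] }

/* Fires only when #malicious is removed from an inet:fqdn node. */
trigger.add tag:del --form inet:fqdn --tag malicious --query { [ +#review ] }
```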

Triggers execute immediately when their associated event occurs in a Cortex. This allows automation to occur in “real time” as opposed to waiting for a scheduled cron job to execute (or for an analyst to manually perform some task). As such, triggers are most appropriate for automating tasks that should occur right away (e.g., based on efficiency or importance). For example, triggers can be used to perform simple but tedious tasks, freeing analysts from having to perform such tasks manually; or they can be used to collect additional data (e.g., from third-party services) about nodes of interest (“enrich” those nodes).

Warning

Triggers execute inline with the process (typically a Storm query) that causes them to fire. That is, if a process (or an analyst) creates a node as part of a longer Storm query and creation of that node causes a trigger to fire, the trigger’s Storm will execute immediately and in full before returning “back” to the main query to complete any additional Storm operations. This includes execution of any additional code that may be called by the trigger itself, such as a macro. Conceptually, it is as though all of the trigger’s Storm code and any additional Storm that it calls were inserted into the middle of the original Storm query that fired the trigger.

This inline execution can impact a Cortex’s performance, depending on the Storm executed by the trigger and the number of nodes causing the trigger to fire. For example, let’s say you are using CSVTool (see Synapse Tools - csvtool) to load 3,000 indicators and tag them as “malicious”. Your Cortex includes a trigger that fires when a “malicious” tag is applied to any node, calling a Storm query that contacts five third-party services to enrich those indicators, resulting in the creation of dozens of additional nodes for each indicator enriched. This tag-and-enrich process is executed inline for each of the 3,000 nodes created by the CSVTool ingest.
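One possible way to reduce this inline cost is to have the trigger do only a cheap operation and to leave the expensive work to a scheduled job. The sketch below is an assumption about how that could be arranged, not a prescribed design: the tag #enrich.pending, the batch size, and the schedule are hypothetical, and the enrich macro is assumed to exist.

```storm
/* Inline work is limited to applying a lightweight marker tag. */
trigger.add tag:add --tag malicious --query { [ +#enrich.pending ] }

/* Off-peak batch: take up to 1,000 marked nodes, clear the marker, then enrich them. */
cron.add --day Tue,Thu,Sat --hour 2 { #enrich.pending | limit 1000 | [ -#enrich.pending ] | macro.exec enrich }
```

The trade-off is that enrichment no longer happens in real time; nodes wait until the next scheduled run.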

Conversely, triggers only execute when the specified event occurs. The events used to fire triggers typically will only ever occur once - the node inet:fqdn=woot.com will only ever be added to a Cortex once; applying the tag #my.tag to a hash:md5 node will only occur one time (barring deleting and later re-adding the node or tag). The only exception is the prop:set event, which will fire when the specified property is first set (often on node creation) as well as any time the property is modified (i.e., “set” again).

The “one-time” nature of triggers may result in a small number of edge cases where the trigger does not perform the desired action. For example:

  • Triggers do not operate retroactively; they will not fire for nodes that already exist in a Cortex at the time the trigger is added. That is, if you write a new trigger to fire any time the tag #my.tag is applied to a hash:md5 node, the trigger will have no effect on existing hash:md5 nodes that already have the tag.
  • If a trigger depends on a resource (process, service, etc.) that is not available when it fires, the trigger will simply fail; it will not “try again”.
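For the first case, existing nodes can be handled with a one-time manual query that performs the same action as the trigger. In this sketch, the action (applying #review) is a hypothetical stand-in for whatever the trigger’s query does.

```storm
/* New trigger: affects only hash:md5 nodes tagged #my.tag from now on. */
trigger.add tag:add --form hash:md5 --tag my.tag --query { [ +#review ] }

/* One-time backfill: apply the same action to nodes that already have the tag. */
hash:md5#my.tag [ +#review ]
```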

Finally, users should be aware that in some cases proper trigger execution may depend on the timing and order of events with respect to creating nodes, setting properties, creating nodes from secondary properties (“derivative nodes”), etc. The detailed technical aspects of Synapse write operations are beyond the scope of this discussion, but as always it is good practice to test triggers (or other automation) before putting them into practice to ensure they perform as intended.

Syntax:

Triggers are created, modified, viewed, enabled, disabled, and deleted using the Storm trigger.* commands. See the trigger command in the Storm Reference - Storm Commands document for details.

Note

Triggers can be modified after they are created, but modification is limited to updating the Storm query executed by the trigger. If other aspects of a trigger need to be changed (such as the event that fires the trigger, or the form a trigger operates on), the trigger must be deleted and re-created.
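A brief sketch of managing an existing trigger follows. The iden is the one from the example trigger shown later in this section; in practice, use the value reported by trigger.list. The argument forms reflect our understanding of the trigger.* commands and should be confirmed against each command’s help, and the added #review tag is hypothetical.

```storm
/* Replace the Storm query executed by the trigger. */
trigger.mod 9b0f237f56ad41cbd6fbb16ca6f0f501 { +{ -> inet:email +#whois.private } [ +#whois.private +#review ] }

/* Temporarily stop the trigger from firing, then turn it back on. */
trigger.disable 9b0f237f56ad41cbd6fbb16ca6f0f501
trigger.enable 9b0f237f56ad41cbd6fbb16ca6f0f501
```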

Examples:

In the examples below, the command to create (add) the specified trigger is shown.

For illustrative purposes, in the first example the newly created trigger is displayed by using the trigger.list command and then by lifting the associated syn:trigger runtime (“runt”) node.

  • Add a trigger that fires when an inet:whois:email node is created. If the email address is associated with a privacy-protected registration service (e.g., the email address is tagged #whois.private), then also tag the inet:whois:email node.
trigger.add node:add --form inet:whois:email --query { +{ -> inet:email +#whois.private } [ +#whois.private ] }

Newly created trigger via trigger.list

cli> storm trigger.list
Executing query at 2021/08/02 21:52:04.600
user       iden                             en?    cond      object                    storm query
root       9b0f237f56ad41cbd6fbb16ca6f0f501 True   node:add  inet:whois:email             +{ -> inet:email +#whois.private } [ +#whois.private ]
complete. 0 nodes in 448 ms (0/sec).

The output of trigger.list contains the following columns:

  • The username used to create the trigger.
  • The trigger’s identifier (iden).
  • Whether the trigger is currently enabled or disabled.
  • The condition that causes the trigger to fire.
  • The object that the condition operates on, if any.
  • The tag or tag expression used by the condition (for tag:add or tag:del conditions only).
  • The query to be executed when the trigger fires.

Newly created trigger as a syn:trigger node

cli> storm syn:trigger
Executing query at 2021/08/02 21:52:05.066
syn:trigger=9b0f237f56ad41cbd6fbb16ca6f0f501
        :cond = node:add
        :doc =
        :enabled = True
        :form = inet:whois:email
        :name =
        :storm =  +{ -> inet:email +#whois.private } [ +#whois.private ]
        :user = d7bcb433b7a5b7398c48d149f996bde6
        :vers = 1
complete. 1 nodes in 4 ms (250/sec).

While runt nodes (Node, Runt) are typically read-only, syn:trigger nodes include :name and :doc secondary properties that can be set and modified via Storm. This allows management of triggers by providing them with meaningful names and descriptions of their purpose. Changes to these properties will persist even after a Cortex restart.
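As a sketch, the example trigger above could be given a name and description as shown below. The name and description text are hypothetical, and lifting the runt node directly by its iden is assumed to work as it does for other nodes.

```storm
/* Set descriptive properties on the trigger's runt node. */
syn:trigger=9b0f237f56ad41cbd6fbb16ca6f0f501 [ :name="whois private email" :doc="Tag inet:whois:email nodes whose email address is tagged #whois.private." ]
```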

  • Add a trigger that fires when the :exe property of an inet:dns:request node is set. Check to see whether the queried FQDN is malicious; if so, tag the associated file:bytes node for analyst review.
trigger.add prop:set --prop inet:dns:request:exe --query { +{ :query:name -> inet:fqdn +#malicious } :exe -> file:bytes [ +#review ] }
  • Add a trigger that fires when the tag #ttp.phish.attach (indicating that a file was an attachment to a phishing email) is applied to a file:bytes node. Use the trigger to also apply the tag #attck.t1566.001 (indicating MITRE ATT&CK technique “spear phishing attachment”).
trigger.add tag:add --form file:bytes --tag ttp.phish.attach --query { [ +#attck.t1566.001 ] }
  • Add a trigger that fires when the tag #osint (indicating that the node was listed as a malicious indicator in public reporting) is applied to an FQDN. The trigger should call extended Storm commands to submit the FQDN to third-party services (such as a domain whois service, a passive DNS service, or a malware execution / sandbox service) to enrich the FQDN. (Note: Synapse does not include these services in its public distribution; this trigger assumes such services have been implemented).
trigger.add tag:add --form inet:fqdn --tag osint --query { | whois | pdns | malware }
  • Add a trigger that fires when the tag #osint (indicating that the node was listed as a malicious indicator in public reporting) is applied to any node. The trigger should call (execute) a macro called enrich. The macro contains a Storm query that uses a switch case to call the appropriate extended Storm commands based on the tagged node’s form (e.g., perform different enrichment / call different third-party services based on whether the node is an FQDN, an IPv4, an email address, a URL, etc.).
trigger.add tag:add --tag osint --query { | macro.exec enrich }

Cron

Cron jobs in Synapse are similar to the well-known cron utility. Where triggers are event-driven, cron jobs are time / schedule based. Cron jobs can be written to execute once or on a recurring schedule. When creating a cron job, you must specify the schedule for the job and the Storm query to be executed. As noted above, a cron job executes with the permissions of the user who created it; the job can only perform actions that the user is allowed to perform.

Because cron jobs run on a schedule rather than in response to events, cron is most appropriate for automating things such as non-urgent tasks, tasks that only need to run intermittently, or resource-intensive tasks that should run during off-hours.

Note

When scheduling jobs, cron interprets all times as UTC.
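As a worked example of the UTC conversion, suppose a job should run at 2:00 AM US Eastern Standard Time, which is UTC-5. The equivalent UTC hour is 2 + 5 = 7, so the job is scheduled with an hour value of 7. The query reuses the hypothetical ipgeoloc command from the examples below.

```storm
/* 2:00 AM US Eastern Standard Time (UTC-5) is 07:00 UTC. */
cron.add --day Tue,Thu,Sat --hour 7 { inet:ipv4 -:loc | limit 1000 | ipgeoloc }
```

Note that a fixed UTC hour does not follow daylight saving time: while Eastern Daylight Time (UTC-4) is in effect, the same job runs at 3:00 AM local time.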

Syntax:

Cron jobs are created, modified, viewed, enabled, disabled, and deleted using the Storm cron.* commands. See the cron command in the Storm Reference - Storm Commands document for details.

Note

Cron jobs can be modified after they are created, but modification is limited to updating the Storm query executed by the job. If other aspects of a job need to be changed (such as its schedule for execution), the job must be deleted and re-created.
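The sketch below contrasts the two situations. The iden is the one from the example job shown later in this section; the #rfc1918 tag and the new schedule are hypothetical, and the argument forms should be confirmed against each command’s help.

```storm
/* Changing only the query: modify the existing job in place. */
cron.mod 52b60015365cd2e62e22c7753b53451a { [ inet:ipv4=172.16.0.0/16 +#rfc1918 ] }

/* Changing the schedule: delete the job, then re-create it. */
cron.del 52b60015365cd2e62e22c7753b53451a
cron.add --day 15 --hour 2 { [ inet:ipv4=172.16.0.0/16 +#rfc1918 ] }
```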

Examples:

In the examples below, the command to create (add) the specified cron job is shown.

For illustrative purposes, in the first example the newly created cron job is then displayed by using the cron.list command and by lifting the associated syn:cron runtime (“runt”) node.

  • Add a one-time / non-recurring cron job to run at 7:00 PM to create the RFC1918 IPv4 addresses in the 172.16.0.0/16 range.
cron.at --hour 19 { [ inet:ipv4=172.16.0.0/16 ] }

Newly created cron job via cron.list

cli> storm cron.list
Executing query at 2021/08/02 21:52:05.901
user       iden       view       en? rpt? now? err? # start last start       last end         query
root       52b60015.. da857983.. Y   N    N         0       Never            Never             [ inet:ipv4=172.16.0.0/16 ]
complete. 0 nodes in 351 ms (0/sec).

The output of cron.list contains the following columns:

  • The username used to create the job.
  • The first eight characters of the job’s identifier (iden).
  • The first eight characters of the identifier of the view that the job runs in.
  • Whether the job is currently enabled or disabled.
  • Whether the job is scheduled to repeat.
  • Whether the job is currently executing.
  • Whether the last job execution encountered an error.
  • The number of times the job has started.
  • The date and time of the job’s last start and last end.
  • The query executed by the cron job.

Newly created cron job as a syn:cron node

cli> storm syn:cron
Executing query at 2021/08/02 21:52:06.270
syn:cron=52b60015365cd2e62e22c7753b53451a
        :doc =
        :name =
        :storm =  [ inet:ipv4=172.16.0.0/16 ]
complete. 1 nodes in 4 ms (250/sec).

While runt nodes (Node, Runt) are typically read-only, syn:cron nodes include :name and :doc secondary properties that can be set and modified via Storm. This allows management of cron jobs by providing them with meaningful names and descriptions of their purpose. Changes to these properties will persist even after a Cortex restart.

  • Add a cron job to run every Tuesday, Thursday, and Saturday at 2:00 AM that lifts 1,000 IPv4 address nodes with missing geolocation data (i.e., no :loc property) and submits them to an extended Storm command that calls an IP geolocation service. (Note: Synapse does not include a geolocation service in its public distribution; this cron job assumes such a service has been implemented).
cron.add --day Tue,Thu,Sat --hour 2 { inet:ipv4 -:loc | limit 1000 | ipgeoloc }
  • Add a cron job to run on the 15th of every month to lift all MD5, SHA1, and SHA256 hashes tagged “malicious” that do not have corresponding file (file:bytes) nodes and submit them to an extended Storm command that queries a third-party malware service for those files. (Note: Synapse does not include a malware service in its public distribution; this cron job assumes such a service has been implemented).
cron.add --day 15 { hash:md5#malicious hash:sha1#malicious hash:sha256#malicious -{ -> file:bytes } | malwaresvc }

Macros

Similar to Triggers and Cron, a macro is a stored Storm query / set of Storm code. However, macros differ from triggers and cron in some important ways:

  • Cortex-specific. Where triggers and cron jobs are specific to a View, macros are specific to a given Cortex.
  • Execution. Where triggers and cron execute based on specific criteria (events for triggers and scheduled times for cron), macros do not execute on their own. Instead, a macro runs only when it is explicitly called with the Storm macro.exec command.
  • Permissions. Where triggers and cron execute with the permissions of the user who created them, macros execute with the permissions of the user who calls the macro. This means that if a macro is called by a trigger or cron job, it executes as the creator of that trigger or cron job. A macro can only perform actions that the calling user can perform.

The Storm query contained in a macro can be as simple or complex as you like. Users can write “personal” macros to store complex or frequently-used queries that can be executed easily by calling the macro name instead of manually typing a lengthy query. Organizations can use them to store longer or more complex Storm queries that may be more difficult to view and manage as triggers or cron jobs within the Storm cmdr CLI.

Because macros are executed via a Storm command, they can be executed any way Storm can be executed. In short, they can be called manually on demand from the cmdr CLI, or they can be called as part of the Storm query executed by a trigger, cron job, or other automation. This provides a great deal of flexibility, particularly for organization-wide automation. By using a macro, you can write a single Storm query to carry out a task that can then be called in a variety of ways.

Note

Macros can contain any valid Storm. You can write a macro to be fully “self-contained” so that it both lifts a specified set of nodes and operates on those nodes. Alternately, you can write a macro with the expectation that it will be called as part of a larger Storm query and will operate on a particular set of “inbound” nodes that will be piped to the macro.exec command.
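The two styles can be sketched using the sinkhole.check and check.c2 macros defined in the examples below. The lift and the final tagging step in the second query are hypothetical.

```storm
/* Self-contained: the macro lifts its own nodes, so it is called with no input. */
macro.exec sinkhole.check

/* Inline: pipe file:bytes nodes to the macro, then keep working with the nodes it returns. */
file:bytes#ttp.phish.attach | macro.exec check.c2 | +inet:fqdn [ +#review ]
```

In the second query, the Storm after macro.exec depends on what the macro emits. If check.c2 were changed to return a different kind of node, the trailing filter would silently match nothing.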

Syntax:

Macros are created, modified, viewed, and deleted using the Storm macro.* commands. See the macro command in the Storm Reference - Storm Commands document for details.

Note

Macros consist entirely of Storm, so can be edited / modified freely with appropriate permissions.
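A typical edit cycle might look like the following sketch, which uses the sinkhole.check macro from the examples below. The added limit is a hypothetical change.

```storm
/* List the macros stored in the Cortex. */
macro.list

/* Display the current Storm for one macro. */
macro.get sinkhole.check

/* Overwrite the macro by setting it again under the same name. */
macro.set sinkhole.check ${ inet:ipv4#infra.sinkhole | limit 100 | pdns }

/* Remove the macro when it is no longer needed. */
macro.del sinkhole.check
```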

Examples:

  • Add a macro named sinkhole.check that lifts all IPv4 addresses tagged as sinkholes (#infra.sinkhole) and submits those nodes to an extended Storm command that calls a third-party passive DNS service. (Note: Synapse does not include a passive DNS service in its public distribution; the macro assumes such a service has been implemented).
macro.set sinkhole.check ${ inet:ipv4#infra.sinkhole | pdns }
  • Add a macro named check.c2 that takes an inbound set of file:bytes nodes and returns any FQDNs that the files query and any IPv4 addresses the files connect to. Use a filter in the macro to ensure that the macro code only attempts to process inbound file:bytes nodes.
macro.set check.c2 ${ +file:bytes | tee { -> inet:dns:request :query:name -> inet:fqdn | uniq } { -> inet:flow:src:exe :dst:ipv4 -> inet:ipv4 | uniq } }

  • Add a macro named enrich that takes any node as input and uses a switch statement to call extended Storm commands for third-party services able to enrich a given form (line breaks and indentations used for readability). (Note: Synapse does not include third-party services / connectors in its public distribution; the macro assumes such services have been implemented.)

macro.set enrich ${ switch $node.form() {

    /* You can put comments in macros!!! */

    "inet:fqdn": { | whois | pdns | malware }
    "inet:ipv4": { | pdns }
    "inet:email": { | revwhois }
    *: { }
} }

Note

The macro above uses variables, methods, and a switch case for control flow.

Dmons

A Dmon is a long-running or recurring query or process that runs continuously in the background, similar to a traditional Linux or Unix daemon.

Syntax:

Users can interact with dmons using the Storm dmon.* commands (see the dmon command in the Storm Reference - Storm Commands document for details) and the $lib.dmon Storm library.
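As a rough sketch only, the following shows one way a dmon might be used: a background query that waits on a queue and tags each queued FQDN for review. The queue name, the dmon name, and the tag are hypothetical, and the $lib.queue and $lib.dmon calls reflect our understanding of those libraries; check the Storm library documentation for the exact signatures in your version. Each statement is intended to be run as its own Storm query.

```storm
/* Hypothetical queue name; create the queue once before starting the dmon. */
$lib.queue.add("review.todo")

/* Start a dmon that waits for queued FQDN strings and tags each one for review. */
$lib.dmon.add(${
    $queue = $lib.queue.get("review.todo")
    for ($offs, $fqdn) in $queue.gets(wait=1) {
        [ inet:fqdn=$fqdn +#review ]
        $queue.cull($offs)
    }
}, name="review.consumer")

/* List the dmons in the Cortex to confirm that the new dmon is registered. */
dmon.list
```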