Storm Reference - Advanced - Control Flow

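Storm includes a number of common programming control flow structures to facilitate more advanced Storm queries. These include:

  • Init Block

  • Fini Block

  • If-Else Statement

  • Switch Statement

  • For Loop

  • While Loop

  • Try…Catch Statement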

The examples below are for illustrative purposes. This guide is not meant as a Storm programming tutorial. The intent is to introduce Storm users who may not be familiar with programming concepts (or programmers who are learning to program in Storm) to possible use cases and simple examples for these structures. We’ve included some Advanced Storm - Tips and an Advanced Storm - Example to provide some pointers and an illustration of how Storm’s “pipeline” behavior and control flow structures may interact.

See the following User Guide and Reference sections for additional information:

Storm Developers may also wish to refer to the Synapse Developer Guide.

Advanced Storm - Tips

Storm Operating Concepts - Review

Tip

It is essential to keep the Storm Operating Concepts in mind when using more advanced Storm queries and constructs. For standard Storm queries, these concepts are intuitive - you don’t really need to think about them, and Storm just works. However, these concepts are critical to writing more advanced Storm - remembering these fundamentals can save you time and headaches trying to debug a Storm query that is not behaving the way you think it should.

Users are strongly encouraged to review the Storm Operating Concepts as well as the additional data below.

WORKING SET

  • We use the term working set to refer to the set of nodes you are operating on in Storm.

    • The nodes you “start with” (often by performing an initial lift operation) are your initial working set.

    • The nodes at any given point in a Storm query are your current working set.

OPERATION CHAINING

  • Storm operations (lifts, filters, pivots, subqueries, commands, control flow elements…) can be chained together to form longer queries. However, each operation is executed individually and in sequence.

NODE CONSUMPTION

  • Each Storm operation typically changes your current working set. (We sometimes say that Storm operations consume nodes. This refers to the fact that in many cases the nodes that “come out” of a Storm operation are not the same nodes that “went in”.)

    If you perform a filter operation, the resulting set of nodes is typically a subset of the ones you started with. If you use a pivot operation, you typically leave your “original” nodes behind and “move to” the new nodes. In most cases, your current working set is constantly changing.

STORM AS A PIPELINE

  • A Storm query made up of multiple chained operations acts as a pipeline through which each node passes individually.

    • This means that a Storm query is executed once for each node that passes through the pipeline - not “just once” for the entire set of nodes.

    • The fact that each inbound node is processed independently by each Storm operation impacts your current working set.

      A simple example of this impact is the “duplication” of results (nodes) following some pivot operations - this is expected behavior. If you pivot from a set of FQDNs (inet:fqdn) to their DNS A records (inet:dns:a) and then to the associated IPv4 addresses (inet:ipv4), your results may include multiple instances of the same IPv4 node if more than one FQDN resolved to the same IPv4 address. In short, you get one “copy” of the IPv4 for each FQDN that resolved to (had a DNS A record for) that address. (The Storm uniq command is used to de-duplicate the nodes in the current working set.)
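As a minimal sketch of the duplication described above (the example.com zone is hypothetical, and the results depend on the data in your Cortex), the first query below may return the same inet:ipv4 node more than once, while the second pipes the results to uniq:

```storm
// Query 1: may yield one copy of an inet:ipv4 node
// for each FQDN that resolved to that address.
inet:fqdn:zone=example.com -> inet:dns:a -> inet:ipv4

// Query 2 (run separately): de-duplicates the working set
// so that each inet:ipv4 node appears only once.
inet:fqdn:zone=example.com -> inet:dns:a -> inet:ipv4 | uniq
```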

These operating concepts (the way Storm behaves) affect advanced constructs such as control flow, and are important to understand when writing and debugging more advanced Storm.

Tip

See the Advanced Storm - Example below for an illustration of how these concepts may impact your Storm in unexpected ways.

Storm Debugging Tips

A few helpful tips when writing and debugging advanced Storm:

Be aware of your pipeline.

That is, understand what is in your current working set at any point in your query. A significant part of Storm troubleshooting comes down to figuring out that the current working set is not what you think it is.

Be aware of your variables.

Storm supports both runtime-safe (“runtsafe”) and non-runtime-safe (“non-runtsafe”) variables. Non-runtsafe variables have values that may change based on the current node in the Storm pipeline. Another significant part of Storm troubleshooting involves understanding the values of any variables at any given point in your Storm code. (See Variable Concepts for additional information.)

Operations may execute multiple times.

Because each node passes through each operation in a Storm query individually, operations execute more than once (typically once for each node in the pipeline as it passes through that operation). This includes control flow operations, such as for loops! If you don’t account for this behavior with control flow operations in particular, it can result in behavior such as:

  • An exponentially increasing working set (if each node passing through an operation generates multiple results, and the results are not deduplicated / uniq’ed appropriately).

  • A variable that is set by an operation being consistently changed (re-set) for each node passing through the operation (commonly resulting in “last node wins” with respect to variable assignment).

  • A variable that fails to be set for a node that does not pass through the operation where the variable is assigned (resulting in a NoSuchVar error).

Use subqueries…but understand how they work.

Unlike most Storm operations and commands, subqueries do not consume nodes - by default, what goes into a subquery comes out of a subquery, regardless of what happens inside the subquery itself. This means you can use subqueries (see Storm Reference - Subqueries) with advanced Storm to isolate certain operations and keep the “primary” nodes passing through the Storm pipeline consistent.

That said, a node still has to pass into a subquery for the Storm inside a subquery to run. If your subquery fails to execute, it may be because nothing is going into it.
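A minimal sketch of this isolation, assuming the FQDN below exists in your Cortex and has DNS A records: the pivots happen inside the subquery, so the print statement after the subquery still sees the original inet:fqdn node. If the initial lift returns nothing, neither print statement runs.

```storm
inet:fqdn=vertex.link

// The pivots run inside the subquery; the inet:ipv4 nodes do not leave it.
{ -> inet:dns:a -> inet:ipv4 $lib.print($node.repr()) }

// Back in the main pipeline, the current node is still the inet:fqdn.
$lib.print($node.ndef())
```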

Start small and add to your Storm incrementally.

It’s easier to verify that smaller Storm queries execute correctly and then build on that code than to write a more advanced query all at once and then figure out where things aren’t working.

As with all debugging, print statements are your friend.

Scatter $lib.print() or $lib.pprint() statements (see $lib.print(mesg, **kwargs) and $lib.pprint(item, prefix=, clamp=None)) generously throughout your Storm during testing. You can print message strings at various points during execution:

$lib.print("Hey! This worked!")

You can print the value of a variable, to check its value at a given point in your query:

inet:ipv4=1.2.3.4
$asn=:asn
$lib.print($asn)

You can also print values associated with the node(s) in the current working set, using the various methods associated with the $node Storm type. (See Storm Reference - Advanced - Methods for a user-focused introduction to methods, or storm:node in the detailed Storm Libraries / Storm Types documentation for a more technical discussion.)

$lib.print($node.ndef())

Control Flow Operations

Tip

The examples below are Storm excerpts used to illustrate specific concepts, but do not represent complete Storm queries / Storm code.

Init Block

An init block allows you to execute the specified Storm once at the beginning of your Storm query. This allows you to use Storm to perform a set of operations a single time only. Without the init block, the code would be executed once for each node passing through the Storm query.

See also Fini Block.

Syntax:

init { <storm> }

Example:

You want to use an init block to initialize a set of variables that will be used later in the Storm query. Initializing the variables to default values can:

  • Explicitly set a variable value up front.

  • Specify default values for variables in the event they are not set during subsequent execution (e.g., due to a missing node, property, or tag that the variable depends on).

  • Initialize variables that will be modified during execution (e.g., lists, sets, tallies, or other ‘count’ values you expect to change or increment).

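init {

    // Values set once, before any nodes enter the pipeline.
    $url="https://www.example.com/my_data/"
    $threatname=''
    // Collections and counters to be updated during execution.
    $fqdns=$lib.set()
    $fqdn_count=0
}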

Fini Block

A fini block allows you to execute the specified Storm once after all nodes have passed through the preceding code in the Storm pipeline. This allows you to use Storm to perform a set of operations a single time “outside” of the normal Storm pipeline; without the fini block, the code would be executed once for each node passing through the query.

See also Init Block.

Syntax:

fini { <storm> }

Example:

You have a Storm query that processes a series of inet:fqdn nodes, adding nodes that meet certain criteria to a set (specified with the variable $fqdns). After processing the nodes, you want to print a message with the total number of nodes in your set (which you stored in the variable $fqdn_count) and return the set of nodes.

fini {

    $lib.print(`Total count is {$fqdn_count}`)
    return($fqdns)
}
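The init and fini blocks are often used together. The sketch below is one possible arrangement, not a complete query: the example.com zone is hypothetical, and the variable names simply mirror the excerpts above. The init block runs once, the two statements in the middle run once per inbound node, and the fini block runs once after the last node.

```storm
init {
    // Runs a single time, before any node enters the pipeline.
    $fqdns = $lib.set()
    $fqdn_count = 0
}

inet:fqdn:zone=example.com

// Runs once for each inet:fqdn node passing through the pipeline.
$fqdns.add($node.repr())
$fqdn_count = ($fqdn_count + 1)

fini {
    // Runs a single time, after all nodes have been processed.
    $lib.print(`Total count is {$fqdn_count}`)
}
```

Because $fqdn_count is initialized in the init block, the fini block can still print a count of 0 if the lift returns no nodes.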

If-Else Statement

An if-else statement matches inbound objects against a specified condition. If that condition is met, a set of Storm operations is performed. If the condition is not met, a different set of Storm operations is performed. Storm supports the use of if by itself; if-else; or if-elif-else.

Note that the “Storm operations” performed can include no operations / “do nothing” if no Storm is provided (e.g., if the associated curly braces are left empty).

If

Syntax:

if <condition> { <storm> }

If <condition> is met, execute the Storm query in the curly braces. If <condition> is not met, do nothing. (Note that this is equivalent to an if statement followed by an empty else statement.)

Note

If <condition> is an expression to be evaluated, it must be enclosed in parentheses ( ). If the expression includes strings, they must be enclosed in single or double quotes.

if ( $str = 'Oh hai!' ) { <storm> }

Or:

if ( :time > $date ) { <storm> }

(Where :time represents a property on an inbound node.)

If-Else

Syntax:

if <condition> { <storm> }
else { <storm> }

If <condition> is met, execute the associated Storm; otherwise, execute the alternate Storm.

Similar to the if example above with no else option (or an empty query for else), you can have an empty if query:

if <condition> { }
else { <storm> }

If <condition> is met, do nothing; otherwise, execute the alternate Storm query.

If-Elif-Else

Syntax:

if <condition> { <storm> }
elif <condition> { <storm> }
else { <storm> }

If <condition> is met, execute the associated Storm; otherwise, if (else if) the second <condition> is met, execute the associated Storm; otherwise (else) execute the final Storm query.

You can use multiple elif statements before the final else. If-elif-else is helpful because it allows you to handle multiple conditions differently while avoiding “nested” if-else statements.
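As a minimal illustration of the if-elif-else form, the sketch below uses a hypothetical runtsafe variable ($score) and arbitrary thresholds. Only the first branch whose condition is met executes.

```storm
// $score and the thresholds below are illustrative values only.
$score = 75

if ($score >= 80) {
    $lib.print("high")
}
elif ($score >= 50) {
    $lib.print("medium")
}
else {
    $lib.print("low")
}
```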

Example:

You have a subscription to a third-party malware service that allows you to download malware binaries via the service’s API. However, the service has a query limit, so you don’t want to make any unnecessary API requests that might exhaust your limit.

You can use a simple if-else statement to check whether you already have a copy of the binary in your storage Axon before attempting to download it.

<inbound file:bytes node(s)>

if $lib.bytes.has(:sha256) { }

else { | malware.download }

The Storm query above:

  • takes an inbound file:bytes node;

  • checks for the file in the Axon ($lib.bytes.has(sha256)) using the :sha256 value of the inbound file;

  • if $lib.bytes.has(:sha256) returns true (i.e., we have the file), do nothing ({  });

  • otherwise call the malware.download service to attempt to download the file.

Note: In the above example, malware.download is used as an example Storm service name only; it does not exist in the base Synapse code.

Switch Statement

A switch statement matches inbound objects against a set of specified constants. Depending on which constant is matched, a set of Storm operations is performed. The switch statement can include an optional default case to perform a set of Storm operations in the case where none of the explicitly defined constants are matched.

Syntax:

<inbound nodes>

switch <constant> {

  <case1>: { <storm> }
  <case2>: { <storm> }
  <case3>: { <storm> }
  *: { <storm for optional default case> }
}

Example:

You want to write a macro (see Macros) to automatically enrich a set of indicators (i.e., query third-party data sources for additional data). Instead of writing separate macros for each type of indicator, you want a single macro that can take any type of indicator and send it to the appropriate Storm commands.

A switch statement can send your indicators to the correct services based on the kind of inbound node (e.g., the node’s form).

<inbound nodes>

switch $node.form() {

    "hash:md5": { | malware.service }

    "hash:sha1": { | malware.service }

    "hash:sha256": { | malware.service }

    "inet:fqdn": { | pdns.service | whois.service }

    "inet:ipv4": { | pdns.service }

    "inet:email": { | whois.service }

    *: { $lib.print("{form} is not supported.", form=$node.form()) }
}

The Storm query above:

  • takes a set of inbound nodes;

  • checks the switch conditions based on the form of the node (see $node.form());

  • matches the form name against the list of forms;

  • handles each form differently (e.g., hashes are submitted to a malware service, domains are submitted to passive DNS and whois services, etc.);

  • if the inbound form does not match any of the specified cases, print ($lib.print(mesg, **kwargs)) the specified statement (e.g., "file:bytes is not supported.").

The default case above is not strictly necessary - any inbound nodes that fail to match a condition will simply pass through the switch statement with no action taken. It is used above to illustrate the optional use of a default case for any non-matching nodes.

Note: the Storm command names used above are examples only. Commands with those names do not exist in the base Synapse code.

For Loop

A for loop will iterate over a set of objects, performing the specified Storm operations on each object in the set.

Syntax:

for $<var> in $<vars> {

    <storm>
}

Note: The user documentation for the Synapse csvtool includes additional examples of using a for loop to iterate over the rows of a CSV-formatted file (i.e., for $row in $rows { <storm> }).

Example:

You routinely apply tags to files (file:bytes nodes) to annotate things such as whether the file is associated with a particular malware family (cno.mal.redtree) or threat group (cno.threat.viciouswombat). When you apply any of these tags to a file, you want to automatically apply those same tags to the file’s associated hashes (e.g., hash:md5, etc.)

You can use a for loop to iterate over the relevant tags on the file and apply (“push”) the same set of tags to the file’s hashes. (Note: this code could be executed by a trigger (see Triggers) that fires when the relevant tag(s) are applied.)

<inbound file:bytes node(s)>

for $tag in $node.tags(cno.**) {

    { :md5 -> hash:md5 [ +#$tag ] }
    { :sha1 -> hash:sha1 [ +#$tag ] }
    { :sha256 -> hash:sha256 [ +#$tag ] }
    { :sha512 -> hash:sha512 [ +#$tag ] }
}

For each inbound node, the for loop:

  • Looks for tags on the node that match the specified pattern (cno.**)

  • For each tag that matches the pattern, executes the Storm code to:

    • Pivot from each of the file’s hash properties to the associated hash node.

    • Apply the tag to the node.

Because each “pivot and tag” operation is isolated in a Subquery, the original file:bytes node remains in our Storm pipeline throughout the set of operations.

Note

A for loop will iterate over “all the things” as defined by the for loop syntax. In the example above, a single inbound node may have multiple tags that match the pattern defined by the for loop. This means that the for loop operations will execute once per matching tag per node and yield the inbound node (the file:bytes node) to the pipeline for each iteration of the for loop.

In other words, for each inbound node:

  • the first matching tag causes the for loop to execute;

  • the loop operations are performed for that tag (i.e., the tag is applied to the associated hashes);

  • the file:bytes node is yielded from the for loop;

  • if there are additional matching tags to process from the inbound node, repeat the for loop for each tag.

Recall that a “single” multi-element tag (such as cno.mal.redtree) actually represents three tags (cno, cno.mal, and cno.mal.redtree). If an inbound file:bytes node has the tag #cno.mal.redtree, the for loop will execute twice (for the matching tags cno.mal and cno.mal.redtree) and yield two copies of the file:bytes node (one for each match / each iteration of the for loop).

This is by design, and is the way Storm variables (specifically, non-runtime safe variables (Non-Runtsafe)) and the Storm execution pipeline (see Storm Operating Concepts) are intended to work.

See the Advanced Storm - Example below for an illustration of how for loops in particular are impacted by Storm’s pipeline behavior.

While Loop

A while loop checks inbound nodes against a specified condition and performs the specified Storm operations for as long as the condition is met.

Syntax:

while <condition> {

    <storm>
}

While loops are more frequently used for developer tasks, such as consuming from Queues, and are less common for day-to-day user use cases.
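A minimal sketch of the syntax, using a hypothetical counter variable ($i) rather than a queue: the loop body repeats for as long as the condition in parentheses is met, so the counter must be updated inside the loop for the loop to end.

```storm
// $i is an illustrative counter; the loop ends once it reaches 3.
$i = 0

while ($i < 3) {
    $lib.print(`Iteration {$i}`)
    $i = ($i + 1)
}
```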

Try…Catch Statement

A try…catch statement allows you to attempt (try) a Storm operation and handle (catch) any errors if they occur. Because Storm’s default behavior is to halt execution when an error occurs, try…catch statements allow for more graceful error handling within Storm. “Catching” an error allows the remainder of your Storm to continue executing.

Tip

Storm supports some basic error handling (allowing you to “warn and continue” vs “error and halt”) specifically when creating nodes and setting properties or tags through the use of the Edit “Try” Operator (?=).

Syntax:

try {

    <storm>

} catch <name> as err {

    <storm>
}

If the Storm in the try block runs without error, the catch block (or blocks) are ignored. If an error occurs, execution of the try block halts (any remaining Storm in the try block is ignored) and flow passes to the appropriate catch block to handle the error. Multiple catch blocks can be used to handle different kinds of errors.

Because the catch block handles the error, any additional Storm (i.e., after the catch block) will continue to execute.

In the catch block above, <name> can be the name of a single error type, a set of error types, or the asterisk ( * ) to represent any error. When using multiple catch blocks, the asterisk can be used in the final block as a default case to catch any error not explicitly handled by a previous catch block.

The catch block can return a status (e.g., return((1))) or output a warning message (e.g., using $lib.warn() - see $lib.warn(mesg, **kwargs)).

Example:

You have an “enrich” macro used to send various kinds of nodes to Storm commands that connect to third-party data sources. There is a particular data source that occasionally returns malformed data, which throws an error and causes the entire macro to halt. You want to isolate the Storm command for that vendor within a try…catch block so the macro will continue to run if an error is encountered.

try {

    | enrich.badvendor

} catch * as err {

    $lib.warn("BadVendor blew up again!")
}
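The use of multiple catch blocks described above can be sketched as follows. This is an illustration under stated assumptions, not a definitive pattern: $valu is a hypothetical variable holding a deliberately invalid value, BadTypeValu is the error type we expect Synapse to raise for it, and the name and mesg fields on $err are assumed to be available as shown.

```storm
// $valu is a hypothetical, deliberately invalid IPv4 value.
$valu = "not-an-ip"

try {
    [ inet:ipv4=$valu ]
} catch BadTypeValu as err {
    // Handle the specific error type we anticipate.
    $lib.warn(`Skipping bad value {$valu}: {$err.mesg}`)
} catch * as err {
    // Default case for any error not handled above.
    $lib.warn(`Unexpected error {$err.name}: {$err.mesg}`)
}
```

Placing the asterisk case last keeps the more specific handler from being shadowed by the catch-all.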

Advanced Storm - Example

The example below is meant to provide a more concrete illustration of some of Storm’s pipeline behavior when combined with certain control flow operations - specifically, with for loops. Control flow operations such as if-else or switch statements allow you to perform more advanced Storm operations, but still typically represent a single “path” through the pipeline for any given node - even though the specific path for a given node may vary depending on the if-else or switch conditions.

With for loops, however, we may execute the same Storm multiple times, which may have unexpected results if you don’t keep Storm’s pipeline concept in mind.

For Loop - No Subquery

Consider the following query:

inet:fqdn=vertex.link
$list = ('foo','bar','baz')

for $item in $list {

    $lib.print($item)
}

$lib.print("And we're done!")

The query:

  • lifts a single FQDN node;

  • defines a list containing three elements, foo, bar, and baz;

  • uses a for loop to iterate over the list, printing each element;

  • prints And we're done!

When executed, the query generates the following output:

storm> inet:fqdn=vertex.link
  $list = ('foo', 'bar', 'baz')

  for $item in $list {

    $lib.print($item)
  }

  $lib.print("And we're done!")


foo
And we're done!
inet:fqdn=vertex.link
        :domain = link
        :host = vertex
        :issuffix = False
        :iszone = True
        :zone = vertex.link
        .created = 2022/12/05 20:24:59.572
bar
And we're done!
inet:fqdn=vertex.link
        :domain = link
        :host = vertex
        :issuffix = False
        :iszone = True
        :zone = vertex.link
        .created = 2022/12/05 20:24:59.572
baz
And we're done!
inet:fqdn=vertex.link
        :domain = link
        :host = vertex
        :issuffix = False
        :iszone = True
        :zone = vertex.link
        .created = 2022/12/05 20:24:59.572

What’s going on here? Why does And we're done! print three times? Why do we apparently have three copies of our FQDN node? The reason has to do with Storm’s pipeline behavior, and how our FQDN node travels through the pipeline when the pipeline loops.

Our query starts with a single inet:fqdn node in our initial working set. Setting the $list variable does not change our working set of nodes.

When we reach the for loop, the loop needs to execute multiple times (three times in this case, once for each item in $list). Anything currently in our pipeline (any nodes that are inbound to the for loop, as well as any variables that are currently set) is passed into each iteration of the for loop.

In this case, because the for loop is part of our main Storm pipeline (it is not isolated in any way, such as by being placed inside a subquery), each iteration of the loop outputs our original FQDN node…which then continues its passage through the remainder of the Storm pipeline, causing the $lib.print("And we're done!") statement to print (remember, each node travels through the pipeline one by one). Storm then executes the second iteration of the for loop, and the FQDN that exits from this second iteration continues through the pipeline, and so on.

It may help to think of this process as the for loop effectively “splitting” the main Storm pipeline into multiple pipelines that then each continue to execute in full, one after the other.

Note

Each pipeline still executes sequentially - not in parallel. So the first iteration of the for loop (where $item=foo) will execute and the remainder of the Storm pipeline will run to completion; followed by the second iteration of the for loop and the remainder of the Storm pipeline, and so on. (This is why one instance of And we're done! prints before the messages associated with the second iteration of the loop where $item=bar, etc.)

For Loop - With Subquery

In this variation on our original query, we isolate the for loop within a subquery (Storm Reference - Subqueries):

inet:fqdn=vertex.link
$list = ('foo','bar','baz')

{
    for $item in $list {

        $lib.print($item)
    }
}

$lib.print("And we're done!")

The query performs the same actions as described above, but thanks to the subquery, the behavior of this query is different, as we can see from the query’s output:

storm> inet:fqdn=vertex.link
  $list = ('foo', 'bar', 'baz')

  {
      for $item in $list {
          $lib.print($item)
      }
  }

  $lib.print("And we're done!")


foo
bar
baz
And we're done!
inet:fqdn=vertex.link
        :domain = link
        :host = vertex
        :issuffix = False
        :iszone = True
        :zone = vertex.link
        .created = 2022/12/05 20:24:59.572

In this case, the query behaves more “as expected” - the strings within the for loop print once for each item / iteration of the loop, And we're done! prints once, and a single FQDN node exits our pipeline when our query completes. So what’s different?

One of the key features of a subquery is that by default (i.e., unless the yield option is used), the nodes that go into a subquery also come out of a subquery, regardless of what occurs inside the subquery itself. In other words, subqueries do not “consume” nodes.

We still have our single FQDN inbound to the subquery. Inside the subquery, our for loop still executes, effectively “splitting” the Storm pipeline into three pipelines that execute in sequence. But once we complete the for loop and exit the subquery, those pipelines are “discarded”. The single FQDN that went into the subquery exits the subquery. We are back to our single node in the main pipeline. That single node causes our print statement to print And we're done! only once, and we are left with our single node at the end of the query.