Storm Reference - Subqueries

This section discusses the following topics:

  • Subquery

  • Subquery Filter

  • Using Subqueries to Reference Nodes

Subquery

A subquery is a Storm query that is executed inside of another Storm query. Curly braces ( { } ) are used to enclose the embedded query.

Note

Curly braces are a Storm syntax element that simply indicates “a Storm query is enclosed here”. They can be used to denote a subquery, but have other uses as well.

Recall from Storm Operating Concepts that a Storm query can consist of multiple elements (lift, filter, pivot, pipe to command, etc.). This sequence of Storm operations acts as a “pipeline” through which the nodes in the query pass. Regardless of the number of nodes you start with (i.e., the number of nodes in your initial lift), each node is processed individually by each element in the query, from left to right.

The elements in the query can be thought of as “gates”. The nodes “inbound” to each gate are processed by that gate in some way. For example, if the “gate” is a filter operation, some nodes may be allowed to pass, while others are dropped (“consumed”), based on the filter. If the gate is a pivot, the inbound node is dropped while the node that is the “target” of the pivot is picked up and added to the pipeline.

Note that in a standard Storm query (as described above) the set of nodes at any given point in the query is constantly changing - the “working set” of nodes is transformed by the various operations. The nodes that “go in” to a particular operation in the query are generally not the same ones that “come out”. Note also that as described, this process is linear (hence “pipeline”).
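The "gate" model can be made concrete with an annotated query. This is only a sketch: the FQDN and the loopback address are placeholder values, and the filter was chosen purely for illustration. The // annotations use Storm's comment syntax.

```
inet:fqdn=vertex.link    // lift: a single inet:fqdn node enters the pipeline
-> inet:dns:a            // pivot: the FQDN is dropped; its DNS A records are picked up
-:ipv4=127.0.0.1         // filter: A records that point to 127.0.0.1 are dropped
-> inet:ipv4             // pivot: the A records are dropped; their IPv4 nodes are picked up
```

The nodes that come out of the last gate (inet:ipv4 nodes) are not the node that went in to the first (an inet:fqdn node), which is the behavior a subquery lets you avoid.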

A subquery is another element that can be used as part of a longer Storm query, only in this case the “element” is itself an entire Storm query (as opposed to a filter, pivot, or Storm command).

One advantage of a subquery is that the actions that occur inside the subquery do not affect the “main” Storm execution pipeline - the nodes that “go in” to a subquery are the same nodes that “come out”, regardless of what operations occur within the subquery itself. (In terms of the Storm Operating Concepts, subqueries do not consume nodes by default.) In this way, a subquery can allow you to “branch off” the main Storm execution pipeline, “do a thing” off to the side, and then return to the main execution pipeline as though nothing happened; you resume at the point you left off, with the same set of nodes in the pipeline as when you left.

If you want the nodes that result from the subquery operations to be returned, use the yield option. Note that the yielded subquery nodes are returned in addition to the nodes that passed into the subquery, not instead of them. If you only want the nodes resulting from the subquery, you probably don’t need a subquery and can just use a standard Storm query instead.

Note

Any actions performed inside of a subquery will persist. For example, any modifications made to nodes inside a subquery (setting or modifying properties, applying tags, even creating new nodes) will remain; those changes will be present in the Cortex.

In addition, when setting or updating a Variable inside a subquery, the variable can pass back “out” of the subquery and be available to the main Storm query.

What remains unchanged is that the set of nodes inbound to the subquery will be the same set of nodes available (inbound) to the next element in the main Storm query - whatever happens inside the subquery does not affect the set of nodes in the pipeline (barring the use of yield of course).
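As a minimal sketch of a variable passing back out of a subquery, the query below counts the DNS A records for an FQDN inside a subquery and prints the count from the main pipeline. The FQDN and the variable name $count are placeholders chosen for this illustration.

```
$count = (0)
inet:fqdn=vertex.link
{ -> inet:dns:a $count = ($count + 1) }   // off to the side: increment once per A record
$lib.print($count)                        // main pipeline: the inbound node is still the FQDN
```

The pivot inside the subquery does not change the working set, so the query still returns the original inet:fqdn node, while the updated value of $count remains available after the subquery.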

This ability to “do a thing off to the side” inside of a Storm query pipeline can add efficiencies to certain queries, allowing you to perform some action inline that would otherwise require a second, separate query to perform. While subqueries have their uses in “standard” Storm, they are particularly useful for more advanced Storm use cases involving variables and control flow.

Note

A subquery is typically used to perform some action related to the Storm query in which it is embedded. But there is no requirement for this to be the case. The subquery can contain any valid Storm, so you could (for example) write a subquery that lifts ten arbitrary email addresses ( { inet:email | limit 10 } ) in the middle of a longer query. There’s not much point to this, but Storm will dutifully lift the nodes, discard them (unless the yield option is used), and continue on.

Finally, one important characteristic of a subquery is that it requires inbound nodes in order to execute. That is, the subquery is meant to be an element in a larger Storm pipeline, not a stand-alone query, and not the first element in a longer query. Even though the subquery does not affect the inbound nodes (that is, the nodes “pass through” the subquery and are still available as inbound nodes to the next query element), nodes must still be “fired into” the subquery for the subquery action(s) to take place.

For example, the following query will return zero nodes, even though the yield option is present. Because no nodes are “inbound” to cause the subquery to execute, the embedded Storm is never run:

storm> yield { inet:email | limit 10 }
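By contrast, a sketch of the same subquery placed after a lift is shown below. The FQDN is a placeholder value; any operation that puts at least one node into the pipeline would serve the same purpose.

```
// The lifted FQDN is inbound to the subquery, so the embedded Storm is run.
inet:fqdn=vertex.link yield { inet:email | limit 10 }
```

Assuming the FQDN exists, this form returns the inbound inet:fqdn node along with whatever nodes the subquery yields.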

Syntax:

<query> [ yield ] { <query> } [ <query> ]

<query> [ yield ] { <query> [ { <query> } ] } [ <query> ]

Examples:

  • Pivot from a set of DNS A records to their associated IPs and then to additional DNS A records associated with those IPs. Use a subquery to check whether any of the IPs are RFC1918 addresses (i.e., have :type=private) and if so, tag the IP as non-routable.

<inet:dns:a> -> inet:ipv4 { +:type=private [ +#nonroutable ] } -> inet:dns:a
  • Pivot from a set of IP addresses to any servers associated with those IPs. Use a subquery to check whether the IP has a location (:loc) property, and if not, call a third-party geolocation service to attempt to identify a location and set the property. (Note: Synapse does not include a geolocation service in its public distribution; this example assumes such a service has been implemented and is called using an extended Storm command named ipgeoloc.)

<inet:ipv4> { -:loc | ipgeoloc } -> inet:server
  • Pivot from a set of FQDNs to any files (binaries) that query those FQDNs. Use a subquery with the yield option to return the file nodes as well as the original FQDNs.

<inet:fqdn> yield { -> inet:dns:request:query:name +:exe -> file:bytes }

Note

The “pivot and join” operator ( -+> ) allows you to combine a set of inbound nodes with the set of nodes reached by the pivot into a single result set. However, the operator only allows you to join sets of nodes that are “one degree” (one pivot) apart. The subquery syntax above effectively allows you to join two sets of nodes that are more than one pivot apart.

Usage Notes:

  • Subqueries can be nested; you can place subqueries inside of subqueries.

  • When the yield option is used, Storm will return the nodes from the subquery first, followed by the nodes from the original working set.
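A nested subquery might look like the following sketch. The FQDN and the tags #example.seen and #example.checked are hypothetical values used only for illustration.

```
inet:fqdn=vertex.link
{
    -> inet:dns:a                        // outer subquery: pivot to the FQDN's DNS A records
    { -> inet:ipv4 [ +#example.seen ] }  // inner subquery: tag each IP, then resume with the A records
    [ +#example.checked ]                // outer subquery: tag the A records themselves
}
```

Because neither subquery uses the yield option, the query returns only the original inet:fqdn node; the tags applied inside the subqueries persist.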

Subquery Filter

A subquery filter is a filter whose condition is itself a Storm expression.

Standard Storm filter operations are designed to operate on the nodes in the current working set, that is, the nodes actively passing through the Storm pipeline. Regardless of whether the filter uses a Standard Comparison Operator or Extended Comparison Operator, the filter evaluates some aspect of the node itself, such as its primary or secondary properties, or whether or not the node has a particular tag.

A subquery filter allows you to use a subquery to filter the current set of nodes based on their relationship to other nodes, or on the properties or tags of “nearby” nodes. The subquery content is still evaluated “off to the side”; any pivots, filters, or other operations performed inside the subquery are still “contained within” the subquery. But the nodes passing through the main Storm pipeline are evaluated against the contents of the subquery, and are then filtered - passed or dropped - based on that evaluation.
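As a brief sketch, the query below keeps only those FQDNs that have at least one associated DNS A record. The zone value is a placeholder chosen for this illustration.

```
inet:fqdn:zone=vertex.link +{ -> inet:dns:a }
```

The pivot to inet:dns:a happens inside the subquery filter, so the query returns inet:fqdn nodes, not DNS A records.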

For additional detail on subquery filters and examples of their use, refer to the Subquery Filters section of the Storm Reference - Filtering guide.

Using Subqueries to Reference Nodes

A subquery can be used as an alternative (and potentially simpler) way to reference a node in Synapse.

A common use case is to use a subquery as a simpler way to refer to a GUID Form or Composite Form without needing to enter (type or copy and paste) the form’s primary property.

Examples:

Lift all of the contacts (ps:contact nodes) for The Vertex Project:

ps:contact:org = { ou:org:name = vertex }

Set the :id:number property of a contact (ps:contact node) to the ID number whose value is 444-44-4444:

ps:contact:name='ron the cat' [ :id:number = { ou:id:number:value = 444-44-4444 } ]

Tip

When a subquery is used to specify a property value (as in the queries above), Synapse will generate an error if the subquery expression returns more than one node. For example, the :org property of a ps:contact node is a single ou:org guid. If ou:org:name = vertex returns more than one org node, Synapse returns a BadTypeValu error.

See the Add or Modify Properties Using Subqueries section of the Storm Reference - Data Modification document for additional examples of setting properties using subqueries.