Analytical Model - Tag Concepts

Recall from Data Model - Terminology that two of the key components within Synapse are nodes and tags. Broadly speaking:

  • Nodes commonly represent “things”: observables that can be verified and are unlikely to change over time.
  • Tags commonly represent information that may change or evolve over time. In some cases this information may be something that is true for a given period, but then is no longer true (such as the association of an IP address with a specialized service such as Tor). In most cases, tags represent assessments or analytical evaluations: conclusions drawn from observables that may change in light of new data or re-evaluation of existing data.

The types, forms, and properties that define nodes make up the Synapse data model. The tags representing labels applied to those nodes can be thought of as the analytical model used to record observations or assessments about that data. This section provides some additional background on tags before a more in-depth discussion on their use:

Tags as Nodes

While tags primarily record analytical observations, tags are also nodes within the Synapse data model. Every tag is a node of the form syn:tag (whose type is syn:tag).

A tag node’s primary property (<form> = <valu>) is the name of the tag; so the tag foo.bar has the primary property syn:tag = foo.bar. The dotted notation can be used to construct “tag hierarchies” that can represent varying levels of specificity (see below).

This example shows the node for the tag syn:tag = aka.feye.thr.apt1:

cli> storm syn:tag=aka.feye.thr.apt1
Executing query at 2021/09/23 21:43:12.264
syn:tag=aka.feye.thr.apt1
        .created = 2021/09/23 21:43:12.138
        :base = apt1
        :depth = 3
        :doc = Indicator or activity FireEye calls (or associates with) the APT1 threat group.
        :title = APT1 (FireEye)
        :up = aka.feye.thr
complete. 1 nodes in 15 ms (66/sec).

The following properties are present for this node:

  • .created, which is a universal property showing when the node was added to a Cortex.
  • :title and :doc, which are meant to store concise and more detailed (if necessary) definitions for the tag, respectively. Applying explicit definitions to tag nodes limits ambiguity and helps ensure tags are being applied (and interpreted) correctly by Synapse analysts and other users.

The :depth, :up, and :base secondary properties help to lift and pivot across tag nodes:

  • :depth is the “location” of the tag in a given dotted tag hierarchy, with the count starting from zero. A single-element tag (syn:tag = aka) has :depth = 0, while a three-element tag (syn:tag = aka.feye.thr) has :depth = 2.
  • :base is the final (rightmost) element in the dotted tag hierarchy.
  • :up is the tag one “level” up in the dotted tag hierarchy.

Additional information on viewing and pivoting across tags can be found in Storm Reference - Model Introspection. For detail on the Storm query language, see Storm Reference - Introduction.

Tags (syn:tag forms) have a number of type-specific behaviors within Synapse with respect to how they are indexed, created, and manipulated via Storm. Most important for practical purposes is that syn:tag nodes are created “on the fly” when a tag is applied to another node. That is, the syn:tag node does not need to be created manually before the tag can be used; the act of applying a tag will cause the creation of the appropriate syn:tag node (or nodes).

See the syn:tag section within Storm Reference - Type-Specific Storm Behavior for additional detail on tags and tag behavior within Synapse and Storm.

Tags as Labels

Synapse does not include any pre-populated tags (syn:form = <tag>), just as it does not include any pre-populated domains (inet:fqdn = <domain>). Because tags can be highly specific to both a given knowledge domain and to the type of analysis being done within that domain, organizations have the flexibility to create a tag structure that is most useful to them.

A tag node’s value (syn:tag = <valu>) is simply a string and can be set to any user-defined alphanumeric value. However, the strings are designed to use a dotted naming convention, with the period ( . ) used as a separator character to delimit individual elements of a tag if necessary. This dotted notation means it is possible to create tag hierarchies of arbitrary depth that support increasingly detailed or specific observations. For example, the top level tag foo can represent a broad set of observations, while foo.bar and foo.baz could represent subsets of foo or more specific observations related to foo.

Within this hierarchy, specific terms are used for the tag and its various components:

  • Leaf tag: The full tag path / longest tag in a given tag hierarchy.
  • Root tag: The top / leftmost element in a given tag hierarchy.
  • Base tag: The bottom / rightmost element in a given tag hierarchy.

For the tag foo.bar.baz:

  • foo.bar.baz is the leaf tag (leaf).
  • foo is the root tag (root).
  • baz is the base tag (base).

When you apply a tag to a node, all of the tags above that tag in the tag hierarchy are automatically applied as well (and the appropriate syn:tag nodes are created if they do not exist). That is, when you apply the tag foo.bar.baz to a node, Synapse automatically applies the tags foo.bar and foo as well. Because tags are meant to be hierarchical, if the specific assessment foo.bar.baz is applicable to a node and foo.bar.baz is a subset of foo, it follows that the broader assessment foo is applicable as well.

When you delete (remove) a tag from a node, the tag and all tags below it in the hierarchy are deleted. If you delete the tag foo.bar.baz from a node, the tags foo.bar and foo will remain. However, if you delete the tag foo from a node with the tag foo.bar.baz, then all three tags (foo, foo.bar, and foo.bar.baz) are deleted.

Deleting a tag from a node does not delete the node for the tag itself; that is, removing the tag #foo.bar.baz from any (or all) nodes does not affect the node syn:tag = foo.bar.baz.

See the syn:tag section within Storm Reference - Type-Specific Storm Behavior for additional detail on tags and tag behavior within Synapse and Storm.

See Analytical Model - Tags as Analysis and Design Concepts - Analytical Model for additional considerations for tag use and creating tag hierarchies.

Tag Timestamps

Applying a tag to a node has a particular meaning; it typically represents the recording of an assessment about that node with respect to the existing data in the Cortex. Many assessments are binary in the sense that they are either always true or always false; in these cases, the presence or absence of a tag is sufficient to accurately reflect the current analytical assessment, based on available data.

There are other cases where an assessment may be true only for a period of time or within a specified time frame. Internet infrastructure is one example; you can annotate whether an IP address is part of an anonymization service such as Tor using tags such as cno.infra.anon.tor. However, this information can change over time if the Tor service is removed or the IP address is reallocated to a different customer. Although the relevant tag can be applied while the IP is a Tor node and removed when that is no longer true, completely removing the tag causes us to lose the historical knowledge that the IP was a Tor node at one time.

To address these use cases, Synapse supports the optional use of timestamps (technically, time intervals) with any tag applied to a node. The timestamps can represent “when” (first known / last known times) the assessment represented by the tag was relevant for the node to which the tag is applied. (These timestamps are analogous to the .seen universal property that can be used to represent the first and last known times the data represented by a node was true / real / in existence.)

Applying a timestamp to a tag affects that specific tag only. The timestamps are not automatically propagated to tags higher up (or lower down) in the tag tree. This is because the specific tag to which the timestamps are applied is the most relevant with respect to those timestamps; tags elsewhere in the tree may have different shades of meaning and the timestamps may not apply to those tags in the same way (or at all).

Like .seen properties, tag timestamps represent a time range and not necessarily specific instances (other than the “first known” and “last known” observations). This means that the assessment represented by the tag is not guaranteed to have been true throughout the entire date range (though depending on the meaning of the tag, that may in fact be the case). That said, the use of timestamps allows much greater granularity in recording analytical observations in cases where the timing of an assessment (“when” something was true or applicable) is relevant.

Example - Tor Exit Nodes

Many web sites provide lists of Tor nodes or allow users to query IP addresses to determine whether they are Tor nodes. These sites may provide “first seen” and “last seen” dates for when the IP was identified as part of the Tor network. These dates can be used as timestamps for “when” the tag #cno.infra.anon.tor was applicable to that IP address.

If we have a data source that verifies that IP address 197.231.221.211 was a Tor node between December 19, 2017 and February 15, 2019, we can apply the tag #cno.infra.anon.tor with the appropriate time range as follows:

cli> storm inet:ipv4 = 197.231.221.211 [ +#cno.infra.anon.tor = (2017/12/19, 2019/02/15) ]
Executing query at 2021/09/23 21:43:12.397
inet:ipv4=197.231.221.211
        .created = 2021/09/23 21:43:12.341
        :asn = 37560
        :dns:rev = exit1.ipredator.se
        :latlong = 8.4219,-9.7478
        :loc = lr.lo.voinjama
        :type = unicast
        #cno.infra.anon.tor = (2017/12/19 00:00:00.000, 2019/02/15 00:00:00.000)
complete. 1 nodes in 40 ms (25/sec).

Tag Properties

Synapse supports the creation and use of custom tag properties that can be used to provide additional context to a given tag or set of tags.

Unlike tags, tag properties cannot be created “on the fly”; they must first be created programmatically (i.e., via addTagProp). On creation, a tag property must be assigned a name and a Type; currently, tag properties can be of any Synapse base type except array. Any constraints on the property (e.g., such as an allowed range) must also be defined when the tag property is created.

Once a tag property is created, it can be applied (appended) to any tag.

Example - Risk / Reputation

An example of how a custom tag property can provide additional context is the idea of a “risk” or “reputation” score. With respect to cyber threat data, many commercial security companies provide a “risk” score for various indicators, such as domains or URLs. Different companies may provide different risk scores (based on their data and algorithms) for the same indicator, so it would be convenient to use tags to show which companies consider an indicator to be “risky”.

Companies providing risk scores typically do so using a numerical scale, say between 0 - 100. This metric provides more context to the idea of risk - specifically how “risky” something is based on the company’s assessment. If we wish to track this type of reporting, we can define the custom tag property :risk as an integer type with a minimum value of 0 (no risk) and a maximum value of 100 (confirmed malicious).

# Create the custom tag property for the concept of risk
await cortex_proxy.addTagProp('risk', ('int', {'min': 0, 'max': 100}),
                              {'doc': 'Risk score'})

Once created, the :risk property and its associated value can then be appended to any appropriate tags (e.g., #rep.symantec:risk=80 or #rep.domaintools:risk=63).

See Storm Reference - Data Modification for additional detail on adding, modifying, and removing tag properties.

See the appropriate Storm Reference guides for additional detail on lifting (Storm Reference - Lifting), filtering (Storm Reference - Filtering), and pivoting (Storm Reference - Pivoting) using tags and tag properties.

Tag Display

When a tag is used as a label applied to a node, the data is displayed differently than it is for a syn:tag node itself. This example shows a node with multiple tags applied:

cli> storm inet:fqdn = aunewsonline.com
Executing query at 2021/09/23 21:43:12.539
inet:fqdn=aunewsonline.com
        .created = 2021/09/23 21:43:12.517
        :domain = com
        :host = aunewsonline
        :issuffix = False
        :iszone = True
        :zone = aunewsonline.com
        #aka.feye.thr.apt1
        #cno.threat.t15.tc = (2009/09/08 00:00:00.000, 2013/09/08 00:00:00.000)
        #rep.symantec:risk = 92
complete. 1 nodes in 19 ms (52/sec).

Any tags on the node are listed alphabetically following the node’s properties. Tags are prefixed with the pound / hashtag ( # ) symbol to indicate they are tags.

By default, Storm displays only the leaf tags applied to a node in the node’s output (e.g., #aka.feye.thr.apt1 but not #aka.feye.thr, #aka.feye, etc.) and any tags with tag timestamps or custom properties (even if they are not leaf tags).

Any timestamp values are displayed following an equals sign after the tag. In the example above, the tag #cno.threat.t15.tc indicates the domain is associated with internally-tracked Threat Cluster 15 (T15). The dates reflect the assessment that the domain was associated with / controlled by T15 between September 8, 2009 and September 8, 2013.

Any custom tag properties are displayed using a colon (:) followed by the property name and its associated value(s) (#<tag>:<tagprop> = <valu>). In the example above, the tag / tag property #rep.symantec:risk = 92 indicates that Symantec has reported that the domain has a risk score of 92.