Synapse’s knowledge graph is built on a robust, extensible data model that can represent a broad range of data and relationships. The data model (and associated Analytical Model) allows both data and assertions to be represented in a structured, consistent manner. This means that instead of analysts needing to review prose reports to understand current state (and fuse those reports into still more prose to revise their assessments), analysts (and algorithms) can ask analytical questions directly of the data - and answer those questions quickly and easily.
Capturing data and analysis in a structured model abstracts away some of the subtleties and caveats that can be conveyed in prose, and finished reporting is still appropriate in many cases. But a good data model can represent enough information so that key objects, relationships, and assessments are well-defined, unambiguous, and self-evident upon examination.
This section provides background on the components of Synapse’s data model and their use.
There are various ways to examine Synapse’s data model in greater detail:
Synapse Enterprise customers or users who have requested a Synapse demo instance and have access to the Synapse UI (Optic) can use the Data Model Explorer to view Synapse’s forms and light edges and their relationships to each other. The Tag Explorer can be used to view the tags that exist in your instance of Synapse.
Data model components such as types, forms, and properties are generated as runtime nodes (“runt nodes”) when a Cortex is initialized and can be viewed as meta-objects within Synapse itself. See the Storm Reference - Model Introspection section for details.
Data Model Objects
To work effectively with Synapse and the Storm query language, you need to understand the basic elements of the Synapse data model.
A type is the definition of a data element within the Synapse data model. A type describes what the element is and enforces how it should look, including how it should be normalized, if necessary, for both storage (including indexing) and representation (display).
Synapse’s data model includes standard types such as integers and strings, but further defines a broad range of types such as globally unique identifiers (guid), date/time values (time), and time intervals.
Objects (nodes) may also be specialized types. For example, an IPv4 address (inet:ipv4) is its own type. An IPv4 address is stored as an integer, but the inet:ipv4 type has additional constraints (e.g., to ensure that IPv4s created in Synapse only use integer values that fall within the allowable IPv4 address space). These constraints may be defined by a Constructor that specifies how a property of that type can be created (constructed) in Synapse.
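The idea of a type that stores an IPv4 address as an integer while rejecting out-of-range values can be illustrated with a minimal Python sketch. This is not Synapse’s implementation: the function names norm_ipv4 and repr_ipv4 are hypothetical, and Python’s standard ipaddress module stands in for the type’s constructor.

```python
import ipaddress


def norm_ipv4(valu):
    """Normalize a dotted-quad string or an integer to its integer form.

    ipaddress.IPv4Address raises AddressValueError (a ValueError subclass)
    for anything outside the IPv4 address space.
    """
    return int(ipaddress.IPv4Address(valu))


def repr_ipv4(norm):
    """Convert the stored integer back to its display (dotted-quad) form."""
    return str(ipaddress.IPv4Address(norm))


stored = norm_ipv4("1.2.3.4")   # storage form: an integer
shown = repr_ipv4(stored)       # display form: "1.2.3.4"
```

Keeping normalization (storage) and representation (display) as separate steps mirrors the distinction the type definition draws between the two.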
Synapse includes optimizations for some types to improve performance and functionality. Some of these are “back end” optimizations (i.e., for indexing and storage) while some are more “front end” in terms of how users can interact with data. See Storm Reference - Type-Specific Storm Behavior for additional detail.
A form is the definition of an object in the Synapse data model. A form acts as a “template” that tells you how to create a particular object (node). While the concepts of form and node are closely related, it is useful to maintain the distinction between the template for creating an object (a form) and an instance of a particular object (a node). For example, inet:fqdn is a form; inet:fqdn = vertex.link is a node.
All forms must have a primary property. The primary property is the name of the form and the definition of the value to be provided for individual instances (nodes) of that form. The primary property must be defined so that it is unique across all possible instances of that form. For example, FQDNs are unique, based on the way they are defined and registered (two different organizations cannot both register the FQDN vertex.link). So the primary property value of an inet:fqdn is simply the FQDN itself.
All properties in Synapse must have a defined type; in many cases, a form is also its own type (the form inet:fqdn is one example).
Forms may have secondary properties that record additional information about the form or further describe it. Secondary properties are form-specific and are explicitly defined for each form.
Synapse also supports a set of universal secondary properties (universal properties) that are valid for all forms.
Synapse uses a structured namespace for forms. Each form name consists of at least two elements separated by a colon ( : ), as in inet:fqdn or file:bytes.
The first element in the namespace represents a rough “category” for the form (e.g., inet for Internet-related objects). The Synapse data model is broad and extensible. The ability to group portions of the data model into related categories makes a large model easier to manage, and also allows Synapse users to focus on those portions of the model most relevant to them.
The second and / or subsequent elements in the form name define the specific “subcategory” or “thing” within the form’s primary category (e.g., inet:fqdn represents a fully qualified domain name (FQDN) within the “Internet” category).
A node is a unique object within Synapse; each node is a specific instance of a form. Every node consists of:
A primary property, represented by the form of the node plus its value (<form> = <valu>). All primary properties must be unique for a given form; the uniqueness of the <form> = <valu> pair ensures there can be only one node in Synapse that represents a given object (for example, a particular domain). Because the unique pair “defines” the node, the comma-separated form / value combination (<form>,<valu>) is also known as the node’s Ndef (short for “node definition”).
One or more universal properties. As the name implies, universal properties are applicable to all nodes.
Optional secondary properties. Similar to primary properties, secondary properties consist of a property name (of a specific type) and the property’s value (<prop> = <pval>).
Optional tags. A tag acts as a label with a particular meaning that can be applied to a node to provide context.
The Storm query below lifts and displays the node for the domain www.google.com:
storm> inet:fqdn=www.google.com
inet:fqdn=www.google.com
        :domain = google.com
        :host = www
        :issuffix = false
        :iszone = false
        :zone = google.com
        .created = 2023/09/28 14:39:57.441
        #rep.moz.500
In the output above:
inet:fqdn = www.google.com is the primary property (<form> = <valu>).
.created is a universal property showing when the node was added to the Cortex.
:domain, :host, etc. are form-specific secondary properties with their associated values (<prop> = <pval>). For readability, secondary properties are displayed as relative properties within the namespace of the form’s primary property (e.g., :domain as opposed to the full property name).
#rep.moz.500 is a tag indicating that www.google.com has been reported by web analytics company Moz as one of their top 500 most popular websites.
See Kinds of Nodes below for additional detail on how nodes are used to represent various objects in Synapse.
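The parts of a node described above (a primary property that doubles as the ndef, properties, and tags) can be modeled in a few lines of Python. This is only an illustrative sketch; the Node and NodeStore classes are hypothetical names and do not reflect how a Cortex stores data.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    form: str                                   # e.g. "inet:fqdn"
    valu: object                                # primary property value
    props: dict = field(default_factory=dict)   # secondary and universal properties
    tags: set = field(default_factory=set)      # tag names without the leading "#"

    @property
    def ndef(self):
        """The (form, valu) pair that uniquely defines this node."""
        return (self.form, self.valu)


class NodeStore:
    """Hypothetical in-memory store that keeps at most one node per ndef."""

    def __init__(self):
        self._nodes = {}

    def add(self, form, valu):
        ndef = (form, valu)
        if ndef not in self._nodes:
            self._nodes[ndef] = Node(form, valu)
        return self._nodes[ndef]


store = NodeStore()
node = store.add("inet:fqdn", "www.google.com")
node.props[":host"] = "www"
node.tags.add("rep.moz.500")

# Adding the same form / value pair again returns the existing node.
same = store.add("inet:fqdn", "www.google.com")
```

Keying the store on the ndef is what enforces "one node per form / value pair" in this sketch.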
Properties are the individual elements that define a form or (along with their values) that comprise a node. All properties in Synapse must have a defined type.
Every form consists of (at minimum) a primary property: the name of the form and the definition of the value to be provided for individual instances (nodes) of that form. All forms must be designed so that their primary property value is unique across all instances (nodes) of that form.
This uniqueness is straightforward for simple objects such as FQDNs or email addresses. Ensuring “uniqueness” for more complex nodes (such as those representing a Relationship or an Event) can be more challenging; these forms are often GUID forms.
Because a primary property uniquely defines a node, it cannot be modified once the node is created. To “change” a node’s primary property value you must delete and re-create the node.
A form can include optional secondary properties that provide additional detail about the form. Secondary properties are specific to a given form and further describe that form. A node may include secondary properties with their associated values (<prop> = <pval>).
Some secondary properties are based on (derived from) a node’s primary property value. For example, an email address (inet:email) has secondary properties for both the associated FQDN (inet:email:fqdn) and username (inet:email:user). When you create an inet:email node (for example, inet:email = firstname.lastname@example.org), Synapse automatically sets the associated secondary property values. Any secondary properties derived from a node’s primary property are read-only (just like the primary property they are based on) and cannot be changed once set.
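As a rough illustration of how secondary properties can be derived from a primary property value, the Python sketch below splits an email address into its username and FQDN parts. The function name derive_email_props is hypothetical, and the sketch ignores any additional normalization Synapse may perform.

```python
def derive_email_props(valu):
    """Derive the user and fqdn parts from an email address string."""
    # Split on the last "@" so the FQDN is everything after it.
    user, sep, fqdn = valu.rpartition("@")
    if not sep or not user or not fqdn:
        raise ValueError(f"not a valid email address: {valu!r}")
    return {"user": user, "fqdn": fqdn}


props = derive_email_props("firstname.lastname@example.org")
# props maps "user" to "firstname.lastname" and "fqdn" to "example.org"
```

Because both values are computed entirely from the primary property, changing either one independently would contradict the node’s identity, which is why such properties are read-only.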
Any secondary properties not based on a node’s primary property are optional. Their values can be set if the data is available and relevant to your use case; otherwise they can remain unset. For example, an IPv4 node (inet:ipv4) has an optional secondary property for its associated Autonomous System (AS) number.
All optional secondary property values can be set, modified, or removed as needed.
Synapse defines a subset of secondary properties as universal properties that are applicable to all forms:
.created, which is set automatically by Synapse for all nodes and whose value is the date/time that the node was created within that instance of Synapse (Cortex).
.seen, which is optional for all nodes and whose value is a time interval (minimum or “first seen” and maximum or “last seen”) during which the node was observed, existed, or was valid.
Properties extend the Form Namespace. Form names are primary properties, and consist of at least two elements separated by a colon ( : ). Secondary properties exist within the namespace of their primary property (form). Secondary properties are preceded by a colon ( : ) and use the colon to separate additional namespace elements, if needed. Universal properties are preceded by a period ( . ) to distinguish them from form-specific secondary properties.
For example, the secondary (both universal and form-specific) properties of
Secondary properties also make up a relative namespace (set of relative properties) with respect to their primary property (form). The Storm query language allows (or in some cases, requires) you to reference a secondary property using its relative property name.
Relative properties are also used for display purposes within Synapse for visual clarity (see the Node Example above).
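The relationship between a full property name and its relative name can be sketched in Python as simple string handling. The helper names relative_name and full_name are hypothetical; the sketch assumes only the naming rules stated above (a colon before form-specific properties, a period before universal properties).

```python
def relative_name(form, full):
    """Strip the form prefix from a full property name."""
    if not full.startswith(form):
        raise ValueError(f"{full!r} is not in the namespace of {form!r}")
    rel = full[len(form):]
    # What remains must start with ":" (form-specific) or "." (universal).
    if not rel or rel[0] not in ":.":
        raise ValueError(f"{full!r} is not a secondary property of {form!r}")
    return rel


def full_name(form, rel):
    """Rebuild the full property name from a form and a relative name."""
    return form + rel


rel = relative_name("inet:email", "inet:email:fqdn")   # ":fqdn"
```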
Secondary properties may have their own “namespace”. Both primary and secondary properties use colons to separate elements of the property name. However, not all separators represent property “boundaries”; some act more as “sub-namespace” separators.
For example, file:bytes is a primary property / form. A file:bytes form may include secondary properties such as :mime:pe:compiled. In this case, the mime and pe elements are not themselves secondary properties, but sub-namespaces for individual MIME data types and the “PE executable” data type.
Tags are annotations applied to nodes. They can be thought of as labels that provide context to the data represented by the node.
Broadly speaking, within Synapse:
Nodes represent things: objects, relationships, or events. In other words, nodes typically represent observables that are verifiable and largely unchanging.
Tags typically represent assessments: observations that could change if the data or the analysis of the data changes.
An Internet domain is an “observable thing” - a domain exists, was registered through a domain registrar, and can be created as a node such as inet:fqdn = woot.com.
Whether a domain has been sinkholed is an assessment. A researcher may need to evaluate data related to that domain (such as domain registration records or current and past IP resolutions) to decide whether the domain appears to be sinkholed. This assessment can be represented by applying a tag to the inet:fqdn = woot.com node.
Tags are unique within the Synapse model because tags are both nodes and labels applied to nodes. The tag cno.infra.dns.sink.holed can be applied to another node; but the tag itself also exists as the node syn:tag = cno.infra.dns.sink.holed. This difference is illustrated in the example below.
Synapse does not have any pre-defined tags. Users are free to create tags that are meaningful for their analysis. See Analytical Model for more detail.
The Storm query below displays the node for the tag cno.infra.dns.sink.holed:
storm> syn:tag=cno.infra.dns.sink.holed
syn:tag=cno.infra.dns.sink.holed
        :base = holed
        :depth = 4
        :doc = A domain (zone) that has been sinkholed.
        :title = Sinkholed domain
        :up = cno.infra.dns.sink
        .created = 2023/09/28 14:39:57.519
The Storm query below displays the tag cno.infra.dns.sink.holed applied to the node inet:fqdn = hugesoft.org:
storm> inet:fqdn=hugesoft.org
inet:fqdn=hugesoft.org
        :domain = org
        :host = hugesoft
        :issuffix = false
        :iszone = true
        :zone = hugesoft.org
        .created = 2023/09/28 14:39:57.543
        #cno.infra.dns.sink.holed
Note that a tag applied to a node uses the “hashtag” symbol ( # ). This is a visual cue to distinguish tags on a node from the node’s secondary properties. The symbol is also used within the Storm query language syntax to reference a tag.
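The :base, :depth, and :up values shown on the syn:tag node earlier in this section appear to follow directly from the dotted tag name. The Python sketch below reproduces those three values under that assumption; the function name tag_props is hypothetical, and the zero-based depth is inferred from the displayed output (:depth = 4 for a five-element tag) rather than from Synapse’s source.

```python
def tag_props(tag):
    """Derive base, depth, and (if present) parent from a dotted tag name."""
    parts = tag.split(".")
    if not all(parts):
        raise ValueError(f"empty element in tag name: {tag!r}")
    props = {
        "base": parts[-1],          # right-most element
        "depth": len(parts) - 1,    # zero-based: a root tag has depth 0
    }
    if len(parts) > 1:
        props["up"] = ".".join(parts[:-1])   # parent tag
    return props


props = tag_props("cno.infra.dns.sink.holed")
# props["base"] is "holed", props["depth"] is 4, props["up"] is "cno.infra.dns.sink"
```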
Lightweight (Light) Edge
Lightweight (light) edges are used in Synapse to provide greater flexibility and improved performance when representing certain types of relationships. A light edge is similar to an edge in a traditional directed graph; each light edge links exactly two nodes (n1 and n2), and consists of:
A direction. Light edge relationships only “make sense” in one direction, given the forms that they link. For example, an article can reference an indicator such as an MD5 hash, but an MD5 hash does not “reference” an article.
A “verb” that represents the relationship (e.g., refs for “references” in the example above).
Light edges do not have properties, and you cannot apply tags to light edges - hence the “light” in light edge.
Light edges are used for performance and flexibility in certain use cases. For example:
When the only information you need to record about a relationship is that it exists (that is, no properties are required to further “describe” the relationship). An example is meta:ruleset -(contains)> meta:rule.
When the objects (nodes) involved in the relationship may vary. That is, either the n1 or n2 node (or both) may be any kind of node, depending on the context of the relationship. Examples include meta:source -(seen)> * (where a data source may “see”, observe, or provide data on any kind of node) and * -(refs)> * (where a variety of n1 nodes may “reference” or contain a reference to any kind of node).
Synapse’s source code includes some pre-defined light edges that represent The Vertex Project’s conventions. While we recommend the use of these conventions, we do not enforce their use. Synapse users are free to create / define their own light edges and use them as they see fit. (Note that Synapse Power-Ups provided by The Vertex Project will create light edges according to our conventions when ingesting data.)
Light edges should not be used as a convenience to short-circuit proper data modeling using forms. Using forms and nodes (combined with Synapse’s strong typing, type enforcement, and type awareness) is key to the powerful analysis and performance capabilities of a Synapse hypergraph.
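A light edge, as described in this section, carries only a source node, a verb, and a target node. The Python sketch below models that shape; LightEdge and EdgeStore are hypothetical names, and the node values are placeholders rather than real guids.

```python
from typing import NamedTuple


class LightEdge(NamedTuple):
    n1: tuple    # ndef of the source node
    verb: str    # e.g. "contains" or "refs"
    n2: tuple    # ndef of the target node


class EdgeStore:
    """Hypothetical store; an edge either exists or it does not."""

    def __init__(self):
        self._edges = set()

    def add(self, n1, verb, n2):
        # No properties or tags to record, so a set of tuples is enough.
        self._edges.add(LightEdge(n1, verb, n2))

    def targets(self, n1, verb):
        """Follow edges in their defined direction only (n1 to n2)."""
        return sorted(e.n2 for e in self._edges if e.n1 == n1 and e.verb == verb)


edges = EdgeStore()
ruleset = ("meta:ruleset", "ruleset-placeholder")
rule = ("meta:rule", "rule-placeholder")
edges.add(ruleset, "contains", rule)
edges.add(ruleset, "contains", rule)   # adding the same edge twice has no effect
```

Because the edge has no identity beyond its three elements, there is nothing to update after creation, which is consistent with the “light” in light edge.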
Kinds of Forms
Synapse forms can be broadly grouped based on how their primary properties (<form> = <valu>) are formed.
Recall that primary properties must be defined so that they are unique for all possible instances of that form.
A simple form refers to a form whose primary property is a single value. Simple forms are commonly used to represent an Object and are the most readily understood from a modeling perspective. The “object itself” is unique by definition, so the form’s primary property value is the object. Examples of simple forms include FQDNs, IP addresses (IPv4 or IPv6), hashes, and so on.
Composite (Comp) Form
A composite (comp) form is one where the primary property is a comma-separated list of two or more elements. While no single element makes the form unique, a set of elements may be sufficiently unique to define the form. Comp forms are often (though not universally) used to represent a Relationship.
Fused DNS A records are an example of a comp form. A DNS A record can be uniquely defined by the combination of the domain (inet:fqdn) and the IP address (inet:ipv4) in the A record. In Synapse, an inet:dns:a form represents the knowledge that a given domain resolved to a specific IP at some time, or within a time window. (The .seen property captures “when” (first observed / last observed) the resolution took place.)
A guid (Globally Unique Identifier) form is uniquely defined by a machine-generated 128-bit number. Guids account for cases where it is impossible to uniquely define a thing based on a property or set of properties. Guids are also useful for cases where the amount of data available to create a particular object (node) may vary greatly - that is, not all properties or details are available from all data sources. A guid form gives you the flexibility (through secondary properties) to capture as much (or as little) data as is available to you.
A guid form can be considered a special case of a simple form where the form’s value is a guid.
Forms that represent one-time events are often guid forms. Examples include host execution activity (such as it:exec:file:add nodes) or network activity (such as inet:dns:request nodes). Guid forms are also used to represent entities such as people (ps:person) or organizations.
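To make the idea of a machine-generated 128-bit primary property concrete, the Python sketch below generates a random 128-bit value and renders it as 32 hexadecimal characters. The helper names new_guid and is_guid are hypothetical, and the use of purely random bits is an assumption for illustration; it is not a description of how Synapse generates guids.

```python
import secrets


def new_guid():
    """Return a random 128-bit value as 32 lowercase hex characters."""
    return secrets.token_hex(16)   # 16 bytes == 128 bits


def is_guid(valu):
    """Check that a string has the shape of a 128-bit hex value."""
    return len(valu) == 32 and all(c in "0123456789abcdef" for c in valu)


guid = new_guid()
```

With a guid as the primary property, everything known about the object lives in secondary properties, so a node can be created even when only a little data is available.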
Edge (Digraph) Form
Edge forms predate the addition of light edges to the data model. The use of light edges is generally preferred over edge forms where possible.
An edge (digraph) form is a specialized composite form whose primary property value consists of two <form>,<valu> pairs (“node definitions”, or ndefs). An edge form is a specialized relationship form that can be used to link two arbitrary forms in a generic relationship.
Edge forms have not been officially deprecated. However, edge forms (used to create nodes) incur some additional performance overhead vs. light edges (particularly for large numbers of edge nodes).
The Synapse data model includes a number of “generic” forms that can be used to represent metadata and / or arbitrary data.
Synapse’s extensible data model can be expanded as needed, so ideally all data in Synapse would be represented using an appropriate form. However, designing a new form may require discussion, subject matter expertise, and testing against “real world” data, as well as time to implement the changes. Analysts may have a need to capture data “in the moment” without waiting for model updates. Alternatively, some data may be “one off” information that needs to be represented, but does not necessarily require its own form for a limited or unique use case.
In the above cases, generic forms may be used to capture data where a more specific form does not exist. Generic forms reside in two primary parts of the data model, one of which is the set of meta:* forms.
The meta:rule form is an example of a generic form. Synapse includes more specific forms to represent common detection logic such as antivirus (it:av:filehit) or YARA rules (it:app:yara:match). Other technologies or organizations may have their own specific (and often “black box”) detection logic. The meta:rule form can represent an arbitrary detection rule, with a -(matches)> light edge used to link the rule to the “thing” (file, network traffic, etc.) that the rule fired on.
Kinds of Nodes
Nodes represent standard objects (“nouns”) such as IP addresses, files, people, conferences, or airplanes. They can also represent more abstract objects such as industries, risks, attacks, or goals. However, in Synapse nodes can also represent relationships or specific time-based events. You can think of a node generically as a “thing” - most “things” you want to model within Synapse are nodes.
Broadly speaking, nodes can be thought of in terms of some generic categories:
Nodes can represent atomic objects or entities, whether real or abstract. Entities are often (though not always) represented as a Simple Form. An email address (inet:email) is a basic example of an entity-type node.
Nodes can represent specific relationships among entities. Examples include a domain resolving to an IPv4 address, a malware dropper containing or extracting another file, a company being a subsidiary of another business, or a person being a member of a group.
Relationship nodes are often represented as a Composite (Comp) Form. Comp forms have a primary property consisting of a comma-separated list of two or more values that uniquely define the relationship. A DNS A record (inet:dns:a) is a basic example of a relationship node:
storm> inet:dns:a=(google.com,126.96.36.199)
inet:dns:a=('google.com', '188.8.131.52')
        :fqdn = google.com
        :ipv4 = 184.108.40.206
        .created = 2023/09/28 14:39:57.679
Nodes can represent individual time-based occurrences. The term event implies that an entity existed or a relationship occurred at a specific point in time. Events represent the combination of a node and a timestamp for when the node was observed. Examples of event forms include an individual login to an account, a specific DNS query, or a domain registration (whois) record captured on a specific date.
The structure of an event node may vary depending on the specific event being modeled. A “simple” event may be represented as a Composite (Comp) Form that combines an entity and a timestamp; for example, a domain whois record (inet:whois:rec) consists of the whois record and the time that record was observed or retrieved.
Other more complex events are represented as a Guid Form with the timestamp as one of several secondary properties on the form. A specific, individual DNS query (inet:dns:request) is an example of an event node:
storm> inet:dns:request=00000a17dbe261d10ce6ed514872bd37
inet:dns:request=00000a17dbe261d10ce6ed514872bd37
        :query = ('tcp://220.127.116.11', 'download.applemusic.itemdb.com', '1')
        :query:name = download.applemusic.itemdb.com
        :query:name:fqdn = download.applemusic.itemdb.com
        :query:type = 1
        :reply:code = 0
        :server = tcp://18.104.22.168
        :time = 2018/09/30 16:01:27.506
        .created = 2023/09/28 14:39:57.725
Instance Knowledge vs. Fused Knowledge
For some types of data, event nodes and relationship nodes can encode similar information but represent the difference between instance knowledge and fused knowledge.
Event forms represent the specific point-in-time existence of an entity or occurrence of a relationship - an instance of that knowledge.
Relationship forms can leverage the universal .seen property to set “first observed” and “last observed” times during which an entity existed or a relationship was true. This date range can be viewed as fused knowledge - knowledge that summarizes or “fuses” the data from many individual observations (instances) of the node over time.
Instance knowledge and fused knowledge represent differences in data granularity. Whether to create an event node or a relationship node (or both) depends on how much detail is required for your analysis. This consideration often applies to relationships that change over time, particularly those that may change frequently.
DNS records are a good example of these differences. The IP address that a domain resolves to may change infrequently (e.g., for a website hosted on a stable server) or may change quite often (e.g., where the IP is dynamically assigned or where load balancing is used).
One option to represent and track DNS A records is to create individual events every time you check the domain’s current resolution (e.g., inet:dns:answer forms). This represents a very high degree of granularity as the nodes will record the exact time a domain resolved to a given IP. The nodes can also capture additional detail such as the querying client, the responding server, the response code, and so on. However, the number of such nodes could readily reach into the hundreds of millions if you create nodes for every resolution of every domain you want to track.
On the other hand, it may be sufficient to know that a domain resolved to an IP address during a given period of time – a “first observed” and “last observed” (.seen) range. A single inet:dns:a node can be created to show that domain woot.com resolved to IP address 22.214.171.124, where the earliest observed resolution was 2014/08/06 at 13:56 and the most recently observed resolution was 2018/05/29 at 7:32. These timestamps can be extended (earlier or later) if additional data changes our observation boundaries.
This second approach loses some granularity:
The domain is not guaranteed to have resolved to that IP continuously throughout the entire time period.
Given only this node, we don’t know exactly when the domain resolved to the IP address during that time period, except for the earliest and most recent observations.
However, this fused knowledge may be sufficient for our needs and may be preferable to creating thousands of nodes for individual DNS resolutions.
Of course, a hybrid approach is also possible, where most DNS A record data is recorded in fused nodes but it is also possible to record high-resolution, point-in-time nodes when needed.
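The way many individual observations collapse into a single “first observed” / “last observed” range can be sketched in Python as follows. The function name fuse_seen is hypothetical, and the sketch only illustrates the widening of an interval; it does not reproduce how Synapse stores or updates .seen values. The middle observation is an invented value used purely for the example.

```python
from datetime import datetime


def fuse_seen(seen, observed):
    """Widen a (first, last) interval so that it includes one more observation."""
    if seen is None:
        return (observed, observed)
    first, last = seen
    return (min(first, observed), max(last, observed))


# Instance knowledge: individual point-in-time observations, in any order.
observations = [
    datetime(2016, 1, 1, 0, 0),      # invented observation for illustration
    datetime(2014, 8, 6, 13, 56),
    datetime(2018, 5, 29, 7, 32),
]

# Fused knowledge: a single interval summarizing all observations.
seen = None
for observed in observations:
    seen = fuse_seen(seen, observed)
```

After the loop, seen holds only the earliest and latest timestamps; the intermediate observation is no longer recoverable, which is the loss of granularity discussed above.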