Synapse’s Data Model provides a structured way to record, query, and navigate “observables” - objects, relationships, and events that can be captured and are unlikely to change.
Synapse also gives analysts a structured way to record observations or assessments through the use of labels (tags) applied to data (nodes). Assessments represent conclusions based on the data available to you at the time. As new data becomes available, your analysis is revised. As labels on nodes, tags are flexible and can be easily added, updated, or removed when assessments change.
Tags provide immediate context to individual nodes. In addition, by representing both data (nodes) and assessments (tags) consistently, analysts can use Synapse to query both of these in very powerful ways.
Synapse uses the syn:tag form to represent tags, which is simple and straightforward. The appropriate use of tags to annotate data is more nuanced. You can think of tags - their structure and application - as an analytical model that complements and extends the power of the data model.
The annotations and assessments that are “useful” for analysis may vary widely based on the analytical discipline in question, or even the needs of individual organizations within the same discipline. For this reason, Synapse does not include any “built in” tags. Organizations are free to design and use tags and tag trees that are most useful and relevant to them.
We encourage the design and use of tags that:
annotate assessments and conclusions that are relevant to your analysis.
allow you to ask the analytical questions that are most important to your organization.
While many disciplines will have similar tagging needs, tags are not necessarily “one size fits all”. For an example of tags/tag trees used by The Vertex Project, see our Vertex Tag Tree Overview blog.
This section discusses tags, their unique features, and their uses in more detail.
Tag Best Practices
The tags that you use to annotate data represent your analytical model. Your ability to conduct meaningful analysis depends in part on whether your analytical model is well-designed to meet your needs. The tags that work best for you may be different from those that work well for another organization.
The following recommendations should be considered when creating, maintaining, and using tags and tag trees.
Tag trees generally move from “less specific” to “more specific” the deeper you go within a hierarchy. The order of elements in your hierarchy can affect the types of analysis questions you can most easily answer. The structure you create should allow you to increase specificity in a way that is meaningful to the questions you’re trying to answer.
For example, let’s say you are storing copies of articles from various news feeds within Synapse (i.e., as media:news nodes). You want to use tags to annotate the subject matter of the articles. Consider two possible options:
Tag Tree #1
<country>.<topic>.<subtopic>.<subtopic>:

us.economics.trade.gdp
us.economics.trade.deficit
us.economics.banking.lending
us.economics.banking.regulatory
us.politics.elections.national
france.politics.elections.national
france.politics.elections.local
china.economics.banking.lending
Tag Tree #2
<topic>.<subtopic>.<subtopic>.<country>:

economics.trade.gdp.us
economics.trade.deficit.us
economics.banking.lending.us
economics.banking.regulatory.us
politics.elections.national.us
politics.elections.national.france
politics.elections.local.france
economics.banking.lending.china
Neither tag tree is right or wrong; which is more suitable depends on the types of questions you want to answer. If your analysis focuses primarily on news content within a particular region, the first option (which places “country” at the root of the tree) is probably a better fit. If your analysis focuses more on global geopolitical topics, the second option is probably better. As a general rule, the analytical focus that you care about most should go at the top of the hierarchy in order to make those questions easier to ask.
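Because applying a tag in Synapse automatically applies all of its parent tags, you can lift nodes by any prefix of a tree; the element ordering therefore determines which questions are one short query away. A minimal Storm sketch using the trees above:

```
// Tag Tree #1 (country first): all articles about the US, any topic
media:news#us

// Tag Tree #1: narrow to US economics
media:news#us.economics

// Tag Tree #2 (topic first): all articles about economics, any country
media:news#economics
```

With Tree #1, the “all US articles” question is a single-element lift; with Tree #2, the same question would require matching the country at the leaf level instead.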
Each positional element within a tag tree should have the same “category” or meaning. This makes it easier to work with portions of the tag tree in a consistent manner. For example, if you are tagging indicators of compromise with assessments related to third-party reporting, you should maintain a consistent structure, such as the following (reporter and threat names are illustrative):

rep.<reporter>.<reported>:

rep.alienvault.apt1
rep.symantec.viciouswombat
rep.symantec.redtree

In this example, rep is a top-level namespace for third-party reporting, the second element refers to the reporter, and the third element to what is being reported (threat group, malware family, campaign, etc.).
A tag should represent “one thing” - an atomic assessment. This makes it easier to change that specific assessment without impacting other assessments. For example, let’s say you assess that an IPv4 address was used by the Vicious Wombat threat group as a C2 location for Redtree malware. It might be tempting to create a single tag that combines all of this, such as (illustratively) threat.viciouswombat.redtree.c2.
By combining three assessments (who used the IPv4, the malware associated with the IPv4, and how the IPv4 was used) into one tag, you have made it much more difficult to update the context on the IP if any one of those three assessments changes. What if you realize the IPv4 was used by Sparkling Unicorn instead? Or that the IPv4 was used for data exfiltration and not C2? Using three separate atomic tags (illustratively, threat.viciouswombat, mal.redtree, and role.c2) makes it much easier to revise your assessments if necessary.
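With atomic tags, revising a single assessment is a single tag swap. A minimal Storm sketch (the IP address and tag names are hypothetical):

```
// Lift the IPv4 and swap only the attribution tag;
// the separate malware and role tags are untouched
inet:ipv4=1.2.3.4 [ -#threat.viciouswombat +#threat.sparklingunicorn ]
```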
You can store both short-form and long-form definitions directly on syn:tag nodes using the :title and :doc properties, respectively. We recommend that you use these properties to clearly define the meaning of the tags you create within Synapse to ensure they are both applied and interpreted consistently.
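For example, definitions can be set on a syn:tag node with Storm (the tag and the wording are illustrative):

```
// Set a short title and a longer definition on an existing tag node
syn:tag=rep.alienvault [ :title="AlienVault reporting" :doc="Assessments and indicators drawn from AlienVault reporting." ]
```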
Tag trees can be arbitrarily deep (that is, they can support an arbitrary number of tag elements). This implies that deep tag trees can potentially represent very fine-grained observations. While more detail is sometimes helpful, tag trees should reflect the level of detail that is relevant for your analysis, and no more. Overly detailed tag trees can actually hamper analysis by providing too many choices for analysts.
Tags that represent analytical assertions mean that a human analyst typically needs to evaluate the data, make an assessment, and decide what tag (or tags) to apply to the data. If tags are overly detailed, analysts may get bogged down in “analysis paralysis” - worrying about whether tag A or tag B is correct when that distinction doesn’t really matter to the analysis at hand.
We recommend that tags have no more than five elements. As always, your specific use case may vary, but this works well as general guidance.
Tagging data may represent a novel approach to analysis for many users. As analysts adjust to new workflows, it may be helpful to implement a subset of tags at first. Getting used to applying some basic tags may be easier than suddenly being asked to annotate data with a broad range of observations. As analysts get comfortable with the process, you can introduce additional tags or tag trees as appropriate.
Tags are meant to be flexible - the ability to easily add, remove, and modify tags is a built-in aspect of Synapse. Synapse also includes tools to help move, migrate, or restructure entire tag trees (e.g., the Storm movetag command).
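As a sketch, restructuring an entire tree with movetag might look like the following (the tag names are hypothetical; consult the Storm reference for exact usage):

```
// Move the foo.bar tag, and all of its subtags, under a new baz.faz tree
movetag foo.bar baz.faz.bar
```

After the move, nodes previously tagged #foo.bar.hehe would carry #baz.faz.bar.hehe instead, so bulk reorganizations do not require retagging nodes by hand.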
No one designs a complete, perfect tag structure from the start. It is common to design an initial tag tree and then make changes once you have tested it in practice. Your tag trees will grow over time as analysts identify new observations they want to record. Your analytical needs may change, requiring you to reorganize multiple trees.
This is fine (and expected)! Don’t be afraid to try things or change your mind. In most cases, bulk changes and migrations can be made using Storm.
Any user with the appropriate permissions can create a new tag. The ability to create tags on the fly makes tags extremely flexible and convenient for analysts – they can create annotations to reflect their observations “in the moment” without the need to wait for code changes or approval cycles.
There is also some risk to this approach, particularly with large numbers of analysts, as analysts may create tags in an uncoordinated and haphazard fashion. Creating arbitrary (and potentially duplicative or contradictory) tags can work against effective analysis.
Your approach to tag creation and approval will depend on your needs and your environment. Where possible, we recommend a middle ground between “tag free-for-all” and “tightly-enforced change management”. It is useful for an analyst to create a tag on demand; if they have to wait for review and approval, their observation is likely to be lost as they move on to other tasks. That said, it is also helpful to have some type of regular review process to ensure the tags are being used in a consistent manner, fit appropriately into your analytical model, and have been given clear definitions.
No matter how well-designed a tag tree is, it is ineffective if the tags aren’t used consistently – that is, by a majority of analysts across a majority of relevant data. It’s true that 100% visibility into a given data set and 100% analyst review and annotation of that data is an unrealistic goal. However, for data and annotations that represent your most pressing analytical questions, you should strive for as much completeness as possible.
Looked at another way, inconsistent use of tags can result in gaps that can skew your assessment of the data. At best, this can lead to the inability to draw meaningful conclusions; at worst, to faulty analysis.
Inconsistency often occurs as both the number of analysts and the number of tags increase. The larger the team of analysts, the more difficult it is for that team to work closely and consistently together. Similarly, the more tags available to represent different assessments, the fewer tags an analyst can reasonably work with. In both cases, analysts may tend to drift towards analytical tasks that are most immediately relevant to their work or most interesting to them – thus losing sight of the collective analytical goals of the entire team.
Consider an example of tracking Internet domains that masquerade as legitimate companies for malicious purposes. If some analysts are annotating this data but others are not, your ability to answer questions about this data is skewed. Let’s say Threat Cluster 12 is associated with 200 domains, and 173 of them imitate real companies, but only 42 have been annotated with “masquerade” tags.
If you try to use the data to answer the question “does Threat Cluster 12 consistently register domains that imitate valid companies?”, your assessment is likely to be “no” (only 42 out of 200 domains have the associated tag) based on the incompletely annotated data. There are gaps in your analysis because the information to answer this question has only been partially recorded.
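One way to surface such gaps is to lift the nodes that lack the expected annotation so they can be reviewed. A Storm sketch (both tag names are hypothetical):

```
// Threat Cluster 12 domains that have NOT yet been reviewed for masquerading
inet:fqdn#threat.cluster12 -#masq
```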
As the scope of analysis within Synapse increases, it is essential to recognize these gaps as potential shortcomings that may need to be addressed. Options include:
Establish policy around which assessments and observations (and associated tags) are essential or “required”, and which are secondary (“optional” or “as time allows”).
Designate individual analysts or teams to be responsible for particular tasks and associated tags - often matching their area of expertise, such as “malware analysis”.
Leverage Synapse’s tools such as triggers, cron jobs, or macros to apply tags in cases where this can be automated. Automation also helps to ensure tags are applied consistently. (See Storm Reference - Automation for a more detailed discussion of Synapse’s automation tools.)
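As an illustration of the automation option, a trigger could apply a follow-up tag whenever a related tag is added (the form, tag names, and flags here are assumptions; see the automation reference for exact syntax):

```
// When any inet:fqdn is tagged as Threat Cluster 12, flag it for masquerade review
trigger.add tag:add --form inet:fqdn --tag threat.cluster12 --query { [ +#review.masq ] }
```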