Formats
The Synapse FileParser extracts data from the formats listed below, grouped by parser.
By default the MIME type is auto-detected; however, it can be specified by parser name, MIME type, or alias using --mime.
By default, the size, mime, md5, sha1, and sha256 properties will be updated on the inbound
file:bytes node. Additional properties that may be set are noted.
Field documentation:
--------------------
doc : Documentation for the parser.
name : Name of the parser (which can also be used as an alias).
confdef : Command-line configuration options.
forms : Forms of nodes that the parser may create (or add props to, if the node already exists).
mimes : MIME types the parser supports.
props : Properties that the parser may set on the file:bytes node.
aliases : Alias names which can be used to call the parser.
Name: exe
---------
Documentation:
Portable Executable file parser.
Using the extracted data, one or more mime:pe:* props will be set on the file:bytes node,
and file:mime:pe:* nodes will be created where appropriate. The file will also be sent to
the C2 Config parser, with additional nodes yielded if the configured YARA rules match.
Certificates are also extracted in order to create crypto:x509:signedfile nodes,
and are then sent to the X.509 subparser to create the full crypto:x509:cert nodes.
Properties for certificate chains are also handled.
Each signer item parsed from the file is checked against the included certificates
to see if the signer's certificate is present. The match is made using either the
issuer and serial number, or the key identifier value. Note that it is possible for none
of the included certificates to match, in which case no crypto:x509:signedfile node is created.
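The signer matching described above can be sketched as follows. This is a minimal illustration, assuming certificates and signer info have already been decoded into simple records; the `Cert` and `Signer` classes and the `find_signer_cert` helper are hypothetical, not the FileParser's API, and a real implementation would read these fields from an X.509/PKCS#7 library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cert:
    # Hypothetical, simplified view of an included certificate.
    issuer: str
    serial: int
    key_id: Optional[bytes] = None  # subject key identifier, if present


@dataclass(frozen=True)
class Signer:
    # Hypothetical signer info: identified by issuer + serial, or by key identifier.
    issuer: Optional[str] = None
    serial: Optional[int] = None
    key_id: Optional[bytes] = None


def find_signer_cert(signer: Signer, certs: list[Cert]) -> Optional[Cert]:
    """Return the included certificate that matches the signer, or None."""
    for cert in certs:
        # Match on issuer and serial number when the signer provides both.
        if signer.issuer is not None and signer.serial is not None:
            if cert.issuer == signer.issuer and cert.serial == signer.serial:
                return cert
        # Otherwise fall back to comparing key identifiers.
        if signer.key_id is not None and cert.key_id == signer.key_id:
            return cert
    # No match: no crypto:x509:signedfile node would be created.
    return None
```

Returning `None` rather than raising mirrors the documented behavior that a missing match simply results in no signed-file node.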
If the exe:strings configuration option is set to true, the parser will
attempt to detect strings in the PE. Only strings with a minimum length of
exe:strings:minlen will be detected. Any detected strings will also be added as
it:dev:str nodes with a -(refs)> light-edge to the original node.
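A minimal sketch of minimum-length string detection, assuming only runs of printable ASCII are of interest. `detect_strings` is a hypothetical helper; the parser itself may also detect other encodings (for example UTF-16LE strings common in PE files), which this sketch does not handle.

```python
import re


def detect_strings(data: bytes, minlen: int = 6) -> list[str]:
    """Return runs of printable ASCII that are at least `minlen` characters long."""
    # Printable ASCII is space (0x20) through tilde (0x7e).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % minlen)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]
```

The default of 6 matches the documented default for exe:strings:minlen.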
Aliases: n/a
MIMEs:
application/vnd.microsoft.portable-executable
Forms:
file:bytes
file:mime:pe:export
file:mime:pe:resource
file:mime:pe:section
file:mime:pe:vsvers:info
crypto:x509:signedfile
it:dev:str
Props (file:bytes):
mime:pe:imphash
mime:pe:compiled
mime:pe:pdbpath
mime:pe:exports:time
mime:pe:exports:libname
mime:pe:richhdr
mime:pe:size
Configuration options:
"exe:strings": Enable string processing.
type: boolean
default: false
"exe:strings:minlen": Minimum detected string length in characters.
type: integer
default: 6
"exe:strings:scrape": Scrape detected strings.
type: boolean
default: false
Name: lnk
---------
Documentation:
LNK (Windows shortcut) file parser.
This parser currently has two limitations: network locations are not parsed,
and only LNK files from Windows XP and later are supported.
Aliases: n/a
MIMEs:
application/x-ms-shortcut
Forms:
file:mime:lnk
Props (file:bytes):
n/a
Configuration options:
n/a
Name: pdf
---------
Documentation:
PDF file parser.
The PDF parser extracts the file text (with an optional password),
and passes it through scrape to extract nodes.
Annotations, metadata, and images are not currently parsed.
Rasterized PDFs are not currently supported.
Aliases: n/a
MIMEs:
application/pdf
Forms:
file:path
inet:url
inet:email
inet:server
inet:ipv4
inet:ipv6
inet:fqdn
hash:md5
hash:sha1
hash:sha256
it:sec:cve
it:sec:cwe
it:sec:cpe
crypto:currency:address
Props (file:bytes):
n/a
Configuration options:
n/a
Name: rar
---------
Documentation:
RAR file parser.
A file:archive:entry node is created for each member
file, and each member is passed to other subparsers.
Due to the proprietary nature of RAR, some formats
may not be supported.
Aliases: n/a
MIMEs:
application/vnd.rar
Forms:
file:archive:entry
Props (file:bytes):
n/a
Configuration options:
n/a
Name: tar
---------
Documentation:
Tar file parser.
The file is opened as an uncompressed Tar archive,
and each member is passed to other subparsers.
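Opening a Tar file explicitly as uncompressed and walking its members can be sketched with the standard library's `tarfile` module. The helper name `iter_tar_members` is hypothetical, and handing each member to a subparser is represented here by simply yielding its name and bytes.

```python
import io
import tarfile
from typing import Iterator


def iter_tar_members(data: bytes) -> Iterator[tuple[str, bytes]]:
    """Yield (name, contents) for each regular file in an uncompressed tar."""
    # Mode "r:" opens the archive for reading with no compression.
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, devices, etc.
            fobj = tar.extractfile(member)
            if fobj is not None:
                yield member.name, fobj.read()
```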
Aliases: n/a
MIMEs:
application/x-tar
Forms:
n/a
Props (file:bytes):
n/a
Configuration options:
n/a
Name: pem
---------
Documentation:
PEM file parser.
Each item is extracted from the file and subparsed using the X.509 parser.
Certificate chains are recognized and the appropriate properties
are set in the output.
Aliases: n/a
MIMEs:
application/x-pem-file
Forms:
n/a
Props (file:bytes):
n/a
Configuration options:
n/a
Name: text
----------
Documentation:
Plain text file parser.
By default the parser uses the UTF-8 codec.
Each line is decoded and run through scrape to extract nodes.
Aliases: n/a
MIMEs:
text/plain
Forms:
file:path
inet:url
inet:email
inet:server
inet:ipv4
inet:ipv6
inet:fqdn
hash:md5
hash:sha1
hash:sha256
it:sec:cve
it:sec:cwe
it:sec:cpe
crypto:currency:address
Props (file:bytes):
n/a
Configuration options:
"text:line:maxsize": Break up lines longer than this size in bytes. Specify -1 to remove the limit.
type: integer
default: 104857600
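A sketch of the line handling controlled by text:line:maxsize, assuming lines are split on standard line endings and oversized lines are cut into fixed-size byte pieces. `iter_text_chunks` is hypothetical, and decoding with `errors="replace"` is an assumption; cutting at a byte boundary can split a multi-byte UTF-8 character, which then decodes as a replacement character.

```python
from typing import Iterator


def iter_text_chunks(data: bytes, maxsize: int = 104857600,
                     encoding: str = "utf-8") -> Iterator[str]:
    """Decode each line, breaking up lines longer than `maxsize` bytes."""
    for line in data.splitlines():
        # -1 removes the limit entirely.
        if maxsize == -1 or len(line) <= maxsize:
            yield line.decode(encoding, errors="replace")
            continue
        # Break an oversized line into maxsize-byte pieces.
        for start in range(0, len(line), maxsize):
            yield line[start:start + maxsize].decode(encoding, errors="replace")
```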
Name: csv
---------
Documentation:
CSV file parser.
The Python CSV sniffer is used to auto-detect the dialect and delimiter.
By default the UTF-8 codec is used.
If the MIME is auto-detected, only the following delimiters are considered valid:
{'|', ';', '\t', ','}
Each item in each row is run through the scrape library to generate nodes.
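The dialect sniffing described above maps onto the standard library's `csv.Sniffer.sniff`, whose `delimiters` argument restricts the candidates. This is a minimal sketch: the helper name `csv_items` and the 4096-character sample size are assumptions, and unlike the parser it always restricts the sniffer to the listed delimiters rather than only when the MIME is auto-detected.

```python
import csv
import io


def csv_items(text: str) -> list[str]:
    """Sniff the dialect, then return every item from every row."""
    sample = text[:4096]  # assumed sample size for sniffing
    # Only consider the delimiters listed above.
    dialect = csv.Sniffer().sniff(sample, delimiters="|;\t,")
    reader = csv.reader(io.StringIO(text), dialect)
    # Each item would then be run through scrape.
    return [item for row in reader for item in row]
```

`sniff` raises `csv.Error` when no consistent delimiter is found, so a caller would need to handle that case.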
Aliases: n/a
MIMEs:
text/csv
Forms:
file:path
inet:url
inet:email
inet:server
inet:ipv4
inet:ipv6
inet:fqdn
hash:md5
hash:sha1
hash:sha256
it:sec:cve
it:sec:cwe
it:sec:cpe
crypto:currency:address
Props (file:bytes):
n/a
Configuration options:
n/a
Name: xml
---------
Documentation:
XML file parser.
The lxml library is used to extract text within all tags,
which is then run through the scrape library to generate nodes.
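A minimal sketch of collecting the text within all tags. It swaps in the standard library's `xml.etree.ElementTree` for lxml, and the helper name `xml_text` is hypothetical. Python's documentation warns that its standard XML parsers are not hardened against maliciously constructed data, so untrusted files may call for a hardened parser such as defusedxml.

```python
import xml.etree.ElementTree as ET


def xml_text(data: bytes) -> list[str]:
    """Return the non-empty text fragments found inside all elements."""
    root = ET.fromstring(data)
    # itertext() walks the tree in document order, yielding element text
    # and the tail text that follows nested elements.
    return [s.strip() for s in root.itertext() if s.strip()]
```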
Aliases: n/a
MIMEs:
text/xml
application/xml
Forms:
file:path
inet:url
inet:email
inet:server
inet:ipv4
inet:ipv6
inet:fqdn
hash:md5
hash:sha1
hash:sha256
it:sec:cve
it:sec:cwe
it:sec:cpe
crypto:currency:address
Props (file:bytes):
n/a
Configuration options:
n/a
Name: zip
---------
Documentation:
Zip file parser.
The file is parsed depending on the type of archive detected.
Android and Java archives are detected in order to set the mime,
but no further parsing is performed.
If the archive is identified as an OpenXML file, the following XML files are extracted
and sent to a subparser to generate nodes:
ppt/slides/*.xml
word/document.xml
xl/sharedStrings.xml
If available, metadata will also be extracted to create file:mime:* nodes.
Additional embedded objects within the OpenXML file (e.g. macros, images) are not
currently extracted.
If the archive is not identified as one of the above, the archive is treated as a
normal Zip archive (with optional password support), and each file is extracted
and sent to a subparser to generate nodes.
If the ZIP archive is detected as needing a password, but no password is
specified, this parser will try several common passwords in an attempt
to extract files from the archive. The passwords that will be attempted
are: infected, infected666, password123, malware
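The password fallback can be sketched with the standard library's `zipfile` module. This is an illustration under assumptions: `read_zip_members` is a hypothetical helper, and `zipfile` can only decrypt legacy ZipCrypto archives (AES-encrypted members raise `NotImplementedError`), so the real parser may rely on a different library.

```python
import io
import zipfile
import zlib
from typing import Optional

# The common passwords listed above; zipfile expects passwords as bytes.
COMMON_PASSWORDS = [b"infected", b"infected666", b"password123", b"malware"]


def read_zip_members(data: bytes,
                     password: Optional[bytes] = None) -> dict[str, bytes]:
    """Extract each file, trying common passwords when none is supplied."""
    results: dict[str, bytes] = {}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Bit 0 of the general purpose flags marks an encrypted member.
            if not info.flag_bits & 0x1:
                results[info.filename] = zf.read(info)
                continue
            candidates = [password] if password else COMMON_PASSWORDS
            for pwd in candidates:
                try:
                    results[info.filename] = zf.read(info, pwd=pwd)
                    break
                except (RuntimeError, zipfile.BadZipFile, zlib.error):
                    # Wrong password (or corrupt data): try the next candidate.
                    continue
            # Members that no candidate unlocks are left out of the results.
    return results
```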
Aliases: xlsx, docx, pptx
MIMEs:
application/zip
application/vnd.android.package-archive
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
application/vnd.ms-excel.sheet.binary.macroEnabled.12
application/vnd.openxmlformats-officedocument.wordprocessingml.document
application/vnd.openxmlformats-officedocument.presentationml.presentation
application/java-archive
Forms:
file:mime:msppt
file:mime:msxls
file:mime:msdoc
Props (file:bytes):
n/a
Configuration options:
n/a
Name: gzip
----------
Documentation:
GZip file parser.
The file is opened as a gzip archive, and the uncompressed bytes are passed
to other subparsers.
Aliases: n/a
MIMEs:
application/gzip
Forms:
n/a
Props (file:bytes):
n/a
Configuration options:
n/a
Name: html
----------
Documentation:
HTML file parser.
This parser loads the HTML as structured data and scrapes text from the following items
to create nodes:
- <title> text
- All text within navigable tags
Href and src attributes from tags are analyzed to form inet:url/inet:email/tel:phone
nodes, depending on the protocol information given in the URI. If no protocol
information is found, the parser will fall back to scraping the text of the href/src
attribute.
By default the following tags are excluded:
script, link, style, comment, img, image, audio, video, input, embed, source, iframe, track
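A sketch of the protocol-based dispatch for href/src values, using the standard library's `urllib.parse.urlsplit`. The helper `classify_link`, the `"scrape"` fallback marker, and the set of schemes treated as URLs are all hypothetical; the parser's actual scheme handling may differ.

```python
from urllib.parse import urlsplit

# Assumed scheme grouping for inet:url; the parser's actual list may differ.
URL_SCHEMES = {"http", "https", "ftp"}


def classify_link(value: str) -> tuple[str, str]:
    """Map an href/src value to a (form, value) pair based on its scheme."""
    value = value.strip()
    parts = urlsplit(value)
    scheme = parts.scheme.lower()
    if scheme == "mailto":
        # mailto:user@example.com?subject=hi -> keep only the address
        return "inet:email", parts.path
    if scheme == "tel":
        return "tel:phone", parts.path
    if scheme in URL_SCHEMES:
        return "inet:url", value
    # No recognized protocol: fall back to scraping the attribute text.
    return "scrape", value
```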
Aliases: n/a
MIMEs:
text/html
application/xhtml+xml
Forms:
file:path
inet:url
inet:email
inet:server
inet:ipv4
inet:ipv6
inet:fqdn
hash:md5
hash:sha1
hash:sha256
it:sec:cve
it:sec:cwe
it:sec:cpe
crypto:currency:address
Props (file:bytes):
n/a
Configuration options:
n/a
Name: json
----------
Documentation:
JSON file parser.
Decode the file as JSON and extract strings from keys and values,
which are run through the scrape library to generate nodes.
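Extracting strings from both keys and values amounts to a recursive walk over the decoded structure. A minimal sketch, with the hypothetical helper name `iter_json_strings`:

```python
import json
from typing import Any, Iterator


def iter_json_strings(obj: Any) -> Iterator[str]:
    """Recursively yield every string key and string value."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if isinstance(key, str):
                yield key
            yield from iter_json_strings(value)
    elif isinstance(obj, list):
        for item in obj:
            yield from iter_json_strings(item)
    elif isinstance(obj, str):
        yield obj
    # Numbers, booleans, and null carry no text to scrape.
```

The same walk could be applied to `json.loads(line)` for each non-empty line of a JSON lines file.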
Aliases: n/a
MIMEs:
application/json
Forms:
file:path
inet:url
inet:email
inet:server
inet:ipv4
inet:ipv6
inet:fqdn
hash:md5
hash:sha1
hash:sha256
it:sec:cve
it:sec:cwe
it:sec:cpe
crypto:currency:address
Props (file:bytes):
n/a
Configuration options:
"json:maxsize": Maximum size to load as a JSON object in bytes.
type: integer
default: 104857600
Name: jsonl
-----------
Documentation:
JSON lines file parser.
Decode the file as JSON lines and extract strings from keys and values,
which are run through the scrape library to generate nodes.
Aliases: n/a
MIMEs:
application/jsonlines
Forms:
file:path
inet:url
inet:email
inet:server
inet:ipv4
inet:ipv6
inet:fqdn
hash:md5
hash:sha1
hash:sha256
it:sec:cve
it:sec:cwe
it:sec:cpe
crypto:currency:address
Props (file:bytes):
n/a
Configuration options:
"jsonl:maxsize": Maximum size to load as a JSON object in bytes.
type: integer
default: 104857600
Name: mbox
----------
Documentation:
MBOX file parser.
Each message is extracted and subparsed using the RFC822 parser.
Message bytes are linked to the MBOX file through a file:subfile relationship.
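Splitting an MBOX file into individual messages can be sketched with the standard library's `mailbox` module. The helper name `iter_mbox_messages` is hypothetical, and creating the file:subfile relationship is omitted; each yielded message is what would be handed to the RFC822 parser.

```python
import mailbox
from typing import Iterator


def iter_mbox_messages(path: str) -> Iterator[bytes]:
    """Yield the raw bytes of each message in an MBOX file."""
    box = mailbox.mbox(path, create=False)
    try:
        for message in box:
            # The "From " envelope line is not part of as_bytes() output.
            yield message.as_bytes()
    finally:
        box.close()
```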
Aliases: n/a
MIMEs:
application/mbox
Forms:
n/a
Props (file:bytes):
n/a
Configuration options:
n/a
Name: x509
----------
Documentation:
X.509 DER parser.
The first Common Name extracted is used to set the name and mime:x509:cn properties
on the file:bytes node.
CN and SAN values are also parsed to populate the following identities on the
crypto:x509:cert node:
ipv4, ipv6, url, fqdn, email
URLs from CRL distribution points are also included in identities:urls.
Aliases: n/a
MIMEs:
application/x-x509-ca-cert
application/x-x509-user-cert
application/x-x509-server-cert
Forms:
crypto:x509:cert
Props (file:bytes):
name
mime:x509:cn
Configuration options:
n/a
Name: yara
----------
Documentation:
YARA file parser.
Extract YARA rules and create it:app:yara:rule nodes.
Imports are detected and handled during parsing.
The raw lines from the file are also run through the scrape library to generate nodes.
Aliases: n/a
MIMEs:
text/x-yara
Forms:
it:app:yara:rule
file:path
inet:url
inet:email
inet:server
inet:ipv4
inet:ipv6
inet:fqdn
hash:md5
hash:sha1
hash:sha256
it:sec:cve
it:sec:cwe
it:sec:cpe
crypto:currency:address
Props (file:bytes):
n/a
Configuration options:
n/a
Name: image
-----------
Documentation:
Image metadata and OCR parser.
For JPEG and TIFF images, the EXIF data is extracted to create file:mime:jpg and
file:mime:tif nodes, respectively.
For PNG and GIF images, the metadata is extracted using the Python Imaging Library (PIL)
to create file:mime:png and file:mime:gif nodes.
Using the --debug flag will print the raw metadata in the output stream.
For all image types, Optical Character Recognition (OCR) is run to extract the text lines,
of which the first 100kB are used to set the file:mime:image text property.
Additional nodes may be created by passing the OCR text through the scrape library.
Aliases: n/a
MIMEs:
image/png
image/gif
image/jpeg
image/tiff
Forms:
file:mime:png
file:mime:gif
file:mime:jpg
file:mime:tif
file:path
inet:url
inet:email
inet:server
inet:ipv4
inet:ipv6
inet:fqdn
hash:md5
hash:sha1
hash:sha256
it:sec:cve
it:sec:cwe
it:sec:cpe
crypto:currency:address
Props (file:bytes):
n/a
Configuration options:
"image:ocr:confidence:min": The image must contain at least one OCR segment above this confidence level for text to be returned.
type: integer
default: 75
"image:ocr:tesseract:psm": Set the Tesseract OCR Page Segmentation Mode (PSM).
type: integer
default: 3
"image:ocr:tesseract:thresholding_method": Set the Tesseract OCR Thresholding Method.
type: integer
default: 1
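The confidence gate and the 100kB text limit can be sketched as below. This assumes the OCR engine (for example Tesseract) has already produced `(text, confidence)` pairs; the helper `select_ocr_text`, the strict "above" comparison, and reading "100kB" as 100 * 1024 UTF-8 bytes are all assumptions.

```python
from typing import Optional

# Assumption: "100kB" is taken here as 100 * 1024 bytes of UTF-8 text.
MAX_TEXT_BYTES = 100 * 1024


def select_ocr_text(segments: list[tuple[str, float]],
                    min_conf: float = 75) -> Optional[str]:
    """Join OCR lines if at least one segment is above the confidence bar."""
    # Return nothing unless some segment is above the threshold.
    if not any(conf > min_conf for _, conf in segments):
        return None
    text = "\n".join(line for line, _ in segments)
    # Truncate to the byte limit, dropping any partial trailing character.
    return text.encode("utf-8")[:MAX_TEXT_BYTES].decode("utf-8", errors="ignore")
```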
Name: macho
-----------
Documentation:
Mach-O parser for Apple's executable format.
Handles both single Mach-O files and FAT/Universal binaries, which are wrappers
for multiple Mach-O files.
Certificates are handled in exactly the same manner as in the EXE parser; they are
extracted from the CODE_SIGNATURE load command of the corresponding Mach-O file.
Aliases: n/a
MIMEs:
application/x-mach-binary
Forms:
file:bytes
file:mime:macho:loadcmd
file:mime:macho:uuid
file:mime:macho:version
file:mime:macho:segment
file:mime:macho:section
crypto:x509:signedfile
Props (file:bytes):
n/a
Configuration options:
n/a
Name: rfc822
------------
Documentation:
RFC822 e-mail parser.
A default list of headers is extracted and run through scrape to extract nodes.
These headers are also used to set props on the created inet:email:message node.
Default headers:
to, from, sender, subject, date, reply-to, x-mailer, return-path
Attachments are extracted from the e-mail, with all multiparts followed,
and inet:email:message:attachment nodes are created.
Each attachment is parsed as a subfile, and therefore additional nodes
may be created depending on the format.
The content body of the e-mail is extracted as both plain text and HTML.
The inet:email:message:body prop is set to the plain text if it exists,
else to the HTML if it exists. The body is then passed to a subparser to extract
additional nodes, preferring HTML over plain text.
If an inet:url is scraped an inet:email:message:link node is also created.
Finally, an inet:email:message node is created for each recipient in
the "to", "cc", and "bcc" fields.
If no recipient is identified, a generic inet:email:message node is created.
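The header extraction, body selection, and attachment discovery described above can be sketched with the standard library's `email` package. The helper `parse_message` and its returned dict layout are hypothetical; node creation, scraping, and per-recipient messages are omitted.

```python
import email
from email import policy
from typing import Optional

# Default headers listed above.
DEFAULT_HEADERS = ["to", "from", "sender", "subject", "date",
                   "reply-to", "x-mailer", "return-path"]


def parse_message(raw: bytes) -> dict:
    """Collect default headers, body text, and attachment names."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    headers = {name: str(msg[name]) for name in DEFAULT_HEADERS
               if msg[name] is not None}

    plain_part = msg.get_body(preferencelist=("plain",))
    html_part = msg.get_body(preferencelist=("html",))
    # Compare against None: a Message with no headers is falsy.
    plain: Optional[str] = plain_part.get_content() if plain_part is not None else None
    html: Optional[str] = html_part.get_content() if html_part is not None else None

    return {
        "headers": headers,
        # Body prop: plain text if present, else HTML.
        "body": plain if plain is not None else html,
        # Subparser input: HTML preferred over plain text.
        "subparse": html if html is not None else plain,
        # walk() follows every nested multipart.
        "attachments": [part.get_filename() for part in msg.walk()
                        if part.is_attachment()],
    }
```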
Aliases: email, message
MIMEs:
message/rfc822
Forms:
inet:email:message
inet:email:message:attachment
inet:email:message:link
file:path
inet:url
inet:email
inet:server
inet:ipv4
inet:ipv6
inet:fqdn
hash:md5
hash:sha1
hash:sha256
it:sec:cve
it:sec:cwe
it:sec:cpe
crypto:currency:address
Props (file:bytes):
n/a
Configuration options:
"email:body:maxlen": Maximum length of the body to extract in bytes.
type: integer
default: 1048576
Name: 7zip
----------
Documentation:
7z file parser.
The file is opened as a 7zip archive (with an optional password),
and the uncompressed bytes for each contained file are passed to
other subparsers.
If the 7zip archive is detected as needing a password, but no password is
specified, this parser will try several common passwords in an attempt
to extract files from the archive. The passwords that will be attempted
are: infected, infected666, password123, malware
Aliases: n/a
MIMEs:
application/x-7z-compressed
Forms:
n/a
Props (file:bytes):
n/a
Configuration options:
n/a
Name: msoffice
--------------
Documentation:
Parser for binary MS Office files.
Metadata and macros are extracted for all file types,
and for MS Office files the appropriate file:mime:* nodes will be created.
Currently, text scraping is only supported for Excel files.
Each cell value is extracted and run through the scrape library to generate nodes.
Word and PowerPoint files are detected, and the appropriate mime
is set on the file:bytes node.
Outlook files will have header values scraped (if available), and will create
inet:email:message nodes. The default headers used to populate
inet:email:message:headers are identical to the RFC822 parser.
Note: for newer MS Office files, use the zip parser instead (e.g. xlsx rather than xls).
Aliases: n/a
MIMEs:
application/msword
application/vnd.ms-excel
application/vnd.ms-outlook
application/vnd.ms-powerpoint
Forms:
file:path
inet:url
inet:email
inet:server
inet:ipv4
inet:ipv6
inet:fqdn
hash:md5
hash:sha1
hash:sha256
it:sec:cve
it:sec:cwe
it:sec:cpe
crypto:currency:address
file:mime:msppt
file:mime:msxls
file:mime:msdoc
inet:email:message
inet:email:message:attachment
Props (file:bytes):
n/a
Configuration options:
n/a
Name: rtf
---------
Documentation:
RTF file parser.
First the main body is extracted and run through the scrape library to generate nodes.
Metadata is also extracted; however, it can currently only be viewed with the --debug flag.
The following fields are also extracted and run through scrape:
fldinst
Images identified by the following fields are extracted and run through a subparser:
pict
Additional embedded objects identified by oletools are also extracted and run through a
subparser.
Aliases: n/a
MIMEs:
application/rtf
Forms:
file:path
inet:url
inet:email
inet:server
inet:ipv4
inet:ipv6
inet:fqdn
hash:md5
hash:sha1
hash:sha256
it:sec:cve
it:sec:cwe
it:sec:cpe
crypto:currency:address
Props (file:bytes):
n/a
Configuration options:
n/a
Name: c2config
--------------
Documentation:
C2 configuration parser (beta).
This parser will first attempt to match the file against a set of
pre-configured YARA rules for the following families:
- Cobalt Strike BEACON
For YARA rules that match, it:app:yara:rule and it:app:yara:match nodes will be created.
A parser for the matching family will then attempt to extract the configuration,
and if successful will create an it:sec:c2:config node.
Aliases: n/a
MIMEs:
n/a
Forms:
it:app:yara:rule
it:app:yara:match
it:sec:c2:config
Props (file:bytes):
n/a
Configuration options:
n/a