Formats

The Synapse FileParser extracts data from the formats listed below for each parser. By default the MIME type is auto-detected; it can also be specified by parser name, MIME type, or alias using --mime.
By default, the size, mime, md5, sha1, and sha256 properties will be updated on the inbound file:bytes node. Additional properties that may be set are noted.

Field documentation:
--------------------
doc      : Documentation for the parser.
name     : Name of the parser (which can also be used as an alias).
confdef  : Command-line configuration options.
forms    : Forms of nodes that the parser may create (or add props to, if the node already exists).
mimes    : MIME types the parser supports.
props    : Properties that the parser may set on the file:bytes node.
aliases  : Alias names which can be used to call the parser.


Name: exe
---------

Documentation:
    Portable Executable file parser.

    Using the extracted data, one or more mime:pe:* props will be set on the file:bytes node,
    and file:mime:pe:* nodes will be created where appropriate. The file will also be sent to
    the C2 Config parser, with additional nodes yielded if the configured YARA rules match.

    Certificates are also extracted in order to create crypto:x509:signedfile nodes,
    and are then sent to the X.509 subparser to create the full crypto:x509:cert nodes.
    Properties for certificate chains are also handled.

    Each signer item parsed from the file is checked against the included certificates
    to see if the signer's certificate is present. The match is executed using either
    issuer and serial or the key identifier value. Note that it is possible for none of the
    included certificates to match, in which case a crypto:x509:signedfile node will not
    be created.
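
    The signer lookup described above can be sketched as follows. This is a minimal
    illustration and not the parser's actual code: the Cert and Signer classes and the
    match_signer function are hypothetical names, and real certificates carry many
    more fields.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Cert:
    """Hypothetical stand-in for a certificate embedded in the file."""
    issuer: str
    serial: int
    key_id: Optional[bytes] = None


@dataclass(frozen=True)
class Signer:
    """Hypothetical signer item: identified by issuer+serial OR a key identifier."""
    issuer: Optional[str] = None
    serial: Optional[int] = None
    key_id: Optional[bytes] = None


def match_signer(signer: Signer, certs: Iterable[Cert]) -> Optional[Cert]:
    """Return the included certificate that belongs to the signer, if any."""
    for cert in certs:
        if signer.issuer is not None and signer.serial is not None:
            # Match on the (issuer, serial) pair.
            if (cert.issuer, cert.serial) == (signer.issuer, signer.serial):
                return cert
        elif signer.key_id is not None and cert.key_id == signer.key_id:
            # Otherwise fall back to the key identifier value.
            return cert
    # No included certificate matched, so no signedfile node would be made.
    return None
```

    A None result corresponds to the case where no crypto:x509:signedfile node is created.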

    If the exe:strings configuration option is set to true, the parser will
    attempt to detect strings in the PE. Only strings with a minimum length of
    exe:strings:minlen will be detected. Any detected strings will also be added as
    it:dev:str nodes with a -(refs)> light-edge to the original node.
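
    To illustrate what a minimum-length string scan looks like, here is a small sketch.
    It is an assumption-laden example, not the parser's implementation: it only looks
    for printable ASCII runs, and the find_strings name is hypothetical.

```python
import re


def find_strings(data: bytes, minlen: int = 6) -> list[str]:
    """Return printable ASCII runs that are at least minlen characters long."""
    # Printable ASCII spans 0x20 (space) through 0x7e (~).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % minlen)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


sample = b"\x00\x01MZ\x90\x00kernel32.dll\x00abc\x00GetProcAddress\xff"
print(find_strings(sample))
```

    With the default of 6, short runs such as "MZ" and "abc" in the sample are skipped.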

Aliases: n/a

MIMEs:
    application/vnd.microsoft.portable-executable

Forms:
    file:bytes
    file:mime:pe:export
    file:mime:pe:resource
    file:mime:pe:section
    file:mime:pe:vsvers:info
    crypto:x509:signedfile
    it:dev:str

Props (file:bytes):
    mime:pe:imphash
    mime:pe:compiled
    mime:pe:pdbpath
    mime:pe:exports:time
    mime:pe:exports:libname
    mime:pe:richhdr
    mime:pe:size

Configuration options:

    "exe:strings":  Enable string processing.
        type: boolean
        default: false

    "exe:strings:minlen":  Minimum detected string length in characters.
        type: integer
        default: 6

    "exe:strings:scrape":  Scrape detected strings.
        type: boolean
        default: false


Name: lnk
---------

Documentation:


    This parser currently has two limitations: network locations are not parsed,
    and only LNK files from Windows XP and later are supported.

Aliases: n/a

MIMEs:
    application/x-ms-shortcut

Forms:
    file:mime:lnk

Props (file:bytes):
    n/a

Configuration options:
    n/a


Name: pdf
---------

Documentation:
    PDF file parser.

    The PDF parser extracts the file text (with an optional password),
    and passes it through scrape to extract nodes.

    Annotations, metadata, and images are not currently parsed.
    Rasterized PDFs are not currently supported.

Aliases: n/a

MIMEs:
    application/pdf

Forms:
    file:path
    inet:url
    inet:email
    inet:server
    inet:ipv4
    inet:ipv6
    inet:fqdn
    hash:md5
    hash:sha1
    hash:sha256
    it:sec:cve
    it:sec:cwe
    it:sec:cpe
    crypto:currency:address

Props (file:bytes):
    n/a

Configuration options:
    n/a


Name: rar
---------

Documentation:
    RAR file parser.

    A file:archive:entry node is created for each member
    file and is passed to other subparsers.

    Due to the proprietary nature of RAR some formats
    may not be supported.

Aliases: n/a

MIMEs:
    application/vnd.rar

Forms:
    file:archive:entry

Props (file:bytes):
    n/a

Configuration options:
    n/a


Name: tar
---------

Documentation:
    Tar file parser.

    The file is opened as an uncompressed Tar archive,
    and each member is passed to other subparsers.
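
    As a rough sketch of this flow using the Python standard library (an assumption;
    the parser's own code may differ), the archive can be opened without decompression
    and each regular file handed on. The iter_members name is hypothetical.

```python
import io
import tarfile


def iter_members(data: bytes):
    """Yield (name, content) for each regular file in an uncompressed tar."""
    # Mode "r:" opens the archive exclusively without compression.
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:") as tf:
        for member in tf:
            if member.isfile():
                # extractfile returns a file-like object for regular files.
                yield member.name, tf.extractfile(member).read()
```

    Each yielded pair would then be passed to whichever subparser fits the member.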

Aliases: n/a

MIMEs:
    application/x-tar

Forms:
    n/a

Props (file:bytes):
    n/a

Configuration options:
    n/a


Name: pem
---------

Documentation:
    PEM file parser.

    Each item is extracted from the file and subparsed using the X.509 parser.
    Certificate chains are recognized and the appropriate properties
    are set in the output.

Aliases: n/a

MIMEs:
    application/x-pem-file

Forms:
    n/a

Props (file:bytes):
    n/a

Configuration options:
    n/a


Name: text
----------

Documentation:
    Plain text file parser.

    By default the parser uses the UTF-8 codec.
    Each line is decoded and run through scrape to extract nodes.

Aliases: n/a

MIMEs:
    text/plain

Forms:
    file:path
    inet:url
    inet:email
    inet:server
    inet:ipv4
    inet:ipv6
    inet:fqdn
    hash:md5
    hash:sha1
    hash:sha256
    it:sec:cve
    it:sec:cwe
    it:sec:cpe
    crypto:currency:address

Props (file:bytes):
    n/a

Configuration options:

    "text:line:maxsize":  Break up lines over this size in bytes. Specify -1 to remove limits.
        type: integer
        default: 104857600
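
    The effect of text:line:maxsize can be pictured with the sketch below. It is only
    an illustration under the assumption that oversized lines are cut into fixed-size
    chunks; split_line is a hypothetical helper, and a real implementation would also
    need to avoid cutting through a multi-byte UTF-8 character.

```python
def split_line(line: bytes, maxsize: int = 104857600) -> list[bytes]:
    """Break a line into chunks of at most maxsize bytes (-1 disables the limit)."""
    if maxsize == -1 or len(line) <= maxsize:
        return [line]
    # Slice the line into consecutive chunks of maxsize bytes.
    return [line[i:i + maxsize] for i in range(0, len(line), maxsize)]
```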


Name: csv
---------

Documentation:
    CSV file parser.

    The Python CSV sniffer is used to auto-detect the dialect and delimiter.
    By default the UTF-8 codec is used.

    If the MIME is auto-detected only the following delimiters are considered valid:
    {'\t', ',', ';', '|'}

    Each item in each row is run through the scrape library to generate nodes.
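
    A minimal sketch of dialect detection with the Python csv module, restricted to the
    delimiters listed above. The read_rows helper and the 4096-character sample size
    are assumptions made for illustration.

```python
import csv
import io

# Delimiters considered valid during auto-detection: tab, comma, semicolon, pipe.
ALLOWED_DELIMITERS = "\t,;|"


def read_rows(text: str) -> list[list[str]]:
    """Sniff the dialect from the start of the text, then parse every row."""
    dialect = csv.Sniffer().sniff(text[:4096], delimiters=ALLOWED_DELIMITERS)
    return list(csv.reader(io.StringIO(text), dialect))


sample = "host;ip\nexample.com;192.0.2.1\nexample.org;192.0.2.2\n"
for row in read_rows(sample):
    print(row)
```

    Note that csv.Sniffer raises csv.Error when it cannot determine a delimiter.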

Aliases: n/a

MIMEs:
    text/csv

Forms:
    file:path
    inet:url
    inet:email
    inet:server
    inet:ipv4
    inet:ipv6
    inet:fqdn
    hash:md5
    hash:sha1
    hash:sha256
    it:sec:cve
    it:sec:cwe
    it:sec:cpe
    crypto:currency:address

Props (file:bytes):
    n/a

Configuration options:
    n/a


Name: xml
---------

Documentation:
    XML file parser.

    The Python lxml parser is used to extract text within all tags,
    which is then run through the scrape library to generate nodes.
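
    The idea of collecting text from all tags can be sketched as below. To keep the
    example self-contained it uses the standard library xml.etree.ElementTree rather
    than lxml, and iter_text is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def iter_text(xml_bytes: bytes):
    """Yield each non-empty text fragment found inside any element."""
    root = ET.fromstring(xml_bytes)
    # itertext walks element text and tails in document order.
    for chunk in root.itertext():
        chunk = chunk.strip()
        if chunk:
            yield chunk
```

    Each yielded fragment is the kind of value that would be handed to scrape.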

Aliases: n/a

MIMEs:
    text/xml
    application/xml

Forms:
    file:path
    inet:url
    inet:email
    inet:server
    inet:ipv4
    inet:ipv6
    inet:fqdn
    hash:md5
    hash:sha1
    hash:sha256
    it:sec:cve
    it:sec:cwe
    it:sec:cpe
    crypto:currency:address

Props (file:bytes):
    n/a

Configuration options:
    n/a


Name: zip
---------

Documentation:
    Zip file parser.

    The file is parsed depending on the type of archive detected.

    Android and Java archives are detected to set the mime,
    but no further parsing is executed.

    If the archive is identified as an OpenXML file, the XML files below are extracted
    and sent to a subparser to generate nodes. If available, metadata will also be extracted
    to create file:mime:* nodes.
    ppt/slides/*.xml
    word/document.xml
    xl/sharedStrings.xml
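
    Selecting those members from an archive might look like the following sketch, which
    uses the standard zipfile and fnmatch modules. The openxml_members name is
    hypothetical, and fnmatch's "*" also matches "/" which may be broader than the
    parser's own matching.

```python
import fnmatch
import io
import zipfile

# Member paths listed in the parser documentation.
OPENXML_PATTERNS = ("ppt/slides/*.xml", "word/document.xml", "xl/sharedStrings.xml")


def openxml_members(data: bytes) -> list[str]:
    """Return archive member names that match one of the OpenXML patterns."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return [
            name for name in zf.namelist()
            if any(fnmatch.fnmatchcase(name, pat) for pat in OPENXML_PATTERNS)
        ]
```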

    Additional embedded objects within the OpenXML file are not currently extracted
    (e.g. macros, images).

    If the archive is not identified as one of the above, the archive is treated as a
    normal Zip archive (with optional password support), and each file is extracted
    and sent to a subparser to generate nodes.

    If the ZIP archive is detected as needing a password, but a password is
    not specified, this parser will try several common passwords in an attempt
    to extract files from the archive. The passwords that will be attempted
    are: infected, infected666, password123, malware
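
    The fallback behavior can be sketched independently of any archive library. In this
    illustration try_extract is a hypothetical callback that returns True when a
    password opens the archive; find_password is likewise an assumed name.

```python
from typing import Callable, Iterable, Optional

# Fallback passwords listed in the parser documentation.
COMMON_PASSWORDS = ("infected", "infected666", "password123", "malware")


def find_password(
    try_extract: Callable[[str], bool],
    supplied: Optional[str] = None,
    fallbacks: Iterable[str] = COMMON_PASSWORDS,
) -> Optional[str]:
    """Return the first password accepted by try_extract, or None."""
    # Only fall back to the common list when no password was specified.
    candidates = [supplied] if supplied is not None else list(fallbacks)
    for password in candidates:
        if try_extract(password):
            return password
    return None
```

    Keeping the archive access behind a callback means the same loop could serve any
    password-protected format.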

Aliases: xlsx, docx, pptx

MIMEs:
    application/zip
    application/vnd.android.package-archive
    application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
    application/vnd.ms-excel.sheet.binary.macroEnabled.12
    application/vnd.openxmlformats-officedocument.wordprocessingml.document
    application/vnd.openxmlformats-officedocument.presentationml.presentation
    application/java-archive

Forms:
    file:mime:msppt
    file:mime:msxls
    file:mime:msdoc

Props (file:bytes):
    n/a

Configuration options:
    n/a


Name: gzip
----------

Documentation:
    GZip file parser.

    The file is opened as a gzip archive, and the uncompressed bytes are passed
    to other subparsers.

Aliases: n/a

MIMEs:
    application/gzip

Forms:
    n/a

Props (file:bytes):
    n/a

Configuration options:
    n/a


Name: html
----------

Documentation:
    HTML file parser.

    This parser loads the HTML as structured data and scrapes text from the following items
    to create nodes:
    - <title> text
    - All text within navigable tags

    Href and src attributes from tags are analyzed to form inet:url/inet:email/tel:phone
    nodes, depending on the protocol information given in the href URI. If no protocol
    information is found, the parser will fall back to scraping the text of the href/src
    attribute.
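
    A sketch of scheme-based dispatch for attribute values follows. The scheme-to-form
    table is an assumption made for illustration (the documentation does not list the
    exact schemes), and classify_href is a hypothetical name.

```python
from urllib.parse import urlsplit

# Assumed mapping from URI scheme to the form that would be created.
SCHEME_FORMS = {
    "http": "inet:url",
    "https": "inet:url",
    "ftp": "inet:url",
    "mailto": "inet:email",
    "tel": "tel:phone",
}


def classify_href(href: str) -> tuple[str, str]:
    """Return (form, value), or ("scrape", text) when no known scheme is present."""
    href = href.strip()
    parts = urlsplit(href)
    form = SCHEME_FORMS.get(parts.scheme.lower())
    if form is None:
        # No usable protocol information: fall back to scraping the raw text.
        return ("scrape", href)
    if form == "inet:url":
        return (form, href)
    # For mailto: and tel: the address or number is carried in the path.
    return (form, parts.path)
```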

    By default the following tags are excluded:
    script, link, style, comment, img, image, audio, video, input, embed, source, iframe,
    track

Aliases: n/a

MIMEs:
    text/html
    application/xhtml+xml

Forms:
    file:path
    inet:url
    inet:email
    inet:server
    inet:ipv4
    inet:ipv6
    inet:fqdn
    hash:md5
    hash:sha1
    hash:sha256
    it:sec:cve
    it:sec:cwe
    it:sec:cpe
    crypto:currency:address

Props (file:bytes):
    n/a

Configuration options:
    n/a


Name: json
----------

Documentation:
    JSON file parser.

    Decode the file as JSON and extract strings from keys and values,
    which are run through the scrape library to generate nodes.
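
    Collecting strings from both keys and values can be sketched with a small recursive
    walk. The iter_strings name is hypothetical and the sample document is made up.

```python
import json


def iter_strings(obj):
    """Yield every string found among the keys and values of decoded JSON."""
    if isinstance(obj, str):
        yield obj
    elif isinstance(obj, dict):
        for key, value in obj.items():
            yield key
            yield from iter_strings(value)
    elif isinstance(obj, list):
        for item in obj:
            yield from iter_strings(item)
    # Numbers, booleans, and null carry no text and are skipped.


doc = json.loads('{"c2": ["example.com", {"port": 443, "uri": "/gate.php"}], "ok": true}')
print(list(iter_strings(doc)))
```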

Aliases: n/a

MIMEs:
    application/json

Forms:
    file:path
    inet:url
    inet:email
    inet:server
    inet:ipv4
    inet:ipv6
    inet:fqdn
    hash:md5
    hash:sha1
    hash:sha256
    it:sec:cve
    it:sec:cwe
    it:sec:cpe
    crypto:currency:address

Props (file:bytes):
    n/a

Configuration options:

    "json:maxsize":  Maximum size to load as a JSON object in bytes.
        type: integer
        default: 104857600


Name: jsonl
-----------

Documentation:
    JSON lines file parser.

    Decode the file as JSON lines and extract strings from keys and values,
    which are run through the scrape library to generate nodes.
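
    For the JSON lines case, decoding happens one line at a time. The sketch below
    assumes blank lines are skipped, which the documentation does not state; iter_jsonl
    is a hypothetical name.

```python
import json


def iter_jsonl(data: bytes):
    """Yield (line_number, decoded_object) for each non-blank line."""
    for lineno, line in enumerate(data.splitlines(), 1):
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        yield lineno, json.loads(line)
```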

Aliases: n/a

MIMEs:
    application/jsonlines

Forms:
    file:path
    inet:url
    inet:email
    inet:server
    inet:ipv4
    inet:ipv6
    inet:fqdn
    hash:md5
    hash:sha1
    hash:sha256
    it:sec:cve
    it:sec:cwe
    it:sec:cpe
    crypto:currency:address

Props (file:bytes):
    n/a

Configuration options:

    "jsonl:maxsize":  Maximum size to load as a JSON object in bytes.
        type: integer
        default: 104857600


Name: mbox
----------

Documentation:
    MBOX file parser.

    Each message is extracted and subparsed using the RFC822 parser.
    Message bytes are linked to the MBOX file through a file:subfile relationship.
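
    Splitting an MBOX file into individual messages could be sketched with the standard
    mailbox module, as below. This is an assumption about approach, not the parser's
    code; split_mbox is a hypothetical name, and a temporary file is used because
    mailbox.mbox works on a path.

```python
import mailbox
import os
import tempfile


def split_mbox(data: bytes) -> list[bytes]:
    """Return the raw bytes of each message stored in an mbox file."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "input.mbox")
        with open(path, "wb") as fh:
            fh.write(data)
        box = mailbox.mbox(path, create=False)
        try:
            # Each item is a message object; as_bytes() serializes it again.
            return [msg.as_bytes() for msg in box]
        finally:
            box.close()
```

    Each returned message would then go to the RFC822 parser as a subfile.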

Aliases: n/a

MIMEs:
    application/mbox

Forms:
    n/a

Props (file:bytes):
    n/a

Configuration options:
    n/a


Name: x509
----------

Documentation:
    X.509 DER parser.

    The first Common Name extracted is used to set the name and mime:x509:cn properties
    on the file:bytes node.

    CN and SAN values are also parsed to populate the following identities on the
    crypto:x509:cert node:
    ipv4, ipv6, url, fqdn, email
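
    Sorting a CN or SAN value into one of these identity types might be approximated as
    in the sketch below. The heuristics and the classify_identity name are assumptions
    for illustration; SAN entries in a real certificate already carry a type tag.

```python
import ipaddress


def classify_identity(value: str) -> str:
    """Guess which identity bucket a CN/SAN string belongs to."""
    try:
        addr = ipaddress.ip_address(value)
        return "ipv4" if addr.version == 4 else "ipv6"
    except ValueError:
        pass  # not an IP address, keep checking
    if "://" in value:
        return "url"
    if "@" in value:
        return "email"
    # Anything else is treated as a hostname in this sketch.
    return "fqdn"
```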

    URLs from CRL distribution points are also included in identities:urls.

Aliases: n/a

MIMEs:
    application/x-x509-ca-cert
    application/x-x509-user-cert
    application/x-x509-server-cert

Forms:
    crypto:x509:cert

Props (file:bytes):
    name
    mime:x509:cn

Configuration options:
    n/a


Name: yara
----------

Documentation:
    YARA file parser.

    Extract YARA rules and create it:app:yara:rule nodes.
    Imports are detected and handled during parsing.

    The raw lines from the file are also run through the scrape library to generate nodes.

Aliases: n/a

MIMEs:
    text/x-yara

Forms:
    it:app:yara:rule
    file:path
    inet:url
    inet:email
    inet:server
    inet:ipv4
    inet:ipv6
    inet:fqdn
    hash:md5
    hash:sha1
    hash:sha256
    it:sec:cve
    it:sec:cwe
    it:sec:cpe
    crypto:currency:address

Props (file:bytes):
    n/a

Configuration options:
    n/a


Name: image
-----------

Documentation:
    Image metadata and OCR parser.

    For JPEG and TIFF images the EXIF data is extracted to create file:mime:jpg and
    file:mime:tif nodes, respectively.

    For PNG and GIF images the metadata is extracted using Python PIL to create file:mime:png
    and file:mime:gif nodes.

    Using the --debug flag will print the raw metadata in the output stream.

    For all image types, Optical Character Recognition (OCR) is run to extract the text
    lines, of which the first 100kB are used to set the file:mime:image text property.
    Additional nodes may be created by passing the OCR text through the scrape library.

Aliases: n/a

MIMEs:
    image/png
    image/gif
    image/jpeg
    image/tiff

Forms:
    file:mime:png
    file:mime:gif
    file:mime:jpg
    file:mime:tif
    file:path
    inet:url
    inet:email
    inet:server
    inet:ipv4
    inet:ipv6
    inet:fqdn
    hash:md5
    hash:sha1
    hash:sha256
    it:sec:cve
    it:sec:cwe
    it:sec:cpe
    crypto:currency:address

Props (file:bytes):
    n/a

Configuration options:

    "image:ocr:confidence:min":  File must contain at least one segment above this confidence level to return text.
        type: integer
        default: 75

    "image:ocr:tesseract:psm":  Set the Tesseract OCR Page Segmentation Mode (PSM).
        type: integer
        default: 3

    "image:ocr:tesseract:thresholding_method":  Set the Tesseract OCR Thresholding Method.
        type: integer
        default: 1


Name: macho
-----------

Documentation:
    Mach-O parser for Apple's executable format.

    Handles both the base Mach-O file, as well as the FAT/Universal binaries that
    are wrappers for multiple Mach-O files.
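
    Telling a single Mach-O file apart from a FAT/Universal wrapper comes down to the
    first four bytes. The sketch below uses the magic values from Apple's loader.h and
    fat.h headers; the macho_kind function is a hypothetical name, and note that
    0xcafebabe is also used by Java class files, so a real parser needs further checks.

```python
import struct
from typing import Optional

# MH_MAGIC, MH_MAGIC_64 and their byte-swapped (CIGAM) forms.
THIN_MAGICS = {0xFEEDFACE, 0xFEEDFACF, 0xCEFAEDFE, 0xCFFAEDFE}
# FAT_MAGIC and FAT_MAGIC_64; the fat header is stored big-endian.
FAT_MAGICS = {0xCAFEBABE, 0xCAFEBABF}


def macho_kind(data: bytes) -> Optional[str]:
    """Return "fat", "thin", or None based on the leading magic number."""
    if len(data) < 4:
        return None
    (magic,) = struct.unpack(">I", data[:4])
    if magic in FAT_MAGICS:
        return "fat"
    if magic in THIN_MAGICS:
        return "thin"
    return None
```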

    Certificate handling is performed in exactly the same manner as the EXE parser,
    with certificates extracted from the CODE_SIGNATURE load command of the
    corresponding Mach-O file.

Aliases: n/a

MIMEs:
    application/x-mach-binary

Forms:
    file:bytes
    file:mime:macho:loadcmd
    file:mime:macho:uuid
    file:mime:macho:version
    file:mime:macho:segment
    file:mime:macho:section
    crypto:x509:signedfile

Props (file:bytes):
    n/a

Configuration options:
    n/a


Name: rfc822
------------

Documentation:
    RFC822 e-mail parser.

    A default list of headers is extracted and run through scrape to extract nodes.
    They are also used to set props on the created inet:email:message node.

    Default headers:
    to, from, sender, subject, date, reply-to, x-mailer, return-path

    Attachments are extracted from the e-mail, with all multiparts followed,
    and inet:email:message:attachment nodes are created.

    Each attachment is parsed as a subfile, and therefore additional nodes
    may be created depending on the format.

    The content body of the email is extracted as both plain text and html.
    The inet:email:message:body prop is set to plain text if it exists,
    else html if it exists. The body is then passed to a subparser to extract
    additional nodes, preferring html over plain text.
    If an inet:url is scraped an inet:email:message:link node is also created.
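
    The "plain text if it exists, else html" choice for the body prop can be sketched
    with the standard email package. This is an illustration only, not the parser's
    code, and select_body is a hypothetical name.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from typing import Optional


def select_body(raw: bytes) -> Optional[tuple[str, str]]:
    """Return (subtype, text) for the body, preferring plain over html."""
    msg = message_from_bytes(raw, policy=policy.default)
    # get_body walks multiparts and honors the order of the preference list.
    part = msg.get_body(preferencelist=("plain", "html"))
    if part is None:
        return None
    return part.get_content_subtype(), part.get_content()


# Build a small multipart/alternative message to try it out.
demo = EmailMessage()
demo["To"] = "user@example.com"
demo["Subject"] = "hello"
demo.set_content("plain body")
demo.add_alternative("<p>html body</p>", subtype="html")
print(select_body(demo.as_bytes()))
```

    Reversing the preference list would give the html-first order used for subparsing.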

    Finally, an inet:email:message node is created for each recipient in
    the "to", "cc", and "bcc" fields.

    If no recipient is identified a generic inet:email:message is created.

Aliases: email, message

MIMEs:
    message/rfc822

Forms:
    inet:email:message
    inet:email:message:attachment
    inet:email:message:link
    file:path
    inet:url
    inet:email
    inet:server
    inet:ipv4
    inet:ipv6
    inet:fqdn
    hash:md5
    hash:sha1
    hash:sha256
    it:sec:cve
    it:sec:cwe
    it:sec:cpe
    crypto:currency:address

Props (file:bytes):
    n/a

Configuration options:

    "email:body:maxlen":  Maximum length of the body to extract in bytes.
        type: integer
        default: 1048576


Name: 7zip
----------

Documentation:
    7z file parser.

    The file is opened as a 7zip archive (with an optional password),
    and the uncompressed bytes for each contained file are passed to
    other subparsers.

    If the 7zip archive is detected as needing a password, but a password is
    not specified, this parser will try several common passwords in an attempt
    to extract files from the archive. The passwords that will be attempted
    are: infected, infected666, password123, malware

Aliases: n/a

MIMEs:
    application/x-7z-compressed

Forms:
    n/a

Props (file:bytes):
    n/a

Configuration options:
    n/a


Name: msoffice
--------------

Documentation:
    Parser for binary MS Office files.

    Metadata and macros are extracted for all file types,
    and for MS Office files the appropriate file:mime:* nodes will be created.

    Currently, text scraping is only supported for Excel files.
    Each cell value is extracted and run through the scrape library to generate nodes.

    Word and PowerPoint files are detected, and the appropriate mime
    is set on the file:bytes node.

    Outlook files will have header values scraped (if available), and will create
    inet:email:message nodes. The default headers used to populate
    inet:email:message:headers are identical to the RFC822 parser.

    Note: For newer MS Office files use the zip mime parser (e.g. xlsx vs xls).

Aliases: n/a

MIMEs:
    application/msword
    application/vnd.ms-excel
    application/vnd.ms-outlook
    application/vnd.ms-powerpoint

Forms:
    file:path
    inet:url
    inet:email
    inet:server
    inet:ipv4
    inet:ipv6
    inet:fqdn
    hash:md5
    hash:sha1
    hash:sha256
    it:sec:cve
    it:sec:cwe
    it:sec:cpe
    crypto:currency:address
    file:mime:msppt
    file:mime:msxls
    file:mime:msdoc
    inet:email:message
    inet:email:message:attachment

Props (file:bytes):
    n/a

Configuration options:
    n/a


Name: rtf
---------

Documentation:
    RTF file parser.

    First the main body is extracted and run through the scrape library to generate nodes.
    Metadata is also extracted, but currently it can only be viewed with the debug flag.

    The following fields are also extracted and run through scrape:
    fldinst
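
    To show what a fldinst field holds, here is a simplified regex sketch. It is an
    assumption for illustration only: it handles flat field instructions and does not
    cope with nested groups or escaped braces, and iter_fldinst is a hypothetical name.

```python
import re

# Capture the instruction text that follows the \fldinst control word.
FLDINST_RE = re.compile(r"\\fldinst\s*\{?\s*([^{}]+)")


def iter_fldinst(rtf: str):
    """Yield the instruction text of each fldinst field found in the RTF."""
    for match in FLDINST_RE.finditer(rtf):
        yield match.group(1).strip()


sample = r'{\field{\*\fldinst{HYPERLINK "http://example.com/"}}{\fldrslt{link}}}'
print(list(iter_fldinst(sample)))
```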

    Images identified by the following fields are extracted and run through a subparser:
    pict

    Additional embedded objects identified by oletools are also extracted and run through
    a subparser.

Aliases: n/a

MIMEs:
    application/rtf

Forms:
    file:path
    inet:url
    inet:email
    inet:server
    inet:ipv4
    inet:ipv6
    inet:fqdn
    hash:md5
    hash:sha1
    hash:sha256
    it:sec:cve
    it:sec:cwe
    it:sec:cpe
    crypto:currency:address

Props (file:bytes):
    n/a

Configuration options:
    n/a


Name: c2config
--------------

Documentation:
    C2 configuration parser (beta).

    This parser will first attempt to match the file against a set of
    pre-configured YARA rules for the following families:
        - Cobalt Strike BEACON

    For YARA rules that match, it:app:yara:rule and it:app:yara:match nodes will be created.
    A parser for the matching family will then attempt to extract the configuration,
    and if successful will create an it:sec:c2:config node.

Aliases: n/a

MIMEs:
    n/a

Forms:
    it:app:yara:rule
    it:app:yara:match
    it:sec:c2:config

Props (file:bytes):
    n/a

Configuration options:
    n/a