Connectors

Synapse Sidepocket supports connections to the following data sources.

Apache Impala

Connector name: impala

Connection arguments:

  • host: Host where the database is located.

  • port: Port to connect to.

  • user: Username to log in as.

  • password: Password to use.

  • database: Database to use.

  • timeout: Connection/query timeout.

Optional connection argument overrides supported in opts data:

  • user

  • password

  • timeout
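
For illustration only, opts data overriding these connection arguments might look like the following (the values are placeholders, not defaults):

{
    "user": "analyst",
    "password": "example-password",
    "timeout": 60
}

The same override pattern applies to the other connectors below that support user, password, or timeout overrides.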

Elasticsearch

Connector name: elasticsearch

Connection arguments:

  • host: URL to the Elasticsearch host.

  • index: Index name to query against.

  • sniff: Use connection sniffing.

  • timeout: Connection/query timeout.

  • basicauth: A tuple of username and password information for supporting Elasticsearch authentication via HTTP basic auth.

  • verify_certs: A boolean that, if set to false, disables TLS certificate verification.

Optional connection argument overrides supported in opts data:

  • basicauth

  • timeout

This connector can query an Elasticsearch host and yield the raw records given in the _source field of each result. The query parameter for the connector accepts either a Lucene-style query string or a complete Elasticsearch Query DSL body. In both cases, the query values are sent to the _search/ API endpoint for the configured index value. The index value may end with a wildcard (*); this is passed through to the Elasticsearch API and can be used to configure a connector to search against multiple indices that share a common prefix.
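
For example, a Lucene-style query string such as event.type:error AND user.name:alice could be passed directly, or an equivalent Query DSL body could be supplied instead (the field names here are illustrative only):

{
    "query": {
        "bool": {
            "must": [
                { "term": { "event.type": "error" } },
                { "match": { "user.name": "alice" } }
            ]
        }
    }
}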

The Elasticsearch connector supports the following options being passed to it in the opts data when making a query:

size

An integer value. This is the total number of results that may be returned. If unspecified, the Elasticsearch server's default is used.

fields

A list of strings naming the fields to return in matching documents.

complete_hit

A boolean value. If set to true, the complete hit for a given document will be returned. This includes additional information such as the index, document id, and the raw _source data.
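
Taken together, hypothetical opts data using these options might look like the following (the field names are placeholders):

{
    "size": 100,
    "fields": ["host.name", "event.created"],
    "complete_hit": true
}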

MySQL

Connector name: mysql

Connection arguments:

  • host: Host where the database is located.

  • port: Port to connect to.

  • user: Username to log in as.

  • password: Password to use.

  • db: Database to use.

  • timeout: Connection/query timeout.

Optional connection argument overrides supported in opts data:

  • user

  • password

  • timeout

Neo4j

Connector name: neo4j

Connection arguments:

  • uri: URI to the Neo4j instance.

  • auth: A tuple of username and password information for supporting Neo4j authentication via basic auth.

  • timeout: Connection/query timeout.

Optional connection argument overrides supported in opts data:

  • auth

  • timeout

Neo4j versions supported:

  • Neo4j 4.2

  • Neo4j 4.1

  • Neo4j 3.5

This connector can query a Neo4j instance and yield the resulting records as dictionaries of key/value pairs.

The Neo4j connector supports the following options being passed to it in the opts data when making a query:

database

Name of the database to query. If unspecified, the default database on the Neo4j instance will be queried.
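
For example, opts data selecting a specific database might look like the following (the database name is a placeholder):

{
    "database": "analytics"
}

The query itself is an ordinary Cypher statement, for example MATCH (n:Person) RETURN n LIMIT 10 (the Person label is illustrative).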

PostgreSQL

Connector name: postgresql

Connection arguments:

  • host: Host where the database is located.

  • port: Port to connect to.

  • user: Username to log in as.

  • password: Password to use.

  • dbname: Database to use.

  • timeout: Connection/query timeout.

Optional connection argument overrides supported in opts data:

  • user

  • password

  • timeout

Presto

Connector name: presto

Connection arguments:

  • host: Host where the database is located.

  • port: Port to connect to.

  • username: Username to log in as.

  • catalog: Catalog to use.

  • schema: Schema to use.

  • timeout: Connection/query timeout.

Optional connection argument overrides supported in opts data:

  • username

  • timeout

Additional options which may be provided in opts data:

  • arraysize: The number of results to fetch at a time from the database.
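
As a sketch, opts data tuning the fetch size and overriding the timeout might look like the following (the values are illustrative, not defaults):

{
    "arraysize": 1000,
    "timeout": 120
}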

Trino

Connector name: trino

Connection arguments:

  • host: Host where the database is located.

  • port: Port to connect to.

  • username: Username to log in as.

  • catalog: Catalog to use.

  • schema: Schema to use.

  • timeout: Connection/query timeout.

Optional connection argument overrides supported in opts data:

  • username

  • timeout

Additional options which may be provided in opts data:

  • arraysize: The number of results to fetch at a time from the database.

Athena

Connector name: athena

Connection arguments:

  • database: The name of the database to use.

  • catalog: The name of the data catalog to use.

  • workgroup: The name of the workgroup to use.

  • output: The location in S3 to store query results (must be provided if no workgroup is set).

  • boto3: A dictionary of Boto3 configuration key/value pairs.

Athena does not support any connection argument overrides in opts data.

This connector can query Athena and yield result rows. The output is provided as a message tuple of the form (<message_type>, <data>, <info>). For details on the message types, see the Storm help for sidepocket.athena.results.

In addition to executing a query and yielding results, the Athena connector supports the following:

results

If the query string is None and query_id is provided in the opts data, the connector will retrieve results from an existing query.

status

The status of a single query, or all queries (up to a specified limit), yielded as dictionaries.

cancel

Cancel an existing queued or running query.
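
As a hypothetical sketch of how these pieces fit together, a long-running query could be started in the background by passing:

{
    "background": true
}

and its results retrieved later by issuing a results request with the query string set to None and opts data such as:

{
    "query_id": "<query id returned by the background run>"
}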

By default the connector will load the AWS configuration from the host environment. However, a dictionary can be provided via the boto3 option to configure a data source. The most common configuration options are listed below (for details, see the Boto3 Session reference):

  • aws_access_key_id

  • aws_secret_access_key

  • region_name

  • profile_name
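
For example, a boto3 dictionary selecting a profile and region might look like the following (the values are placeholders):

{
    "profile_name": "prod-analytics",
    "region_name": "us-east-1"
}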

The Athena connector supports the following options being passed to it in the opts data when making a query:

timeout

Time, in seconds, to wait for each Athena operation to complete (defaults to 30 seconds).

background

Defaults to False, i.e. the connector will wait for the query to complete and then yield results. If True, the connector returns after starting the query execution; the query id is provided in the output.

When background is False, the following additional options are supported:

max_items

The maximum total number of items to retrieve (defaults to no limit).

page_size

The maximum number of items to retrieve in a given page (defaults to 500). Adjusting this parameter changes how often results are yielded and the number of requests made to Athena.

poll_s

If the query has not yet finished, poll for the status at this interval (defaults to 5 seconds).
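
Combining these, hypothetical opts data for a long-running foreground query might look like:

{
    "timeout": 120,
    "max_items": 10000,
    "page_size": 1000,
    "poll_s": 10
}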

Snowflake

Connector name: snowflake

Connection arguments:

  • authenticator: Authenticator to use for Snowflake. Valid values are snowflake and oauth (defaults to snowflake).

  • account: Account to log in to.

  • user: User to log in as.

  • password: Password to use.

  • token: OAuth access token to use for authentication when using the oauth authenticator.

Optional connection arguments:

  • database: Default database to use.

  • schema: Default schema to use.

  • warehouse: Default warehouse to use.

  • role: Default role to use.

  • login-timeout: Timeout in seconds for login (defaults to 60).

  • network-timeout: Timeout in seconds for all other operations (defaults to none).
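
As an illustrative sketch (all values are placeholders), the connection arguments for a password-authenticated Snowflake source might look like:

{
    "authenticator": "snowflake",
    "account": "myorg-myaccount",
    "user": "analyst",
    "password": "example-password",
    "database": "SECURITY",
    "warehouse": "ANALYTICS_WH",
    "role": "ANALYST"
}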

Optional connection argument overrides supported in opts data:

  • user

  • password

  • token

This connector can query Snowflake and yield result rows. The output is provided as a message tuple of the form (<message_type>, <data>, <info>). For details on the message types, see the Storm help for sidepocket.snowflake.results.

In addition to executing a query and yielding results, the Snowflake connector supports the following:

results

If the query string is None and query_id is provided in the opts data, the connector will retrieve results from an existing query.

status

The status of a single query, or all queries (up to a specified limit), yielded as dictionaries.

When retrieving the status for a single Snowflake query, if no database is specified (either by a default database being set when the data source was added, or by specifying one with the --database argument), only basic status information will be available. If no query_id is specified, a database must be specified to retrieve query history information.

cancel

Cancel an existing queued or running query.

The Snowflake connector supports the following options being passed to it in the opts data when making a query:

timeout

Time, in seconds, to wait for each Snowflake operation to complete (defaults to 30 seconds).

background

Defaults to False, i.e. the connector will wait for the query to complete and then yield results. If True, the connector returns after starting the query execution; the query id is provided in the output.

When background is False, the following additional options are supported:

max_items

The maximum total number of items to retrieve (defaults to no limit).

page_size

The maximum number of items to retrieve in a given page (defaults to 500). Adjusting this parameter changes how often results are yielded and the number of requests made to Snowflake.

poll_s

If the query has not yet finished, poll for the status at this interval (defaults to 5 seconds).

OAuth2

This connector supports using an OAuth2 provider configured in the Cortex, which can be shared across multiple sources. When a user connects to a source with an OAuth2 provider, Optic will prompt the user to authenticate with Snowflake. If successful, the user will have a personal token associated with the provider that is refreshed in the background.

To set up a Snowflake source using OAuth2, create a Snowflake OAuth2 custom client with support for refresh tokens and the redirect URI set to https://<your_optic_netloc>/oauth2.

The client ID and secret can then be used to add a new named provider in the Cortex via the sidepocket.snowflake.oauth2.add command, which can then be used in the sidepocket.snowflake.add command. The Synapse-Sidepocket Source Management workflow also supports configuring a Snowflake source to use OAuth2.

Both the sidepocket.snowflake.oauth2.add command and the workflow will use the following defaults when creating the provider:

{
    "iden": $lib.guid("sidepocket", $name)
    "name": `sidepocket:{$name}`
    "scope": "refresh_token"
    "auth_uri": `https://{$account}.snowflakecomputing.com/oauth/authorize`,
    "token_uri": `https://{$account}.snowflakecomputing.com/oauth/token-request`,
    "extensions": { "pkce": $lib.true }
}

To use different configuration options, set up the provider with $lib.inet.http.oauth.v2.addProvider(). The provider can then be associated with a source by using the provider iden for the oauth-provider option on the sidepocket.snowflake.add command, or by selecting it by name in the workflow dropdown.

Kusto

Connector name: kusto

Connection arguments:

  • authenticator: The authentication method to use. See below for details.

  • cluster: The cluster URL to use.

  • client_id: The AAD application id.

  • client_secret: The AAD application key.

  • authority_id: The AAD tenant id.

  • username: The AAD username.

  • password: The AAD password.

  • dbname: The named database to use.

Optional connection argument overrides supported in opts data:

  • client_id

  • client_secret

  • username

  • password

If dbname is provided in the source setup it will always be used, and cannot be overridden. Otherwise, dbname must be provided in the opts dictionary when querying.

The following authentication methods can be set using the authenticator argument:

  • basic: Requires username and password, and optionally authority_id.

  • application_key: Requires client_id, client_secret, and authority_id.

  • msi: Will optionally use client_id if provided.

  • unsecured: No additional arguments are required.

  • oauth2: No additional arguments are required.
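
As an illustrative sketch (all identifiers are placeholders), the connection arguments for the application_key authenticator might look like:

{
    "authenticator": "application_key",
    "cluster": "https://mycluster.westus.kusto.windows.net",
    "client_id": "00000000-0000-0000-0000-000000000000",
    "client_secret": "example-secret",
    "authority_id": "11111111-1111-1111-1111-111111111111",
    "dbname": "SecurityLogs"
}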

OAuth2

This connector supports using an OAuth2 provider configured in the Cortex, which can be shared across multiple sources. When a user connects to a source with an OAuth2 provider, Optic will prompt the user to authenticate with Azure. If successful, the user will have a personal token associated with the provider that is refreshed in the background.

To set up a Kusto source using OAuth2, follow the instructions for registering an application with access to Azure Data Explorer and set the redirect URI to https://<your_optic_netloc>/oauth2.

The client ID and secret can then be used to add a new named provider in the Cortex via the sidepocket.kusto.oauth2.add command, which can then be used in the sidepocket.kusto.add command. The Synapse-Sidepocket Source Management workflow also supports configuring a Kusto source to use OAuth2.

Both the sidepocket.kusto.oauth2.add command and the workflow will use the following defaults when creating the provider:

{
    "iden": $lib.guid("sidepocket", $name),
    "name": `sidepocket:{$name}`,
    "client_id": $client_id,
    "client_secret": $client_secret,
    "scope": `offline_access https://{$cluster_id}.{$region}.kusto.windows.net/.default`,
    "auth_uri": `https://login.microsoftonline.com/{$authority_id}/oauth2/v2.0/authorize`,
    "token_uri": `https://login.microsoftonline.com/{$authority_id}/oauth2/v2.0/token`,
    "redirect_uri": $redirect_uri,
    "extensions": { "pkce": $lib.true }
}

To use different configuration options, set up the provider with $lib.inet.http.oauth.v2.addProvider(). The provider can then be associated with a source by using the provider iden for the oauth-provider option on the sidepocket.kusto.add command, or by selecting it by name in the workflow dropdown.

OrientDB

Connector name: orientdb

Connection arguments:

  • host: Host where the database is located.

  • port: Port to connect to.

  • user: Username to log in as.

  • password: Password to use.

  • db_name: Default database to use.

  • timeout: Connection/query timeout.

Optional connection argument overrides supported in opts data:

  • user

  • password

  • timeout

If db_name is not provided in the source setup, it must be provided in the opts dictionary when querying.

The output is provided as a message tuple of the form (<message_type>, <data>, <info>). Messages of type data contain query result objects, with the keys _rid, _version, _class, and oRecordData populated with the corresponding data from the OrientRecord for the result.
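
As an illustrative sketch (the record values are placeholders and the info element is shown empty), a data message might look like:

("data", {
    "_rid": "#12:0",
    "_version": 3,
    "_class": "Person",
    "oRecordData": { "name": "alice" }
}, {})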

Note: Only OrientDB versions prior to 3.0.0 are supported with this connector.