Connectors
Synapse Sidepocket supports connections to the following data sources.
Apache Impala
Connector name: impala
Connection arguments:
host
: Host where the database is located.
port
: Port to connect to.
user
: Username to log in as.
password
: Password to use.
database
: Database to use.
timeout
: Connection/query timeout.
Optional connection argument overrides supported in opts data:
user
password
timeout
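The overrides above can be supplied per query. As a minimal sketch (the values are placeholders, and how the dict reaches the connector depends on the Storm query command in use):

```storm
// Illustrative opts data overriding the Impala connection arguments
// that support per-query overrides; the values are placeholders.
$opts = ({
    "user": "reporting_ro",
    "password": "s3cret",
    "timeout": 30
})
```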
Elasticsearch
Connector name: elasticsearch
Connection arguments:
host
: URL to the Elasticsearch host.
index
: Index name to query against.
sniff
: Use connection sniffing.
timeout
: Connection/query timeout.
basicauth
: A tuple of username and password information for supporting Elasticsearch authentication via HTTP basic auth.
verify_certs
: A boolean that, if set to false, disables TLS certificate verification.
Optional connection argument overrides supported in opts data:
basicauth
timeout
This connector can query an Elasticsearch host and yield the raw records, given in the _source field of a result.
The query parameter for the connector can accept either a Lucene-style query string or a complete Elasticsearch Query DSL body. In both cases, the query values are sent to the _search/ API endpoint for the configured index value.
The index value may have a wildcard (*) at the end, which is passed to the Elasticsearch API and can be used to configure a connector to search against multiple indices that share a common prefix.
The Elasticsearch connector supports the following options being passed to it in the opts data when making a query:
size
An integer value. This is the total number of results that may be returned. If unspecified, this falls back to the Elasticsearch server's default.
fields
A list of strings. This will be the list of fields which may be returned in matching documents.
complete_hit
A boolean value. If set to true, the complete hit for a given document will be returned. This would include additional information such as the index, document id, and the raw _source data.
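The query options above can be combined in a single opts dict. A minimal sketch, with illustrative field names and values:

```storm
// Illustrative opts data for an Elasticsearch query: cap the result
// count, project two (hypothetical) document fields, and request the
// complete hit rather than just the _source data.
$opts = ({
    "size": 100,
    "fields": (["host", "@timestamp"]),
    "complete_hit": (true)
})
```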
MySQL
Connector name: mysql
Connection arguments:
host
: Host where the database is located.
port
: Port to connect to.
user
: Username to log in as.
password
: Password to use.
db
: Database to use.
timeout
: Connection/query timeout.
Optional connection argument overrides supported in opts data:
user
password
timeout
Neo4j
Connector name: neo4j
Connection arguments:
URI
: URI to the Neo4j instance.auth
: A tuple of username and password information for supporting Neo4j authentication via basic auth.timeout
: Connection/query timeout.
Optional connection argument overrides supported in opts
data:
auth
timeout
Neo4j versions supported:
Neo4j 4.2
Neo4j 4.1
Neo4j 3.5
This connector can query a Neo4j instance and yield the resulting records as dictionaries of key/value pairs.
The Neo4j connector supports the following options being passed to it in the opts data when making a query:
database
Name of the database to query. If unspecified, the default database on the Neo4j instance will be queried.
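The database option sits alongside the auth and timeout overrides in the opts data. A minimal sketch, with an illustrative database name and placeholder credentials:

```storm
// Illustrative opts data for a Neo4j query: select a named database
// and override the auth and timeout connection arguments per query.
$opts = ({
    "database": "movies",
    "auth": (["neo4j", "s3cret"]),
    "timeout": 60
})
```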
PostgreSQL
Connector name: postgresql
Connection arguments:
host
: Host where the database is located.
port
: Port to connect to.
user
: Username to log in as.
password
: Password to use.
dbname
: Database to use.
timeout
: Connection/query timeout.
Optional connection argument overrides supported in opts data:
user
password
timeout
Presto
Connector name: presto
Connection arguments:
host
: Host where the database is located.
port
: Port to connect to.
username
: Username to log in as.
catalog
: Catalog to use.
schema
: Schema to use.
timeout
: Connection/query timeout.
Optional connection argument overrides supported in opts data:
username
timeout
Additional options which may be provided in opts data:
arraysize
: The number of results to fetch at a time from the database.
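Combined with the username and timeout overrides, the opts data for a Presto query might look like the following sketch (the values are illustrative; the same shape applies to the Trino connector):

```storm
// Illustrative opts data for a Presto query: override the login name
// and fetch results from the database in larger batches.
$opts = ({
    "username": "etl_reader",
    "arraysize": 5000
})
```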
Trino
Connector name: trino
Connection arguments:
host
: Host where the database is located.
port
: Port to connect to.
username
: Username to log in as.
catalog
: Catalog to use.
schema
: Schema to use.
timeout
: Connection/query timeout.
Optional connection argument overrides supported in opts data:
username
timeout
Additional options which may be provided in opts data:
arraysize
: The number of results to fetch at a time from the database.
Athena
Connector name: athena
Connection arguments:
database
: The name of the database to use.
catalog
: The name of the data catalog to use.
workgroup
: The name of the workgroup to use.
output
: The location in S3 to store query results (must be provided if no workgroup is set).
boto3
: A dictionary of Boto3 configuration key-vals.
Athena does not support any connection argument overrides in opts data.
This connector can query Athena, and yield result rows.
The output is provided as a message tuple of the form (<message_type>, <data>, <info>).
For details on the message types see the Storm help for sidepocket.athena.results.
In addition to executing a query and yielding results, the Athena connector supports the following:
results
If the query string is None and query_id is provided in the opts data, the connector will retrieve results from an existing query.
status
The status of a single query, or all queries (up to a specified limit), yielded as dictionaries.
cancel
Cancel an existing queued or running query.
By default the connector will load the AWS configuration from the host environment.
However, a dictionary can be provided via the boto3 option to configure a data source.
The most common configurations are (for details see the Boto3 Session reference):
aws_access_key_id
aws_secret_access_key
region_name
profile_name
The Athena connector supports the following options being passed to it in the opts data when making a query:
timeout
Time, in seconds, to wait for each Athena operation to complete (defaults to 30 seconds).
background
Defaults to False, i.e. the connector will wait for the query to complete and then yield results. If True, return after starting the query execution. The query id is provided in the output.
When background is False, the following additional options are supported:
max_items
The maximum total number of items to retrieve (defaults to no limit).
page_size
The maximum number of items to retrieve in a given page (defaults to 500). Adjusting this parameter changes how often results will be yielded, and the number of requests to Athena.
poll_s
If the query has not yet finished, poll for the status at this interval (defaults to 5 seconds).
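As a sketch, a long-running Athena query could be started in the background rather than waited on; the values here are illustrative:

```storm
// Illustrative opts data for a long-running Athena query: return after
// starting execution (the query id appears in the output), allowing 60
// seconds for each Athena operation instead of the default 30.
$opts = ({
    "background": (true),
    "timeout": 60
})
```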
Snowflake
Connector name: snowflake
Connection arguments:
authenticator
: Authenticator to use for Snowflake. Valid values are snowflake and oauth (defaults to snowflake).
account
: Account to log in to.
user
: User to log in as.
password
: Password to use.
token
: OAuth access token to use for authentication when using the oauth authenticator.
Optional connection arguments:
database
: Default database to use.
schema
: Default schema to use.
warehouse
: Default warehouse to use.
role
: Default role to use.
login-timeout
: Timeout in seconds for login (defaults to 60).
network-timeout
: Timeout in seconds for all other operations (defaults to none).
Optional connection argument overrides supported in opts data:
user
password
token
This connector can query Snowflake, and yield result rows.
The output is provided as a message tuple of the form (<message_type>, <data>, <info>).
For details on the message types see the Storm help for sidepocket.snowflake.results.
In addition to executing a query and yielding results, the Snowflake connector supports the following:
results
If the query string is None and query_id is provided in the opts data, the connector will retrieve results from an existing query.
status
The status of a single query, or all queries (up to a specified limit), yielded as dictionaries.
When retrieving the status for a single Snowflake query, if no database is specified (either by a default database being specified when the data source was added, or by specifying one with the --database argument) only basic status information will be available. If no query_id is specified, a database must be specified to retrieve query history information.
cancel
Cancel an existing queued or running query.
The Snowflake connector supports the following options being passed to it in the opts data when making a query:
timeout
Time, in seconds, to wait for each Snowflake operation to complete (defaults to 30 seconds).
background
Defaults to False, i.e. the connector will wait for the query to complete and then yield results. If True, return after starting the query execution. The query id is provided in the output.
When background is False, the following additional options are supported:
max_items
The maximum total number of items to retrieve (defaults to no limit).
page_size
The maximum number of items to retrieve in a given page (defaults to 500). Adjusting this parameter changes how often results will be yielded, and the number of requests to Snowflake.
poll_s
If the query has not yet finished, poll for the status at this interval (defaults to 5 seconds).
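For example, the results behavior described above could be driven by passing the id of an existing query in the opts data; the id value here is a placeholder:

```storm
// Illustrative opts data for retrieving results from an existing
// Snowflake query: no query string, just the query id (placeholder
// value) and a longer per-operation timeout.
$opts = ({
    "query_id": "01a2b3c4-0000-0000-0000-000000000000",
    "timeout": 60
})
```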
OAuth2
This connector supports using an OAuth2 provider configured in the Cortex, which can be shared across multiple sources. When a user connects to a source with an OAuth2 provider, Optic will prompt the user to authenticate with Snowflake. If successful, the user will have a personal token associated with the provider that will be refreshed in the background.
To set up a Snowflake source using OAuth2, create a Snowflake OAuth2 custom client with support for refresh tokens and the redirect URI set to https://<your_optic_netloc>/oauth2.
The client ID and secret can then be used to add a new named provider in the Cortex via the sidepocket.snowflake.oauth2.add command, which can then be used in the sidepocket.snowflake.add command.
The Synapse-Sidepocket Source Management workflow also supports configuring a Snowflake source to use OAuth2.
Both the sidepocket.snowflake.oauth2.add command and the workflow will use the following defaults when creating the provider:
{
"iden": $lib.guid("sidepocket", $name),
"name": `sidepocket:{$name}`,
"scope": "refresh_token",
"auth_uri": `https://{$account}.snowflakecomputing.com/oauth/authorize`,
"token_uri": `https://{$account}.snowflakecomputing.com/oauth/token-request`,
"extensions": { "pkce": true }
}
To use different configuration options, set up the provider with $lib.inet.http.oauth.v2.addProvider().
The provider can then be associated with a source by using the provider iden for the oauth-provider option on the sidepocket.snowflake.add command, or by selecting it by name in the workflow dropdown.
Kusto
Connector name: kusto
Connection arguments:
authenticator
: The authentication method to use. See below for details.
cluster
: The cluster URL to use.
client_id
: The AAD application id.
client_secret
: The AAD application key.
authority_id
: The AAD tenant id.
username
: The AAD username.
password
: The AAD password.
private_key
: PEM encoded certificate private key.
certificate
: PEM encoded public certificate matching provided private_key.
certificate_hash
: Hex encoded thumbprint of the certificate.
dbname
: The named database to use.
Optional connection argument overrides supported in opts data:
client_id
client_secret
username
password
If dbname is provided in the source setup it will always be used, and cannot be overridden.
Otherwise dbname must be provided in the opts dictionary when querying.
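For a source added without dbname, the database must accompany each query in the opts data, and a credential override can ride along, per the lists above. A minimal sketch with placeholder values:

```storm
// Illustrative opts data for a Kusto query against a source that was
// added without a dbname: name the database per query and override the
// username and password connection arguments (placeholder values).
$opts = ({
    "dbname": "SecurityLogs",
    "username": "analyst@example.com",
    "password": "s3cret"
})
```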
The following authentication methods can be set using the authenticator argument:
basic
: Requires username and password, and optionally authority_id.
application_key
: Requires client_id, client_secret, and authority_id.
application_cert
: Requires client_id, private_key, certificate_hash, and authority_id.
application_cert_sni
: Requires client_id, private_key, certificate, certificate_hash, and authority_id.
msi
: Will optionally use client_id if provided.
unsecured
: No additional arguments are required.
oauth2
: No additional arguments are required.
OAuth2
This connector supports using an OAuth2 provider configured in the Cortex, which can be shared across multiple sources. When a user connects to a source with an OAuth2 provider, Optic will prompt the user to authenticate with Azure. If successful, the user will have a personal token associated with the provider that will be refreshed in the background.
To set up a Kusto source using OAuth2, follow the instructions for registering an application with access to Azure Data Explorer and set the redirect URI to https://<your_optic_netloc>/oauth2.
The client ID and secret can then be used to add a new named provider in the Cortex via the sidepocket.kusto.oauth2.add command, which can then be used in the sidepocket.kusto.add command.
The Synapse-Sidepocket Source Management workflow also supports configuring a Kusto source to use OAuth2.
Both the sidepocket.kusto.oauth2.add command and the workflow will use the following defaults when creating the provider:
{
"iden": $lib.guid("sidepocket", $name),
"name": `sidepocket:{$name}`,
"client_id": $client_id,
"client_secret": $client_secret,
"scope": `offline_access https://{$cluster_id}.{$region}.kusto.windows.net/.default`,
"auth_uri": `https://login.microsoftonline.com/{$authority_id}/oauth2/v2.0/authorize`,
"token_uri": `https://login.microsoftonline.com/{$authority_id}/oauth2/v2.0/token`,
"redirect_uri": $redirect_uri,
"extensions": { "pkce": true }
}
To use different configuration options, set up the provider with $lib.inet.http.oauth.v2.addProvider().
The provider can then be associated with a source by using the provider iden for the oauth-provider option on the sidepocket.kusto.add command, or by selecting it by name in the workflow dropdown.
OrientDB
Connector name: orientdb
Connection arguments:
host
: Host where the database is located.
port
: Port to connect to.
user
: Username to log in as.
password
: Password to use.
db_name
: Default database to use.
timeout
: Connection/query timeout.
Optional connection argument overrides supported in opts data:
user
password
timeout
If db_name is not provided in the source setup, it must be provided in the opts dictionary when querying.
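A minimal sketch of the opts data for that case, with an illustrative database name:

```storm
// Illustrative opts data for an OrientDB query when db_name was not
// set at source setup time: name the database and bump the timeout.
$opts = ({
    "db_name": "inventory",
    "timeout": 30
})
```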
The output is provided as a message tuple of the form (<message_type>, <data>, <info>).
Messages of type data contain query result objects, with the keys _rid, _version, _class, and oRecordData populated with the corresponding data from the OrientRecord for the result.
Note: Only OrientDB versions prior to 3.0.0 are supported with this connector.