Package Documentation
Storm Package: synapse-databricks
The following Commands are available from this package. This documentation is generated for version 0.1.0 of the package.
Storm Commands
This package implements the following Storm Commands.
databricks.setup.credentials
Manage the Databricks API credentials.
Examples:
// Set global Databricks OAuth machine-to-machine (M2M) credentials
databricks.setup.credentials oauth-m2m https://myhost.databricks.com --client-id client_id --client-secret client-secret
// Set Databricks personal access token (PAT) for the current user
databricks.setup.credentials pat https://myhost.databricks.com --token token --self
// Set a default warehouse to use with a set of credentials
databricks.setup.credentials pat https://myhost.databricks.com --token token --self --warehouse id1234
// Display the current scope of the credentials
databricks.setup.credentials --show-scope
// Display the current credentials
databricks.setup.credentials --show-credentials
// Remove the credentials for the current user
databricks.setup.credentials --self --remove
Usage: databricks.setup.credentials [options] <type> <host>
Options:
--help : Display the command usage.
--client-id <client_id> : The Databricks client ID.
--client-secret <client_secret>: The Databricks client secret.
--token <token> : The Databricks personal access token (PAT).
--warehouse <warehouse> : Optionally associate a Databricks warehouse ID with the credentials.
--self : Set or remove the credentials for the current user. If not used, the credentials are set globally.
--show-scope : Display the API credentials scope in use.
--show-credentials : Display the API credentials (requires admin perms or "self" scope credentials).
--remove : Remove the configured API credentials. May be used with --self.
Arguments:
[type] : The type of credentials to configure (choices: oauth-m2m, pat)
[host] : The Databricks workspace URL (without the trailing slash).
databricks.sql.cancel
Cancel an existing SQL query.
Examples:
// Cancel a query by statement_id
databricks.sql.cancel 1234
Usage: databricks.sql.cancel [options] <statement_id>
Options:
--help : Display the command usage.
--debug : Show verbose debug output.
Arguments:
<statement_id> : The statement ID to cancel.
databricks.sql.execute
Execute a SQL query and print the results.
Examples:
// Execute a query using the warehouse configured with the credentials
databricks.sql.execute "select * from foo"
// Execute a query and do not wait for the results
databricks.sql.execute "select * from foo" --wait 0 --poll 0
// Execute a query using parameters
databricks.sql.execute "select * from foo where zip=:zip" --params ([{"name": "zip", "value": 10110}])
Usage: databricks.sql.execute [options] <query>
Options:
--help : Display the command usage.
--warehouse <warehouse> : Databricks warehouse ID; if not specified the warehouse from the credentials config will be used.
--params <params> : A list of query parameters, each specified as a dict with "name", "value", and (optionally) "type" keys.
--wait <wait> : The time in seconds to wait for query results. If 0, the command will not wait for the query to complete.
--poll <poll> : The time to wait to poll for results if the status is pending or running. If 0, the command will not poll. (default: 10)
--pprint : Pretty print the data rows.
--debug : Show verbose debug output.
Arguments:
<query> : The query to execute.
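The optional "type" key described for --params is not shown in the examples above. The following is a minimal sketch of how typed parameters might be passed. It assumes the "type" values are forwarded to Databricks unchanged and that Databricks SQL type names such as INT and DATE are accepted; the "opened" column and the :since parameter are hypothetical.

```storm
// Sketch only: the "type" values and the "opened" column are assumptions
databricks.sql.execute "select * from foo where zip = :zip and opened > :since" --params ([
    {"name": "zip", "value": 10110, "type": "INT"},
    {"name": "since", "value": "2024-06-10", "type": "DATE"}
])
```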
databricks.sql.history
Print the SQL query history.
Examples:
// Print the history for queries with a given status
databricks.sql.history --status CANCELED
// Print the history for queries matching multiple statuses
databricks.sql.history --status (CANCELED, FINISHED)
// Print the history for queries within a given start and end time
databricks.sql.history --start-time 2024-06-10 --end-time 2024-06-12
Usage: databricks.sql.history [options]
Options:
--help : Display the command usage.
--status <status> : Filter results by a single status, or a list of statuses.
--user <user> : Filter results by a single user ID, or a list of user IDs.
--warehouse <warehouse> : Filter results by a single warehouse ID, or a list of IDs; if not specified the warehouse from the credentials config will be used.
--start-time <start_time> : Filter results to queries that started after this time. (default: -24hours)
--end-time <end_time> : Filter results to queries that started before this time.
--debug : Show verbose debug output.
databricks.sql.results
Print results from an existing query.
Examples:
// Print results for a statement ID
databricks.sql.results 1234
// Print results for a statement ID, but do not poll for results if pending or running
databricks.sql.results 1234 --poll 0
Usage: databricks.sql.results [options] <statement_id>
Options:
--help : Display the command usage.
--poll <poll> : The time to wait to poll for results if the status is pending or running. If 0, the command will not poll. (default: 10)
--pprint : Pretty print the data rows.
--debug : Show verbose debug output.
Arguments:
<statement_id> : The statement ID to retrieve results for.
Storm Modules
This package implements the following Storm Modules.
databricks
sqlExecute(query, warehouse=$lib.null, params=$lib.null, wait=$lib.null, poll=(10))
Execute a SQL query and emit results.
- Example:
Iterate over results and print the data rows:
$mod = $lib.import(databricks)
$query = "select * from foo where zip = :zip"
$params = ([ {"name": "zip", "value": 10110} ])
for ($mtyp, $data, $info) in $mod.sqlExecute($query, params=$params) {
    switch $mtyp {
        "init": {
            $lib.print(`statement_id={$data}`)
            $lib.print(`columns={$info.columns}`)
        }
        "print": { $lib.print($data) }
        "warn": { $lib.warn($data) }
        "data": { $lib.print($data) }
        *: { $lib.warn(`Unexpected message type {$mtyp} - {$data}`) }
    }
}
- Args:
query (str): The query to execute.
warehouse (str): Databricks warehouse ID; if not specified the warehouse from the credentials config will be used.
params (list): A list of query parameters, each specified as a dict with "name", "value", and (optionally) "type" keys.
wait (integer): The time in seconds to wait for query results. If 0, the function will not wait for the query to complete.
poll (integer): The time to wait to poll for results if the status is pending or running. If 0, the function will not poll.
- Yields:
A message list containing (type, data, info). The return type is list.
sqlResults(statement_id, poll=(10))
Emit results for an existing query.
- Example:
Iterate over results and print the data rows:
$mod = $lib.import(databricks)
for ($mtyp, $data, $info) in $mod.sqlResults("sid1234") {
    switch $mtyp {
        "init": {
            $lib.print(`statement_id={$data}`)
            $lib.print(`columns={$info.columns}`)
        }
        "print": { $lib.print($data) }
        "warn": { $lib.warn($data) }
        "data": { $lib.print($data) }
        *: { $lib.warn(`Unexpected message type {$mtyp} - {$data}`) }
    }
}
- Args:
statement_id (str): The statement ID to retrieve results for.
poll (integer): The time to wait to poll for results if the status is pending or running. If 0, the function will not poll.
- Yields:
A message list containing (type, data, info). The return type is list.
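sqlExecute() and sqlResults() can be combined to submit a query without waiting and collect its rows later. The following is a minimal sketch of that pattern. It assumes that an "init" message carrying the statement ID is still emitted when wait and poll are both 0, which the documentation above does not state explicitly.

```storm
$mod = $lib.import(databricks)

// Submit the query without waiting or polling, keeping only the statement ID.
// Assumption: the "init" message is emitted even when wait=0 and poll=0.
$sid = $lib.null
for ($mtyp, $data, $info) in $mod.sqlExecute("select * from foo", wait=(0), poll=(0)) {
    if ($mtyp = "init") { $sid = $data }
}

// Later, retrieve the rows for that statement using the default poll time.
if $sid {
    for ($mtyp, $data, $info) in $mod.sqlResults($sid) {
        switch $mtyp {
            "data": { $lib.print($data) }
            "warn": { $lib.warn($data) }
        }
    }
}
```

Message types not listed in the switch are ignored here; add a default case, as in the examples above, to surface them.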
sqlCancel(statement_id)
Cancel execution for an existing query.
- Args:
statement_id (str): The statement ID to cancel.
- Returns:
An (ok, message) list. The return type is list.
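sqlCancel() has no example above, so the following is a minimal sketch of handling its (ok, message) return value. The statement ID "sid1234" is a placeholder, and treating the message as a human-readable description of the outcome is an assumption.

```storm
$mod = $lib.import(databricks)

// Unpack the (ok, message) list returned by sqlCancel()
($ok, $mesg) = $mod.sqlCancel("sid1234")

if $ok {
    $lib.print(`Cancel requested: {$mesg}`)
} else {
    // Assumption: on failure the message describes what went wrong
    $lib.warn(`Cancel failed: {$mesg}`)
}
```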
sqlHistory(status=$lib.null, user=$lib.null, warehouse=$lib.null, start_time=-24hours, end_time=$lib.null)
Emit the history of queries.
- Example:
Iterate over results and print the history details:
$mod = $lib.import(databricks)
for $item in $mod.sqlHistory() {
    $lib.pprint($item)
}
- Args:
status: Filter results by a single status, or a list of statuses. The input type may be one of the following: list, str.
user: Filter results by a single user ID, or a list of user IDs. The input type may be one of the following: list, str.
warehouse: Filter results by a single warehouse ID, or a list of IDs; if not specified the warehouse from the credentials config will be used. The input type may be one of the following: list, str.
start_time: Filter results to queries that started after this time. The input type may be one of the following: integer, str.
end_time: Filter results to queries that started before this time. The input type may be one of the following: integer, str.
- Yields:
Dictionaries containing query history information. The return type is dict.
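The filter arguments can be combined in one call. The following is a minimal sketch that reuses the status and time values from the databricks.sql.history command examples. It assumes the module function accepts the same value formats as the command options.

```storm
$mod = $lib.import(databricks)

// Filter by a list of statuses and a start/end time window
$statuses = (["CANCELED", "FINISHED"])
for $item in $mod.sqlHistory(status=$statuses, start_time="2024-06-10", end_time="2024-06-12") {
    $lib.pprint($item)
}
```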