Package Documentation

Storm Package: synapse-databricks

The following Commands are available from this package. This documentation is generated for version 0.1.0 of the package.

Storm Commands

This package implements the following Storm Commands.

databricks.setup.credentials

Manage the Databricks API credentials.

Examples:

  // Set global Databricks OAuth machine-to-machine (M2M) credentials
  databricks.setup.credentials oauth-m2m https://myhost.databricks.com --client-id client_id --client-secret client-secret

  // Set Databricks personal access token (PAT) for the current user
  databricks.setup.credentials pat https://myhost.databricks.com --token token --self

  // Set a default warehouse to use with a set of credentials
  databricks.setup.credentials pat https://myhost.databricks.com --token token --self --warehouse id1234

  // Display the current scope of the credentials
  databricks.setup.credentials --show-scope

  // Display the current credentials
  databricks.setup.credentials --show-credentials

  // Remove the credentials for the current user
  databricks.setup.credentials --self --remove


Usage: databricks.setup.credentials [options] <type> <host>

Options:

  --help                      : Display the command usage.
  --client-id <client_id>     : The Databricks client ID.
  --client-secret <client_secret> : The Databricks client secret.
  --token <token>             : The Databricks personal access token (PAT).
  --warehouse <warehouse>     : Optionally associate a Databricks warehouse ID with the credentials.
  --self                      : Set or remove the credentials for the current user. If not used, the credentials are set globally.
  --show-scope                : Display the API credentials scope in use.
  --show-credentials          : Display the API credentials (requires admin perms or "self" scope credentials).
  --remove                    : Remove the configured API credentials. May be used with --self.

Arguments:

  [type]                      : The type of credentials to configure (choices: oauth-m2m, pat)
  [host]                      : The Databricks workspace URL (without the trailing slash).
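
The options above can be combined. The following is a sketch rather than verified output: it assumes --warehouse is accepted with oauth-m2m credentials in the same way the earlier example uses it with pat, and the host, client ID, client secret, and warehouse ID are placeholders.

```storm
// Set global OAuth M2M credentials with a default warehouse
databricks.setup.credentials oauth-m2m https://myhost.databricks.com --client-id client_id --client-secret client_secret --warehouse id1234

// Confirm which credential scope is now in use
databricks.setup.credentials --show-scope

// Remove the global credentials (no --self)
databricks.setup.credentials --remove
```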

databricks.sql.cancel

Cancel an existing SQL query.

Examples:

  // Cancel a query by statement_id
  databricks.sql.cancel 1234


Usage: databricks.sql.cancel [options] <statement_id>

Options:

  --help                      : Display the command usage.
  --debug                     : Show verbose debug output.

Arguments:

  <statement_id>              : The statement ID to cancel.

databricks.sql.execute

Execute a SQL query and print the results.

Examples:

  // Execute a query using the warehouse configured with the credentials
  databricks.sql.execute "select * from foo"

  // Execute a query and do not wait for the results
  databricks.sql.execute "select * from foo" --wait 0 --poll 0

  // Execute a query using parameters
  databricks.sql.execute "select * from foo where zip=:zip" --params ([{"name": "zip", "value": 10110}])


Usage: databricks.sql.execute [options] <query>

Options:

  --help                      : Display the command usage.
  --warehouse <warehouse>     : Databricks warehouse ID; if not specified the warehouse from the credentials config will be used.
  --params <params>           : A list of query parameters, each specified as a dict with "name", "value", and (optionally) "type" keys.
  --wait <wait>               : The time in seconds to wait for query results. If 0, the command will not wait for the query to complete.
  --poll <poll>               : The time to wait to poll for results if the status is pending or running. If 0, the command will not poll. (default: 10)
  --pprint                    : Pretty print the data rows.
  --debug                     : Show verbose debug output.

Arguments:

  <query>                     : The query to execute.
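
As a further sketch of the options above, the commands below override the warehouse and pass a typed parameter. The table, column, and warehouse ID are placeholders, and the "DATE" type name is an assumption about what the Databricks API accepts; only the "name", "value", and "type" keys come from the option help.

```storm
// Run against a specific warehouse and pretty print the rows
databricks.sql.execute "select * from foo" --warehouse id1234 --pprint

// Pass a parameter with an explicit type (the type name is an assumption)
databricks.sql.execute "select * from foo where created > :since" --params ([{"name": "since", "value": "2024-06-10", "type": "DATE"}])
```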

databricks.sql.history

Print the SQL query history.

Examples:

  // Print the history for queries with a given status
  databricks.sql.history --status CANCELED

  // Print the history for queries matching multiple statuses
  databricks.sql.history --status (CANCELED, FINISHED)

  // Print the history for queries within a given start and end time
  databricks.sql.history --start-time 2024-06-10 --end-time 2024-06-12


Usage: databricks.sql.history [options]

Options:

  --help                      : Display the command usage.
  --status <status>           : Filter results by a single status, or a list of statuses.
  --user <user>               : Filter results by a single user ID, or a list of user IDs.
  --warehouse <warehouse>     : Filter results by a single warehouse ID, or a list of IDs; if not specified the warehouse from the credentials config will be used.
  --start-time <start_time>   : Filter results to queries that started after this time. (default: -24hours)
  --end-time <end_time>       : Filter results to queries that started before this time.
  --debug                     : Show verbose debug output.
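
The filters can presumably be combined in one invocation. This sketch assumes that combined filters narrow the results together and that --warehouse accepts the same list syntax shown for --status; the warehouse IDs are placeholders.

```storm
// Finished queries on two warehouses within a time window
databricks.sql.history --status FINISHED --warehouse (id1234, id5678) --start-time 2024-06-10 --end-time 2024-06-12
```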

databricks.sql.results

Print results from an existing query.

Examples:

  // Print results for a statement ID
  databricks.sql.results 1234

  // Print results for a statement ID, but do not poll for results if pending or running
  databricks.sql.results 1234 --poll 0


Usage: databricks.sql.results [options] <statement_id>

Options:

  --help                      : Display the command usage.
  --poll <poll>               : The time to wait to poll for results if the status is pending or running. If 0, the command will not poll. (default: 10)
  --pprint                    : Pretty print the data rows.
  --debug                     : Show verbose debug output.

Arguments:

  <statement_id>              : The statement ID to retrieve results for.

Storm Modules

This package implements the following Storm Modules.

databricks

sqlExecute(query, warehouse=$lib.null, params=$lib.null, wait=$lib.null, poll=(10))

Execute a SQL query and emit results.

Example:

Iterate over results and print the data rows:

$mod = $lib.import(databricks)

$query = "select * from foo where zip = :zip"
$params = ([
  {"name": "zip", "value": 10110}
])

for ($mtyp, $data, $info) in $mod.sqlExecute($query, params=$params) {
  switch $mtyp {
    "init": {
      $lib.print(`statement_id={$data}`)
      $lib.print(`columns={$info.columns}`)
    }
    "print": {
      $lib.print($data)
    }
    "warn": {
      $lib.warn($data)
    }
    "data": {
      $lib.print($data)
    }
    *: {
      $lib.warn(`Unexpected message type {$mtyp} - {$data}`)
    }
  }
}

Args:

query (str): The query to execute.

warehouse (str): Databricks warehouse ID; if not specified the warehouse from the credentials config will be used.

params (list): A list of query parameters, each specified as a dict with "name", "value", and (optionally) "type" keys.

wait (integer): The time in seconds to wait for query results. If 0, the function will not wait for the query to complete.

poll (integer): The time to wait to poll for results if the status is pending or running. If 0, the function will not poll.

Yields:

A message list containing (type, data, info). The return type is list.
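
The wait and poll arguments suggest a submit-now, fetch-later pattern. The sketch below assumes that an "init" message carrying the statement ID is still emitted when wait and poll are both 0; that behavior is inferred from the example above and is not confirmed by this documentation.

```storm
$mod = $lib.import(databricks)

// Submit the query without waiting and keep the statement ID
// from the "init" message (assumed to be emitted in this mode).
$sid = $lib.null
for ($mtyp, $data, $info) in $mod.sqlExecute("select * from foo", wait=(0), poll=(0)) {
  if ($mtyp = "init") { $sid = $data }
}

// Later, retrieve the rows by statement ID.
if $sid {
  for ($mtyp, $data, $info) in $mod.sqlResults($sid) {
    if ($mtyp = "data") { $lib.print($data) }
  }
}
```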

sqlResults(statement_id, poll=(10))

Emit results for an existing query.

Example:

Iterate over results and print the data rows:

$mod = $lib.import(databricks)

for ($mtyp, $data, $info) in $mod.sqlResults("sid1234") {
  switch $mtyp {
    "init": {
      $lib.print(`statement_id={$data}`)
      $lib.print(`columns={$info.columns}`)
    }
    "print": {
      $lib.print($data)
    }
    "warn": {
      $lib.warn($data)
    }
    "data": {
      $lib.print($data)
    }
    *: {
      $lib.warn(`Unexpected message type {$mtyp} - {$data}`)
    }
  }
}

Args:

statement_id (str): The statement ID to retrieve results for.

poll (integer): The time to wait to poll for results if the status is pending or running. If 0, the function will not poll.

Yields:

A message list containing (type, data, info). The return type is list.
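
When the rows are needed for further processing rather than printing, they can be gathered into a list. This is a minimal sketch that assumes each "data" message carries one row, as the example above implies; "sid1234" is a placeholder statement ID.

```storm
$mod = $lib.import(databricks)

// Collect the data rows and surface any warnings.
$rows = ([])
for ($mtyp, $data, $info) in $mod.sqlResults("sid1234") {
  if ($mtyp = "data") {
    $rows.append($data)
  } elif ($mtyp = "warn") {
    $lib.warn($data)
  }
}

$lib.print(`collected {$rows.size()} rows`)
```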

sqlCancel(statement_id)

Cancel execution for an existing query.

Args:

statement_id (str): The statement ID to cancel.

Returns:

An (ok, message) list. The return type is list.
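
Example:

A sketch of unpacking the returned (ok, message) list. The statement ID is a placeholder, and the contents of the message on success are not documented here, so only the failure message is printed.

```storm
$mod = $lib.import(databricks)

($ok, $mesg) = $mod.sqlCancel("sid1234")
if $ok {
  $lib.print("Cancel request succeeded")
} else {
  $lib.warn(`Cancel request failed: {$mesg}`)
}
```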

sqlHistory(status=$lib.null, user=$lib.null, warehouse=$lib.null, start_time=-24hours, end_time=$lib.null)

Emit the history of queries.

Example:

Iterate over results and print the history details:

$mod = $lib.import(databricks)

for $item in $mod.sqlHistory() {
  $lib.pprint($item)
}

Args:

status: Filter results by a single status, or a list of statuses. The input type may be one of the following: list, str.

user: Filter results by a single user ID, or a list of user IDs. The input type may be one of the following: list, str.

warehouse: Filter results by a single warehouse ID, or a list of IDs; if not specified the warehouse from the credentials config will be used. The input type may be one of the following: list, str.

start_time: Filter results to queries that started after this time. The input type may be one of the following: integer, str.

end_time: Filter results to queries that started before this time. The input type may be one of the following: integer, str.

Yields:

Dictionaries containing query history information. The return type is dict.
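
The filter arguments can be passed as keywords. The sketch below assumes the status names and date strings used in the databricks.sql.history command examples are also accepted by the module function; it does not rely on any particular field in the yielded dictionaries.

```storm
$mod = $lib.import(databricks)

$statuses = (["CANCELED", "FINISHED"])
$count = (0)

// Print each matching history entry and keep a running count.
for $item in $mod.sqlHistory(status=$statuses, start_time="2024-06-10", end_time="2024-06-12") {
  $lib.pprint($item)
  $count = ($count + 1)
}

$lib.print(`{$count} queries matched`)
```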