Package Documentation

Storm Package: synapse-databricks

The following Commands are available from this package. This documentation is generated for version 0.1.0 of the package.

Storm Commands

This package implements the following Storm Commands.

databricks.setup.credentials

Manage the Databricks API credentials.

Examples:

  // Set global Databricks OAuth machine-to-machine (M2M) credentials
  databricks.setup.credentials oauth-m2m https://myhost.databricks.com --client-id client_id --client-secret client_secret

  // Set Databricks personal access token (PAT) for the current user
  databricks.setup.credentials pat https://myhost.databricks.com --token token --self

  // Set a default warehouse to use with a set of credentials
  databricks.setup.credentials pat https://myhost.databricks.com --token token --self --warehouse id1234

  // Display the current scope of the credentials
  databricks.setup.credentials --show-scope

  // Display the current credentials
  databricks.setup.credentials --show-credentials

  // Remove the credentials for the current user
  databricks.setup.credentials --self --remove


Usage: databricks.setup.credentials [options] <type> <host>

Options:

  --help                      : Display the command usage.
  --client-id <client_id>     : The Databricks client ID.
  --client-secret <client_secret> : The Databricks client secret.
  --token <token>             : The Databricks personal access token (PAT).
  --warehouse <warehouse>     : Optionally associate a Databricks warehouse ID with the credentials.
  --self                      : Set or remove the credentials for the current user. If not used, the credentials are
                                set globally.
  --show-scope                : Display the API credentials scope in use.
  --show-credentials          : Display the API credentials (requires admin perms or "self" scope credentials).
  --remove                    : Remove the configured API credentials. May be used with --self.

Arguments:

  [type]                      : The credential type to configure (choices: oauth-m2m, pat)
  [host]                      : The Databricks workspace URL (without the trailing slash).
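
The options above can be combined in ways the examples do not show. The following is a minimal sketch, not authoritative usage: all values are placeholders, and it assumes that --remove without --self acts on the global credentials, which is inferred from the --self description.

```storm
// Set global OAuth M2M credentials with a default warehouse (placeholder values)
databricks.setup.credentials oauth-m2m https://myhost.databricks.com --client-id client_id --client-secret client_secret --warehouse id1234

// Remove the global credentials (assumption: omitting --self targets the global scope)
databricks.setup.credentials --remove
```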

databricks.sql.cancel

Cancel an existing SQL query.

Examples:

  // Cancel a query by statement_id
  databricks.sql.cancel 1234

Usage: databricks.sql.cancel [options] <statement_id>

Options:

  --help                      : Display the command usage.
  --debug                     : Show verbose debug output.

Arguments:

  <statement_id>              : The statement ID to cancel.

databricks.sql.execute

Execute a SQL query and print the results.

Examples:

  // Execute a query using the warehouse configured with the credentials
  databricks.sql.execute "select * from foo"

  // Execute a query and do not wait for the results
  databricks.sql.execute "select * from foo" --wait 0 --poll 0

  // Execute a query using parameters
  databricks.sql.execute "select * from foo where zip=:zip" --params ([{"name": "zip", "value": 10110}])


Usage: databricks.sql.execute [options] <query>

Options:

  --help                      : Display the command usage.
  --warehouse <warehouse>     : Databricks warehouse ID; if not specified the warehouse from the credentials config
                                will be used.
  --params <params>           : A list of query parameters, each specified as a dict with "name", "value", and
                                (optionally) "type" keys.
  --wait <wait>               : The time in seconds to wait for query results. If 0, the command will not wait for the
                                query to complete.
  --poll <poll>               : The time to wait to poll for results if the status is pending or running. If 0, the
                                command will not poll. (default: 10)
  --pprint                    : Pretty print the data rows.
  --debug                     : Show verbose debug output.

Arguments:

  <query>                     : The query to execute.
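
The --warehouse and --pprint options and the optional "type" key in --params are not covered by the examples above. The sketch below shows how they might be used. The warehouse ID is a placeholder, and the "INT" type string is an assumption based on Databricks SQL type names rather than something this documentation confirms.

```storm
// Run against a specific warehouse (id1234 is a placeholder) and pretty print the data rows
databricks.sql.execute "select * from foo" --warehouse id1234 --pprint

// Pass a typed parameter; the "type" value shown here is an assumption
databricks.sql.execute "select * from foo where zip=:zip" --params ([{"name": "zip", "value": "10110", "type": "INT"}])
```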

databricks.sql.history

Print the SQL query history.

Examples:

  // Print the history for queries with a given status
  databricks.sql.history --status CANCELED

  // Print the history for queries matching multiple statuses
  databricks.sql.history --status (CANCELED, FINISHED)

  // Print the history for queries within a given start and end time
  databricks.sql.history --start-time 2024-06-10 --end-time 2024-06-12


Usage: databricks.sql.history [options]

Options:

  --help                      : Display the command usage.
  --status <status>           : Filter results by a single status, or a list of statuses.
  --user <user>               : Filter results by a single user ID, or a list of user IDs.
  --warehouse <warehouse>     : Filter results by a single warehouse ID, or a list of IDs; if not specified the
                                warehouse from the credentials config will be used.
  --start-time <start_time>   : Filter results to queries that started after this time. (default: -24hours)
  --end-time <end_time>       : Filter results to queries that started before this time.
  --debug                     : Show verbose debug output.

databricks.sql.results

Print results from an existing query.

Examples:

  // Print results for a statement ID
  databricks.sql.results 1234

  // Print results for a statement ID, but do not poll for results if pending or running
  databricks.sql.results 1234 --poll 0


Usage: databricks.sql.results [options] <statement_id>

Options:

  --help                      : Display the command usage.
  --poll <poll>               : The time to wait to poll for results if the status is pending or running. If 0, the
                                command will not poll. (default: 10)
  --pprint                    : Pretty print the data rows.
  --debug                     : Show verbose debug output.

Arguments:

  <statement_id>              : The statement ID to retrieve results for.

Storm Modules

This package implements the following Storm Modules.

databricks

sqlExecute(query, warehouse=(null), params=(null), wait=(null), poll=(10))

Execute a SQL query and emit results.

Example:

Iterate over results and print the data rows:

$mod = $lib.import(databricks)

$query = "select * from foo where zip = :zip"
$params = ([
  {"name": "zip", "value": 10110}
])

for ($mtyp, $data, $info) in $mod.sqlExecute($query, params=$params) {
  switch $mtyp {
    "init": {
      $lib.print(`statement_id={$data}`)
      $lib.print(`columns={$info.columns}`)
    }
    "print": {
      $lib.print($data)
    }
    "warn": {
      $lib.warn($data)
    }
    "data": {
      $lib.print($data)
    }
    *: {
      $lib.warn(`Unexpected message type {$mtyp} - {$data}`)
    }
  }
}

Args:

query (str): The query to execute.

warehouse (str): Databricks warehouse ID; if not specified the warehouse from the credentials config will be used.

params (list): A list of query parameters, each specified as a dict with "name", "value", and (optionally) "type" keys.

wait (integer): The time in seconds to wait for query results. If 0, the function will not wait for the query to complete.

poll (integer): The time to wait to poll for results if the status is pending or running. If 0, the function will not poll.

Yields:

A message list containing (type, data, info). The return type is list.
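
When only the rows are needed, the message stream can be filtered by type instead of handling every case. This is a minimal sketch that assumes each "data" message carries one result row, as the example above suggests. The warehouse ID "id1234" is a placeholder.

```storm
$mod = $lib.import(databricks)

// Collect only the "data" messages; other message types are ignored in this sketch
$rows = ([])

for ($mtyp, $data, $info) in $mod.sqlExecute("select * from foo", warehouse=id1234) {
  if ($mtyp = "data") {
    $rows.append($data)
  }
}

$lib.print(`collected {$rows.size()} rows`)
```

Ignoring "warn" messages keeps the sketch short. A real caller would likely want to surface them with $lib.warn() as in the full example.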

sqlResults(statement_id, poll=(10))

Emit results for an existing query.

Example:

Iterate over results and print the data rows:

$mod = $lib.import(databricks)

for ($mtyp, $data, $info) in $mod.sqlResults("sid1234") {
  switch $mtyp {
    "init": {
      $lib.print(`statement_id={$data}`)
      $lib.print(`columns={$info.columns}`)
    }
    "print": {
      $lib.print($data)
    }
    "warn": {
      $lib.warn($data)
    }
    "data": {
      $lib.print($data)
    }
    *: {
      $lib.warn(`Unexpected message type {$mtyp} - {$data}`)
    }
  }
}

Args:

statement_id (str): The statement ID to retrieve results for.

poll (integer): The time to wait to poll for results if the status is pending or running. If 0, the function will not poll.

Yields:

A message list containing (type, data, info). The return type is list.
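
sqlExecute() and sqlResults() can be paired to submit a query without blocking and fetch the results later. The sketch below assumes that sqlExecute() still emits an "init" message carrying the statement ID when called with wait=0 and poll=0. This documentation does not confirm that behavior, so treat it as an illustration only.

```storm
$mod = $lib.import(databricks)

// Submit the query without waiting or polling, keeping the statement ID
// from the "init" message (assumed to be emitted in this mode)
$sid = $lib.null
for ($mtyp, $data, $info) in $mod.sqlExecute("select * from foo", wait=0, poll=0) {
  if ($mtyp = "init") { $sid = $data }
}

// Later, retrieve the results for the saved statement ID
if ($sid != $lib.null) {
  for ($mtyp, $data, $info) in $mod.sqlResults($sid) {
    if ($mtyp = "data") { $lib.print($data) }
  }
}
```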

sqlCancel(statement_id)

Cancel execution for an existing query.

Args:

statement_id (str): The statement ID to cancel.

Returns:

An (ok, message) list. The return type is list.
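
Example:

A minimal sketch of unpacking the (ok, message) return value. The statement ID "sid1234" is a placeholder, and the sketch assumes ok is a boolean and message is a printable value.

```storm
$mod = $lib.import(databricks)

($ok, $mesg) = $mod.sqlCancel("sid1234")

if $ok {
  $lib.print("Cancel request succeeded")
} else {
  $lib.warn(`Cancel request failed: {$mesg}`)
}
```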

sqlHistory(status=(null), user=(null), warehouse=(null), start_time=-24hours, end_time=(null))

Emit the history of queries.

Example:

Iterate over results and print the history details:

$mod = $lib.import(databricks)

for $item in $mod.sqlHistory() {
  $lib.pprint($item)
}

Args:

status: Filter results by a single status, or a list of statuses. The input type may be one of the following: list, str.

user: Filter results by a single user ID, or a list of user IDs. The input type may be one of the following: list, str.

warehouse: Filter results by a single warehouse ID, or a list of IDs; if not specified the warehouse from the credentials config will be used. The input type may be one of the following: list, str.

start_time: Filter results to queries that started after this time. The input type may be one of the following: integer, str.

end_time: Filter results to queries that started before this time. The input type may be one of the following: integer, str.

Yields:

Dictionaries containing query history information. The return type is dict.
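
The filter arguments can also be passed from Storm. The sketch below reuses the status values and dates from the databricks.sql.history command examples. It assumes the module accepts the same string forms as the command, which matches the documented str input types but is not shown explicitly.

```storm
$mod = $lib.import(databricks)

// Status values and dates are taken from the command examples
$statuses = (["CANCELED", "FINISHED"])

for $item in $mod.sqlHistory(status=$statuses, start_time="2024-06-10", end_time="2024-06-12") {
  $lib.pprint($item)
}
```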