Sandhill Data Processors

Sandhill routes are composed of a list of data processors. These are single actions that Sandhill may take while processing a request.

Things data processors can do:

  • Gather data by querying an API
  • Load configuration from a file
  • Transform or manipulate data
  • Perform an evaluation or computation

If the data processors provided with Sandhill are not sufficient, you can also develop your own.

Data Processors Included With Sandhill

  • evaluate - Evaluate a set of conditions and return a truthy result.
  • file - Find and load files from the instance.
  • iiif - Make calls related to IIIF APIs.
  • request - Make generic API calls and redirects.
  • solr - Make calls to a Solr endpoint.
  • stream - Stream data to the client from a previously opened connection.
  • string - Perform simple string manipulation.
  • template - Render files or strings through Jinja templating.
  • xml - Load XML or perform XPath queries.

Common Data Processor Arguments

These arguments are valid to pass to all data processors. Data processors should be written to handle these arguments appropriately.

name - Required

Defines the label under which the data processor will run. Results from the processor will be stored under this key in the data passed to subsequent processors.

processor - Required

Specifies the processor file and the function to call within it, period delimited (e.g. solr.search calls the search function in the solr processor).

{
    "name": "searchresults",
    "processor": "solr.search"
}

on_fail - Optional

Unless specified, the data processor is allowed to fail silently and proceed to the next processor.
When specified, the value must be a valid 4xx or 5xx HTTP status code as an integer, or 0. If the data processor fails and on_fail is set, Sandhill will abort the page request and return an error page with the selected code. If set to 0, the processor may choose a status code appropriate to the type of failure.

when - Optional

A string which is first rendered through Jinja and then evaluated for truth. If the value is not truthy, then the given data processor will be skipped.
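
Putting these together, a route entry using all of the common arguments might look like this (the processor and values here are illustrative):

{
    "name": "searchresults",
    "processor": "solr.search",
    "on_fail": 500,
    "when": "{{ view_args.format == 'html' }}"
}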

sandhill.processors.evaluate

Processor for evaluation functions

conditions(data)

Evaluates the conditions specified in the processor section of the configs.

Detailed documentation: https://msu-libraries.github.io/sandhill/evaluate-conditions/

Parameters:

  • data (dict, required): Processor arguments and all other data loaded from previous data processors.
      • conditions (string): Indicates the location in data where the conditions to be evaluated are. This is a .-delimited string of dict keys and/or list indexes.
      • match_all (boolean): Whether to require all conditions to match for success; if false, any single match is considered a success.
      • abort_on_match (boolean, optional): If true, trigger an abort when the conditions are truthy.

Returns:

  • bool | None: True if the given conditions match appropriately to the parameters, False if they do not, or None on failure.

Raises:

  • HTTPException: If abort_on_match is true and the evaluation is truthy.
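
For example, a route entry might look like the following (assuming a hypothetical earlier processor named page_config loaded a config containing a match_conditions list):

{
    "name": "is_matched",
    "processor": "evaluate.conditions",
    "conditions": "page_config.match_conditions",
    "match_all": true,
    "on_fail": 404
}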

Source code in sandhill/processors/evaluate.py
def conditions(data):
    """
    Evaluates the conditions specified in the processor section of the configs.\n
    [Detailed documentation](https://msu-libraries.github.io/sandhill/evaluate-conditions/) \n
    Args:
        data (dict): Processor arguments and all other data loaded from previous data processors.\n
            * `conditions` _string_: Indicates the location in `data` where conditions to be \
                evaluated are.\n
                This is a `.` delimited string of dict keys and/or list indexes.\n
            * `match_all` _boolean_: Whether to require all conditions to match for success; \
                if false, any single match will be considered a success.\n
            * `abort_on_match` _boolean, optional_: If true, then trigger an abort when the \
                conditions are truthy.\n
    Returns:
        (bool|None): Returns True if given conditions match appropriate to the \
                     parameters, False if they do not, or None on failure
    Raises:
        HTTPException: If `abort_on_match` is true and the evaluation is truthy.
    """
    # TODO refactor suggestion:
    #   replace `match_all` with `match` key; new possible values:
    #       "all"  : similar to match_all: True
    #       "any"  : similar to match_all: False
    #       "none" : new state which considers success when 0 matches
    evaluation = None
    condition_keys = ifnone(data, 'conditions', '')
    _conditions = getdescendant(data, condition_keys if condition_keys else [])
    if 'match_all' not in data or not isinstance(data['match_all'], bool):
        app.logger.warning("Processor 'evaluate' is missing or has invalid 'match_all': "
                           + ifnone(data, 'match_all', "not defined"))
    elif not _conditions:
        ick = data['conditions'] if 'conditions' in data else "'conditions' undefined"
        app.logger.warning(f"Invalid condition keys: {ick}")
    else:
        evaluation = evaluate_conditions(_conditions,
                                         data, match_all=data['match_all']) > 0
        if 'abort_on_match' in data and data['abort_on_match'] and evaluation:
            dp_abort(503)
            evaluation = None

    return evaluation

sandhill.processors.file

Processing functions for files

create_json_response(data)

Wrapper for load_json that will return a JSON response object.

This can be used to stream JSON instead of loading it to use it as data.

Parameters:

  • data (dict, required): Processor arguments and all other data loaded from previous data processors.
      • path (string): A single file path to search for.
      • paths (list): A list of file paths to search for.

Returns:

  • requests.Response: The response object with the JSON data loaded into it.

Source code in sandhill/processors/file.py
def create_json_response(data):
    '''
    Wrapper for `load_json` that will return a JSON response object. \n
    This can be used to stream JSON instead of loading it to use it as data. \n
    Args:
        data (dict): Processor arguments and all other data loaded from previous data processors.\n
            * `path` _string_: A single file path to search for.\n
            * `paths` _list_: A list of file paths to search for.\n
    Returns:
        (requests.Response): The response object with the JSON data loaded into it.
    '''
    resp = RequestsResponse()
    resp.status_code = 200

    content = load_json(data)
    if content:
        resp.raw = io.StringIO(json.dumps(content))
    return resp

load_json(data)

Search for files at the paths within the 'path' and 'paths' keys of data. JSON is loaded from the first file found and the result returned.

If both 'path' and 'paths' are set, paths from both will be searched starting with 'path' first.

Parameters:

  • data (dict, required): Processor arguments and all other data loaded from previous data processors.
      • path (string): A single file path to search for.
      • paths (list): A list of file paths to search for.

Returns:

  • dict | None: The loaded JSON data, or None if no file was found.

Note: Paths must be relative to the instance/ directory.
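
For example, a route entry might be (the paths here are illustrative, relative to instance/):

{
    "name": "search_conf",
    "processor": "file.load_json",
    "paths": ["config/search/main.json", "config/search/default.json"]
}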

Source code in sandhill/processors/file.py
def load_json(data):
    '''
    Search for files at the paths within 'path' and 'paths' keys of `data`. \
    Will load JSON from the first file it finds and then return the result. \n
    If both 'path' and 'paths' are set, paths from both will be searched \
    starting with 'path' first.\n
    Args:
        data (dict): Processor arguments and all other data loaded from previous data processors.\n
            * `path` _string_: A single file path to search for.\n
            * `paths` _list_: A list of file paths to search for.\n
    Returns:
        (dict|None): The loaded JSON data or None if no file was found.
    Note:
        Paths must be relative to the `instance/` directory.
    '''
    file_data = None
    # loop over each provided path and stop when one is found
    if "path" in data:
        data.setdefault("paths", []).insert(0, data["path"])
    if "paths" in data:
        for path in data["paths"]:
            full_path = os.path.join(app.instance_path, path.lstrip("/"))
            if os.path.exists(full_path):
                file_data = load_json_config(full_path)
                break
    return file_data

load_matched_json(data)

Loads all the config files and returns the file that has the most matched conditions.

Parameters:

  • data (dict, required): Processor arguments and all other data loaded from previous data processors.
      • location (string): A directory path within the instance containing JSON files with match_conditions keys.

Returns:

  • dict | None: The loaded JSON data from the file that most matched its conditions, or None if no files matched.
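
A sketch of a route entry (the location is illustrative; judging from the source below, each JSON file under it should contain a match_conditions list whose entries have evaluate and match_when keys):

{
    "name": "item_config",
    "processor": "file.load_matched_json",
    "location": "config/metadata"
}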

Source code in sandhill/processors/file.py
def load_matched_json(data):
    """
    Loads all the config files and returns the file that has the most \
    [matched conditions](#TODO). \n
    Args:
        data (dict): Processor arguments and all other data loaded from previous data processors.\n
            * `location` _string_: A directory path within the instance \
               with JSON files containing `match_conditions` keys.\n
    Returns:
        (dict|None): The loaded JSON data from the file that most matched its conditions, \
            or None if no files matched.
    """
    file_data = None
    matched_dict = {}
    config_dir_path = None
    if 'location' in data:
        config_dir_path = os.path.join(app.instance_path, data['location'])

    config_files = load_json_configs(config_dir_path, recurse=True)
    for path, config in config_files.items():
        if "match_conditions" in config:
            try:
                matched_dict[path] = evaluate_conditions(config['match_conditions'], data)
            except KeyError:
                app.logger.warning(
                    f"Missing 'evaluate' and/or 'match_when' for 'match_condition' in: {path}")
                continue
    matched_path = max(matched_dict.items(), key=itemgetter(1))[0] if matched_dict else None

    for path, score in matched_dict.items():
        app.logger.debug(f"load_matched_json(score={score}, path={path})")

    # Ensure number of matches is greater than 0
    if matched_path in matched_dict and matched_dict[matched_path]:
        app.logger.debug(f"load_matched_json(matched={matched_path})")
        file_data = config_files[matched_path]

    return file_data

sandhill.processors.iiif

Processor for IIIF

load_image(data, url=None, api_get_function=api_get)

Load and return an IIIF image.

Parameters:

  • data (dict, required): Route data, where data['view_args']['iiif_path'] and data['identifier'] must exist.
  • url (str, default None): Override the IIIF server URL from the default IIIF_BASE in the configs.
  • api_get_function (function, default api_get): Function to use when making the GET request.

Returns:

  • requests.Response | None: The requested image from IIIF, or None on failure.

Raises:

  • HTTPException: On failure if on_fail is set.
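
A minimal route entry might look like this (it assumes an earlier processor stored identifier and that the route defines an iiif_path variable):

{
    "name": "image",
    "processor": "iiif.load_image",
    "on_fail": 0
}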

Source code in sandhill/processors/iiif.py
@catch(RequestException, "Call to IIIF Server failed: {exc}", abort=503)
def load_image(data, url=None, api_get_function=api_get):
    '''
    Load and return an IIIF image. \n
    Args:
        data (dict): route data where `data[view_args][iiif_path]` and `data[identifier]` exist \n
        url (str): Override the IIIF server URL from the default IIIF_BASE in the configs \n
        api_get_function (function): function to use when making the GET request \n
    Returns:
        (requests.Response|None): Requested image from IIIF, or None on failure. \n
    Raises:
        (HTTPException): On failure if `on_fail` is set. \n
    '''
    image = None
    url = establish_url(url, getconfig('IIIF_BASE', None))
    if 'iiif_path' in data['view_args'] and 'identifier' in data:
        image = api_get_function(
            url=os.path.join(url, data['identifier'], data['view_args']['iiif_path']),
            stream=True)
    else:
        app.logger.warning("Could not call IIIF Server; missing identifier or iiif_path")
        dp_abort(500)

    if not image.ok:
        app.logger.debug(f"Call to IIIF Server returned {image.status_code}")
        dp_abort(image.status_code)
        image = None
    return image

sandhill.processors.request

Processor for requests

api_json(data)

Make a call to an API and return the response content as JSON.

Parameters:

  • data (dict, required): Processor arguments and all other data loaded from previous data processors.
      • url (str): The URL to make the API call to.
      • method (str, optional): The HTTP method to use. Default: "GET"
      • timeout (int, optional): The request timeout in seconds. Default: 10

Returns:

  • dict: The JSON response from the API call.

Raises:

  • HTTPException: On failure if on_fail is set.
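
For example (the URL here is illustrative):

{
    "name": "manifest",
    "processor": "request.api_json",
    "url": "https://api.example.edu/item/42.json",
    "method": "GET",
    "timeout": 10
}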

Source code in sandhill/processors/request.py
@catch(RequestException, "Call to {data[url]} returned {exc}.", abort=503)
def api_json(data):
    '''
    Make a call to an API and return the response content as JSON. \n
    Args:
        data (dict): Processor arguments and all other data loaded from previous data processors.\n
            * `url` _str_: The URL to make the API call to.\n
            * `method` _str, optional_: The HTTP method to use.\n
                Default: `"GET"` \n
            * `timeout` _int, optional_: The request timeout in seconds.\n
                Default: `10` \n
    Returns:
        (dict): The JSON response from the API call. \n
    Raises:
        (HTTPException): On failure if `on_fail` is set. \n
    '''
    method = data['method'] if 'method' in data else 'GET'
    app.logger.debug(f"Connecting to {data['url']}")
    response = requests.request(
        method=method,
        url=data["url"],
        timeout=data.get('timeout', 10)
    )

    if not response.ok:
        app.logger.warning(f"Call to {data['url']} returned a non-ok status code: " \
                           f"{response.status_code}. {response.__dict__}")
        if 'on_fail' in data:
            abort(response.status_code if data['on_fail'] == 0 else data['on_fail'])

    try:
        return response.json()
    except JSONDecodeError:
        app.logger.warning(f"Call returned from {data['url']} that was not JSON.")
        dp_abort(503)
        return {}

redirect(data)

Trigger a redirect response to the specified URL.

Parameters:

  • data (dict, required): Processor arguments and all other data loaded from previous data processors.
      • location (str): URL to redirect the client to.
      • code (int, optional): HTTP status code to redirect with. Default: 302

Returns:

  • flask.Response: The Flask response object with the included redirect.

Raises:

  • HTTPException: If the location key is not present.
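
For example (the URL here is illustrative):

{
    "name": "redirect",
    "processor": "request.redirect",
    "location": "https://example.edu/new-home",
    "code": 301
}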

Source code in sandhill/processors/request.py
@catch(KeyError, "Processor request.redirect called without a 'location' given.", abort=500)
def redirect(data):
    '''
    Trigger a redirect response to specified url. \n
    Args:
        data (dict): Processor arguments and all other data loaded from previous data processors.\n
        * `location` _str_: URL to redirect client to.\n
        * `code` _int, optional_: HTTP status code to redirect with. \n
            Default: 302 \n
    Returns:
        (flask.Response): The flask response object with the included redirect. \n
    Raises:
        (HTTPException): If the `location` key is not present. \n
    '''
    code = data['code'] if 'code' in data else 302
    return FlaskRedirect(data['location'], code=code)

sandhill.processors.solr

Wrappers for making API calls to a Solr node.

search(data, url=None, api_get_function=api_get)

Perform a configured Solr search and return the result.

Parameters:

  • data (dict, required): Processor arguments and all other data loaded from previous data processors.
      • path (string) or paths (list): The path(s) to a search config file, loaded per file.load_json.
      • params (dict): Query arguments to pass to Solr.
      • record_keys (string, optional): Return this descendant path from the response JSON. Default: response.docs
  • url (str, default None): Overrides the default SOLR_URL normally retrieved from the Sandhill config file.
  • api_get_function (function, default api_get): Function used to call Solr. Used in unit tests.

Returns:

  • dict | flask.Response: A dict of the loaded JSON response, or a flask.Response instance if view_args.format is application/json.
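
A sketch of a route entry (the config path is illustrative, and its file would need to contain a solr_params key):

{
    "name": "searchresults",
    "processor": "solr.search",
    "paths": ["config/search/main.json"],
    "params": { "q": "dogs" }
}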

Source code in sandhill/processors/solr.py
def search(data, url=None, api_get_function=api_get):
    """
    Perform a [configured Solr search](#TODO) and return the result. \n
    Args:
        data (dict): Processor arguments and all other data loaded from previous data processors.\n
            * `path` _string_, `paths` _list_: The path to a search config file. Loaded \
              per [file.load_json](#TODO).\n
            * `params` _dict_: Query arguments to pass to Solr.\n
            * `record_keys` _string, optional_: Return this [descendant path](#TODO) from \
              the response JSON. Default: `response.docs`\n
        url (str): Overrides the default SOLR_URL normally retrieved from \
                   the [Sandhill config](#TODO) file.\n
        api_get_function (function): Function used to call Solr with. Used in unit tests.\n
    Returns:
        (dict|flask.Response): A dict of the loaded JSON response, or a `flask.Response` instance \
                               if `view_args.format` is `text/json`. \n
    """
    # TODO module should return None and call dp_abort instead of abort
    # TODO allow "path"
    if 'paths' not in data or not data['paths']:
        app.logger.error(
            f"Missing 'config' setting for processor "
            f"'{data['processor']}' with name '{data['name']}'")
        abort(500)

    # Load the search settings
    search_config = load_json(data)
    if 'solr_params' not in search_config:
        app.logger.error(
            f"Missing 'solr_params' inside search config file(s) '{ str(data['paths']) }'")
        abort(500)
    if 'config_ext' in data and 'solr_params' in data['config_ext']:
        solr_config = recursive_merge(
            search_config['solr_params'],
            data['config_ext']['solr_params']
        )
    else:
        solr_config = search_config['solr_params']

    # override default parameters with request query parameters
    data['params'] = overlay_with_query_args(solr_config, \
            request_args=data.get('params', None),
            allow_undefined=True)

    solr_results = select(data, url, api_get_function)

    # check if the json results were requested
    result_format = match_request_format('format', ['text/html', 'application/json'])
    if result_format == 'application/json':
        solr_results = jsonify(solr_results)

    return solr_results

select(data, url=None, api_get_function=api_get)

Perform a Solr select call and return the loaded JSON response.

"name": "mysearch",
"processor": "solr.search",
"params": { "q": "*", "rows":"20" }

Parameters:

  • data (dict, required): Processor arguments and all other data loaded from previous data processors.
      • params (dict): Query arguments to pass to Solr.
      • record_keys (string, optional): Return this descendant path from the response JSON.
  • url (str, default None): Overrides the default SOLR_URL normally retrieved from the Sandhill config file.
  • api_get_function (function, default api_get): Function used to call Solr. Used in unit tests.

Returns:

  • dict | None: The loaded JSON data, or None if nothing matched.

Raises:

  • HTTPException: If on_fail is set.

Source code in sandhill/processors/solr.py
@catch((RequestException, HTTPError), "Call to Solr failed: {exc}", abort=503)
@catch(JSONDecodeError, "Call returned from Solr that was not JSON.", abort=503)
@catch(KeyError, "Missing url component: {exc}", abort=400) # Missing 'params' key
def select(data, url=None, api_get_function=api_get):
    """
    Perform a Solr select call and return the loaded JSON response. \n
    ```json
    "name": "mysearch",
    "processor": "solr.search",
    "params": { "q": "*", "rows":"20" }
    ``` \n
    Args:
        data (dict): Processor arguments and all other data loaded from previous data processors.\n
            * `params` _dict_: Query arguments to pass to Solr.\n
            * `record_keys` _string, optional_: Return this [descendant path](#TODO) from \
              the response JSON.\n
        url (str): Overrides the default SOLR_URL normally retrieved from \
                   the [Sandhill config](#TODO) file.\n
        api_get_function (function): Function used to call Solr with. Used in unit tests.\n
    Returns:
        (dict|None): The loaded JSON data or None if nothing matched. \n
    Raises:
        werkzeug.exceptions.HTTPException: If `on_fail` is set. \n
    """

    response = None
    url = establish_url(url, getconfig('SOLR_URL', None))
    url = url + "/select"

    # query solr with the parameters
    app.logger.debug(f"Connecting to {url}?{urlencode(data['params'])}")
    response = api_get_function(url=url, params=data['params'])
    response_json = None
    if not response.ok:
        app.logger.warning(f"Call to Solr returned {response.status_code}. {response}")
        try:
            if 'error' in response.json():
                app.logger.warning(
                    f"Error returned from Solr: {str(response.json()['error'])}")
        except JSONDecodeError:
            pass
        dp_abort(response.status_code)
    else:
        response_json = response.json()
        # Get the records that exist at the provided record_keys
        if 'record_keys' in data and data['record_keys']:
            response_json = getdescendant(response_json, data['record_keys'])

    return response_json

select_record(data, url=None, api_get_function=api_get)

Perform a Solr select call and return the first result from the response.

Parameters:

  • data (dict, required): Processor arguments and all other data loaded from previous data processors.
      • params (dict): Query arguments to pass to Solr.
      • record_keys (string, optional): Return this descendant path from the response JSON. Default: response.docs
  • url (str, default None): Overrides the default SOLR_URL normally retrieved from the Sandhill config file.
  • api_get_function (function, default api_get): Function used to call Solr. Used in unit tests.

Returns:

  • Any: The first item matched by record_keys in the JSON response, otherwise None.

Raises:

  • HTTPException: If on_fail is set.
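
For example (the query values here are illustrative):

{
    "name": "record",
    "processor": "solr.select_record",
    "params": { "q": "id:42", "rows": "1" },
    "record_keys": "response.docs"
}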

Source code in sandhill/processors/solr.py
def select_record(data, url=None, api_get_function=api_get):
    """
    Perform a Solr select call and return the first result from the response. \n
    Args:
        data (dict): Processor arguments and all other data loaded from previous data processors.\n
            * `params` _dict_: Query arguments to pass to Solr.\n
            * `record_keys` _string, optional_: Return this [descendant path](#TODO) from \
              the response JSON. Default: `response.docs`\n
        url (str): Overrides the default SOLR_URL normally retrieved from the \
                   [Sandhill config](#TODO) file.\n
        api_get_function (function): Function used to call Solr with. Used in unit tests.\n
    Returns:
        (Any): The first item matched by `record_keys` in the JSON response, otherwise None. \n
    Raises:
        werkzeug.exceptions.HTTPException: If `on_fail` is set. \n
    """
    data['record_keys'] = ifnone(data, 'record_keys', 'response.docs')
    records = select(data, url, api_get_function)

    if records and isinstance(records, Sequence):
        return records[0]
    return None

sandhill.processors.stream

Processor for streaming data

response(data)

Stream a Requests library response that was previously loaded.

Parameters:

  • data (dict, required): Processor arguments and all other data loaded from previous data processors.
      • response (str): The key in data where the response is located.
      • The key named by data['response'] (requests.Response): The previously loaded response to stream.

Returns:

  • flask.Response | None: A stream of the response.

Raises:

  • HTTPException: If on_fail is set.
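
A sketch of a two-processor chain, where the response key names the processor that previously loaded the response (names here are illustrative):

[
    {
        "name": "image",
        "processor": "iiif.load_image",
        "on_fail": 0
    },
    {
        "name": "output",
        "processor": "stream.response",
        "response": "image"
    }
]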

Source code in sandhill/processors/stream.py
def response(data):
    '''
    Stream a Requests library response that was previously loaded. \n
    Args:
        data (dict): Processor arguments and all other data loaded from previous data processors.\n
            * `response` _str_: The key where the response is located.\n
            * Key from `data[response]` _requests.Response_: The response to stream.\n
    Returns:
        (flask.Response|None): A stream of the response \n
    Raises:
        werkzeug.exceptions.HTTPException: If `on_fail` is set. \n
    '''
    allowed_headers = [
        'Content-Type', 'Content-Disposition', 'Content-Length',
        'Range', 'accept-ranges', 'Content-Range'
    ]
    if 'response' not in data:
        app.logger.error("stream.response requires a 'response' variable to be set.")
        abort(500)
    resp = data[data["response"]] if data["response"] in data else None

    # Not a valid response
    if not isinstance(resp, RequestsResponse):
        dp_abort(503)
        return None
    # Valid response, but not a success (bool check on resp fails if http code is 400 to 600)
    if not resp:
        dp_abort(resp.status_code)
        return None

    stream_response = FlaskResponse(
        resp.iter_content(chunk_size=app.config['STREAM_CHUNK_SIZE']),
        status=resp.status_code
    )
    for header in resp.headers.keys():
        # Case insensitive header matching
        if header.lower() in [allowed_key.lower() for allowed_key in allowed_headers]:
            stream_response.headers.set(header, resp.headers.get(header))
    return stream_response

string(data)

Stream a data variable as string data to the output.

Parameters:

  • data (dict, required): Processor arguments and all other data loaded from previous data processors.
      • var (str): The name of the variable whose content should be sent.
      • mimetype (str, optional): The mimetype to send for the data. Default: text/plain

Returns:

  • flask.Response | None: A stream of the response.
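
For example, assuming a hypothetical earlier processor stored a non-empty string under sitemap:

{
    "name": "output",
    "processor": "stream.string",
    "var": "sitemap",
    "mimetype": "text/xml"
}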

Source code in sandhill/processors/stream.py
def string(data):
    '''
    Stream a data variable as string data to the output \n
    Args:
        data (dict): Processor arguments and all other data loaded from previous data processors.\n
            * `var` _str_: The name of the variable whose content should be sent.\n
            * `mimetype` _str_: The mimetype to send for the data (default: text/plain).\n
    Returns:
        (flask.Response|None): A stream of the response \n
    '''
    if 'var' not in data or not data.get(data['var']):
        app.logger.error("requires that 'var' is set to name of non-empty data variable")
        abort(500)
    mimetype = data.get('mimetype', 'text/plain')

    string_response = make_response(data.get(data['var']))
    string_response.mimetype = mimetype
    return string_response

sandhill.processors.string

Processor for string functions

replace(data)

For the given name in data, replace all occurrences of an old string with a new string and return the result.

Parameters:

  • data (dict, required): Processor arguments and all other data loaded from previous data processors.
      • name (str | requests.Response): The context in which to find and replace.
      • old (str): The string to find.
      • new (str): The string to replace it with.

Returns:

  • str | requests.Response | None: The same type as data[name] was, only now with string replacements done; or None if the name value is None or missing.
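
A sketch of a chain where the second entry reuses the name of the first, so the replacement is applied to that variable and the result re-stored under the same key (values here are illustrative):

[
    {
        "name": "manifest",
        "processor": "request.api_json",
        "url": "https://api.example.edu/manifest.json"
    },
    {
        "name": "manifest",
        "processor": "string.replace",
        "old": "http://",
        "new": "https://"
    }
]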

Source code in sandhill/processors/string.py
def replace(data):
    '''
    For the given `name` in data, replace all occurrences of an old string with a new string and \
    return the result. \n
    Args:
        data (dict): Processor arguments and all other data loaded from previous data processors.\n
            * `name` _str|requests.Response_: The context in which to find and replace.\n
            * `old` _str_: The string to find.\n
            * `new` _str_: The string to replace it with.\n
    Returns:
        (str|requests.Response|None): The same type as `data[name]` was, only now with string \
            replacements done. Or None if the 'name' value is None or missing. \n
    '''
    data_copy = deepcopy(data.get(data.get('name')))
    cont_copy = data_copy if data_copy is not None else ''

    # TODO able to handle regular string data (non-JSON)
    # TODO handle FlaskResponse as well
    if isinstance(data_copy, RequestsResponse):
        cont_copy = data_copy.text
    if cont_copy and not isinstance(cont_copy, str):
        cont_copy = json.dumps(cont_copy)
    cont_copy = cont_copy.replace(data['old'], data['new'])

    # pylint: disable=protected-access
    if isinstance(data_copy, RequestsResponse):
        data_copy._content = cont_copy.encode()
        data_copy.headers['Content-Length'] = len(data_copy._content)
    elif cont_copy:
        data_copy = json.loads(cont_copy)
    return data_copy

sandhill.processors.template

Processor for rendering templates

render(data)

Render the response as a template or directly as a Flask Response.

Parameters:

  • data (dict, required): Processor arguments and all other data loaded from previous data processors.
      • file (str): Path to the template file.

Returns:

  • flask.Response: The rendered template in a Flask response.

Raises:

  • HTTPException: If file is not set in data.
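
For example (the template filename is illustrative):

{
    "name": "page",
    "processor": "template.render",
    "file": "item.html.j2",
    "on_fail": 500
}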

Source code in sandhill/processors/template.py
@catch(TemplateError, "An error has occured when rendering {data[file]}: {exc}", abort=500)
@catch(TemplateNotFound, "Failure when rendering {data[file]}. " \
       "Could not find template to render: {exc}", abort=501)
def render(data):
    '''
    Render the response as a template or directly as a Flask Response. \n
    Args:
        data (dict): Processor arguments and all other data loaded from previous data processors.\n
            * `file` _str_: Path to the template file.\n
    Returns:
        (flask.Response): The rendered template in a Flask response. \n
    Raises:
        werkzeug.exceptions.HTTPException: If `file` is not set in data. \n
    '''
    if 'file' not in data:
        app.logger.error("template.render: 'file' not set in data; unable to render response.")
        abort(500)
    template = data["file"]

    return make_response(render_template(template, **data))

render_string(data)

Given a Jinja2 template string, render that template to a string and store the result under the name variable.

Parameters:

  • data (dict, required): Processor arguments and all other data loaded from previous data processors.
      • value (str): The template string to render.

Returns:

  • str | None: The rendered template string, or None if no value key was in data.
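
For example (the template string is illustrative):

{
    "name": "greeting",
    "processor": "template.render_string",
    "value": "Hello {{ view_args.name }}!"
}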

Source code in sandhill/processors/template.py
@catch(TemplateError, "Invalid template provided for: {data[value]}. Error: {exc}",
       return_val=None)
def render_string(data):
    """
    Given a Jinja2 template string, it will render that template to a string and set it in
    the `name` variable. \n
    Args:
        data (dict): Processor arguments and all other data loaded from previous data processors.\n
            * `value` _str_: The template string to render.\n
    Returns:
        (str|None): The rendered template string, or None if no `value` key was in data. \n
    """
    evaluation = None
    if 'value' in data:
        evaluation = render_template_string(data['value'], data)
    return evaluation

sandhill.processors.xml

XML Data Processors

load(data: dict) -> etree._Element

Load an XML document.

Parameters:

  • data (dict, required): Processor arguments and all other data loaded from previous data processors.
      • source (str): Either a path, URL, or string to load.

Returns:

  • lxml.etree._Element | None: The loaded XML object tree, or None if source is not in data.
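
For example (the source URL is illustrative):

{
    "name": "mods",
    "processor": "xml.load",
    "source": "https://example.edu/record/42/MODS.xml"
}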

Source code in sandhill/processors/xml.py
def load(data: dict) -> etree._Element: # pylint: disable=protected-access
    '''
    Load an XML document. \n
    Args:
        data (dict): Processor arguments and all other data loaded from previous data processors.\n
            * `source` _str_: Either path, url, or string to load.\n
    Returns:
        (lxml.etree._Element|None): The loaded XML object tree, or None if `source` not in data. \n
    '''
    if 'source' not in data:
        app.logger.warning("No source XML provided. Missing key: 'source'")
        return None
    return xml.load(data['source'])

xpath(data: dict) -> list

Retrieve the matching xpath content from an XML source.

Parameters:

  • data (dict, required): Processor arguments and all other data loaded from previous data processors.
      • xpath (str): An XPath query.
      • source (str): Either a path, URL, or string to load.

Returns:

  • list | None: Matching results from the XPath query, or None if any required keys are not in data.
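
For example (the source and query are illustrative):

{
    "name": "titles",
    "processor": "xml.xpath",
    "source": "https://example.edu/record/42/MODS.xml",
    "xpath": "//title"
}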

Source code in sandhill/processors/xml.py
def xpath(data: dict) -> list:
    '''
    Retrieve the matching xpath content from an XML source. \n
    Args:
        data (dict): Processor arguments and all other data loaded from previous data processors.\n
            * `xpath` _str_: An XPath query.\n
            * `source` _str_: Either path, url, or string to load.\n
    Returns:
        (list): Matching results from XPath query, or None if any required keys are not in data. \n
    '''
    if 'xpath' not in data:
        app.logger.warning("No xpath search provided. Missing key: 'xpath'")
        return None
    return xml.xpath(load(data), data['xpath'])

xpath_by_id(data: dict) -> dict

For the matching xpath content, organize it into a dict keyed on the id attribute of the matched tags. Elements without an id attribute will not be returned.

Parameters:

  • data (dict, required): Processor arguments and all other data loaded from previous data processors.
      • xpath (str): An XPath query.
      • source (str): Either a path, URL, or string to load.

Returns:

  • dict | None: A mapping with keys of id and values of the content within matching elements, or None if missing any required keys in data.
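
For example (values here are illustrative; only elements with an id attribute will appear in the result):

{
    "name": "sections",
    "processor": "xml.xpath_by_id",
    "source": "https://example.edu/record/42/structure.xml",
    "xpath": "//section"
}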

Source code in sandhill/processors/xml.py
def xpath_by_id(data: dict) -> dict:
    '''
    For the matching xpath content, organize into dict with key \
    being the id param of the matched tags. Elements without an id attribute \
    will not be returned. \n
    Args:
        data (dict): Processor arguments and all other data loaded from previous data processors.\n
            * `xpath` _str_: An XPath query.\n
            * `source` _str_: Either path, url, or string to load.\n
    Returns:
        (dict): Dict mapping with keys of id, and values of content within matching elements, \
            or None if missing any required keys in data. \n
    '''
    if 'xpath' not in data:
        app.logger.warning("No xpath search provided. Missing key: 'xpath'")
        return None
    return xml.xpath_by_id(load(data), data['xpath'])

Developing a Data Processor

Sandhill makes developing your own data processors quite easy, perhaps best explained with a simple example.

Simple Processor

Within your instance/ directory, ensure there is a processors/ sub-directory; if not, create it.

Next, create a new Python file in instance/processors/; we'll call our example file myproc.py (the name of the file is up to you). Then create a function in that file which must accept a single parameter, data.

# instance/processors/myproc.py
"""The myproc data processors"""

def shout(data):
    """The shout data processor; will upper case all text and add an exlcaimation point."""
    ...

The data parameter here is a dict containing all data loaded for the route up until this point. If previous data processors loaded anything, it will be present in data. Sandhill always includes the standard view_args key, which contains any route variables. All arguments set on this data processor's route entry will also be keys in data.

For our shout() processor, let's say we want to expect a key words, which will contain the data we want to transform with our processor.

def shout(data):
    """The shout data processor; will upper case all text and add an exlcaimation point."""
    return data["words"].upper() + "!"

That's mostly it! Now we could include our custom data processor in a route with this entry in our route's JSON data list:

{
    "name": "loudly",
    "processor": "myproc.shout",
    "words": "This is my statement"
}

And after the data processor runs, Sandhill will have the following in your route's data dict:

{
    "data": {
        ... # other route data as may be appropriate
        "loudly": "THIS IS MY STATEMENT!"
    }
}

Improving your Processor

But what if someone fails to pass in the words key? Right now that would result in a KeyError.

In Sandhill, best practice for data processors is to return None on most failures; that is, unless the on_fail key is set in data, in which case we ought to abort with the value of on_fail.

To assist with this, Sandhill provides the dp_abort() function (short for "data processor abort"), which does most of the heavy lifting for you. Let's rework our function to handle failures.

from sandhill.utils.error_handling import dp_abort

def shout(data):
    """The shout data processor; will upper case all text and add an exlcaimation point."""
    if "words" not in data:
        # Here we choose HTTP status 500 for default, but `on_fail` value will take precedence.
        dp_abort(500)
        # If no `on_fail` is set, None indicates failure, so always return None after a dp_abort().
        return None
    return data["words"].upper() + "!"

With that, you have a nicely functioning data processor! For more advanced examples, feel free to peek at the source code of the built-in Sandhill data processors above.