Sandhill Data Processors

Sandhill routes are composed of a list of data processors. These are single actions that Sandhill may take while processing a request.

Things data processors can do:

  • Gather data by querying an API
  • Load configuration from a file
  • Transform or manipulate data
  • Perform an evaluation or computation

If the data processors provided with Sandhill are not sufficient, you can also develop your own.

Data Processors Included With Sandhill

  • evaluate - Evaluate a set of conditions and return a truthy result.
  • file - Find and load files from the instance.
  • iiif - Make calls related to IIIF APIs.
  • request - Make generic API calls and redirects.
  • solr - Make calls to a Solr endpoint.
  • stream - Stream data to the client from a previously opened connection.
  • string - Perform simple string manipulation.
  • template - Render files or strings through Jinja templating.
  • xml - Load XML or perform XPath queries.

Common Data Processor Arguments

These arguments are valid to pass to all data processors. Data processors should be written to handle these arguments appropriately.

name - Required

Defines the label under which the data processor will run. Results from the processor will be stored under this key in the data passed to subsequent processors.

processor - Required

Specifies the processor file and the function to call within it, period delimited (e.g. solr.search calls the search function in the solr processor).

{
    "name": "searchresults",
    "processor": "solr.search"
}

on_fail - Optional

Unless specified, the data processor is allowed to fail silently and proceed to the next processor.
When specified, the value must be a valid 4xx or 5xx HTTP status code as an integer, or 0. If the data processor fails and on_fail is set, Sandhill will abort the page request and return an error page with the selected code. If set to 0, the processor may choose a status code appropriate to the type of failure.

when - Optional

A string which is first rendered through Jinja and then evaluated for truth. If the value is not truthy, then the given data processor will be skipped.
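
Putting these together, a route entry using all of the common arguments might look like this (the processor and values here are illustrative):

{
    "name": "searchresults",
    "processor": "solr.search",
    "on_fail": 500,
    "when": "{{ view_args.format == 'html' }}"
}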

sandhill.processors.evaluate

Processor for evaluation functions

conditions(data)

Evaluates the conditions specified in the processor section of the configs.

Detailed documentation: https://msu-libraries.github.io/sandhill/evaluate-conditions/

Parameters:

  • data (dict, required): Processor arguments and all other data loaded from previous data processors.
      • conditions (string): Indicates the location in data where the conditions to be evaluated are. This is a .-delimited string of dict keys and/or list indexes.
      • match_all (boolean): Whether to require all conditions to match for success; if false, any single match is considered a success.
      • abort_on_match (boolean, optional): If true, trigger an abort when the conditions are truthy.

Returns:

  • bool | None: True if the given conditions match appropriately to the parameters, False if they do not, or None on failure.

Raises:

  • HTTPException: If abort_on_match is true and the evaluation is truthy.
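
For example, a route entry might look like the following (assuming a hypothetical earlier processor named page_config loaded a config containing a match_conditions list):

{
    "name": "is_matched",
    "processor": "evaluate.conditions",
    "conditions": "page_config.match_conditions",
    "match_all": true,
    "on_fail": 404
}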

Source code in sandhill/processors/evaluate.py
def conditions(data):
    """
    Evaluates the conditions specified in the processor section of the configs.\n
    [Detailed documentation](https://msu-libraries.github.io/sandhill/evaluate-conditions/) \n
    Args:
        data (dict): Processor arguments and all other data loaded from previous data processors.\n
            * `conditions` _string_: Indicates the location in `data` where conditions to be \
                evaluated are.\n
                This is a `.` delimited string of dict keys and/or list indexes.\n
            * `match_all` _boolean_: Whether to require all conditions to match for success; \
                if false, any single match will be considered a success.\n
            * `abort_on_match` _boolean, optional_: If true, then trigger an abort when the \
                conditions are truthy.\n
    Returns:
        (bool|None): Returns True if given conditions match appropriate to the \
                     parameters, False if they do not, or None on failure
    Raises:
        HTTPException: If `abort_on_match` is true and the evaluation is truthy.
    """
    # TODO refactor suggestion:
    #   replace `match_all` with `match` key; new possible values:
    #       "all"  : similar to match_all: True
    #       "any"  : similar to match_all: False
    #       "none" : new state which considers success when 0 matches
    evaluation = None
    condition_keys = ifnone(data, 'conditions', '')
    _conditions = getdescendant(data, condition_keys if condition_keys else [])
    if 'match_all' not in data or not isinstance(data['match_all'], bool):
        app.logger.warning("Processor 'evaluate' is missing or has invalid 'match_all': "
                           + ifnone(data, 'match_all', "not defined"))
    elif not _conditions:
        ick = data['conditions'] if 'conditions' in data else "'conditions' undefined"
        app.logger.warning(f"Invalid condition keys: {ick}")
    else:
        evaluation = evaluate_conditions(_conditions,
                                         data, match_all=data['match_all']) > 0
        if 'abort_on_match' in data and data['abort_on_match'] and evaluation:
            dp_abort(503)
            evaluation = None

    return evaluation

sandhill.processors.file

Processing functions for files

create_json_response(data)

Wrapper for load_json that will return a JSON response object.

This can be used to stream JSON instead of loading it to use it as data.

Parameters:

  • data (dict, required): Processor arguments and all other data loaded from previous data processors.
      • path (string): A single file path to search for.
      • paths (list): A list of file paths to search for.

Returns:

  • requests.Response: The response object with the JSON data loaded into it.

Source code in sandhill/processors/file.py
def create_json_response(data):
    '''
    Wrapper for `load_json` that will return a JSON response object. \n
    This can be used to stream JSON instead of loading it to use it as data. \n
    Args:
        data (dict): Processor arguments and all other data loaded from previous data processors.\n
            * `path` _string_: A single file path to search for.\n
            * `paths` _list_: A list of file paths to search for.\n
    Returns:
        (requests.Response): The response object with the JSON data loaded into it.
    '''
    resp = RequestsResponse()
    resp.status_code = 200

    content = load_json(data)
    if content:
        resp.raw = io.StringIO(json.dumps(content))
    return resp

load_json(data)

Search for files at the paths within the 'path' and 'paths' keys of data. JSON is loaded from the first file found and the result returned.

If both 'path' and 'paths' are set, paths from both will be searched starting with 'path' first.

Parameters:

  • data (dict, required): Processor arguments and all other data loaded from previous data processors.
      • path (string): A single file path to search for.
      • paths (list): A list of file paths to search for.

Returns:

  • dict | None: The loaded JSON data, or None if no file was found.

Note: Paths must be relative to the instance/ directory.
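
For example, a route entry might be (the paths here are illustrative, relative to instance/):

{
    "name": "search_conf",
    "processor": "file.load_json",
    "paths": ["config/search/main.json", "config/search/default.json"]
}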

Source code in sandhill/processors/file.py
def load_json(data):
    '''
    Search for files at the paths within 'path' and 'paths' keys of `data`. \
    Will load JSON from the first file it finds and then return the result. \n
    If both 'path' and 'paths' are set, paths from both will be searched \
    starting with 'path' first.\n
    Args:
        data (dict): Processor arguments and all other data loaded from previous data processors.\n
            * `path` _string_: A single file path to search for.\n
            * `paths` _list_: A list of file paths to search for.\n
    Returns:
        (dict|None): The loaded JSON data or None if no file was found.
    Note:
        Paths must be relative to the `instance/` directory.
    '''
    file_data = None
    # loop over each provided path and stop when one is found
    if "path" in data:
        data.setdefault("paths", []).insert(0, data["path"])
    if "paths" in data:
        for path in data["paths"]:
            full_path = os.path.join(app.instance_path, path.lstrip("/"))
            if os.path.exists(full_path):
                file_data = load_json_config(full_path)
                break
    return file_data

load_matched_json(data)

Loads all the config files and returns the file that has the most matched conditions.

Parameters:

  • data (dict, required): Processor arguments and all other data loaded from previous data processors.
      • location (string): A directory path within the instance containing JSON files with match_conditions keys.

Returns:

  • dict | None: The loaded JSON data from the file that most matched its conditions, or None if no files matched.
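
A sketch of a route entry (the location is illustrative; judging from the source below, each JSON file under it should contain a match_conditions list whose entries have evaluate and match_when keys):

{
    "name": "item_config",
    "processor": "file.load_matched_json",
    "location": "config/metadata"
}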

Source code in sandhill/processors/file.py
def load_matched_json(data):
    """
    Loads all the config files and returns the file that has the most \
    [matched conditions](#TODO). \n
    Args:
        data (dict): Processor arguments and all other data loaded from previous data processors.\n
            * `location` _string_: A directory path within the instance \
               with JSON files containing `match_conditions` keys.\n
    Returns:
        (dict|None): The loaded JSON data from the file that most matched its conditions, \
            or None if no files matched.
    """
    file_data = None
    matched_dict = {}
    config_dir_path = None
    if 'location' in data:
        config_dir_path = os.path.join(app.instance_path, data['location'])

    config_files = load_json_configs(config_dir_path, recurse=True)
    for path, config in config_files.items():
        if "match_conditions" in config:
            try:
                matched_dict[path] = evaluate_conditions(config['match_conditions'], data)
            except KeyError:
                app.logger.warning(
                    f"Missing 'evaluate' and/or 'match_when' for 'match_condition' in: {path}")
                continue
    matched_path = max(matched_dict.items(), key=itemgetter(1))[0] if matched_dict else None

    for path, score in matched_dict.items():
        app.logger.debug(f"load_matched_json(score={score}, path={path})")

    # Ensure number of matches is greater than 0
    if matched_path in matched_dict and matched_dict[matched_path]:
        app.logger.debug(f"load_matched_json(matched={matched_path})")
        file_data = config_files[matched_path]

    return file_data

sandhill.processors.iiif

Processor for IIIF

load_image(data, url=None, api_get_function=api_get)

Load and return an IIIF image.

Parameters:

  • data (dict, required): Route data, where data['view_args']['iiif_path'] and data['identifier'] must exist.
  • url (str, default None): Override the IIIF server URL from the default IIIF_BASE in the configs.
  • api_get_function (function, default api_get): Function to use when making the GET request.

Returns:

  • requests.Response | None: The requested image from IIIF, or None on failure.

Raises:

  • HTTPException: On failure if on_fail is set.
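
A minimal route entry might look like this (it assumes an earlier processor stored identifier and that the route defines an iiif_path variable):

{
    "name": "image",
    "processor": "iiif.load_image",
    "on_fail": 0
}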

Source code in sandhill/processors/iiif.py
@catch(RequestException, "Call to IIIF Server failed: {exc}", abort=503)
def load_image(data, url=None, api_get_function=api_get):
    '''
    Load and return an IIIF image. \n
    Args:
        data (dict): route data where `data[view_args][iiif_path]` and `data[identifier]` exist \n
        url (str): Override the IIIF server URL from the default IIIF_BASE in the configs \n
        api_get_function (function): function to use when making the GET request \n
    Returns:
        (requests.Response|None): Requested image from IIIF, or None on failure. \n
    Raises:
        (HTTPException): On failure if `on_fail` is set. \n
    '''
    image = None
    url = establish_url(url, getconfig('IIIF_BASE', None))
    if 'iiif_path' in data['view_args'] and 'identifier' in data:
        image = api_get_function(
            url=os.path.join(url, data['identifier'], data['view_args']['iiif_path']),
            stream=True)
    else:
        app.logger.warning("Could not call IIIF Server; missing identifier or iiif_path")
        dp_abort(500)

    if not image.ok:
        app.logger.debug(f"Call to IIIF Server returned {image.status_code}")
        dp_abort(image.status_code)
        image = None
    return image

sandhill.processors.request

Processor for requests

api_json(data)

Make a call to an API and return the response content as JSON.

Parameters:

  • data (dict, required): Processor arguments and all other data loaded from previous data processors.
      • url (str): The URL to make the API call to.
      • method (str, optional): The HTTP method to use. Default: "GET"
      • timeout (int, optional): The request timeout in seconds. Default: 10

Returns:

  • dict: The JSON response from the API call.

Raises:

  • HTTPException: On failure if on_fail is set.
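
For example (the URL here is illustrative):

{
    "name": "manifest",
    "processor": "request.api_json",
    "url": "https://api.example.edu/item/42.json",
    "method": "GET",
    "timeout": 10
}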

Source code in sandhill/processors/request.py
@catch(RequestException, "Call to {data[url]} returned {exc}.", abort=503)
def api_json(data):
    '''
    Make a call to an API and return the response content as JSON. \n
    Args:
        data (dict): Processor arguments and all other data loaded from previous data processors.\n
            * `url` _str_: The URL to make the API call to.\n
            * `method` _str, optional_: The HTTP method to use.\n
                Default: `"GET"` \n
            * `timeout` _int, optional_: The request timeout in seconds.\n
                Default: `10` \n
    Returns:
        (dict): The JSON response from the API call. \n
    Raises:
        (HTTPException): On failure if `on_fail` is set. \n
    '''
    method = data['method'] if 'method' in data else 'GET'
    app.logger.debug(f"Connecting to {data['url']}")
    response = requests.request(
        method=method,
        url=data["url"],
        timeout=data.get('timeout', 10)
    )

    if not response.ok:
        app.logger.warning(f"Call to {data['url']} returned a non-ok status code: " \
                           f"{response.status_code}. {response.__dict__}")
        if 'on_fail' in data:
            abort(response.status_code if data['on_fail'] == 0 else data['on_fail'])

    try:
        return response.json()
    except JSONDecodeError:
        app.logger.warning(f"Call returned from {data['url']} that was not JSON.")
        dp_abort(503)
        return {}

redirect(data)

Trigger a redirect response to the specified URL.

Parameters:

  • data (dict, required): Processor arguments and all other data loaded from previous data processors.
      • location (str): URL to redirect the client to.
      • code (int, optional): HTTP status code to redirect with. Default: 302

Returns:

  • flask.Response: The Flask response object with the included redirect.

Raises:

  • HTTPException: If the location key is not present.
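
For example (the URL here is illustrative):

{
    "name": "redirect",
    "processor": "request.redirect",
    "location": "https://example.edu/new-home",
    "code": 301
}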

Source code in sandhill/processors/request.py
@catch(KeyError, "Processor request.redirect called without a 'location' given.", abort=500)
def redirect(data):
    '''
    Trigger a redirect response to specified url. \n
    Args:
        data (dict): Processor arguments and all other data loaded from previous data processors.\n
        * `location` _str_: URL to redirect client to.\n
        * `code` _int, optional_: HTTP status code to redirect with. \n
            Default: 302 \n
    Returns:
        (flask.Response): The flask response object with the included redirect. \n
    Raises:
        (HTTPException): If the `location` key is not present. \n
    '''
    code = data['code'] if 'code' in data else 302
    return FlaskRedirect(data['location'], code=code)

sandhill.processors.solr

Wrappers for making API calls to a Solr node.

search(data, url=None, api_get_function=api_get)

Perform a configured Solr search and return the result.

Parameters:

  • data (dict, required): Processor arguments and all other data loaded from previous data processors.
      • path (string) or paths (list): The path(s) to a search config file, loaded per file.load_json.
      • params (dict): Query arguments to pass to Solr.
      • record_keys (string, optional): Return this descendant path from the response JSON. Default: response.docs
  • url (str, default None): Overrides the default SOLR_URL normally retrieved from the Sandhill config file.
  • api_get_function (function, default api_get): Function used to call Solr. Used in unit tests.

Returns:

  • dict | flask.Response: A dict of the loaded JSON response, or a flask.Response instance if view_args.format is application/json.
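
A sketch of a route entry (the config path is illustrative, and its file would need to contain a solr_params key):

{
    "name": "searchresults",
    "processor": "solr.search",
    "paths": ["config/search/main.json"],
    "params": { "q": "dogs" }
}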

Source code in sandhill/processors/solr.py
def search(data, url=None, api_get_function=api_get):
    """
    Perform a [configured Solr search](#TODO) and return the result. \n
    Args:
        data (dict): Processor arguments and all other data loaded from previous data processors.\n
            * `path` _string_, `paths` _list_: The path to a search config file. Loaded \
              per [file.load_json](#TODO).\n
            * `params` _dict_: Query arguments to pass to Solr.\n
            * `record_keys` _string, optional_: Return this [descendant path](#TODO) from \
              the response JSON. Default: `response.docs`\n
        url (str): Overrides the default SOLR_URL normally retrieved from \
                   the [Sandhill config](#TODO) file.\n
        api_get_function (function): Function used to call Solr with. Used in unit tests.\n
    Returns:
        (dict|flask.Response): A dict of the loaded JSON response, or a `flask.Response` instance \
                               if `view_args.format` is `text/json`. \n
    """
    # TODO module should return None and call dp_abort instead of abort
    # TODO allow "path"
    if 'paths' not in data or not data['paths']:
        app.logger.error(
            f"Missing 'config' setting for processor "
            f"'{data['processor']}' with name '{data['name']}'")
        abort(500)

    # Load the search settings
    search_config = load_json(data)
    if 'solr_params' not in search_config:
        app.logger.error(
            f"Missing 'solr_params' inside search config file(s) '{ str(data['paths']) }'")
        abort(500)
    if 'config_ext' in data and 'solr_params' in data['config_ext']:
        solr_config = recursive_merge(
            search_config['solr_params'],
            data['config_ext']['solr_params']
        )
    else:
        solr_config = search_config['solr_params']

    # override default parameters with request query parameters
    data['params'] = overlay_with_query_args(solr_config, \
            request_args=data.get('params', None),
            allow_undefined=True)

    solr_results = select(data, url, api_get_function)

    # check if the json results were requested
    result_format = match_request_format('format', ['text/html', 'application/json'])
    if result_format == 'application/json':
        solr_results = jsonify(solr_results)

    return solr_results

select(data, url=None, api_get_function=api_get)

Perform a Solr select call and return the loaded JSON response.

"name": "mysearch",
"processor": "solr.search",
"params": { "q": "*", "rows":"20" }

Parameters:

  • data (dict, required): Processor arguments and all other data loaded from previous data processors.
      • params (dict): Query arguments to pass to Solr.
      • record_keys (string, optional): Return this descendant path from the response JSON.
  • url (str, default None): Overrides the default SOLR_URL normally retrieved from the Sandhill config file.
  • api_get_function (function, default api_get): Function used to call Solr. Used in unit tests.

Returns:

  • dict | None: The loaded JSON data, or None if nothing matched.

Raises:

  • HTTPException: If on_fail is set.

Source code in sandhill/processors/solr.py
@catch((RequestException, HTTPError), "Call to Solr failed: {exc}", abort=503)
@catch(JSONDecodeError, "Call returned from Solr that was not JSON.", abort=503)
@catch(KeyError, "Missing url component: {exc}", abort=400) # Missing 'params' key
def select(data, url=None, api_get_function=api_get):
    """
    Perform a Solr select call and return the loaded JSON response. \n
    ```json
    "name": "mysearch",
    "processor": "solr.search",
    "params": { "q": "*", "rows":"20" }
    ``` \n
    Args:
        data (dict): Processor arguments and all other data loaded from previous data processors.\n
            * `params` _dict_: Query arguments to pass to Solr.\n
            * `record_keys` _string, optional_: Return this [descendant path](#TODO) from \
              the response JSON.\n
        url (str): Overrides the default SOLR_URL normally retrieved from \
                   the [Sandhill config](#TODO) file.\n
        api_get_function (function): Function used to call Solr with. Used in unit tests.\n
    Returns:
        (dict|None): The loaded JSON data or None if nothing matched. \n
    Raises:
        werkzeug.exceptions.HTTPException: If `on_fail` is set. \n
    """

    response = None
    url = establish_url(url, getconfig('SOLR_URL', None))
    url = url + "/select"

    # query solr with the parameters
    app.logger.debug(f"Connecting to {url}?{urlencode(data['params'])}")
    response = api_get_function(url=url, params=data['params'])
    response_json = None
    if not response.ok:
        app.logger.warning(f"Call to Solr returned {response.status_code}. {response}")
        try:
            if 'error' in response.json():
                app.logger.warning(
                    f"Error returned from Solr: {str(response.json()['error'])}")
        except JSONDecodeError:
            pass
        dp_abort(response.status_code)
    else:
        response_json = response.json()
        # Get the records that exist at the provided record_keys
        if 'record_keys' in data and data['record_keys']:
            response_json = getdescendant(response_json, data['record_keys'])

    return response_json

select_record(data, url=None, api_get_function=api_get)

Perform a Solr select call and return the first result from the response.

Parameters:

  • data (dict, required): Processor arguments and all other data loaded from previous data processors.
      • params (dict): Query arguments to pass to Solr.
      • record_keys (string, optional): Return this descendant path from the response JSON. Default: response.docs
  • url (str, default None): Overrides the default SOLR_URL normally retrieved from the Sandhill config file.
  • api_get_function (function, default api_get): Function used to call Solr. Used in unit tests.

Returns:

  • Any: The first item matched by record_keys in the JSON response, otherwise None.

Raises:

  • HTTPException: If on_fail is set.
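
For example (the query values here are illustrative):

{
    "name": "record",
    "processor": "solr.select_record",
    "params": { "q": "id:42", "rows": "1" },
    "record_keys": "response.docs"
}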

Source code in sandhill/processors/solr.py
def select_record(data, url=None, api_get_function=api_get):
    """
    Perform a Solr select call and return the first result from the response. \n
    Args:
        data (dict): Processor arguments and all other data loaded from previous data processors.\n
            * `params` _dict_: Query arguments to pass to Solr.\n
            * `record_keys` _string, optional_: Return this [descendant path](#TODO) from \
              the response JSON. Default: `response.docs`\n
        url (str): Overrides the default SOLR_URL normally retrieved from the \
                   [Sandhill config](#TODO) file.\n
        api_get_function (function): Function used to call Solr with. Used in unit tests.\n
    Returns:
        (Any): The first item matched by `record_keys` in the JSON response, otherwise None. \n
    Raises:
        werkzeug.exceptions.HTTPException: If `on_fail` is set. \n
    """
    data['record_keys'] = ifnone(data, 'record_keys', 'response.docs')
    records = select(data, url, api_get_function)

    if records and isinstance(records, Sequence):
        return records[0]
    return None

sandhill.processors.stream

Processor for streaming data

response(data)

Stream a Requests library response that was previously loaded.

Parameters:

  • data (dict, required): Processor arguments and all other data loaded from previous data processors.
      • response (str): The key in data where the response is located.
      • The key named by data['response'] (requests.Response): The previously loaded response to stream.

Returns:

  • flask.Response | None: A stream of the response.

Raises:

  • HTTPException: If on_fail is set.
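
A sketch of a two-processor chain, where the response key names the processor that previously loaded the response (names here are illustrative):

[
    {
        "name": "image",
        "processor": "iiif.load_image",
        "on_fail": 0
    },
    {
        "name": "output",
        "processor": "stream.response",
        "response": "image"
    }
]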

Source code in sandhill/processors/stream.py
def response(data):
    '''
    Stream a Requests library response that was previously loaded. \n
    Args:
        data (dict): Processor arguments and all other data loaded from previous data processors.\n
            * `response` _str_: The key where the response is located.\n
            * Key from `data[response]` _requests.Response_: The response to stream.\n
    Returns:
        (flask.Response|None): A stream of the response \n
    Raises:
        werkzeug.exceptions.HTTPException: If `on_fail` is set. \n
    '''
    allowed_headers = [
        'Content-Type', 'Content-Disposition', 'Content-Length',
        'Range', 'accept-ranges', 'Content-Range'
    ]
    if 'response' not in data:
        app.logger.error("stream.response requires a 'response' variable to be set.")
        abort(500)
    resp = data[data["response"]] if data["response"] in data else None

    # Not a valid response
    if not isinstance(resp, RequestsResponse):
        dp_abort(503)
        return None
    # Valid response, but not a success (bool check on resp fails if http code is 400 to 600)
    if not resp:
        dp_abort(resp.status_code)
        return None

    stream_response = FlaskResponse(
        resp.iter_content(chunk_size=app.config['STREAM_CHUNK_SIZE']),
        status=resp.status_code
    )
    for header in resp.headers.keys():
        # Case insensitive header matching
        if header.lower() in [allowed_key.lower() for allowed_key in allowed_headers]:
            stream_response.headers.set(header, resp.headers.get(header))
    return stream_response

string(data)

Stream a data variable as string data to the output.

Parameters:

  • data (dict, required): Processor arguments and all other data loaded from previous data processors.
      • var (str): The name of the variable whose content should be sent.
      • mimetype (str, optional): The mimetype to send for the data. Default: text/plain

Returns:

  • flask.Response | None: A stream of the response.
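
For example, assuming a hypothetical earlier processor stored a non-empty string under sitemap:

{
    "name": "output",
    "processor": "stream.string",
    "var": "sitemap",
    "mimetype": "text/xml"
}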

Source code in sandhill/processors/stream.py
def string(data):
    '''
    Stream a data variable as string data to the output \n
    Args:
        data (dict): Processor arguments and all other data loaded from previous data processors.\n
            * `var` _str_: The name of the variable whose content should be sent.\n
            * `mimetype` _str_: The mimetype to send for the data (default: text/plain).\n
    Returns:
        (flask.Response|None): A stream of the response \n
    '''
    if 'var' not in data or not data.get(data['var']):
        app.logger.error("requires that 'var' is set to name of non-empty data variable")
        abort(500)
    mimetype = data.get('mimetype', 'text/plain')

    string_response = make_response(data.get(data['var']))
    string_response.mimetype = mimetype
    return string_response

sandhill.processors.string

Processor for string functions

replace(data)

For the given name in data, replace all occurrences of an old string with a new string and return the result.

Parameters:

  • data (dict, required): Processor arguments and all other data loaded from previous data processors.
      • name (str | requests.Response): The context in which to find and replace.
      • old (str): The string to find.
      • new (str): The string to replace it with.

Returns:

  • str | requests.Response | None: The same type as data[name] was, only now with string replacements done; or None if the name value is None or missing.
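
A sketch of a chain where the second entry reuses the name of the first, so the replacement is applied to that variable and the result re-stored under the same key (values here are illustrative):

[
    {
        "name": "manifest",
        "processor": "request.api_json",
        "url": "https://api.example.edu/manifest.json"
    },
    {
        "name": "manifest",
        "processor": "string.replace",
        "old": "http://",
        "new": "https://"
    }
]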

Source code in sandhill/processors/string.py
def replace(data):
    '''
    For the given `name` in data, replace all occurrences of an old string with a new string and \
    return the result. \n
    Args:
        data (dict): Processor arguments and all other data loaded from previous data processors.\n
            * `name` _str|requests.Response_: The context in which to find and replace.\n
            * `old` _str_: The string to find.\n
            * `new` _str_: The string to replace it with.\n
    Returns:
        (str|requests.Response|None): The same type as `data[name]` was, only now with string \
            replacements done. Or None if the 'name' value is None or missing. \n
    '''
    data_copy = deepcopy(data.get(data.get('name')))
    cont_copy = data_copy if data_copy is not None else ''

    # TODO able to handle regular string data (non-JSON)
    # TODO handle FlaskResponse as well
    if isinstance(data_copy, RequestsResponse):
        cont_copy = data_copy.text
    if cont_copy and not isinstance(cont_copy, str):
        cont_copy = json.dumps(cont_copy)
    cont_copy = cont_copy.replace(data['old'], data['new'])

    # pylint: disable=protected-access
    if isinstance(data_copy, RequestsResponse):
        data_copy._content = cont_copy.encode()
        data_copy.headers['Content-Length'] = len(data_copy._content)
    elif cont_copy:
        data_copy = json.loads(cont_copy)
    return data_copy

sandhill.processors.template

Processor for rendering templates

render(data)

Render the response as a template or directly as a Flask Response.

Parameters:

  • data (dict, required): Processor arguments and all other data loaded from previous data processors.
      • file (str): Path to the template file.

Returns:

  • flask.Response: The rendered template in a Flask response.

Raises:

  • HTTPException: If file is not set in data.
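
For example (the template filename is illustrative):

{
    "name": "page",
    "processor": "template.render",
    "file": "item.html.j2",
    "on_fail": 500
}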

Source code in sandhill/processors/template.py
@catch(TemplateError, "An error has occured when rendering {data[file]}: {exc}", abort=500)
@catch(TemplateNotFound, "Failure when rendering {data[file]}. " \
       "Could not find template to render: {exc}", abort=501)
def render(data):
    '''
    Render the response as a template or directly as a Flask Response. \n
    Args:
        data (dict): Processor arguments and all other data loaded from previous data processors.\n
            * `file` _str_: Path to the template file.\n
    Returns:
        (flask.Response): The rendered template in a Flask response. \n
    Raises:
        werkzeug.exceptions.HTTPException: If `file` is not set in data. \n
    '''
    if 'file' not in data:
        app.logger.error("template.render: 'file' not set in data; unable to render response.")
        abort(500)
    template = data["file"]

    return make_response(render_template(template, **data))

render_string(data)

Given a Jinja2 template string, render that template to a string and store the result under the name variable.

Parameters:

  • data (dict, required): Processor arguments and all other data loaded from previous data processors.
      • value (str): The template string to render.

Returns:

  • str | None: The rendered template string, or None if no value key was in data.
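
For example (the template string is illustrative):

{
    "name": "greeting",
    "processor": "template.render_string",
    "value": "Hello {{ view_args.name }}!"
}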

Source code in sandhill/processors/template.py
@catch(TemplateError, "Invalid template provided for: {data[value]}. Error: {exc}",
       return_val=None)
def render_string(data):
    """
    Given a Jinja2 template string, it will render that template to a string and set it in
    the `name` variable. \n
    Args:
        data (dict): Processor arguments and all other data loaded from previous data processors.\n
            * `value` _str_: The template string to render.\n
    Returns:
        (str|None): The rendered template string, or None if no `value` key was in data. \n
    """
    evaluation = None
    if 'value' in data:
        evaluation = render_template_string(data['value'], data)
    return evaluation

sandhill.processors.xml

XML Data Processors

load(data: dict) -> etree._Element

Load an XML document.

Parameters:

  • data (dict, required): Processor arguments and all other data loaded from previous data processors.
      • source (str): Either a path, URL, or string to load.

Returns:

  • lxml.etree._Element | None: The loaded XML object tree, or None if source is not in data.
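
For example (the source URL is illustrative):

{
    "name": "mods",
    "processor": "xml.load",
    "source": "https://example.edu/record/42/MODS.xml"
}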

Source code in sandhill/processors/xml.py
def load(data: dict) -> etree._Element: # pylint: disable=protected-access
    '''
    Load an XML document. \n
    Args:
        data (dict): Processor arguments and all other data loaded from previous data processors.\n
            * `source` _str_: Either path, url, or string to load.\n
    Returns:
        (lxml.etree._Element|None): The loaded XML object tree, or None if `source` not in data. \n
    '''
    if 'source' not in data:
        app.logger.warning("No source XML provided. Missing key: 'source'")
        return None
    return xml.load(data['source'])

xpath(data: dict) -> list

Retrieve the matching xpath content from an XML source.

Parameters:

  • data (dict, required): Processor arguments and all other data loaded from previous data processors.
      • xpath (str): An XPath query.
      • source (str): Either a path, URL, or string to load.

Returns:

  • list | None: Matching results from the XPath query, or None if any required keys are not in data.
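
For example (the source and query are illustrative):

{
    "name": "titles",
    "processor": "xml.xpath",
    "source": "https://example.edu/record/42/MODS.xml",
    "xpath": "//title"
}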

Source code in sandhill/processors/xml.py
def xpath(data: dict) -> list:
    '''
    Retrieve the matching xpath content from an XML source. \n
    Args:
        data (dict): Processor arguments and all other data loaded from previous data processors.\n
            * `xpath` _str_: An XPath query.\n
            * `source` _str_: Either path, url, or string to load.\n
    Returns:
        (list): Matching results from XPath query, or None if any required keys are not in data. \n
    '''
    if 'xpath' not in data:
        app.logger.warning("No xpath search provided. Missing key: 'xpath'")
        return None
    return xml.xpath(load(data), data['xpath'])

xpath_by_id(data: dict) -> dict

For the matching xpath content, organize it into a dict keyed on the id attribute of the matched tags. Elements without an id attribute will not be returned.

Parameters:

  • data (dict, required): Processor arguments and all other data loaded from previous data processors.
      • xpath (str): An XPath query.
      • source (str): Either a path, URL, or string to load.

Returns:

  • dict | None: A mapping with keys of id and values of the content within matching elements, or None if missing any required keys in data.
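
For example (values here are illustrative; only elements with an id attribute will appear in the result):

{
    "name": "sections",
    "processor": "xml.xpath_by_id",
    "source": "https://example.edu/record/42/structure.xml",
    "xpath": "//section"
}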

Source code in sandhill/processors/xml.py
def xpath_by_id(data: dict) -> dict:
    '''
    For the matching xpath content, organize into dict with key \
    being the id param of the matched tags. Elements without an id attribute \
    will not be returned. \n
    Args:
        data (dict): Processor arguments and all other data loaded from previous data processors.\n
            * `xpath` _str_: An XPath query.\n
            * `source` _str_: Either path, url, or string to load.\n
    Returns:
        (dict): Dict mapping with keys of id, and values of content within matching elements, \
            or None if missing any required keys in data. \n
    '''
    if 'xpath' not in data:
        app.logger.warning("No xpath search provided. Missing key: 'xpath'")
        return None
    return xml.xpath_by_id(load(data), data['xpath'])

Developing a Data Processor

Sandhill makes developing your own data processors quite easy, perhaps best explained with a simple example.

Simple Processor

Within your instance/ directory, ensure there is a processors/ sub-directory; if not, create it.

Next, create a new Python file in instance/processors/; we'll call our example file myproc.py (the name of the file is up to you). Then create a function in that file which must accept a single parameter, data.

# instance/processors/myproc.py
"""The myproc data processors"""

def shout(data):
    """The shout data processor; will upper case all text and add an exlcaimation point."""
    ...

The data parameter here is a dict containing all data loaded for the route up until this point. If previous data processors loaded anything, it will be present in data. Sandhill always includes the standard view_args key, which contains any route variables. All arguments set on this data processor's route entry will also be keys in data.

For our shout() processor, let's say we want to expect a key words, which will contain the data we want to transform with our processor.

def shout(data):
    """The shout data processor; will upper case all text and add an exlcaimation point."""
    return data["words"].upper() + "!"

That's mostly it! Now we could include our custom data processor in a route with this entry in our route's JSON data list:

{
    "name": "loudly",
    "processor": "myproc.shout",
    "words": "This is my statement"
}

And after the data processor runs, Sandhill will have the following in your route's data dict:

{
    "data": {
        ... # other route data as may be appropriate
        "loudly": "THIS IS MY STATEMENT!"
    }
}

Improving your Processor

But what if someone fails to pass in the words key? Right now that would result in a KeyError.

In Sandhill, best practice for data processors is to return None on most failures; that is, unless the on_fail key is set in data, in which case we ought to abort with the value of on_fail.

To assist with this, Sandhill provides the dp_abort() function (short for "data processor abort"), which does most of the heavy lifting for you. Let's rework our function to handle failures.

from sandhill.utils.error_handling import dp_abort

def shout(data):
    """The shout data processor; will upper case all text and add an exlcaimation point."""
    if "words" not in data:
        # Here we choose HTTP status 500 for default, but `on_fail` value will take precedence.
        dp_abort(500)
        # If no `on_fail` is set, None indicates failure, so always return None after a dp_abort().
        return None
    return data["words"].upper() + "!"

With that, you have a nicely functioning data processor! For more advanced examples, feel free to peek at the source code of the built-in Sandhill data processors above.