
Helpers

To help in creating your custom DataInterface implementation, oai_repo comes with a number of helpers to assist with common problems.

Helper Functions

These functions may prove useful when implementing your custom DataInterface class.

apicall_getxml(url: str = None) -> etree._Element

Perform API call to a URL and load the response as XML.

Parameters:

Name Type Description Default
url str

A URL path to call.

None

Returns:

Type Description
_Element

A lxml.etree._Element containing the root of the loaded XML.

Raises:

Type Description
OAIRepoExternalException

when the URL call fails or returns a non-200 response.

OAIRepoInternalException

when the call to the URL does not return valid XML, or when no URL was provided.

Examples:

loaded_xml = helpers.apicall_getxml("https://api.example.edu/record/42")

apicall_querypath(url: str = None, jsonpath: str = None, xpath: str = None) -> str | None

Perform an API call on the given URL and then run either a jsonpath or xpath query, returning the first matching result.

API call results are cached while processing a single OAI request.
Subsequent calls to the same URL will use the previous results, without making an additional API call.

Parameters:

Name Type Description Default
url str

The URL to perform an API call to.

None
jsonpath str

A JSONPath query to run on the results from the URL
(must be None if xpath is passed)

None
xpath str

An XPath query to run on the results from the URL
(must be None if jsonpath is passed)

None

Returns:

Type Description
str | None

The matching string value, or None if not found

Raises:

Type Description
OAIRepoInternalException

on invalid URL, invalid query, or wrong API response type.

OAIRepoExternalException

on API call failure, or a non-200 response.

Examples:

# JSONPath
earliest_api = {
    "url": f"{my_solr_url}?fl=dateyear_dt&q=*%3A*&rows=1&sort=dateyear_dt%20asc",
    "jsonpath": "$.response.docs[0].dateyear_dt[0]"
}
earliest = helpers.apicall_querypath(**earliest_api)
# XPath
earliest_url = f"{my_solr_url}?fl=dateyear_dt&q=*%3A*&rows=1&sort=dateyear_dt%20asc&wt=xml"
# XPath positions are 1-based, and attributes are matched with @name='value'
earliest_query = "/response/result/doc[1]/arr[@name='dateyear_dt']/str[1]/text()"
earliest = helpers.apicall_querypath(url=earliest_url, xpath=earliest_query)
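
The per-request caching described for apicall_querypath can be pictured with a small sketch. This is not oai_repo's implementation: RequestCache, get, and fake_fetch are hypothetical names, and the fetcher is a stub so the example runs without network access.

```python
from typing import Callable, Dict, List


class RequestCache:
    """Hypothetical per-request cache: one fetch per distinct URL."""

    def __init__(self, fetch: Callable[[str], bytes]):
        self._fetch = fetch
        self._results: Dict[str, bytes] = {}

    def get(self, url: str) -> bytes:
        # Only call the fetcher the first time a URL is seen
        if url not in self._results:
            self._results[url] = self._fetch(url)
        return self._results[url]


calls: List[str] = []


def fake_fetch(url: str) -> bytes:
    # Stand-in for a real HTTP call; records each invocation
    calls.append(url)
    return b'{"response": {"docs": []}}'


cache = RequestCache(fake_fetch)
first = cache.get("https://api.example.edu/select?q=a")
second = cache.get("https://api.example.edu/select?q=a")  # served from cache
other = cache.get("https://api.example.edu/select?q=b")   # new URL, new fetch
```

In this sketch a new RequestCache would be created for each OAI request, so results never outlive the request that produced them. Running several queries against the same URL therefore costs only one call.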

bytes_to_xml(bdata: bytes | BytesIO) -> etree._Element

Given a bytes or BytesIO object, parse it and return an lxml.etree._Element. If passed an lxml.etree._Element, it is returned unchanged.

Parameters:

Name Type Description Default
bdata bytes | BytesIO

The bytes data to parse

required

Returns:

Type Description
_Element

The loaded XML element.

Raises:

Type Description
XMLSyntaxError

On XML parse error

datestamp_long(timestamp: datetime) -> str

Convert a datetime to long form datestamp: YYYY-MM-DDThh:mm:ssZ

Parameters:

Name Type Description Default
timestamp datetime

A Python datetime

required

Returns:

Type Description
str

A long granularity formatted date string

Examples:

from datetime import datetime
from oai_repo import helpers
# Making a YYYY-MM-DDThh:mm:ssZ granularity time string from a datetime
timestr = helpers.datestamp_long(datetime.now())

datestamp_short(timestamp: datetime) -> str

Convert a datetime to short form datestamp: YYYY-MM-DD

Parameters:

Name Type Description Default
timestamp datetime

A Python datetime

required

Returns:

Type Description
str

A short granularity formatted date string

Examples:

from datetime import datetime
from oai_repo import helpers
# Making a YYYY-MM-DD granularity time string from a datetime
timestr = helpers.datestamp_short(datetime.now())

granularity_format(granularity: str, timestamp: datetime) -> str

Format a timestamp according to the OAI granularity and return it.

Parameters:

Name Type Description Default
granularity str

The granularity from OAI (either YYYY-MM-DDThh:mm:ssZ or YYYY-MM-DD)

required
timestamp datetime

A Python datetime

required

Returns:

Type Description
str

A granularity formatted date string appropriate to the granularity passed in

Examples:

from datetime import datetime
from oai_repo import helpers
timestr = helpers.granularity_format("YYYY-MM-DD", datetime.now())
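
To make the two output shapes concrete, here is a standard-library sketch of what formatting by granularity might look like. It is an illustration only: FORMATS and format_for_granularity are hypothetical names, and the strftime patterns are assumptions based on the two granularity strings listed above, not the helper's actual code. The example passes a timezone-aware UTC datetime, since the trailing Z in the long form denotes UTC.

```python
from datetime import datetime, timezone

# Assumed mapping from OAI granularity strings to strftime patterns
FORMATS = {
    "YYYY-MM-DD": "%Y-%m-%d",
    "YYYY-MM-DDThh:mm:ssZ": "%Y-%m-%dT%H:%M:%SZ",
}


def format_for_granularity(granularity: str, timestamp: datetime) -> str:
    """Hypothetical stand-in for helpers.granularity_format."""
    # An unknown granularity raises KeyError in this sketch
    return timestamp.strftime(FORMATS[granularity])


# A fixed UTC datetime so the results are reproducible
stamp = datetime(2024, 3, 9, 14, 5, 7, tzinfo=timezone.utc)
short_form = format_for_granularity("YYYY-MM-DD", stamp)
long_form = format_for_granularity("YYYY-MM-DDThh:mm:ssZ", stamp)
# short_form is "2024-03-09", long_form is "2024-03-09T14:05:07Z"
```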

jsonpath_find(data: dict | list, path: str) -> list

Get all matching values for a given JSONPath.

Parameters:

Name Type Description Default
data dict | list

The already loaded JSON data

required
path str

The JSONPath to find

required

Returns:

Type Description
list

A list of matching values

Raises:

Type Description
JSONPathError

On jsonpath failure

Examples:

ids = helpers.jsonpath_find(loaded_json, '$.docs[*].id')

jsonpath_find_first(data: dict | list, path: str) -> any

Get the first matching value for a given JSONPath

Parameters:

Name Type Description Default
data dict | list

The already loaded JSON data

required
path str

The JSONPath to find

required

Returns:

Type Description
any

The matched value, or None if not found

Raises:

Type Description
JSONPathError

On jsonpath failure

Examples:

first_id = helpers.jsonpath_find_first(loaded_json, '$.docs[*].id')

xpath_find(xmlr: etree.Element, path: str) -> list

Get matching values for a given XPath

Parameters:

Name Type Description Default
xmlr Element

The root xml object to query

required
path str

The xpath query

required

Returns:

Type Description
list

A list of matching values

Raises:

Type Description
XPathError

On xpath failure

Examples:

ids = helpers.xpath_find(loaded_xml, "/response/result/doc/str[@name='id']/text()")
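
Two XPath details are easy to get wrong when querying Solr-style XML: attributes are matched with @name='value', and positional predicates start at 1 rather than 0. The sketch below shows both using Python's standard xml.etree.ElementTree, which is a stand-in here and not what the helpers use. ElementTree supports only a subset of XPath (relative paths, no text()), so the queries differ slightly from the full XPath accepted by lxml. SOLR_XML is made-up sample data.

```python
import xml.etree.ElementTree as ET

# Made-up sample shaped like a Solr XML response
SOLR_XML = """<response>
  <result>
    <doc><str name="id">rec-1</str><str name="title">First</str></doc>
    <doc><str name="id">rec-2</str><str name="title">Second</str></doc>
  </result>
</response>"""

root = ET.fromstring(SOLR_XML)

# Attribute predicate: @name selects the attribute, and the value is quoted
ids = [el.text for el in root.findall("./result/doc/str[@name='id']")]

# Positional predicate: positions start at 1, so doc[1] is the first doc
first_title = root.find("./result/doc[1]/str[@name='title']").text
# ids is ["rec-1", "rec-2"], first_title is "First"
```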

xpath_find_first(xmlr: etree.Element, path: str) -> any

Get the first matching value for a given XPath

Parameters:

Name Type Description Default
xmlr Element

The root xml object to query

required
path str

The xpath query

required

Returns:

Type Description
any

The matched value, or None if not found

Raises:

Type Description
XPathError

On xpath failure

Examples:

first_id = helpers.xpath_find_first(loaded_xml, "/response/result/doc/str[@name='id']/text()")

Transform Class

Apply structured transformations to data using a linear set of rules.

Transform

Given an ordered list of transform rules, transform a string forward by applying the rules in order, or reverse the transformation by applying the inverse of each rule in backward order.

Parameters:

Name Type Description Default
rules list

A list of rules in forward order. Each rule being a dict with a single key describing the rule type, and a value which is a list of arguments to that rule.

required

Examples:

rules = [
    { "replace": [":", "_"] },
    { "prefix": ["add", "oai:"] },
    { "suffix": ["del", ".edu"] },
    { "case": ["upper"] }
]
val = "abcd:5678:example.edu"
tr = Transform(rules)
val = tr.forward(val)
# val is now "OAI:ABCD_5678_EXAMPLE"
val = tr.reverse(val)
# val is now "abcd:5678:example.edu" again

Rules:

Type Parameters Example
replace [find, replace_with] [":", "_"] (replace all : with _)
prefix [add|del, string] ["del", "oai:"] (remove oai: at start of value)
suffix [add|del, string] ["add", ".id"] (add .id to end of value)
case [upper|lower] ["upper"] (convert value to upper case)

Important

Applying rules in reverse may not always return the original value!
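
The warning above is easiest to see with a concrete case. The following is a minimal re-implementation sketch, not the Transform class itself: apply_rule, forward, and reverse are hypothetical functions, and the inverse of each rule (swapping replace arguments, swapping add/del, flipping case) is an assumption inferred from the example above.

```python
from typing import Dict, List

Rule = Dict[str, List[str]]


def apply_rule(rule: Rule, value: str, reverse: bool = False) -> str:
    """Apply one rule; with reverse=True apply its assumed inverse."""
    kind, args = next(iter(rule.items()))
    if kind == "replace":
        find, repl = (args[1], args[0]) if reverse else (args[0], args[1])
        return value.replace(find, repl)
    if kind in ("prefix", "suffix"):
        action, text = args
        if reverse:
            action = "del" if action == "add" else "add"
        if kind == "prefix":
            if action == "add":
                return text + value
            return value[len(text):] if value.startswith(text) else value
        if action == "add":
            return value + text
        return value[:-len(text)] if text and value.endswith(text) else value
    if kind == "case":
        to_upper = (args[0] == "upper") != reverse
        return value.upper() if to_upper else value.lower()
    raise ValueError(f"unknown rule type: {kind}")


def forward(rules: List[Rule], value: str) -> str:
    for rule in rules:
        value = apply_rule(rule, value)
    return value


def reverse(rules: List[Rule], value: str) -> str:
    for rule in reversed(rules):
        value = apply_rule(rule, value, reverse=True)
    return value


rules = [{"replace": [":", "_"]}, {"case": ["upper"]}]
lossless = reverse(rules, forward(rules, "abcd:5678"))
lossy = reverse(rules, forward(rules, "Ab_cd:5678"))
# lossless is "abcd:5678"; lossy is "ab:cd:5678", not the original value
```

The second value fails to round-trip for two reasons: upper-casing discards the original mixed case, and the original already contained an underscore, which the reversed replace rule turns into a colon. Choosing rules whose inputs never contain the replacement characters, and whose case is already uniform, avoids both problems.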

forward(value)

Apply the set of rules to the provided value in original order.

reverse(value)

Apply the set of rules to the provided value in reverse order.