
Implementation Classes

The DataInterface Class

Class in which all required OAI data retrieval actions must be implemented. An instance of this class is then passed to the OAI repository.

Attributes:

Name Type Description
limit int

Max number of results to return per request for ListSets, ListIdentifiers, ListRecords

get_identify() -> Identify

Create and return an instantiated Identify object.

Returns:

Type Description
Identify

The Identify object with all properties set appropriately

get_metadata_formats(identifier: str | None = None) -> list[MetadataFormat]

Return a list of metadata formats available for the identifier. If no identifier is passed, the list must contain all possible formats for the repository.

Parameters:

Name Type Description Default
identifier str | None

An identifier string

None

Returns:

Type Description
list[MetadataFormat]

A list of instantiated MetadataFormat objects with all properties set appropriately for the identifier. If identifier is None, a list of all possible MetadataFormat objects for the entire repository.
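
A minimal sketch of the lookup described above. The `MetadataFormat` dataclass here is a stand-in that mirrors the documented attributes so the example runs without the library installed; the `mods` format, the identifiers, and the `FORMATS_BY_ID` table are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


# Stand-in mirroring the documented MetadataFormat attributes (an assumption,
# used so the sketch runs without oai_repo installed).
@dataclass
class MetadataFormat:
    metadata_prefix: str
    schema: str
    metadata_namespace: str


OAI_DC = MetadataFormat(
    "oai_dc",
    "http://www.openarchives.org/OAI/2.0/oai_dc.xsd",
    "http://www.openarchives.org/OAI/2.0/oai_dc/",
)
# Hypothetical second format that only some records offer.
MODS = MetadataFormat(
    "mods",
    "http://www.loc.gov/standards/mods/v3/mods-3-7.xsd",
    "http://www.loc.gov/mods/v3",
)

# Hypothetical table of the formats each record can be served in.
FORMATS_BY_ID = {
    "oai:example.edu:1": [OAI_DC, MODS],
    "oai:example.edu:2": [OAI_DC],
}


def get_metadata_formats(identifier: Optional[str] = None) -> list:
    if identifier is None:
        # No identifier: every format the repository supports.
        return [OAI_DC, MODS]
    # With an identifier: only the formats available for that record.
    return FORMATS_BY_ID.get(identifier, [])
```

In a real DataInterface this would be a method on the class, and the lookup would typically query your own data store.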

get_record_abouts(identifier: str) -> list[lxml.etree._Element]

Return a list of XML elements which will populate the <about> tags in GetRecord responses.

Parameters:

Name Type Description Default
identifier str

A valid identifier string

required

Returns:

Type Description
list[_Element]

A list of lxml.etree.Elements to populate <about> tags for the record.

Important

oai_repo will wrap the response with an <about> tag; do not add it yourself.

Note

If you implement get_records_abouts, you may not need this method implemented. By default, get_records_abouts is the only method which calls get_record_abouts.
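
As an illustration of returning bare elements without the <about> wrapper, here is a sketch. The namespace, element names, and text are hypothetical; the import falls back to the standard library's ElementTree only so the example runs where lxml is not installed.

```python
try:
    from lxml import etree  # the element type the documented API expects
except ImportError:
    import xml.etree.ElementTree as etree  # fallback so the sketch still runs

# Hypothetical namespace and element names, for illustration only.
NOTE_NS = "https://example.edu/ns/provenance"


def get_record_abouts(identifier: str) -> list:
    provenance = etree.Element(f"{{{NOTE_NS}}}provenance")
    source = etree.SubElement(provenance, f"{{{NOTE_NS}}}source")
    source.text = f"Harvested from the local catalogue as {identifier}"
    # Return the bare content elements; the <about> wrapper is not added here.
    return [provenance]
```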

get_record_header(identifier: str) -> RecordHeader

Return a RecordHeader instance for the identifier.

Parameters:

Name Type Description Default
identifier str

A valid identifier string

required

Returns:

Type Description
RecordHeader

The RecordHeader object with all properties set appropriately.

Note

If you implement get_records_header, you may not need this method implemented. By default, get_records_header is the only method which calls get_record_header.

get_record_metadata(identifier: str, metadataprefix: str) -> lxml.etree._Element | None

Return an lxml.etree.Element representing the root element of the metadata found for the given prefix.

Parameters:

Name Type Description Default
identifier str

A valid identifier string

required
metadataprefix str

A metadata prefix

required

Returns:

Type Description
_Element | None

The lxml.etree.Element for the requested record metadata, or None if record has no metadata for provided prefix.

Important

oai_repo will wrap the response with a <metadata> tag; do not add it yourself.

Note

If you implement get_records_metadata, you may not need this method implemented. By default, get_records_metadata is the only method which calls get_record_metadata.
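
The following sketch shows one way this method might build an oai_dc root element and return None for an unsupported prefix. The `TITLES` store and its contents are hypothetical, and the ElementTree fallback exists only so the example runs without lxml.

```python
try:
    from lxml import etree  # the element type the documented API expects
except ImportError:
    import xml.etree.ElementTree as etree  # fallback so the sketch still runs

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical record store: identifier -> title.
TITLES = {"oai:example.edu:1": "A Sample Record"}


def get_record_metadata(identifier: str, metadataprefix: str):
    # This sketch only knows how to produce oai_dc.
    if metadataprefix != "oai_dc" or identifier not in TITLES:
        return None
    root = etree.Element(f"{{{OAI_DC_NS}}}dc")
    title = etree.SubElement(root, f"{{{DC_NS}}}title")
    title.text = TITLES[identifier]
    # Return the metadata root itself, without a <metadata> wrapper.
    return root
```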

get_records_abouts(identifiers: list[str]) -> list[list[lxml.etree._Element]]

Return a list, with one entry per identifier, of the lists of XML elements which will populate each record's <about> tags.

Parameters:

Name Type Description Default
identifiers list

A list of valid identifier strings

required

Returns:

Type Description
list[list[_Element]]

A list of lists, each being the lxml.etree.Elements to populate <about> tags for the corresponding record in the identifiers list.

Important

oai_repo will wrap each response with an <about> tag; do not add them yourself.

Note

Implementing this function in your DataInterface is optional. You may want to implement a custom version if pulling record abouts is individually slow and could be accomplished faster in bulk.

get_records_header(identifiers: list[str]) -> list[RecordHeader]

Return a list of RecordHeader instances for the identifiers.

Parameters:

Name Type Description Default
identifiers list

A list of valid identifier strings

required

Returns:

Type Description
list[RecordHeader]

A list of the RecordHeader objects with all properties set appropriately.

Note

Implementing this function in your DataInterface is optional. You may want to implement a custom version if pulling record headers is individually slow and could be accomplished faster in bulk.

get_records_metadata(identifiers: list[str], metadataprefix: str) -> list[lxml.etree._Element | None]

Return a list of lxml.etree.Element objects representing the root elements of the metadata found for the requested prefix and identifiers.

Parameters:

Name Type Description Default
identifiers list

A list of valid identifier strings

required
metadataprefix str

A metadata prefix

required

Returns:

Type Description
list[_Element | None]

A list containing the lxml.etree.Element for each requested record's metadata, or None for records which have no metadata for the provided prefix.

Note

Implementing this function in your DataInterface is optional. You may want to implement a custom version if pulling record metadata is individually slow and could be accomplished faster in bulk.
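
The trade-off between the per-record and bulk methods can be sketched as follows. `BaseInterface` is a hypothetical stand-in for the library's base class, and its looping default is an assumption based on the notes above; plain strings stand in for the lxml elements a real implementation would return.

```python
class BaseInterface:
    """Hypothetical stand-in for the library's DataInterface base class."""

    def get_record_metadata(self, identifier, metadataprefix):
        raise NotImplementedError

    def get_records_metadata(self, identifiers, metadataprefix):
        # Assumed default behaviour: one single-record call per identifier.
        return [self.get_record_metadata(i, metadataprefix) for i in identifiers]


class BulkInterface(BaseInterface):
    def __init__(self, store):
        self.store = store  # hypothetical {(identifier, prefix): metadata}
        self.queries = 0    # counts round trips to the backing store

    def _fetch_many(self, identifiers, metadataprefix):
        # Stands in for a single bulk query against a database or index.
        self.queries += 1
        wanted = ((i, metadataprefix) for i in identifiers)
        return {key[0]: self.store[key] for key in wanted if key in self.store}

    def get_records_metadata(self, identifiers, metadataprefix):
        found = self._fetch_many(identifiers, metadataprefix)
        # Keep the order of the input list; None marks records without
        # metadata for the requested prefix.
        return [found.get(i) for i in identifiers]
```

With the bulk override in place, the single-record method is never called for list requests, which is why it may be left unimplemented.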

get_set(setspec: str) -> Set

Return an instantiated OAI Set object for the provided setSpec string.

Parameters:

Name Type Description Default
setspec str

a setSpec string

required

Returns:

Type Description
Set

The Set object with all properties set appropriately, or None if the setspec is not valid or does not exist.

is_valid_identifier(identifier: str) -> bool

Determine whether an identifier string is in a valid format and exists.

Parameters:

Name Type Description Default
identifier str

A string to check for being an identifier

required

Returns:

Type Description
bool

True if given string is an identifier that exists.
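
A sketch of the two checks, format and existence, under assumed conventions: the `oai:example.edu:<number>` identifier scheme and the `KNOWN_IDS` set are hypothetical.

```python
import re

# Hypothetical identifier scheme: "oai:example.edu:" followed by a numeric key.
ID_PATTERN = re.compile(r"oai:example\.edu:(\d+)")

# Hypothetical set of record keys that exist in the repository.
KNOWN_IDS = {1, 2, 3}


def is_valid_identifier(identifier: str) -> bool:
    match = ID_PATTERN.fullmatch(identifier)
    if match is None:
        return False  # wrong format
    return int(match.group(1)) in KNOWN_IDS  # right format, but does it exist?
```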

list_identifiers(metadataprefix: str, filter_from: datetime = None, filter_until: datetime = None, filter_set: str = None, cursor: int = 0) -> tuple

Return valid identifier strings, filtered appropriately to passed parameters.

Parameters:

Name Type Description Default
metadataprefix str

The metadata prefix to match.

required
filter_from datetime

Include only identifiers on or after given datetime.

None
filter_until datetime

Include only identifiers on or before given datetime.

None
filter_set str

Include only identifiers within the matching setSpec string.

None
cursor int

position in results to start retrieving from

0

Returns:

Type Description
tuple

A tuple of length 3:

  1. (list) Valid identifier strings for the repository, filtered appropriately, or None if no resumptionToken is needed.
  2. (int|None) The completeListSize for a resumptionToken, or None to not send it.
  3. (Any|None) A str()-able value which indicates the constant-ness of the complete result set. If any value in the results changes, this value should also change. A changed value will invalidate current resumptionTokens. If None, the resumptionTokens will only invalidate based on a reduction in completeListSize.
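
One possible shape for this method is sketched below. It assumes the first tuple item is the page of results starting at `cursor` and capped at the interface's `limit`, which the text above suggests but does not state outright. The `RECORDS` data is hypothetical, set matching is a plain membership test, and the latest matching datestamp is used as a simple proxy for the constant-ness value.

```python
from datetime import datetime, timezone

LIMIT = 2  # plays the role of the DataInterface `limit` attribute
UTC = timezone.utc

# Hypothetical records: (identifier, datestamp, setSpecs, metadata prefixes)
RECORDS = [
    ("oai:example.edu:1", datetime(2023, 1, 10, tzinfo=UTC), ["theses"], {"oai_dc"}),
    ("oai:example.edu:2", datetime(2023, 6, 1, tzinfo=UTC), ["maps"], {"oai_dc"}),
    ("oai:example.edu:3", datetime(2024, 2, 20, tzinfo=UTC), ["theses"], {"oai_dc", "mods"}),
]


def list_identifiers(metadataprefix, filter_from=None, filter_until=None,
                     filter_set=None, cursor=0):
    # Apply every filter that was actually passed.
    matches = [
        (ident, stamp)
        for ident, stamp, sets, prefixes in RECORDS
        if metadataprefix in prefixes
        and (filter_from is None or stamp >= filter_from)
        and (filter_until is None or stamp <= filter_until)
        and (filter_set is None or filter_set in sets)
    ]
    # One page of identifiers, starting at the cursor.
    page = [ident for ident, _ in matches[cursor:cursor + LIMIT]]
    # Newest datestamp in the full result set; changes when a record is touched.
    latest = max((stamp for _, stamp in matches), default=None)
    state = latest.isoformat() if latest else None
    return page, len(matches), state
```

For example, `list_identifiers("oai_dc", cursor=2)` would return the third matching identifier along with a completeListSize of 3.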

list_set_specs(identifier: str = None, cursor: int = 0) -> tuple

Return a list of setSpec strings for the given identifier string if provided, or the list of all valid setSpec strings for the repository if identifier is None.

Parameters:

Name Type Description Default
identifier str

a valid identifier string

None
cursor int

position in results to start from

0

Returns:

Type Description
tuple

A tuple of length 3:

  1. (list|None) List of setSpec strings, or None if the repository does not support sets or if no resumptionToken is needed.
  2. (int|None) The completeListSize for a resumptionToken, or None to not send it.
  3. (Any|None) A str()-able value which indicates the constant-ness of the complete result set. If any value in the results changes, this value should also change. A changed value will invalidate current resumptionTokens. If None, the resumptionTokens will only invalidate based on a reduction in completeListSize.
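
A short sketch of the two modes, per-record and repository-wide. The set names and the `SETS_BY_ID` table are hypothetical, and the paging by `cursor` and a fixed page size is an assumption. The third tuple item is left as None, so tokens would only invalidate on a reduction in completeListSize.

```python
LIMIT = 2  # plays the role of the DataInterface `limit` attribute

# Hypothetical set data for illustration.
ALL_SETS = ["maps", "theses", "theses:phd"]
SETS_BY_ID = {"oai:example.edu:1": ["theses", "theses:phd"]}


def list_set_specs(identifier=None, cursor=0):
    # Repository-wide list when no identifier is given, else that record's sets.
    specs = ALL_SETS if identifier is None else SETS_BY_ID.get(identifier, [])
    return specs[cursor:cursor + LIMIT], len(specs), None
```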

Classes Returned by DataInterface Methods

Identify dataclass

The info needed for the Identify verb. In your DataInterface.get_identify() method, create an instance of this class, set the appropriate data, and return it.

Attributes:

Name Type Description
repository_name str

The name of the OAI repository

base_url str

the base url for this repository

admin_email list

a list of email addresses, cannot be empty

earliest_datestamp str | datetime

a string in the granularity format or a datetime object

deleted_record str

OAI deleted record value, one of no, persistent, transient

granularity str

OAI granularity, either YYYY-MM-DDThh:mm:ssZ or YYYY-MM-DD

compression list

compression to be available (typically left empty)

description list

can be bytes data or a pre-loaded lxml Element

Examples:

ident = oai_repo.Identify()
ident.repository_name = "My Repo"
ident.base_url = "https://example.edu/oai"
ident.deleted_record = "no"
ident.granularity = "YYYY-MM-DDThh:mm:ssZ"
ident.compression = []
... # remaining attributes

MetadataFormat dataclass

Class to define fields necessary for an OAI metadata format. Your definition of the DataInterface.get_metadata_formats() method should return a list of these.

Attributes:

Name Type Description
metadata_prefix str

A metadataPrefix string

schema str

The schema for the metadata

metadata_namespace str

The namespace for the metadata

Examples:

mdf = oai_repo.MetadataFormat(
    "oai_dc",
    "http://www.openarchives.org/OAI/2.0/oai_dc.xsd",
    "http://www.openarchives.org/OAI/2.0/oai_dc/"
)

RecordHeader dataclass

Class to define a record header for an identifier. Your definition of the DataInterface.get_record_header() method should return one of these.

Attributes:

Name Type Description
identifier str

The OAI identifier

datestamp str | datetime

The datestamp for when this record was created or last modified

setspecs list[str]

A list of setspec strings this record is part of

status str

The optional OAI status
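
Examples:

The sketch below fills in a header for a deleted record, following the instantiate-then-assign style of the Identify example. The `RecordHeader` dataclass here is a stand-in that mirrors the documented attributes so the example runs without the library; its defaults are assumptions. `deleted` is the status value defined by OAI-PMH.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional, Union


# Stand-in mirroring the documented RecordHeader attributes (an assumption,
# used so the sketch runs without oai_repo installed).
@dataclass
class RecordHeader:
    identifier: Optional[str] = None
    datestamp: Union[str, datetime, None] = None
    setspecs: list = field(default_factory=list)
    status: Optional[str] = None


header = RecordHeader()
header.identifier = "oai:example.edu:1"
header.datestamp = datetime(2023, 1, 10, tzinfo=timezone.utc)
header.setspecs = ["theses"]
header.status = "deleted"  # leave unset for records that are not deleted
```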

Set dataclass

Class to define fields for an OAI set. Your definition of the DataInterface.get_set() method should return one of these.

Attributes:

Name Type Description
spec str

The setspec string

name str

The name associated with the setspec

description list

A list of lxml.etree.Elements to populate <setDescription> tags for the set.
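
Examples:

A sketch of a get_set() implementation that returns a populated Set, or None for an unknown setSpec. The `Set` dataclass is a stand-in mirroring the documented attributes, the `SET_NAMES` table is hypothetical, and the ElementTree fallback exists only so the example runs without lxml.

```python
from dataclasses import dataclass, field
from typing import Optional

try:
    from lxml import etree  # the element type the documented API expects
except ImportError:
    import xml.etree.ElementTree as etree  # fallback so the sketch still runs


# Stand-in mirroring the documented Set attributes (an assumption).
@dataclass
class Set:
    spec: Optional[str] = None
    name: Optional[str] = None
    description: list = field(default_factory=list)


# Hypothetical table of known sets: setSpec -> display name.
SET_NAMES = {"theses": "Theses and Dissertations"}


def get_set(setspec: str) -> Optional[Set]:
    if setspec not in SET_NAMES:
        return None  # invalid or unknown setSpec
    desc = etree.Element("{http://purl.org/dc/elements/1.1/}description")
    desc.text = f"Records in the {SET_NAMES[setspec]} collection"
    oai_set = Set()
    oai_set.spec = setspec
    oai_set.name = SET_NAMES[setspec]
    oai_set.description = [desc]  # content for the <setDescription> tags
    return oai_set
```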