OAI-PMH

What is OAI-PMH?

The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a widely used standard that allows a “service provider” (for example, a repository, aggregator, or indexing tool) to collect metadata records from a “data provider” (a journal or repository) using HTTP requests. In simpler terms, OAI-PMH is a structured way to say: “Please send me your metadata in a predictable format so I can index it.”

OAI-PMH does not usually transmit full-text PDFs. Instead, it provides structured metadata (title, authors, abstract, identifiers, publication dates, and other bibliographic fields) so that services can build searchable catalogs and keep them synchronized over time.

Why OAI-PMH matters for AAAI

Accurate metadata harvesting improves discoverability through libraries and aggregators and reduces manual indexing work. It also supports long-term access by enabling repository systems to mirror and validate bibliographic records.

Who uses OAI-PMH harvesting?

OAI-PMH is typically used by institutions and services that maintain large bibliographic collections. Common users include:

  • Institutional repositories that ingest records for faculty publications.
  • National and regional repository networks that aggregate research outputs.
  • Library discovery services that maintain journal/article catalogs.
  • Scholarly aggregation services that harvest metadata from many journals to improve search.
  • Indexing pipelines that regularly re-harvest updates (e.g., corrections) to keep records current.

In practical workflows, an institution may request an OAI endpoint from the journal, configure harvesting in its repository software, and schedule automated updates. When the journal publishes new issues or updates article metadata, harvesters can fetch the latest records without manual intervention.

AAAI OAI-PMH endpoints and basic usage

AAAI provides an OAI-PMH access point intended for harvesting. If you are an indexing partner or repository administrator and need confirmation of the correct endpoint, contact the editorial office with your technical requirements.

Endpoint note

Because journal platforms can vary, AAAI may maintain a canonical OAI-PMH endpoint for stable harvesting. If the journal migrates platforms, the endpoint may be redirected while preserving continuity for harvesters.

Recommended landing page: https://www.allergyimmunoljournal.com/index.php/aaai/oai-pmh
Common OAI request style: OAI-PMH uses query parameters such as ?verb=Identify or ?verb=ListRecords&metadataPrefix=oai_dc. The exact base URL used for requests may be the same as this page or a dedicated OAI script endpoint, depending on platform configuration.
Supported formats (typical): oai_dc (Dublin Core) is the most common baseline. Some systems also support richer formats (e.g., MODS) where configured.

Quick test (Identify)

The fastest way to verify an OAI endpoint is to call the Identify verb. It returns metadata about the repository itself (repository name, base URL, admin email, earliest datestamp, and other protocol fields).

GET [OAI_BASE_URL]?verb=Identify
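As a minimal sketch, the same check can be scripted in Python with only the standard library. OAI_BASE_URL is a placeholder, not a confirmed AAAI address, and parse_identify reads just a handful of the standard Identify fields; treat it as a starting point rather than a complete client.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Placeholder: replace with the base URL confirmed by the editorial office.
OAI_BASE_URL = "https://example.org/oai"


def identify_url(base_url: str) -> str:
    """Build the Identify request URL."""
    return base_url + "?" + urllib.parse.urlencode({"verb": "Identify"})


def parse_identify(xml_data) -> dict:
    """Extract a few standard fields from an Identify response (bytes or str)."""
    root = ET.fromstring(xml_data)
    ident = root.find("oai:Identify", OAI_NS)
    if ident is None:
        raise ValueError("response has no <Identify> element")
    fields = ["repositoryName", "baseURL", "protocolVersion",
              "earliestDatestamp", "granularity", "deletedRecord"]
    info = {f: ident.findtext(f"oai:{f}", default=None, namespaces=OAI_NS) for f in fields}
    # adminEmail may repeat, so collect every occurrence.
    info["adminEmail"] = [e.text for e in ident.findall("oai:adminEmail", OAI_NS)]
    return info


def fetch_identify(base_url: str = OAI_BASE_URL) -> dict:
    """Call Identify over HTTP (requires network access)."""
    with urllib.request.urlopen(identify_url(base_url), timeout=30) as resp:
        return parse_identify(resp.read())
```

fetch_identify needs network access; parse_identify can be tested offline against a saved response.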
        

Harvest records (ListRecords)

The most common harvesting action is ListRecords, which returns records in a chosen metadata format.

GET [OAI_BASE_URL]?verb=ListRecords&metadataPrefix=oai_dc
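Here is a hedged Python sketch of the same request. list_records_url builds the first-page URL, and parse_records pulls the header fields and titles out of each record in a response. Both function names are illustrative. The sketch deliberately ignores resumption tokens, which a full harvester must follow.

```python
import urllib.parse
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build a first-page ListRecords request."""
    query = urllib.parse.urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_records(xml_data) -> list:
    """Return one dict per <record>: identifier, datestamp, deleted flag, titles."""
    root = ET.fromstring(xml_data)
    records = []
    for rec in root.iterfind("oai:ListRecords/oai:record", NS):
        header = rec.find("oai:header", NS)
        records.append({
            "identifier": header.findtext("oai:identifier", namespaces=NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=NS),
            # Deleted records carry status="deleted" and no <metadata> element.
            "deleted": header.get("status") == "deleted",
            "titles": [t.text for t in rec.iterfind(".//dc:title", NS)],
        })
    return records
```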
        

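Many harvesters use incremental harvesting by date. OAI-PMH 2.0 requires repositories to support the from/until parameters at day granularity. These parameters filter on each record's datestamp (when the record was created, changed, or deleted in the repository), not on the article's publication date: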

GET [OAI_BASE_URL]?verb=ListRecords&metadataPrefix=oai_dc&from=2025-01-01&until=2025-12-31
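Incremental harvesting usually means remembering when the last successful harvest ran and requesting only what changed since then. The sketch below assumes the harvester stores that time as a timezone-aware UTC datetime. incremental_url is a hypothetical helper. The one-day overlap is a design choice intended to avoid missing records stamped near the boundary.

```python
import urllib.parse
from datetime import datetime, timedelta, timezone
from typing import Optional

DAY = "YYYY-MM-DD"
SECONDS = "YYYY-MM-DDThh:mm:ssZ"


def format_datestamp(moment: datetime, granularity: str) -> str:
    """Format a timezone-aware datetime in the granularity reported by Identify."""
    moment = moment.astimezone(timezone.utc)
    if granularity == SECONDS:
        return moment.strftime("%Y-%m-%dT%H:%M:%SZ")
    return moment.strftime("%Y-%m-%d")


def incremental_url(base_url: str, last_harvest: datetime, granularity: str = DAY,
                    until: Optional[datetime] = None,
                    overlap: timedelta = timedelta(days=1)) -> str:
    """Build a ListRecords request for records changed since the last harvest.

    The overlap re-fetches a small window so records stamped close to the
    previous harvest time are not missed; the resulting duplicates should be
    merged by OAI identifier.
    """
    params = {
        "verb": "ListRecords",
        "metadataPrefix": "oai_dc",
        "from": format_datestamp(last_harvest - overlap, granularity),
    }
    if until is not None:
        params["until"] = format_datestamp(until, granularity)
    return f"{base_url}?{urllib.parse.urlencode(params)}"
```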
        

OAI-PMH verbs AAAI supports (and how they are used)

OAI-PMH defines a set of “verbs” (request types). A harvester calls these verbs to discover what the repository offers and to fetch records. Below is a practical guide to the verbs and why each matters.

Identify: Returns repository-level information and validates that the endpoint is reachable and OAI-compliant.
ListMetadataFormats: Shows available metadata formats (e.g., oai_dc). Useful to confirm whether richer formats exist.
ListSets: Lists “sets” (sub-collections) if the repository groups records (e.g., by journal section or issue). Not always used.
ListIdentifiers: Returns identifiers without full metadata. Useful for lightweight sync or checking update counts.
GetRecord: Fetches a single record by identifier. Useful for targeted re-harvesting or debugging specific articles.
ListRecords: Fetches full metadata records. This is the primary harvesting method for building searchable catalogs.
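The verb rules can also be encoded as data, so that a custom harvester catches badVerb and badArgument mistakes before sending a request. The sketch below follows the argument rules in the OAI-PMH 2.0 specification: each verb has required arguments, optional arguments, and in some cases the exclusive resumptionToken. VERB_ARGS and check_request are illustrative names, not part of any library.

```python
# Argument rules per verb: (required, optional, exclusive), following OAI-PMH 2.0.
VERB_ARGS = {
    "Identify": (set(), set(), None),
    "ListMetadataFormats": (set(), {"identifier"}, None),
    "ListSets": (set(), set(), "resumptionToken"),
    "ListIdentifiers": ({"metadataPrefix"}, {"from", "until", "set"}, "resumptionToken"),
    "ListRecords": ({"metadataPrefix"}, {"from", "until", "set"}, "resumptionToken"),
    "GetRecord": ({"identifier", "metadataPrefix"}, set(), None),
}


def check_request(params: dict) -> list:
    """Return a list of problems with an OAI-PMH request (empty if it looks valid)."""
    verb = params.get("verb")
    if verb not in VERB_ARGS:
        return [f"badVerb: {verb!r} is not an OAI-PMH verb"]
    required, optional, exclusive = VERB_ARGS[verb]
    args = set(params) - {"verb"}
    if exclusive is not None and exclusive in args:
        # An exclusive argument must be the only argument besides verb.
        extra = args - {exclusive}
        return [f"badArgument: {exclusive} cannot be combined with {sorted(extra)}"] if extra else []
    problems = [f"badArgument: missing {a}" for a in sorted(required - args)]
    problems += [f"badArgument: unexpected {a}" for a in sorted(args - required - optional)]
    return problems
```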

Example: GetRecord for a specific item

GET [OAI_BASE_URL]?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:allergyimmunoljournal.com:ARTICLE_ID
        

About identifiers

OAI identifiers follow a repository-defined pattern (often prefixed by oai:). If you are unsure of the identifier format used by AAAI, run ListIdentifiers to see actual identifiers returned by the endpoint.
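The ListIdentifiers-then-GetRecord workflow can be sketched in Python as follows. parse_identifiers lists the identifiers (and deletion status) returned by the endpoint, and get_record_url builds a properly escaped GetRecord request for one of them. Both names are illustrative. The identifiers in any test data are invented; real AAAI identifiers should be copied from the endpoint's own responses.

```python
import urllib.parse
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def parse_identifiers(xml_data) -> list:
    """Return (identifier, datestamp, is_deleted) for each header in a ListIdentifiers response."""
    root = ET.fromstring(xml_data)
    out = []
    for header in root.iterfind("oai:ListIdentifiers/oai:header", OAI_NS):
        out.append((
            header.findtext("oai:identifier", namespaces=OAI_NS),
            header.findtext("oai:datestamp", namespaces=OAI_NS),
            header.get("status") == "deleted",
        ))
    return out


def get_record_url(base_url: str, identifier: str, metadata_prefix: str = "oai_dc") -> str:
    """Build a GetRecord request; urlencode escapes the ':' and '/' found in identifiers."""
    query = urllib.parse.urlencode({
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": identifier,
    })
    return f"{base_url}?{query}"
```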

What metadata can harvesters expect?

OAI-PMH metadata is typically delivered as Dublin Core (oai_dc). The exact mapping varies by platform, but harvesters can usually expect:

  • dc:title — article title
  • dc:creator — authors (sometimes one field per author)
  • dc:subject — keywords or subject terms
  • dc:description — abstract
  • dc:publisher — publisher/journal
  • dc:date — publication date
  • dc:identifier — DOI and/or article URL
  • dc:language — language code
  • dc:rights — license (e.g., CC BY 4.0)
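Because every Dublin Core element may repeat, a harvester is safer treating each field as a list. The following sketch assumes the input is a single <oai_dc:dc> fragment taken from a record's <metadata> element and maps it to a dictionary of lists. dc_fields is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

DC_PREFIX = "{http://purl.org/dc/elements/1.1/}"


def dc_fields(oai_dc_xml) -> dict:
    """Map an <oai_dc:dc> fragment to {element name: [values]}.

    Every Dublin Core element is repeatable, so values are always lists
    (e.g. one dc:creator per author, several dc:identifier values).
    """
    root = ET.fromstring(oai_dc_xml)
    fields = defaultdict(list)
    for child in root:
        # Keep only dc:* children that actually carry text.
        if child.tag.startswith(DC_PREFIX) and child.text:
            name = child.tag[len(DC_PREFIX):]
            fields[name].append(child.text.strip())
    return dict(fields)
```

With this shape, repeated dc:creator values become an ordered author list, and dc:rights can be checked directly for a license statement such as a CC BY 4.0 URL.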

High-quality OAI metadata improves matching in library catalogs and reduces duplicates. AAAI supports this by maintaining clear article landing pages with DOI and citation blocks, and by showing license information on article pages and PDFs.

License and rights metadata

Including license information in harvested metadata helps repositories understand reuse permissions and supports open-access compliance checks. AAAI aims to expose license (CC BY 4.0) consistently in article pages and metadata wherever feasible.

Best practices for repository administrators and indexers

If you manage harvesting for a repository or indexing pipeline, these best practices reduce failures and improve data quality:

  • Start with Identify: confirm base URL, earliest datestamp, and admin email.
  • Harvest incrementally: use date-based harvesting (from/until) rather than full re-harvests where possible.
  • Handle resumption tokens: large repositories paginate results; your harvester should follow resumption tokens until complete.
  • Validate identifiers: if duplicates appear, check whether the endpoint returns multiple identifiers per item (URL + DOI) and configure matching rules.
  • Monitor rights fields: map dc:rights to your repository license display to avoid missing open-access license context.
  • Re-harvest after corrections: schedule periodic refreshes to capture metadata updates (e.g., corrected titles, added DOI, updated author affiliations).

Resumption tokens (pagination)

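If a response includes a non-empty resumptionToken, you must call ListRecords again with the token to retrieve the next batch. The follow-up request carries only verb=ListRecords and resumptionToken=..., because resumptionToken is an exclusive argument; do not repeat metadataPrefix, from, or until. Continue until a response contains an empty resumptionToken element or none at all. Harvesters that ignore tokens may ingest only a partial set of records.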
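Here is a minimal harvesting loop that honors resumption tokens, sketched in Python under stated assumptions. fetch is any function that takes a URL and returns the response body: a real HTTP client in production, a stub in tests. The error handling treats noRecordsMatch as "nothing to harvest" and raises on any other protocol error. The name harvest is illustrative.

```python
import urllib.parse
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def harvest(base_url: str, fetch: Callable[[str], bytes],
            metadata_prefix: str = "oai_dc",
            selective: Optional[dict] = None) -> Iterator[ET.Element]:
    """Yield every <record> element, following resumption tokens until done.

    `selective` may hold from/until/set values; they are sent on the first
    request only, since later requests carry just the resumption token.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix, **(selective or {})}
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urllib.parse.urlencode(params)}"))
        error = root.find(f"{OAI}error")
        if error is not None:
            if error.get("code") == "noRecordsMatch":
                return  # nothing changed in the requested window
            raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
        list_records = root.find(f"{OAI}ListRecords")
        if list_records is None:
            raise RuntimeError("response contains neither <ListRecords> nor <error>")
        yield from list_records.findall(f"{OAI}record")
        token = list_records.findtext(f"{OAI}resumptionToken")
        if not token:
            return  # absent or empty token: this was the last page
        # resumptionToken is an exclusive argument: send it alone with the verb.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

In production, fetch could wrap urllib.request.urlopen; the loop itself does not change.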

Troubleshooting and common errors

OAI-PMH endpoints can fail for reasons unrelated to the scholarly content itself (for example, temporary server limits, malformed parameters, or network interruptions). Below are common issues and what to do.

Common issues

  • “badVerb” or “badArgument”: the request parameters are incorrect; check spelling and required fields for that verb.
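  • Empty responses: the date range may be too narrow (OAI-PMH reports this as a noRecordsMatch error), or the from/until values may not match the granularity the endpoint supports (YYYY-MM-DD vs YYYY-MM-DDThh:mm:ssZ); the granularity field in the Identify response states which to use.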
  • “cannotDisseminateFormat”: the requested metadataPrefix is not supported; run ListMetadataFormats.
  • Timeouts: large harvests may exceed time limits; harvest incrementally and respect resumption tokens.
  • Duplicate or inconsistent author fields: mapping differences between platforms can cause repetition; adjust repository normalization rules.
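Errors arrive as <error code="..."> elements inside an otherwise normal OAI-PMH response, so they can be detected programmatically. This sketch lists the eight error codes defined in OAI-PMH 2.0 and pairs each with a suggested next step. The advice strings are this page's suggestions, not protocol text, and diagnose is an illustrative name.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Suggested next step for each error code defined in OAI-PMH 2.0.
ADVICE = {
    "badVerb": "Check the spelling of the verb parameter.",
    "badArgument": "Check required/allowed arguments and the date granularity.",
    "badResumptionToken": "Token expired or invalid; restart from the last good datestamp.",
    "cannotDisseminateFormat": "Run ListMetadataFormats and pick a supported metadataPrefix.",
    "idDoesNotExist": "Confirm the identifier with ListIdentifiers.",
    "noRecordsMatch": "No records in this window; widen from/until or treat as 'no changes'.",
    "noMetadataFormats": "The item has no metadata formats available.",
    "noSetHierarchy": "The repository does not support sets; drop the set argument.",
}


def diagnose(xml_data) -> list:
    """Return (code, message, advice) for each <error> in an OAI-PMH response."""
    root = ET.fromstring(xml_data)
    results = []
    for err in root.iterfind(f"{OAI}error"):
        code = err.get("code")
        results.append((code, (err.text or "").strip(), ADVICE.get(code, "Unknown error code.")))
    return results
```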

What to send AAAI if you need help

If you contact the editorial office for technical support, include: the exact URL you requested, the full response/error message, the time of the request (with timezone), and whether you are harvesting with a known tool (e.g., DSpace, EPrints, custom script).

Endpoint stability

AAAI aims to keep OAI endpoints stable. If a platform migration occurs, redirects may be used to preserve access for harvesters. If your harvester does not follow redirects, update your base URL configuration.

Frequently asked questions

Does OAI-PMH provide full text PDFs?

Typically, OAI-PMH provides metadata records (title, authors, identifiers, abstract, etc.). Full text is usually accessed via the article landing page URL or DOI included in the metadata.

Which metadata format should I use?

Most harvesters start with oai_dc (Dublin Core). If you need richer metadata and the endpoint supports it, use ListMetadataFormats to confirm available formats.
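To check formats programmatically, a harvester can parse the ListMetadataFormats response and pick the richest format it understands. In this sketch the preference order (a prefix named "mods" before oai_dc) is an assumption. Actual prefix names vary by platform, so check what the endpoint returns. metadata_prefixes and choose_prefix are illustrative names.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def metadata_prefixes(xml_data) -> dict:
    """Map each metadataPrefix to its schema URL from a ListMetadataFormats response."""
    root = ET.fromstring(xml_data)
    return {
        fmt.findtext(f"{OAI}metadataPrefix"): fmt.findtext(f"{OAI}schema")
        for fmt in root.iterfind(f"{OAI}ListMetadataFormats/{OAI}metadataFormat")
    }


def choose_prefix(available: dict, preferred=("mods", "oai_dc")) -> str:
    """Pick the first preferred prefix that the endpoint offers."""
    for prefix in preferred:
        if prefix in available:
            return prefix
    raise LookupError("none of the preferred metadata formats is offered")
```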

How often should I harvest AAAI metadata?

A weekly or monthly schedule is typical for journals. If your use case requires fast updates (e.g., near-real-time indexing), harvest more frequently but use incremental date ranges to reduce load.

What if my harvester shows duplicate records?

Duplicates can occur if your system treats DOI and URL identifiers as separate items. Configure your matching rules to consolidate records by DOI where available and/or by OAI identifier.
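The DOI-first matching rule can be implemented as sketched below in Python. The code extracts a normalized DOI from each record's dc:identifier values and falls back to the OAI identifier when no DOI is present. The record layout (oai_id and identifiers keys) is hypothetical. The DOI pattern is the widely used Crossref-style regular expression, which covers modern DOIs but not every legacy form.

```python
import re
from typing import Iterable, Optional

# Crossref-style pattern for modern DOIs; matches bare DOIs and doi.org URLs.
DOI_RE = re.compile(r"10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def doi_key(identifiers: Iterable[str]) -> Optional[str]:
    """Return the first DOI found in a record's dc:identifier values, normalized."""
    for value in identifiers:
        match = DOI_RE.search(value)
        if match:
            return match.group(0).lower()  # DOIs are case-insensitive
    return None


def consolidate(records: Iterable[dict]) -> dict:
    """Merge records sharing a DOI; records without a DOI are keyed by OAI identifier.

    Later records (e.g. from a newer harvest) replace earlier ones with the same key.
    """
    merged = {}
    for rec in records:
        key = doi_key(rec["identifiers"]) or rec["oai_id"]
        merged[key] = rec
    return merged
```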

Who do I contact for technical issues?

Contact the AAAI editorial office with details of your request and the error response.