23 January 2009

The logic of persistent identifiers

“Authority control is the process of grouping multiple terms for the same entity into a single record for the purposes of disambiguation and collocation” [1] and has a long history in the library world. Because of that long history, however, some practices have accumulated that are not appropriate in a digital context.

In particular, the concept of the authorised heading (authorised form of name) is an artefact of card catalogues. It was used as a mechanism to collocate entries for all works (or, more precisely, FRBR group 1 entities: Work, Expression, Manifestation, Item) by a named entity (more precisely, a FRBR group 2 entity: a person or a corporate body), including those created under variant forms of name. See and See also entries (tracings) were then used to refer the reader to the main entry under the authorised form.

The authorised form of name used in this way also, confusingly, conflates a particular name form with the collocation function.

In a digital environment, we don’t need an authorised form of name because any form of name can be used to link to all works by the named entity. We do, however, need some form of persistent identifier (PID) to identify the group 2 entities, to which the variant names and group 1 entities can then be linked.
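As a minimal sketch of this idea, the Python below hangs every variant name and every work off a PID, with no name form given special status. The class names, the sample PID "0001", the work identifiers and the crude name normalisation are all illustrative assumptions, not part of any existing system.

```python
from dataclasses import dataclass, field


@dataclass
class NamedEntity:
    """A group 2 entity (person or corporate body), identified only by its PID."""
    pid: str
    names: set[str] = field(default_factory=set)     # every variant form; none is "authorised"
    work_ids: set[str] = field(default_factory=set)  # linked group 1 entities


class AuthorityIndex:
    """Hypothetical in-memory index linking names and works through PIDs."""

    def __init__(self) -> None:
        self._entities: dict[str, NamedEntity] = {}
        self._pids_by_name: dict[str, set[str]] = {}

    @staticmethod
    def _key(name: str) -> str:
        # Crude normalisation for the sketch: ignore case and extra whitespace.
        return " ".join(name.casefold().split())

    def add_name(self, pid: str, name: str) -> None:
        entity = self._entities.setdefault(pid, NamedEntity(pid))
        entity.names.add(name)
        self._pids_by_name.setdefault(self._key(name), set()).add(pid)

    def add_work(self, pid: str, work_id: str) -> None:
        self._entities.setdefault(pid, NamedEntity(pid)).work_ids.add(work_id)

    def works_for_name(self, name: str) -> dict[str, list[str]]:
        """Return works grouped by PID for every entity known under this name."""
        pids = self._pids_by_name.get(self._key(name), set())
        return {pid: sorted(self._entities[pid].work_ids) for pid in sorted(pids)}


index = AuthorityIndex()
index.add_name("0001", "Smith, Jane")
index.add_name("0001", "J. A. Smith")
index.add_work("0001", "work-17")
index.add_work("0001", "work-42")

# Either variant leads to the same works, because both resolve to the same PID.
by_variant_a = index.works_for_name("Smith, Jane")
by_variant_b = index.works_for_name("j. a. smith")
```

Results are grouped by PID rather than merged, so that two different entities who happen to share a name form remain distinguishable.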

That PID could take the form of a URI which links to information about the group 2 entity, but it should be noted that this again conflates two logically distinct functions: (a) providing a link between group 2 (named) entities, their names and their works (group 1 entities), and (b) providing information about the group 2 entity.

In a local system, the PID could be as simple as any non-meaningful string, most likely numeric; that is, one not linked to or derived from any data in the record. As long as suitable policies [2], such as those developed by the PILIN project, are in place and resources are provided to implement them, such PIDs will work for local purposes.
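A local minting scheme of this kind might look like the sketch below. The class, the eight-digit width and the rule that retired PIDs are never reissued are assumptions made for illustration; the last is simply one example of a persistence policy, not something taken from the PILIN checklist.

```python
import itertools


class LocalPidMinter:
    """Issues opaque numeric PIDs that carry no information about the record."""

    def __init__(self, start: int = 1, width: int = 8) -> None:
        self._counter = itertools.count(start)
        self._width = width
        self._status: dict[str, str] = {}  # pid -> "active" or "retired"

    def mint(self) -> str:
        # The PID comes from a counter, never from names, dates or other record data.
        pid = str(next(self._counter)).zfill(self._width)
        self._status[pid] = "active"
        return pid

    def retire(self, pid: str) -> None:
        # A retired PID stays on the books, so it can never be handed out again.
        if pid not in self._status:
            raise KeyError(f"unknown PID: {pid}")
        self._status[pid] = "retired"

    def status(self, pid: str) -> str:
        return self._status[pid]


minter = LocalPidMinter()
first = minter.mint()
second = minter.mint()
minter.retire(first)
third = minter.mint()  # continues the sequence; the retired PID is not reused
```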

However, where there is a need to identify a group 2 entity beyond the local system, as is the case with the NicNames Project, a higher-level PID is required. This is because we are now trying to link namedEntityA@Swin with namedEntityA@UNSW with namedEntityA@UNew. That is, an Australian researcher may have works deposited at any of a number of Australian research repositories, and we want to be able to identify both the works and any authority data not held locally.
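One way to picture the higher-level PID is as a crosswalk from each repository's local identifier to a single shared one. The sketch below assumes hypothetical local identifiers and a made-up shared PID "shared-1"; it shows only the linking logic, not how any real service stores it.

```python
from collections import defaultdict


class PidCrosswalk:
    """Hypothetical mapping from (repository, local PID) to a shared PID."""

    def __init__(self) -> None:
        self._shared_for: dict[tuple[str, str], str] = {}
        self._locals_for: defaultdict[str, set[tuple[str, str]]] = defaultdict(set)

    def link(self, repository: str, local_pid: str, shared_pid: str) -> None:
        key = (repository, local_pid)
        existing = self._shared_for.get(key)
        if existing is not None and existing != shared_pid:
            # A local record must not point at two different shared entities.
            raise ValueError(f"{key} is already linked to {existing}")
        self._shared_for[key] = shared_pid
        self._locals_for[shared_pid].add(key)

    def same_entity(self, a: tuple[str, str], b: tuple[str, str]) -> bool:
        shared_a = self._shared_for.get(a)
        return shared_a is not None and shared_a == self._shared_for.get(b)

    def local_records(self, shared_pid: str) -> list[tuple[str, str]]:
        return sorted(self._locals_for.get(shared_pid, ()))


crosswalk = PidCrosswalk()
crosswalk.link("Swin", "00000042", "shared-1")
crosswalk.link("UNSW", "7", "shared-1")
crosswalk.link("UNew", "913", "shared-1")

linked = crosswalk.same_entity(("Swin", "00000042"), ("UNew", "913"))
everywhere = crosswalk.local_records("shared-1")
```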

This is where an educational or national name identification service, such as the National Library of Australia's People Australia service, could play an important role.

If the first repository to generate authority data for a researcher submits it to People Australia, a PID could be assigned for that researcher which other repositories could then use when incorporating the authority data into their own systems. If works (group 1 entities) were also linked to the authority data, then, in principle, it should be possible to easily find all works by that researcher, in whatever repository they happen to reside.

The implication of this logic is that each repository creates authority data for new researchers as they deposit work into the repository. That authority data, including any attached works and any relevant entity attributes, is submitted to People Australia, which assigns a PID that is then added to the local record.

When the researcher changes institution and deposits material in the new institution’s repository, the authority data is retrieved from People Australia and incorporated into the local system, complete with the already assigned PID. The new work and any further attributes, such as the new affiliation, are then added to the authority data and resubmitted to People Australia.
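The submit, assign, retrieve and resubmit cycle described above can be sketched as follows. The NameService class is a hypothetical in-memory stand-in, not the People Australia interface, and the PID format, field names and work identifiers are invented for the example. The sketch also assumes the second repository already knows the researcher's PID; how it would discover that PID is left open.

```python
class NameService:
    """Hypothetical stand-in for a shared name identification service."""

    FIELDS = ("names", "works", "affiliations")

    def __init__(self) -> None:
        self._records: dict[str, dict[str, set[str]]] = {}
        self._next_number = 1

    def submit(self, record: dict) -> str:
        """Store or update authority data and return its PID."""
        pid = record.get("pid")
        if pid is None:
            # First submission for this researcher: assign a new PID.
            pid = f"pid-{self._next_number}"
            self._next_number += 1
            self._records[pid] = {name: set() for name in self.FIELDS}
        elif pid not in self._records:
            raise KeyError(f"unknown PID: {pid}")
        stored = self._records[pid]
        for name in self.FIELDS:
            stored[name].update(record.get(name, ()))
        return pid

    def retrieve(self, pid: str) -> dict:
        """Return a copy of the authority data, including the PID."""
        stored = self._records[pid]
        record: dict = {name: set(stored[name]) for name in self.FIELDS}
        record["pid"] = pid
        return record


service = NameService()

# The first repository creates authority data and records the assigned PID locally.
swin_record = {
    "names": {"Smith, Jane"},
    "works": {"swin:work-17"},
    "affiliations": {"Swin"},
}
swin_record["pid"] = service.submit(swin_record)

# The researcher moves: the second repository retrieves, extends and resubmits.
unsw_record = service.retrieve(swin_record["pid"])
unsw_record["works"].add("unsw:work-3")
unsw_record["affiliations"].add("UNSW")
service.submit(unsw_record)

central = service.retrieve(swin_record["pid"])
```

After the second submission, the central record lists works from both repositories under one PID, and that combined record is what a cross-repository search would need.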

It should then be possible, in principle, to incorporate a metasearching component into repository searches which will query People Australia to retrieve all works by a given researcher.


  1. Norrish, Jamie (2007). EATS: an entity authority tool set. http://researcharchive.vuw.ac.nz/handle/10063/220
  2. Nicholas, Nick, Ward, Nigel and Blinco, Kerry (2009). A policy checklist for enabling persistence of identifiers. D-Lib Magazine. 15 (1/2). http://www.dlib.org/dlib/january09/nicholas/01nicholas.html
