05 December 2008

“Names touch everything…” here too!

Derek Whitehead drew my attention to this entry in the hangingtogether blog. The idea of a Cooperative Identities Hub as a more broadly based name authority file suitable for use by a wide range of data custodians (libraries, archives, museums, repositories, aggregators, publishers) certainly fits well with what this project hopes to do.

It has occurred to us that, because new researchers often publish for the first time as co-authors of a paper while still graduate students, university research repositories will often be the first to see a researcher's name. As a consequence, repositories will often be the ones to do the original authority work, and they will also be in the best position to gather researcher persona attribute data.

So, if some data is of no immediate interest to a repository manager but is nevertheless easily accessible and likely to be useful to other institutions later, we should probably gather it and pass it on.

Progress Report December 2008

It has taken a while to make appointments, but the project is finally under way, albeit in a somewhat cart-before-horse fashion: the stakeholder requirements analysis will now run in parallel with schema design at least, and with some preliminary investigation of name-matching and name-distinguishing algorithms. All of this will happen through December and January.

We are currently looking at how well EAC-CPF (Encoded Archival Context – Corporate bodies, Persons and Families) might meet our needs, after a rough comparison of FRAD, RDA, DC, MADS, EAC, FOAF and vCard against a set of possible attributes and relationships that might be readily available to ARROW repository managers. The comparison was rough because most of these standards are in a state of flux and because our learning time is limited.

EAC-CPF is attractive because it is a rich namespace, structured to represent relationships as well as entities, and because People Australia is proposing to use it. Once we have learned how to encode records in EAC, the next step will be to test it by generating some use cases and attempting to render them in EAC.
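To give a feel for what rendering one such use case might involve, here is a minimal sketch in Python that builds an EAC-CPF-style record for a hypothetical graduate student whose first co-authored paper has just been deposited. It assumes the element names and namespace (`urn:isbn:1-931666-33-4`) of the EAC-CPF schema as later published; the draft under discussion here may differ. The record ID, agency name, agent label and the `researcher_record` helper are all hypothetical, and the output has not been validated against the schema.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Namespace of the published EAC-CPF schema (assumption: the draft may differ).
EAC_NS = "urn:isbn:1-931666-33-4"
ET.register_namespace("", EAC_NS)  # serialize EAC elements without a prefix


def sub(parent: ET.Element, tag: str, text: str | None = None, **attrs: str) -> ET.Element:
    """Append a namespaced child element, optionally with text and attributes."""
    el = ET.SubElement(parent, f"{{{EAC_NS}}}{tag}", attrs)
    if text is not None:
        el.text = text
    return el


def researcher_record(record_id: str, surname: str, forenames: str,
                      agency: str, paper_title: str, created: str) -> ET.Element:
    """Hypothetical use case: a repository creates the first record for a new author."""
    root = ET.Element(f"{{{EAC_NS}}}eac-cpf")

    # <control>: who made the record, and when.
    control = sub(root, "control")
    sub(control, "recordId", record_id)
    sub(control, "maintenanceStatus", "new")
    agency_el = sub(control, "maintenanceAgency")
    sub(agency_el, "agencyName", agency)
    history = sub(control, "maintenanceHistory")
    event = sub(history, "maintenanceEvent")
    sub(event, "eventType", "created")
    sub(event, "eventDateTime", created)
    sub(event, "agentType", "human")
    sub(event, "agent", "Repository manager")  # placeholder agent name

    # <cpfDescription>: the person and their relationship to the paper.
    desc = sub(root, "cpfDescription")
    identity = sub(desc, "identity")
    sub(identity, "entityType", "person")
    name = sub(identity, "nameEntry")
    sub(name, "part", f"{surname}, {forenames}")
    relations = sub(desc, "relations")
    paper = sub(relations, "resourceRelation", resourceRelationType="creatorOf")
    sub(paper, "relationEntry", paper_title)
    return root
```

Calling `ET.tostring(researcher_record(...), encoding="unicode")` returns the record as an XML string, which is the sort of artefact a use case could be checked against. The relationship to the paper is expressed as a relation rather than buried in free text, which is part of what makes EAC-CPF attractive for this work.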

At this stage, we do not intend to take the further step of defining an application profile and wrapping our EAC, plus whatever other vocabulary elements we might need, in an RDF structure. That would be a desirable outcome, but we will probably not have time to get that far.

On the application side, we are going to look at how BibApp might fit into our work: it appears to have some effective mechanisms for disambiguating and distinguishing names, which overlap with what we are trying to do.
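As a rough illustration of why name matching alone is not enough, here is a naive sketch in Python. It is not BibApp's method (which isn't described here); it is simply a common baseline that assumes names arrive as inverted "Surname, Forenames" strings. It treats two forms as compatible if the surnames match after case and diacritic folding and one set of forename initials is a prefix of the other. The function names `parse_name` and `compatible` are hypothetical.

```python
from __future__ import annotations

import unicodedata


def _fold(s: str) -> str:
    """Case-fold and strip diacritics so 'Müller' and 'Muller' compare equal."""
    decomposed = unicodedata.normalize("NFKD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold().strip()


def parse_name(name: str) -> tuple[str, list[str]]:
    """Split an inverted 'Surname, Forenames' string into (surname, initials).

    Assumes the inverted form; other name orders are not handled.
    """
    surname, _, forenames = name.partition(",")
    tokens = forenames.replace(".", " ").split()
    initials = [_fold(t)[0] for t in tokens]
    return _fold(surname), initials


def compatible(a: str, b: str) -> bool:
    """True if the two name forms *could* belong to the same person."""
    surname_a, initials_a = parse_name(a)
    surname_b, initials_b = parse_name(b)
    if surname_a != surname_b:
        return False
    # The shorter list of initials must agree with the start of the longer one.
    shorter, longer = sorted((initials_a, initials_b), key=len)
    return longer[: len(shorter)] == shorter
```

Note that "Smith, J." is compatible with both "Smith, John A." and "Smith, J. B." even though those two are not compatible with each other. A check like this can only say that names *might* match; distinguishing the people behind them needs the extra attributes (affiliation, co-authors, dates) that repositories are well placed to gather.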