17 June 2010

NicNames is (now) OAI-PMH compliant

A short time ago, the Australian Research Data Commons Party Infrastructure Project (ARDCPIP) met for the first time by teleconference. Among other things, we discussed how the NicNames software might help with managing researcher names in the data curation environment. Until now, NicNames did not support OAI-PMH for information exchange, and the teleconference identified OAI-PMH support as a very useful enhancement that would help build capability for the People Australia party ID in the research sector.

NicNames software developer Thomas Rutter reports that the NicNames tool has been updated to add OAI-PMH support and to report deletions in all harvests. The addition of OAI-PMH as a harvesting protocol provides instant compatibility between NicNames and a huge range of existing metadata harvesting services. OAI-PMH for NicNames includes full support for deletions, and metadata is available both in OAI's simplified Dublin Core format and in the richer, native NicNames XML format.

New support for OAI-PMH complements the harvesting support already available in the NicNames native API, which has some additional, OAI-PMH-incompatible features such as harvesting based on keywords. Both have now been updated to fully support deletions, so a harvester no longer needs to run a full harvest just to discover deleted records.
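For anyone wondering what deletion support looks like from the harvester's side, here is a minimal Python sketch. It relies only on things the OAI-PMH specification defines: the `ListRecords` verb, the `oai_dc` metadata prefix, the `from` date argument, and the `status="deleted"` attribute on a record header. The base URL, the identifiers and the sample response are made up for illustration and are not taken from a real NicNames installation.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace used by every OAI-PMH 2.0 response.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_harvest_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request; from_date makes the harvest incremental."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


def split_live_and_deleted(response_xml):
    """Return (live, deleted) identifier lists from a ListRecords response."""
    root = ET.fromstring(response_xml)
    live, deleted = [], []
    for header in root.iterfind(".//oai:record/oai:header", OAI_NS):
        identifier = header.findtext("oai:identifier", namespaces=OAI_NS)
        # A deleted record is flagged on its header, so no full harvest is needed.
        if header.get("status") == "deleted":
            deleted.append(identifier)
        else:
            live.append(identifier)
    return live, deleted


# Hypothetical response, trimmed to the headers only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:person/1</identifier>
        <datestamp>2010-06-01</datestamp>
      </header>
    </record>
    <record>
      <header status="deleted">
        <identifier>oai:example.org:person/2</identifier>
        <datestamp>2010-06-10</datestamp>
      </header>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    # http://example.org/oai is a placeholder, not a real endpoint.
    print(build_harvest_url("http://example.org/oai", from_date="2010-06-01"))
    print(split_live_and_deleted(SAMPLE_RESPONSE))
```

An incremental harvester would remove the identifiers in the second list from its local copy and update or add the ones in the first.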

The updated version of the NicNames software (0.4) can now be downloaded from https://launchpad.net/nicnames/

22 December 2009

Names in Australian repositories: I'd say you want a revolution ...

Have you ever wondered how your colleagues manage the storage and display of author names in their repositories? Well, wonder no more! A few months ago the NicNames Project surveyed Australian repository managers to discover more about the way they store and display names in their repositories. The results are a snapshot of the metadata stored in Australian repositories, and we think they're really fascinating.

For starters, 50 percent of respondents say they record an author's name exactly as it appears on the publication:

Did you expect that? Given how many repository managers are librarians (and therefore schooled in authority control), plus how far we already distort our repositories to meet the requirements of ERA, I'm surprised that well under half of repository managers are using either the HR name or another method of authority control.

Then again, perhaps that's because we only asked about the display of names in the repository. When we asked what other variants were being collected, the figures tipped a little:

Many people are wondering whether the NicNames Project is building a national authority file for researchers. My answer is no. That's a job for someone else. Our brief is to help you find practical ways to manage names in your repositories. And authority files are not practical for IRs. Here's why.

1. Think about what you need to build a traditional authority file. One of the first match points is date of birth. But it's generally not stored for authors in Australian repositories:

I'd love to know why. Is it that you feel it's inappropriate to record the date of birth for living people? Or would you like to record it but the data isn't available to more than 2 of you?

2. Between the absence of dates of birth and the increased trend towards recording authors' names as they appear on publications, it looks as though we're not storing much of what's expected for standard authority files. This sounds to me like resounding support for the idea that repositories are moving away from conventional attitudes of 'control' and 'authority' towards a more flexible idea of versions of names appearing within a particular context.

To give you an example, here's something that repositories store that other (more controlled) systems don't:

3. FORs may not be a perfect classification scheme, but they do provide a controlled vocabulary of Australasian research disciplines. And when they're read in conjunction with details about co-authors (recorded on every publication) and affiliation (recorded in over half of Australian repositories), they tell us a lot about a person's research identity.

And this may well be far more valuable to help us tell people apart in a scholarly publishing context than their dates of birth. Any thoughts, anyone?
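To make the idea a little more concrete, here is a rough Python sketch of comparing two author records by their research context rather than by date of birth. It is only an illustration, not how NicNames works: the record layout, the names, the affiliation and the FOR code values are all invented, and the overlap measure (Jaccard similarity) is simply a convenient choice.

```python
def jaccard(a, b):
    """Share of items two collections have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def context_similarity(rec1, rec2):
    """Compare two author records on co-authors, FOR codes and affiliation."""
    return {
        "coauthors": jaccard(rec1["coauthors"], rec2["coauthors"]),
        "for_codes": jaccard(rec1["for_codes"], rec2["for_codes"]),
        "same_affiliation": rec1["affiliation"] == rec2["affiliation"],
    }


# Invented records for two name variants that may or may not be one person.
record_a = {
    "name": "J Smith",
    "coauthors": ["A Lee", "B Chan"],
    "for_codes": ["0807"],
    "affiliation": "Example University",
}
record_b = {
    "name": "Jane Smith",
    "coauthors": ["A Lee", "C Patel"],
    "for_codes": ["0807", "0806"],
    "affiliation": "Example University",
}

if __name__ == "__main__":
    print(context_similarity(record_a, record_b))
```

The scores are evidence for a person to weigh, not a verdict. Where to draw the line between "probably the same researcher" and "probably not" is exactly the kind of question a repository manager would still need to answer.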

'You say you got a real solution, well you know ... we'd all love to see the plan'
- Lennon/McCartney

28 October 2009

How does your organisation differentiate between two people with the same name?

I was flying back to Melbourne after visiting the other NicNames partners last week, when a curiously topical thing happened to me on board the plane.

The airline had mistakenly given two passengers the same boarding pass, thereby allocating them the same seat (a physical impossibility). As they introduced themselves to the flight attendants, it became clear that both unfortunate passengers had exactly the same name - first and last. It wasn't a particularly common name, so it was quite a coincidence.

As the aircraft was entirely full, there was nowhere for one of the two same-named passengers to sit, and the flight was delayed for around 20 minutes while flight attendants and the second passenger walked up and down the aisles looking a bit stressed.

It's an example of the sort of thing that can go wrong when the only identifier you have for telling people apart is their name. The airline (or whoever printed up that second boarding pass for that "same" person) suffered from one of the two causes of problems NicNames aims to prevent: assuming that two dealings under the same name must involve the same person.

The two passengers presumably had a booking reference number, in addition to their name, to identify themselves to check-in staff (or machines). Presumably the mistake happened when someone looked up the second passenger by name, and found the other passenger's record, already with an allocated seat. They then went on to fill every other seat in the aircraft.

I don't know what happened to the extra passenger in the end - whether he got a free upgrade to business class, or was kicked off the plane. However, a similarity can be drawn to the experience of searching through citations in a repository only to find two people's work muddled in together under the same author heading.

19 October 2009

Problems with identity: why we need to be careful

There are many reasons why it's important to be able to match or disambiguate the names of people publishing in the scholarly literature. Some are administrative and involve better back-end management of names in institutional repositories. Some relate to users and how the display of name variants in repository interfaces can help their search or even confuse them further.

For researchers, there are a whole series of consequences of not managing publication names. For starters, when a database can't match J Smith and Jane Smith, citation counts and the metrics based on them become distorted. Citations belonging to a single person but distributed across name versions can be called 'split citation'.

Then there's 'mixed citation', which happens when work by two people with the same name is jumbled together. There's nothing worse than someone else taking credit for your masterpiece (or, for that matter, having to take the rap for someone else's ill-conceived ideas ...). I've just found a recent article from Nature that highlights a particularly dramatic case of 'mixed citation'.

Surgeon Liu Hui had a common name ... those of us with common names usually consider this a curse. But Dr Hui wasn't worried. In fact, he turned the ambiguity of his identity to his advantage. He added the publications of all the other Liu Huis he could find to his CV to make it look better. And it worked.

For those who believe this kind of academic fraud is always going to be found out, you're right. Hui was dismissed in 2006. But not before he became Assistant Dean at Tsinghua University on the back of his impressive publication record.

Moral of this story: name management is very, very important.
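For the more technically minded, here is a tiny Python sketch of how split and mixed citation show up in data. Everything in it is hypothetical: the publication list and the person identifiers (`p1`, `p2`) are invented, and in a real repository the person identifier is precisely the piece of information that is usually missing.

```python
from collections import defaultdict

# Hypothetical records: (name as printed, person identifier, title).
PUBLICATIONS = [
    ("J Smith", "p1", "Paper A"),
    ("Jane Smith", "p1", "Paper B"),
    ("Jane Smith", "p2", "Paper C"),
]


def find_split(records):
    """People whose work is spread across more than one name (split citation)."""
    names_per_person = defaultdict(set)
    for name, person, _title in records:
        names_per_person[person].add(name)
    return {p: sorted(n) for p, n in names_per_person.items() if len(n) > 1}


def find_mixed(records):
    """Names that cover work by more than one person (mixed citation)."""
    people_per_name = defaultdict(set)
    for name, person, _title in records:
        people_per_name[name].add(person)
    return {n: sorted(p) for n, p in people_per_name.items() if len(p) > 1}


if __name__ == "__main__":
    print(find_split(PUBLICATIONS))  # person p1 appears under two names
    print(find_mixed(PUBLICATIONS))  # "Jane Smith" covers two people
```

A database that groups by name alone would credit "Jane Smith" with Papers B and C and leave Paper A stranded under "J Smith", which is both problems at once.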

15 October 2009

Progress Report - 15 October 2009

We haven’t had any monthly progress reports in a while, so I have prepared a brief progress report to update everyone on the project status. As we move towards the last phase of the project, everybody has been working hard on the project outputs the team has defined for the NicNames project. The status of these outputs is listed below.

1. Project Plan
This has been finalised to reflect any changes to the project outcomes.

2. Review of global developments classified by possible use
A review has been carried out and an updated literature review report is being completed.

3. Stakeholder requirements analysis
Requirements of key stakeholders have been identified and documented.

4. Institutional analysis
Current methods of name authority at key institutions have been identified and documented.

5. Analysis of relevant schema and standards
Current and developing standards, schema and mapping relating to names have been analysed. A report on preferred schema, standards and mappings for the project is being completed.

6. System specification
Requirements for the prototype application and tools have been documented. These identify the functional requirements for the NicNames project, formally set out system use cases and define the agreed scope of work to meet the requirements.

7. Guidelines toolkit
A usability study has been completed, and the outcomes are being used to generate a set of procedures for dealing with personal names in institutional repositories. Documentation for the prototype application is being developed.

8. One or more open source applications/tools
Development of a prototype NicNames application and supporting tools has progressed well and a large part of the web interface has been completed.

9. Implementation plan
Site visits for the implementation of the prototype application at partner institutions have been scheduled for the week of 19/10/2009. A draft implementation plan has been prepared for the site visits.

10. Project evaluation report with recommendations for further action
11. Release Plan
The evaluation report and release plan will be formally prepared as we move further along in the final phase of the project.

08 October 2009

All quiet on the NicNames front?

It has been a little quiet over here lately. At the moment, I'm writing a revised literature review on names. The JISC landscape review was a great summary of the names environment in June 2008, but it has been a busy year in our area and we'd like to share some of the more interesting new literature with you as well.

The JISC Names Project released its Phase One final report in July. This partnership between the University of Manchester and the British Library is building a national authority file for the whole of the UK. It's an ambitious task, and we salute them for it. They've already released a prototype of their web service; you can have a play here (I did).

Also in July, Peter Sefton from the CAIRSS Project wrote a blog post about how a NicNames web service might interact with People Australia (I particularly liked the picture of the happy repository manager and hope that will be me soon ...)

The scholarly literature is also reflecting some very interesting developments. I summarised Dorothea Salo's paper on the absence of name authority control in institutional repositories in an earlier post. It's exciting to see that the big journals are starting to weigh in on the action, too. If 2008 will be remembered as the year The Lancet published an article about two clinical researchers who had decided to become numbers, 2009 was the year Science started to care about names. Both articles discussed the merits of the ResearcherID product from Thomson Reuters, which they described as 'ready and available now'. (I'm not so sure about that ...)

And finally, a few weeks ago, Ernesto Ruelas Inzunza from Dartmouth published what looks like a very interesting paper, 'Writing and citing "international" names'. As soon as I can get my hands on a copy, I'll let you know all about it.

Interested in more literature about names? Feel free to contact Rebecca.

18 September 2009

NicNames Project Plan
The draft project plan has been finalised to reflect changes to the project outcomes, the refined requirements and the new completion dates of the project. This has been released as The ARROW NicNames Project Project Plan Version 1.1.