12 June 2009

NicNames Valet interface etc

An example:

NicNames as a webservice

The current perspective treats NicNames as a webservice that supplies an API for submitting names and for extracting names and their associated metadata. The standard set of DB maintenance methods (Add, Edit, Reports etc.) is supplied, together with extensions that tie the service to a web application (e.g. Valet), resolving names and returning metadata that can be used to populate application fields. There are therefore two forms of access: direct access to NicNames, or calls to NicNames made through applications such as Valet or repository management applications (e.g. VITAL). The latter requires customisation to integrate NicNames with the application.

Since the Valet environment covers self-submission of data to the repository, there are (a) some restrictions on access to NicNames methods and (b) requirements for repository staff to later validate name entries input from a Valet environment (the X-Files element - trust no one!). The security also covers harvesting attempts, where NicNames data can be extracted (in OAI-PMH format) covering the name(s) and a defined set or subset of existing data. That definition is set down by the associated repository manager(s), and so limits access to items such as staff IDs, which could be exploited for identity theft but are essential for disambiguation methods.
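The manager-defined subset described above can be pictured with a minimal sketch. The field names (name, affiliation, staff_id) and the harvest_view helper are hypothetical illustrations, not the actual NicNames schema or API; the point is only that sensitive fields stay available internally for disambiguation while harvesters see a restricted view.

```python
# Hypothetical sketch: the field names and the exposure policy below are
# illustrative assumptions, not the actual NicNames schema or API.

def harvest_view(record, exposed_fields):
    """Return only the fields the repository manager has chosen to expose."""
    return {key: value for key, value in record.items() if key in exposed_fields}


# A full internal record, including a sensitive identifier that is useful
# for disambiguation but should not leave the service.
record = {
    "name": "Smith, J.",
    "affiliation": "School of Physics",
    "staff_id": "S0012345",
}

# The manager-defined subset that harvesters are allowed to see.
exposed = {"name", "affiliation"}

public_record = harvest_view(record, exposed)
print(public_record)
```

Filtering by an allow-list rather than a block-list means any field added to the schema later stays hidden until a manager explicitly exposes it.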

As a webservice NicNames is predominantly passive: the DB is populated with existing names from the repository by repository staff extracting the data into an XML file and submitting that file to NicNames. Once the names have been added, any additional data defined when setting up the NicNames schema must also be added to the system. The amount of data is determined by the repository manager; alternatively, one can accept the default schema, which is comprehensive in its coverage of data usable to aid the disambiguation process.
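To make the extraction step concrete, here is a rough sketch of building such an XML file with Python's standard library. The element names (names, person, name, field) and the build_names_xml helper are assumptions made for illustration only; the real NicNames submission format is not specified here.

```python
import xml.etree.ElementTree as ET


def build_names_xml(people):
    """Serialise a list of name entries into a simple XML document.

    The element layout is hypothetical, not the actual NicNames format.
    """
    root = ET.Element("names")
    for entry in people:
        person = ET.SubElement(root, "person")
        ET.SubElement(person, "name").text = entry["name"]
        # Optional metadata becomes one <field key="..."> element per item.
        for key, value in entry.get("metadata", {}).items():
            field = ET.SubElement(person, "field", {"key": key})
            field.text = value
    return ET.tostring(root, encoding="unicode")


# Names as they might be extracted from an existing repository.
people = [
    {"name": "Smith, J.", "metadata": {"affiliation": "School of Physics"}},
    {"name": "Jones, A."},
]

xml_text = build_names_xml(people)
print(xml_text)
```

Entries without metadata are still valid here, which mirrors the two-stage process above: names first, additional schema data afterwards.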

The simplicity of the approach, i.e. a webservice that enables the 'transcending' of current authority control data (in MARC format etc.), hides the complexity of using that data to disambiguate names; the essential feature of NicNames is the speed and precision it achieves in disambiguation. An additional benefit is access to the extra metadata for purposes beyond resolving ambiguities.

NicNames & Disambiguation - moving into higher dimensions

In considering disambiguation issues -

"It is a lot like the difference between solids, where the atoms are locked into place, and fluids, where the atoms tumble over one another at random. But right in between the two extremes, at a kind of abstract phase transition called the edge of chaos, you also find complexity: a class of behaviors in which the components of the system never quite lock into place, yet never quite dissolve into turbulence, either. These are the systems that are both stable enough to store information, and yet evanescent enough to transmit it. These are the systems that can be organized to perform complex computations, to react to the world, to be spontaneous, adaptive, and alive." M. Mitchell Waldrop, from Complexity [p. 293]

We are borrowing from an area of structural engineering known as plastic hinge theory:

Plastic Hinge Theory: http://en.wikipedia.org/wiki/Plastic_hinge

The emphasis is on the "plastic rotation [deformation] of an otherwise rigid column connection" - for us the 'rigid column connection' is the key, the identifier, for people in the form of a list of names. As such we are focused on the static/dynamic, the solid/fluid border of identity.

The use of Bayesian probabilities introduces a partials perspective as we try to identify the 'whole', but it is still focused on a one-dimensional POV. This issue is under consideration while, at the same time, work concentrates on the more practical implementation of a refined one-dimensional methodology. The refinement takes the form of the NicNames metadata schema, which allows extended analysis of name associations and so extends the current authority control material used in the disambiguation process.
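As a small illustration of how a Bayesian step can weigh metadata when choosing between people who share a name, consider the sketch below. It is a generic application of Bayes' rule, not the NicNames algorithm; the candidate labels, priors and likelihood values are invented for the example.

```python
# Generic Bayes' rule sketch; candidates and numbers are invented examples,
# not NicNames data or its actual disambiguation method.

def posterior(priors, likelihoods):
    """Combine prior beliefs with evidence likelihoods and normalise.

    priors:      candidate -> P(candidate)
    likelihoods: candidate -> P(evidence | candidate)
    """
    unnormalised = {c: p * likelihoods.get(c, 0.0) for c, p in priors.items()}
    total = sum(unnormalised.values())
    if total == 0:
        raise ValueError("evidence is impossible under every candidate")
    return {c: weight / total for c, weight in unnormalised.items()}


# Two people share the name string "Smith, J."; with no other information
# they are equally likely to be the author of a new submission.
priors = {"Smith, J. (Physics)": 0.5, "Smith, J. (History)": 0.5}

# Evidence: the submission has a co-author from the physics department.
# Assumed likelihood of seeing that evidence under each candidate.
likelihoods = {"Smith, J. (Physics)": 0.8, "Smith, J. (History)": 0.2}

result = posterior(priors, likelihoods)
print(result)
```

Each extra metadata field in the schema can supply another likelihood term, which is one way to read the claim that a richer schema extends the material available for disambiguation.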