12 June 2009

NicNames & Disambiguation - moving into higher dimensions

In considering disambiguation issues:

"It is a lot like the difference between solids, where the atoms are locked into place, and fluids, where the atoms tumble over one another at random. But right in between the two extremes, at a kind of abstract phase transition called the edge of chaos, you also find complexity: a class of behaviors in which the components of the system never quite lock into place, yet never quite dissolve into turbulence, either. These are the systems that are both stable enough to store information, and yet evanescent enough to transmit it. These are the systems that can be organized to perform complex computations, to react to the world, to be spontaneous, adaptive, and alive." M. Mitchell Waldrop, from Complexity [p. 293]

We are dealing with something close to an area of structural mechanics called 'hinge theory':

Plastic hinge theory is described at http://en.wikipedia.org/wiki/Plastic_hinge

The emphasis is on the "plastic rotation [deformation] of an otherwise rigid column connection". For us, the 'rigid column connection' is the key, the identifier, for a person, in the form of a list of names. As such, we are focused on the static/dynamic, solid/fluid border of identity.

The use of Bayesian probabilities introduces a partial perspective as we try to identify the 'whole', but it is still focused on a one-dimensional POV. This issue is under consideration while we concentrate on the more practical implementation of a refined one-dimensional POV methodology. The refinement takes the form of the NicNames metadata schema, which allows extended analysis of name associations and so extends the authority control material currently used in the disambiguation process.
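To make the Bayesian idea concrete, here is a minimal sketch, assuming a simple naive Bayes model. It is not the NicNames method, which the post does not specify. A name string is scored against a set of candidate authority records using name associations observed alongside it (co-authors, affiliations and so on). All record identifiers, feature labels and probabilities below are hypothetical, and the model assumes the features are independent once the candidate is known.

```python
from math import prod


def posterior(priors, likelihoods, observed, floor=1e-3):
    """Return P(candidate | observed features) for each candidate record.

    priors:      {candidate: P(candidate)}
    likelihoods: {candidate: {feature: P(feature | candidate)}}
    observed:    features seen alongside the name string
    floor:       hypothetical probability used when a candidate has no
                 entry for a feature (avoids zeroing out a candidate)

    Naive Bayes assumption: features are conditionally independent
    given the candidate.
    """
    observed = tuple(observed)  # allow any iterable to be reused per candidate
    unnormalised = {
        c: priors[c] * prod(likelihoods[c].get(f, floor) for f in observed)
        for c in priors
    }
    total = sum(unnormalised.values())
    # Normalise so the posteriors over all candidates sum to 1.
    return {c: v / total for c, v in unnormalised.items()}


# Hypothetical example: two authority records that share the name "J. Smith".
priors = {"smith_john_1950": 0.5, "smith_jane_1972": 0.5}
likelihoods = {
    "smith_john_1950": {"coauthor:Brown": 0.4, "affil:Leeds": 0.6},
    "smith_jane_1972": {"coauthor:Brown": 0.05, "affil:Leeds": 0.1},
}
result = posterior(priors, likelihoods, ["coauthor:Brown", "affil:Leeds"])
```

With these made-up numbers the observed associations push almost all of the probability onto the first record. Each extra association that a richer metadata schema records is simply another factor in the product, which is one way to picture name associations extending the evidence available to authority control.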
