David Remsen. 2016.
The use and limits of scientific names in biological informatics.
In: Michel E (Ed.) Anchoring Biodiversity Information: From Sherborn to the 21st century and beyond.
ZooKeys, 2016, 550: 207-223.
doi: 10.3897/zookeys.550.9546
http://zookeys.pensoft.net
http://zoobank.org/A812E05B-5DC3-4BE0-9551-79A80A3B99C8
Abstract
Scientific names serve to label biodiversity information: information related to species. Names, and their
underlying taxonomic definitions, however, are unstable and ambiguous. This reduces the
utility of names as identifiers and as indexing tools in biological informatics, where names are
commonly used to search, retrieve and integrate information about species. Semiotics provides
a general model for describing the relationship between taxon names and taxon concepts. It distinguishes
syntactics, which governs relationships among names, from semantics, which represents the relations between
those labels and the taxa to which they refer. In the semiotic context, changes in semantics (i.e.,
taxonomic circumscription) do not consistently result in a corresponding and reflective change in syntax.
Further, when syntactic changes do occur, they may be in response to semantic changes or in response
to syntactic rules. This lack of consistency in the cardinal relationship between names and taxa limits
how scientific names may be used in biological informatics, both for initially anchoring relevant
biodiversity information and for subsequently retrieving and integrating it. Precision and recall are two
measures of relevance. In biological taxonomy, recall is negatively impacted by changes or ambiguity in
syntax while precision is negatively impacted when there are changes or ambiguity in semantics. Because
changes in syntax are not correlated with changes in semantics, scientific names may be used, singly or
conflated into synonymous sets, to improve recall in pattern recognition or search and retrieval. Names
cannot be used, however, to improve precision. This is because changes in syntax do not uniquely identify
changes in circumscription.
These observations place limits on the utility of scientific names within biological informatics applications
that rely on names as identifiers for taxa. Taxonomic systems and services used to organize and
integrate information about taxa must accommodate the inherent semantic ambiguity of scientific names.
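
The claim about precision and recall can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the placeholder names "Aus bus", "Xus bus" and "Aus cus", the concept labels, the Record type and the precision_recall helper are all hypothetical. Each record carries a name string (syntax) and the circumscription it actually refers to (semantics). Both placeholder names have been applied to two different circumscriptions, so neither name string identifies a single concept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    name: str     # name string attached to the record (syntax)
    concept: str  # circumscription the record actually refers to (semantics)


# Hypothetical data: "Aus bus" and "Xus bus" are treated as synonyms, and
# each has been applied to two different circumscriptions.
RECORDS = [
    Record("Aus bus", "concept-1"),
    Record("Aus bus", "concept-2"),
    Record("Xus bus", "concept-1"),
    Record("Xus bus", "concept-2"),
    Record("Aus cus", "concept-3"),
]


def precision_recall(records, query_names, target_concept):
    """Search by name string, then score the result against one concept."""
    retrieved = [r for r in records if r.name in query_names]
    relevant = [r for r in records if r.concept == target_concept]
    hits = [r for r in retrieved if r.concept == target_concept]
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Query with a single name, then with the names conflated into a synonym set.
single = precision_recall(RECORDS, {"Aus bus"}, "concept-1")
synonym_set = precision_recall(RECORDS, {"Aus bus", "Xus bus"}, "concept-1")

print(single)       # (0.5, 0.5)
print(synonym_set)  # (0.5, 1.0)
```

In this toy data set, expanding the query to the synonym set raises recall from 0.5 to 1.0, while precision stays at 0.5, because no name string separates records for concept-1 from records for concept-2.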