International Symposium "Data Retrieval Systems in Biodiversity Research"

St. Petersburg, Russia, May 22 - 28, 1999

ABSTRACTS

Computerized Identification & Databases of Characters

A. L. LOBANOV

Computerized identification systems in zoology and botany - present state and perspectives

Biological identification is an applied field of systematics dealing with the theory and practice of constructing diagnostic keys. It emerged as a separate discipline in the early 1970s, during the first period of activity in the development of computerized identification methods. In 1973 the Symposium "Biological Identification with Computers" was held in Cambridge, and its proceedings were published as a book (Pankhurst, 1975). For many years this book served as a manual for scientists working in this field on the border of biology and computer science. Only when PCs became everyday tools in science did convenient software (and not only theoretical articles) appear for interactive identification and for the automatic construction of biological keys. The International Conference "Computer-based Species Identification", held at Canterbury, UK, in December 1996, was a significant event in the development of biological diagnostics. The conference was devoted to the 21st anniversary of the Cambridge Symposium. As a participant in the 30-year evolution of computerized keys, I can evaluate its results and perspectives. Two stages of this evolution can be distinguished: 1) a stage of increasing diversity of computerized keys, and 2) a stage of subsequent convergence. During the first stage, because biologists in different countries had unequal access to computers, diagnostic programs ranged from punch cards to fully developed interactive keys. Over the last 15 years there has been a shift towards unified PCs. The diversity of diagnostic programs has now moved to interfaces, data formats and the use of computerized images.
The algorithms of computerized keys are converging on an optimal type that can be characterized as a "multi-entry, polytomous, interactive step-by-step computerized key with active use of images of taxa and characters, automatic evaluation and ranking of characters at each identification step, and a set of tools to increase the reliability of identification". Keys that exploit the advantages of the Internet are of special interest. Analysis of recent achievements in computerized diagnostics leads to the conclusion that modern interactive identification systems can compete with the best traditional "paper" publications; computerized diagnostic systems have many advantages over traditional publications in efficiency, accessibility for amateurs, and reliability of identification. Support: RFBR, grants 96-07-89086 and 99-07-90315.
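
The "automatic evaluation and ranking of characters at each identification step" can be illustrated with a minimal sketch. It is not the algorithm of any particular system named in these abstracts: the taxa, the characters and the ranking criterion (prefer the character whose most common state is shared by the fewest remaining taxa) are assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical toy data: taxon -> {character: state}. Not taken from any real key.
TAXA = {
    "taxon_A": {"colour": "black", "legs": 6, "wings": "present"},
    "taxon_B": {"colour": "red", "legs": 6, "wings": "present"},
    "taxon_C": {"colour": "green", "legs": 8, "wings": "absent"},
    "taxon_D": {"colour": "brown", "legs": 8, "wings": "present"},
}


def rank_characters(taxa, candidates, used=()):
    """Rank unused characters; the best one leaves the fewest taxa in the worst case."""
    chars = sorted({c for name in candidates for c in taxa[name]} - set(used))

    def worst_case(char):
        # Size of the largest group of candidates sharing one state of this character.
        counts = Counter(taxa[name].get(char) for name in candidates)
        return max(counts.values())

    # sorted() is stable, so tied characters keep their alphabetical order.
    return sorted(chars, key=worst_case)


def eliminate(taxa, candidates, char, state):
    """Keep only the candidates whose recorded state matches the user's answer."""
    return [name for name in candidates if taxa[name].get(char) == state]


candidates = list(TAXA)
first_ranking = rank_characters(TAXA, candidates)
# The user is free to ignore the recommendation (multi-entry key) and answers "legs = 8".
candidates = eliminate(TAXA, candidates, "legs", 8)
second_ranking = rank_characters(TAXA, candidates, used=["legs"])
```

The ranking is recomputed after every answer, so a character that separates the full set poorly (here "wings") can become as useful as any other once the candidate list has shrunk.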

Zoological Institute RAS, St. Petersburg, Russia, Tel. (812) 3281212, E-mail all@aster.zin.ras.spb.ru

---------------------------------------

A. V. SMIRNOV & A. V. GOODKOV

WWW-homepage "Interactive guide to gymnamoebae"

(http://now.ifmo.ru/amoebae.htm)

(Java-Script)

A new interactive key for the identification of gymnamoebae (Lobosea, Gymnamoebia) has been developed and is being implemented as a WWW homepage. The key comprises iconographic identification tables, accompanied by the most important diagnostic features, for the identification of genera and supra-generic taxa of amoebae, and detailed species descriptions with multiple illustrations and references to the related literature for identification below the genus level. The iconographic tables are compiled using the concept of morphotypes of gymnamoebae (Smirnov, Goodkov, 1999). Depending on the amount of available material, the structure of the key allows amoebae to be identified with an appropriate level of accuracy down to any taxonomic level or down to the morphotype (the latter is important if only a few specimens, or a single one, have been observed). This advantage is especially important for protozoan ecologists, who usually do not have enough material for reliable identification of gymnamoebae using existing keys (Page, 1983; 1988; 1991). The key is implemented as a set of HTML pages controlled by embedded Java-Script scripts. It can be used in Netscape Navigator 3.0 or higher, either on-line or as a local copy. The homepage includes comprehensive help pages with information on the principles of the key's organisation, on the concept of morphotypes, and on the use of the "Interactive guide to gymnamoebae".

St. Petersburg State University: Department of Invertebrate Zoology, Fac. of Biology & Soil Sci., and Lab. of Invertebrate Zoology, Biological Research Institute; St. Petersburg, Universitetskaja nab. 7/9, 199034, Russia. Tel. (812)3289688.
E-mail smirnov@as2187.spb.edu and good@good.usr.pu.ru

---------------------------------------

M. B. DIANOV & A. L. LOBANOV

Biological diagnostic system BIKEY8 for Windows

(http://www.zin.ru/projects/pickey)

(Borland C++ Builder, MS FoxPro)

BIKEY (Biological Identification KEYs) is one of the oldest computer diagnostic systems. The first version of the BIKEY software was written by A. Lobanov for primitive computers in 1974. The system was later improved and rebuilt for modern computer platforms. The BIKEY6 and BIKEY7 versions could use both textual descriptions and digital images, although they were developed for the MS-DOS platform only. Nevertheless, these versions form the kernel of computer keys to several groups of animals. Those keys were designed at the Zoological Institute and at "DIALOBIS EDITION" in Germany. Development of the latest version of BIKEY, for the Windows 95/98/NT platform, is now nearing completion. The new version manages taxa and characters without any restriction on their number. The number of character states has been increased to 16. In contrast to previous versions of BIKEY, which had a hard-wired diagnostic algorithm, it is now possible to use a probabilistic algorithm for diagnosis. As a result, taxa are not deleted from the list of available taxa during identification. Multiple states of any character can be selected at once. Like the previous BIKEY versions, BIKEY8 is based on the standard DBF database file format. The most attractive part of the BIKEY8 package is the PICKEY8 interactive identification system. The new version of PICKEY inherits our old interface, with a reduced number of controls that leaves as much of the screen area as possible for image presentation. This feature makes PICKEY keys accessible to users of any level as well as to biology experts. PICKEY8 has an advantage over analogous keys during identification: every program screen is automatically illustrated. The user can also complete an identification by image selection alone, without character recognition. Support: RFBR, grants 96-07-89086 and 99-07-90315.
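
The difference between strict elimination and a scoring approach in which no taxon is deleted can be sketched as follows. The abstract does not describe the BIKEY8 algorithm, so the scoring rule here (share of answered characters on which a taxon agrees with the user) and all names and data are assumptions made for illustration only.

```python
def score_taxa(taxa, observations):
    """Rank every taxon by agreement with the user's answers; nothing is deleted.

    taxa:         taxon name -> {character: set of admissible state numbers}
    observations: character  -> set of states selected by the user (multiple choice)
    """
    scores = {}
    for name, states in taxa.items():
        # A character counts as a hit if the taxon shares at least one selected state.
        hits = sum(
            1 for char, chosen in observations.items()
            if states.get(char, set()) & chosen
        )
        scores[name] = hits / len(observations) if observations else 1.0
    # Best score first; ties are broken alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical taxa with numbered character states.
TAXA = {
    "sp1": {"antenna": {1}, "elytra": {2, 3}},
    "sp2": {"antenna": {2}, "elytra": {3}},
    "sp3": {"antenna": {1}, "elytra": {1}},
}

ranking = score_taxa(TAXA, {"antenna": {1}, "elytra": {3, 4}})
```

A single wrong answer lowers a taxon's position in the ranking instead of removing it, which is the practical benefit of keeping all taxa in the list.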

Zoological Institute RAS, Universitetskaya emb. 1, St. Petersburg 199034 Russia, Tel. (812) 3281212, E-mail mix@zin.ru

---------------------------------------

A. RYSS

Computerized identification of nematodes

(MS Excel 7, MS Access 7 and BIKEY7)

MS EXCEL 7, MS ACCESS 7 and BIKEY7 (® Lobanov & Dianov, 1996) were used to construct keys for the identification of nematodes. The principle of identification is to filter the taxa database by character states. At every step of identification the user can choose any character. EXCEL is best suited to handling quantitative characters. The datasheets include the minimum and maximum values of every character. The Filter and Advanced Filter commands make it possible to filter the database by range (from minimum to maximum). Qualitative characters can be used for identification only if their states can be arranged in a single ordered series. ACCESS has all the facilities of EXCEL; identification can be done with Advanced Filter. In addition, all qualitative characters (with up to 9 character states) and the Filter by Selection and Filter by Form commands can be used for identification. An ACCESS DB can also serve as a reference system for the synonymy, hosts, distribution and bibliography of taxa. BIKEY7 has a built-in algorithm that minimizes the number of identification steps. At each step the DB generates a new sequence of characters according to their identification value. The user may choose any character, but the number of identification steps may increase if the DB recommendations are ignored. BIKEY uses images as identification tools. Computerized keys for plant-parasitic nematodes (to genus level) and for the genera of Pratylenchidae (to species level) are presented.
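
Filtering a taxa table by range amounts to keeping every taxon whose recorded minimum and maximum bracket the measured value. The sketch below shows that logic outside a spreadsheet; the taxon labels, character names and values are invented for illustration and are not taken from the nematode keys.

```python
# Hypothetical datasheet: taxon -> {character: (minimum, maximum)}.
RANGES = {
    "taxon_1": {"body_length_mm": (0.4, 0.7), "stylet_um": (14, 18)},
    "taxon_2": {"body_length_mm": (0.6, 0.9), "stylet_um": (17, 22)},
    "taxon_3": {"body_length_mm": (1.0, 1.5), "stylet_um": (20, 25)},
}


def filter_by_range(ranges, candidates, character, value):
    """Keep candidates whose [minimum, maximum] for the character contains the value.

    Taxa with no record for the character are dropped (an assumption of this sketch).
    """
    kept = []
    for name in candidates:
        if character in ranges[name]:
            low, high = ranges[name][character]
            if low <= value <= high:
                kept.append(name)
    return kept


# Characters can be applied in any order, one filter per identification step.
step_1 = filter_by_range(RANGES, list(RANGES), "body_length_mm", 0.65)
step_2 = filter_by_range(RANGES, step_1, "stylet_um", 20)
```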

Zoological Institute RAS, Universitetskaya emb. 1, St. Petersburg 199034 Russia, Tel. (812) 3280611, Fax (812)552 6435,
E-mail alex@ryss.spb.ru

---------------------------------------

V. PYANKOV, L. IVANOVA, A. KONDRATCHOUK, L. IVANOV & O. DZUBENKO

Data base on quantitative characteristics of leaf mesophyll structure in plants of different climatic zones

(MS Excel 7; MS Access 7)

A data base (DB) of quantitative characteristics of leaf mesophyll structure has been created for more than 1000 plant species from different climatic zones of the Former Soviet Union and Mongolia. The DB covers plant species from the Arctic - Wrangel Island (30 species), the Subarctic - the Polar Urals (more than 100 species), the boreal and forest-steppe zones (more than 300 species), the Central Asian and Mongolian deserts and semideserts (200 species), and the high mountains of the West and East Pamirs (more than 300 species). The DB includes information on the main characteristics of leaf structure: area and thickness, cell and chloroplast size, number of chloroplasts and cells per unit leaf area, number of chloroplasts per cell, and some integral indexes: total surface of mesophyll cells (Amez/A) and total surface of chloroplasts (Achl/A) per unit leaf area. For plants with different mesophyll types, characteristics are given separately for each tissue type (palisade and spongy for C-3 species; mesophyll and bundle sheath for C-4 species). The DB also includes ecobiological data on the species: life form, ecobiomorph, type of ecological strategy, chorotype and others. The DB is built in MS Excel 7 and MS Access 7. It is currently used in comparative ecophysiological studies of plant adaptations to the main ecological factors, including environmental and anthropogenic stress. The DB is also used to create a functional classification of plants and to identify plant types in boreal and arcto-alpine regions for global ecological monitoring and for forecasting vegetation under climate change. Support: RFBR, grant 97-04-49900 and Program "Universities of Russia", grant 454.

Ural State University, Lenina 51, Ekaterinburg 620083 Russia. Tel. (3432) 613124, Fax (3432) 557401.
E-mail Vladimir.Pynkov@usu.ru

---------------------------------------

L. IVANOVA ¹ , L. ZHUKOV ² & V. PYANKOV ¹

Use of the computer neural network for the identification of plants from different biological groups based on structural characteristics of photosynthetic apparatus

Quantitative parameters of the leaf mesophyll characterize its functional activity and its specificity in different eco-biological groups. The method "Mesostructure of photosynthetic apparatus", developed at the Ural State University (Mokronosov, 1978), makes it possible to determine more than 20 structural parameters: cell and chloroplast size, their number per unit leaf area, etc. Analysing a large number of species from any botanical-geographical zone on the basis of these parameters is rather difficult because of the high natural heterogeneity (life forms, ecological groups, type of mesophyll structure, etc.). One possible approach to the verification of classification models is the use of computer neural networks, which are now being developed intensively. A data base of 30 structural characteristics was created. It includes 195 plant species from the boreal zone. The plants were divided into 3 groups according to the type of mesophyll symmetry: homogeneous, dorsoventral and isopalisade. The neural simulator "MultiNeuron" (developed by Neurocomp, Krasnoyarsk CC, Siberian Branch of RAS) was used for training and for identification of plants from the different topological groups. Supervised training of the neural network was carried out three times, each time on a group of 97-98 randomly chosen species. The trained network was then tested on the remaining 97-98 species. In 90% of cases the type of mesophyll was determined correctly from the quantitative characteristics of photosynthetic tissues alone. We suggest using this approach to identify functional types of plants that reflect a combination of ecological and morphological features. Support: RFBR, grant 97-04-49900 & Program "Universities of Russia", grant 454.
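
The evaluation protocol (three random halvings of 195 species, training on one half and testing on the other) can be sketched independently of the neural simulator. The code below covers only the splitting and the accuracy measure; the species labels are placeholders, the seeds are arbitrary, and the classifier itself is deliberately left out because the abstract gives no details of the MultiNeuron configuration.

```python
import random


def random_halves(species, seed):
    """Shuffle the species and split them into a training half and a test half."""
    rng = random.Random(seed)
    shuffled = list(species)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def accuracy(predicted, actual):
    """Share of species whose predicted group equals the recorded group."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


species = [f"species_{i}" for i in range(195)]  # placeholder labels
# Three independent random splits, one per training run.
splits = [random_halves(species, seed) for seed in (1, 2, 3)]
```

With 195 species each split yields 97 training and 98 test species, and no species appears in both halves of the same split.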

¹Ural State University, Lenina 51, Ekaterinburg, 620083, Russia, Tel.3432 613124, Fax 3432 557401.
E-mail: Vladimir.Pynkov@usu.ru
² Siberian State Technological University, pr. Mira 82, Krasnoyarsk, Russia. E-mail it@far.sibstu.kts.ru

---------------------------------------

N. PAHORUKOV, L. SALEHOVA & J. SHATALOVA

Creation of the computer identification system for the Mediterranean fishes

The intermediate state of a project to build a computer identification system for Mediterranean fishes is described. The system makes it possible to assign a specimen to one of 151 fish families. It was created on the basis of the empty expert system TAXEX. Identification rests on a system of characters, which is built from an analysis of publications and from expert knowledge. An advantage of the computer identification system is that characters can be presented visually. An object is identified through a dialogue with the user. The user is shown pictures with possible variants of a part of the organism being identified and has to choose the figure that best corresponds to the object. The next question frame is presented according to the answer obtained. If necessary, the user can consult a dictionary of biological terms. The identification system gives very good results when used by non-professional users and students, and it may be used as a training system. Work on the project is continuing. The complete system for identifying Mediterranean fishes to species level is expected to be finished in 2001.

Institute of Biology of the Southern Seas, Ukrainian Academy of Sciences, Sevastopol, 335011, Ukraine, Tel. 0692-525642,
E-mail zalex@ibss.iuf.net

---------------------------------------

E. BUTAKOV & S. LELEKOV

Use of the informatics technologies methods for identification of biological objects

Subject of research: methods of representing taxonomic information and data about biological objects. Aim of research: to develop methods and tools for the representation, storage and distribution of knowledge about biological objects, and for their use in educational and scientific programs. The empty expert system TAXEX has been created for the identification of biological objects. Several computer identification systems have been built on the basis of TAXEX, and their effectiveness in scientific work and education has been examined. Such computer identification systems can serve as tools for revising the accumulated knowledge in systematics and taxonomy. We propose the thesis that knowledge bases are a necessary step in the conversion of biology from a descriptive science into an exact one. Alternative ways of representing and distributing knowledge about biological objects are discussed, for example the production of hard copies of computer identification systems, which project the capabilities of a computer onto those of a book. A TAXEX-based computer identification system for the Gastropoda of the Black Sea is presented.

Institute of Biology of the Southern Seas, Ukrainian Academy of Sciences, Sevastopol, 335011, Ukraine. Tel. 0692-525642,
E-mail zalex@ibss.iuf.net

---------------------------------------

V. D. CHUHCHIN & M. B. CHERKASOVA

Computer identification of species Bivalvia of the Black Sea

The computer identification system for species of Bivalvia, developed on the basis of TAXEX, is a system of 50 identification keys for the identification of 74 species. These species belong to 5 large orders of Bivalvia: Heterodonta, Protobranchia, Arcoida, Anisomyaria and Desmodonta. In the course of our study the systematics of all the Bivalvia species was revised. The knowledge base of the Bivalvia computer identification system includes information on morphology, anatomy, biology, ecology and zoogeography, together with the taxonomic system, which is represented by a scheme of the classification of higher taxa and by descriptions of all taxa. Part of the biological and ecological information is presented in tabular form for each Bivalvia species, namely information on feeding, distribution, occurrence, zoogeographical groups, life forms, and distribution by substrate and depth. The glossary of the data base describes 211 terms. All elements of the system are well illustrated (190 pictures). Bibliography: 22 references.

Institute of Biology of the Southern Seas, Ukrainian Academy of Sciences, Sevastopol, 335011, Ukraine, Tel. 0692-525642,
E-mail zalex@ibss.iuf.net

---------------------------------------

K. BAIKOV, A. ZVEREV & E. BAIKOVA

Computerized identification of the spurges (Euphorbia) from the Altai region.

A new algorithm for the computer identification of plants is proposed within the project "Determinant of plants from Altai region". The algorithm for taxon identification is based on the SYNAP method (Baikov, 1996). The program SPLIT has been developed for data input (description of diagnostic attributes and taxa) and for inferring different identification schemes. 13 species of the genus Euphorbia distributed in the Altai region are used as model objects. 35 attributes (shoot system, leaves, glands, fruits and others) are analyzed. The diagnostic value of an attribute is determined by its position in the scheme. Any attribute can be logically removed in order to optimize the scheme. The identification scheme can be either a strictly dichotomous key or a mixed one with polytomous steps. Data input resembles the traditional procedure for creating keys. The description of an attribute is entered in a special line on the screen. A taxon may be designated by a number or by a text label. Comments (full name of the taxon, geographic distribution and others) may be attached to the label. The key is produced as a step-by-step protocol and as a graphic scheme. The numbers of the diagnostic characters are placed on the fragments of the scheme. Two lists (of active attributes and of active taxa) are added to the scheme. To define a new object, one creates a new active line, labels it, and records the presence (code 1) or absence (code 0) of the diagnostic characters. If a precise answer cannot be given, the variants "it is unknown" (code U - unknown), "presence and absence" (code B - both) and "such attribute cannot be here" (code M - missing) are available. A demo version of SPLIT is distributed free of charge on diskettes and via the Internet. Support: International Science Foundation (RA7000, RA7300), RFBR (98-04-49459, 9907-04222), Siberian Branch of RAS (program "GIS and Internet").
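
The five-code scheme (1, 0, U, B, M) can be made concrete with a small matching routine. This is not the SYNAP method or the SPLIT program, whose internals the abstract does not give; the compatibility rules below are one plausible reading of the codes and are stated as assumptions in the comments, and the taxa are invented.

```python
def compatible(observed, recorded):
    """Decide whether two attribute codes can describe the same organism.

    Assumed rules for this sketch:
      U (unknown) is compatible with anything;
      M (attribute cannot apply) matches only M, or U;
      B (both present and absent) matches 0, 1 and B;
      otherwise the codes must be equal.
    """
    if "U" in (observed, recorded):
        return True
    if "M" in (observed, recorded):
        return observed == recorded
    if "B" in (observed, recorded):
        return True
    return observed == recorded


def matching_taxa(specimen, taxa):
    """Return the taxa whose code line is compatible with the specimen at every attribute."""
    return [
        name for name, codes in taxa.items()
        if all(compatible(o, r) for o, r in zip(specimen, codes))
    ]


# Hypothetical active lines: one code per diagnostic attribute, in a fixed order.
TAXA = {"taxon_1": "101", "taxon_2": "1B0", "taxon_3": "0M1"}
result = matching_taxa("1U0", TAXA)
```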

Central Siberian Botanical Garden, SB RAS, Zolotodolinskaya str., 101, Novosibirsk, 630090, Russia, Tel.: 3832-342367, Fax: 3832-354986, E-mail root@botgard.nsk.su

---------------------------------------

L. ZAUGOL'NOVA, L. KHANINA & E. GLUKHOVA

Development of data base and information-diagnostic system for identification of syntaxon addresses of forest vegetation communities in the European Russia

We started developing the DB and information system in 1998. The system consists of the following interconnected blocks: (1) A DB of syntaxa of levels from vegetation type to subassociation, with descriptions of the syntaxa and references between hierarchical levels; the authors of publications are included. (2) Lists of diagnostic species for all syntaxa. (3) A DB of references. (4) Diagnostic tables. (5) A DB of geobotanical releves from the publications of the syntaxon authors. (6) A DB of geobotanical releves by different authors. Species names are taken from the computer dictionary of vascular plant species of Central Russia (Zaugol'nova et al., 1995; Khanina et al., 1999) or from the computer list of the flora of the ex-USSR following Cherepanov's (1995) nomenclature. The dictionary of moss species by M. Ignatov (Ignatov, Afonina, 1992) is also used. The system currently holds information on more than 70 vegetation associations. We plan to develop a diagnostic block that identifies the address of a community by comparing the real species list with the diagnostic species lists. The use of the Syntaxon program (Onipchenko, Ovchinnikov, 1992), the Ecoscale program (Zaugol'nova, 1995) and ordination methods is planned.
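
Comparing a real species list with diagnostic species lists can be sketched as a simple overlap score. The planned diagnostic block is not specified in the abstract, so the measure used here (share of a syntaxon's diagnostic species that occur in the releve), the association labels and the species assignments are all assumptions for illustration.

```python
def rank_syntaxa(releve, diagnostic_lists):
    """Rank syntaxa by the share of their diagnostic species found in the releve."""
    scores = []
    for syntaxon, diagnostic in diagnostic_lists.items():
        share = len(diagnostic & releve) / len(diagnostic)
        scores.append((syntaxon, share))
    # Highest share first; ties are broken alphabetically.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Hypothetical diagnostic lists; the groupings are illustrative only.
DIAGNOSTIC = {
    "association_A": {"Picea abies", "Vaccinium myrtillus", "Oxalis acetosella"},
    "association_B": {
        "Quercus robur", "Tilia cordata", "Aegopodium podagraria", "Oxalis acetosella",
    },
}

releve = {"Picea abies", "Vaccinium myrtillus", "Oxalis acetosella", "Tilia cordata"}
ranking = rank_syntaxa(releve, DIAGNOSTIC)
```

A score of this kind ignores abundance and species that are present but not diagnostic; a real diagnostic block would probably need both.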

Center of Forest Ecology and Productivity RAS, Novocheremushkinskaya 69, Moscow 117418 Russia, Tel. 095 3316072, Fax (095)3322917, E-mail luda@cepl.rssi.ru

---------------------------------------

J. DIEDERICH¹, R. FORTUNER² & J. MILTON¹

Concepts and Approach for a General Identification System

Many methods and systems for computer identification have been proposed in the last 30 years, but their usage remains very limited. It seems that this is due to several types of problems: Reliance on a single approach per tool (e.g., elimination in the case of multi-entry keys). Use of unreliable data (e.g., by dichotomous and multi-entry keys). Black box aspect of some approaches (e.g., neural networks). Use of unfamiliar principles (e.g., Bayesian systems). Large amount of data entry needed to use some ID tools (e.g., statistic-based tools). Slowness of the tool compared to a printed key. Large amount of work needed to create the database. The data gathered for one tool cannot be used by other tools. Database not kept up to date. Lack of freedom for the user who must use the characters selected by the author of the tool. Lack of freedom for the user who must obey the machine (traditional expert systems).
Genisys is a long-term theoretical project that aims at defining the concepts for a general identification system able to overcome these difficulties. This it does mainly by two major innovations:
1 - The database should be created from the literature and include all the characters described for all the species in a group. Published data lacks uniformity, but uniformity can be restored by using a character format that is both uniform and representative. A tool (Terminator) has been designed (not currently available) to extract characters from published descriptions and put them into a Genisys database. A version of the Terminator has been prototyped and tested, demonstrating the feasibility of the task. Improvements and redesign would depend on funding.
2 - The ID tool must be in fact a set of tools, each designed to help the identifier do one of the possible tasks in an identification session (elimination, similarity, dissimilarity, instant recognition, statistics, probability, fuzzy logic, non-morphological approaches, etc.). Other tools could export Genisys data into a different format (e.g., data matrix or Delta-coded data) so that existing tools could be used as well as the tools developed for Genisys.
The problems listed above would be solved as follows: Genisys will be a set of tools helping the user with many different approaches, including, but not limited to, elimination. Elimination will use reliable characters only; reliability of all characters will be evaluated based on metadata. Neural networks, Bayesian systems, etc., if included, will be only some of several available approaches. The set of tools will include some approaches that need very little input from the user (but input-intensive approaches will be available as well, if needed). Fast identification (e.g., by instant recognition) will be supported. The database will be created from published data. A single database will be used by all the tools: it will be worth it to keep it up to date. The user will be free to use any set of characters. The user will be in charge of the identification process.
The Genisys concepts were first developed for nematodes (Nemisys project) but soon expanded to identification in general. Genisys is currently a set of high level principles and specifications for biological databases and identification. A summary of these principles and a list of articles already published on Genisys can be seen in the following Web site: http://math.ucdavis.edu/~milton/genisys.html. These principles have not yet been implemented, but the project is seeking funding.

¹University of California, Department of Mathematics, Davis, California 95616, USA, Tel. JD: (1) (530) 752-0892, JM: (1) (530) 752-3657; Fax (1) (530) 752-6635, E-mail dieder@math.ucdavis.edu & milton@math.ucdavis.edu
²11 place Frézeau de la Frézellière, 86420 Monts sur Guesnes, France, Tel. (33) 5 49 22 87 18; Fax (33) 5 49 22 74 10; E-mail fortuner@wanadoo.fr

---------------------------------------

N. KHANDJIAN & G. GULNAZARYAN

Data Base of the Caucasian representatives of tribe Anthemideae Cass

(MS Access-7)

A data base (DB) has been created at the Ministry of Nature Protection of Armenia for the systematization and identification of taxa of the tribe Anthemideae. The DB fields are divided into the following three groups. Group I consists of the fields with identification features. A classification of features based on long-term investigations (Khandjian, 1993) makes it possible to identify a plant according to its exomorphological, anatomical, karyological, biochemical, pollen and embryological features. Group II comprises the fields showing the geographical location of the taxa and their environment. This group of fields will also be used to link the DB to Data Base and GIS technologies, which will be developed taking into account the experience gained in creating and maintaining the DB of Caucasian representatives of the tribe Anthemideae. The group will also be included in the National Data Bank "Ecology", a system of DBs embracing all elements of the environment. Group III comprises the fields showing the industrial and domestic characteristics of the plants, namely the possibility of using different taxa in industry and everyday life. In future this information will be used to solve problems of the ex-situ and in-situ protection of representatives of the tribe.

Ministry of Nature Protection - Republic of Armenia, 35, Moskovian str., Yerevan, 375002, Republic of Armenia, Tel. (3742) 53-31-81, Fax (3742) 53-49-02, 53-86-13, 15-18-40.

---------------------------------------

D. K. YEATES & K. THIELE

LucID: identification tools for the biosphere

(http://www.publish.csiro.au/lucid)

The software system LucID is one component of a solution to the identification bottleneck. LucID is designed to capture taxonomic experts' knowledge on the identification of organisms, and to allow this knowledge to be disseminated widely. LucID is a computer-based, multi-access, interactive identification tool, which runs on Windows operating systems. The LucID system consists of a builder module that allows quick and effective encoding of key data, and a player module that allows users to perform identifications using the builder's data. LucID provides a link between the user's knowledge of a specimen and the builder's knowledge of the taxa. In addition to the core identification function, the builder of a LucID key may surround the data set with an unlimited variety of information files, each piece of information tied to a taxon name. Information files in LucID may contain notes on taxonomy, relationships, ecology or economic importance, descriptions, distribution maps, images, sound files or video clips. No constraints are placed on the builder as to what type of information or topics may be covered. In this way, LucID becomes a publication tool for much more than the raw identification data. The main aims in developing LucID have been to deliver interactive identification tools that are flexible and powerful, as well as being extremely easy to develop and use. LucID keys can be developed using any written language. Both the Builder and Player programs require minimal effort to learn, and reduce time in key construction, development and use.

CSIRO PUBLISHING, PO Box 1139, Collingwood, VIC 3066, Australia.
Contact Andrea Jordan: Tel +61 3 9662 7623; E-mail andrea.jordan@publish.csiro.au

---------------------------------------