Discover Life in America

John Longino - 7 July, 1999

barcodes on insect specimens

Date: Wed, 7 Jul 1999 16:09:24 -0800
To: "Ugalde, Jesus" <jugalde@inbio.ac.cr>,
        "Brown, Brian" <brianb@almaak.usc.edu>,
        "Kaspari, Mike" <mkaspari@ou.edu>,
        "Pickering, John" <pick@pick.uga.edu>,
        "Furth, Dave" <Furth.David@NMNH.SI.EDU>,
        "Naskrecki, Piotr" <pin93001@uconnvm.uconn.edu>
From: "John T. Longino" <longinoj@evergreen.edu>
Subject: barcodes on insect specimens
Cc: "Colwell, Rob" <colwell@uconnvm.uconn.edu>


Below is a discussion of barcodes for insect specimens, and a call to
action for the museum community. I have sent it to my circle of
barcode-using colleagues (Jesus, Brian, Mike, Pick, Piotr), Dave
Furth (as collection manager for Smithsonian; also could you send
this on to Chris Thompson; I don't have his email), and my colleague
in ALAS, Rob Colwell. Could you all suggest other interested parties
I might contact (with emails)? Thanks.

BARCODES FOR INSECT SPECIMENS

*The Problem*

Over the past decade, a number of institutions and projects began
using barcodes to label individual insect specimens, providing unique
identifiers for specimens (Thompson, F. C.  1994.  Bar codes for
specimen data management.  Insect Collection News 9:2-4.). The hope
was that this would result in more efficient data capture and
specimen management. Barcodes come in different flavors called
symbologies, and scanner hardware must be programmed to recognize
particular symbologies. INBio chose Code49, a proprietary code
produced by Intermec, and other institutions and projects have
followed suit. The ones I know of are the LACM, John Pickering's lab,
the ALAS project, and Mike Kaspari's lab. Barcodes on insect
specimens had to be small, and Code49 was one of the first
high-density barcodes, which would allow a sufficiently large
specimen code on a sufficiently small label. Reading high-density
barcodes requires specialized, expensive scanners, and it has always
been a problem acquiring them. But a larger problem is now evident:
Code49 never obtained a large market, and is now extinct. The
existing scanning hardware cannot be repaired or replaced. Intermec
technical representatives have confirmed this.

INBio is now taking steps to migrate to a new symbology (discussed
more below). I suggest that the museum community consider this
problem jointly, rather than each institution acting independently.

*Common Standards*

If the barcode-using entomologists consider this together, we can
avoid a lot of wheel reinventing and a chaos of competing
symbologies and inter-institutional incompatibility. Ideally a common
symbology could be adopted, which would facilitate the exchange of
specimens. I can envision a Web site that would be an information
source about symbology, sources of hardware and software, lists of
institutions or projects using barcodes, label formats or
institutional codes being used, etc.

If this is already being done please let me know so I can be in the loop.

I have done a bit of research and suggest two symbology options below
(discussing their pros and cons). If any of you have additional
suggestions or information, please let me know.

*MicroPDF417*

This is a high-density symbology developed by Symbol Technologies
(www.symbol.com). I explained my needs to them for a small barcode
label that would hold the data content of a current INBio barcode (17
characters). They sent a sample in the form of a Word file with an
embedded graphic that was a barcode that would store 22 characters.
They said if I printed it on a 600 dpi laser printer it would scan with
no problem. I printed it and got a very clean-looking barcode that
was 11x5mm (INBio's current labels are 22x8mm). I do not have a
scanner to test it, however. Symbol Technologies claims that
MicroPDF417 is now the most popular small barcode, with 80-90% of the
market. Their Web site lists some of the current users of this
symbology. The list is not large, but one that comes to mind is
driver's licenses in the Philippines. It is a public domain
symbology, and apparently many companies are making scanners that can
read it.

The advantage of this symbology is that more than enough data can be
stored on a label of a size acceptable to entomologists. The
disadvantages are at least two: (1) the high-density symbology will
require more expensive scanners, and (2) regardless of symbology
there is no high demand for high-density barcodes, so they are prone
to rapid obsolescence. Today's hot symbology will be tomorrow's
Code49 (which, of course, could well apply to barcodes in general).

*Code128*

INBio is planning to migrate to Code128. This is one of the most
common symbologies there is. Any scanner (including cheap ones) can
read it. The problem is that Code128 is not a high-density symbology.
INBio has found that they can fit a 10-character code on a label the
size of their current labels (22x8mm). They plan to have the full
code printed on the label in human-readable form, including the INBio
institutional prefix, but only a 10-digit number in the
machine-readable symbology. They will rely on software to add the
institutional prefix in their database.

At first I was strongly against Code128, because I think eliminating
the institutional prefix from the symbology is unacceptable.
Eliminating the prefix will preclude the barcodes being used when
specimens are loaned or donated to other institutions. The movement
of specimens among institutions is an essential component of the
taxonomic process, and all institutions involved in specimen-based
data capture must recognize all individual specimen codes, regardless
of the provenance. The barcode prefix does not indicate ownership,
but only the origin of the specimen, and all institutions should
expect to gradually accumulate specimens with diverse barcode
prefixes. There is already a publication on the insect collections of
the world, in which every collection has a unique 4-letter code. A
database of these codes is now on the Web at
http://www.bishop.hawaii.org/bishop/ento/codens-r-us.html. I
recommend that these codes be adopted as the standard institutional
codes for barcodes, and that any new collection or project select a
unique 4-letter code. For example, INBio is already in the database
as INBC.

If you include a 4-letter prefix, that leaves only 6 characters on
INBio's planned Code128 labels. This would allow only a million
unique codes within an institution. But Rob Colwell alerted me to the
fact that we did not have to use a base10 system. If we used a code
based on the 26 letters of the alphabet (similar to record locators
of airline reservations), a 6-letter code would allow over 300
million unique codes. A 5-letter code allows over 11 million, and
would result in a shorter barcode label. If we use a base36 code,
with both letters and digits, a 5-character code allows over 60 million
unique codes. A drawback of using letters and digits is the
similarity of 1's and l's, which could confound optical character
readers in the future. But these problems could be dealt with by not
using characters that look too much alike.
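
As a rough check on the arithmetic above, here is a minimal Python
sketch that computes the number of unique codes for each scheme and
encodes a specimen serial number in an alphabet with look-alike
characters removed. The particular alphabet (digits 2-9 and capital
letters without I, L, and O, 31 characters in all) and the function
names are assumptions made for illustration, not an agreed standard.

```python
# Hypothetical alphabet: digits and capital letters with the look-alikes
# 0/O and 1/I/L removed. This choice is an assumption, not a standard.
ALPHABET = "23456789ABCDEFGHJKMNPQRSTUVWXYZ"


def capacity(base, length):
    """Number of unique codes of a given length in a given base."""
    return base ** length


def encode(n, length, alphabet=ALPHABET):
    """Encode a non-negative integer as a fixed-length code."""
    base = len(alphabet)
    if not 0 <= n < base ** length:
        raise ValueError("number does not fit in the requested length")
    chars = []
    for _ in range(length):
        n, remainder = divmod(n, base)
        chars.append(alphabet[remainder])
    return "".join(reversed(chars))


def decode(code, alphabet=ALPHABET):
    """Recover the integer from a code produced by encode()."""
    base = len(alphabet)
    n = 0
    for ch in code:
        n = n * base + alphabet.index(ch)
    return n


if __name__ == "__main__":
    print(capacity(10, 6))             # six digits
    print(capacity(26, 6))             # six letters
    print(capacity(26, 5))             # five letters
    print(capacity(36, 5))             # five letters-or-digits
    print(capacity(len(ALPHABET), 5))  # five characters, look-alikes removed
    print(encode(123456, 5))
```

With this reduced 31-character alphabet, a 5-character code still
allows over 28 million unique codes, so avoiding look-alike characters
costs roughly half the capacity of full base36 at the same length.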

Manuel Zumbado, of INBio, gave me some sample Code128 labels. I tried
them on my barcode scanner and was amazed at how easily they scanned,
compared to a Code49 label. Now more than ever I am enthused about
migrating to a Code128 label.

*Optical Character Recognition*

Everyone tells me that optical character recognition will make
barcodes obsolete. This makes sense to me and emphasizes the need for
the human-readable version of the specimen code to be clear on the
label. I have no information on the availability or practicality of
this technology at present.

*Retrofitting*

INBio is contemplating an upgrade of their 3 million Code49 labels.
They propose to obtain a special printer and rolls of polyester
labels, scan sets of specimens with their old Code49 scanners, print
out new Code128 labels with the same data (omitting the prefix), and
superimpose the new label over the old one. I think this is
ill-advised.

Barcodes are not archival. The technology will always change, and we
should plan for machine readability of any symbology to be
short-lived (optical character recognition perhaps the exception?).
Brian Brown has pointed out to me that this may not be such a severe
drawback if material is promptly curated. For the large majority of
specimens, the barcode will only be read once - at the time of
identification. When the barcodes are first put on the specimens,
usually only the first and last of a series are scanned, the rest of
the specimen records being generated automatically. The moment for
scanning comes when a series of specimens is identified as a
particular species. Most specimens are of relatively few common
species, and their aggregation under a particular species is relatively
secure. Occasional misidentifications may be found, and nomenclature
may change, but the actual aggregation of specimens in the box will
not change for the majority. They will reside in the box
indefinitely, never needing to be scanned again.
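
The step in which only the first and last barcodes of a series are
scanned and the remaining records are generated automatically might
look something like the Python sketch below. The code format (a
4-letter institutional prefix followed by a zero-padded number) and the
function name expand_series are assumptions made for illustration; the
actual INBio label format and software are not described here.

```python
def expand_series(first, last, prefix_len=4):
    """Return every specimen code from first to last, inclusive.

    Assumes a hypothetical format: a prefix of prefix_len characters
    followed by a zero-padded decimal serial number.
    """
    prefix_a, serial_a = first[:prefix_len], first[prefix_len:]
    prefix_b, serial_b = last[:prefix_len], last[prefix_len:]
    if prefix_a != prefix_b:
        raise ValueError("first and last codes have different prefixes")
    if len(serial_a) != len(serial_b):
        raise ValueError("first and last codes have different lengths")
    if not (serial_a.isdigit() and serial_b.isdigit()):
        raise ValueError("serial part must be numeric")
    start, stop = int(serial_a), int(serial_b)
    if start > stop:
        raise ValueError("first code must not come after last code")
    width = len(serial_a)
    # Keep the zero padding so every generated code has the same length.
    return [f"{prefix_a}{n:0{width}d}" for n in range(start, stop + 1)]


if __name__ == "__main__":
    for code in expand_series("INBC000098", "INBC000101"):
        print(code)
```

Checking that both scanned codes share a prefix guards against a
series that accidentally spans labels from two institutions.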

If symbology upgrading is going to be necessary every time there is a
technology change, then we should question the utility of individual
specimen coding in the first place.

Administrators and informatics people stress the need for
machine-readable specimen codes for long-term specimen and data
management, but we should resist letting these accounting functions
cloud our objective, which is a greater understanding of
biodiversity. I think it would be far more productive to take
whatever resources would have gone into relabeling, and direct them
toward the immediate curation of existing material.

******************************************************
John T. Longino
Lab I, The Evergreen State College
Olympia WA 98505 USA
longinoj@evergreen.edu
Ants of Costa Rica on the Web at http://www.evergreen.edu/ants
Project ALAS at http://viceroy.eeb.uconn.edu/ALAS/ALAS.html
******************************************************

