barcodes on insect specimens
Date: Wed, 7 Jul 1999 16:09:24 -0800
To: "Ugalde, Jesus" <jugalde@inbio.ac.cr>, "Brown, Brian" <brianb@almaak.usc.edu>, "Kaspari, Mike" <mkaspari@ou.edu>, "Pickering, John" <pick@pick.uga.edu>, "Furth, Dave" <Furth.David@NMNH.SI.EDU>, "Naskrecki, Piotr" <pin93001@uconnvm.uconn.edu>
From: "John T. Longino" <longinoj@evergreen.edu>
Subject: barcodes on insect specimens
Cc: "Colwell, Rob" <colwell@uconnvm.uconn.edu>

Below is a discussion of barcodes for insect specimens, and a call to action for the museum community. I have sent it to my circle of barcode-using colleagues (Jesus, Brian, Mike, Pick, Piotr), Dave Furth (as collection manager for Smithsonian; also could you send this on to Chris Thompson; I don't have his email), and my colleague in ALAS, Rob Colwell. Could you all suggest other interested parties I might contact (with emails)? Thanks.

BARCODES FOR INSECT SPECIMENS

*The Problem*

Over the past decade, a number of institutions and projects began using barcodes to label individual insect specimens, providing unique identifiers for specimens (Thompson, F. C. 1994. Bar codes for specimen data management. Insect Collection News 9:2-4.). The hope was that this would result in more efficient data capture and specimen management. Barcodes come in different flavors called symbologies, and scanner hardware must be programmed to recognize particular symbologies. INBio chose Code49, a proprietary code produced by Intermec, and other institutions and projects have followed suit. The ones I know of are the LACM, John Pickering's lab, the ALAS project, and Mike Kaspari's lab.

Barcodes on insect specimens had to be small, and Code49 was one of the first high-density barcodes, which would allow a sufficiently large specimen code on a sufficiently small label. Reading high-density barcodes requires specialized, expensive scanners, and it has always been a problem acquiring them. But a larger problem is now evident: Code49 never obtained a large market, and is now extinct.
The existing scanning hardware cannot be repaired or replaced. Intermec technical representatives have confirmed this. INBio is now taking steps to migrate to a new symbology (discussed more below). I suggest that the museum community consider this problem jointly, rather than each institution acting independently.

*Common Standards*

If the barcode-using entomologists consider this together, we can avoid a lot of wheel reinventing and a chaos of competing symbologies and inter-institutional incompatibility. Ideally a common symbology could be adopted, which would facilitate the exchange of specimens. I can envision a Web site that would be an information source about symbology, sources of hardware and software, lists of institutions or projects using barcodes, label formats or institutional codes being used, etc. If this is already being done please let me know so I can be in the loop.

I have done a bit of research and suggest two symbology options below (discussing their pros and cons). If any of you have additional suggestions or information, please let me know.

*MicroPDF417*

This is a high-density symbology developed by Symbol Technologies (www.symbol.com). I explained to them that I needed a small barcode label that would hold the data content of a current INBio barcode (17 characters). They sent a sample in the form of a Word file with an embedded graphic of a barcode that would store 22 characters. They said that if I printed it on a 600 dpi laser printer it would scan with no problem. I printed it and got a very clean-looking barcode that was 11x5mm (INBio's current labels are 22x8mm). I do not have a scanner to test it, however.

Symbol Technologies claims that MicroPDF417 is now the most popular small barcode, with 80-90% of the market. Their Web site lists some of the current users of this symbology. The list is not large, but one that comes to mind is driver's licenses in the Philippines.
It is a public domain symbology, and apparently many companies are making scanners that can read it.

The advantage of this symbology is that more than enough data can be stored on a label of a size acceptable to entomologists. The disadvantages are at least two: (1) the high-density symbology will require more expensive scanners, and (2) regardless of symbology there is no high demand for high-density barcodes, so they are prone to rapid obsolescence. Today's hot symbology will be tomorrow's Code49 (which, of course, could well apply to barcodes in general).

*Code128*

INBio is planning to migrate to Code128. This is one of the most common symbologies there is. Any scanner (including cheap ones) can read it. The problem is that Code128 is not a high-density symbology. INBio has found that they can fit a 10-character code on a label the size of their current labels (22x8mm). They plan to have the full code printed on the label in human-readable form, including the INBio institutional prefix, but only a 10-digit number in the machine-readable symbology. They will rely on software to add the institutional prefix in their database.

At first I was strongly against Code128, because I think eliminating the institutional prefix from the symbology is unacceptable. Eliminating the prefix will preclude the barcodes being used when specimens are loaned or donated to other institutions. The movement of specimens among institutions is an essential component of the taxonomic process, and all institutions involved in specimen-based data capture must recognize all individual specimen codes, regardless of provenance. The barcode prefix does not indicate ownership, but only the origin of the specimen, and all institutions should expect to gradually accumulate specimens with diverse barcode prefixes.

There is already a publication on the insect collections of the world, in which every collection has a unique 4-letter code.
A database of these codes is now on the Web at http://www.bishop.hawaii.org/bishop/ento/codens-r-us.html. I recommend that these codes be adopted as the standard institutional codes for barcodes, and that any new collection or project select a unique 4-letter code. For example, INBio is already in the database as INBC.

If you include a 4-letter prefix, that leaves only 6 characters on INBio's planned Code128 labels. With digits alone, this would allow only a million unique codes within an institution. But Rob Colwell alerted me to the fact that we do not have to use a base-10 system. If we used a code based on the 26 letters of the alphabet (similar to the record locators of airline reservations), a 6-letter code would allow over 300 million unique codes. A 5-letter code allows over 11 million, and would result in a shorter barcode label. If we use a base-36 code, with both letters and digits, a 5-character code allows over 60 million unique codes. A drawback of using letters and digits is the similarity of 1's and l's, which could confound optical character readers in the future. But these problems could be dealt with by not using characters that look too much alike.

Manuel Zumbado, of INBio, gave me some sample Code128 labels. I tried them on my barcode scanner and was amazed at how easily they scanned, compared to a Code49 label. Now more than ever I am enthused about migrating to a Code128 label.

*Optical Character Recognition*

Everyone tells me that optical character recognition will make barcodes obsolete. This makes sense to me and emphasizes the need for the human-readable version of the specimen code to be clear on the label. I have no information on the availability or practicality of this technology at present.

*Retrofitting*

INBio is contemplating an upgrade of their 3 million Code49 labels.
They propose to obtain a special printer and rolls of polyester labels, scan sets of specimens with their old Code49 scanners, print out new Code128 labels with the same data (omitting the prefix), and superimpose the new label over the old one. I think this is ill-advised.

Barcodes are not archival. The technology will always change, and we should plan for machine readability of any symbology to be short-lived (optical character recognition perhaps the exception?). Brian Brown has pointed out to me that this may not be such a severe drawback if material is promptly curated. For the large majority of specimens, the barcode will only be read once - at the time of identification. When the barcodes are first put on the specimens, usually only the first and last of a series are scanned, the rest of the specimen records being generated automatically. The moment for scanning comes when a series of specimens is identified as a particular species. Most specimens are of relatively few common species, and their aggregation under a particular species is relatively secure. Occasional misidentifications may be found, and nomenclature may change, but the actual aggregation of specimens in the box will not change for the majority. They will reside in the box indefinitely, never needing to be scanned again.

If symbology upgrading is going to be necessary every time there is a technology change, then we should question the utility of individual specimen coding in the first place. Administrators and informatics people stress the need for machine-readable specimen codes for long-term specimen and data management, but we should resist letting these accounting functions cloud our objective, which is a greater understanding of biodiversity. I think it would be far more productive to take whatever resources would have gone into relabeling, and direct them toward the immediate curation of existing material.

******************************************************
John T. Longino
Lab I, The Evergreen State College
Olympia WA 98505 USA
longinoj@evergreen.edu
Ants of Costa Rica on the Web at http://www.evergreen.edu/ants
Project ALAS at http://viceroy.eeb.uconn.edu/ALAS/ALAS.html
******************************************************
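
The code-space arithmetic in the Code128 discussion (a 4-letter institutional prefix followed by a short serial in base 10, base 26, or base 36) can be checked with a short Python sketch. Everything named here is an assumption for illustration: the function names, the choice of uppercase characters, and in particular the set of look-alike characters that is dropped. The message only says that confusable characters such as 1 and l could be avoided; it does not say which ones.

```python
import string

# Candidate alphabets for the serial part of a specimen code.
DIGITS = string.digits                           # base 10
LETTERS = string.ascii_uppercase                 # base 26
ALNUM = string.digits + string.ascii_uppercase   # base 36

# Assumption: drop characters that are easy to confuse (0/O, 1/I/L).
# The exact exclusion set is hypothetical, not taken from the message.
SAFE = "".join(c for c in ALNUM if c not in "0O1IL")


def capacity(alphabet: str, length: int) -> int:
    """Number of distinct serials of a given length over an alphabet."""
    return len(alphabet) ** length


def encode(n: int, alphabet: str, length: int) -> str:
    """Write the integer n as a fixed-length serial over the alphabet."""
    if not 0 <= n < capacity(alphabet, length):
        raise ValueError("serial number does not fit in this code length")
    base = len(alphabet)
    chars = []
    for _ in range(length):
        n, remainder = divmod(n, base)
        chars.append(alphabet[remainder])
    return "".join(reversed(chars))


def specimen_code(prefix: str, n: int, alphabet: str = SAFE, length: int = 6) -> str:
    """Join an institutional prefix (e.g. INBC) with an encoded serial."""
    return prefix + encode(n, alphabet, length)


if __name__ == "__main__":
    print(capacity(DIGITS, 6))    # 1000000
    print(capacity(LETTERS, 6))   # 308915776
    print(capacity(LETTERS, 5))   # 11881376
    print(capacity(ALNUM, 5))     # 60466176
    print(specimen_code("INBC", 0))
```

The four capacities reproduce the figures quoted above: one million, over 300 million, over 11 million, and over 60 million. One caveat that would need testing on real labels: Code128 packs pairs of digits more densely than letters, so a 10-character alphanumeric code may print wider than a 10-digit one.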