Discover Life

Technology and other features

John Pickering
University of Georgia, Athens

Original version for the
Global Invasive Species Information Network
workshop, Baltimore, Maryland, U.S.A.
6-8 April, 2004

Last updated: 7 November, 2005

Fallopia japonica, Japanese Knotweed

Photograph by John Pickering, 24 September, 2003,
Harz Forest, Lower Saxony, Germany.


Title: "Discover Life -- translating across standards"

Abstract: Discover Life (http://www.discoverlife.org) provides Web tools to assemble, process, and share text and images on invasive and other species. These tools help users to identify, report, database, and map information using most browsers. They are designed to integrate data from numerous sources, require minimal training or technical knowledge to use, and are available to everyone with Web access for free. The philosophy behind their success is that it is much easier to build translators to share data than it is for everyone to adopt one or more new data standards. With the exception of TCP/IP and other low-level protocols, it is unlikely that new standards will meet all the specific needs of the global invasive species community. Furthermore, because of the high costs of moving from legacy systems, rewriting functions, and retraining personnel, it may be more efficient to integrate existing systems and data structures than to adopt new ones. Discover Life's tools allow users to import data using Web forms and software packages that export HTML, XML, RFC822 headers, or flat text files. They support Excel, Access, and SQL databases. The presentation will demonstrate how to add, share, and query data with (1) Discover Life's IDnature guides and checklists, (2) its Global Mapper developed in partnership with Topozone.com, (3) a reporting system that supports customized species data entry forms, and (4) a Web-based data manager that uses globally unique identifiers to label specimens and track records. For details please see http://www.discoverlife.org/pa/or/polistes/fe.


IDnature guides
20q software lets folks quickly build checklists and guides to groups of taxa. There are two easy ways of getting data into this format. One can either import them from a text file or one can enter them using Web forms. Both processes are simple. Paul Fine at Chicago's Field Museum did this recently and now has working checklists/guides on the Web. For example, see
http://pick4.pick.uga.edu/mp/20q?guide=Burseraceae for his guide and http://pick4.pick.uga.edu/mp/20q?act=x_checklist&list=Burseraceae for his checklist. He started these on 29 January. For all practical purposes, they are done. Note that this checklist links to species pages that include photos and maps of each species (see below). If folks have data in flattened ASCII text files, we can import their data rapidly. Robert Luecking at Chicago's Field Museum built a guide to 154 species of Graphis fungi in an hour or so, starting with his Excel spreadsheet of the species and their character-states. In total, there are over 50 guides at some stage of development. We can import data from Lucid and DELTA guide formats among others. We can export in XML format.
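
To illustrate how a spreadsheet of species and their character-states can drive a guide, here is a minimal Python sketch of a generic multi-access key. It is not 20q's actual code; the function name, the taxa, the characters, and the states are all hypothetical placeholders.

```python
def matching_taxa(matrix, observed):
    """Return the taxa whose recorded states agree with every observed state.

    matrix:   {taxon: {character: state}}, as exported from a spreadsheet
    observed: {character: state} chosen by the user so far
    """
    return [
        taxon
        for taxon, states in matrix.items()
        if all(states.get(character) == state
               for character, state in observed.items())
    ]


# Hypothetical character-state matrix, for illustration only.
matrix = {
    "Species A": {"leaf shape": "ovate", "flower color": "white"},
    "Species B": {"leaf shape": "ovate", "flower color": "yellow"},
    "Species C": {"leaf shape": "linear", "flower color": "white"},
}

# Each answer narrows the list of candidate taxa.
candidates = matching_taxa(matrix, {"leaf shape": "ovate"})
print(candidates)
```

Each additional character the user answers shortens the candidate list, which is why a flat table of species by character-states is enough to start a guide.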

Photos
20q includes tools to process images and add them to guides and species pages, which are accessed through the checklists. Again this is a fairly straightforward process. Step 1 is to ftp images to Discover Life. Step 2 involves Discover Life assigning unique identifiers to each image and preprocessing them en masse. Step 3 involves the guide/page builder processing the images and linking them to the guides/pages. Step 3 is simple and done entirely through Web forms. Ten minutes of training; no Photoshop experience necessary.

Maps
Our Global Mapper overlays spatial data from multiple sources onto Topozone.com's maps and aerial photos. It is an ideal tool to map specimen-level data on the Web. Points on the map link back to specimen-level records. For example, see the "Demo" at http://pick4.pick.uga.edu/mp/20m which maps data from 7 databases. Users can build custom maps for one or more species. For example, see http://pick4.pick.uga.edu/mp/20m?w=720&h=360&r=20.32&e=304477&n=4702700&z=19&kind=Acer+platanoides,Celastrus+orbicultus that maps data on two invasive species from Concord's Public Works Department's database. Points on these maps link back to the databases and their individual records (see Labels below). Getting data into a format for the Global Mapper is straightforward. The mapper has multiple import/interface features. The simplest is to use a flattened ASCII text file exported from a database/spreadsheet. The format of this file is the same as for importing data into guides and checklists.

  1. It has a header line that contains fields explaining the structure of the following lines.
  2. Each record is on one line and has fields separated by tabs. In effect, it's a text export file from a spreadsheet like Excel. You can put this file on your Website, ftp it to me, or send it as an email attachment.
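
The file format described in the two points above can be read with a few lines of Python. This is a sketch only: the column names and coordinate values are hypothetical, since the actual field names the mapper expects are not listed here.

```python
import csv
import io


def read_tab_delimited(text):
    """Parse a tab-delimited export: a header line, then one record per line."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


# Hypothetical header fields and values, for illustration only.
sample = (
    "name\tlatitude\tlongitude\n"
    "Acer platanoides\t42.46\t-71.35\n"
    "Fallopia japonica\t51.80\t10.60\n"
)

records = read_tab_delimited(sample)
for record in records:
    print(record["name"], record["latitude"], record["longitude"])
```

Because the header line names the fields, the same reader works whatever columns a given spreadsheet exports.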

The Global Mapper includes a gazetteer with over 7 million georeferenced places. It easily converts data between latitude-longitude (decimal degrees or degrees-minutes-seconds) and UTM coordinates. It uses the WGS84/NAD83 datum and can convert data from NAD27. Our partner Topozone.com makes global maps available to the mapper at 1:1,000,000 scale, topographical maps to 1:24,000 scale for the United States, and aerial photographs at 1 pixel per square meter resolution for 89% of the United States, in total over 20 TB of data.
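
As a small worked example of the simplest of these conversions, the Python sketch below converts between degrees-minutes-seconds and decimal degrees. It does not attempt the UTM or NAD27 datum conversions, which need a projection library, and the function names are illustrative rather than part of the Global Mapper.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees-minutes-seconds to signed decimal degrees."""
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    value = degrees + minutes / 60 + seconds / 3600
    # South and west are negative by convention.
    return -value if hemisphere in ("S", "W") else value


def decimal_to_dms(value, is_latitude):
    """Convert signed decimal degrees to (degrees, minutes, seconds, hemisphere)."""
    if is_latitude:
        hemisphere = "N" if value >= 0 else "S"
    else:
        hemisphere = "E" if value >= 0 else "W"
    total_seconds = round(abs(value) * 3600, 6)
    degrees = int(total_seconds // 3600)
    remainder = total_seconds - degrees * 3600
    minutes = int(remainder // 60)
    seconds = remainder - minutes * 60
    return degrees, minutes, seconds, hemisphere


print(dms_to_decimal(33, 30, 0, "N"))            # 33.5
print(decimal_to_dms(-83.5, is_latitude=False))  # (83, 30, 0.0, 'W')
```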

Text
Species pages associated with checklists, guides, and maps can be anywhere on the Web. If contributors don't have access to a server, Discover Life is happy to host their text and images. For an example of a recently added page, see http://www.discoverlife.org/nh/tx/Plantae/Bryophyta by Matt von Konrat at the Field Museum. At the simplest level, doing this is just a matter of emailing us some text and images. The technical support at Discover Life will get it onto our servers and into HTML pages on the Web. 20q also enables its builders to combine pages and images from across the Web into a single page that is built by our proxy servers and served in real-time without caching. This is better than using Google to retrieve information. It allows taxonomic experts to select and present the very best pages available for each species. For example, see the species in the checklist of North American Mammals: http://pick4.pick.uga.edu/mp/20q?act=x_checklist&list=Mammalia which include information from the Smithsonian Institution and elsewhere.

Links
By putting links similar to those above on your Website, you can use Discover Life's tools as a Web service that is embedded into pages on your Website.
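
As a sketch of how such links might be generated, the Python below rebuilds the URL patterns quoted in the sections above. It uses only the parameters visible in those example URLs (guide, act, list, and kind). The helper names are hypothetical, and a map link that passes kind alone, without the other map parameters, is an assumption.

```python
from html import escape
from urllib.parse import urlencode

BASE = "http://pick4.pick.uga.edu/mp"


def guide_url(guide_name):
    """URL for an IDnature guide, following the 20q?guide=... pattern."""
    return f"{BASE}/20q?" + urlencode({"guide": guide_name})


def checklist_url(list_name):
    """URL for a checklist, following the 20q?act=x_checklist&list=... pattern."""
    return f"{BASE}/20q?" + urlencode({"act": "x_checklist", "list": list_name})


def map_url(species_names):
    """URL for a map of one or more species, joined with commas in 'kind'."""
    # safe="," keeps the comma separators; spaces become '+'.
    return f"{BASE}/20m?" + urlencode({"kind": ",".join(species_names)}, safe=",")


def link_html(url, text):
    """An HTML anchor suitable for pasting into a page."""
    return f'<a href="{escape(url)}">{escape(text)}</a>'


print(checklist_url("Mammalia"))
print(link_html(guide_url("Burseraceae"), "Guide to Burseraceae"))
print(map_url(["Acer platanoides", "Fallopia japonica"]))
```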

Reporting
To enable Web users to study and monitor species, Discover Life has a Web reporting system. Its generic version is built into the identification guides. It can also be customized to display detailed Web HTML forms. For example, http://pick4.pick.uga.edu/mp/20m/act=report&Adelgis+tsugae is designed to collect specialized information on the invasive Hemlock Woolly Adelgid. Data from reports are stored in a simple text format using RFC822 headers and can be mapped and disseminated. The reporting system is integrated with the other tools. For example, it can use the identification guides, global mapper, and gazetteer to help users identify things and map where they found them.
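
For readers unfamiliar with RFC822 headers, a record in that style is a series of "Name: value" lines. The Python sketch below writes and reads one with the standard email package. The field names and values are hypothetical and are not Discover Life's actual report fields.

```python
from email.message import Message
from email.parser import HeaderParser

# Build a report as RFC822-style headers. Field names are illustrative only.
report = Message()
report["Species"] = "Adelges tsugae"
report["Observer"] = "Example Observer"
report["Latitude"] = "33.95"
report["Longitude"] = "-83.38"

text = report.as_string()
print(text)

# Read the same text back into a plain dictionary of fields.
parsed = HeaderParser().parsestr(text)
fields = dict(parsed.items())
print(fields["Species"], fields["Latitude"], fields["Longitude"])
```

A format like this stays readable in any text editor and is easy to translate into other formats.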

Label, collection, and specimen level databases
Collection and specimen-level information can be managed and retrieved using a browser. This information is integrated into the maps. It can be used to print labels with unique identifiers and machine-readable symbols. See http://pick1.pick.uga.edu/mp/20l. This information is password protected and can only be changed by authorized users who maintain the data. However, most records can be viewed without a password. This technology allows for specimen tracking and speeds determination and curation of uniquely labeled material.

Indexing & searching
A spider for searching and indexing biological information on the Web is being developed. The goal of this spider is to index 100,000 Websites and databases for information on 1,000,000 described species before 2005. The beginnings of this tool can be viewed at http://www.discoverlife.org/search_box.html. The source code of this page explains how to customize the search box for use on other sites so that it navigates users back to any Webpage.

Help
Discover Life's site map and more information on individual features are at http://www.discoverlife.org/help.html. Extensive help for IDnature guides and the Global Mapper can be found at http://www.discoverlife.org/nh/id/20q/20q_help.html.

Data schema
The XML schema for the IDnature guides is at http://www.discoverlife.org/ed/tg/Building_Web_Pages/20q_xml_tags.html.

Reliability
Discover Life runs on Sun computers with OS8 at its primary sites at Missouri Botanical Garden and the University of Georgia, Athens. These machines have a total of 16 processors and over 600 MB of storage. By being at two physical sites, they are extremely reliable. In October, 2003, Discover Life served over 825,000 pages and images to approximately 30,000 different IP addresses. Discover Life also uses a Linux server provided by the South African Agricultural Research Council in Pretoria.

Security
Considerable effort has been expended to make the site secure. 20q uses randomly generated pin numbers and email accounts, for example, to secure transactions in building pages and guides. File backup is done nightly.
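
As an illustration of the randomly generated PIN idea, here is a minimal Python sketch. It is not Discover Life's implementation; the function names and the six-digit length are assumptions.

```python
import hmac
import secrets
import string


def generate_pin(length=6):
    """Return a random numeric PIN drawn from a cryptographically secure source."""
    return "".join(secrets.choice(string.digits) for _ in range(length))


def pin_matches(expected, supplied):
    """Compare PINs in constant time to avoid leaking information via timing."""
    return hmac.compare_digest(expected.encode("utf-8"), supplied.encode("utf-8"))


pin = generate_pin()
print(len(pin), pin.isdigit())
print(pin_matches(pin, pin))
```

In a scheme like the one described, the PIN would typically be sent to the builder's email account and then checked when a change is submitted.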

Compatibility
Discover Life's Web pages and technology have been designed to be 100% functional with most browsers and do not require plug-ins and applets, such as Flash and Shockwave. They work with Internet Explorer (versions 4, 5, and 6), Netscape (version 4.76 and all recent versions except 4.79), and Mozilla. Discover Life uses server-side technology and interacts with clients via standard HTTP and HTML. For maximum compatibility on all machines, it requires no programs other than a generic browser on client machines. It does not use JavaScript. It does not require users to have Adobe Reader, Word, Excel, or other programs to access its information.
