Eco-informatics, NSF SGER proposal, 2006

Eco-informatics:
scalable indexing and retrieval
of biodiversity information across
NSF-sponsored studies

Small Grant for Exploratory Research
proposal to
The National Science Foundation
from
The Polistes Foundation

Dan Kjar
Georgetown University
Washington, D. C.
&
John Pickering
University of Georgia
Athens

February, 2006


Hibiscus coccineus, Scarlet Rose Mallow
Illustration by Cheryl Reese, 2004

Updated: 11 July, 2006


Project Summary

Objectives
Discover Life and its scientific partners are building an interactive encyclopedia of life for web users to better study, monitor, manage, and enjoy biodiversity. The goal of this encyclopedia is to provide free, easy-to-use access to high-quality images, identification guides, real-time high-resolution maps, and other valuable information for a million species by 2012. Unfortunately, the integration of available biodiversity information is in its infancy. Incompatible data formats, changing taxonomies, and poor scalability hinder ongoing efforts to consolidate information. If funded, this project will test a new data model and software that could solve these problems and catalyze rapid advances in the emerging field of eco-informatics.

Methods
Discover Life's new technology serves dynamic information from contributing databases and websites regardless of their underlying software and data formats. To serve up-to-date composite webpages, it continuously updates a cascade of indexes reflecting changes in source information. If funded, this project will test whether the technology scales efficiently by attempting to rapidly incorporate a diversity of NSF databases containing millions of specimen records, images, and metadata into the larger framework of the encyclopedia of life.
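
As a rough illustration of the cascade of indexes described above, the following Python sketch propagates a changed source record through a specimen-level index into a species-level index. It is not Discover Life's implementation: the class name `CascadeIndex`, its methods, the record identifiers, and the sample identifications are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class CascadeIndex:
    """Hypothetical two-level cascade: specimen records feed a species index."""

    def __init__(self):
        self.specimens = {}                 # record id -> species name
        self.by_species = defaultdict(set)  # species name -> record ids

    def update(self, record_id, species):
        """Apply one change from a source and refresh the dependent index."""
        old = self.specimens.get(record_id)
        if old is not None and old != species:
            # The source changed the identification: drop the stale entry.
            self.by_species[old].discard(record_id)
            if not self.by_species[old]:
                del self.by_species[old]
        self.specimens[record_id] = species
        self.by_species[species].add(record_id)

    def species_page(self, species):
        """Return the sorted record ids that a composite page would list."""
        return sorted(self.by_species.get(species, ()))


# Made-up records, used only to exercise the sketch.
index = CascadeIndex()
index.update("rec-1", "Hibiscus coccineus")
index.update("rec-2", "Hibiscus coccineus")
index.update("rec-2", "Hibiscus moscheutos")  # identification revised at the source
```

In this sketch, only the entries touched by a change are rewritten, so a revised identification at one source does not force a rebuild of the whole species index.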

The technology uses translators and integration routines to merge complex, non-standardized data formats for analysis and display. It integrates specimen-level records and images into species pages, maps, and identification tools. Discover Life currently serves information on over 200,000 species from diverse sources, including databases at the American Museum of Natural History, California Academy of Sciences, Field Museum, Kansas Natural History Museum, Los Angeles County Museum, Missouri Botanical Garden, Museum of Comparative Zoology, and Smithsonian Institution. This proposal requests support for a post-doctoral fellow and student interns who will test the system at a much larger scale by attempting to integrate taxonomic and specimen data from all NSF projects that wish to participate.
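
The translator-and-integration idea in the paragraph above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the project's software: the two source formats, the function names (`translate_csv_row`, `translate_dict`, `build_species_pages`), the field names, and the coordinates are hypothetical.

```python
def translate_csv_row(row):
    """Hypothetical translator for a source that exports comma-separated rows."""
    name, lat, lon = row.split(",")
    return {"species": name.strip(), "lat": float(lat), "lon": float(lon)}


def translate_dict(record):
    """Hypothetical translator for a source that uses its own field names."""
    return {
        "species": f"{record['genus']} {record['epithet']}",
        "lat": record["latitude"],
        "lon": record["longitude"],
    }


def build_species_pages(sources):
    """Merge translated records from every source into per-species point lists."""
    pages = {}
    for translator, records in sources:
        for raw in records:
            rec = translator(raw)  # convert to the shared record shape
            pages.setdefault(rec["species"], []).append((rec["lat"], rec["lon"]))
    return pages


# Made-up specimen records in two different formats.
sources = [
    (translate_csv_row, ["Solenopsis invicta, 33.95, -83.37"]),
    (translate_dict, [{"genus": "Solenopsis", "epithet": "invicta",
                       "latitude": 30.4, "longitude": -84.3}]),
]
pages = build_species_pages(sources)
```

Each source keeps its own format and needs only one small translator. The integration routine works on the shared record shape, so adding another database does not change the code that builds species pages or maps.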

Intellectual merit
Ant taxonomists can currently compare images of type specimens from a limited number of museum collections side-by-side within a single Discover Life webpage. Likewise, biogeographers can overlay insect distributions from the American Museum of Natural History's Planetary Biodiversity Inventory database on maps of potential host plants generated from participating herbarium databases without having to navigate each database individually. If the data model and software prove scalable, scientists and the public alike will have a system to simultaneously retrieve and analyze biodiversity information from a potentially unlimited number of databases and webpages.

Broader impacts
Society benefits in many ways from the integration of high-quality biodiversity information. This project will facilitate inquiry-based scientific learning at all levels. For example, the integration of information from the California Academy of Sciences, Museum of Comparative Zoology, and Smithsonian Institution has already enabled students at Cedar Shoals High School in Georgia to study the impact of invasive fire ants on local ant diversity. Citizen scientists will have better access to information needed to study and monitor nature. Land managers will gain a single source to query and analyze information across taxa. Policy makers will have a set of tools to retrieve data and compare trends from across NSF and other databases. Currently, Discover Life serves 2-3 million pages and images to over 60,000 IP addresses each month.

