TDWG Discussion Notes
Open Discussion: Sharing information on species interactions, phenology, identification, and checklists

18-Oct-11
Notes courtesy of Annie Simpson

John Pickering, Gerry Cassis, & Malcolm Storey
http://www.discoverlife.org/demo/20111018tdwg.html

Just celebrated our first billion hits.

New emphasis/focus is on species interactions.

Funded by the Atlas of Living Australia, working with EOL and other organizations.

3 main technologies:

  1. Photo albums
  2. Global mapper
  3. Customized identification guides

Herbivorous caterpillars occasionally aren't. How to database this type of complicated species interaction?

Developing a schema and controlled vocabulary for species interactions, with ALA

Test cases will be on Australian data, but eventually will be useful to all.

Title Energy donor is Energy recipient is

What are the biological questions we should be asking of a species interactions database? (list of 10s of questions)

DiscoverLife maps disparate data occurrence sources.

Event: where, when, who, how
Entity-1, -2, -n: name, information
Evidence: photo, specimen, etc.

Metadata will be standard Darwin Core. The interactions themselves will be a Darwin Core extension, in a "subject, verb, object, optional modifier" setup. This doesn't easily fit into a triplet.
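
As a rough illustration of the "subject, verb, object, optional modifier" setup, the sketch below models one interaction record in Python. The class name, field names, and example values are assumptions made for illustration; they are not taken from the Discover Life schema or from any Darwin Core extension. The optional fourth field is what keeps the record from being a plain triplet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interaction:
    """One hypothetical interaction 'sentence'."""
    subject: str
    verb: str
    obj: str
    modifier: Optional[str] = None  # optional detail, may be left out

    def sentence(self) -> str:
        # Join the three required parts, then append the modifier if present.
        parts = [self.subject, self.verb, self.obj]
        if self.modifier:
            parts.append(f"({self.modifier})")
        return " ".join(parts)


record = Interaction("caterpillar", "feeds on", "tomato leaf", modifier="at night")
print(record.sentence())
```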

GNEETs are Global Numbers for Event and Entity Tracking. They are persistent, resolvable, and can include PINs.

Controlled vocabulary

Each photo can generate multiple subject-verb-object relationships.

This system needs to be useful but stay as simple as possible, because many scientists don't go beyond using an Excel spreadsheet for their data.

Mothing Project Results have 46K photos so far.

Data are analyzed each night, compiled by month, and made available for use by anyone for analysis.

How do we get the data in formats that are simple to use?

What are the critical questions participants have?

Stephen W.: Can you search on just one part of the 'sentence' like 'feeds on'?

Pick: Yes. We want to figure out what is limiting, so the details (modifiers) are optional.
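
A minimal sketch of searching on just one part of the "sentence", assuming the records are held as simple key-value pairs. The field names, the `search` helper, and the sample records are hypothetical and only illustrate the idea that unspecified parts (including modifiers) are ignored.

```python
# Hypothetical sample records; "modifier" is optional and often absent.
records = [
    {"subject": "caterpillar", "verb": "feeds on", "object": "tomato leaf"},
    {"subject": "wasp", "verb": "parasitizes", "object": "caterpillar",
     "modifier": "larval stage"},
    {"subject": "cow", "verb": "feeds on", "object": "grass"},
]


def search(records, **criteria):
    """Return records matching every given field; fields not named are ignored."""
    return [r for r in records
            if all(r.get(key) == value for key, value in criteria.items())]


# Search on the verb alone, leaving subject, object, and modifier open.
feeders = search(records, verb="feeds on")
print([r["subject"] for r in feeders])
```

Adding more keyword arguments narrows the result, for example `search(records, verb="feeds on", object="grass")`.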

Amanda (?): Have you had any interactions with the NSF Phenotype Project? It applies morphological questions to semantic issues. How do we make morphological or phenological models? It is being done in the NC Research Triangle.

Stinger: Is this the textbook case of ontology-enabled semantic reasoning? Why not use semantics? What is in the black box?

Pick: A huge amount of Perl code.

Stinger: So you are not developing an ontology? Take it one more step to OWL, and you have a perfect system. Ontological relations are required for triplets.

Pick: Trying to keep the data structure as simple as possible so that dropdown boxes can be used.

Bob: You are not using semantic reasoning?

Pick: I don't think we will, because we have submitters uploading a large number of records we sort as key value pairs. You can build very sophisticated queries.

Bob: I have no problem with providing users with a simple interface. But some semantics are needed to make your data interoperable with others' work.

Stinger: If this larva is feeding on this plant and there is a parasite on the larva, you need semantic reasoning to make use of this information.

Pick: There are ontologies in each community, and they are all different and quite overwhelming. We need controlled vocabularies and a simple logic, that can be blasted out for users. The structure must be such that it is easy to ask these questions.

Stinger: I want to know where to find a parasite that feeds on a larva that eats a plant. You can output in any form you want, but you are wasting an opportunity if you are not semantically enabling.
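
The chained question above (a parasite that feeds on a larva that eats a plant) can be sketched as a two-step lookup over plain subject-verb-object tuples. This is only an illustration of the query shape under assumed data; the names, verbs, and the `parasites_of_plant_eaters` helper are hypothetical, and it is neither Pick's coded solution nor a SPARQL/OWL one.

```python
# Hypothetical interaction records as (subject, verb, object) tuples.
interactions = [
    ("larva A", "eats", "plant P"),
    ("parasite X", "feeds on", "larva A"),
    ("larva B", "eats", "plant Q"),
    ("parasite Y", "feeds on", "larva C"),
]


def parasites_of_plant_eaters(interactions, plant):
    """Find subjects that feed on anything that eats the given plant."""
    # Step 1: everything recorded as eating the plant.
    eaters = {s for s, v, o in interactions if v == "eats" and o == plant}
    # Step 2: everything recorded as feeding on one of those eaters.
    return sorted(s for s, v, o in interactions
                  if v == "feeds on" and o in eaters)


print(parasites_of_plant_eaters(interactions, "plant P"))
```

Each extra link in the chain needs another hand-written step here, which is the kind of additional coding raised later in the discussion.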

Pick: RDF isn't needed. I prefer a logical, coded solution.

Bob: You can recreate all the tools that have previously been created; there are even some in Perl. There was nothing on your list of information that isn't a SPARQL/OWL query.

Bob: There is a great book called Semantic Web for the Working Ontologist. I highly recommend it. Stinger's point is that avoiding RDF is not a common way to put reasoning on the data. Do you think they are inextricably entwined?

Pick: I've used SPARQL and found it incredibly slow. Complex procedures are not usually needed. GBIF has gone from complex to simple.

Chuck: Isn't this a limited and short set of relationships that may not need RDF? This problem has been around for more than 25 years. Maybe this situation doesn't need semantics and RDF.

Donald: We can decouple... Even the very simple relations can be reasoned out in a network of connections. These things don't have to be stored as RDF, but if we use RDF to describe the events it may be useful. Presenting a dataset in RDF form allows other systems to easily ingest the data. At this stage we should think about the semantic implications. A relation exists, it is such and such, and is that term amenable to an RDF relationship?
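
A hedged sketch of the decoupling described above: the data stay in a simple internal form and are only presented as RDF on output. The example writes one N-Triples line using string formatting from the Python standard library. The `BASE` namespace, the `to_ntriple` helper, and the terms are assumptions for illustration; a real export would use agreed vocabulary URIs.

```python
from urllib.parse import quote

# Hypothetical namespace; not a real vocabulary.
BASE = "http://example.org/interactions/"


def to_ntriple(subject, verb, obj):
    """Format one subject-verb-object record as a single N-Triples line."""
    # Percent-encode each term so spaces are legal inside the IRI.
    s, p, o = (f"<{BASE}{quote(term)}>" for term in (subject, verb, obj))
    return f"{s} {p} {o} ."


line = to_ntriple("caterpillar", "feeds on", "tomato leaf")
print(line)
```

An optional modifier has no slot in this form, which is one reason quads or reification come up when a fourth element is needed.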

Pick: I'm going to structure this in the most efficient way so that the relationships can be logically derived.

Donald: in everything that TDWG is doing, we need to decouple the relationships from what we want the data to look like. But we can take the extra step to prepare them for semantic access and data mining, and the benefits are potentially large.

Stephen Ginsburg: if you try to reduce everything to subject verb object, this will be powerful enough to answer any question you want.

Pick: you can search/filter on

Bob: Sentence one of every discussion of rdf is this situation. Why won't RDF work?

Pick: it will work, I am simply seeking a simple and fast solution. RDF often is terribly slow.

Donald: you've already taken a lot of the semantic complexity out by preparing your data to support a simpler solution.

[Jim Croft was a listener.]

Several slides of a tomato leaf that might be eaten by different things.

I'm interested in all the relationships, and space/time associations can be involved. What do cows eat when larvae are present?

Bob: RDF is being used for complex cases of reasoning, and you saw it as slow. If your questions are simple, and you say they are, RDF will be blindingly fast.

Donald: it's not that you can't answer the questions with your solution, it's that they will require additional coding.

Bob: if you solve your questions with code, the larger solutions will require additional coding. And may have issues in scale.

Pick: When I went from html to text, things scream.

I'm not saying RDF can't solve it, I'm saying my logical solution will be faster.

Jim Croft: it still looks like you are a long way away from a solution. When will it be ready?

Pick: It will have to be ready before next June.

Donald: the useful discussion centers around an agreement on the fields/format that will be used for the data manipulation. They have to be "inference ready". When we talk about vectoring, do we model pairs or...there are three pairs. Or more. Biting, infection, all kinds of modifiers are there and need to be documented. Pollination is a vectoring event, too. It happens to be beneficial. So is feeding.

Bob: If it is runtime instances of classes, you can declare that "feeds young" is an instance. He continues to deny this, but my objection is that broadly defined scale of all kinds, he will have to code.

Donald: We need to ID which bits of the data are structured enough to be reasoned and be useful. Fruits of different colors from different latitudes and the ability to chain the data is potentially very useful.

Bob: the next thing you are going to discover means you need a quad and reifications...

Jim: if someone maps several hundred K of interactions, we only have a tangle of lines. We are now trying to simplify and tease apart and visualize the mess. Jim.croft@gmail.com (Canberra Herbarium) I'm managing the taxonomy research network, linking APNI to the national species list. Have been talking to Gerry.

Pick: APNI is very well designed. Gerry can't even get his own information back out of ___ on Homoptera.

Jim: Fauna directory is based on expert opinion (unfortunately). I have no problem if you scrape content from APNI, just give us acknowledgement. If there are published taxonomic revisions, we normally get them updated in the system within a week. We have a complex agreement for images, but we do encourage deep linking. We would like the name to travel with the image, however, because when taxonomic updates are done, they won't be reflected in a copied image, but the deep link will take you back to the right name. We just don't want our images to be dissociated from their metadata.