out of date content - available for reference purposes only

How IGB loads data from a DAS server.

This is the current procedure for loading data from DAS servers. It only
applies to the two DAS servers below, and it no longer works for Ensembl:

UCSC = http://genome.cse.ucsc.edu/cgi-bin/das/dsn
Ensembl = http://www.ensembl.org/das/dsn

  • initialize server:
    for each DSN element {
        for each "SOURCE" element {
            sourceid = get "id" attribute from SOURCE element
        }
        master = first "MAPMASTER" element of DSN element
        masterURL = get text from master element
        version key = end (after last /) of masterURL
        one source object (=version) per version key
        add sourceid to source object
    }
    this creates a collection of source objects (versions)
  • select species/version
    issue a types query = server name (minus the "/dsn" suffix) + "/" + <versionid> + "/types" : http://genome.cse.ucsc.edu/cgi-bin/das/<versionid>/types
    for each TYPE element {
        typeid = get "id" attribute from TYPE element
        featureURL = server name (minus the "/dsn" suffix) + "/" + <versionid> + "/features?type=" + <typeid>
        add features map entry, key = typeid, value = featureURL
    }
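
The two steps above can be sketched in Python roughly as follows. This is an illustration only, not IGB's actual code: the function names and the sample XML are made up for the example, and the element layout (DASDSN/DSN/SOURCE/MAPMASTER and DASTYPES/TYPE) is assumed from the DAS/1 dsn and types responses. No network access is done; the XML is passed in as text.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of a "dsn" response; a real server lists many more entries.
SAMPLE_DSN = """<DASDSN>
  <DSN>
    <SOURCE id="hg18">hg18</SOURCE>
    <MAPMASTER>http://genome.cse.ucsc.edu:80/cgi-bin/das/hg18</MAPMASTER>
  </DSN>
  <DSN>
    <SOURCE id="mm9">mm9</SOURCE>
    <MAPMASTER>http://genome.cse.ucsc.edu:80/cgi-bin/das/mm9</MAPMASTER>
  </DSN>
</DASDSN>"""

# Hypothetical sample of a "types" response for one version.
SAMPLE_TYPES = """<DASTYPES>
  <GFF version="1.0" href="http://genome.cse.ucsc.edu/cgi-bin/das/hg18/types">
    <SEGMENT>
      <TYPE id="refGene"/>
      <TYPE id="knownGene"/>
    </SEGMENT>
  </GFF>
</DASTYPES>"""


def init_versions(dsn_xml):
    """Group SOURCE ids by version key (text after the last "/" of MAPMASTER)."""
    versions = {}
    for dsn in ET.fromstring(dsn_xml).iter("DSN"):
        master = dsn.find("MAPMASTER")  # first MAPMASTER child of this DSN
        master_url = (master.text or "").strip() if master is not None else ""
        if not master_url:
            continue  # assumption: skip entries without a usable MAPMASTER
        version_key = master_url.rstrip("/").rsplit("/", 1)[-1]
        source_ids = versions.setdefault(version_key, [])
        for source in dsn.iter("SOURCE"):
            if source.get("id"):
                source_ids.append(source.get("id"))
    return versions


def das_base(dsn_url):
    """Return the server URL with its trailing "/dsn" removed."""
    url = dsn_url.rstrip("/")
    return url[: -len("/dsn")] if url.endswith("/dsn") else url


def types_url(dsn_url, version_id):
    """Build the types query for one version."""
    return f"{das_base(dsn_url)}/{version_id}/types"


def feature_urls(dsn_url, version_id, types_xml):
    """Map each TYPE id to its features query URL."""
    base = das_base(dsn_url)
    return {
        t.get("id"): f"{base}/{version_id}/features?type={t.get('id')}"
        for t in ET.fromstring(types_xml).iter("TYPE")
        if t.get("id")
    }
```

For the UCSC server, types_url("http://genome.cse.ucsc.edu/cgi-bin/das/dsn", "hg18") gives the same shape of URL as the types query shown in the list above.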

Proposal for using the DAS registry to find data.

It is possible to use the DAS registry to find data on DAS servers. The
problem is that each DAS server can be set up differently: servers follow
different rules or different DAS versions, some don't follow all of the
protocol rules, and some are not maintained.
DAS is a protocol, and it has several versions. The spec for the
latest version (1.6) is at: http://www.biodas.org/documents/spec-1.6.html
The registry itself is at: http://www.dasregistry.org
For example, suppose we want to find servers for H_sapiens_Feb_2009. The NCBI taxonomy id
for Homo sapiens is 9606 and the NCBI version is 37, so we can use the query:
http://www.dasregistry.org/das/sources?organism=9606&version=37&capability=features
and parse the result. The result is an XML document with SOURCES as the root element,
containing several SOURCE elements. Each SOURCE element has one VERSION element, which contains
some CAPABILITY elements. There must be at least one CAPABILITY element with a type attribute
of "das1:features" (the restriction in the query). There may also be:

  • a "das1:entry_points" CAPABILITY element (these will be the available chromosomes)
  • a "das1:types" CAPABILITY element (each type is a feature)
  • a "das1:stylesheet" CAPABILITY element (this explains how to display the element, colors, etc.)
  • a "das1:sources" CAPABILITY element (usually this has the same SOURCE elements as the DAS registry query)

If there is no das1:types capability, the SOURCE element can be handled as one single feature,
and the title attribute of the SOURCE element can be used as the feature name.
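
As a rough sketch of the registry lookup described above, the Python below builds the sources query and pulls the capabilities out of a response. The function names, the dictionary keys, and the sample XML (including the das.example.org URLs) are assumptions made for this example; only the element and attribute names come from the description above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

REGISTRY_SOURCES = "http://www.dasregistry.org/das/sources"

# Hypothetical registry response with two sources; the second has no das1:types.
SAMPLE_SOURCES = """<SOURCES>
  <SOURCE uri="example_1" title="Example genes">
    <VERSION uri="example_1" created="2010-01-01">
      <CAPABILITY type="das1:features" query_uri="http://das.example.org/das/genes/features"/>
      <CAPABILITY type="das1:types" query_uri="http://das.example.org/das/genes/types"/>
    </VERSION>
  </SOURCE>
  <SOURCE uri="example_2" title="Example QTL">
    <VERSION uri="example_2" created="2010-01-01">
      <CAPABILITY type="das1:features" query_uri="http://das.example.org/das/qtl/features"/>
    </VERSION>
  </SOURCE>
</SOURCES>"""


def registry_query(organism, version, capability="features"):
    """Build the registry query for a taxonomy id and assembly version."""
    params = {"organism": organism, "version": version, "capability": capability}
    return REGISTRY_SOURCES + "?" + urlencode(params)


def parse_sources(sources_xml):
    """Return one record per SOURCE that offers a das1:features capability."""
    records = []
    for source in ET.fromstring(sources_xml).iter("SOURCE"):
        version = source.find("VERSION")
        if version is None:
            continue
        # Map capability type -> query_uri for this source.
        caps = {c.get("type"): c.get("query_uri") for c in version.iter("CAPABILITY")}
        if "das1:features" not in caps:
            continue
        records.append({
            "title": source.get("title"),
            "features_uri": caps["das1:features"],
            "types_uri": caps.get("das1:types"),
            # Without das1:types, treat the whole source as one feature
            # named after the SOURCE title.
            "single_feature": "das1:types" not in caps,
        })
    return records
```

Calling registry_query(9606, 37) reproduces the H_sapiens_Feb_2009 query URL given above.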

The query_uri attribute of each das1:features CAPABILITY can be processed as one
feature (or several, if there is a das1:types CAPABILITY) of a server. The server is the
part of the URI before the first / after the scheme (the host name).
(NOTE - at this point there must be some steps to restrict the features to ones that will
actually return data. Most of the features will fail for some reason, or return no data,
when queried, though this varies by genome. For example, M_musculus_Jul_2007 gives four
servers, and if you select and load all of the features, most return data. But for
A_thaliana_Jun_2009 and V_vinifera_Mar_2010, five servers are found, and if you select and
load the features, none of them return data. For G_gallus_May_2006 and D_rerio_Dec_2008,
one of the five servers returns data.)
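
One possible way to do that restriction is to probe each candidate features URL and keep only those whose response contains at least one FEATURE element. The sketch below is an assumption about how such a filter might look, not an existing IGB step. The names are hypothetical, and the fetch argument is any callable that takes a URL and returns the response text (it is passed in so the filter can be tried without a network).

```python
import xml.etree.ElementTree as ET


def count_features(response_text):
    """Count FEATURE elements in a features response; 0 if it is not valid XML."""
    try:
        root = ET.fromstring(response_text)
    except ET.ParseError:
        return 0  # e.g. an HTML error page or an empty body
    return sum(1 for _ in root.iter("FEATURE"))


def usable_features(candidates, fetch):
    """Keep the entries of candidates (name -> URL) that return data.

    fetch is a caller-supplied function: URL -> response text. It is
    expected to raise OSError when the server cannot be reached.
    """
    usable = {}
    for name, url in candidates.items():
        try:
            text = fetch(url)
        except OSError:
            continue  # server down or disabled
        if count_features(text) > 0:
            usable[name] = url
    return usable
```

Probing every feature costs one request per feature, so in practice the probe would probably be limited to a small segment range.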

If the user selects a feature, a new query can be issued using the query_uri
attribute of the das1:features CAPABILITY, adding an optional type param and a
required segment param. The segment param must contain the chromosome, and can also contain
the start and end. See the DAS spec. For example:

http://gbrowse.informatics.jax.org/cgi-bin/das/mouse_current%7CQTL/features?segment=1
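
A minimal sketch of building such a query in Python is shown below. The function name and parameters are hypothetical. The segment format (chromosome, optionally followed by ":start,end") is taken from the DAS spec as understood here, so check it against the spec before relying on it.

```python
from urllib.parse import urlencode


def features_url(query_uri, chromosome, start=None, end=None, type_id=None):
    """Append the segment (required) and type (optional) params to query_uri."""
    if (start is None) != (end is None):
        raise ValueError("start and end must be given together")
    segment = str(chromosome)
    if start is not None:
        segment += f":{start},{end}"
    params = [("segment", segment)]
    if type_id is not None:
        params.append(("type", type_id))
    # Keep ":" and "," readable in the segment value instead of percent-encoding.
    query = urlencode(params, safe=":,")
    separator = "&" if "?" in query_uri else "?"
    return query_uri + separator + query
```

With the query_uri from the example above and chromosome "1", this returns the same URL as the example.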

(NOTE - if you don't know the chromosome names, this can cause a problem. IGB calls them
BioSeq and DAS calls them segment. Most of the DAS servers seem to use 1 as opposed to chr1, etc.
If the server supports entry_points, that can be used to get the names; the IGB synonyms
file can also be used.)
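
The name matching could be sketched as follows, assuming the entry_points response lists SEGMENT elements with an id attribute. The function names and sample XML are made up for the example, and the synonyms argument is a plain dictionary standing in for the IGB synonyms file; it does not reflect that file's real format.

```python
import xml.etree.ElementTree as ET

# Hypothetical entry_points response.
SAMPLE_ENTRY_POINTS = """<DASEP>
  <ENTRY_POINTS href="http://das.example.org/das/genes/entry_points" version="1.0">
    <SEGMENT id="1" start="1" stop="1000">1</SEGMENT>
    <SEGMENT id="2" start="1" stop="2000">2</SEGMENT>
    <SEGMENT id="X" start="1" stop="3000">X</SEGMENT>
  </ENTRY_POINTS>
</DASEP>"""


def segment_ids(entry_points_xml):
    """List the segment ids the server advertises."""
    root = ET.fromstring(entry_points_xml)
    return [s.get("id") for s in root.iter("SEGMENT") if s.get("id")]


def match_segment(igb_name, server_ids, synonyms=None):
    """Find the server's segment id for an IGB sequence name, or None."""
    candidates = [igb_name]
    candidates += (synonyms or {}).get(igb_name, [])
    # Heuristic: try the name without (or with) a "chr" prefix.
    if igb_name.lower().startswith("chr"):
        candidates.append(igb_name[3:])
    else:
        candidates.append("chr" + igb_name)
    known = set(server_ids)
    for candidate in candidates:
        if candidate in known:
            return candidate
    return None
```

For the sample above, match_segment("chr1", segment_ids(SAMPLE_ENTRY_POINTS)) returns "1", which is the value to put in the segment param.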

Further investigation may also lead to improvements to this algorithm that give more features with data.
Since the DAS Registry servers are temporary, any that fail or are disabled should be removed
from the Data Sources table in Preferences.
