BRIF and digital identifiers
The following sections outline several key issues to be discussed in this BRIF subgroup and provide some background information on digital identifiers.
Purpose of identification
Identifiers should enable a given bio-resource to be referred to reliably: in a formal scholarly citation, in other citation-like circumstances (e.g. a link from a web page), or in 'discovery' applications that mine citation data to generate aggregate statistics (e.g. the number of citations in a given year).
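For the 'discovery' use case, counting citations per year only works if differently written forms of the same identifier are recognized as one resource. The sketch below assumes, hypothetically, that citation records are available as (identifier, year) pairs; the list of resolver prefixes is illustrative, not exhaustive. It relies on the fact that DOI names are case-insensitive, so they are compared in lowercase.

```python
from collections import Counter
from typing import Iterable, Tuple

# Common ways the same DOI may be written in citing documents (illustrative only).
RESOLVER_PREFIXES = ("https://doi.org/", "http://doi.org/",
                     "https://dx.doi.org/", "http://dx.doi.org/", "doi:")

def normalize_doi(raw: str) -> str:
    """Reduce a written DOI to a canonical key: strip a resolver prefix, lowercase."""
    s = raw.strip()
    for prefix in RESOLVER_PREFIXES:
        if s.lower().startswith(prefix):
            s = s[len(prefix):]
            break
    # DOI names are case-insensitive, so compare them in lowercase.
    return s.lower()

def citations_per_year(records: Iterable[Tuple[str, int]], target: str) -> Counter:
    """Count citing records per year for one bio-resource identifier."""
    key = normalize_doi(target)
    return Counter(year for doi, year in records if normalize_doi(doi) == key)
```

With made-up records such as `("10.5061/DRYAD.292q34fp", 2013)` and `("https://doi.org/10.5061/dryad.292q34fp", 2013)`, both are counted towards the same resource in the same year.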
What to identify
Granularity and usefulness are key concerns here. For example, individual samples can be assigned DOIs, but is this useful given the stated purpose of identification and the privacy issues that would arise?
Some of the more obvious types of bio-resources include:
- Databases. Example: the Ehlers-Danlos Syndrome Variant Database.
- Biobanking projects. Example: the GAZEL project in France.
- Cohorts or sub-cohorts within biobanking projects. [need example]
- Datasets generated and published by biobanking projects. [need example]
Structure and syntax of bio-resource identifiers
The structure of identifiers themselves tends to be hotly debated, often for little reason. As a general rule of thumb, information embedded in the identifier string should be kept to a minimum. This minimizes the risk of ending up in the future with a misleading or confusing identifier (which cannot be changed) because the name of a project, institution, country, etc. was included in the string when it was created.
The identifier syntax will undoubtedly also be very context-specific; a syntax that works well for database resources may be less suitable for identifiers of cohort resources.
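To illustrate the rule of thumb of keeping embedded information to a minimum, here is a minimal sketch of minting opaque identifiers: the suffix is random and encodes no project, institution, or country name. The prefix `10.1935` is the hypothetical one used in the examples of this document; the alphabet, the suffix length, and the in-memory registry are assumptions made for illustration, not a proposed scheme.

```python
import secrets

# Hypothetical alphabet: digits and lowercase letters, minus characters that are
# easily confused when read or retyped (0/o, 1/l/i).
ALPHABET = "23456789abcdefghjkmnpqrstuvwxyz"

def mint_opaque_id(prefix: str, length: int = 8) -> str:
    """Return prefix + '/' + a random suffix that carries no descriptive meaning."""
    suffix = "".join(secrets.choice(ALPHABET) for _ in range(length))
    return f"{prefix}/{suffix}"

def mint_unique(prefix: str, existing: set, length: int = 8) -> str:
    """Mint identifiers until one is not yet registered, then record and return it."""
    while True:
        candidate = mint_opaque_id(prefix, length)
        if candidate not in existing:
            existing.add(candidate)
            return candidate
```

Because nothing in the suffix describes the resource, renaming a project or moving it to another institution never makes the identifier misleading; descriptive information lives in the metadata instead.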
Branding and affordance
Although persistent identifiers should ideally be completely opaque and non-semantic, it is acknowledged that this may be undesirable in many situations. A certain level of branding or "recognizability" of an identifier is often useful for giving users some idea of where it came from and for helping to guarantee uniqueness. Some examples help to illustrate this:
- The hypothetical DOI name 10.1935/variantdb.EDS indicates that this is a variant database and that it is the EDS database referred to above. However, if the acronym EDS changes in the future, the information embedded in the persistent DOI name will no longer be correct.
- The DOI name 10.5061/dryad.292q34fp indicates that the Dryad data repository is the original issuer, but is otherwise totally semantic-free (a randomly-generated unique sequence of characters).
- The string EGAS00000000001 can be recognized as a study accession from the European Genome-Phenome Archive (EGA).
[To be discussed: "actionable" identifiers ("what can I do with this ID?").]
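The kind of recognizability described in these examples can be sketched as a small function that reports what an identifier string reveals about itself without any lookup. This is a minimal illustration only: the regular expressions are simplified approximations, not the official DOI or EGA syntax rules (the EGA pattern does not enforce a digit count), and the prefix-to-issuer table is a hypothetical hand-written mapping based on the Dryad example.

```python
import re
from typing import Optional

# Simplified DOI pattern: "10." + registrant code, then "/" and a suffix.
DOI_RE = re.compile(r"^(10\.\d{4,9}(?:\.\d+)*)/(\S+)$")

# EGA study accessions start with "EGAS" followed by digits (length not checked here).
EGA_STUDY_RE = re.compile(r"^EGAS\d+$")

# Hypothetical lookup table of known DOI prefixes, for illustration only.
KNOWN_DOI_PREFIXES = {"10.5061": "Dryad"}

def describe_identifier(identifier: str) -> Optional[str]:
    """Return a short description of what the identifier string itself reveals."""
    if EGA_STUDY_RE.match(identifier):
        return "EGA study accession"
    m = DOI_RE.match(identifier)
    if m:
        prefix, suffix = m.groups()
        issuer = KNOWN_DOI_PREFIXES.get(prefix, "unknown registrant")
        return f"DOI issued under prefix {prefix} ({issuer}), suffix {suffix!r}"
    return None  # nothing recognizable in the string
```

For the Dryad example this reports the issuer, while for the hypothetical 10.1935 name it can only echo the suffix text, which is exactly the part that may become outdated.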
Identifiers, metadata and bio-resources - citation examples
Hypothetical examples of how various classes of bio-resources might be cited:
These examples show that, in typical use, the identifier is always displayed in context, i.e. presented together with various bits of metadata describing the resource being identified and cited. Embedding descriptive information in the identifier itself therefore adds very little.
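A minimal sketch of this point: the human-readable part of a citation is rendered from metadata, so an opaque identifier loses nothing. The field names, the citation layout (loosely modelled on common data-citation styles), and all values other than the database title are hypothetical placeholders; the DOI is shown through the public https://doi.org/ resolver.

```python
from dataclasses import dataclass

@dataclass
class BioResource:
    # Hypothetical minimal metadata record; real schemes define many more fields.
    creator: str
    year: int
    title: str
    resource_type: str
    publisher: str
    doi: str

def format_citation(r: BioResource) -> str:
    """Render a human-readable citation; the DOI is shown as a resolvable URL."""
    return (f"{r.creator} ({r.year}). {r.title} [{r.resource_type}]. "
            f"{r.publisher}. https://doi.org/{r.doi}")

# Placeholder record: only the title comes from this document.
eds = BioResource("Example Curators", 2012, "Ehlers-Danlos Syndrome Variant Database",
                  "Database", "Example Publisher", "10.1935/k7m2q9xa")
print(format_citation(eds))
```

The reader learns what the resource is from the title and type, even though the suffix `k7m2q9xa` itself says nothing.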
Assigning and managing identifiers
Potential organizations or institutions:
- The BioDBCore initiative for databases?
- Clinical trial registries [need more info]
- National biobank registries?
- The P3G study catalogue?