ncRNA-DB
The non-coding RNA human interaction Data Base.

The ncRNA-DB project provides an integrative data base of human non-coding RNA interactions.

Interactions are periodically imported from several data sources, integrated and stored in an on-line data base server. Data can be accessed by directly connecting to the server instance (OrientDB) or through a command-line interface, a Java API package and a Cytoscape 3.x App.

You are free to link to or use the ncRNA-DB data and interfaces. If you do so, please reference (cite) ncRNA-DB and this website; see the References section. We also appreciate bug fixes and would be happy to collaborate on improvements; see the Contacts section.

See the Documentation section to understand the main architecture and concepts of the ncRNA-DB system. Download the CLI and API JAR package from the CLI and API section, and the Cytoscape 3.x App from the ncINetView section.

Documentation

This section documents the system architecture of ncRNA-DB.

System Architecture

We have imported and integrated associations among non-coding RNAs (miRNAs, circulating miRNAs, lncRNAs and other non-coding RNAs), genes, RNAs and associated diseases from ten on-line databases. The resulting database, named ncRNA-DB (non-coding RNA Human Interaction Data Base), is built on top of the NoSQL platform OrientDB and is kept up to date by semi-automated procedures. The interaction data of ncRNA-DB can be searched and visualized through a web-based or a command-line interface. The database is also accessible through a Cytoscape app, called ncINetView, which allows users to: (i) build a network annotated with all known ncRNAs and associated diseases by accessing a single database, and (ii) analyze it with all the visualization and mining features available in the Cytoscape App Store. At http://ncrnadb.scienze.univr.it/ncrnadb/, users can search ncRNA-DB, export the results in text format, download the command-line interface, the Java API and the ncINetView app, and use ncRNA-DB as a server for third-party client applications.

ncRNA-DB is implemented in OrientDB, a database that supports both a graph model and an object-oriented model on top of a document model. Using OrientDB enables public access to our server, effective management of user privileges, graph traversal procedures and a wide choice of language bindings. OrientDB offers an SQL-like interface in addition to several language-specific interfaces. It is developed in Java and provides a native Java API (Application Programming Interface) for accessing the database, which is suitable for developing Cytoscape applications.

The figure below (Figure 1) depicts the schema of ncRNA-DB. The abstract class BioEntity represents biological entities and is specialized into five sub-classes: ncRNA, RNA, Gene, Disease and Others. Aliases are represented by the abstract class Alias, which is specialized into five sub-classes, one for each entity type. DataSource is a class containing the name and version of the external resource from which the data were obtained or, equivalently, of the official repository of the entity (e.g., NONCODE v4).

An instance of a class is a particular element of that class (e.g., a specific ncRNA). In the graph model, instances of classes and sub-classes are nodes. Class inheritance holds when a class is a specialization of another one (Figure 1, mark 1). The naming of a biological entity by an alias is represented by an edge between the corresponding graph nodes. Due to the ambiguity of nomenclatures, these edges have n:n cardinality (Figure 1, mark 2): for example, an ncRNA can have several aliases, and the same alias can refer to several ncRNAs. Interactions among entities are modeled through a class called Relation, associated with the class BioEntity. The cardinality of this association is n:2, since an entity can participate in many relations and a relation involves exactly two entities.

Aliases act as access points to the data and are indexed. The abstract class Alias has a single-field, non-unique index on the nomenclature (the third level of the RID, Alias.name); it is used when a search gives only the nomenclature. Each Alias sub-class has a composite-key dictionary index on the second and third levels of the RID (DataSource and Alias.name); it is used when both the EntityType and the nomenclature are specified.
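To make the schema concrete, the following Java sketch models it in memory: BioEntity and Alias nodes, n:n naming edges, binary Relation objects, and the two alias indexes described above. It is a hypothetical illustration, not the OrientDB schema or the ncRNA-DB Java API. All names (SchemaSketch, findByName, findExact, relationsOf) are assumptions, DataSource is reduced to a plain string, and the composite index is keyed by EntityType plus (DataSource, Alias.name) to mimic one index per Alias sub-class.

```java
import java.util.*;

// Hypothetical in-memory sketch of the ncRNA-DB schema. Names are illustrative
// only; this is neither the OrientDB schema nor the ncRNA-DB Java API.
final class SchemaSketch {
    // The five BioEntity sub-classes, reduced to an enum.
    enum EntityType { NCRNA, RNA, GENE, DISEASE, OTHERS }

    record BioEntity(String id, EntityType type) {}

    // An alias of a given entity type, taken from a DataSource (here a plain string).
    record Alias(EntityType type, String dataSource, String name) {}

    // A Relation involves exactly two entities; dataSource records its origin.
    record Relation(BioEntity a, BioEntity b, String dataSource) {}

    // n:n naming edges: one alias may name many entities, one entity many aliases.
    private final Map<Alias, Set<BioEntity>> namedBy = new HashMap<>();
    // Single-field, non-unique index on Alias.name.
    private final Map<String, Set<Alias>> byName = new HashMap<>();
    // One composite (DataSource, Alias.name) index per entity type.
    private final Map<EntityType, Map<List<String>, Alias>> byTypeSourceName =
            new EnumMap<>(EntityType.class);
    private final List<Relation> relations = new ArrayList<>();

    void addAlias(BioEntity entity, String dataSource, String name) {
        Alias alias = new Alias(entity.type(), dataSource, name);
        namedBy.computeIfAbsent(alias, k -> new HashSet<>()).add(entity);
        byName.computeIfAbsent(name, k -> new HashSet<>()).add(alias);
        byTypeSourceName.computeIfAbsent(entity.type(), k -> new HashMap<>())
                .put(List.of(dataSource, name), alias);
    }

    void addRelation(BioEntity a, BioEntity b, String dataSource) {
        relations.add(new Relation(a, b, dataSource));
    }

    // Search by nomenclature only: every entity having an alias with that name.
    Set<BioEntity> findByName(String name) {
        Set<BioEntity> out = new HashSet<>();
        for (Alias alias : byName.getOrDefault(name, Set.of())) {
            out.addAll(namedBy.get(alias));
        }
        return out;
    }

    // Search with entity type, data source and name: uses the composite index.
    Set<BioEntity> findExact(EntityType type, String dataSource, String name) {
        Alias alias = byTypeSourceName.getOrDefault(type, Map.of())
                .get(List.of(dataSource, name));
        return alias == null ? Set.of() : namedBy.get(alias);
    }

    // All relations the entity participates in (the "n" side of n:2).
    List<Relation> relationsOf(BioEntity entity) {
        List<Relation> out = new ArrayList<>();
        for (Relation r : relations) {
            if (r.a().equals(entity) || r.b().equals(entity)) out.add(r);
        }
        return out;
    }
}
```

In ncRNA-DB these structures are OrientDB classes, edges and indexes; the maps above only illustrate which kind of lookup each index serves.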

Imported data

ncRNA-DB integrates data from several state-of-the-art non-coding RNA databases. We selected sources that cover the majority of non-coding RNA information with high-quality, up-to-date data.

HGNC: we imported a list of non-coding RNAs and their approved aliases used by other data sources, as well as protein-coding genes, pseudogenes and phenotypes (considered as diseases).
lncRNAdb: we imported a list of non-coding RNAs and their aliases.
circ2traits: we imported a set of interacting lncRNAs, circRNAs and messenger RNAs together with the associated diseases and the PubMed IDs of articles where the interactions are reported.
HMDD: we imported a list of diseases, the set of genes that interact with ncRNAs, PubMed IDs of articles together with the support sentences. Here, interactions are listed as ncRNA-disease or ncRNA-gene-disease. We split the multi-relation ncRNA-gene-disease into two distinct relations ncRNA-gene and ncRNA-disease.
lncRNAdisease: we imported a list of lncRNAs, their aliases, associated diseases, interaction levels, PubMed IDs of articles supporting the interactions and sentences describing details such as the type of dysfunction.
Mirandola: we imported a set of miRNAs, their aliases, PubMed IDs of articles together with the support sentences.
miRTarBase: we imported a set of miRNAs, their validated targets and aliases, and the PubMed IDs of articles together with the support sentences.
NONCODE: we imported a list of non-coding RNAs, their aliases and a mapping of NONCODE into external identifiers.
NPInter: we imported a set of ncRNAs, their interactions, interaction levels, PubMed IDs of referencing articles and supporting sentences.
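The HMDD step in the list above, which splits a ternary ncRNA-gene-disease record into the two binary relations ncRNA-gene and ncRNA-disease, can be sketched in Java as follows. The record types, their field names, and the choice to copy the PubMed ID and support sentence onto both resulting relations are assumptions for illustration, not the actual ncRNA-DB import code.

```java
import java.util.List;

// Hypothetical sketch of the HMDD splitting step; names and the handling of
// evidence are assumptions, not the ncRNA-DB importer.
final class HmddSplitSketch {

    // One HMDD association; gene is null for a plain ncRNA-disease row.
    record HmddRecord(String ncRna, String gene, String disease,
                      String pubMedId, String sentence) {}

    // A binary relation: exactly two endpoints plus its supporting evidence.
    record Relation(String from, String to, String pubMedId, String sentence) {}

    static List<Relation> split(HmddRecord r) {
        Relation ncRnaDisease =
                new Relation(r.ncRna(), r.disease(), r.pubMedId(), r.sentence());
        if (r.gene() == null) {
            return List.of(ncRnaDisease);   // ncRNA-disease row: already binary
        }
        // ncRNA-gene-disease row: emit ncRNA-gene and ncRNA-disease, both carrying
        // the same PubMed ID and support sentence (an assumption of this sketch).
        Relation ncRnaGene =
                new Relation(r.ncRna(), r.gene(), r.pubMedId(), r.sentence());
        return List.of(ncRnaGene, ncRnaDisease);
    }
}
```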

RID - The Resource Identifier Format of ncRNA-DB

RID is the resource identifier format used in ncRNA-DB. An RID is composed of three parts (or levels): EntityType:DataSource:AliasName. The EntityType indicates the biological classification of the element: ncRNA, RNA (not including ncRNAs), Gene, Disease or Others (for all other cases, including entities whose type is not specified in the imported data source). The DataSource reports the name of the external data source from which the data were obtained, together with its version (e.g., HMDD 2). The AliasName is the nomenclature used in the data source. Examples of complete RIDs are NCRNA:MIRTARBASE:MIRT038999, NCRNA:GENERIC:HSA_LET_7A, GENE:HGNC:1100, GENE:GENERIC:BRCA1 and DISEASE:GENERIC:BREAST_CANCER. In an RID, the EntityType and DataSource levels can be omitted, whereas an RID without the AliasName level is not valid. A valid RID has one of the following forms: EntityType:DataSource:AliasName, EntityType::AliasName, :DataSource:AliasName, DataSource:AliasName, ::AliasName or AliasName. The RID system has been introduced to allow semi-hierarchical searching over the alias entries of ncRNA-DB. For example, a search for ::AliasName returns all the biological entities having an alias with the specified name, whereas a query using NCRNA::AliasName returns only the ncRNAs having that alias name, regardless of the DataSource. The value of AliasName depends on the external nomenclature, EntityType is one of the types listed above, and the complete list of DataSources is given in the next section.
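A minimal Java sketch of how the RID forms listed above could be parsed and used for semi-hierarchical matching. It assumes that a two-level RID is always read as DataSource:AliasName (as in the list of valid forms), that an empty level means "omitted", that RIDs with more than three levels are rejected, and that inputs are already normalized. The class and method names are hypothetical and are not part of the ncRNA-DB CLI or Java API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of RID parsing and semi-hierarchical matching; not the
// ncRNA-DB CLI or Java API. Inputs are assumed to be already normalized.
final class RidSketch {

    // One RID; entityType and dataSource are null when that level is omitted.
    record Rid(String entityType, String dataSource, String aliasName) {

        // Accepts T:D:A, T::A, :D:A, D:A, ::A and A. Returns empty for an empty
        // AliasName or for more than three levels (the latter is an assumption).
        static Optional<Rid> parse(String text) {
            String[] parts = text.split(":", -1);   // limit -1 keeps empty levels
            if (parts.length > 3) return Optional.empty();
            String name = parts[parts.length - 1];
            if (name.isEmpty()) return Optional.empty();
            String source = parts.length >= 2 ? parts[parts.length - 2] : "";
            String type = parts.length == 3 ? parts[0] : "";
            return Optional.of(new Rid(emptyToNull(type), emptyToNull(source), name));
        }

        private static String emptyToNull(String s) {
            return s.isEmpty() ? null : s;
        }

        // A query matches a stored alias when every level given in the query is equal.
        boolean matches(Rid stored) {
            return aliasName.equals(stored.aliasName())
                    && (entityType == null || entityType.equals(stored.entityType()))
                    && (dataSource == null || dataSource.equals(stored.dataSource()));
        }
    }

    // Returns the stored, fully specified alias RIDs selected by the query RID.
    static List<Rid> search(String query, List<Rid> storedAliases) {
        Rid q = Rid.parse(query)
                .orElseThrow(() -> new IllegalArgumentException("invalid RID: " + query));
        List<Rid> hits = new ArrayList<>();
        for (Rid stored : storedAliases) {
            if (q.matches(stored)) hits.add(stored);
        }
        return hits;
    }
}
```

For example, with the stored aliases GENE:GENERIC:BRCA1 and GENE:HGNC:1100, the query HGNC:1100 selects only the second one, ::BRCA1 selects only the first, and NCRNA::BRCA1 selects nothing.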

The ncRNA-DB system applies a normalization procedure to alias names and to all the other imported data (i.e., sentences). Normalization is applied during the import phase, as well as in every input interface (CLI, API and Cytoscape App) whenever the user enters a name. All data except sentences are converted to upper case, and the following regex replacement rules are applied:

From                  To
-                     _
tab                   space
comma                 space
a series of spaces    _
This also means that if you want to specify a name like "breast cancer", you can do that by typing "breast_cancer".
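The normalization of names can be sketched in Java as below. Applying the replacements in the order of the table is an assumption; the order matters, since collapsing spaces before replacing commas would turn "a, b" into "A _B" instead of "A_B". The class name is hypothetical, and upper-casing uses Locale.ROOT so that the result does not depend on the platform locale.

```java
import java.util.Locale;

// Minimal sketch of the name normalization rules; the rule order and the class
// name are assumptions, not the ncRNA-DB implementation.
final class NormalizeSketch {
    static String normalize(String input) {
        return input
                .replace("-", "_")        // "-"   -> "_"
                .replace("\t", " ")       // tab   -> space
                .replace(",", " ")        // comma -> space
                .replaceAll(" +", "_")    // a series of spaces -> "_"
                .toUpperCase(Locale.ROOT);
    }
}
```

With these rules, "breast cancer", "breast_cancer" and "Breast, cancer" all normalize to BREAST_CANCER, and "hsa-let-7a" becomes HSA_LET_7A, matching the GENERIC examples in the table below.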

List of DataSources

The table lists the currently supported DataSources.

ncRNA-DB identifier       Resource name                          Version  Example
HGNC                      HUGO Gene Nomenclature Committee       -        HGNC:1100
ENZYME                    Enzyme                                 -        ENZYME:6.3.2.17
ENSEMBL                   Ensembl                                -        ENSEMBL:ENSG00000012048
ENTREZ_GENE               Entrez Gene                            -        ENTREZ_GENE:672
REFSEQ                    RefSeq                                 -        REFSEQ:NM_007294
CCDS                      Consensus CDS                          -        CCDS:CCDS11454.1
VEGA                      Vertebrate Genome Annotation           -        VEGA:OTTHUMG00000157426
OMIM                      Online Mendelian Inheritance in Man    -        OMIM:113705
UNIPROT                   UniProt                                -        UNIPROT:P38398
UCSC                      HGNC                                   -        UCSC:UC002ICT.3
RGD                       Rat Genome Database                    -        RGD:2218
MGD                       Mouse Genome Database                  -        MGD:104537
MIRBASE_ACCESSION_NUMBER  miRBase accession number               -        MIRBASE_ACCESSION_NUMBER:MIMAT0015042
NONCODE_V4                NONCODE                                v4       NONCODE_V4:NONHSAT048279
NONCODE_V3                NONCODE                                v3       NONCODE_V3:N125530
MIRTARBASE                microRNA-target interactions database  -        MIRTARBASE:MIRT027793
GENERIC                   Generic name, symbol or alias          -        GENERIC:BRCA1, GENERIC:HSA_LET_7A, GENERIC:BREAST_CANCER
GENERIC_ACCESSION_NUMBER  Generic accession number               -        GENERIC_ACCESSION_NUMBER:U14680

NOTE: the fact that a resource is included as a DataSource does not mean that data from the external resource are imported into ncRNA-DB. It simply means that some alias refers to that external nomenclature.

The DataSource class is also used to store the origin of imported interactions. Further DataSource identifiers that can appear in interactions are listed in the following table.

ncRNA-DB identifier       Resource name                          Version
MIRBASE_NAME              miRBase                                -
LNCRNADB                  Long Noncoding RNA Database            -
CIRC2TRAITS               circ2Traits                            -
HMDD                      Human MicroRNA Disease Database        -
HMDD_2                    Human MicroRNA Disease Database        v2.0
LNCRNADISEASE             LncRNADisease                          -
MIRANDOLA_1.6             miRandola                              1.6
NPINTER                   NPInter                                -
NPINTER_2.0               NPInter                                v2.0
STARBASE_2.0              starBase                               v2.0

References

Vincenzo Bonnici, Francesco Russo, Nicola Bombieri, Alfredo Pulvirenti and Rosalba Giugno*
Comprehensive reconstruction and visualization of non-coding regulatory networks in human.
Methods, Front. Bioeng. Biotechnol. - Bioinformatics and Computational Biology, 2014.

Contacts

For help and bug reports, please send an email to Rosalba Giugno (giugno@dmi.unict.it) and Vincenzo Bonnici (vincenzo.bonnici@univr.it).
