COEUS is focused on integration and interoperability.
These are the key goals that defined COEUS' development and its internal
organization.
To make COEUS' internal structure easier to understand, we describe it
with a gardening metaphor.
The first thing to consider is a single COEUS instance. Each standalone instance is built from a configuration file (config.js, written in JSON) and a setup file, written in RDF, that uses the COEUS Ontology.
Each COEUS instance integrates data in a mini-warehouse (or a large warehouse,
depending on the number of resources you are collecting!) and is called
a Seed.
Building a new seed involves three key steps:
Since all seeds publish acquired data by default, we can deploy multiple seeds and connect them easily, creating a knowledge federation layer: the Garden.
A single COEUS instance is a seed. This represents a standalone application, built from integrated resources, with a public API.
When multiple seeds work together, they bloom into a garden: a truly federated semantic knowledge network.
The following code snippets show how to configure data loading for each of the supported formats. Each example includes one resource connector and one or more selectors.

    # Resource configuration for BBC Sport News feed
    coeus:resource_BBC
        coeus:endpoint "http://feeds.bbci.co.uk/sport/0/rss.xml"^^<&xsd;string>;
        coeus:extends coeus:concept_BBC;
        coeus:hasKey coeus:xml_BBC_id;
        coeus:isResourceOf coeus:concept_BBC;
        coeus:loadsFrom coeus:xml_BBC_id, coeus:xml_BBC_title, coeus:xml_description, coeus:xml_link;
        coeus:method "cache"^^<&xsd;string>;
        coeus:order "1"^^<&xsd;integer>;
        coeus:query "//item"^^<&xsd;string>;
        dc:publisher "xml"^^<&xsd;string>;
        dc:title "BBC"^^<&xsd;string>;
        a coeus:Resource, owl:NamedIndividual;
        rdfs:comment "Resource loader for BBC XML feeds."^^<&xsd;string>;
        rdfs:label "resource_bbc"^^<&xsd;string>.

    # Loading news identifiers from XML XPath queries
    coeus:xml_BBC_id
        coeus:isKeyOf coeus:resource_BBC;
        coeus:loadsFor coeus:resource_BBC;
        coeus:property "dc:identifier"^^<&xsd;string>;
        coeus:query "guid"^^<&xsd;string>;
        coeus:regex "[0-9]{5,}"^^<&xsd;string>;
        dc:title "BBC identifier"^^<&xsd;string>;
        a coeus:XML, owl:NamedIndividual;
        rdfs:label "xml_bbc_id"^^<&xsd;string>.

    # Loading news titles from XML XPath queries
    coeus:xml_BBC_title
        coeus:loadsFor coeus:resource_BBC;
        coeus:property "dc:title"^^<&xsd;string>;
        coeus:query "title"^^<&xsd;string>;
        dc:title "BBC entry title"^^<&xsd;string>;
        a coeus:XML, owl:NamedIndividual;
        rdfs:label "xml_bbc_title"^^<&xsd;string>.

    # Loading news descriptions from XML XPath queries
    coeus:xml_description
        coeus:loadsFor coeus:resource_BBC;
        coeus:property "dc:description"^^<&xsd;string>;
        coeus:query "description"^^<&xsd;string>;
        dc:title "entry description"^^<&xsd;string>;
        a coeus:XML, owl:NamedIndividual;
        rdfs:label "xml_description"^^<&xsd;string>.

    # Loading news links from XML XPath queries
    coeus:xml_link
        coeus:loadsFor coeus:resource_BBC;
        coeus:property "dc:publisher"^^<&xsd;string>;
        coeus:query "link"^^<&xsd;string>;
        dc:title "entry link"^^<&xsd;string>;
        a coeus:XML, owl:NamedIndividual;
        rdfs:label "xml_link"^^<&xsd;string>.

    # Resource configuration for generic SQL resource
    coeus:resource_SomeSQL
        coeus:endpoint "jdbc:mysql://localhost:3306/db_name?user=user&password=pwd"^^<&xsd;string>;
        coeus:hasKey coeus:sql_id;
        coeus:isResourceOf coeus:concept_SQL;
        coeus:loadsFrom coeus:sql_id;
        coeus:method "cache"^^<&xsd;string>;
        coeus:order "20"^^<&xsd;integer>;
        coeus:query "SELECT b AS entry FROM hummer WHERE rel=37";
        dc:description "Resource connecting SQL information."^^<&xsd;string>;
        dc:publisher "sql"^^<&xsd;string>;
        dc:title "SomeSQL"^^<&xsd;string>;
        a coeus:Resource, owl:NamedIndividual;
        rdfs:comment "Resource connecting SQL information."^^<&xsd;string>;
        rdfs:label "resource_somesql"^^<&xsd;string>.

    # Loading data identifiers from the SQL query "entry" column
    coeus:sql_id
        coeus:isKeyOf coeus:resource_SomeSQL;
        coeus:loadsFor coeus:resource_SomeSQL;
        coeus:property "dc:title|dc:identifier"^^<&xsd;string>;
        coeus:query "entry"^^<&xsd;string>;
        dc:description "SQL identifier for data."^^<&xsd;string>;
        a coeus:SQL, owl:NamedIndividual;
        rdfs:comment "SQL identifier for data."^^<&xsd;string>;
        rdfs:label "sql_id"^^<&xsd;string>.

    # UniProt Resource configuration
    coeus:resource_UniProt
        coeus:endpoint "http://www.uniprot.org/uniprot/?query=breast+cancer+AND+taxonomy%3a%22Homo+sapiens+%5b9606%5d%22&force=yes&format=tab&columns=id,entry%20name,reviewed,protein%20names,genes,organism,length"^^<&xsd;string>;
        coeus:extends coeus:concept_UniProt;
        coeus:hasKey coeus:csv_UniProt_entry;
        coeus:isResourceOf coeus:concept_UniProt;
        coeus:loadsFrom coeus:csv_UniProt_entry, coeus:csv_UniProt_entryname, coeus:csv_UniProt_proteinname;
        coeus:method "cache"^^<&xsd;string>;
        coeus:order "0"^^<&xsd;integer>;
        dc:publisher "csv"^^<&xsd;string>;
        dc:title "UniProt"^^<&xsd;string>;
        a coeus:Resource, owl:NamedIndividual;
        rdfs:comment "UniProt data loader."^^<&xsd;string>;
        rdfs:label "resource_uniprot".

    # Loading UniProt accession entries from CSV column 0
    coeus:csv_UniProt_entry
        coeus:isKeyOf coeus:resource_UniProt;
        coeus:loadsFor coeus:resource_UniProt;
        coeus:property "dc:identifier"^^<&xsd;string>;
        coeus:query "0"^^<&xsd;string>;
        dc:title "UniProt entry"^^<&xsd;string>;
        a coeus:CSV, owl:NamedIndividual;
        rdfs:label "csv_uniprot_entry"^^<&xsd;string>.

    # Loading UniProt entry names from CSV column 1
    coeus:csv_UniProt_entryname
        coeus:loadsFor coeus:resource_UniProt;
        coeus:property "dc:title"^^<&xsd;string>;
        coeus:query "1"^^<&xsd;string>;
        dc:title "UniProt Entry Name"^^<&xsd;string>;
        a coeus:CSV, owl:NamedIndividual;
        rdfs:label "csv_uniprot_entryname"^^<&xsd;string>.

    # Loading UniProt protein names from CSV column 3
    coeus:csv_UniProt_proteinname
        coeus:loadsFor coeus:resource_UniProt;
        coeus:property "dc:description"^^<&xsd;string>;
        coeus:query "3"^^<&xsd;string>;
        dc:title "UniProt Protein Name"^^<&xsd;string>;
        a coeus:CSV, owl:NamedIndividual;
        rdfs:label "csv_uniprot_proteinname"^^<&xsd;string>.

    # MeSH JSON Resource configuration
    coeus:resource_mesh_cache
        coeus:endpoint "http://bioinformatics.ua.pt/diseasecard/api/triple/diseasecard:uniprot_P51587/coeus:isAssociatedTo/obj"^^<&xsd;string>;
        coeus:extends coeus:concept_mesh;
        coeus:hasKey coeus:json_mesh_type;
        coeus:isResourceOf coeus:concept_mesh;
        coeus:loadsFrom coeus:json_mesh_type;
        coeus:method "cache"^^<&xsd;string>;
        coeus:order "5"^^<&xsd;integer>;
        coeus:query "$.results.bindings[*]"^^<&xsd;string>;
        dc:publisher "json"^^<&xsd;string>;
        dc:title "Resource mesh cache"^^<&xsd;string>;
        a coeus:Resource, owl:NamedIndividual;
        rdfs:comment "resource for mesh terms cache"^^<&xsd;string>;
        rdfs:label "resource_mesh_cache"^^<&xsd;string>.

    # Loading MeSH id from JSON
    coeus:json_mesh_id
        coeus:isKeyOf coeus:resource_mesh_cache_ext;
        coeus:loadsFor coeus:resource_mesh_cache_ext;
        coeus:property "dc:identifier"^^<&xsd;string>;
        coeus:query "$.obj.value"^^<&xsd;string>;
        coeus:regex "D[0-9]{6}"^^<&xsd;string>;
        dc:title "json mesh identifier"^^<&xsd;string>;
        a coeus:class_json, owl:NamedIndividual;
        rdfs:label "json_mesh_id"^^<&xsd;string>.

    # Loading MeSH type from JSON
    coeus:json_mesh_type
        coeus:loadsFor coeus:resource_mesh_cache;
        coeus:property "rdfs:comment"^^<&xsd;string>;
        coeus:query "$.obj.type"^^<&xsd;string>;
        dc:title "json mesh type"^^<&xsd;string>;
        a coeus:class_json, owl:NamedIndividual;
        rdfs:label "json_mesh_type"^^<&xsd;string>.

    # Loading MeSH URI from JSON
    coeus:json_mesh_uri
        coeus:loadsFor coeus:resource_mesh_complete;
        coeus:property "dc:description"^^<&xsd;string>;
        coeus:query "$.obj.value"^^<&xsd;string>;
        dc:title "json mesh uri"^^<&xsd;string>;
        a coeus:class_json, owl:NamedIndividual;
        rdfs:label "json_mesh_uri"^^<&xsd;string>.

    # UniProt RDF Resource configuration
    coeus:resource_uniprot_rdf_complete
        coeus:endpoint "http://www.uniprot.org/uniprot/#replace#.rdf"^^<&xsd;string>;
        coeus:extends coeus:concept_uniprot;
        coeus:extension "dc:identifier"^^<&xsd;string>;
        coeus:isResourceOf coeus:concept_uniprot;
        coeus:method "complete"^^<&xsd;string>;
        coeus:order "12";
        coeus:query "http://purl.uniprot.org/uniprot/"^^<&xsd;string>;
        dc:publisher "rdf"^^<&xsd;string>;
        dc:title "Resource Uniprot RDF complete"^^<&xsd;string>;
        a coeus:Resource, owl:NamedIndividual;
        rdfs:comment "resource uniprot for rdf data"^^<&xsd;string>;
        rdfs:label "resource_uniprot_rdf_complete"^^<&xsd;string>.

    # UniProt Linked Data Resource configuration
    coeus:resource_uniprot_ld_complete
        coeus:endpoint "http://purl.uniprot.org/uniprot/#replace#"^^<&xsd;string>;
        coeus:extends coeus:concept_uniprot;
        coeus:extension "dc:identifier"^^<&xsd;string>;
        coeus:hasKey coeus:ld_uniprot_complete;
        coeus:isResourceOf coeus:concept_uniprot;
        coeus:loadsFrom coeus:ld_uniprot_complete;
        coeus:method "complete"^^<&xsd;string>;
        coeus:order "14";
        dc:publisher "ld"^^<&xsd;string>;
        dc:title "Resource Uniprot Linked Data"^^<&xsd;string>;
        a coeus:Resource, owl:NamedIndividual;
        rdfs:comment "resource uniprot for linked data "^^<&xsd;string>;
        rdfs:label "resource_uniprot_ld_complete"^^<&xsd;string>.

    # Make the association
    coeus:ld_uniprot_complete
        coeus:isKeyOf coeus:resource_uniprot_ld_complete;
        coeus:loadsFor coeus:resource_uniprot_ld_complete;
        coeus:property "rdfs:seeAlso"^^<&xsd;string>;
        coeus:query ""^^<&xsd;string>;
        dc:title "linkeddata uniprot complete"^^<&xsd;string>;
        a coeus:LD, owl:NamedIndividual;
        rdfs:label "ld_uniprot_complete"^^<&xsd;string>.

    # PDB Resource configuration
    coeus:resource_PDB
        coeus:endpoint "http://uniprot.org/uniprot/#replace#.xml"^^<&xsd;string>;
        coeus:extends coeus:concept_UniProt;
        coeus:hasKey coeus:xml_PDB_id;
        coeus:isResourceOf coeus:concept_PDB;
        coeus:loadsFrom coeus:xml_PDB_id;
        coeus:method "cache"^^<&xsd;string>;
        coeus:order "11"^^<&xsd;integer>;
        coeus:query "//entry"^^<&xsd;string>;
        dc:publisher "xml"^^<&xsd;string>;
        dc:title "PDB"^^<&xsd;string>;
        a coeus:Resource, owl:NamedIndividual;
        rdfs:comment "Resource connecting PDB information."^^<&xsd;string>;
        rdfs:label "resource_pdb"^^<&xsd;string>.

    # Loading PDB identifiers from UniProt's XML (with XPath)
    coeus:xml_PDB_id
        coeus:isKeyOf coeus:resource_PDB;
        coeus:loadsFor coeus:resource_PDB;
        coeus:property "dc:title|dc:identifier"^^<&xsd;string>;
        coeus:query "//dbReference[@type='PDB']/@id"^^<&xsd;string>;
        dc:publisher "xml"^^<&xsd;string>;
        dc:title "PDB id"^^<&xsd;string>;
        a coeus:XML, owl:NamedIndividual;
        rdfs:label "xml_pdb_id"^^<&xsd;string>.
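The XML connector examples can be read as a two-level XPath evaluation: the Resource's coeus:query selects the repeating element, each selector's coeus:query is evaluated relative to it, and coeus:regex trims the key value down to an identifier token. The Java sketch below illustrates that reading using only the JDK's javax.xml.xpath API. It is an assumption-based illustration, not COEUS' connector code: the class name XmlSelectorSketch, its method and the sample feed are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch of an XML connector; not COEUS' actual code. */
final class XmlSelectorSketch {

    /** Returns one {key, title} pair per node matched by the resource query. */
    static List<String[]> select(String xml, String resourceQuery, String keyQuery,
                                 String keyRegex, String titleQuery) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Pattern keyPattern = Pattern.compile(keyRegex);
            List<String[]> rows = new ArrayList<>();
            // The Resource's coeus:query selects the repeating element, e.g. //item
            NodeList items = (NodeList) xpath.evaluate(resourceQuery, doc, XPathConstants.NODESET);
            for (int i = 0; i < items.getLength(); i++) {
                // Selector queries (guid, title) are evaluated relative to each item
                String rawKey = xpath.evaluate(keyQuery, items.item(i));
                String title = xpath.evaluate(titleQuery, items.item(i));
                // coeus:regex keeps only the identifying token; items without one are skipped
                Matcher m = keyPattern.matcher(rawKey);
                if (m.find()) {
                    rows.add(new String[] {m.group(), title});
                }
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate XML selectors", e);
        }
    }

    public static void main(String[] args) {
        // Made-up feed fragment, for illustration only
        String rss = "<rss><channel><item>"
                + "<guid>http://www.example.org/sport/0/football/25123456</guid>"
                + "<title>Sample headline</title>"
                + "</item></channel></rss>";
        for (String[] row : select(rss, "//item", "guid", "[0-9]{5,}", "title")) {
            System.out.println(row[0] + " -> " + row[1]);
        }
    }
}
```

With the BBC settings (//item, guid, [0-9]{5,}, title), the single-digit "0" in the path is ignored and the first run of five or more digits becomes the item key.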
The COEUS ontology comes into play when organising all resources and setting up a new seed.
To accommodate as many data organisations as possible, the COEUS data structure is organised as a tree: Entity > Concept > Item.
Entities represent the top-level data types, Concepts the intermediate divisions, and Items the individual records.
For example, to create a seed with transportation information we might have the following structure:
The abstraction layer uses the instance configuration to determine exactly which pieces of information are translated into the knowledge graph.
During the abstraction process, data are triplified: new triples are generated in real time, matching the configured properties. Developers define which CSV columns, SQL query results or XPath evaluation results are mapped to a configurable predicate from any ontology.
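As an illustration of triplification, the Java sketch below maps one CSV row to Turtle-like triple strings, mirroring the UniProt CSV example (column 0 as key and dc:identifier, column 1 as dc:title, column 3 as dc:description) and splitting pipe-separated predicates as coeus:property allows. It is a hypothetical simplification: the class name, the coeus:uniprot_ item prefix and the plain-string output are our assumptions, and literal escaping is ignored.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of CSV triplification; not COEUS' actual implementation. */
final class CsvTriplifySketch {

    /** Turns one CSV row into Turtle-like triple strings using column selectors. */
    static List<String> triplify(String[] row, String itemPrefix, int keyColumn,
                                 Map<Integer, String> selectors) {
        // The key selector (coeus:hasKey) produces the Item identifier
        String subject = itemPrefix + row[keyColumn];
        List<String> triples = new ArrayList<>();
        for (Map.Entry<Integer, String> selector : selectors.entrySet()) {
            String value = row[selector.getKey()];
            // coeus:property may list several predicates separated by '|'
            for (String predicate : selector.getValue().split("\\|")) {
                triples.add(subject + " " + predicate + " \"" + value + "\" .");
            }
        }
        return triples;
    }

    public static void main(String[] args) {
        Map<Integer, String> selectors = new LinkedHashMap<>();
        selectors.put(0, "dc:identifier");   // like csv_UniProt_entry
        selectors.put(1, "dc:title");        // like csv_UniProt_entryname
        selectors.put(3, "dc:description");  // like csv_UniProt_proteinname
        String[] row = "P51587\tBRCA2_HUMAN\treviewed\tBreast cancer type 2 susceptibility protein"
                .split("\t");
        triplify(row, "coeus:uniprot_", 0, selectors).forEach(System.out::println);
    }
}
```

A LinkedHashMap keeps the selectors in declaration order, so the generated triples follow the configuration order.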
One key feature of COEUS (and of the Semantic Web in general) is the freedom to use any ontology internally. Hence, we extend the COEUS ontology with well-known properties from external ontologies.
Property | Description | Mandatory | Domain | Range | Sample |
---|---|---|---|---|---|
coeus:endpoint | Defines a Resource endpoint. | YES | Resource | String | file:///coeus.xml |
coeus:extension | Defines the source element for a Resource extension query (where the data comes from). | NO | Resource | String | rdfs:label |
coeus:line | Starting line for CSV resource import. | NO | Resource | String | 2 |
coeus:loaded | Defines whether a Seed, Entity or Concept has been built. | NO | Seed, Entity, Concept | Boolean | false |
coeus:method | Defines a Resource integration method. | YES | Resource | String | cache |
coeus:order | Defines the Resource integration order (ascending). | NO | Resource | Integer | 0 (first) |
coeus:property | Property selector defining the predicates into which integrated data will be loaded. Multiple predicates can be separated with the pipe (\|) delimiter. | YES | CSV, XML, SQL, SPARQL | String | dc:title |
coeus:query | Query selector defining the queries applied to integrated resources for data translation. | YES | Resource, CSV, XML, SQL, SPARQL | String | //item |
coeus:regex | Defines the regular expression used to extract the Item individual's identification token from integrated resources. | NO | CSV, XML, SQL, SPARQL | String | [0-9]{5,} |
dc:publisher | Defines the Resource connector type: CSV, XML, JSON, RDF, TTL, SQL or SPARQL. The selectors' configuration will be loaded according to this value. | YES | Resource | String | sql |
dc:title | All individuals must have a valid title. | YES | Any | String | COEUS |
rdfs:comment | All individuals must have a valid comment/description. | YES | Any | String | COEUS is a new semantic web framework. |
rdfs:label | All individuals must have a valid label. | YES | Any | String | seed_COEUS |
Property | Description | Mandatory | Domain | Range | Sample |
---|---|---|---|---|---|
coeus:extends | A Resource extends a Concept. This means that the subject Resource will select data for processing from the Concept's Item individuals. | YES | Resource | Concept | coeus:resource_UniProt coeus:extends coeus:concept_UniProt |
coeus:hasConcept | An Item individual belongs to (has) a given Concept. | AUTO | Item | Concept | coeus:uniprot_P51587 coeus:hasConcept coeus:concept_UniProt |
coeus:hasEntity | A Concept belongs to (has) an Entity. | YES | Concept | Entity | coeus:concept_UniProt coeus:hasEntity coeus:entity_Protein |
coeus:hasKey | A Resource has a given key selector. The key selector is used to generate unique Item individual identifiers (i.e. URIs) from the integrated resources. | YES | Resource | CSV, XML, SQL, SPARQL | coeus:resource_UniProt coeus:hasKey coeus:csv_UniProt_id |
coeus:hasResource | A Concept is related to (has) a Resource. This defines the Resource connector individual that loads the data (generating Items) for a given Concept. | YES | Concept | Resource | coeus:concept_UniProt coeus:hasResource coeus:resource_UniProt |
coeus:includes | Defines which Entity individuals are included in Seed individuals. | YES | Seed | Entity | coeus:seed_COEUS coeus:includes coeus:entity_Protein |
coeus:isAssociatedTo | Default association between two Item individuals loaded through Resource extensions (in the dependency graph). | AUTO | Item | Item | coeus:uniprot_P51587 coeus:isAssociatedTo coeus:hgnc_BRCA2 |
coeus:isConceptOf | A Concept individual is the Concept of multiple Item individuals. | AUTO | Concept | Item | coeus:concept_HGNC coeus:isConceptOf coeus:hgnc_BRCA2 |
coeus:isConnectedTo | Default association between two Concept individuals. If two Concepts are connected, all their child Items will be connected. | NO | Concept | Concept | coeus:concept_UniProt coeus:isConnectedTo coeus:concept_HGNC |
coeus:isEntityOf | An Entity individual is the Entity of one or more Concept individuals. | YES | Entity | Concept | coeus:entity_Protein coeus:isEntityOf coeus:concept_UniProt |
coeus:isExtendedBy | A Concept individual is extended by one or more Resource individuals (the connectors that load the data for the Concept). | YES | Concept | Resource | coeus:concept_UniProt coeus:isExtendedBy coeus:resource_HGNC |
coeus:isIncludedIn | An Entity individual is included in a Seed. | YES | Entity | Seed | coeus:entity_Protein coeus:isIncludedIn coeus:seed_COEUS |
coeus:isKeyOf | A selector individual acts as the loading key for generating unique Item individual identifiers (URIs) for a given Resource. | YES | CSV, XML, SQL, SPARQL | Resource | coeus:csv_UniProt_id coeus:isKeyOf coeus:resource_UniProt |
coeus:isResourceOf | A Resource individual is the Resource connector for the given Concept individual. | YES | Resource | Concept | coeus:resource_UniProt coeus:isResourceOf coeus:concept_UniProt |
coeus:loadsFor | A CSV, XML, SQL or SPARQL connector loads data for a Resource. | YES | CSV, XML, SQL, SPARQL | Resource | coeus:csv_UniProt_id coeus:loadsFor coeus:resource_UniProt |
coeus:loadsFrom | A Resource loads data from a CSV, XML, SQL or SPARQL connector. | YES | Resource | CSV, XML, SQL, SPARQL | coeus:resource_UniProt coeus:loadsFrom coeus:csv_UniProt_id |
[1] If no query is provided, some common delimiters (such as '\t', ';', ...) are tested, using the default values for the quote delimiter ('"') and for the number of header lines to skip (1).
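The delimiter probing described in this footnote could look roughly like the Java heuristic below, which keeps the candidate that splits every sampled line into the same, largest number of fields while ignoring delimiters inside double quotes. The candidate list, the scoring rule and the class name are assumptions for illustration, not COEUS' actual detection code.

```java
import java.util.List;

/** Hypothetical delimiter-guessing heuristic; COEUS' own logic may differ. */
final class CsvDelimiterSketch {

    // Candidate delimiters, tried in this order (assumed list)
    private static final char[] CANDIDATES = {'\t', ';', ','};

    /** Picks the candidate giving every sampled line the same, largest field count. */
    static char guess(List<String> sampleLines) {
        char best = ',';
        int bestFields = 1;
        for (char candidate : CANDIDATES) {
            int fields = -1;
            boolean consistent = true;
            for (String line : sampleLines) {
                int count = countFields(line, candidate);
                if (fields == -1) {
                    fields = count;
                } else if (count != fields) {
                    consistent = false;
                    break;
                }
            }
            if (consistent && fields > bestFields) {
                best = candidate;
                bestFields = fields;
            }
        }
        return best;
    }

    /** Counts fields, ignoring delimiters inside double-quoted values. */
    static int countFields(String line, char delimiter) {
        int fields = 1;
        boolean quoted = false;
        for (char c : line.toCharArray()) {
            if (c == '"') {
                quoted = !quoted;
            } else if (c == delimiter && !quoted) {
                fields++;
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("Entry\tEntry name\tStatus", "P51587\tBRCA2_HUMAN\treviewed");
        System.out.println(guess(lines) == '\t' ? "tab" : "other");
    }
}
```

Requiring the same field count on every sampled line avoids picking a delimiter that merely happens to appear inside one value.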
How are your Javadoc reading skills?
The Java documentation is pretty self-explanatory. To interact with the knowledge base, use the API class to add new triples and perform SPARQL queries directly.
/* Invoke getTriple(...); */ pt.ua.bioinformatics.api.API.getTriple("coeus:uniprot_P51587", "p", "o", "xml");
/* Invoke select(...); */ pt.ua.bioinformatics.api.API.select("SELECT ...", "js", false);
/* Invoke addStatement(...); */ pt.ua.bioinformatics.api.API.addStatement(Boot.getAPI().createResource(PrefixFactory.decode(sub)), Predicate.get(pred), Boot.getAPI().createResource(PrefixFactory.decode(obj)));
COEUS includes multiple factories to perform quick transformations between URIs, concepts, strings, prefixes, etc. We use these static methods throughout the entire framework, facilitating string-based data conversions.
The prefix factory is a utility class for prefix information and transformations. This class stores an internal prefix map, enabling quick composition or decomposition of full object URIs.
/* Invoke getURIForPrefix(...); */ pt.ua.bioinformatics.api.PrefixFactory.getURIForPrefix("rdfs");
/* Invoke encode(...); */ pt.ua.bioinformatics.api.PrefixFactory.encode("http://bioinformatics.ua.pt/coeus/resource/Item");
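To illustrate what composing and decomposing URIs with a prefix map involves, here is a hypothetical Java sketch; it is not PrefixFactory's real code, and the class and method names are ours. The namespace values are copied from the prefixes section of config.js, and unknown namespaces or prefixes are returned unchanged, which is our own assumed behaviour.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical prefix map in the spirit of PrefixFactory; not its real code. */
final class PrefixMapSketch {

    private final Map<String, String> prefixes = new LinkedHashMap<>();

    PrefixMapSketch() {
        // Values copied from the "prefixes" section of config.js
        prefixes.put("coeus", "http://bioinformatics.ua.pt/coeus/resource/");
        prefixes.put("rdfs", "http://www.w3.org/2000/01/rdf-schema#");
        prefixes.put("dc", "http://purl.org/dc/elements/1.1/");
    }

    /** Full URI to prefixed form, e.g. coeus:Item; unknown namespaces are left as-is. */
    String encode(String uri) {
        for (Map.Entry<String, String> e : prefixes.entrySet()) {
            if (uri.startsWith(e.getValue())) {
                return e.getKey() + ":" + uri.substring(e.getValue().length());
            }
        }
        return uri;
    }

    /** Prefixed form back to the full URI; unknown prefixes are left as-is. */
    String decode(String prefixed) {
        int colon = prefixed.indexOf(':');
        if (colon < 0) {
            return prefixed;
        }
        String namespace = prefixes.get(prefixed.substring(0, colon));
        return namespace == null ? prefixed : namespace + prefixed.substring(colon + 1);
    }

    public static void main(String[] args) {
        PrefixMapSketch p = new PrefixMapSketch();
        System.out.println(p.encode("http://bioinformatics.ua.pt/coeus/resource/Item"));
        System.out.println(p.decode("rdfs:label"));
    }
}
```

Because "http" is not a registered prefix, decode leaves full URIs untouched, so the method can be applied safely to mixed input.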
The item factory is a utility class for Item transformation tasks, such as getting an identifier from the full individual item label.
/* Invoke getTokenFromItem(...); */ pt.ua.bioinformatics.api.ItemFactory.getTokenFromItem("http://bioinformatics.ua.pt/coeus/resource/omim_143100");
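A hypothetical Java approximation of what getTokenFromItem is described as doing: it takes the local name after the last '/' (or after the prefix colon for prefixed names) and returns the part after the first underscore. The splitting rule is our assumption, inferred from item names such as omim_143100 and go_GO:0033593; the real method may behave differently.

```java
/** Hypothetical re-implementation of the idea behind getTokenFromItem. */
final class ItemTokenSketch {

    /** Returns the part of the item's local name after the first underscore. */
    static String tokenFromItem(String item) {
        int slash = item.lastIndexOf('/');
        // Full URIs: local name after the last '/'; prefixed names: after the first ':'
        String localName = slash >= 0
                ? item.substring(slash + 1)
                : item.substring(item.indexOf(':') + 1);
        int underscore = localName.indexOf('_');
        // Items without an underscore are returned whole
        return underscore >= 0 ? localName.substring(underscore + 1) : localName;
    }

    public static void main(String[] args) {
        System.out.println(tokenFromItem("http://bioinformatics.ua.pt/coeus/resource/omim_143100"));
        System.out.println(tokenFromItem("coeus:go_GO:0033593"));
    }
}
```

Splitting on the first underscore (rather than the last colon) keeps tokens such as GO:0033593 intact.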
To access all triples in COEUS Semantic Storage, you can combine subject, predicate and object wildcards to retrieve data iteratively. The wildcards' usage is highlighted in the following table.
Element | Description | Sample |
---|---|---|
subject | The existing or matching subject. | coeus:uniprot_P51582 |
predicate | The existing or matching predicate. | coeus:isAssociatedTo |
object | The existing or matching object. | obj |
format | The output format. | csv |
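The endpoint in the JSON resource example (.../api/triple/diseasecard:uniprot_P51587/coeus:isAssociatedTo/obj) suggests how these elements combine into a read URL. The Java sketch below assumes that the output format is appended as a final path segment and that unbound elements are passed as wildcard tokens such as obj (the table's object sample); both points, the localhost base URL and the helper name are assumptions.

```java
import java.net.URI;

/** Hypothetical helper for the triple read API; the path layout is an assumption. */
final class TripleApiSketch {

    /**
     * Builds {base}/api/triple/{subject}/{predicate}/{object}/{format}.
     * Pass a wildcard token (e.g. "obj", as in the table's sample) for an unbound element.
     */
    static URI readUri(String base, String subject, String predicate, String object, String format) {
        String path = String.join("/", "api", "triple", subject, predicate, object, format);
        return URI.create(base.endsWith("/") ? base + path : base + "/" + path);
    }

    public static void main(String[] args) {
        System.out.println(readUri("http://localhost:8080/coeus",
                "coeus:uniprot_P51582", "coeus:isAssociatedTo", "obj", "csv"));
    }
}
```

Prefixed names such as coeus:uniprot_P51582 can go into the path as-is, because colons are legal inside URI path segments.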
Some data output examples are:
To delete triples in COEUS Semantic Storage, you must combine the subject, predicate and object wildcards to remove the data iteratively. The delete API URL structure is:
Element | Description | Sample |
---|---|---|
API key | Value for the seed access API key (defined in config.js). | coeus |
subject | The existing subject. | coeus:uniprot_P51582 |
predicate | The existing predicate. | coeus:isAssociatedTo |
object | The existing object. | coeus:go_GO:0033593 |
Some examples are:
The COEUS write API provides a simple URL to add new triples to a seed's knowledge base. Invoking it repeatedly also makes it possible to write sets of triples. The write API URL structure is:
Element | Description | Sample |
---|---|---|
API key | Value for the seed access API key (defined in config.js). | coeus |
subject | The subject to write. | coeus:uniprot_P51582 |
predicate | The predicate to write. | coeus:isAssociatedTo |
object | The object to write. | coeus:go_GO:0033593 |
Some examples are:
The COEUS update API provides a straightforward URL to update existing triples in the knowledge base. The main difference from the delete and write API calls is that the new object is added after the existing one, separated by a comma:
Element | Description | Sample |
---|---|---|
API key | Value for the seed access API key (defined in config.js). | coeus |
subject | The existing subject. | coeus:uniprot_P51582 |
predicate | The existing predicate. | coeus:isAssociatedTo |
old_object,new_object | Combination of the existing object (old_object) and the new one (new_object). | coeus:go_GO:0033593, coeus:pdb_1N0W |
Some examples include:
For typed literals, prepend the XSD datatype (xsd:int, xsd:string, ...) to the literal:
The write/delete/update REST API returns a JSON object with the server response. The status field of that object contains a numeric value indicating the result of the operation.
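A client usually only needs the status field of that response. The hypothetical Java sketch below extracts it with a regular expression, which stands in for a proper JSON parser to keep the example dependency-free; the sample response body is made up, and the meaning of each status value is not documented here.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: reads the numeric "status" field from an API response. */
final class ApiStatusSketch {

    // Matches "status": <integer>; a regex stands in for a real JSON parser here
    private static final Pattern STATUS = Pattern.compile("\"status\"\\s*:\\s*(-?\\d+)");

    static OptionalInt status(String json) {
        Matcher m = STATUS.matcher(json);
        return m.find() ? OptionalInt.of(Integer.parseInt(m.group(1))) : OptionalInt.empty();
    }

    public static void main(String[] args) {
        // Made-up response body, for illustration only
        System.out.println(status("{\"status\": 100}"));
    }
}
```

Returning OptionalInt makes a missing status field explicit instead of defaulting to a value that could be mistaken for a real code.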
All data collected in a COEUS instance can be accessed through a SPARQL endpoint, taking advantage of SPARQL's advanced querying features.
By default, the endpoint is located at /sparql.
PREFIX coeus: <http://bioinformatics.ua.pt/coeus/resource/> SELECT ?p ?o {coeus:uniprot_P51587 ?p ?o}
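Because the endpoint speaks SPARQL, any HTTP client can query it. The sketch below uses Java 11's java.net.http client to send the sample query as a SPARQL 1.1 Protocol GET request with a query parameter. The base URL (localhost:8080 with the default /coeus/ application path) is an assumption, as is whether the seed honours the Accept header for JSON results; the class name is hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Sketch of a SPARQL 1.1 Protocol GET request against a seed's /sparql endpoint. */
final class SparqlGetSketch {

    static HttpRequest buildRequest(String endpoint, String query) {
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(endpoint + "?query=" + encoded))
                // Standard media type for SPARQL JSON results
                .header("Accept", "application/sparql-results+json")
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        String query = "PREFIX coeus: <http://bioinformatics.ua.pt/coeus/resource/> "
                + "SELECT ?p ?o {coeus:uniprot_P51587 ?p ?o}";
        // Assumed local deployment on the default /coeus/ application path
        HttpRequest request = buildRequest("http://localhost:8080/coeus/sparql", query);
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```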
With the SPARQL endpoint online, querying multiple distributed COEUS instances is a straightforward process. Moreover, additional knowledge bases with public SPARQL endpoints can also be put into the mix, providing a holistic perspective over a distributed knowledge network.

    PREFIX coeus: <http://bioinformatics.ua.pt/coeus/resource/>
    PREFIX dc: <http://purl.org/dc/elements/1.1/>
    PREFIX diseasome: <http://www4.wiwiss.fu-berlin.de/diseasome/resource/diseasome/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?pdb WHERE {
      {
        SERVICE <http://www4.wiwiss.fu-berlin.de/diseasome/sparql> {
          <http://www4.wiwiss.fu-berlin.de/diseasome/resource/genes/BRCA2> rdfs:label ?label
        }
      }
      {
        SERVICE <http://bioinformatics.ua.pt/coeus/sparql> {
          _:gene dc:title ?label.
          _:gene coeus:isAssociatedTo ?uniprot
        }
      }
      {
        SERVICE <http://bioinformatics.ua.pt/coeus/sparql> {
          ?uniprot coeus:isAssociatedTo ?pdb.
          ?pdb coeus:hasConcept coeus:concept_PDB
        }
      }
    }
COEUS publishes all data following Linked Data patterns and guidelines by default. With Pubby included in all COEUS seeds, data are easily available to external applications.
Using the ../resource/* pattern, you can access object data in web browsers or RDF browsers.
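Programs can dereference the same ../resource/* URIs and ask for RDF instead of HTML through content negotiation. The Java sketch below requests RDF/XML and lets the client follow redirects, since Linked Data front ends such as Pubby commonly answer with a 303 redirect; the exact redirect behaviour of a given seed, and whether the sample URI is still online, are assumptions. The URI combines the coeus: namespace with the uniprot_P51587 item used in the SPARQL example, and the class name is hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Sketch: dereferencing a seed's ../resource/* URI as RDF rather than HTML. */
final class LinkedDataSketch {

    static HttpRequest rdfRequest(String resourceUri) {
        return HttpRequest.newBuilder(URI.create(resourceUri))
                // Ask for RDF/XML instead of the HTML page shown to web browsers
                .header("Accept", "application/rdf+xml")
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                // Follow the (assumed) 303 redirect to the data document
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpResponse<String> response = client.send(
                rdfRequest("http://bioinformatics.ua.pt/coeus/resource/uniprot_P51587"),
                HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```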
Easily perform SPARQL queries against your COEUS-generated endpoint with this new library.
Check the documentation and the library at ../assets/js/coeus.sparql.js.
The COEUS Write/Update/Delete API can be easily accessed in JavaScript.
Check the documentation and sample at ../assets/js/coeus.api.js.
COEUS can now share your data in the Nanopublication format. With this new plugin, you can transform your integrated data into this prominent format by following these steps:
December 9-12, 2013 Edinburgh, United Kingdom
In this tutorial/hands-on session, we will guide you through the process of creating your own custom COEUS knowledge base. You will learn how to:
In this first tutorial, we will build a semantic knowledge base aggregating data from multiple sources. The tutorial configuration and datasets are provided
in the default COEUS package. These include two examples: 1) COEUS News Aggregator (newsaggregator) and 2) COEUS Protein Integrator (proteinator).
In the first example, COEUS aggregates data from multiple news sources, which are available in RSS/XML format, making
them fairly easy to load and process. Moreover, most online news outlets provide access to all their news through an RSS feed.
For the second example, COEUS imports data from multiple protein-related resources, creating our targeted proteomics knowledge base. Protein entries are created for
resources from UniProt, PDB, Prosite and InterPro. UniProt entries are also associated with Gene Ontology terms and HGNC genes.
Since most of the tasks are very similar for both problems, this tutorial highlights the places where the options differ between the news aggregator and the proteinator.
To create these applications, we will proceed as follows:
To launch our new seed, we start by downloading the COEUS package from GitHub or checking out the source code into a local installation. Further
information about what is included in COEUS' package can be found in the downloads section of this documentation.
For simplicity, COEUS is provided as a NetBeans web project ready to be opened. Hence, we just open the folder in our local NetBeans installation, set up the correct
library references and select the instance server. By default, the new COEUS seed runs on the /coeus/ application path, but we can easily change it in the project properties.
To finish our initial COEUS setup, we just need to create a new database and a database user for Jena to use as the triplestore backend. In this case, we will create a new database
called coeus and a new user called demo (with password demo) with enough permissions to read and write in the database.
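The database step can be scripted. The Java sketch below assumes a MySQL server on localhost (MySQL is also what the SQL connector example uses) and builds the three statements for the coeus database and the demo user; run with administrator credentials as arguments, it executes them over JDBC, which requires the MySQL JDBC driver on the classpath. The MySQL syntax, the 'localhost' host restriction and granting all privileges on the database (SDB also needs to create its tables) are our assumptions, and the statements are built by string concatenation, so they are only suitable for fixed values like these.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

/** Sketch of the database setup step, assuming a MySQL server on localhost. */
final class DbSetupSketch {

    /** MySQL statements creating the database and a user limited to it. */
    static List<String> statements(String database, String user, String password) {
        return List.of(
                "CREATE DATABASE " + database,
                "CREATE USER '" + user + "'@'localhost' IDENTIFIED BY '" + password + "'",
                "GRANT ALL PRIVILEGES ON " + database + ".* TO '" + user + "'@'localhost'");
    }

    /** Usage: java DbSetupSketch <admin-user> <admin-password>; without arguments it only prints. */
    public static void main(String[] args) throws SQLException {
        if (args.length < 2) {
            statements("coeus", "demo", "demo").forEach(s -> System.out.println(s + ";"));
            return;
        }
        try (Connection c = DriverManager.getConnection("jdbc:mysql://localhost:3306/", args[0], args[1]);
             Statement st = c.createStatement()) {
            for (String sql : statements("coeus", "demo", "demo")) {
                st.executeUpdate(sql);
            }
        }
    }
}
```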
Customizing the seed configuration is the most cumbersome task in COEUS deployment. The seed configuration defines the internal knowledge base structure, the application model,
the external resources being loaded and how the heterogeneous data are integrated. To simplify this task, we advise using Protege.
A sample seed configuration file is provided for each scenario. These scenarios are further detailed next.
src/java/newsaggregator/setup.rdf
src/java/proteinator/setup.rdf
The first step is to define how our news integration model maps to COEUS' ontology. COEUS' ontology revolves around a tree-based structure, Entity-Concept-Item, which is used to organize our data collection in the knowledge base. For this scenario, we want a set of news items organized according to their original source (an RSS/XML feed). We will use four sports news sources: the international Reuters sports section, the British BBC sports section, the Spanish Marca newspaper and the Portuguese A Bola newspaper. To this end, we will have the following structure:

    <owl:NamedIndividual rdf:about="http://bioinformatics.ua.pt/coeus/resource/entity_News">
        <rdf:type rdf:resource="http://bioinformatics.ua.pt/coeus/resource/Entity"/>
        <rdfs:label rdf:datatype="&xsd;string">entity_news</rdfs:label>
        <dc:title rdf:datatype="&xsd;string">News</dc:title>
        <rdfs:comment rdf:datatype="&xsd;string">News entity for COEUS News Aggregator</rdfs:comment>
        <isIncludedIn rdf:resource="http://bioinformatics.ua.pt/coeus/resource/seed_coeusna"/>
    </owl:NamedIndividual>

    <owl:NamedIndividual rdf:about="http://bioinformatics.ua.pt/coeus/resource/concept_Reuters">
        <rdf:type rdf:resource="http://bioinformatics.ua.pt/coeus/resource/Concept"/>
        <rdfs:label rdf:datatype="&xsd;string">concept_reuters</rdfs:label>
        <dc:title rdf:datatype="&xsd;string">Reuters</dc:title>
        <hasEntity rdf:resource="http://bioinformatics.ua.pt/coeus/resource/entity_News"/>
        <isExtendedBy rdf:resource="http://bioinformatics.ua.pt/coeus/resource/resource_Reuters"/>
        <hasResource rdf:resource="http://bioinformatics.ua.pt/coeus/resource/resource_Reuters"/>
    </owl:NamedIndividual>

    <owl:NamedIndividual rdf:about="http://bioinformatics.ua.pt/coeus/resource/concept_BBC">
        <rdf:type rdf:resource="http://bioinformatics.ua.pt/coeus/resource/Concept"/>
        <rdfs:label rdf:datatype="&xsd;string">concept_bbc</rdfs:label>
        <dc:title rdf:datatype="&xsd;string">BBC</dc:title>
        <hasEntity rdf:resource="http://bioinformatics.ua.pt/coeus/resource/entity_News"/>
        <hasResource rdf:resource="http://bioinformatics.ua.pt/coeus/resource/resource_BBC"/>
        <isExtendedBy rdf:resource="http://bioinformatics.ua.pt/coeus/resource/resource_BBC"/>
    </owl:NamedIndividual>
Next, we define the resource connector (resource_BBC) and the associated selectors (the targeted xml_BBC_id and xml_BBC_title, and the generic xml_description) used to integrate data for the BBC concept.

    <owl:NamedIndividual rdf:about="http://bioinformatics.ua.pt/coeus/resource/resource_BBC">
        <rdf:type rdf:resource="http://bioinformatics.ua.pt/coeus/resource/Resource"/>
        <rdfs:label rdf:datatype="&xsd;string">resource_bbc</rdfs:label>
        <query rdf:datatype="&xsd;string">//item</query>
        <order rdf:datatype="&xsd;integer">1</order>
        <dc:title rdf:datatype="&xsd;string">BBC</dc:title>
        <rdfs:comment rdf:datatype="&xsd;string">Resource loader for BBC XML feeds.</rdfs:comment>
        <method rdf:datatype="&xsd;string">cache</method>
        <endpoint rdf:datatype="&xsd;string">http://feeds.bbci.co.uk/sport/0/rss.xml</endpoint>
        <dc:publisher rdf:datatype="&xsd;string">xml</dc:publisher>
        <extends rdf:resource="http://bioinformatics.ua.pt/coeus/resource/concept_BBC"/>
        <isResourceOf rdf:resource="http://bioinformatics.ua.pt/coeus/resource/concept_BBC"/>
        <hasKey rdf:resource="http://bioinformatics.ua.pt/coeus/resource/xml_BBC_id"/>
        <loadsFrom rdf:resource="http://bioinformatics.ua.pt/coeus/resource/xml_BBC_id"/>
        <loadsFrom rdf:resource="http://bioinformatics.ua.pt/coeus/resource/xml_BBC_title"/>
        <loadsFrom rdf:resource="http://bioinformatics.ua.pt/coeus/resource/xml_description"/>
        <loadsFrom rdf:resource="http://bioinformatics.ua.pt/coeus/resource/xml_link"/>
        <loadsFrom rdf:resource="http://bioinformatics.ua.pt/coeus/resource/xml_date"/>
    </owl:NamedIndividual>

    <owl:NamedIndividual rdf:about="http://bioinformatics.ua.pt/coeus/resource/xml_BBC_id">
        <rdf:type rdf:resource="http://bioinformatics.ua.pt/coeus/resource/XML"/>
        <rdfs:label rdf:datatype="&xsd;string">xml_bbc_id</rdfs:label>
        <dc:title rdf:datatype="&xsd;string">BBC identifier</dc:title>
        <regex rdf:datatype="&xsd;string">[0-9]{5,}</regex>
        <property rdf:datatype="&xsd;string">dc:identifier</property>
        <query rdf:datatype="&xsd;string">guid</query>
        <loadsFor rdf:resource="http://bioinformatics.ua.pt/coeus/resource/resource_BBC"/>
        <isKeyOf rdf:resource="http://bioinformatics.ua.pt/coeus/resource/resource_BBC"/>
    </owl:NamedIndividual>

    <owl:NamedIndividual rdf:about="http://bioinformatics.ua.pt/coeus/resource/xml_BBC_title">
        <rdf:type rdf:resource="http://bioinformatics.ua.pt/coeus/resource/XML"/>
        <rdfs:label rdf:datatype="&xsd;string">xml_bbc_title</rdfs:label>
        <dc:title rdf:datatype="&xsd;string">BBC entry title</dc:title>
        <property rdf:datatype="&xsd;string">dc:title</property>
        <query rdf:datatype="&xsd;string">title</query>
        <loadsFor rdf:resource="http://bioinformatics.ua.pt/coeus/resource/resource_BBC"/>
    </owl:NamedIndividual>

    <owl:NamedIndividual rdf:about="http://bioinformatics.ua.pt/coeus/resource/xml_description">
        <rdf:type rdf:resource="http://bioinformatics.ua.pt/coeus/resource/XML"/>
        <rdfs:label rdf:datatype="&xsd;string">xml_description</rdfs:label>
        <property rdf:datatype="&xsd;string">dc:description</property>
        <query rdf:datatype="&xsd;string">description</query>
        <dc:title rdf:datatype="&xsd;string">entry description</dc:title>
        <loadsFor rdf:resource="http://bioinformatics.ua.pt/coeus/resource/resource_ABola"/>
        <loadsFor rdf:resource="http://bioinformatics.ua.pt/coeus/resource/resource_BBC"/>
        <loadsFor rdf:resource="http://bioinformatics.ua.pt/coeus/resource/resource_Marca"/>
        <loadsFor rdf:resource="http://bioinformatics.ua.pt/coeus/resource/resource_Reuters"/>
    </owl:NamedIndividual>
With similar models defined for all the resources that will be integrated, we are now ready to configure our seed and prepare it for deployment.
With the project code and the database setup in place, we can customize the configuration for each of the components provided in COEUS' package. We need to provide custom configurations for:
Despite the Semantic Web's "reuse instead of rewrite" motto, more complex scenarios require us to create our own ontologies to deal with all the specifics of the new systems we are developing.
This is not needed for these tutorials, where we will reuse existing ontologies such as Dublin Core or the Resource Description Framework Schema. Furthermore, the COEUS ontology includes a broad set of object and data properties to further enhance our data integration efforts.
For now, we will stick to using COEUS' ontology, available at http://bioinformatics.ua.pt/coeus/ontology/.
The seed integration setup file was configured previously in Protégé, defining the internal seed structure, the external resources to load, and the set of connectors and selectors for each integrated concept.
As mentioned, the setup files for each example are as follows.
src/java/newsaggregator/setup.rdf
src/java/proteinator/setup.rdf
The seed configuration file, src/config.js, stores the main application properties, which are used throughout the entire seed workflow.
{
    "config": {
        "name": "coeus.NA",
        "description": "COEUS News Aggregator",
        "keyprefix": "coeus",
        "version": "1.0a",
        "ontology": "http://bioinformatics.ua.pt/coeus/ontology/",
        "setup": "newsaggregator/setup.rdf",
        "sdb": "newsaggregator/sdb.ttl",
        "predicates": "newsaggregator/predicates.csv",
        "built": false,
        "debug": true,
        "environment": "production"
    },
    "prefixes": {
        "coeus": "http://bioinformatics.ua.pt/coeus/resource/",
        "owl2xml": "http://www.w3.org/2006/12/owl2-xml#",
        "xsd": "http://www.w3.org/2001/XMLSchema#",
        "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
        "owl": "http://www.w3.org/2002/07/owl#",
        "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "dc": "http://purl.org/dc/elements/1.1/"
    }
}
The COEUS package includes two sample configuration files, one for each tutorial.
src/java/newsaggregator/config.js
src/java/proteinator/config.js
Property | Description | Sample |
---|---|---|
config.name | The seed default name. | proteinator |
config.description | A sample seed description. | COEUS Protein Data Aggregator (Proteinator) |
config.keyprefix | The default prefix used throughout the seed knowledge base. Should be set to the seed ontology prefix. | coeus |
config.version | The application version. | 1.0a |
config.ontology | Valid URI for the base seed ontology location. | http://bioinformatics.ua.pt/coeus/ontology/ |
config.setup | Seed setup file location (relative to project base). | proteinator/setup.rdf |
config.sdb | Jena SDB configuration file base location (relative to project base). The working environment (config.environment) is appended to this filename. | proteinator/sdb.ttl |
config.predicates | Predicates file location (relative to project base). The predicates file is a single text file listing all the predicates used in the COEUS seed (one per line). | proteinator/predicates.csv |
config.apikey | Defines the valid API keys for client applications. API keys are plain strings, delimited by \|. They are used in services with write access to the knowledge base, to prevent abuse. Using * in this property validates all values. | coeus\|sdjkfhs8374 |
config.built | Defines whether the seed has been built (must be set to true once the knowledge base has been populated). | true |
config.debug | Defines whether debugging mode is on. With debug set to true, the application output is more verbose. | true |
config.environment | Sets the working environment. It is appended to the SDB configuration file location (with _), which allows for multiple environment settings: production, testing... | production |
prefixes | Defines the list of ontology prefixes used in the seed. | See above |
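The config.apikey semantics can be illustrated with a small helper. It splits the property on | and treats * as a wildcard that accepts any key. This is a sketch of the behaviour described in the table, not COEUS' own validation code. The class name ApiKeySketch is hypothetical, and so is the assumption that * may appear as one of several delimited entries.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper illustrating config.apikey; not part of COEUS.
public class ApiKeySketch {

    private final Set<String> validKeys;

    public ApiKeySketch(String apikeyProperty) {
        // '|' is a regex metacharacter, so it must be escaped for split().
        this.validKeys = Arrays.stream(apikeyProperty.split("\\|"))
                .map(String::trim)
                .filter(k -> !k.isEmpty())
                .collect(Collectors.toSet());
    }

    /** Returns true if the client-supplied key is accepted. */
    public boolean isValid(String clientKey) {
        if (validKeys.contains("*")) {
            return true; // wildcard: every value is accepted
        }
        return clientKey != null && validKeys.contains(clientKey);
    }

    public static void main(String[] args) {
        ApiKeySketch keys = new ApiKeySketch("coeus|sdjkfhs8374");
        System.out.println(keys.isValid("coeus"));                      // true
        System.out.println(keys.isValid("sdjkfhs8374"));                // true
        System.out.println(keys.isValid("other"));                      // false
        System.out.println(new ApiKeySketch("*").isValid("anything")); // true
    }
}
```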
Web application settings configure how the seed runs on our Tomcat server. The more complex definitions are already in place; to launch a new seed, we just need to customize the application description and the locations of the Joseki and Pubby configuration files.
The following table details the properties that can be configured.
Property | Description | Sample |
---|---|---|
<description> | Application description for Tomcat server. | COEUS: Semantic Web Application Framework |
<display-name> | Application name for Tomcat server. | COEUS |
<servlet> org.joseki.rdfserver.config | Joseki configuration file location (relative to server base). | proteinator/joseki.ttl |
<context-param> config-file | Pubby configuration file location (relative to source base). | classes/proteinator/pubby.ttl |
The sample SDB, Joseki and Pubby configuration files for each tutorial are:
src/java/newsaggregator/sdb_production.ttl
src/java/proteinator/sdb_production.ttl
src/java/newsaggregator/joseki.ttl
src/java/proteinator/joseki.ttl
src/java/newsaggregator/pubby.ttl
src/java/proteinator/pubby.ttl
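The environment-specific SDB file names follow from the config.sdb and config.environment properties: newsaggregator/sdb.ttl combined with the production environment yields newsaggregator/sdb_production.ttl. The helper below reproduces that naming rule. It assumes, based only on the listed file names, that _environment is inserted before the file extension. SdbPathSketch is a hypothetical name, not a COEUS class.

```java
// Hypothetical helper reproducing the SDB file naming rule; not part of COEUS.
public class SdbPathSketch {

    /** e.g. ("newsaggregator/sdb.ttl", "production") -> "newsaggregator/sdb_production.ttl" */
    public static String sdbFileFor(String sdb, String environment) {
        int dot = sdb.lastIndexOf('.');
        int slash = sdb.lastIndexOf('/');
        if (dot <= slash) {
            // No extension in the file name itself: append the suffix.
            return sdb + "_" + environment;
        }
        // Insert "_<environment>" just before the extension.
        return sdb.substring(0, dot) + "_" + environment + sdb.substring(dot);
    }

    public static void main(String[] args) {
        System.out.println(sdbFileFor("newsaggregator/sdb.ttl", "production"));
        // prints newsaggregator/sdb_production.ttl
    }
}
```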
With all the configuration files set up, we are now ready to start importing data into our own knowledge base. This is an automated process: we simply need to execute a single Java method.
The main process will boot the system and, if the application is not yet built, load the data into the knowledge base.
The application startup/loading process works as follows:
config.js
pt.ua.bioinformatics.coeus.common package.
// Start build process
Boot.start();
// Import single resource (threaded) example
SingleImport single = new SingleImport("resource_go");
Thread t = new Thread(single);
t.start();
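Because SingleImport is handed to new Thread(...), it behaves as a Runnable. When several resources are imported at once, a common Java pattern is to submit each task to an ExecutorService and wait for all of them to finish before marking the seed as built. The sketch below shows that pattern. FakeImport is a hypothetical stand-in for SingleImport, and resource_hgnc is an assumed resource name used only for illustration.

```java
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of running several imports in parallel; not COEUS code.
public class ParallelImportSketch {

    /** Stand-in for SingleImport: only records which resource it "imported". */
    static class FakeImport implements Runnable {
        private final String resource;
        private final ConcurrentLinkedQueue<String> done;

        FakeImport(String resource, ConcurrentLinkedQueue<String> done) {
            this.resource = resource;
            this.done = done;
        }

        @Override
        public void run() {
            // A real importer would load the resource into the knowledge base here.
            done.add(resource);
        }
    }

    /** Runs one task per resource and waits until all of them have finished. */
    public static List<String> importAll(List<String> resources) throws InterruptedException {
        ConcurrentLinkedQueue<String> done = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (String r : resources) {
            pool.submit(new FakeImport(r, done));
        }
        pool.shutdown(); // no new tasks; already submitted ones keep running
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            pool.shutdownNow(); // give up on tasks that are still running
        }
        return List.copyOf(done);
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> done = importAll(List.of("resource_go", "resource_hgnc"));
        System.out.println(done.size() + " resources imported"); // prints "2 resources imported"
    }
}
```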
Once the data are completely loaded into the seed, we just need to update the application settings and deploy the seed on the Tomcat server.
To set the application to server mode, the following config.js configuration properties must be changed:
config.built: set to true, since the knowledge base is now populated
config.debug
With these changes, the application is ready for deployment.
With our COEUS seed online we can use any of the API methods to access integrated data.
Next, we have a quick SPARQL query that lists the integrated items, with their titles and descriptions, from the News Aggregator seed:
PREFIX coeus: <http://bioinformatics.ua.pt/coeus/resource/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?item ?title ?description
WHERE {
    ?item a coeus:Item .
    ?item dc:title ?title .
    ?item dc:description ?description .
    ?item dc:date ?date
}
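Queries like this one can be sent to the seed's SPARQL endpoint over plain HTTP, following the standard SPARQL protocol: a GET request with a URL-encoded query parameter. The Java 11+ sketch below uses java.net.http.HttpClient. The endpoint URL http://localhost:8080/coeus/sparql is a hypothetical placeholder, so pass your seed's actual endpoint as the first argument.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical SPARQL client sketch; the endpoint URL is an assumption.
public class SparqlClientSketch {

    /** Builds a SPARQL protocol GET URI: endpoint?query=<url-encoded query>. */
    public static URI buildQueryUri(String endpoint, String query) {
        return URI.create(endpoint + "?query=" + URLEncoder.encode(query, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        // Placeholder endpoint; replace with your seed's SPARQL endpoint.
        String endpoint = args.length > 0 ? args[0] : "http://localhost:8080/coeus/sparql";
        String query = "PREFIX coeus: <http://bioinformatics.ua.pt/coeus/resource/> "
                + "PREFIX dc: <http://purl.org/dc/elements/1.1/> "
                + "SELECT ?item ?title WHERE { ?item a coeus:Item . ?item dc:title ?title } LIMIT 10";
        HttpRequest request = HttpRequest.newBuilder(buildQueryUri(endpoint, query))
                .header("Accept", "application/sparql-results+json") // ask for JSON results
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

URLEncoder applies form encoding, so spaces become +. SPARQL protocol servers generally decode the query parameter that way. If yours does not, a POST with an application/sparql-query body is an alternative.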
Or you can fork the latest stable version from GitHub.
Despite COEUS' complexity, you can start your new seed with just a few external tools.
<user name="your_name" password="your_password" roles="manager-gui,manager-script" />
COEUS' code is organised in a traditional Maven Java Web application file structure.
pom.xml: Maven configuration file with the library dependencies
src/main: Java source code and configuration files
resources: configuration files and examples
java: location for the main Java source code
webapp: web application source code
assets: location for CSS (css), JavaScript (js) and image (img) files
javadoc: generated Javadoc documentation for COEUS and its libraries
META-INF: context information for Tomcat deployment
ontology: COEUS' ontology documentation
WEB-INF: location for the web application configuration file
templates: location for Pubby templates

COEUS follows a "Semantic Web in a box" approach. This means that all the required components to kickstart a new application are available by default in the COEUS package.
Sample setup for integrating data from RSS/XML news feeds.
Configuration files are available in src/main/resources/newsaggregator.
Sample setup from multiple proteomics sources. Starting with a list of protein entries (in CSV), it loads gene data from HGNC and protein mappings to Gene Ontology, InterPro, PROSITE and PDB from UniProt's database.
This example highlights how we can combine data from multiple heterogeneous resources using a single COEUS seed configuration.
Configuration files are available in src/main/resources/proteinator.
Sample setup to test all the different connection sources (CSV, XML, JSON, RDF, TTL, SQL and SPARQL).
Configuration files are available in src/main/resources/tester.
COEUS is an ongoing open-source project at the University of Aveiro's bioinformatics group.
If you are looking for support to launch your own system, please contact us.
Private/commercial collaborations are also possible through BMD Software. Support agreements are made on a per-project basis. For a tentative budget, feel free to contact us.
COEUS: "Semantic Web in a box" for biomedical applications
Pedro Lopes & José Luís Oliveira
DOI: 10.1186/2041-1480-3-11
Notice: Please use this reference when citing COEUS in your work.
December 9-12, 2013 Edinburgh, United Kingdom
Intro
Since 2008, the SWAT4LS Workshop (http://www.swat4ls.org) has provided a platform for the presentation and discussion of the benefits and limits of applying web-based information systems and semantic technologies in the domains of health care and life sciences. The next edition of SWAT4LS will be held in Edinburgh, UK, December 9-12, 2013, preceded by tutorials and followed by a hackathon/model-a-thon. All information can be found at http://www.swat4ls.org/workshops/edinburgh2013/.
The COEUS Platform
On the tutorial day, the COEUS - Semantic Web Application Framework will be presented. This platform targets the quick creation of new biomedical applications. The framework combines the latest Semantic Web technologies with Rapid Application Development ideals to provide, in a single package, the required tools and algorithms to build a new semantic web information system from scratch.
In this tutorial/hands-on session, we will guide you through the process of creating your own custom COEUS knowledge base. You will learn how to:
December 9th, 2011 London, United Kingdom
COEUS: A Semantic Web Application Framework
Pedro Lopes & José Luís Oliveira
DOI: 10.1145/2166896.2166915
October 28th, 2011 Glasgow, Scotland
A semantic web application framework for health systems interoperability
Pedro Lopes & José Luís Oliveira
DOI: 10.1145/2064747.2064768
September 7 - 9, 2011 Graz, Austria
Towards knowledge federation in biomedical applications
Pedro Lopes & José Luís Oliveira
DOI: 10.1145/2063518.2063530
COEUS by University of Aveiro is licensed under a Creative Commons Attribution 3.0 Unported License. Based on a work at http://bioinformatics.ua.pt/coeus/.
Attribution: You must attribute the work, either through an acknowledgement/disclaimer visible on your web page or through the provided "Powered by COEUS" code.
If you wish to spread the word about COEUS, feel free to add the following code snippet to your website. Place it at the end of your body tag, along with all the other scripts. It will create an almost invisible link to COEUS' web page.
<script src="http://bioinformatics.ua.pt/coeus/assets/js/coeus.powered.js" type="text/javascript"></script>