SPARQL example query
genetic_disease_related_proteins: List all UniProt proteins annotated to be related to a genetic disease.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?uniprot ?disease ?diseaseComment ?mim
WHERE {
  GRAPH <http://sparql.uniprot.org/uniprot> {
    ?uniprot a up:Protein ;
             up:annotation ?diseaseAnnotation .
    ?diseaseAnnotation up:disease ?disease .
  }
  GRAPH <http://sparql.uniprot.org/diseases> {
    ?disease a up:Disease ;
             rdfs:comment ?diseaseComment .
    OPTIONAL {
      ?disease rdfs:seeAlso ?mim .
      ?mim up:database <http://purl.uniprot.org/database/MIM> .
    }
  }
}

mnemonic_also_known_as_id: List all UniProt protein IDs (mnemonics) for current UniProt entries.
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?uniprot ?mnemonic
WHERE {
  GRAPH <http://sparql.uniprot.org/uniprot> {
    ?uniprot a up:Protein ;
             up:mnemonic ?mnemonic .
  }
}

obsolete_mnemonic_also_known_as_id: List all UniProt protein IDs (mnemonics) that were used in the past for current UniProt entries.
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?uniprot ?obsoleteMnemonic
WHERE {
  GRAPH <http://sparql.uniprot.org/uniprot> {
    ?uniprot a up:Protein ;
             up:oldMnemonic ?obsoleteMnemonic .
  }
}

rhea_reactions_annotated_as_experimental_and_only_small_molecules: Find all Rhea reactions (involving only small molecules) that are used in UniProt, where the annotation cites a paper and is tagged as having experimental evidence.
PREFIX ECO: <http://purl.obolibrary.org/obo/ECO_>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rh: <http://rdf.rhea-db.org/>
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?rhea ?catalyzedReaction ?source
WHERE {
  {
    SELECT DISTINCT ?rhea
    WHERE {
      GRAPH <https://sparql.rhea-db.org/rhea> {
        ?rhea rdfs:subClassOf rh:Reaction .
        ?rhea rh:side/rh:contains/rh:compound ?compound2 .
        ?uc rdfs:subClassOf rh:Compound .
      }
      ?compound2 rdfs:subClassOf ?uc .
      BIND(IF(?uc = rh:SmallMolecule, 0, 1) AS ?c)
    }
    GROUP BY ?rhea
    HAVING (SUM(?c) = 0)
  }
  ?catalyzedReaction up:catalyzedReaction ?rhea .
  ?reif rdf:object ?catalyzedReaction ;
        up:attribution ?attr .
  ?attr up:evidence ECO:0000269 ;
        up:source ?source .
  ?source a up:Citation .
}

rhea_reactions_associated_with_ec_in_uniprotkb: List Rhea reactions associated with an EC (enzyme classification).
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?rhea ?EC
WHERE {
  ?CatalyticActivity up:catalyzedReaction ?rhea ;
                     up:enzymeClass ?EC .
}

rhea_reactions_not_associated_with_ec_in_uniprotkb: List Rhea reactions that are not associated with an EC (enzyme classification).
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?rhea ?EC
WHERE {
  ?CatalyticActivity up:catalyzedReaction ?rhea .
  MINUS {
    ?CatalyticActivity up:enzymeClass ?EC .
  }
}

taxonomy_hierarchy: Find all taxonomic records that describe species of the genus Homo.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?species ?genus
WHERE {
  BIND(taxon:9605 AS ?genus)
  ?species a up:Taxon ;
           up:rank up:Species ;
           rdfs:subClassOf ?genus .
  ?genus a up:Taxon ;
         up:rank up:Genus .
}

taxonomy_host: Find taxon records that are known to have part of their life cycle in other organisms (e.g. parasite, symbiont, infection).
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?virus ?host
WHERE {
  ?virus up:host ?host .
}

taxonomy_rank_and_scientific_name: Retrieve the rank and the scientific name of a taxonomic record. Not all taxonomic records have a rank associated with them.
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?taxon ?scientificName ?rank
WHERE {
  ?taxon a up:Taxon ;
         up:scientificName ?scientificName .
  OPTIONAL {
    ?taxon up:rank ?rank
  }
}

taxonomy_with_at_least_one_swissprot: Find taxon records for which at least one reviewed UniProtKB (Swiss-Prot) entry exists.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
PREFIX up: <http://purl.uniprot.org/core/>
SELECT DISTINCT ?taxid ?scientificName ?domain ?domainName
WHERE {
  ?uniprot a up:Protein .
  # reviewed entries
  ?uniprot up:reviewed true .
  ?uniprot up:organism ?taxid .
  ?taxid up:scientificName ?scientificName .
  VALUES ?domain {
    taxon:2      # bacteria
    taxon:2157   # archaea
    taxon:2759   # eukaryota
    taxon:10239  # viruses
  } .
  ?taxid rdfs:subClassOf ?domain .
}

uniparc_linked_to_active_uniprot: For a given UniParc accession, show which active UniProt entries have the same amino acid sequence.
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?uniparc ?uniprot
WHERE {
  GRAPH <http://sparql.uniprot.org/uniparc> {
    BIND(<http://purl.uniprot.org/uniparc/UPI000002DB1C> AS ?uniparc)
    ?uniparc up:sequenceFor ?uniprot .
  }
  GRAPH <http://sparql.uniprot.org/uniprot> {
    ?uniprot a up:Protein .
  }
}

uniparc_triples_directly_associated: Predicates and objects for a given UniParc accession as the subject.
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?predicate ?object
WHERE {
  <http://purl.uniprot.org/uniparc/UPI000012A0AD> ?predicate ?object
}

uniprot_affected_by_metabolic_diseases_using_MeSH: Proteins annotated in UniProtKB as affected by metabolic diseases, using the MeSH concept as a root to find metabolic diseases in UniProt.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?disease ?protein
WHERE {
  SERVICE <https://id.nlm.nih.gov/mesh/sparql> {
    GRAPH <http://id.nlm.nih.gov/mesh> {
      # MeSH M0013493 is the concept 'Metabolic Diseases'
      ?mesh <http://id.nlm.nih.gov/mesh/vocab#broaderDescriptor>* ?broader .
      ?broader <http://id.nlm.nih.gov/mesh/vocab#preferredConcept> <http://id.nlm.nih.gov/mesh/M0013493> .
    }
  }
  GRAPH <http://sparql.uniprot.org/diseases> {
    ?disease a up:Disease ;
             rdfs:seeAlso ?mesh .
    ?mesh up:database <http://purl.uniprot.org/database/MeSH> .
  }
  GRAPH <http://sparql.uniprot.org/uniprot> {
    ?protein up:annotation/up:disease ?disease .
  }
}

uniprot_alternative_protein_full_name: Alternative protein full names for UniProtKB entries.
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein ?fullName
WHERE {
  ?protein a up:Protein ;
           up:alternativeName ?alternativeName .
  ?alternativeName up:fullName ?fullName .
}

uniprot_bioregistry_iri_translation: Translate the globally unique identifier for a UniProt record into other options using the Bioregistry translation endpoint.
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/>
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein ?otherIdentifier
WHERE {
  BIND(uniprotkb:P00750 AS ?protein)
  ?protein a up:Protein .
  SERVICE <https://bioregistry.io/sparql> {
    ?protein owl:sameAs ?otherIdentifier .
  }
}

uniprot_created_modified_updated: List the created, last modified, and last sequence update dates for UniProtKB proteins.
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein ?created ?modified ?version
WHERE {
  ?protein a up:Protein ;
           up:created ?created ;
           up:modified ?modified ;
           up:version ?version .
}

uniprot_encoding_gene: List UniProt proteins with their associated named gene.
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein ?gene
WHERE {
  ?protein a up:Protein ;
           up:encodedBy ?gene .
}

uniprot_encoding_gene_name: List UniProt proteins with their associated gene and the gene's preferred name.
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein ?gene ?recommendedGeneName
WHERE {
  ?protein a up:Protein ;
           up:encodedBy ?gene .
  ?gene skos:prefLabel ?recommendedGeneName .
}

uniprot_encoding_gene_name_alternative_name: List UniProt proteins with their associated gene and the gene's names that are used in the field but not recommended for use by UniProt.
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein ?gene ?altGeneName
WHERE {
  ?protein a up:Protein ;
           up:encodedBy ?gene .
  ?gene skos:altLabel ?altGeneName .
}

uniprot_encoding_gene_org_name: List UniProt proteins with their associated gene and the gene's ORF label.
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?gene ?orfName
WHERE {
  ?protein a up:Protein ;
           up:encodedBy ?gene .
  ?gene up:orfName ?orfName .
}

uniprot_entries_with_more_than_two_geneid_crossrefences: Find GeneIDs cross-linked to more than one human or mouse UniProt entry.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
SELECT ?geneid ?organism (GROUP_CONCAT(?protein; separator=', ') AS ?entries)
WHERE {
  VALUES ?organism { taxon:9606 taxon:10090 }
  ?geneid up:database <http://purl.uniprot.org/database/GeneID> .
  ?protein rdfs:seeAlso ?geneid ;
           up:organism ?organism
}
GROUP BY ?geneid ?organism
HAVING (COUNT(?protein) > 1)
ORDER BY ?organism ?geneid

uniprot_identifiers_org_translation: Translate the globally unique identifier for a UniProt record into other options using the identifiers.org translation endpoint.
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/>
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein ?otherIdentifier
WHERE {
  VALUES (?protein) { (uniprotkb:P00750) (uniprotkb:P05067) }
  ?protein a up:Protein .
  SERVICE <https://sparql.api.identifiers.org/sparql> {
    ?protein owl:sameAs ?otherIdentifier .
  }
}

uniprot_organelles_or_plasmids: If a gene is located in an organelle other than the nucleus, and/or on a plasmid rather than a chromosome, the gene location is stored in encodedIn properties. Note that if a plasmid has several names, they are listed as multiple rdfs:label properties.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein ?plasmidOrOrganelle ?label
WHERE {
  ?protein a up:Protein ;
           up:encodedIn ?plasmidOrOrganelle .
  OPTIONAL {
    ?plasmidOrOrganelle rdfs:label ?label .
  }
}

uniprot_potential_isoforms: List all human UniProt entries and their computationally mapped potential isoforms.
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?entry ?sequence ?isCanonical
WHERE {
  # We don't want to look into the UniParc graph, which would
  # confuse matters
  GRAPH <http://sparql.uniprot.org/uniprot> {
    # we need the UniProt entries that are human
    ?entry a up:Protein ;
           up:organism taxon:9606 ;
           # and we select the computationally mapped sequences
           up:potentialSequence ?sequence .
  }
}

uniprot_primary_accession: Extract a UniProtKB primary accession from our IRIs with a bit of string manipulation. While UniProt primary accessions are unique within UniProtKB, they may be reused, by accident or intentionally, by other data sources. If we provided them as strings (not IRIs) and you used them in a query that way, you might accidentally retrieve completely wrong records.
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/>
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?primaryAccession ?protein
WHERE {
  ?protein a up:Protein .
  BIND(substr(str(?protein), strlen(str(uniprotkb:)) + 1) AS ?primaryAccession)
}