SPARQL example query
uniprot_primary_accession: Extracting a UniProtKB primary accession from our IRIs is done with a bit of string manipulation. While UniProt primary accessions are unique within UniProtKB, they may be reused, by accident or intentionally, by other data sources. If we provided them as strings (not IRIs) and you used them in a query that way, you might accidentally retrieve completely wrong records.
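For readers who post-process results outside SPARQL, the same string manipulation can be sketched in Python. This is a minimal illustration, not part of the UniProt examples: the names `UNIPROTKB_NS` and `primary_accession` are assumptions introduced here, and the check on the namespace is an extra safeguard that the SPARQL version does not perform.

```python
# Namespace of UniProtKB entry IRIs, as declared by the uniprotkb: prefix.
UNIPROTKB_NS = "http://purl.uniprot.org/uniprot/"


def primary_accession(protein_iri: str) -> str:
    """Return the part of a UniProtKB entry IRI that follows the namespace.

    Hypothetical helper; it has the same effect as
    SUBSTR(STR(?protein), STRLEN(STR(uniprotkb:)) + 1) for entry IRIs.
    """
    if not protein_iri.startswith(UNIPROTKB_NS):
        # Refuse IRIs from other sources, whose local names may
        # happen to look like a UniProtKB accession.
        raise ValueError(f"not a UniProtKB entry IRI: {protein_iri}")
    return protein_iri[len(UNIPROTKB_NS):]


print(primary_accession("http://purl.uniprot.org/uniprot/P05067"))  # P05067
```

Keeping the full IRI as the key and deriving the accession only for display avoids the accidental matches described above.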
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/>
PREFIX up: <http://purl.uniprot.org/core/>

SELECT ?primaryAccession ?protein
WHERE {
  ?protein a up:Protein .
  BIND(substr(str(?protein), strlen(str(uniprotkb:))+1) AS ?primaryAccession)
}

uniprot_proteome_location_of_gene: List UniProt proteins with the genetic replicon that they are encoded on, using the Proteome data.
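The proteome and replicon are both packed into one IRI and separated at the "#" character. A minimal Python sketch of that split follows; the function name `split_proteome_data` is an assumption made here, and the IRI in the usage line is only an illustrative guess at the shape of such a value, not data taken from the source.

```python
from typing import Tuple


def split_proteome_data(proteome_data_iri: str) -> Tuple[str, str]:
    """Split an IRI at its first '#' into (proteome, replicon).

    Hypothetical helper mirroring STRBEFORE(..., "#") and
    STRAFTER(..., "#"): both SPARQL functions return an empty
    string when the separator does not occur.
    """
    before, sep, after = proteome_data_iri.partition("#")
    if not sep:
        # No '#' present: match the SPARQL behaviour of two empty strings.
        return "", ""
    return before, after


# Illustrative IRI shape only (an assumption, see above).
proteome, replicon = split_proteome_data(
    "http://purl.uniprot.org/proteomes/UP000005640#Chromosome%201"
)
print(proteome)
print(replicon)
```

Note that Python's `str.partition` on its own would return the whole string as the first element when "#" is missing, which is why the explicit check is there.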
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
PREFIX up: <http://purl.uniprot.org/core/>

SELECT DISTINCT ?proteomeData ?replicon ?proteome
WHERE {
  # reviewed entries (UniProtKB/Swiss-Prot)
  ?protein up:reviewed true .
  # restricted to Human taxid
  ?protein up:organism taxon:9606 .
  ?protein up:proteome ?proteomeData .
  BIND( strbefore( str(?proteomeData), "#" ) as ?proteome )
  BIND( strafter( str(?proteomeData), "#" ) as ?replicon )
}

uniprot_recomended_protein_full_name: The recommended protein full names for UniProtKB entries.
PREFIX up: <http://purl.uniprot.org/core/>

SELECT ?protein ?fullName
WHERE {
  ?protein a up:Protein ;
    up:recommendedName ?recommendedName .
  ?recommendedName up:fullName ?fullName .
}

uniprot_recomended_protein_short_name: The recommended protein short names for UniProtKB entries.
PREFIX up: <http://purl.uniprot.org/core/>

SELECT ?protein ?shortName
WHERE {
  ?protein a up:Protein ;
    up:recommendedName ?recommendedName .
  ?recommendedName up:shortName ?shortName .
}

uniprot_reviewed_or_not: List all UniProt proteins and whether they are reviewed (Swiss-Prot) or unreviewed (TrEMBL).
PREFIX up: <http://purl.uniprot.org/core/>

SELECT ?protein ?reviewed
WHERE {
  ?protein a up:Protein .
  ?protein up:reviewed ?reviewed .
}

uniprot_sequences_and_mark_which_is_cannonical_for_human: List all human UniProt entries and their sequences, marking whether the sequence listed is the canonical sequence of the matching entry.
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
PREFIX up: <http://purl.uniprot.org/core/>

SELECT ?entry ?sequence ?isCanonical
WHERE {
  # We don't want to look into the UniParc graph, which would
  # confuse matters
  GRAPH <http://sparql.uniprot.org/uniprot> {
    # We need the UniProt entries that are human
    ?entry a up:Protein ;
      up:organism taxon:9606 ;
      up:sequence ?sequence .
    # If the sequence is a "Simple_Sequence" it is likely to be the
    # canonical sequence
    OPTIONAL {
      ?sequence a up:Simple_Sequence .
      BIND(true AS ?likelyIsCanonical)
    }
    # unless we are dealing with an external isoform,
    # see https://www.uniprot.org/help/canonical_and_isoforms
    OPTIONAL {
      FILTER(?likelyIsCanonical)
      ?sequence a up:External_Sequence .
      BIND(true AS ?isComplicated)
    }
    # If it is an external isoform, its id would not match the
    # entry primary accession
    BIND(IF(?isComplicated, STRENDS(STR(?entry), STRBEFORE(SUBSTR(STR(?sequence), 34),'-')),?likelyIsCanonical) AS ?isCanonical)
  }
}

uniprot_signature_match_start_end: List the start and end of all InterPro member database signature matches for a specific UniProt protein.
PREFIX faldo: <http://biohackathon.org/resource/faldo#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX up: <http://purl.uniprot.org/core/>

SELECT ?protein ?interproMemberDatabaseXref ?matchStart ?matchEnd
WHERE {
  GRAPH <http://sparql.uniprot.org/uniprot> {
    VALUES ?protein {<http://purl.uniprot.org/uniprot/P05067>} .
    ?protein rdfs:seeAlso ?sa .
  }
  GRAPH <http://sparql.uniprot.org/uniparc> {
    ?uniparc up:sequenceFor ?protein ;
      rdfs:seeAlso ?interproMemberDatabaseXref .
    ?interproMemberDatabaseXref up:signatureSequenceMatch ?sam .
    ?sam faldo:begin ?sab ;
      faldo:end ?sae .
    ?sab faldo:position ?matchStart ;
      faldo:reference ?uniparc .
    ?sae faldo:position ?matchEnd ;
      faldo:reference ?uniparc .
  }
}

uniprot_transporter_in_liver: Find human transporter proteins in reviewed UniProtKB that are expressed in the liver (uses Bgee and UBERON).
PREFIX genex: <http://purl.org/genex#>
PREFIX lscr: <http://purl.org/lscr#>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX orth: <http://purl.org/net/orth#>
PREFIX rh: <http://rdf.rhea-db.org/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
PREFIX uberon: <http://purl.obolibrary.org/obo/uo#>
PREFIX up: <http://purl.uniprot.org/core/>

SELECT ?rhea ?protein ?anat
WHERE {
  GRAPH <https://sparql.rhea-db.org/rhea> {
    ?rhea rh:isTransport true .
  }
  ?protein up:annotation ?ann .
  ?protein up:organism taxon:9606 .
  ?ann up:catalyticActivity ?ca .
  ?ca up:catalyzedReaction ?rhea .
  BIND(uberon:0002107 AS ?anat)
  SERVICE <https://www.bgee.org/sparql> {
    ?seq genex:isExpressedIn ?anat .
    ?seq lscr:xrefUniprot ?protein .
    ?seq orth:organism ?organism .
    ?organism obo:RO_0002162 taxon:9606 .
  }
}

uniprot_unamed_plasmids: Sometimes it is known that a gene encoding a UniProtKB protein is located on a plasmid, but the name of the plasmid is unknown.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX up: <http://purl.uniprot.org/core/>

SELECT ?protein ?plasmidOrOrganelle ?label
WHERE {
  ?protein a up:Protein ;
    up:encodedIn ?plasmidOrOrganelle .
  OPTIONAL {
    ?plasmidOrOrganelle rdfs:label ?label .
  }
}