my R journeys: January 2020

Pantherdb.org has what looks like a fairly up to date and comprehensive listing of genes and gene ontologies. I recently wanted to generate a comma separated list of gene ontology IDs and descriptions from a list of gene names using R, but the description of the API (here: http://www.pantherdb.org/help/PANTHERhelp.jsp#V.E.) uses javascript. Below is the strategy I used to get what I wanted in R.

BiocManager::install("httr")
library(stringr)
library(httr)
#parameters listed on the pantherdb web site...
# StringBody organism = new StringBody("Homo sapiens", ContentType.TEXT_PLAIN);
# FileBody fileData = new FileBody(new File("c:\\data_files\\gene_expression_files\\7_data\\humanEnsembl"), ContentType.TEXT_PLAIN);
# StringBody enrichmentType = new StringBody("process", ContentType.TEXT_PLAIN); //"function", "process", "cellular_location", "protein_class", "pathway", "fullGO_function", "fullGO_process", "fullGO_component", "reactome"
# StringBody testType = new StringBody("FISHER", ContentType.TEXT_PLAIN); // "FISHER", "BINOMIAL" . This parameter is optional; If not specified, the system will perform Fisher’s Exact test
# //StringBody cor = new StringBody("FDR", ContentType.TEXT_PLAIN);
# //StringBody cor = new StringBody("BONFERRONI", ContentType.TEXT_PLAIN);
# //StringBody cor = new StringBody("NONE", ContentType.TEXT_PLAIN);
# StringBody type = new StringBody("enrichment", ContentType.TEXT_PLAIN);

url<-"http://www.pantherdb.org/webservices/garuda/tools/enrichment/VER_2/enrichment.jsp?"
enrichmentType <- "fullGO_process"
organism <- "Homo sapiens"
cor <- "BONFERRONI"
type <- "enrichment"
genes<-"TSPAN6\nDPM1\nSCYL3\nC1orf112\nFGR\nCFH\nFUCA2\nGCLC\nNFYA\nSTPG1\nNIPAL3\nLAS1L\nENPP4\nSEMA3F\nCFTR\nANKIB1\nCYP51A1\nKRIT1\nRAD52\nBAD\nLAP3\nCD99\nHS3ST1\nAOC1\nWNT16\nHECW1\nMAD1L1\nLASP1\nSNX11\nTMEM176A\nM6PR"
response<-POST(url = url, encode = "multipart", body = list(organism = organism, enrichmentType = enrichmentType, geneList = genes, type = type, cor = cor))

content<-content(response, as = "parsed")
content<-unlist(str_split(content, "\n"))
content<-content[content != "Id\tName\tGeneId\traw P-value\tFDR"]
content<-content[content != "\r"]
content<-content[content != ""]
myColnames<-c("ID","Name","GeneId","raw_pValue","FDR")
myResult<-matrix("",nrow = length(content), ncol = 5)
for (i in seq_along(content)){
myResult[i,]<-unlist(str_split(content[i], "\t"))
}
colnames(myResult)<-myColnames
myResult<-as.data.frame(myResult, stringsAsFactors = FALSE)
myResult$raw_pValue<-as.numeric(myResult$raw_pValue)
myResult$FDR<-as.numeric(myResult$FDR)

myResult<-myResult[myResult$raw_pValue < .05,]
agg<-aggregate(myResult[,"ID"] ~ myResult[,"GeneId"], list(myResult[,"GeneId"]), toString)
agg2<-aggregate(myResult[,"Name"] ~ myResult[,"GeneId"], list(myResult[,"GeneId"]), toString)
finalResult<-cbind(agg,agg2[,2])
colnames(finalResult)<-c("geneID", "GOIDs", "descriptions")
View(finalResult)

here's what the output looks like:

my R journeys

Friday, January 17, 2020

Retrieve GO terms for a gene list from pantherdb.org

Wednesday, January 15, 2020

Custom color gradient legend for a ggplot graph

Retrieve GO terms for a gene list from pantherdb.org

Report Abuse