Identifying genes within pathways in unannotated genomes with PaGeSearch

Prof. Heebal Kim

In biological research, the identification and comparison of genes within specific pathways across the genomes of various species are invaluable. However, annotating the entire genome is resource intensive, and sequence similarity searches often yield results that are not actually genes. To address these limitations, we introduce Pathway Gene Search (PaGeSearch), a tool designed to identify genes from predefined lists, especially those in specific pathways, within genomes. The tool uses an initial sequence similarity search to identify relevant genomic regions, followed by targeted gene prediction and neural network–based result filtering. PaGeSearch suggests the regions that are most likely the orthologs of the genes in the query and is designed to be applicable for species within five classes: mammals, fish, birds, eudicotyledons, and Liliopsida. Compared with GeMoMa and miniprot, PaGeSearch generally outperforms in terms of sensitivity and positive predictive value, as well as negative predictive value. Also, the exon coverage of gene models from PaGeSearch is higher compared with those in GeMoMa and miniprot. Although its performance shows increased variability when applied to actual biological pathways, it nonetheless maintains an acceptable level of accuracy. Evaluating PaGeSearch across different assembly levels, chromosome, scaffold, and contig shows minimal variation in outcomes, indicating that PaGeSearch is resilient to variations in assembly quality.

more >> http://doi.org/10.1101/gr.278566.123

2024. 6. 18.

수정요청

현재 페이지에 대한 의견이나 수정요청을 관리자에게 보내실 수 있습니다.
아래의 빈 칸에 내용을 간단히 작성해주세요.

닫기