Introduction: A search box appears at the top of every page. The search function is built on the Sphinx search engine, which has been configured for very fast index-organized table search (average response time: 300 milliseconds) and efficient pagination. It implements a Google-like search supporting both exact and fuzzy queries, and users can enter a keyword to search 12 different data types. These data types fall into three groups:
(i) basic information about CAZymes, such as species name, CAZyme domain, protein ID, GCF ID, and taxonomy ID;
(ii) CAZyme annotation data, such as PDB hits, Swiss-Prot hits, CAZy hits, CDD hits, E2P2-predicted enzyme reactions, and EC numbers;
(iii) CAZyme genomic context.
The data types in (ii) are used to search for CAZymes that share sequence similarity with Swiss-Prot, PDB, and CAZy proteins. For example, one can type in a PDB ID (e.g., 1LZL_A) and choose a sequence identity value (e.g., 20%). The search returns a list of CAZymes whose sequence identity to the queried PDB protein is larger than the chosen value: similarity = 20%,1LZL_A. This is useful for answering questions such as: which proteins in dbCAN-seq have high sequence similarity to experimentally characterized CAZymes?
The data type in (iii) is used to search the gene neighborhood of a query CAZyme. For example, users can type in a CAZyme protein ID (e.g., WP_007212487.1) and select how many upstream and downstream genes of the query gene they want to explore (e.g., 5). The search returns a table of 11 genes, with the query gene in the 6th row: WP_007212487.1. Any of the 11 genes that are CGC signature genes (i.e., CAZyme, TF, or TC) are highlighted in color. This is a novel tool for answering questions such as: is my CAZyme located close to other CAZyme, TF, or TC genes, and is my CAZyme potentially located in a CGC?
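The windowing arithmetic behind the genomic context search can be illustrated with a minimal Python sketch. It is not the site's actual implementation: the gene list, the gene IDs, the labels, and the function names are all hypothetical, and the sketch only assumes that genes are stored in genomic order.

```python
# Hypothetical labels for CGC signature genes; the real schema is not described here.
SIGNATURE_TYPES = {"CAZyme", "TF", "TC"}


def gene_neighborhood(genes, query_id, flank=5):
    """Return the query gene plus up to `flank` genes on each side.

    `genes` is a list of (gene_id, label) tuples in genomic order.
    Near the start or end of the list the window is simply shorter.
    """
    ids = [gene_id for gene_id, _ in genes]
    idx = ids.index(query_id)  # raises ValueError if the ID is absent
    start = max(0, idx - flank)
    return genes[start: idx + flank + 1]


def mark_signature(window):
    """Attach a flag telling whether each gene should be highlighted."""
    return [(gene_id, label, label in SIGNATURE_TYPES) for gene_id, label in window]


# Toy genome of 20 genes (made-up IDs), two of them signature genes.
genes = [(f"gene_{i:02d}", "other") for i in range(1, 21)]
genes[9] = ("gene_10", "CAZyme")
genes[11] = ("gene_12", "TF")

window = gene_neighborhood(genes, "gene_10", flank=5)
position = [gene_id for gene_id, _ in window].index("gene_10") + 1
print(len(window), position)  # 11 6

highlighted = [gene_id for gene_id, _, is_sig in mark_signature(window) if is_sig]
print(highlighted)  # ['gene_10', 'gene_12']
```

With a flank of 5, the window holds 5 + 1 + 5 = 11 genes and the query gene sits in position 6, matching the table described above. In this sketch, a query gene near the end of a contig yields fewer than 11 rows.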
Examples:
1. CAZyme ID (e.g., NP_212393.2);
2. GCF_ID (e.g., GCF_000005825.2);
3. Tax_ID (e.g., 398511);
4. Species_Name (e.g., Bacillus pseudofirmus OF4);
5. CAZyme_domain (e.g., CE4);
6. Pdb hit (e.g., similarity = 20%,1LZL_A);
7. Swiss-Prot hit (e.g., similarity = 20%,ETHA_MYCTU or P9WNF9 or sp|P9WNF9|ETHA_MYCTU);
8. Cdd ID (e.g., similarity = 20%,COG2072);
9. CAZyme hit (e.g., similarity = 20%,AHF23796.1);
10. MetaCyc (e.g., 3.2.1.8-RXN);
11. Predicted_EC (e.g., 3.1.1.23);
12. CAZyme ID (CGC) (e.g., WP_007212487.1);
For types 1 to 5 and 10 to 12, simply type the keyword to search.
For types 6 to 9, a sequence identity value must be specified in addition to the keyword. For example, to search by pdb_hit, first select an identity value (e.g., 20%) on the left, then enter a PDB ID. The result is a list of proteins in the database that share > 20% identity with the PDB protein.
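To make the identity-threshold searches concrete, the Python sketch below parses a query written in the form shown in the examples (similarity = 20%,1LZL_A) and keeps only hits above the threshold. It is an illustration under stated assumptions, not the site's code: the function names, the hit records, the protein names, and the second target ID (2XYZ_B) are hypothetical.

```python
def parse_identity_query(query):
    """Split 'similarity = 20%,1LZL_A' into (20.0, '1LZL_A')."""
    left, _, keyword = query.partition(",")
    value = left.split("=", 1)[1].strip().rstrip("%")
    return float(value), keyword.strip()


def filter_hits(hits, query):
    """Keep hits to the queried target whose identity is strictly above the threshold."""
    threshold, target = parse_identity_query(query)
    return [
        hit for hit in hits
        if hit["target"] == target and hit["identity"] > threshold
    ]


# Hypothetical hit table: one row per (database protein, annotated hit).
hits = [
    {"protein": "protein_A", "target": "1LZL_A", "identity": 45.2},
    {"protein": "protein_B", "target": "1LZL_A", "identity": 20.0},  # not > 20%
    {"protein": "protein_C", "target": "2XYZ_B", "identity": 80.0},  # other target
]

matches = [hit["protein"] for hit in filter_hits(hits, "similarity = 20%,1LZL_A")]
print(matches)  # ['protein_A']
```

The comparison is strict (>), following the "> 20% identity" wording above, so a hit with exactly 20% identity is excluded in this sketch.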
Copyright 2022 © YIN LAB, UNL. All rights reserved. Designed by Jinfang Zheng and Boyang Hu. Maintained by Yanbin Yin.