
 

FAQs
 

 

  • How do you maintain the currency of the information?
  • We frequently reannotate all the proteins along with their isoforms. In the future, we will also rely on the biology community for such updates, in addition to our own efforts.


  • Why do the PTM sites of a paper not match the ones reported in HPRD?
  • In many cases, a PTM study reported in the literature was carried out on only one particular isoform of the protein. Since most isoforms do not differ radically in their amino acid sequence, the sites and residues mentioned in the paper are mapped to all the isoforms, which is why the sites listed in HPRD may not match those in the paper. If the isoforms are drastically different, the PTM information is not provided.
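
The mapping of a site from one isoform to another can be illustrated with a small sketch. This is not HPRD's actual procedure, which is not described here; it assumes, purely for illustration, that a site is carried over only when the residues flanking it occur exactly once in the other isoform. The function name `map_site` and the `flank` parameter are hypothetical.

```python
from typing import Optional


def map_site(ref_seq: str, position: int, isoform_seq: str, flank: int = 5) -> Optional[int]:
    """Map a 1-based PTM position from ref_seq onto isoform_seq.

    Assumption (not from HPRD): the site is transferred only if the window of
    `flank` residues on each side of it occurs exactly once in the isoform.
    Returns the 1-based position in the isoform, or None if it cannot be mapped.
    """
    idx = position - 1
    if not 0 <= idx < len(ref_seq):
        return None  # position lies outside the reference sequence

    start = max(0, idx - flank)
    window = ref_seq[start: idx + flank + 1]

    hit = isoform_seq.find(window)
    if hit == -1:
        return None  # isoform differs too much around the site
    if isoform_seq.find(window, hit + 1) != -1:
        return None  # window is ambiguous in the isoform

    # Offset of the modified residue inside the window, converted to 1-based.
    return hit + (idx - start) + 1


# Made-up sequences for illustration only.
reference = "MKTAYSLQRSPEKLV"         # serine at position 10
longer_isoform = "MGGKTAYSLQRSPEKLV"  # two extra residues near the N-terminus
divergent_isoform = "MKTAYSLQAAAEKLV"  # region around the site replaced

print(map_site(reference, 10, longer_isoform, flank=3))
print(map_site(reference, 10, divergent_isoform, flank=3))
```

Under these assumptions, a site keeps its residue but may change its position number in an isoform with inserted or deleted residues, and it is dropped where the surrounding sequence no longer matches.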


  • Is all the data contained in HPRD strictly from humans? Should I presume that if a human protein has been shown to interact with a mouse protein, then it won't be reported in HPRD? How about a post-translational modification localized in a mouse protein?
  • Expression data are taken strictly from normal human tissue. However, other annotations such as localization, post-translational modifications and protein-protein interactions need not be derived from human proteins. We realize that human proteins are sometimes tested for interactions against mouse proteins and that, if an interaction is detected, a similar experiment with the corresponding human proteins might never be carried out. Therefore, we have included data obtained from other mammals to annotate the human ortholog. Post-translational modification data from other species are first analyzed to see whether they can be mapped to the protein sequence of the corresponding human ortholog; if this is not possible, the modification is not included in HPRD. In all cases, a reference to the corresponding literature is provided for additional experimental details.


  • Are the alternatively spliced forms of proteins included in HPRD?
  • We now include all the isoforms curated by RefSeq. We do not include isoforms from GenBank or other alternatively spliced isoforms mentioned in the literature.


  • Where did you get the domain and motif assignments from?
  • Domains and motifs are among the very few fields in which the information may not be experimentally verified. Because domain predictions are not always reliable, we interpret the results of the best prediction programs (SMART and Pfam) and choose the most sensible prediction. If an article provides experimental data on domains, we might ignore the computer predictions and choose the literature-based assignment.


  • Why should I trust that your information is more accurate than other sources? What is the expertise of the people annotating the database?
  • We are mostly young researchers, each reading an average of 10-20 papers every day. We have been trained by experienced biologists who have monitored the entire process. Every protein has been reviewed twice. Still, we admit that there can be errors, mostly from misinterpretation of data. We have made it easy for you to submit comments and would like to hear from you. If you would like to review a molecule or even an entire protein family, we will be happy to credit you as a 'Reviewer' for that molecule.


  • Why is the disulfide bond in the C-terminus of Protein X not reported?
  • As explained above, we are not experts in everything. Please let us know of any important annotations that are missing or erroneous and we will correct them immediately. You can also be the official 'Reviewer' if you wish.


  • How do you generate the final statistics for the interaction count?
  • If we find that two proteins interact, the interaction information will be available in the annotations of both proteins, but the interaction is counted as 1. In the same way, if we have an enzyme-substrate reaction where one protein acts on another at multiple sites, there may be a separate annotation for each site, but the interaction count is only 1. We thus generate the statistic by taking all the available data from Interactions/PTMs/Substrates and then collapsing it to get a non-redundant count of the interactions.
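
The collapsing step described above can be sketched as follows. This is only an illustration under stated assumptions, not the code used to produce the HPRD statistics: the `Annotation` record, its field names and the protein labels are hypothetical, and each annotation is assumed to name exactly two proteins.

```python
from typing import Iterable, NamedTuple, Optional


class Annotation(NamedTuple):
    """One annotation row (hypothetical layout, not the HPRD schema)."""
    protein_a: str
    protein_b: str
    site: Optional[str] = None  # set for per-site PTM/substrate annotations


def count_interactions(annotations: Iterable[Annotation]) -> int:
    """Count non-redundant interactions.

    Each annotation is reduced to an unordered pair of proteins, so that
    A-B and B-A, or several sites on the same enzyme-substrate pair,
    collapse into a single interaction.
    """
    pairs = {frozenset((a.protein_a, a.protein_b)) for a in annotations}
    return len(pairs)


# Placeholder records: one interaction listed under both proteins, and one
# enzyme-substrate pair annotated at two sites.
records = [
    Annotation("ProteinA", "ProteinB"),
    Annotation("ProteinB", "ProteinA"),
    Annotation("KinaseK", "ProteinA", site="site1"),
    Annotation("KinaseK", "ProteinA", site="site2"),
]

print(count_interactions(records))  # 2
```

Using an unordered pair as the key means the direction of the annotation and the number of modified sites do not affect the count.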


 
Please send any questions or comments about the Human Protein Reference Database to help

This is a joint project between:
   and