Hi Luke, nice presentation. Just one question: why use Phyre2 instead of another server such as I-TASSER?
Hi Wen Kai, I appreciate the comment.
I used Phyre2 mainly because I am familiar with its output: the Poplar structure predictions we currently have on the server were also generated with Phyre2. As I mentioned, some of Phyre2’s predictions are rather low-quality, with low sequence coverage in the predicted structure. These shortcomings could possibly be overcome by using other servers (like I-TASSER) for secondary analysis.
Hi Luke, nice job!
A few points:
– Your R-squared of 0.73 is quite amazing since you’re comparing RNA-Seq data against microarray (SNP-array) data. How many of the same individuals appear in both your RNA-Seq dataset and your microarray dataset? I think that’d be a relevant metric.
– When you truncate your protein sequences for Phyre2, does the returned PDB number the AAs relative to the untruncated sequence, or does it renumber them relative to the truncated sequence you submitted? E.g. you submit the protein sequence of X from pos 10 to 100 because pos 9 had an X. Does the returned PDB label the first AA as ‘10’ or ‘1’?
Thanks, Vincent.
– The overlap of genotypes between the two datasets is honestly not something I had considered up to this point. I don’t have the analysis on hand, but I agree it would be a good idea to do. I’ve already grabbed the accessions from both datasets, and I’ll append the results to my report (or a later comment) once they are ready.
– Phyre2 does not know that the sequences it receives are truncated. It will treat the fragmented input as a full sequence and label the first residue index ‘1’. To follow your own example, the first AA of the input (i.e. right after the removed X) will be labelled index ‘1’.
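To make the renumbering concrete, here is a minimal sketch of how a position in the returned model could be mapped back to the full-length sequence. It assumes that only a leading stretch was removed and that the submitted fragment is contiguous; if residues were removed from the middle of a sequence, a per-residue lookup would be needed instead. The function names and the `fragment_start` parameter are hypothetical and not part of Phyre2. The only external fact relied on is that the PDB format stores the residue number of ATOM/HETATM records in columns 23 to 26.

```python
def to_original_position(model_index: int, fragment_start: int) -> int:
    """Map a 1-based residue index in the model back to the full-length sequence.

    fragment_start is the 1-based position, in the original sequence, of the
    first residue that was submitted (an assumption of this sketch).
    """
    if model_index < 1 or fragment_start < 1:
        raise ValueError("residue positions are 1-based")
    return model_index + fragment_start - 1


def renumber_pdb_line(line: str, fragment_start: int) -> str:
    """Shift the residue number of one ATOM/HETATM record; other lines pass through."""
    if not line.startswith(("ATOM", "HETATM")):
        return line
    # Columns 23-26 (0-based slice 22:26) hold the right-justified residue number.
    new_number = to_original_position(int(line[22:26]), fragment_start)
    if new_number > 9999:
        raise ValueError("residue number does not fit the 4-column PDB field")
    return f"{line[:22]}{new_number:>4d}{line[26:]}"
```

With Vincent’s example, where positions 10 to 100 were submitted, `to_original_position(1, 10)` maps the model’s first residue back to position 10 of the original sequence.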
Great presentation! I just wonder about the files that do not have 100% agreement with the PDBs: have they been annotated, or will the annotation be done only for the files that have 100% agreement?
Hi Juliana!
Indeed, even the PDBs that did not agree between annotation versions were tabulated. That means there are two index files: one listing all consistent PDBs and one listing all inconsistent PDBs.