Brian the Roman
03-22-2003, 07:18 AM
Howard;
Do you currently use any homology modeling approaches to guide the sampling?
I was thinking a bit about the sampling side. There are only 20 AAs. When put together in various orders we get different proteins. However, any protein with over 20 residues must be reusing some of the AAs. So why not create a database of all proteins with known conformations (PDB?), then find subsequences of the sequence of interest in the database and use the conformations of those pieces to guide us? Then we'd only need to randomly sample the parts of the sequence that have no match.
For example: say our protein contains the sequence ALA, ARG, ASP, CYS, ASN. We look in the db and find another protein, whose native fold we know, that also contains this sequence. Then we could use the conformation of the middle three AAs as a starting point and only randomly sample the rest, and thus effectively reduce the sampling space. Obviously, you could find multiple chains in the db, each of which we could use to guide us. This info could be looked up by the server before the protein is sent to the client, so the client would only need the info about the known sequences, not the entire db.
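To make the lookup idea concrete, here is a rough Python sketch of what I have in mind for the server side. Everything in it is an assumption for illustration only: the names `build_index` and `guided_positions`, the toy entry `toy1`, and the (phi, psi) angles are made up, and a real version would read structures from the PDB rather than a hand-written dict.

```python
from collections import defaultdict

def build_index(db, k=5):
    """Map every k-residue window found in the known structures
    to the (phi, psi) angles observed for that window."""
    index = defaultdict(list)
    for residues, angles in db.values():
        for i in range(len(residues) - k + 1):
            index[tuple(residues[i:i + k])].append(angles[i:i + k])
    return index

def guided_positions(target, index, k=5):
    """Return {position: [candidate (phi, psi), ...]} for target residues
    covered by a matching window. Only the interior residues of each
    match are used (the middle three when k=5)."""
    guided = defaultdict(list)
    for i in range(len(target) - k + 1):
        for angles in index.get(tuple(target[i:i + k]), []):
            for j in range(1, k - 1):
                guided[i + j].append(angles[j])
    return dict(guided)

# Hypothetical database entry: residue names plus made-up backbone angles.
db = {
    "toy1": (
        ["GLY", "ALA", "ARG", "ASP", "CYS", "ASN", "LEU"],
        [(-80, 150), (-60, -45), (-62, -41), (-65, -40),
         (-63, -43), (-70, -35), (-120, 130)],
    ),
}

target = ["SER", "ALA", "ARG", "ASP", "CYS", "ASN", "TRP", "VAL"]

index = build_index(db)
guided = guided_positions(target, index)

# Positions with no match would still be sampled randomly by the client.
unguided = [i for i in range(len(target)) if i not in guided]

print(guided)
print(unguided)
```

The server would send only `guided` to the client along with the sequence. Where several database chains match the same window, each position simply gets several candidate conformations to start from.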
This approach is similar to homology modeling, is it not?
ms