Bioinformatics
1. Run a DELTA-BLAST search with the silkworm insulin protein (P26726). Limit the search to human proteins in the RefSeq_Protein database.
a. How many total sequences?
b. How many of the human homologs appear to contain the insulin domain (irrespective of the e-value threshold)?
c. Now edit the search: remove human from the organism selection and change the DELTA-BLAST threshold to 0.005. Keep all other settings unchanged. How many records are returned?
d. Which organism has the best e-value?
e. Run a second iteration. What is the name of the newly added organism with the best e-value, and what is that e-value?
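The bookkeeping behind parts c to e (filtering by an e-value threshold, picking the best e-value, and spotting hits that appear only in the second iteration) can be sketched in Python. This is a minimal sketch, assuming the hits have been copied from the results page into (accession, organism, e-value) records; the `Hit` type, the helper names, and the sample records are hypothetical placeholders, not real search results.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One row of a BLAST results table (hypothetical layout)."""
    accession: str
    organism: str
    evalue: float


def passing(hits, threshold):
    """Keep hits whose e-value is below the threshold (smaller is better)."""
    return [h for h in hits if h.evalue < threshold]


def best_hit(hits):
    """Return the hit with the smallest, i.e. best, e-value."""
    return min(hits, key=lambda h: h.evalue)


def newly_added(previous, current):
    """Hits in the current iteration whose accession was absent from the previous one."""
    seen = {h.accession for h in previous}
    return [h for h in current if h.accession not in seen]


if __name__ == "__main__":
    # Hypothetical placeholder records; substitute the rows from your own search.
    iteration1 = [
        Hit("ACC_1", "Organism A", 1e-30),
        Hit("ACC_2", "Organism B", 2e-3),
        Hit("ACC_3", "Organism C", 0.02),
    ]
    iteration2 = iteration1 + [Hit("ACC_4", "Organism D", 4e-8)]

    print(len(passing(iteration1, 0.005)))   # hits under the 0.005 threshold
    print(best_hit(iteration1).organism)      # organism with the best e-value
    new = newly_added(iteration1, iteration2)
    print(best_hit(new).organism, best_hit(new).evalue)
```

Comparing by accession rather than by organism name keeps two different proteins from the same organism from being merged.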
2. Run PSI-BLAST with the following sequence. Use the RefSeq database, limit the search to enterobacteria, and change the PSI-BLAST threshold to 0.005.
>gi|337285369|ref|YP_004624843.1| proteasome subunit beta [Pyrococcus yayanosii CH1]
MEKKTGTTTVGIRVDEGVVLAADTQASLDHMVETLNIRKIIPITDRIAITTAGSVGDVQMLARILEVEARYYQFTWGRPMSTRAMANLLSNILNENKWFPYLVQIVIGGYVEEPVIANLDPLGGLIFDDYTATGSGSPFAIAVLEDGYRKDMGIEEAKELAVKAVKVAGKRDVYTGSRKVQVVTITKEGMREYWFEE
a. How many sequences are returned? Write the name of the only organism that has an e-value of 1e-04.
b. Write the approximate number of new hits at each subsequent iteration.
c. Are sequences still being added in the fifth iteration?
d. Are most of those descriptions “proteases”?
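Parts b to d amount to tracking which sequences are new at each iteration, noticing when an iteration adds nothing (PSI-BLAST is considered converged when no new sequences enter the model), and checking how many descriptions mention a keyword. The Python sketch below assumes each iteration's hit list has been saved as a list of accessions and that descriptions are plain strings; all function names and sample values are hypothetical, and it approximates convergence by looking at all hits rather than only those included in the PSSM.

```python
def new_hits_per_iteration(iterations):
    """For each iteration after the first, count accessions not seen in any earlier one."""
    seen = set(iterations[0])
    counts = []
    for accessions in iterations[1:]:
        fresh = set(accessions) - seen
        counts.append(len(fresh))
        seen |= fresh
    return counts


def has_converged(iterations):
    """True when there are at least two iterations and the last one added no new accessions."""
    return len(iterations) > 1 and new_hits_per_iteration(iterations)[-1] == 0


def fraction_matching(descriptions, keyword):
    """Fraction of descriptions containing the keyword, ignoring case.

    Note that "proteasome" does not contain the substring "protease",
    so a plain substring test keeps the two terms apart.
    """
    if not descriptions:
        return 0.0
    keyword = keyword.lower()
    return sum(keyword in d.lower() for d in descriptions) / len(descriptions)


if __name__ == "__main__":
    # Hypothetical accession lists for four iterations.
    iterations = [
        ["A", "B"],
        ["A", "B", "C", "D"],
        ["A", "B", "C", "D", "E"],
        ["A", "B", "C", "D", "E"],
    ]
    print(new_hits_per_iteration(iterations))
    print(has_converged(iterations))

    # Hypothetical descriptions, not real search output.
    descriptions = ["proteasome subunit beta", "ATP-dependent protease example", "hypothetical protein"]
    print(fraction_matching(descriptions, "protease"))
```

Using a cumulative `seen` set means a sequence that drops out and later reappears is not counted as new a second time.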
3. Run HMMER with the proteasome sequence from question 2, searching the RefSeq database.
a. What is the e-value of the best-matching proteasome?
b. Based on the colors and the distribution, which of the three domains of life accounts for the majority of hits?
c. The proteasome sequence is from an archaeon (Pyrococcus yayanosii CH1). Why do you suppose there may be a (relative) lack of total hits in archaea using HMMER?
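The color-coded taxonomy view summarizes how the hits split across domains of life; the same tally can be reproduced from a list of per-hit domain labels. This is a minimal Python sketch, assuming each hit has already been labeled "Archaea", "Bacteria", or "Eukaryota" (how those labels are exported is not covered here); labels outside the three domains, such as viruses, are ignored, and the function names and sample list are hypothetical.

```python
from collections import Counter

# The three domains of life; any other label is left out of the tally.
DOMAINS = ("Archaea", "Bacteria", "Eukaryota")


def domain_fractions(hit_domains):
    """Return the fraction of hits assigned to each of the three domains."""
    counts = Counter(hit_domains)
    total = sum(counts[d] for d in DOMAINS)
    if total == 0:
        return {d: 0.0 for d in DOMAINS}
    return {d: counts[d] / total for d in DOMAINS}


def majority_domain(hit_domains):
    """Return the domain with the largest share of hits."""
    fractions = domain_fractions(hit_domains)
    return max(fractions, key=fractions.get)


if __name__ == "__main__":
    # Hypothetical labels for illustration only.
    labels = ["Eukaryota"] * 5 + ["Bacteria"] * 3 + ["Archaea"] * 2 + ["Viruses"]
    print(domain_fractions(labels))
    print(majority_domain(labels))
```

Reporting fractions alongside raw counts makes it easier to compare domains that are represented very unevenly in the search results.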