
Dr Mark Wass

Senior Lecturer in Computational Biology


Mark obtained a BA in Natural Sciences from Cambridge University in 2000, followed by an MSc in Computing at Imperial College London (2001). After a few years working in industry as an IT consultant, Mark studied for a PhD with Prof Mike Sternberg at Imperial (2004-2008) and stayed on in the group as a postdoctoral researcher until 2011. In 2011 Mark was awarded a FEBS Long-Term Fellowship to work in the group of Alfonso Valencia at the CNIO (Spanish National Cancer Research Centre, Madrid, Spain). Mark joined the School of Biosciences at Kent in October 2012 as a Lecturer in Computational Biology and now runs a joint wet/dry laboratory research group with Martin Michaelis.


Google Scholar: http://tinyurl.com/lsesv4h

Research interests

Mark’s research focuses on two main areas. The first is the development of novel computational methods for the analysis of large-scale biological data, particularly methods for predicting protein structure, function and interactions. The second is the application of such methods to important biological problems, including the association of genetic variation with human disease, the mechanisms and biomarkers of acquired resistance to anti-cancer drugs, and the determinants of pathogenicity in viruses.
In the area of acquired resistance in cancer, Mark’s research uses the Resistant Cancer Cell Line (RCCL) collection, a unique collection of >1,300 cancer cell lines with acquired resistance to anti-cancer drugs, which provides a model for studying how tumours become resistant to anti-cancer drugs during treatment. In computational virology, Mark’s research initially focused on the determinants of Ebola virus pathogenicity; in 2016, Mark won the International Society for Computational Biology ‘Fight against Ebola’ award. Mark continues to research Ebola virus and has expanded this work to other viruses, including Marburg virus and Zika virus.

Teaching

  • BI638 – Bioinformatics and Genomics
  • BI639 – Frontiers in Oncology 
  • BI620 – Frontiers in Virology 
  • BI629 – Proteins


MSc-R projects available for 2019/20

Predicting protein function
Advances in sequencing technologies have identified millions of protein sequences, but the function of many of these proteins remains unknown. This project will develop a computational method to predict protein function. Additional research costs: £1200

Predicting the effects of genetic variation
Millions of genetic variants have been identified in human genomes, but it is difficult to identify those that are associated with a phenotype (e.g. disease). This project will develop a computational machine learning approach to predict the effects of genetic variants. Additional research costs: £1200

Investigating determinants of virus pathogenicity (joint supervision with Prof Martin Michaelis)
Our research has recently compared different Ebolavirus species to identify the parts of their proteins that determine whether they are pathogenic. This project will apply these computational approaches to other types of viruses (e.g. Zika virus, West Nile virus, human papillomavirus) to identify determinants of virus pathogenicity and gain insight into what makes some viruses highly virulent while others are harmless. Additional research costs: £1200

Investigation of drug-adapted cancer cell lines (joint supervision with Prof Martin Michaelis)
At Kent we host the Resistant Cancer Cell Line (RCCL) collection, the world's largest collection of drug-adapted cancer cell lines and models of acquired drug resistance in cancer. In this project, drug-adapted cancer cell lines will be characterised and investigated to gain novel insights into the processes underlying resistance formation and to identify novel therapy candidates and biomarkers. Additional research costs: £1200

Evolution of the muscle sarcomere. A bioinformatics approach to the interaction between myosin and myosin binding protein-C (joint supervision with Prof M Geeves)
Additional research costs: £1200  

Using cancer genomics to identify biomarkers of cancer resistance
(joint supervision with Prof Martin Michaelis)  
At Kent we host the Resistant Cancer Cell Line (RCCL) collection, the largest collection of cancer cell lines worldwide that have been adapted to anti-cancer drugs. These cells represent a model of drug resistance in tumours. This project will analyse exome sequencing data of a set of cell lines to identify mechanisms of resistance and biomarkers. 
Additional research costs: £1200

Design of cancer cell-specific drug carrier systems (joint supervision with Dr Christopher Serpell, School of Physical Sciences)
The Serpell lab has produced perfectly sequence-defined polymers that self-assemble into nanostructures with a remarkable variety of sizes and shapes, depending on sequence and conditions (N. Appukutti, C. J. Serpell, Sequence Isomerism in Uniform Polyphosphoesters Programmes Self-Assembly and Folding, ChemRxiv, preprint posted 04.02.19, DOI: https://doi.org/10.26434/chemrxiv.7666316.v1). In this project, the effects of the polymer nanostructure on cell uptake and therapeutic efficacy will be studied in different cancer cells. This will provide pioneering insights into the prospects of sequence-defined polymers as carrier systems for anti-cancer drugs. Additional research costs: £1200

Publications

  • Martell, H. et al. (2019). Is the Bombali virus pathogenic in humans? Bioinformatics [Online]. Available at: http://dx.doi.org/10.1093/bioinformatics/btz267.
    Motivation: The potential of the Bombali virus, a novel Ebolavirus, to cause disease in humans remains unknown. We have previously identified potential determinants of Ebolavirus pathogenicity in humans by analysing the amino acid positions that are differentially conserved (specificity determining positions; SDPs) between human pathogenic Ebolaviruses and the non-pathogenic Reston virus. Here, we incorporate the many Ebolavirus genome sequences that have since become available into our analysis and investigate the amino acid sequence of the Bombali virus proteins at the SDPs that discriminate between human pathogenic and non-human pathogenic Ebolaviruses.
    Results: The use of 1408 Ebolavirus genomes (196 in the original analysis) resulted in a set of 166 SDPs (reduced from 180), 146 (88%) of which were retained from the original analysis. This indicates the robustness of our approach and refines the set of SDPs that distinguish human pathogenic Ebolaviruses from Reston virus. At SDPs, Bombali virus shared the majority of amino acids with the human pathogenic Ebolaviruses (63.25%). However, for two SDPs in VP24 (M136L, R139S) that have been proposed to be critical for the lack of Reston virus human pathogenicity because they alter the VP24-karyopherin interaction, the Bombali virus amino acids match those of Reston virus. Thus, Bombali virus may not be pathogenic in humans. Supporting this, no Bombali virus-associated disease outbreaks have been reported, although Bombali virus was isolated from fruit bats cohabiting in close contact with humans, and anti-Ebolavirus antibodies that may indicate contact with Bombali virus have been detected in humans.
  • Wass, M., Ray, L. and Michaelis, M. (2019). Understanding of researcher behaviour is required to improve data reliability. GigaScience [Online]. Available at: https://doi.org/10.1093/gigascience/giz017.
    A lack of data reproducibility (“reproducibility crisis”) has been extensively debated across many academic disciplines.

    Main body
    Although a reproducibility crisis is widely perceived, conclusive data on the scale of the problem and its underlying reasons are largely lacking. The debate is primarily focused on methodological issues. However, examples such as the use of misidentified cell lines illustrate that the availability of reliable methods does not guarantee good practice. Moreover, research is often characterised by a lack of established methods. Despite the crucial importance of researcher conduct, research on the determinants of researcher behaviour, and conclusive data about them, are largely missing.

    Meta-research is urgently needed to establish an understanding of the factors that determine researcher behaviour. This knowledge can then be used to implement and iteratively improve measures that incentivise researchers to apply the highest standards, resulting in high-quality data.
  • Masterson, S. et al. (2018). Herd Immunity to Ebolaviruses Is Not a Realistic Target for Current Vaccination Strategies. Frontiers in Immunology [Online] 9. Available at: https://doi.org/10.3389/fimmu.2018.01025.
    The recent West African Ebola virus epidemic, which affected >28,000 individuals, increased interest in anti-Ebolavirus vaccination programs. Here, we systematically analyzed the requirements for a prophylactic vaccination program based on the basic reproductive number (R0, i.e. the number of secondary cases that result from an individual infection). Published R0 values were determined by systematic literature search and ranged from 0.37 to 20. R0s ≥4 realistically reflected the critical early outbreak phases and superspreading events. Based on the R0, the herd immunity threshold (Ic) was calculated using the equation Ic = 1 – (1/R0). The critical vaccination coverage (Vc) needed to provide herd immunity was determined by including the vaccine effectiveness (E) using the equation Vc = Ic/E. At an R0 of 4, the Ic is 75%, and at an E of 90%, more than 80% of a population needs to be vaccinated to establish herd immunity. Such vaccination rates are currently unrealistic because of resistance against vaccinations, financial/logistical challenges, and a lack of vaccines that provide long-term protection against all human-pathogenic Ebolaviruses. Hence, outbreak management will for the foreseeable future depend on surveillance and case isolation. Clinical vaccine candidates are only available for Ebola viruses. Their use will need to be focused on health care workers, potentially in combination with ring vaccination approaches.
  • Fenton, T. et al. (2018). Meeting Abstracts of the BACR conference: response and resistance in cancer therapy. Cancer Drug Resistance [Online] 1:266-302. Available at: https://doi.org/10.20517/cdr.2018.18.
  • Fenton, T. et al. (2018). What really matters - response and resistance in cancer therapy. Cancer Drug Resistance [Online] 2018:200-203. Available at: https://doi.org/10.20517/cdr.2018.19.
  • Pappalardo, M. et al. (2017). Changes associated with Ebola virus adaptation to novel species. Bioinformatics [Online] 33:1911-1915. Available at: http://dx.doi.org/10.1093/bioinformatics/btx065.
    Motivation: Ebola viruses are not pathogenic in rodents but can be adapted to replicate and cause disease in them. Here, we used a structural bioinformatics approach to analyze the mutations associated with Ebola virus adaptation to rodents to elucidate the determinants of host-specific Ebola virus pathogenicity.
    Results: We identified 33 different mutations associated with Ebola virus adaptation to rodents in the proteins GP, NP, L, VP24, and VP35. Only VP24, GP and NP were consistently found mutated in rodent-adapted Ebola virus strains. Fewer than five mutations in these genes seem to be required for the adaptation of Ebola viruses to a new species. The role of mutations in GP and NP is not clear. However, three VP24 mutations located in the protein interface with karyopherin 5 may enable VP24 to inhibit karyopherins and subsequently the host interferon response. Three further VP24 mutations change hydrogen bonding or cause conformational changes. Hence, there is evidence that few mutations, including crucial mutations in VP24, enable Ebola virus adaptation to new hosts. Since Reston virus, the only non-human pathogenic Ebolavirus species, circulates in pigs in Asia, this raises concerns that few mutations may result in novel human pathogenic Ebolaviruses.
  • Popay, A. et al. (2017). Dexamethasone for the Prevention of Cisplatin-induced Ototoxicity. Clinical Cancer Drugs [Online] 4:59-64. Available at: https://doi.org/10.2174/2212697X04666170331171359.
    Background: Cisplatin is a commonly used anti-cancer drug. However, its use is associated with severe side effects, including ototoxicity that affects a large fraction of cisplatin-treated patients. Approved therapies that reduce cisplatin-induced ototoxicity are lacking. Among the candidate therapeutics, dexamethasone stands out. There is extensive experience of its use in combination with cisplatin for the prevention of chemotherapy-induced nausea and vomiting, indicating that dexamethasone does not affect the anti-cancer effects of cisplatin.

    Objective: The objective of this study is to assess the potential of dexamethasone for the prevention of cisplatin-induced ototoxicity by a systematic analysis of the available evidence.
    Method: The databases PubMed and Web of Science were used to identify relevant articles by using the search terms 'cisplatin', 'ototoxicity', and 'dexamethasone'.
    Results: We identified 16 relevant original research articles. The analyzed studies reported conflicting results on the effects of dexamethasone on cisplatin-induced ototoxicity. However, studies in which dexamethasone was used prior to cisplatin treatment and directly administered into the tympanic cavity of the middle ear consistently reported beneficial effects. The use of sustained release formulations that prolong the availability of dexamethasone within the ear further improved the efficacy of dexamethasone.
    Conclusion: Dexamethasone is a promising candidate drug for the prevention of cisplatin-induced ototoxicity when applied intratympanically. Optimized formulations and administration schedules with regard to dose and time of application need to be developed.
  • Pappalardo, M. et al. (2017). Investigating Ebola virus pathogenicity using Molecular Dynamics. BMC Genomics [Online] 18. Available at: https://dx.doi.org/10.1186/s12864-017-3912-2.
    Background: Ebolaviruses have been known to cause deadly disease in humans for 40 years and have recently been demonstrated in West Africa to be able to cause large outbreaks. Four Ebolavirus species cause severe disease associated with high mortality in humans. Reston viruses are the only Ebolaviruses that do not cause disease in humans. Conserved amino acid changes in the Reston virus protein VP24 compared to VP24 of other Ebolaviruses have been suggested to alter VP24 binding to host cell karyopherins resulting in impaired inhibition of interferon signalling, which may explain the difference in human pathogenicity. Here we used protein structural analysis and molecular dynamics to further elucidate the interaction between VP24 and KPNA5.

    Results: As a control experiment, we compared the interaction of wild-type and R137A-mutant (known to affect KPNA5 binding) Ebola virus VP24 with KPNA5. Results confirmed that the R137A mutation weakens direct VP24-KPNA5 binding and enables water molecules to penetrate at the interface. Similarly, Reston virus VP24 displayed a weaker interaction with KPNA5 than Ebola virus VP24, which is likely to reduce the ability of Reston virus VP24 to prevent host cell interferon signalling.

    Conclusion: Our results provide novel molecular detail on the interaction of Reston virus VP24 and Ebola virus VP24 with human KPNA5. The results indicate a weaker interaction of Reston virus VP24 with KPNA5 than Ebola virus VP24, which is probably associated with a decreased ability to interfere with the host cell interferon response. Hence, our study provides further evidence that VP24 is a key player in determining Ebolavirus pathogenicity.
  • Saintas, E. et al. (2017). Acquired resistance to oxaliplatin is not directly associated with increased resistance to DNA damage in SK-N-ASrOXALI4000, a newly established oxaliplatin-resistant sub-line of the neuroblastoma cell line SK-N-AS. PLoS ONE [Online] 12:e0172140. Available at: http://dx.doi.org/10.1371/journal.pone.0172140.
    The formation of acquired drug resistance is a major reason for the failure of anti-cancer therapies after initial response. Here, we introduce a novel model of acquired oxaliplatin resistance, a sub-line of the non-MYCN-amplified neuroblastoma cell line SK-N-AS that was adapted to growth in the presence of 4000 ng/mL oxaliplatin (SK-N-ASrOXALI4000). SK-N-ASrOXALI4000 cells displayed enhanced chromosomal aberrations compared to SK-N-AS, as indicated by 24-chromosome fluorescence in situ hybridisation. Moreover, SK-N-ASrOXALI4000 cells were resistant not only to oxaliplatin but also to the two other commonly used anti-cancer platinum agents, cisplatin and carboplatin. SK-N-ASrOXALI4000 cells exhibited a stable resistance phenotype that was not affected by culturing the cells for 10 weeks in the absence of oxaliplatin. Interestingly, SK-N-ASrOXALI4000 cells showed no cross-resistance to gemcitabine and increased sensitivity to doxorubicin and UVC radiation, alternative treatments that, like platinum drugs, target DNA integrity. Notably, UVC-induced DNA damage is thought to be predominantly repaired by nucleotide excision repair, and nucleotide excision repair has been described as the main oxaliplatin-induced DNA damage repair system. SK-N-ASrOXALI4000 cells were also more sensitive to lysis by influenza A virus, a candidate for oncolytic therapy, than SK-N-AS cells. In conclusion, we introduce a novel oxaliplatin resistance model. The oxaliplatin resistance mechanisms in SK-N-ASrOXALI4000 cells appear to be complex and not to directly depend on enhanced DNA repair capacity. Models of oxaliplatin resistance are of particular relevance since research on platinum drugs has so far predominantly focused on cisplatin and carboplatin.
  • Martell, H. et al. (2017). Associating mutations causing cystinuria with disease severity with the aim of providing precision medicine. BMC Genomics [Online] 18:550-550. Available at: https://dx.doi.org/10.1186/s12864-017-3913-1.
    Cystinuria is an inherited disease that results in the formation of cystine stones in the kidney, which can have serious health complications. Two genes (SLC7A9 and SLC3A1) that form an amino acid transporter are known to be responsible for the disease. Variants that cause the disease disrupt amino acid transport across the cell membrane, leading to the build-up of relatively insoluble cystine, resulting in formation of stones. Assessing the effects of each mutation is critical in order to provide tailored treatment options for patients. We used various computational methods to assess the effects of cystinuria associated mutations, utilising information on protein function, evolutionary conservation and natural population variation of the two genes. We also analysed the ability of some methods to predict the phenotypes of individuals with cystinuria, based on their genotypes, and compared this to clinical data.
    Using a literature search, we collated a set of 94 SLC3A1 and 58 SLC7A9 point mutations known to be associated with cystinuria. There are differences in sequence location, evolutionary conservation, allele frequency, and predicted effect on protein function between these mutations and other genetic variants of the same genes that occur in a large population. Structural analysis considered how these mutations might lead to cystinuria. For SLC7A9, many mutations swap hydrophobic amino acids for charged amino acids or vice versa, while others affect known functional sites. For SLC3A1, functional information is currently insufficient to make confident predictions but mutations often result in the loss of hydrogen bonds and largely appear to affect protein stability. Finally, we showed that computational predictions of mutation severity were significantly correlated with the disease phenotypes of patients from a clinical study, despite different methods disagreeing for some of their predictions.
    The results of this study are promising and highlight the areas of research which must now be pursued to better understand how mutations in SLC3A1 and SLC7A9 cause cystinuria. The application of our approach to a larger data set is essential, but we have shown that computational methods could play an important role in designing more effective personalised treatment options for patients with cystinuria.
  • Cantoni, D. et al. (2016). Risks Posed by Reston, the Forgotten Ebolavirus. mSphere [Online] 1. Available at: http://dx.doi.org/10.1128/mSphere.00322-16.
  • Jiang, Y. et al. (2016). An expanded evaluation of protein function prediction methods shows an improvement in accuracy. Genome Biology [Online] 17:184. Available at: https://doi.org/10.1186/s13059-016-1037-6.

    A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging.


    We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1 with regard to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2.


    The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent.
  • Lobo, S. et al. (2016). Desulfovibrio vulgaris CbiKP cobaltochelatase: evolution of a haem binding protein orchestrated by the incorporation of two histidine residues. Environmental Microbiology [Online] 19:106-118. Available at: http://doi.org/10.1111/1462-2920.13479.
    The sulfate-reducing bacteria of the Desulfovibrio genus make three distinct modified tetrapyrroles, haem, sirohaem and adenosylcobamide, where sirohydrochlorin acts as the last common biosynthetic intermediate along the branched tetrapyrrole pathway. Intriguingly, D. vulgaris encodes two sirohydrochlorin chelatases, CbiKP and CbiKC, that insert cobalt/iron into the tetrapyrrole macrocycle but are thought to be distinctly located in the periplasm and cytoplasm respectively. Fusing GFP onto the C-terminus of CbiKP confirmed that the protein is transported to the periplasm. The structure-function relationship of CbiKP was studied by constructing eleven site-directed mutants and determining their chelatase activities, oligomeric status and haem binding abilities. Residues His154 and His216 were identified as essential for metal-chelation of sirohydrochlorin. The tetrameric form of the protein is stabilized by Arg54 and Glu76, which form hydrogen bonds between two subunits. His96 is responsible for the binding of two haem groups within the main central cavity of the tetramer. Unexpectedly, CbiKP is shown to bind two additional haem groups through interaction with His103. Thus, although CbiKP still retains cobaltochelatase activity, the presence of His96 and His103, which are absent from all other known bacterial cobaltochelatases, has given it a new function as a haem binding protein, permitting it to act as a potential haem chaperone or transporter.
  • Michaelis, M., Rossman, J. and Wass, M. (2016). Computational analysis of Ebolavirus data: prospects, promises and challenges. Biochemical Society Transactions [Online] 44:973-8. Available at: http://dx.doi.org/10.1042/BST20160074.
    The ongoing Ebola virus (also known as Zaire ebolavirus, a member of the Ebolavirus genus) outbreak in West Africa has so far resulted in >28,000 confirmed cases, compared with previous Ebolavirus outbreaks that affected at most a few hundred individuals. Hence, Ebolaviruses pose a much greater threat than we may have expected (or hoped). An improved understanding of the virus biology is essential to develop therapeutic and preventive measures and to be better prepared for future outbreaks caused by members of the Ebolavirus genus. Computational investigations can complement wet laboratory research on biosafety level 4 pathogens such as Ebolaviruses, for which wet experimental capacity is limited by the small number of appropriate containment laboratories. During the current West Africa outbreak, sequence data from many Ebola virus genomes became available, providing a rich resource for computational analysis. Here, we consider the studies that have already reported on the computational analysis of these data. A range of properties have been investigated, including Ebolavirus evolution and pathogenicity, prediction of microRNAs and identification of Ebolavirus-specific signatures. However, the accuracy of the results remains to be confirmed by wet laboratory experiments. Therefore, communication and exchange between computational and wet laboratory researchers is necessary to make maximum use of computational analyses and to iteratively improve these approaches.
  • Wass, M., Rossman, J. and Michaelis, M. (2016). Ebola outbreak highlights the need for wet and dry laboratory collaboration. Journal of Emerging Diseases and Virology [Online] 2. Available at: http://dx.doi.org/10.16966/2473-1846.e102.
    The recent Ebola outbreak in Western Africa taught us that Ebolaviruses can cause much larger outbreaks and represent a much greater health threat than many of us believed (or wanted to believe). As of 30th March, the outbreak had resulted in 28,646 confirmed cases and 11,323 deaths. Although the WHO stated that the Ebola epidemic in West Africa no longer represents a Public Health Emergency of International Concern, since Guinea, Liberia, and Sierra Leone are now capable of controlling and containing further small outbreaks, flare-ups still occur, most recently on 4th April, when two new cases were reported in Liberia (www.who.int).
  • Wong, K., Wass, M. and Thomas, K. (2016). The Role of Protein Modelling in Predicting the Disease Severity of Cystinuria. European Urology [Online] 69:543-544. Available at: http://dx.doi.org/10.1016/j.eururo.2015.10.039.
  • Shagari, H. et al. (2016). The 2014 Ebola Outbreak: Preparedness in West African Countries and its Impact on the Size of the Outbreak. Journal of Emerging Diseases and Virology [Online] 2. Available at: http://dx.doi.org/10.16966/2473-1846.123.
    The recent Ebola virus outbreak in West Africa was the first that reached epidemic size, resulting in more than 28,000 suspected and confirmed cases and more than 11,000 deaths. Here, we performed a meta-analysis to determine the role of preparedness in the course of the epidemic. Relevant research articles were identified using the search terms "Ebola 2014 preparedness", "Ebola 2014 treatment and diagnosis", "Ebola 2014 isolation", "Ebola 2014 culture", and "Ebola 2014 Health Care Workers" in PubMed. Twenty-one relevant original articles in English were identified and analysed. Results revealed that a lack of preparedness substantially contributed to the scale of the Ebola epidemic in West Africa. Studies consistently reported shortcomings in the availability and use of personal protective equipment, transportation and communication systems, surveillance, patient isolation and treatment, training of healthcare workers, and public awareness and perception in the affected West African countries. Effective surveillance and patient isolation enabled outbreak control. In conclusion, effective health care systems and procedures for early detection and containment of outbreaks, in combination with education of the population, will be needed to better control future Ebola outbreaks and outbreaks of other (novel) pathogens for which no effective treatment is available.
  • Pappalardo, M. et al. (2016). Conserved differences in protein sequence determine the human pathogenicity of Ebolaviruses. Scientific Reports [Online] 6:23743. Available at: http://dx.doi.org/10.1038/srep23743.
    Reston viruses are the only Ebolaviruses that are not pathogenic in humans. We analyzed 196 Ebolavirus genomes and identified specificity determining positions (SDPs) in all nine Ebolavirus proteins that distinguish Reston viruses from the four human pathogenic Ebolaviruses. A subset of these SDPs will explain the differences in human pathogenicity between Reston and the other four ebolavirus species. Structural analysis was performed to identify those SDPs that are likely to have a functional effect. This analysis revealed novel functional insights in particular for Ebolavirus proteins VP40 and VP24. The VP40 SDP P85T interferes with VP40 function by altering octamer formation. The VP40 SDP Q245P affects the structure and hydrophobic core of the protein and consequently protein function. Three VP24 SDPs (T131S, M136L, Q139R) are likely to impair VP24 binding to human karyopherin alpha5 (KPNA5) and therefore inhibition of interferon signaling. Since VP24 is critical for Ebolavirus adaptation to novel hosts, and only a few SDPs distinguish Reston virus VP24 from VP24 of other Ebolaviruses, human pathogenic Reston viruses may emerge. This is of concern since Reston viruses circulate in domestic pigs and can infect humans, possibly via airborne transmission.
  • Michaelis, M. et al. (2016). Substrate-specific effects of pirinixic acid derivatives on ABCB1-mediated drug transport. Oncotarget [Online] 7:11664-76. Available at: http://dx.doi.org/10.18632/oncotarget.7345.
    Pirinixic acid derivatives, a new class of drug candidates for a range of diseases, interfere with targets including PPARα, PPARγ, 5-lipoxygenase (5-LO), and microsomal prostaglandin E2 synthase-1 (mPGES1). Since 5-LO, mPGES1, PPARα, and PPARγ represent potential anti-cancer drug targets, we here investigated the effects of 39 pirinixic acid derivatives on prostate cancer (PC-3) and neuroblastoma (UKF-NB-3) cell viability and, subsequently, the effects of selected compounds on drug-resistant neuroblastoma cells. Few compounds affected cancer cell viability at low micromolar concentrations, and there was no correlation between the anti-cancer effects and the effects on 5-LO, mPGES1, PPARα, or PPARγ. Most strikingly, pirinixic acid derivatives interfered with drug transport by the ATP-binding cassette (ABC) transporter ABCB1 in a drug-specific fashion. LP117, the compound that exerted the strongest effect on ABCB1, interfered at the investigated concentrations of up to 2 µM with the ABCB1-mediated transport of vincristine, vinorelbine, actinomycin D, paclitaxel, and calcein-AM but not of doxorubicin, rhodamine 123, or JC-1. In silico docking studies identified differences in the interaction profiles of the investigated ABCB1 substrates with the known ABCB1 binding sites that may explain the substrate-specific effects of LP117. Thus, pirinixic acid derivatives may offer potential as drug-specific modulators of ABCB1-mediated drug transport.
  • Wass, M., Rossman, J. and Michaelis, M. (2016). Ebola outbreak highlights the need for wet and dry laboratory collaboration. Journal of Virology and Emerging Diseases [Online] 2. Available at: http://dx.doi.org/10.16966/2473-1846.e102.
  • Voges, Y. et al. (2016). Effects of YM155 on survivin levels and viability in neuroblastoma cells with acquired drug resistance. Cell Death & Disease [Online] 7:e2410. Available at: http://dx.doi.org/10.1038/cddis.2016.257.
    Resistance formation after initial therapy response (acquired resistance) is common in high-risk neuroblastoma patients. YM155 is a drug candidate that was introduced as a survivin suppressant. This mechanism was later challenged, and DNA damage induction and Mcl-1 depletion were suggested instead. Here we investigated the efficacy and mechanism of action of YM155 in neuroblastoma cells with acquired drug resistance. The efficacy of YM155 was determined in neuroblastoma cell lines and their sublines with acquired resistance to clinically relevant drugs. Survivin levels, Mcl-1 levels, and DNA damage formation were determined in response to YM155. RNAi-mediated depletion of survivin, Mcl-1, and p53 was performed to investigate their roles during YM155 treatment. Clinical YM155 concentrations affected the viability of drug-resistant neuroblastoma cells through survivin depletion and p53 activation. MDM2 inhibitor-induced p53 activation further enhanced YM155 activity. Loss of p53 function generally affected anti-neuroblastoma approaches targeting survivin. Upregulation of ABCB1 (causes YM155 efflux) and downregulation of SLC35F2 (causes YM155 uptake) mediated YM155-specific resistance. YM155-adapted cells displayed increased ABCB1 levels, decreased SLC35F2 levels, and a p53 mutation. YM155-adapted neuroblastoma cells were also characterized by decreased sensitivity to RNAi-mediated survivin depletion, further confirming survivin as a critical YM155 target in neuroblastoma. In conclusion, YM155 targets survivin in neuroblastoma. Furthermore, survivin is a promising therapeutic target for p53 wild-type neuroblastomas after resistance acquisition (neuroblastomas are rarely p53-mutated), potentially in combination with p53 activators. In addition, we show that the adaptation of cancer cells to molecular-targeted anticancer drugs is an effective strategy to elucidate a drug's mechanism of action.
  • Michaelis, M. et al. (2015). Identification of flubendazole as potential anti-neuroblastoma compound in a large cell line screen. Scientific Reports [Online] 5:8202. Available at: http://www.nature.com/srep/2015/150203/srep08202/full/srep08202.html.
    Flubendazole was shown to exert anti-leukaemia and anti-myeloma activity through inhibition of microtubule function. Here, flubendazole was tested for its effects on the viability of a total of 461 cancer cell lines. Neuroblastoma was identified as a highly flubendazole-sensitive cancer entity in a screen of 321 cell lines from 26 cancer entities. Flubendazole also reduced the viability of five primary neuroblastoma samples at nanomolar concentrations thought to be achievable in humans, and inhibited vessel formation and neuroblastoma tumour growth in the chick chorioallantoic membrane assay. Resistance acquisition is a major problem in high-risk neuroblastoma. Of a panel of 140 neuroblastoma cell lines with acquired resistance to various anti-cancer drugs, 119 were sensitive to flubendazole at nanomolar concentrations. Tubulin-binding agent-resistant cell lines displayed the highest flubendazole IC50 and IC90 values, but differences between drug classes did not reach statistical significance. Flubendazole induced p53-mediated apoptosis. The siRNA-mediated depletion of the p53 targets p21, BAX, or PUMA reduced neuroblastoma cell sensitivity to flubendazole, with PUMA depletion resulting in the most pronounced effects. The MDM2 inhibitor and p53 activator nutlin-3 increased flubendazole efficacy, while RNAi-mediated p53 depletion reduced its activity. In conclusion, flubendazole represents a potential treatment option for neuroblastoma, including therapy-refractory cells.
  • Kelley, L. et al. (2015). The Phyre2 web portal for protein modeling, prediction and analysis. Nature Protocols [Online] 10:845-58. Available at: http://dx.doi.org/10.1038/nprot.2015.053.
    Phyre2 is a suite of tools available on the web to predict and analyze protein structure, function and mutations. The focus of Phyre2 is to provide biologists with a simple and intuitive interface to state-of-the-art protein bioinformatics tools. Phyre2 replaces Phyre, the original version of the server for which we previously published a paper in Nature Protocols. In this updated protocol, we describe Phyre2, which uses advanced remote homology detection methods to build 3D models, predict ligand binding sites and analyze the effect of amino acid variants (e.g., nonsynonymous SNPs (nsSNPs)) for a user's protein sequence. Users are guided through results by a simple interface at a level of detail they determine. This protocol will guide users from submitting a protein sequence to interpreting the secondary and tertiary structure of their models, their domain composition and model quality. A range of additional available tools is described to find a protein structure in a genome, to submit a large number of sequences at once and to automatically run weekly searches for proteins that are difficult to model. The server is available at http://www.sbg.bio.ic.ac.uk/phyre2. A typical structure prediction will be returned between 30 min and 2 h after submission.
  • Friedberg, I. et al. (2015). Ten simple rules for a community computational challenge. PLoS Computational Biology [Online] 11. Available at: https://doi.org/10.1371/journal.pcbi.1004150.
  • Talman, A. et al. (2014). Proteomic analysis of the Plasmodium male gamete reveals the key role for glycolysis in flagellar motility. Malaria Journal [Online] 13:315. Available at: http://dx.doi.org/10.1186/1475-2875-13-315.
  • Wong, K. et al. (2014). The Genetic Diversity of Cystinuria in a UK Population of Patients. BJU International [Online] 116:109-116. Available at: http://dx.doi.org/10.1111/bju.12894.

    Objective

    To examine the genetic mutations in the first UK cohort of patients with cystinuria, with a preliminary genotype/phenotype correlation.

    Patients and Methods

    DNA sequencing and multiplex ligation-dependent probe amplification (MLPA) were used to identify the mutations in 74 patients in a specialist cystinuria clinic in the UK. Patients with type A cystinuria were classified into two groups: Group M patients had at least one missense mutation and Group N patients had two alleles of all other types of mutations including frameshift, splice site, nonsense, deletions and duplications. The levels of urinary dibasic amino acids, age at presentation of disease, number of stone episodes and interventions were compared between patients in the two groups using the Mann–Whitney U-test.

    Results

    In all, 41 patients had type A cystinuria, including one patient with a variant of unknown significance, and 23 patients had type B cystinuria, including six patients with variants of unknown significance. One patient had three sequence variants in SLC7A9; however, two were of unknown significance. Three patients had type AB cystinuria. Three patients had a single mutation in SLC7A9. In three patients, no mutations were identified in either gene. There were a total of 88 mutations in SLC3A1 and 55 mutations in SLC7A9. There were 23 pathogenic mutations identified in our UK cohort of patients that had not previously been published. In patients with type A cystinuria, the presence of a missense mutation correlated with lower levels of urinary lysine (mean [se] 611.9 [22.65] vs 752.3 [46.39] millimoles per mole of creatinine [mm/MC]; P=0.02), arginine (194.8 [24.83] vs 397.7 [15.32] mm/MC; P<0.001) and ornithine (109.2 [7.40] vs 146.6 [12.7] mm/MC; P=0.02). There was no difference in the levels of urinary cystine (182.1 [8.89] vs 207.2 [19.23] mm/MC; P=0.23).

    Conclusions

    We have characterised the genetic diversity of cystinuria in a UK population, including 23 pathogenic mutations not previously published. Patients with at least one missense mutation in SLC3A1 had significantly lower levels of lysine, arginine, and ornithine, but not cystine, than patients with all other combinations of mutations.
  • Wass, M. et al. (2014). The automated function prediction SIG looks back at 2013 and prepares for 2014. Bioinformatics [Online] 30:2091-2092. Available at: http://dx.doi.org/10.1093/bioinformatics/btu117.
    The mission of the Automated Function Prediction Special Interest Group (AFP-SIG) is to coalesce the community of computational biologists, experimental biologists and biocurators who are addressing the challenge of protein function prediction, thereby sharing ideas and creating collaborations. The AFP-SIG holds annual meetings alongside Intelligent Systems for Molecular Biology (ISMB), the leading conference of the International Society for Computational Biology. The AFP-SIG also runs the ongoing Critical Assessment of Functional Annotation (CAFA) challenge (Radivojac et al., 2013).
  • Pappalardo, M. and Wass, M. (2014). VarMod: modelling the functional effects of non-synonymous variants. Nucleic Acids Research [Online] 42:W331-W336. Available at: http://dx.doi.org/10.1093/nar/gku483.
    Unravelling the genotype–phenotype relationship in humans remains a challenging task in genomics studies. Recent advances in sequencing technologies mean there are now thousands of sequenced human genomes, revealing millions of single nucleotide variants (SNVs). For non-synonymous SNVs (nsSNVs) in proteins, the difficulty lies first in identifying the nsSNVs that result in a functional change in the protein among the many non-functional variants, and then in linking this functional change to phenotype. Here we present VarMod (Variant Modeller), a method that utilises both protein sequence and structural features to predict nsSNVs that alter protein function. VarMod builds on recent observations that functional nsSNVs are enriched at protein–protein interfaces and protein–ligand binding sites, and uses these characteristics to make predictions. In benchmarking on a set of nearly 3000 nsSNVs, VarMod performance is comparable to an existing state-of-the-art method. The VarMod web server provides extensive resources to investigate the sequence and structural features associated with the predictions, including visualisation of protein models and complexes via an interactive JSmol molecular viewer.

    VarMod is available for use at http://www.wasslab.org/varmod.
  • Zhi, D. et al. (2014). The South Asian Genome. PLoS ONE [Online] 9:e102645. Available at: http://dx.doi.org/10.1371/journal.pone.0102645.
  • Radivojac, P. et al. (2013). A large-scale evaluation of computational protein function prediction. Nature Methods [Online] 10:221-227. Available at: http://dx.doi.org/10.1038/nmeth.2340.
    Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment. Fifty-four methods representing the state of the art for protein function prediction were evaluated on a target set of 866 proteins from 11 organisms. Two findings stand out: (i) today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is considerable need for improvement of currently available tools.
  • Wass, M. et al. (2012). Proteomic analysis of Plasmodium in the mosquito: progress and pitfalls. Parasitology [Online] 139:1131-1145. Available at: http://dx.doi.org/10.1017/S0031182012000133.
    Here we discuss proteomic analyses of whole cell preparations of the mosquito stages of malaria parasite development (i.e. gametocytes, microgamete, ookinete, oocyst and sporozoite) of Plasmodium berghei. We also include critiques of the proteomes of two cell fractions from the purified ookinete, namely the micronemes and cell surface. While we summarise key biological interpretations of the data, we also try to identify the key methodological constraints we encountered, only some of which we were able to resolve. Recognising the need to translate the potential of current genome sequencing into functional understanding, we report our efforts to develop more powerful combinations of methods for the in silico prediction of protein function and location. We have applied this analysis to the proteome of the male gamete, a cell whose very simple structural organisation facilitated interpretation of the data. Some of the in silico predictions made have now been supported by ongoing protein tagging and genetic knockout studies. We hope this discussion may assist future studies.
  • Wass, M., Barton, G. and Sternberg, M. (2012). CombFunc: predicting protein function using heterogeneous data sources. Nucleic Acids Research [Online] 40:W466-W470. Available at: http://dx.doi.org/10.1093/nar/gks489.
    Only a small fraction of known proteins have been functionally characterized, making protein function prediction essential to propose annotations for uncharacterized proteins. In recent years many function prediction methods have been developed using various sources of biological data, from protein sequence and structure to gene expression data. Here we present the CombFunc web server, which makes Gene Ontology (GO)-based protein function predictions. CombFunc combines ConFunc, our existing function prediction method, with other approaches for function prediction that use protein sequence, gene expression and protein–protein interaction data. In benchmarking on a set of 1686 proteins, CombFunc obtains a precision of 0.71 and a recall of 0.64 for GO molecular function terms. For GO biological process terms, it obtains a precision of 0.74 and a recall of 0.41.
  • David, A. et al. (2012). Protein-protein interaction sites are hot spots for disease-associated nonsynonymous SNPs. Human Mutation [Online] 33:359-363. Available at: http://dx.doi.org/10.1002/humu.21656.
    Many nonsynonymous single nucleotide polymorphisms (nsSNPs) are disease causing due to effects at protein-protein interfaces. We have integrated a database of the three-dimensional (3D) structures of human protein/protein complexes and the humsavar database of nsSNPs. We analyzed the location of nsSNPs: in the protein core, at protein-protein interfaces, or on the surface away from an interface. Disease-causing nsSNPs that do not occur in the protein core are preferentially located at protein-protein interfaces rather than surface noninterface regions when compared to random segregation. The disruption of the protein-protein interaction can be explained by a range of structural effects, including the loss of an electrostatic salt bridge, destabilization due to reduction of the hydrophobic effect, the formation of a steric clash, and the introduction of a proline altering the main-chain conformation.
  • Wass, M. et al. (2011). Towards the prediction of protein interaction partners using physical docking. Molecular Systems Biology [Online] 7. Available at: http://dx.doi.org/10.1038/msb.2011.3.
  • Chambers, J. et al. (2011). Genome-wide association study identifies loci influencing concentrations of liver enzymes in plasma. Nature Genetics [Online] 43:1131-1138. Available at: http://dx.doi.org/10.1038/ng.970.
    Concentrations of liver enzymes in plasma are widely used as indicators of liver disease. We carried out a genome-wide association study in 61,089 individuals, identifying 42 loci associated with concentrations of liver enzymes in plasma, of which 32 are new associations (P = 10^-8 to P = 10^-190). We used functional genomic approaches including metabonomic profiling and gene expression analyses to identify probable candidate genes at these regions. We identified 69 candidate genes, including genes involved in biliary transport (ATP8B1 and ABCB11), glucose, carbohydrate and lipid metabolism (FADS1, FADS2, GCKR, JMJD1C, HNF1A, MLXIPL, PNPLA3, PPP1R3B, SLC2A2 and TRIB1), glycoprotein biosynthesis and cell surface glycobiology (ABO, ASGR1, FUT2, GPLD1 and ST3GAL4), inflammation and immunity (CD276, CDH6, GCKR, HNF1A, HPR, ITGA1, RORA and STAT4) and glutathione metabolism (GSTT1, GSTT2 and GGT), as well as several genes of uncertain or unknown function (including ABHD12, EFHD1, EFNA1, EPHA2, MICAL3 and ZNF827). Our results provide new insight into genetic mechanisms and pathways influencing markers of liver function.
  • Wass, M., David, A. and Sternberg, M. (2011). Challenges for the prediction of macromolecular interactions. Current Opinion in Structural Biology [Online] 21:382-90. Available at: http://dx.doi.org/10.1016/j.sbi.2011.03.013.
    Macromolecular interactions are central to most cellular processes. Experimental methods generate diverse data on these interactions ranging from high throughput protein-protein interactions (PPIs) to the crystallised structures of complexes. Despite this, only a fraction of interactions have been identified and therefore predictive methods are essential to fill in the numerous gaps. Many predictive methods use information from related proteins. Accordingly, we review the conservation of interface and ligand binding sites within protein families and their association with conserved residues and Specificity Determining Positions. We then review recent developments in predictive methods for the identification of PPIs, protein interface sites and small molecule ligand binding sites. The challenges that are still faced by the community in these areas are discussed.
  • Chambers, J. et al. (2010). Genetic variation in SCN10A influences cardiac conduction. Nature Genetics [Online] 42:149-152. Available at: http://dx.doi.org/10.1038/ng.516.
    To identify genetic factors influencing cardiac conduction, we carried out a genome-wide association study of electrocardiographic time intervals in 6,543 Indian Asians. We identified an association of a nonsynonymous SNP, rs6795970, in SCN10A (P = 2.8 × 10^-15) with PR interval, a marker of cardiac atrioventricular conduction. Replication testing among 6,243 Indian Asians and 5,370 Europeans confirmed that rs6795970 (G>A) is associated with prolonged cardiac conduction (longer P-wave duration, PR interval and QRS duration, P = 10^-5 to 10^-20). SCN10A encodes NaV1.8, a sodium channel. We show that SCN10A is expressed in mouse and human heart tissue and that PR interval is shorter in Scn10a-/- mice than in wild-type mice. We also find that rs6795970 is associated with a higher risk of heart block (P < 0.05) and a lower risk of ventricular fibrillation (P = 0.01). Our findings provide new insight into the pathogenesis of cardiac conduction, heart block and ventricular fibrillation.
  • Sinden, R. et al. (2010). The flagellum in malarial parasites. Current Opinion in Microbiology [Online] 13:491-500. Available at: http://dx.doi.org/10.1016/j.mib.2010.05.016.
    The malarial parasites assemble flagella exclusively during the formation of the male gamete in the midgut of the female mosquito vector. The observation of gamete formation ex vivo reported by Laveran (Laveran MA: De la nature parasitaire des accidents de l'impaludisme. Comptes Rendus de la Société de Biologie. Paris 1881, 93:627-630) was seminal to the discovery of the parasite itself. Following ingestion of malaria-infected blood by the mosquito, microgamete formation from the terminally arrested gametocytes is exceptionally rapid, completing three mitotic divisions in just a few minutes, and is precisely regulated. This review attempts to draw together the diverse original observations with subsequent electron microscopic studies, and recent work on the signalling pathways regulating sexual development, together with transcriptomic and proteomic studies that are paving the way to new understandings of the molecular mechanisms involved and the potential they offer for effective interventions to block the transmission of the parasites in natural communities.
  • Chambers, J. et al. (2010). Genetic loci influencing kidney function and chronic kidney disease. Nature Genetics [Online] 42:373-5. Available at: http://dx.doi.org/10.1038/ng.566.
    Using genome-wide association, we identify common variants at 2p12-p13, 6q26, 17q23 and 19q13 associated with serum creatinine, a marker of kidney function (P = 10^-10 to 10^-15). Of these, rs10206899 (near NAT8, 2p12-p13) and rs4805834 (near SLC7A9, 19q13) were also associated with chronic kidney disease (P = 5.0 × 10^-5 and P = 3.6 × 10^-4, respectively). Our findings provide insight into metabolic, solute and drug-transport pathways underlying susceptibility to chronic kidney disease.
  • Wass, M., Kelley, L. and Sternberg, M. (2010). 3DLigandSite: predicting ligand-binding sites using similar structures. Nucleic Acids Research [Online] 38:W469-W473. Available at: http://dx.doi.org/10.1093/nar/gkq406.
    3DLigandSite is a web server for the prediction of ligand-binding sites. It is based upon successful manual methods used in the eighth round of the Critical Assessment of techniques for protein Structure Prediction (CASP8). 3DLigandSite utilizes protein-structure prediction to provide structural models for proteins whose structures have not been solved. Ligands bound to structures similar to the query are superimposed onto the model and used to predict the binding site. In benchmarking against the CASP8 targets, 3DLigandSite obtains a Matthews correlation coefficient (MCC) of 0.64, and coverage and accuracy of 71% and 60%, respectively, results similar to our manual performance in CASP8. In further benchmarking using a large set of protein structures, 3DLigandSite obtains an MCC of 0.68. The web server enables users to submit either a query sequence or structure.
  • Chambers, J. et al. (2009). Genome-wide association study identifies variants in TMPRSS6 associated with hemoglobin levels. Nature Genetics [Online] 41:1170-1172. Available at: http://dx.doi.org/10.1038/ng.462.
    We carried out a genome-wide association study of hemoglobin levels in 16,001 individuals of European and Indian Asian ancestry. The most closely associated SNP (rs855791) results in a nonsynonymous change (V736A) in the serine protease domain of TMPRSS6 and a blood hemoglobin concentration 0.13 (95% CI 0.09-0.17) g/dl lower per copy of allele A (P = 1.6 × 10^-13). Our findings suggest that TMPRSS6, a regulator of hepcidin synthesis and iron handling, is crucial in hemoglobin level maintenance.
  • Wass, M. and Sternberg, M. (2009). Prediction of ligand binding sites using homologous structures and conservation at CASP8. Proteins: Structure, Function, and Bioinformatics [Online] 77 Suppl:147-151. Available at: http://dx.doi.org/10.1002/prot.22513.
    The Critical Assessment of protein Structure Prediction (CASP) experiment is a blind assessment of the prediction of protein structure and related topics, including function prediction. We present our results in the function/binding site prediction category. Our approach to identify binding sites combined the use of the predicted structure of the targets with both residue conservation and the location of ligands bound to homologous structures. We obtained an average coverage of 83% and an accuracy of 56%. Analysis of our predictions suggests that over-prediction reduces the accuracy obtained, owing to large areas of conservation around the binding site that do not bind the ligand. In some proteins such conserved residues may have a functional role. A server version of our method will soon be available.
  • Wass, M. and Sternberg, M. (2008). ConFunc--functional annotation in the twilight zone. Bioinformatics [Online] 24:798-806. Available at: http://dx.doi.org/10.1093/bioinformatics/btn037.
    Motivation: The success of genome sequencing has resulted in many protein sequences without functional annotation. We present ConFunc, an automated Gene Ontology (GO)-based protein function prediction approach, which uses conserved residues to generate sequence profiles to infer function. ConFunc splits sets of sequences identified by PSI-BLAST into sub-alignments according to their GO annotations. Conserved residues are identified for each GO term sub-alignment, for which a position-specific scoring matrix is generated. This combination of steps produces a set of feature (GO annotation)-derived profiles from which protein function is predicted.

    Results: We assess the ability of ConFunc, BLAST and PSI-BLAST to predict protein function in the twilight zone of sequence similarity. ConFunc significantly outperforms BLAST and PSI-BLAST, obtaining levels of recall and precision that are not obtained by either method and a maximum precision 24% greater than BLAST. Furthermore, for a large test set of sequences with homologues of low sequence identity, at high levels of precision, ConFunc obtains recall six times greater than BLAST. These results demonstrate the potential for ConFunc to form part of an automated genomics annotation pipeline.
  • Gherardini, P. et al. (2007). Convergent evolution of enzyme active sites is not a rare phenomenon. Journal of Molecular Biology [Online] 372:817-845. Available at: http://dx.doi.org/10.1016/j.jmb.2007.06.017.
    Since convergent evolution of enzyme active sites was first identified in serine proteases, other individual instances of this phenomenon have been documented. However, a systematic analysis assessing the frequency of this phenomenon across enzyme space is still lacking. This work uses the Query3d structural comparison algorithm to integrate, for the first time, detailed knowledge about catalytic residues, available through the Catalytic Site Atlas (CSA), with the evolutionary information provided by the Structural Classification of Proteins (SCOP) database. This study considers two modes of convergent evolution: (i) mechanistic analogues, which are enzymes that use the same mechanism to perform related, but possibly different, reactions (considered here as sharing the first three digits of the EC number); and (ii) transformational analogues, which catalyse exactly the same reaction (identical EC numbers) but may use different mechanisms. Mechanistic analogues were identified in 15% (26 out of 169) of the three-digit EC groups considered, showing that this phenomenon is not rare. Furthermore, 11 of these groups also contain transformational analogues. The catalytic triad is the most widespread active site; the results of the structural comparison show that this mechanism, or variations thereof, is present in 23 superfamilies. Transformational analogues were identified for 45 of the 951 four-digit EC numbers present within the CSA, and about half of these were also mechanistic analogues exhibiting convergence of their active sites. This analysis has also been extended to the whole Protein Data Bank to provide a complete and manually curated list of all the transformational analogues whose structure is classified in SCOP. The results of this work show that the phenomenon of convergent evolution is not rare, especially when considering large enzymatic families.