School of Biosciences



 


Dr Mark Wass

Lecturer in Computational Biology

School of Biosciences

 

Mark joined the School of Biosciences in October 2012. He obtained his first degree in Natural Sciences at Cambridge University in 2000, followed by a Master's in Computing at Imperial College London. After a few years working in industry as an IT consultant, Mark studied for a PhD with Prof Mike Sternberg at Imperial (2004-2008) and then continued in the group as a postdoctoral researcher until 2011. In 2011 Mark was awarded a FEBS Long Term Fellowship to work in the group of Alfonso Valencia at the CNIO (Spanish National Cancer Research Centre) in Madrid, Spain.

Mark's research interests are in structural bioinformatics, particularly the analysis and prediction of protein function, structure and interactions. He is also interested in using these approaches to analyse genetic variation and to identify functional effects associated with disease.

Mark is a member of the Cytogenomics and Bioinformatics Group.


 

Also view these in the Kent Academic Repository
Articles

    Radivojac, Predrag and Clark, Wyatt T. and Oron, Tal Ronnen et al. (2013) A large-scale evaluation of computational protein function prediction. Nature Methods, 10 (3). pp. 221-227. ISSN 1548-7091.

    Abstract

    Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment. Fifty-four methods representing the state of the art for protein function prediction were evaluated on a target set of 866 proteins from 11 organisms. Two findings stand out: (i) today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is considerable need for improvement of currently available tools.

    David, Alessia and Razali, Rozami and Wass, Mark N. et al. (2012) Protein-protein interaction sites are hot spots for disease-associated nonsynonymous SNPs. Human Mutation, 33 (2). pp. 359-363. ISSN 1059-7794.

    Abstract

    Many nonsynonymous single nucleotide polymorphisms (nsSNPs) are disease causing due to effects at protein-protein interfaces. We have integrated a database of the three-dimensional (3D) structures of human protein/protein complexes and the humsavar database of nsSNPs. We analyzed where nsSNPs are located: in the protein core, at protein-protein interfaces, or on the surface away from an interface. Disease-causing nsSNPs that do not occur in the protein core are preferentially located at protein-protein interfaces rather than surface noninterface regions when compared to random segregation. The disruption of the protein-protein interaction can be explained by a range of structural effects including the loss of an electrostatic salt bridge, the destabilization due to reduction of the hydrophobic effect, the formation of a steric clash, and the introduction of a proline altering the main-chain conformation.

    Wass, Mark N. and Barton, Geraint and Sternberg, Michael J.E. (2012) CombFunc: predicting protein function using heterogeneous data sources. Nucleic Acids Research, 40 (W1). pp. W466-W470. ISSN 0305-1048.

    Abstract

    Only a small fraction of known proteins have been functionally characterized, making protein function prediction essential to propose annotations for uncharacterized proteins. In recent years many function prediction methods have been developed using various sources of biological data, from protein sequence and structure to gene expression data. Here we present the CombFunc web server, which makes Gene Ontology (GO)-based protein function predictions. CombFunc incorporates ConFunc, our existing function prediction method, with other approaches for function prediction that use protein sequence, gene expression and protein–protein interaction data. In benchmarking on a set of 1686 proteins, CombFunc obtains precision and recall of 0.71 and 0.64, respectively, for GO molecular function terms. For GO biological process terms, it obtains a precision of 0.74 and a recall of 0.41.

    Wass, Mark N. and Stanway, R. and Blagborough, A. M. et al. (2012) Proteomic analysis of Plasmodium in the mosquito: progress and pitfalls. Parasitology, 139 (9). pp. 1131-1145. ISSN 1469-8161.

    Abstract

    Here we discuss proteomic analyses of whole cell preparations of the mosquito stages of malaria parasite development (i.e. gametocytes, microgamete, ookinete, oocyst and sporozoite) of Plasmodium berghei. We also include critiques of the proteomes of two cell fractions from the purified ookinete, namely the micronemes and cell surface. While we summarise key biological interpretations of the data, we also try to identify the key methodological constraints we have met, only some of which we were able to resolve. Recognising the need to translate the potential of current genome sequencing into functional understanding, we report our efforts to develop more powerful combinations of methods for the in silico prediction of protein function and location. We have applied this analysis to the proteome of the male gamete, a cell whose very simple structural organisation facilitated interpretation of data. Some of the in silico predictions made have now been supported by ongoing protein tagging and genetic knockout studies. We hope this discussion may assist future studies.

    Wass, Mark N. and Fuentes, Gloria and Pons, Carles et al. (2011) Towards the prediction of protein interaction partners using physical docking. Molecular Systems Biology, 7. ISSN 1744-4292.

    Chambers, John C and Zhang, Weihua and Sehmi, Joban S. et al. (2011) Genome-wide association study identifies loci influencing concentrations of liver enzymes in plasma. Nature Genetics, 43 (11). pp. 1131-1138. ISSN 1061-4036.

    Abstract

    Concentrations of liver enzymes in plasma are widely used as indicators of liver disease. We carried out a genome-wide association study in 61,089 individuals, identifying 42 loci associated with concentrations of liver enzymes in plasma, of which 32 are new associations (P = 10^-8 to P = 10^-190). We used functional genomic approaches including metabonomic profiling and gene expression analyses to identify probable candidate genes at these regions. We identified 69 candidate genes, including genes involved in biliary transport (ATP8B1 and ABCB11), glucose, carbohydrate and lipid metabolism (FADS1, FADS2, GCKR, JMJD1C, HNF1A, MLXIPL, PNPLA3, PPP1R3B, SLC2A2 and TRIB1), glycoprotein biosynthesis and cell surface glycobiology (ABO, ASGR1, FUT2, GPLD1 and ST3GAL4), inflammation and immunity (CD276, CDH6, GCKR, HNF1A, HPR, ITGA1, RORA and STAT4) and glutathione metabolism (GSTT1, GSTT2 and GGT), as well as several genes of uncertain or unknown function (including ABHD12, EFHD1, EFNA1, EPHA2, MICAL3 and ZNF827). Our results provide new insight into genetic mechanisms and pathways influencing markers of liver function.

    Wass, Mark N. and David, Alessia and Sternberg, Michael J.E. (2011) Challenges for the prediction of macromolecular interactions. Current Opinion in Structural Biology, 21 (3). pp. 382-390. ISSN 0959-440X.

    Abstract

    Macromolecular interactions are central to most cellular processes. Experimental methods generate diverse data on these interactions ranging from high throughput protein-protein interactions (PPIs) to the crystallised structures of complexes. Despite this, only a fraction of interactions have been identified and therefore predictive methods are essential to fill in the numerous gaps. Many predictive methods use information from related proteins. Accordingly, we review the conservation of interface and ligand binding sites within protein families and their association with conserved residues and Specificity Determining Positions. We then review recent developments in predictive methods for the identification of PPIs, protein interface sites and small molecule ligand binding sites. The challenges that are still faced by the community in these areas are discussed.

    Wass, Mark N. and Kelley, L. A. and Sternberg, Michael J.E. (2010) 3DLigandSite: predicting ligand-binding sites using similar structures. Nucleic Acids Research, 38. pp. W469-W473. ISSN 0305-1048.

    Abstract

    3DLigandSite is a web server for the prediction of ligand-binding sites. It is based upon successful manual methods used in the eighth round of the Critical Assessment of techniques for protein Structure Prediction (CASP8). 3DLigandSite utilizes protein-structure prediction to provide structural models for proteins that have not been solved. Ligands bound to structures similar to the query are superimposed onto the model and used to predict the binding site. In benchmarking against the CASP8 targets, 3DLigandSite obtains a Matthews correlation coefficient (MCC) of 0.64, and coverage and accuracy of 71% and 60%, respectively, similar to our manual performance in CASP8. In further benchmarking using a large set of protein structures, 3DLigandSite obtains an MCC of 0.68. The web server enables users to submit either a query sequence or structure.

    Chambers, John C and Zhao, Jing and Terracciano, Cesare M.N. et al. (2010) Genetic variation in SCN10A influences cardiac conduction. Nature Genetics, 42 (2). pp. 149-152. ISSN 1546-1718.

    Abstract

    To identify genetic factors influencing cardiac conduction, we carried out a genome-wide association study of electrocardiographic time intervals in 6,543 Indian Asians. We identified association of a nonsynonymous SNP, rs6795970, in SCN10A (P = 2.8 × 10^-15) with PR interval, a marker of cardiac atrioventricular conduction. Replication testing among 6,243 Indian Asians and 5,370 Europeans confirmed that rs6795970 (G>A) is associated with prolonged cardiac conduction (longer P-wave duration, PR interval and QRS duration, P = 10^-5 to 10^-20). SCN10A encodes NaV1.8, a sodium channel. We show that SCN10A is expressed in mouse and human heart tissue and that PR interval is shorter in Scn10a-/- mice than in wild-type mice. We also find that rs6795970 is associated with a higher risk of heart block (P < 0.05) and a lower risk of ventricular fibrillation (P = 0.01). Our findings provide new insight into the pathogenesis of cardiac conduction, heart block and ventricular fibrillation.

    Chambers, John C and Zhang, Weihua and Lord, Graham M et al. (2010) Genetic loci influencing kidney function and chronic kidney disease. Nature Genetics, 42 (5). pp. 373-375. ISSN 1061-4036.

    Abstract

    Using genome-wide association, we identify common variants at 2p12-p13, 6q26, 17q23 and 19q13 associated with serum creatinine, a marker of kidney function (P = 10^-10 to 10^-15). Of these, rs10206899 (near NAT8, 2p12-p13) and rs4805834 (near SLC7A9, 19q13) were also associated with chronic kidney disease (P = 5.0 × 10^-5 and P = 3.6 × 10^-4, respectively). Our findings provide insight into metabolic, solute and drug-transport pathways underlying susceptibility to chronic kidney disease.

    Sinden, R. E. and Talman, A. M. and Marques, S R et al. (2010) The flagellum in malarial parasites. Current Opinion in Microbiology, 13 (4). pp. 491-500. ISSN 1369-5274.

    Abstract

    The malarial parasites assemble flagella exclusively during the formation of the male gamete in the midgut of the female mosquito vector. The observation of gamete formation ex vivo reported by Laveran (Laveran MA: De la nature parasitaire des accidents de l'impaludisme. Comptes Rendus De La Societe de Biologie. Paris 1881, 93:627-630) was seminal to the discovery of the parasite itself. Following ingestion of malaria-infected blood by the mosquito, microgamete formation from the terminally arrested gametocytes is exceptionally rapid, completing three mitotic divisions in just a few minutes, and is precisely regulated. This review attempts to draw together the diverse original observations with subsequent electron microscopic studies, and recent work on the signalling pathways regulating sexual development, together with transcriptomic and proteomic studies that are paving the way to new understandings of the molecular mechanisms involved and the potential they offer for effective interventions to block the transmission of the parasites in natural communities.

    Wass, Mark N. and Sternberg, Michael J.E. (2009) Prediction of ligand binding sites using homologous structures and conservation at CASP8. Proteins: Structure, Function, and Bioinformatics, 77 (S9). pp. 147-151. ISSN 0887-3585.

    Abstract

    The Critical Assessment of protein Structure Prediction (CASP) experiment is a blind assessment of the prediction of protein structure and related topics, including function prediction. We present our results in the function/binding site prediction category. Our approach to identify binding sites combined the use of the predicted structure of the targets with both residue conservation and the location of ligands bound to homologous structures. We obtained an average coverage of 83% and an average accuracy of 56%. Analysis of our predictions suggests that over-prediction reduces the accuracy obtained, owing to large areas of conservation around the binding site that do not bind the ligand. In some proteins such conserved residues may have a functional role. A server version of our method will soon be available.

    Chambers, John C and Zhang, Weihua and Li, Yun et al. (2009) Genome-wide association study identifies variants in TMPRSS6 associated with hemoglobin levels. Nature Genetics, 41 (11). pp. 1170-1172. ISSN 1061-4036.

    Abstract

    We carried out a genome-wide association study of hemoglobin levels in 16,001 individuals of European and Indian Asian ancestry. The most closely associated SNP (rs855791) results in a nonsynonymous (V736A) change in the serine protease domain of TMPRSS6 and a blood hemoglobin concentration 0.13 (95% CI 0.09-0.17) g/dl lower per copy of allele A (P = 1.6 × 10^-13). Our findings suggest that TMPRSS6, a regulator of hepcidin synthesis and iron handling, is crucial in hemoglobin level maintenance.

    Wass, Mark N. and Sternberg, Michael J.E. (2008) ConFunc – functional annotation in the twilight zone. Bioinformatics, 24 (6). pp. 798-806. ISSN 1367-4803.

    Abstract

    Motivation: The success of genome sequencing has resulted in many protein sequences without functional annotation. We present ConFunc, an automated Gene Ontology (GO)-based protein function prediction approach, which uses conserved residues to generate sequence profiles to infer function. ConFunc splits sets of sequences identified by PSI-BLAST into sub-alignments according to their GO annotations. Conserved residues are identified for each GO term sub-alignment, and a position-specific scoring matrix is then generated for it. This combination of steps produces a set of feature (GO annotation) derived profiles from which protein function is predicted. Results: We assess the ability of ConFunc, BLAST and PSI-BLAST to predict protein function in the twilight zone of sequence similarity. ConFunc significantly outperforms BLAST and PSI-BLAST, obtaining levels of recall and precision that are not obtained by either method and a maximum precision 24% greater than BLAST. Further, for a large test set of sequences with homologues of low sequence identity, at high levels of precision, ConFunc obtains recall six times greater than BLAST. These results demonstrate the potential for ConFunc to form part of an automated genomics annotation pipeline.

    Gherardini, Pier Federico and Wass, Mark N. and Helmer-Citterich, Manuela et al. (2007) Convergent evolution of enzyme active sites is not a rare phenomenon. Journal of Molecular Biology, 372 (3). pp. 817-845. ISSN 0022-2836.

    Abstract

    Since convergent evolution of enzyme active sites was first identified in serine proteases, other individual instances of this phenomenon have been documented. However, a systematic analysis assessing the frequency of this phenomenon across enzyme space is still lacking. This work uses the Query3d structural comparison algorithm to integrate for the first time detailed knowledge about catalytic residues, available through the Catalytic Site Atlas (CSA), with the evolutionary information provided by the Structural Classification of Proteins (SCOP) database. This study considers two modes of convergent evolution: (i) mechanistic analogues, which are enzymes that use the same mechanism to perform related, but possibly different, reactions (considered here as sharing the first three digits of the EC number); and (ii) transformational analogues, which catalyse exactly the same reaction (identical EC numbers), but may use different mechanisms. Mechanistic analogues were identified in 15% (26 out of 169) of the three-digit EC groups considered, showing that this phenomenon is not rare. Furthermore, 11 of these groups also contain transformational analogues. The catalytic triad is the most widespread active site; the results of the structural comparison show that this mechanism, or variations thereof, is present in 23 superfamilies. Transformational analogues were identified for 45 of the 951 four-digit EC numbers present within the CSA, and about half of these were also mechanistic analogues exhibiting convergence of their active sites. This analysis has also been extended to the whole Protein Data Bank to provide a complete and manually curated list of all the transformational analogues whose structure is classified in SCOP. The results of this work show that the phenomenon of convergent evolution is not rare, especially when considering large enzymatic families.

Total publications in KAR: 15

 


My research is primarily based in protein bioinformatics. I have developed methods for the prediction of protein function (ConFunc and CombFunc – webserver at http://www.sbg.bio.ic.ac.uk/combfunc) and for the prediction of small-molecule binding sites in proteins (3DLigandSite – webserver at http://www.sbg.bio.ic.ac.uk/3dligandsite). Recent research has demonstrated that protein-protein docking tools can be used to predict interactions between proteins. I am increasingly interested in using structural bioinformatics tools to analyse genetic variation and the functional effects that variants may have in disease. To pursue this, I have collaborated with a number of genome-wide association and sequencing studies.


 

Enquiries: Phone: +44 (0)1227 823743

School of Biosciences, University of Kent, Canterbury, Kent, CT2 7NJ

Last Updated: 28/08/2013