Can exhaustive algorithms produce MSAs that are suitable for 3D structure modelling?

While predicting the 3D structure of a protein through homology modelling, the most important step is the multiple sequence alignment of template sequences with the target protein whose 3D model is to be predicted. According to my book, either T-Coffee or PRALINE is used for this multiple sequence alignment step; these programs are heuristic in nature and based on a progressive alignment method. There are always concerns regarding the sensitivity and specificity of heuristic algorithms.

To avoid that problem, I imagine one option is to use exhaustive algorithms. During this alignment step, which is critical in 3D modelling, can we use exhaustive algorithms to yield good models?


Methods for multiple sequence alignment (MSA) like T-Coffee, which you mention, and Clustal, which is also widely used, employ heuristics for a very good reason. Finding an optimal multiple alignment (for example, under the common sum-of-pairs score) is NP-complete, so exhaustive algorithms become intractable as the number of sequences grows.
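The sketch below makes the objective concrete. It is a minimal Python version of the sum-of-pairs (SP) score, which an exhaustive MSA method would have to maximise over every possible placement of gaps. Several parts are illustrative assumptions:

- the scoring values (match +1, mismatch -1, linear gap -2);
- the convention that gap-gap pairs score 0;
- the function name `sum_of_pairs`.

T-Coffee, PRALINE and Clustal use their own objectives, substitution matrices and gap models.

```python
from itertools import combinations
from typing import Sequence


def sum_of_pairs(alignment: Sequence[str], match: int = 1,
                 mismatch: int = -1, gap: int = -2) -> int:
    """Sum-of-pairs score of an MSA given as equal-length gapped rows.

    Assumed toy scheme: identical residues score `match`, different residues
    score `mismatch`, residue-vs-gap pairs score `gap`, gap-vs-gap pairs score 0.
    """
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned rows must have the same (non-zero) count")
    total = 0
    for column in zip(*alignment):            # walk the alignment column by column
        for x, y in combinations(column, 2):  # every pair of sequences in that column
            if x == "-" and y == "-":
                continue
            if x == "-" or y == "-":
                total += gap
            elif x == y:
                total += match
            else:
                total += mismatch
    return total


print(sum_of_pairs(["AC-T", "ACGT", "A-GT"]))  # prints 0
```

An exhaustive method must consider every way of inserting gaps into every sequence to find the alignment that maximises this score. That search space is what grows exponentially with the number of sequences.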

You mention the dynamic programming algorithm as exhaustive in your comment. Aligning two sequences of length $n$ by dynamic programming has a time complexity (big O) of $O(n^2)$ and may take a few minutes depending on sequence length etc. Aligning three has a big O of $O(n^3)$ and can be done if you have lots of time on your hands. In general, aligning $k$ sequences means filling a $k$-dimensional table of about $n^k$ cells, each of which considers $2^k - 1$ predecessors, giving $O(2^k n^k)$. It should be obvious that three sequences is about the practical limit for dynamic programming.
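A minimal Python sketch of the pairwise case follows, using Needleman-Wunsch global alignment with a linear gap penalty. It shows the $(n+1) \times (m+1)$ table that makes the cost quadratic.

- The helper `dp_table_cells` (a hypothetical name) counts the $(n+1)^k$ cells of the equivalent table for $k$ sequences of length $n$.
- The scoring (match +1, mismatch -1, gap -2) is an assumed toy scheme, not a substitution matrix such as BLOSUM62.

```python
from typing import Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[int, str, str]:
    """Global pairwise alignment by dynamic programming (linear gap penalty)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # (mis)match
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a
    # Traceback from the bottom-right corner to recover one optimal alignment.
    i, j, out_a, out_b = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def dp_table_cells(n: int, k: int) -> int:
    """Cells in the DP table for k sequences of length n: (n + 1) ** k."""
    return (n + 1) ** k


print(needleman_wunsch("ACGT", "AGT"))  # prints (1, 'ACGT', 'A-GT')
for k in (2, 3, 4, 5):
    print(k, dp_table_cells(300, k))
```

For $n = 300$, the table has about $9 \times 10^4$ cells for two sequences and $2.7 \times 10^7$ for three. Each additional sequence multiplies the size by roughly another factor of $n$.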

There's no shame in heuristics. Perhaps the most widely used bioinformatics program in the world, BLAST, employs a heuristic. (And your brain makes decisions that your life depends on every day using heuristics.)

Heuristic MSA methods have been successively modified to deal with known sources of error, so they often perform very well. How can you tell? When using MSA programs, you need to perform a "sanity check" on any alignment they produce. Does it look reasonable? Has it matched up similar regions well, or has it introduced vast gaps or missed obvious regions of homology? If you are doing 3D modelling and you don't get an alignment that looks good, your modelling isn't likely to work.
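Part of that sanity check (spotting "vast gaps") can be automated. Below is a minimal Python sketch that reports, per aligned sequence, the fraction of gap characters and the longest run of consecutive gaps, then flags outliers.

- The thresholds (30% gaps, runs longer than 20 residues) are arbitrary assumptions.
- The toy alignment and sequence names are hypothetical.

The sketch does not replace looking at the alignment, for example for missed regions of homology.

```python
from typing import Dict, List


def gap_report(alignment: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Per-sequence gap fraction and longest run of consecutive gaps ('-')."""
    report = {}
    for name, row in alignment.items():
        if not row:
            raise ValueError(f"empty aligned sequence: {name}")
        longest = run = 0
        for ch in row:
            run = run + 1 if ch == "-" else 0  # extend or reset the current gap run
            longest = max(longest, run)
        report[name] = {"gap_fraction": row.count("-") / len(row),
                        "longest_gap": longest}
    return report


def flag_suspicious(report: Dict[str, Dict[str, float]],
                    max_gap_fraction: float = 0.3,
                    max_gap_run: int = 20) -> List[str]:
    """Names of sequences whose gaps exceed the (assumed, arbitrary) thresholds."""
    return sorted(name for name, r in report.items()
                  if r["gap_fraction"] > max_gap_fraction
                  or r["longest_gap"] > max_gap_run)


alignment = {            # hypothetical toy alignment
    "target": "MKT-LLV",
    "tmpl1": "MKTALLV",
    "tmpl2": "M----LV",
}
print(flag_suspicious(gap_report(alignment)))  # prints ['tmpl2']
```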

Finally, you always have to consider the possibility that there aren't any proteins of known structure similar to the one you want to model, in which case you'll never get a good MSA.

Applied Mycology and Biotechnology

Melissa R. Pitman, R. Ian Menz, in Applied Mycology and Biotechnology, 2006

Homology modelling has become a useful tool for the prediction of protein structure when only sequence data are available. Structural information is often more valuable than sequence alone for determining protein function. Homology modelling is potentially a very useful tool for the mycologist, as the number of fungal gene sequences available has exploded in recent years, whilst the number of experimentally determined fungal protein structures remains low. Programs available for homology modelling utilise different approaches and methods to produce the final model. Within each step of the homology modelling process, many factors affect the quality of the model produced, and appropriate selection of the program can significantly improve the quality of the model. This review discusses the advantages and limitations of the currently available methods and programs and provides a starting point for novices wishing to create a structural model. We have taken a practical approach as we hope to enable any scientist to utilise homology modelling as a tool for the analysis of their protein, or genome, of interest.

Author summary

Identifying the systematic interactions of multiple components within a complex biological system can be challenging due to the number of potential processes and the concomitant lack of information about the essential dynamics. Selection algorithms that allow an automated evaluation of a large number of different models provide a useful tool in identifying the systematic relationships between experimental data. However, many of the existing model selection algorithms are not able to address complex model structures, such as systems of differential equations, and partly rely on local or exhaustive search methods which are inappropriate for the analysis of various biological systems. Therefore, we developed a flexible model selection algorithm that performs a robust and dynamical search of large model spaces to identify complex systems dynamics and applied it to the analysis of T cell proliferation dynamics within different culture conditions. The algorithm, which is available as an R package, provides an advanced tool for the analysis of complex systems behaviour and, due to its flexible structure, can be applied to a large variety of biological problems.
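The contrast between exhaustive and non-exhaustive search of a model space can be illustrated with a generic forward/backward (stepwise) search over parameter subsets. This is a minimal Python sketch and not FAMoS itself; FAMoS is an R package with its own search strategy. The cost function is a made-up AIC-like toy:

- each "true" parameter improves the fit by 10;
- every included parameter adds a penalty of 2.

All names here are hypothetical.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Sequence, Tuple

Model = FrozenSet[str]
Cost = Callable[[Model], float]


def exhaustive_search(params: Sequence[str], cost: Cost) -> Tuple[Model, float, int]:
    """Score every one of the 2**len(params) subsets and keep the lowest cost."""
    best_model, best_cost, evaluations = frozenset(), float("inf"), 0
    for k in range(len(params) + 1):
        for subset in combinations(params, k):
            model = frozenset(subset)
            c = cost(model)
            evaluations += 1
            if c < best_cost:
                best_model, best_cost = model, c
    return best_model, best_cost, evaluations


def stepwise_search(params: Sequence[str], cost: Cost,
                    start: Model = frozenset()) -> Tuple[Model, float, int]:
    """Greedy forward/backward search: move to the best neighbouring model
    (one parameter added or removed) until no neighbour lowers the cost."""
    current = frozenset(start)
    current_cost = cost(current)
    evaluations = 1
    while True:
        neighbours = [current | {p} for p in params if p not in current]  # forward moves
        neighbours += [current - {p} for p in current]                     # backward moves
        best_next, best_next_cost = None, current_cost
        for model in neighbours:
            c = cost(model)
            evaluations += 1
            if c < best_next_cost:
                best_next, best_next_cost = model, c
        if best_next is None:  # no improving neighbour: a (possibly local) optimum
            return current, current_cost, evaluations
        current, current_cost = best_next, best_next_cost


TRUE_PARAMS = {"b", "d"}  # hypothetical "correct" model for the toy cost


def toy_cost(model: Model) -> float:
    return 2 * len(model) - 10 * len(model & TRUE_PARAMS)


params = ["a", "b", "c", "d", "e", "f"]
print(exhaustive_search(params, toy_cost))
print(stepwise_search(params, toy_cost))
```

On this toy problem, both searches find {b, d}. The stepwise search needs 19 cost evaluations instead of 64. With interacting parameters, however, a greedy search can stop in a local optimum, which is why more robust search strategies are of interest.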

Citation: Gabel M, Hohl T, Imle A, Fackler OT, Graw F (2019) FAMoS: A Flexible and dynamic Algorithm for Model Selection to analyse complex systems dynamics. PLoS Comput Biol 15(8): e1007230.

Editor: Becca Asquith, Imperial College London, UNITED KINGDOM

Received: January 27, 2019 Accepted: June 30, 2019 Published: August 16, 2019

Copyright: © 2019 Gabel et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the manuscript and its Supporting Information files.

Funding: This work was funded by the Center for Modeling and Simulation in the Biosciences (BIOMS) to FG, and by the Deutsche Forschungsgemeinschaft (German research foundation, DFG) - Project number 240245660 - SFB1129 (project 8) to OTF. OTF is member of the cluster of excellence Cellnetworks. FG is member of the IWR and additionally supported by a Fellowship from the Chica and Heinz Schaller Foundation. The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

2. Principles of Structure-Based Drug Design

2.1. Rationale and Target Selection

SBDD is a powerful tool that uses knowledge of the three-dimensional (3D) structure of a biological target to efficiently search chemical space for ligands with high binding affinity. Use of SBDD began in the mid-1980s, and publications describing its success in developing new therapeutics for HIV/AIDS began to emerge in the early 1990s [25,26]. Structural knowledge of the HIV protease allowed for the design of new protease inhibitors, five of which became clinical and commercial successes [26,27]. Many other widely used drugs owe their origins to SBDD, including zanamivir (Relenza, GlaxoSmithKline) [28] for influenza, the cyclooxygenase inhibitor celecoxib (Celebrex, Pfizer) [29], and the antileukemic Bcr-Abl tyrosine kinase inhibitor imatinib (Gleevec, Novartis) [30].

SBDD begins with the choice of a suitable target. An antivirulence target should be present only in the pathogen, and be essential for bacterial pathogenesis but not directly involved in bacterial survival. The vast majority of antivirulence targets are proteins, although RNA molecules may also be targeted, as is the case for the well-known antibiotics tetracycline and streptomycin [31]. The target should ideally be validated in vivo by testing a knockout strain of the bacteria for virulence in an animal model of infection. Accurate structural information about the target must then be obtained. Three main sources of structural information have been used for SBDD: X-ray crystallography, nuclear magnetic resonance (NMR), and homology modelling. X-ray crystal structures are the most common source of drug design data, owing to the typically high resolution available and the ability to work with proteins that range in size from small peptides to 998 kDa [32]. Indeed, 83.8% of the total protein structures in the Protein Data Bank (PDB) as of December 2018 were determined via X-ray crystallography. Ordered water molecules are also visible in crystal structures, and their organization can often provide a starting point for drug lead design. NMR structures are another valuable source for drug design, provided that the target is smaller than 35 kDa [33]. The low resolution of structures obtained by cryo-electron microscopy (cryo-EM) has precluded their use in SBDD in the past. However, the best recent cryo-EM structures have surpassed the 2.5 Å atomic resolution threshold and may thus be useful for drug design in the future [34].

Experimentally determined structures are curated in the PDB, which currently contains over 83,000 bacterial protein entries out of more than 145,000 total entries. For cases in which no experimentally determined structure is available, a homology model can be used for drug design, provided there is substantial sequence similarity between the proteins [35,36]. Advanced homology modeling software uses experimentally determined structures as templates to predict the 3D fold of another protein with a similar amino acid sequence [37]. Several free programs, including SWISS-MODEL [38] and Phyre2 [39], provide fully automated homology modeling. Programs such as Modeller [40], or the modelling tools built into commercial software suites such as Discovery Studio (BIOVIA) [41], offer additional control over the modelling process. Homology models are routinely used in the absence of experimentally determined structures. However, they are not ideal for SBDD because the accuracy of the binding pocket can be less reliable, particularly when sequence identity is below 40% [37].
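The 40% identity rule of thumb depends on how identity is computed. Below is a minimal Python sketch that assumes identity is taken over alignment columns where neither the target nor the template has a gap. Other denominators, such as the length of the shorter sequence, are also in use and give different numbers. The function names and toy sequences are hypothetical.

```python
LOW_IDENTITY_THRESHOLD = 40.0  # percent; the threshold quoted in the text


def percent_identity(target_aln: str, template_aln: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(target_aln, template_aln)
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


def pocket_accuracy_warning(target_aln: str, template_aln: str) -> bool:
    """True if identity is below the level where binding-pocket accuracy is less reliable."""
    return percent_identity(target_aln, template_aln) < LOW_IDENTITY_THRESHOLD


print(percent_identity("MKT-LLV", "MRTALLV"))         # 5 of 6 gap-free columns identical
print(pocket_accuracy_warning("MKT-LLV", "MRTALLV"))  # prints False
```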

The next step in SBDD involves using the available structure or model to select a specific ligand binding site. The target usually has a well-defined binding pocket, such as a receptor ligand binding site or an enzyme active site. Algorithms are now available to predict the suitability of a binding pocket from high-resolution protein structures, based on criteria such as its rigidity or hydrophobic character [42,43]. Although much less common, drugs targeting protein-protein interactions (PPIs) are increasingly being pursued by drug discovery groups [44]. An important, but often overlooked, aspect of target choice is the conformational flexibility of a protein during ligand binding. While the structure of the protein in the crystal represents a snapshot ‘frozen’ in one position, the biologically active form of the protein may undergo dramatic conformational changes upon ligand binding. This highlights the potential importance of being able to model protein and ligand flexibility in SBDD. Several programs such as GOLD [45], SLIDE [46] and FlexE [47] can be used to this end; however, the increased computing time required can be prohibitive [48]. Additionally, these programs only account for side-chain flexibility, which can be insufficient when modelling more complex protein backbone motions, as in the case of receptors [49].
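As a toy illustration of one ingredient of pocket assessment (hydrophobic character), the minimal Python sketch below averages the Kyte-Doolittle hydropathy values of the residues lining a pocket. It is not the algorithm of the cited tools [42,43]. It assumes the pocket residues are already known and given as one-letter codes, and the function name is hypothetical.

```python
from statistics import mean

# Kyte-Doolittle hydropathy scale (one-letter amino acid codes)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def pocket_hydropathy(pocket_residues: str) -> float:
    """Mean Kyte-Doolittle value of pocket-lining residues (higher = more hydrophobic)."""
    if not pocket_residues:
        raise ValueError("no pocket residues given")
    unknown = set(pocket_residues) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"unknown residue codes: {sorted(unknown)}")
    return mean(KYTE_DOOLITTLE[aa] for aa in pocket_residues)


print(pocket_hydropathy("ILVF"))  # hypothetical, strongly hydrophobic pocket
print(pocket_hydropathy("DEKR"))  # hypothetical, charged pocket
```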

2.2. Methods of SBDD

Once a target has been selected, three main methods are used to identify or design new ligands based on its structural information: inspection of substrates and known inhibitors, virtual screening, and de novo design. In the first method, inspection of substrates, cofactors, or known inhibitors of the protein is used to inform the modification of these compounds to become inhibitors [25,50,51]. In virtual screening, libraries of available small molecules are docked into the region of interest in silico and scored based on their predicted interaction with the site. The third approach involves de novo design of small molecule fragments that are positioned in the target site, scored, and then linked in silico to give one complete molecule (Figure 2). The final linked compounds are then chemically synthesized and tested for biological activity. Recent examples of these three approaches in the context of antivirulence drug discovery are discussed below.

A typical workflow for virtual high-throughput screening and de novo drug design. (A) The virtual screening procedure used to select active compounds from a large library of existing molecules. SAR, structure-activity relationship. (B) The procedure for de novo design of ligands via the fragment-based approach using programs such as LUDI or SPROUT.
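The virtual-screening branch of this workflow (filter a library, score, rank) can be sketched in Python as below. This is a minimal illustration under several assumptions:

- each compound already carries precomputed descriptors and a docking score from some external docking program;
- the compound records and field names are hypothetical, as is the convention that a lower docking score is better;
- the drug-likeness filter is Lipinski's rule of five, allowing at most one violation. This is a common choice, not a step named in the figure.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Compound:
    name: str
    mol_weight: float  # Da
    logp: float
    h_donors: int
    h_acceptors: int
    dock_score: float  # hypothetical precomputed docking score; lower = better (assumed)


def lipinski_violations(c: Compound) -> int:
    """Number of rule-of-five criteria the compound violates."""
    return sum([c.mol_weight > 500, c.logp > 5,
                c.h_donors > 5, c.h_acceptors > 10])


def screen(library: List[Compound], top_k: int, max_violations: int = 1) -> List[Compound]:
    """Keep drug-like compounds, then rank survivors by docking score."""
    drug_like = [c for c in library if lipinski_violations(c) <= max_violations]
    return sorted(drug_like, key=lambda c: c.dock_score)[:top_k]


library = [  # hypothetical compounds
    Compound("cpd1", 350.0, 2.1, 1, 4, -9.2),
    Compound("cpd2", 720.0, 6.3, 2, 8, -11.5),  # two violations: filtered out
    Compound("cpd3", 480.0, 3.0, 2, 6, -7.8),
    Compound("cpd4", 510.0, 4.0, 1, 5, -8.5),   # one violation: kept
]
print([c.name for c in screen(library, top_k=2)])  # prints ['cpd1', 'cpd4']
```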

Special Issue Editors

We invite you to submit your latest research in the field of granular computing to this Special Issue called “Granular Computing: From Foundations to Applications”. Granular computing is a rapidly changing multidisciplinary information processing paradigm suitable for modeling complex systems and for extracting knowledge from data by means of suitable entities known as information granules. According to this paradigm, a given system can be observed at different levels of granularity, showing or hiding details and peculiarities of the system as a whole. Given a specific data-driven modeling problem, automatically finding a suitable resolution (semantic) level in order to gather the maximum amount of knowledge from the data at hand is a challenging task. With this Special Issue, we would like to embrace both fundamental/methodological aspects and applications related to granular computing.

We welcome high-quality research papers addressing and reviewing theoretical and practical issues of granular computing, focusing on complex systems modeling, parallel and distributed big data analysis, data granulation and its impact on knowledge discovery, problem-solving and decision-making systems, and advanced pattern recognition systems.

Similarly, we welcome research papers on cutting-edge applications, including (but not limited to) bioinformatics and computational biology, image analysis, natural language processing, sentiment and behavior analysis, time-series forecasting, and cybersecurity.

Prof. Dr. Antonello Rizzi
Dr. Alessio Martino
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online by registering and logging in to this website. Once you are registered, you can proceed to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Algorithms is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.


The aim of this workshop is to provide a broad look at the state of the art in the probabilistic modeling and machine learning methods involving biological structures and systems, and to bring together method developers and experimentalists working with the problems.

We encourage submissions bringing forward methods for discovering complex structures (e.g. interaction networks, molecule/cellular structures) and methods supporting genome-wide data analysis.

A non-exhaustive list of topics suitable for this workshop:


  • Algorithms
  • Bayesian Methods
  • Data integration/fusion
  • Feature/subspace selection
  • High-throughput methods
  • Kernel Methods
  • Machine Learning
  • Probabilistic Inference
  • Structured output prediction


  • Sequence Annotation
  • Gene Expression
  • Gene Networks
  • Gene Prediction
  • Metabolic Profiling
  • Metabolic Reconstruction
  • Protein Structure Prediction
  • Protein Function Prediction
  • Protein-protein interaction networks

Organization and related workshops

The workshop is organized by the European Network of Excellence PASCAL (Pattern Analysis, Statistical Modelling and Computational Learning) and belongs to the thematic programme on 'Learning with Complex and Structured Outputs'.

The workshop is preceded by the Fourth International Workshop on Computational Systems Biology (WCSB 2006, June 12-13, Tampere, Finland).

The workshop is immediately followed by the International Specialised Symposium on Yeasts (ISSY25, June 18-21, Espoo, Finland) that has the theme 'Systems biology of Yeast - From Models to Applications'.

Conclusions and Vision

The growing availability of large amounts of data (i.e., big data) will allow models to be tested in fine detail. Spatial data could be collected in three dimensions (thanks, perhaps, to advances in microscope imaging), capturing the formation of patterns, niches, molecular associations, and multiscale features. The time dimension could range from molecular events (for example, DNA mutations or epigenetic changes) to organism development, circadian rhythms, species evolution, and other meaningful periodicities.

The development of new, efficient tools will motivate others to generate new computational models or to improve existing ones. This will enlarge the community of scientists sharing their knowledge through standardized computational models that numerically reproduce the behavior of the biological process under investigation. As computational modeling becomes better able to describe biological systems and processes at a level useful for prediction and for suggesting experiments, it will trigger a useful feed-forward process with experimental biologists.

The tools described in this paper can already accommodate different complexly structured properties of biological processes and could be used separately or in different combinations and architectures. This will enable biologists to answer complex questions. Temporal logics, in particular, will have a profound impact in systems biology by helping to transform cause-and-effect relationships into objects that can be manipulated both mathematically and computationally. In epistatic control, temporal logics can be used to model two or more causal factors as interacting mechanistically with respect to the observed phenomenon. Doing so will establish powerful connections between reasoning based on logic and statistics on the one hand, and the mechanisms and processes that underlie the observed behavior on the other.
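The minimal Python sketch below makes "manipulating cause-and-effect relationships computationally" more concrete. It evaluates a few temporal-logic operators on a finite simulated trace:

- G ("globally");
- F ("eventually");
- U ("until");
- the response pattern G(cause → F effect).

It assumes finite-trace semantics and a trace given as a list of state dictionaries. The variable names and the trace are hypothetical, and this is not the syntax of any particular model checker described in the paper.

```python
from typing import Callable, Dict, List

State = Dict[str, float]
Trace = List[State]
Pred = Callable[[State], bool]


def globally(p: Pred, trace: Trace) -> bool:
    """G p: p holds in every state of the finite trace."""
    return all(p(s) for s in trace)


def eventually(p: Pred, trace: Trace) -> bool:
    """F p: p holds in at least one state of the finite trace."""
    return any(p(s) for s in trace)


def until(p: Pred, q: Pred, trace: Trace) -> bool:
    """p U q: q eventually holds, and p holds in every state before that."""
    for s in trace:
        if q(s):
            return True
        if not p(s):
            return False
    return False


def leads_to(cause: Pred, effect: Pred, trace: Trace) -> bool:
    """G(cause -> F effect): every state satisfying cause is followed (now or later) by effect."""
    return all(eventually(effect, trace[i:]) for i, s in enumerate(trace) if cause(s))


trace = [  # hypothetical simulation output
    {"signal": 0.0, "response": 0.1},
    {"signal": 1.0, "response": 0.2},
    {"signal": 1.0, "response": 0.6},
    {"signal": 0.0, "response": 0.9},
]
print(leads_to(lambda s: s["signal"] > 0.5, lambda s: s["response"] > 0.5, trace))  # True
print(globally(lambda s: s["response"] >= 0.0, trace))                               # True
```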

One future interesting research direction that we envision is the extension of the current formal analysis techniques and temporal logics to the spatial domain. For example, understanding how a spatial pattern emerges from the biochemical level acting at the cellular level (i.e., morphogenesis in developmental biology) is currently very challenging because of both the high computational complexity required by the spatiotemporal modeling and the lack of a suitable specification language to specify the spatiotemporal patterns of interest [86,87,153].

Furthermore, the rapid progress of modern technologies for healthcare has led to a new generation of devices called medical cyber-physical systems [154], in which smart and collaborative computational elements control the biological systems. Examples include pacemakers, biocompatible and implantable devices, insulin pumps, electro-anatomical mapping and intervention, robotic prosthetics, and neurostimulators. Here, the computational modeling of the biological part is indispensable to the development of efficient and safe controlling devices. Moreover, the successful application of formal analysis techniques and tools to verify the correct and safe behavior of these systems will have an economic impact on our society by reducing warranty, liability, and certification costs. We believe that the concepts and the computational tools described here represent core elements of computational description, particularly in the framework of systems biology, and will have some relevance to both newcomers and experts.


A comparative evaluation of PPIS predictors has been performed in previous reviews [109, 214]. We therefore focus only on the most recent predictors, shown in Table 4.

Objective evaluation of PPIS predictor performance is made difficult by the varying definitions of interaction sites and accessible surface residues in the literature, the lack of available servers for all predictors, the adoption of varying training and testing datasets, and the different metrics used for evaluation [4, 47]. We partially circumvent these problems by considering the performance of each predictor across a variety of test sets based on literature values.

Assessment measures

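PPIS predictors are generally judged with a number of standard performance metrics, including sensitivity (recall, true positive rate, or coverage), precision, and specificity:

$$\text{sensitivity} = \frac{TP}{TP + FN}, \qquad \text{precision} = \frac{TP}{TP + FP}, \qquad \text{specificity} = \frac{TN}{TN + FP}$$

where $TP$, $TN$, $FP$, and $FN$ denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.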

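Measures designed to balance false negative and false positive rates include $F_1$ and the Matthews correlation coefficient (MCC) [219]:

$$F_1 = \frac{2 \cdot \text{precision} \cdot \text{sensitivity}}{\text{precision} + \text{sensitivity}} = \frac{2\,TP}{2\,TP + FP + FN}$$

$$\text{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$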
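The standard definitions of sensitivity, precision, specificity, $F_1$ and MCC can be computed directly from confusion-matrix counts. Below is a minimal Python sketch. Returning 0.0 when a denominator is zero is an assumed convention (libraries differ on this), and the counts in the example are made up.

```python
import math
from typing import Dict


def _ratio(num: float, den: float) -> float:
    # Assumed convention: report 0.0 when the denominator is zero.
    return num / den if den else 0.0


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Per-residue metrics from true/false positive/negative counts."""
    sensitivity = _ratio(tp, tp + fn)
    precision = _ratio(tp, tp + fp)
    specificity = _ratio(tn, tn + fp)
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    mcc = _ratio(tp * tn - fp * fn,
                 math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {"sensitivity": sensitivity, "precision": precision,
            "specificity": specificity, "f1": f1, "mcc": mcc}


print(classification_metrics(tp=40, tn=50, fp=10, fn=20))
```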

Similarly, the receiver operating characteristic (ROC) curve, a plot of sensitivity versus $1 - \text{specificity}$ obtained by varying the classifier's prediction threshold, can be used to compute the area under the ROC curve (AUROC/AUC) [251]. The AUROC is especially useful for identifying artificially “inflated” performance (e.g. higher sensitivity at the expense of specificity), and it is independent of the decision threshold [252].
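The AUROC equals the probability that a randomly chosen positive residue receives a higher score than a randomly chosen negative one (the Mann-Whitney formulation), with ties counted as one half. The minimal Python sketch below computes it that way in O(P·N) time, which is fine for illustration. For real datasets, a library routine such as scikit-learn's `roc_auc_score` would be the usual choice. The scores and labels are made up.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count half
    return wins / (len(pos) * len(neg))


print(auroc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # prints 0.75
```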

In general, just as there is no consensus on interface definitions and datasets, there are no standard criteria for performance assessment [47]. Given that some false positive predictions may be correct (due to the paucity of crystallized complexes), patch-specific performance metrics may be used. These assess the correct answer in a local patch around the interface in question, for example by the Sørensen-Dice index [253, 254], though they poorly account for false positives. While other evaluation methods have been devised [16, 35], computing the statistics above per residue and averaging across the dataset appears to be the most objective and easily comparable method.
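The Sørensen-Dice index between predicted and observed interface residues is $2|A \cap B| / (|A| + |B|)$. The minimal Python sketch below computes it per protein and then macro-averages over a dataset. Two points are assumptions: treating two empty sets as perfect agreement, and averaging per protein. The residue numbers and protein identifiers are hypothetical.

```python
from statistics import mean
from typing import Dict, Set, Tuple


def dice(predicted: Set[int], observed: Set[int]) -> float:
    """Sørensen-Dice index between two sets of residue numbers."""
    if not predicted and not observed:
        return 1.0  # assumed convention: two empty sets agree perfectly
    return 2 * len(predicted & observed) / (len(predicted) + len(observed))


def dataset_dice(data: Dict[str, Tuple[Set[int], Set[int]]]) -> float:
    """Mean per-protein Dice index; data maps protein id -> (predicted, observed)."""
    return mean(dice(pred, obs) for pred, obs in data.values())


print(dice({1, 2, 3, 4}, {3, 4, 5}))  # 2*2 / (4 + 3) = 4/7
print(dataset_dice({"P1": ({1, 2}, {1, 2}), "P2": ({1}, {2})}))  # prints 0.5
```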

The authors note that even the more balanced measures should not be solely relied on (e.g. MCC may favour overprediction in PPIS prediction [214] and underprediction elsewhere [255]) and that predictor performance should be viewed holistically across as many metrics as possible, as balancing performance metrics is domain-dependent [47, 255]. When considering PPIS prediction for mimetic drug design, slight underprediction may be desirable, as it will likely find the better discriminated core residues [7, 33], from which the remaining PPIS can be inferred (rather than “guessing” which of many allegedly “active” residues is even interacting).

Comparative evaluation

While it is difficult to draw conclusions from the differing performance of the predictors, we can nevertheless observe some trends that may be explained by the biological theory discussed previously. For example, transient datasets (such as TransComp_1) generally garner lower scores than permanent ones (such as PlaneDimers). This trend is not perfectly followed (Table 4), possibly because of the difficulty of defining a threshold on the transient-permanent continuum. Some sets (e.g. S149) may be intrinsically more predictable, as evidenced by higher scores across all predictors; others achieve better results only on certain types of predictors (e.g. DB3-188 on structural homology-based predictors). To achieve high scores on specialized testing datasets, predictors often require either specializations of their own, or inherent characteristics that permit accurate classification (e.g. ANCHOR’s [10] specialization for disordered proteins and HomPPI’s [20] lack of requirement for structural information allow them both to successfully predict on the S1/2 disordered sets). Theoretically, unbound structures are more difficult to predict on than bound monomers (due to the conformational disparity between the two sets); this is largely confirmed by differing results on the DS56B/U sets, as well as generally lower scores on unbound sets (Table 4). Overall, we find that there has been significant progress in the predictive abilities of the predictors over the last decade across diverse interaction types and datasets.


In this review, we have given an overview of the principles of 3D pharmacophores and their role in drug discovery. The fact that 3D pharmacophore models are universal, editable, and comprehensive allows them to be applied in different scenarios.

A major application field is the identification of novel ligands through virtual screening. For this purpose, 3D pharmacophore models are the sole technique that can be applied in either a ligand-based or a structure-based manner. In either case, 3D pharmacophore models are computationally very efficient, enabling the virtual screening of very large databases. The basic concept of abstracting chemical functionality allows for scaffold hopping and enriches the chemical diversity of hit lists. Altogether, this grants researchers more flexibility regarding available data, computational resources, and testing capabilities. The case studies that we selected highlight the power of pharmacophore-based virtual screening for drug discovery and show its applicability to challenging targets. Also, increasingly popular fragment-based drug discovery campaigns can benefit from pharmacophore screening in two ways: it dramatically reduces the number of fragments tested in vitro, and it rationalizes fragment growing with constant fragment core interactions [54, 55].
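With spherical features, pharmacophore-based screening reduces at its core to a geometric test: does a (pre-aligned) ligand place a feature of the right type inside each tolerance sphere? Below is a minimal Python sketch under strong assumptions:

- the ligand conformer is already superimposed on the pharmacophore;
- feature types are simple string labels (hypothetical here);
- one ligand feature may satisfy several spheres.

Real tools also search conformers and alignments and typically enforce one-to-one feature assignment. Treat this only as an illustration of the matching criterion.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class LigandFeature:
    kind: str        # e.g. "HBD", "HBA", "HYD" (labels assumed)
    position: Point  # coordinates in Å, ligand already aligned


@dataclass(frozen=True)
class PharmacophoreFeature:
    kind: str
    center: Point    # sphere centre in Å
    radius: float    # spherical tolerance in Å


def matches(pharmacophore: Sequence[PharmacophoreFeature],
            ligand: Sequence[LigandFeature]) -> bool:
    """True if every tolerance sphere contains a ligand feature of the same type."""
    return all(
        any(lf.kind == pf.kind and math.dist(lf.position, pf.center) <= pf.radius
            for lf in ligand)
        for pf in pharmacophore
    )


query = [PharmacophoreFeature("HBD", (0.0, 0.0, 0.0), 1.0),
         PharmacophoreFeature("HYD", (4.0, 0.0, 0.0), 1.5)]
hit = [LigandFeature("HBD", (0.5, 0.0, 0.0)), LigandFeature("HYD", (4.2, 1.0, 0.0))]
miss = [LigandFeature("HBD", (0.5, 0.0, 0.0)), LigandFeature("HYD", (7.0, 0.0, 0.0))]
print(matches(query, hit), matches(query, miss))  # prints True False
```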

Besides virtual screening, 3D pharmacophores are well suited to study and visualize binding modes of drug-like molecules. Because they are composed of a limited number of chemically defined interaction features, they are understandable and intuitive. This represents a major advantage in interdisciplinary projects, since 3D pharmacophore models are able to rationalize various pharmacological effects. For this objective, 3D pharmacophores are typically combined with other methods such as docking, MD simulations, or machine learning. The selected case studies for this field underline the power of 3D pharmacophores to mechanistically explain and understand protein functionality. Additionally, 3D pharmacophores are an excellent tool for communication between researchers, a factor that is often underestimated.

However, besides the aforementioned advantages and possibilities, classic 3D pharmacophore models also have certain drawbacks. They represent static models for highly dynamic systems, and their interaction features are restricted to simple geometries (e.g., spherical features). Moreover, they share a shortcoming with other modeling techniques, all of which focus on estimating the enthalpy of molecular interactions but are suboptimal for describing entropic effects. Yet enthalpy and entropy both contribute to the change in free energy of ligand binding to a macromolecule. Although the basic concept of 3D pharmacophore generation and its application to virtual screening has not changed in the last 30 years, there are various developments in the field that aim at addressing these shortcomings.

The combination of 3D pharmacophore models with MD simulations is therefore a logical evolution with great potential. Different approaches to integrating MD simulations into 3D pharmacophore modeling have been reported and described in this review [71-75, 77-79, 81, 103]. However, only the dynophore method represents a fully automated approach, which tackles two drawbacks of classical 3D pharmacophores at once [74]. The dynophore application reveals a new perspective on ligand binding in two ways. It visualizes pharmacophoric features that escape the traditional spherical geometry, and it delivers statistics that report feature occurrence frequencies and different binding modes over the course of a trajectory. The direct use of these property-density functions for virtual screening would represent a true paradigm shift in 3D pharmacophore modeling.
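One of the dynophore statistics mentioned above, the occurrence frequency of each feature over a trajectory, can be sketched in a few lines of Python. The sketch is not the dynophore implementation. It assumes the per-frame interaction features have already been extracted as sets of string labels; real tools derive them from the geometry of each MD frame. The labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def feature_frequencies(frames: Iterable[Set[str]]) -> Dict[str, float]:
    """Fraction of trajectory frames in which each interaction feature occurs."""
    counts = Counter()
    n_frames = 0
    for present in frames:
        n_frames += 1
        counts.update(present)  # a set, so each feature counts at most once per frame
    if n_frames == 0:
        raise ValueError("empty trajectory")
    return {feature: count / n_frames for feature, count in counts.items()}


frames = [  # hypothetical per-frame feature sets
    {"HBD:res1", "HYD:res2"},
    {"HBD:res1"},
    {"HBD:res1", "AR:res3"},
    set(),
]
print(feature_frequencies(frames))  # HBD:res1 -> 0.75, HYD:res2 -> 0.25, AR:res3 -> 0.25
```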

Several advanced approaches also consider entropic effects of ligand binding for 3D pharmacophore modeling [71-73, 78, 79, 81, 103]. PyRod, for instance, analyzes the protein environment of water molecules in MD simulations, which allows pharmacophore features to be placed at hydration sites with certain thermodynamic characteristics [79]. Such hydration sites may harbor water molecules in a highly hydrophobic protein environment or heavily restrain water molecules via hydrogen bonds and the shape of the binding pocket. Restricting 3D pharmacophores to entropically and enthalpically important sites renders such approaches valuable tools for virtual screening campaigns, especially those generating 3D pharmacophores from an apo structure. Importantly, PyRod is a free and open-access tool, making such strategies accessible to a broader user base.

The combination of the 3D pharmacophore concept and machine learning/artificial intelligence is only in its beginning stages. Although some approaches already exist [30, 82-86], we predict an increasing number of studies and methods that aim to use pharmacophore features as descriptors or try to generate 3D pharmacophores from big data. Another trend that we observe is the growing number of freely available web services for pharmacophore-based virtual screening [87-91].

The recent developments in the field of 3D pharmacophores are promising and afford the opportunity to employ 3D pharmacophores in ever-increasing ways and more challenging situations, such as multitarget prediction, modeling binding kinetics, or pathway-specific receptor activation. Overall, 3D pharmacophores represent an essential part of the toolbox for computer-aided drug design and are perfectly apt to identify novel ligands and understand their interaction with the macromolecular target.

Computational models of cellular reprogramming

In 2006, Takahashi and Yamanaka evaluated a number of candidate genes with respect to their potential to induce pluripotency in somatic cells (Takahashi and Yamanaka, 2006). The discovery that the overexpression of four factors (Oct4, Sox2, Klf4 and c-Myc) can direct the reprogramming of somatic cells into induced pluripotent stem cells (iPSCs) constituted a paradigm shift in developmental biology. However, details and mechanisms of the reprogramming process during iPSC generation still need to be fully elucidated (Takahashi and Yamanaka, 2013).

A current limitation of the cellular reprogramming process is its low efficiency, for which there are two conceptual explanatory models: the deterministic model, in which only some cells have the potential to generate iPSCs within a fixed and uniform time period (latency), and the stochastic model, in which most or even all cells are competent for reprogramming, but the latency differs (Hanna et al., 2009 Yamanaka, 2009). Hanna et al. combined time-series experiments with computational modelling to gain insights into the nature of reprogramming (Hanna et al., 2009). In their experiments, they monitored the proportions of iPSCs generated and found that reprogrammed cells appear in most clonal populations, if cultured for a sufficiently long time. However, the time to conversion apparently differs between clones, supporting the view that reprogramming is a continuous, stochastic process. Accordingly, the authors developed a mathematical model that considers reprogramming as a one-step stochastic process with a constant cell-intrinsic rate. Fitting the model to their experimental data, this intrinsic rate was estimated for different experimental settings. The authors argued that Nanog overexpression accelerates the reprogramming kinetics through a mechanism most likely independent of the cell proliferation rate. However, in that study, the molecular changes during reprogramming were analysed in heterogeneous cell populations, making sequential events occurring in single cells inaccessible. Combining single-cell expression analysis with a Bayesian network model (see Glossary, Box 1), Buganim et al. demonstrated that the reprogramming process can best be described by two phases: an early stochastic phase with high variation in gene expression and a subsequent, more hierarchical phase of gene activation (Buganim et al., 2012). The network model predicted that the activation of Sox2 initiates a number of consecutive steps that finally lead to fully reprogrammed iPSCs. 
The authors applied the hierarchical model to predict sets of TFs capable of inducing pluripotency. These sets were then tested experimentally: all of them supported iPSC generation, but with different efficiencies, ranging from 0.2% (a combination without Oct4, Klf4 and c-Myc) to 22.2% (Oct4, Nanog, Esrrb, Klf4 and c-Myc). Interestingly, there is limited correlation between the genes that can drive efficient reprogramming and those whose expression predicts future iPSC generation. For example, endogenous expression of Oct4 does not necessarily predict iPSC generation, whereas expression of Utf1 and Esrrb does.
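To make the contrast between the deterministic and stochastic models concrete, the following is a minimal Python sketch of a one-step stochastic process with a constant rate k, the kind of model attributed to Hanna et al. above. It is not their model or fitting procedure. The rates, the time unit (days), the assumption that each clone's latency is exponentially distributed, the simple estimator from fully observed latencies, and all function names are illustrative assumptions. Under such a model the expected fraction of clones converted by time t is F(t) = 1 - exp(-k t), so every clone eventually converts but latencies vary, and a larger k (as reported for Nanog overexpression) shifts conversion earlier.

```python
import math
import random


def fraction_reprogrammed(rate: float, t: float) -> float:
    """Expected fraction of clones that have converted by time t under a
    one-step stochastic process with constant rate `rate` (assumed per day)."""
    return 1.0 - math.exp(-rate * t)


def simulate_latencies(rate: float, n_clones: int, seed: int = 0) -> list[float]:
    """Draw one conversion time (latency) per clone. For a one-step process with
    a constant rate, waiting times are exponentially distributed (an assumption)."""
    rng = random.Random(seed)
    return [rng.expovariate(rate) for _ in range(n_clones)]


def estimate_rate(latencies: list[float]) -> float:
    """Maximum-likelihood rate estimate, assuming every latency was observed
    (no censoring): number of events divided by total waiting time."""
    return len(latencies) / sum(latencies)


def fraction_deterministic(competent: float, latency: float, t: float) -> float:
    """Deterministic contrast: only a fixed fraction of clones is competent,
    and all of them convert after the same fixed latency."""
    return competent if t >= latency else 0.0


if __name__ == "__main__":
    baseline, accelerated = 0.02, 0.06  # hypothetical rates per day
    for day in (10, 30, 60, 120):
        print(day,
              round(fraction_reprogrammed(baseline, day), 3),
              round(fraction_reprogrammed(accelerated, day), 3))
    sample = simulate_latencies(baseline, n_clones=5000, seed=1)
    print("estimated rate:", round(estimate_rate(sample), 4))
```

The deterministic function is included only as a contrast: it jumps once at the fixed latency and then plateaus at the competent fraction, whereas the stochastic curve keeps rising towards 1 with longer culture, consistent with the observation that reprogrammed cells appear in most clonal populations given enough time.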

Given the importance of both the genetic and epigenetic control of pluripotency, Artyomov et al. developed a computational model of pluripotency induction that couples both layers of regulation (Artyomov et al., 2010). In this study, all genes responsible for a particular cellular identity (e.g. Oct4, Sox2 and Nanog for pluripotency) were described as a single ensemble module. Moreover, cellular states, arranged in a hierarchical tree-like structure, were described not only by the expression levels of master genes but also by the state of their epigenome. Artyomov et al. defined a set of rules governing how protein expression can modify the epigenetic state, and vice versa, during cell-cycle progression. By simulating thousands of independent reprogramming experiments, the authors identified rare pathways that lead to successful reprogramming.

As the studies discussed above briefly illustrate, combining dynamic single-cell expression data with computational models provides insights into the different phases of the reprogramming process. Although optimising the process seems hardly possible within the stochastic phase, targeted activation of pathways or genes in the hierarchical phase can enhance the generation of fully reprogrammed iPSCs. Selecting cells or cell colonies that express predictive markers might further increase the proportion of iPSCs, especially because environmental cues and cell-cell communication, which can be manipulated experimentally, play pivotal roles (compare with the section on spatio-temporal dynamics).

Whereas stimulating particular components of the pluripotency GRN is crucial for the re-acquisition of a pluripotent cell state, limiting its activity is essential for its maintenance, as shown by studies on mESC differentiation, which we will cover in the following section.


The MBP1 family proteins are the DNA-binding subunits of MBF cell-cycle transcription factor complexes and contain an N-terminal winged helix-turn-helix (wHTH) DNA-binding domain (DBD). Although the DNA-binding mechanism of MBP1 from Saccharomyces cerevisiae has been extensively studied, the structural framework and DNA-binding mode of other MBP1 family proteins remain to be elucidated. Here, we determined the crystal structure of the DBD of PCG2, the Magnaporthe oryzae orthologue of MBP1, bound to MCB-DNA. The structure revealed that the wing, the 20-loop, helix A and helix B of PCG2-DBD are important elements for DNA binding. Unlike previously characterized wHTH proteins, PCG2-DBD uses the wing and helix B to bind the minor groove and the major groove of the MCB-DNA, whilst the 20-loop and helix A interact non-specifically with DNA. Notably, two glutamines within the wing, Q89 and Q82, were found to recognize the MCB core CGCG sequence by forming hydrogen bonds. Further in vitro assays confirmed that Q89 and Q82 are essential for DNA binding. Together, these data indicate that the MBP1 homologue PCG2 employs an unusual mode of binding to target DNA and demonstrate the versatility of wHTH domains.
