
How are split inteins joined?


Do they form a peptide bond? Or is it just some affinity/van der Waals interaction?

Is the mechanism enzyme-catalyzed?

Are the residues/moieties at the join site known? Could they be introduced to synthesized peptide fragments to achieve something like native chemical ligation?


Decorating proteins with chemical modifications using split inteins

Back in Biology101, we learnt that the structure and function of each protein is primarily determined by its amino acid sequence, which in turn is encoded by the universal genetic code within DNA. Manipulate the DNA sequence, and we can get the cell to generate any protein that we want. Well, not quite, since we would be limited to the 20 amino acid building blocks afforded by the genetic code. Chemists, on the other hand, prefer to synthesize peptides and proteins from scratch, which allows them to introduce non-canonical amino acids (ncAAs) with desired properties and functionalities. However, the size and type of protein we can synthesize this way is still limited.

In recent years, scientists have successfully reprogrammed the genetic code, enabling the site-specific introduction of more exotic ncAAs into proteins 1. Yet the number and type of residues that can be introduced efficiently remain limited, and we ran into these limits several times during other projects in the lab 2,3. When looking into semi-synthetic avenues to manipulate proteins, we came across split inteins, which offer seemingly endless potential for protein engineering 4.

Split inteins are naturally occurring protein domains that essentially behave as connectors to join specific protein segments together in a near traceless manner (Fig 1) 4.

Figure 1. Ligation of two proteins using split inteins.

We thought it should be possible to reconstitute a full-length membrane protein from three distinct fragments using two orthogonal split inteins (Int-A and Int-B) — an approach dubbed tandem protein trans splicing (tPTS). This would allow us to effectively replace a whole segment of the protein with a synthetic peptide, by covalently joining it with the rest of the membrane protein fragments expressed in the cell using split inteins (Fig 2). In theory, this means anything that can be chemically synthesized in the context of the peptide can be incorporated into the protein.

Figure 2. Insertion of a synthetic peptide (Peptide X) into an intracellular linker of a membrane protein.

To showcase some of the possible applications, we used our approach to a) introduce modifications into GFP to manipulate its chromophore properties, b) insert multiple posttranslational modifications (PTMs) into intracellular linkers of the cardiac sodium channel Nav1.5 and c) introduce non-canonical lysine derivatives into the extracellular ATP binding site of the P2X2 receptor (Fig 3) 5.

Our work is the first example of reconstitution of full-length proteins by tPTS in eukaryotic cells. The approach has the potential to introduce virtually any ncAAs, PTMs, chemical handles, fluorophores and combinations thereof at a chosen site in a protein, thus representing a new tool in the field of protein engineering. Yet it is important to keep in mind that the tPTS yields in our experiments were typically less than 5%, a value that is likely too low for some assays. However, it is sufficient for investigating proteins that can be studied with highly sensitive methods, such as electrophysiology or imaging.

Figure 3. Some of the PTM mimics and ncAAs successfully introduced using tPTS.

Although tPTS is a powerful tool, there is clearly room to improve the technique. More efficient and more promiscuous inteins would significantly increase the number of proteins and peptide insertion sites amenable to this approach.

Indeed, with our knowledge of biology and chemistry combined, and a bit of help from split inteins, it seems possible to build and engineer some truly novel proteins in the future.

1 Chin, J. W. Expanding and reprogramming the genetic code. Nature 550, 53-60, doi:10.1038/nature24031 (2017).

2 Lynagh, T., Mikhaleva, Y., Colding, J. M., Glover, J. C. & Pless, S. A. Acid-sensing ion channels emerged over 600 Mya and are conserved throughout the deuterostomes. Proc. Natl. Acad. Sci. U. S. A. 115, 8430-8435, doi:10.1073/pnas.1806614115 (2018).

3 Borg, C. B. et al. Mechanism and site of action of big dynorphin on ASIC1a. Proc. Natl. Acad. Sci. U. S. A. 117, 7447-7454, doi:10.1073/pnas.1919323117 (2020).

4 Thompson, R. E. & Muir, T. W. Chemoenzymatic Semisynthesis of Proteins. Chem. Rev., doi:10.1021/acs.chemrev.9b00450 (2019).

5 Khoo, K. K. et al. Chemical modification of proteins by insertion of synthetic peptides using tandem protein trans-splicing. Nat. Commun. 11, 2284, doi:10.1038/s41467-020-16208-6 (2020).


Abstract

Chemically modified proteins are invaluable tools for studying the molecular details of biological processes, and they also hold great potential as new therapeutic agents. Several methods have been developed for the site-specific modification of proteins, one of the most widely used being expressed protein ligation (EPL) in which a recombinant α-thioester is ligated to an N-terminal Cys-containing peptide. Despite the widespread use of EPL, the generation and isolation of the required recombinant protein α-thioesters remain challenging. We describe here a new method for the preparation and purification of recombinant protein α-thioesters using engineered versions of naturally split DnaE inteins. This family of autoprocessing enzymes is closely related to the inteins currently used for protein α-thioester generation, but they feature faster kinetics and are split into two inactive polypeptides that need to associate to become active. Taking advantage of the strong affinity between the two split intein fragments, we devised a streamlined procedure for the purification and generation of protein α-thioesters from cell lysates and applied this strategy for the semisynthesis of a variety of proteins including an acetylated histone and a site-specifically modified monoclonal antibody.


RESULTS

Design of Intein-mediated split SpCas9

Inteins can be thought of as protein introns: they excise themselves from a sequence and join the remaining flanking regions (exteins) with a peptide bond without leaving a scar (24). The coding regions of the catalytic subunit of DNA polymerase III, DnaE, from the cyanobacterium Nostoc punctiforme (Npu) are located in two genes, dnaE-n and dnaE-c. The dnaE-n-encoded protein consists of an N-terminal DnaE fragment plus the N-intein, while dnaE-c encodes a protein that consists of the C-terminal DnaE fragment preceded by the C-intein. The N-intein and C-intein recognize each other, splice themselves out and simultaneously ligate the flanking N- and C-terminal exteins, resulting in the recovery of full-length DnaE (24).

We use this naturally occurring phenomenon by exchanging the extein regions with the respective halves of SpCas9 (Figure 1a). The split sites of SpCas9 were chosen between Glu573 and Cys574 for the first version (v1) or between Lys637 and Thr638 for the second version (v2), since the N-terminal amino acid of the C-Cas9 in the C-Intein_C-Cas9 fusion should be Cys, Ser or Thr to ensure high splicing efficiency (7, 25) (Figure 1a). Moreover, the split sites were chosen to be surface-exposed because of the steric requirements of protein splicing (26).
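The residue rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `candidate_split_sites` and the toy sequence are hypothetical, the real SpCas9 sequence is not reproduced here, and the sketch checks only the Cys/Ser/Thr rule, not surface exposure.

```python
# Residues the text names as acceptable at the first position of the C-terminal half.
NUCLEOPHILES = {"C", "S", "T"}


def candidate_split_sites(sequence, allowed=NUCLEOPHILES):
    """Return (last_N_position, first_C_position, "X/Y") tuples, 1-based.

    A split between residues i and i+1 is listed when residue i+1,
    which would become the first residue after the C-intein, is in `allowed`.
    Surface exposure is NOT checked; that needs structural data.
    """
    sites = []
    for i in range(1, len(sequence)):   # i is the 1-based index of the last N-terminal residue
        first_c = sequence[i]           # 0-based index i is 1-based residue i + 1
        if first_c in allowed:
            sites.append((i, i + 1, sequence[i - 1] + "/" + first_c))
    return sites


# Hypothetical toy sequence, used only to show the call; it is not SpCas9.
toy_sequence = "MKEACGLT"
for site in candidate_split_sites(toy_sequence):
    print(site)
```

In practice such a list would only be a starting point, since the text also requires the split site to be surface-exposed.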

Figure 1. Using split inteins for split–Cas9 reconstitution. (A) Upon split-intein reconstitution, the split-intein moieties splice themselves out and ligate the flanking N- and C-terminal SpCas9 halves (exteins), resulting in the recovery of active full-length SpCas9. (B) Split-SpCas9 version 1: DnaE N-intein (orange) is fused to the C-terminus of SpCas9 1-573 (N-Cas9, red), whereas DnaE C-intein (green) is fused to the N-terminus of SpCas9 574-1368 (C-Cas9, blue). Split-SpCas9 version 2: DnaE N-intein is fused to the C-terminus of SpCas9 1-637, whereas DnaE C-intein is fused to the N-terminus of SpCas9 638-1368. (C) To measure nuclease activity and to distinguish between HDR and NHEJ events, a traffic light reporter system was used. NHEJ events introduced by SpCas9 will often create indels that lead to frameshift mutations, resulting in the expression of TagRFP. HDR events will result in the repair of the truncated Venus by the supplied repair template (donor DNA left homology arm: 0.4 kb, right homology arm: 1.4 kb) and the removal of the stop codon in the center of Venus, leading to the expression of full-length Venus. Translated regions are denoted as lines under the schematic DNA with translation start site and stop codon. The box inset shows a detail of the target sequence recognized by the gRNAs crTLR#1 and crTLR#3. CAG: synthetic mammalian CAG promoter; bGH: bovine growth hormone polyadenylation site.

The split–intein–Cas9 systems were created by fusing the N- or C-terminal halves of SpCas9 to the corresponding intein halves: v1 uses SpCas9 1-573 and SpCas9 574-1368 as split–Cas9 moieties, resulting in two fusion constructs called N-Cas9_N-Intein_v1 and C-Intein_C-Cas9_v1; v2 uses SpCas9 1-637 and SpCas9 638-1368 as split–Cas9 moieties, resulting in two fusion constructs called N-Cas9_N-Intein_v2 and C-Intein_C-Cas9_v2 (Figure 1b).

Intein-mediated SpCas9 is as active as wild-type SpCas9 in surrogate reporter systems

To test whether the intein-reconstituted split Cas9 is active, we used a Neuro-2a cell line carrying a single-copy integrated surrogate traffic light reporter (TLR). This reporter distinguishes between the two major pathways by which the cell's repair machinery resolves a double-strand break: (i) error-prone non-homologous end joining (NHEJ) and (ii) high-fidelity homology-directed repair (HDR) (Figure 1c). NHEJ causes indels in the reporter region that put TagRFP in frame in one third of all events, allowing its expression and detection (red fluorescence), whereas HDR results in expression of the repaired mVenus (green fluorescence) (Figure 1c). At 48 h after transient transfection with both halves of the split–intein system (N-Cas9_N-Intein and C-Intein_C-Cas9, for v1 and for v2) and a U6-expressed gRNA (Figure 2a), the number of fluorescent cells, corresponding to the sum of NHEJ and HDR events, matched the values obtained with full-length wild-type SpCas9 and gRNA (Figure 2a and b). In contrast, no fluorescence signal was detected in the negative controls, in which SpCas9 was expressed without gRNA or a single intein-fused SpCas9 half was co-expressed with the gRNA (Figure 2b).

Figure 2. Testing split–Cas9 efficiency in Neuro-2a TLR cell lines. (A) Overview of the WT and split–Cas9 expression plasmids used (Cas9, N-Cas9_N-Intein_v1, C-Intein_C-Cas9_v1, N-Cas9_N-Intein_v2, C-Intein_C-Cas9_v2, gRNA crTLR#1/#2). To ensure high expression, a strong synthetic mammalian promoter (CAG, green) and a bovine growth hormone (bGH, red) polyadenylation site were used. Cas9 cDNA is shown in orange, N-intein in dark brown and C-intein in light brown. NLS (dark red): nuclear localization signal. FLAG and HA tags are shown in light and dark grey, respectively. For gRNA expression (turquoise), the U6 promoter was chosen (dark green). (B) Results after FACS: only transfection with both N- and C-terminal parts of the split–intein–Cas9 system for version 1 (v1) and version 2 (v2) resulted in nuclease activity similar to wild-type SpCas9, represented as HDR or NHEJ events; transfection with only one moiety did not show any observable HDR or NHEJ events. Shown are means ± SD of three independent experiments. (C) The split-intein-Cas9 system v1 was used to target the second-to-last exon of the fused in sarcoma (Fus) gene. The respective segment was PCR-amplified with the annotated primers for further analysis. A T7 endonuclease I assay was performed on the PCR products to investigate the occurrence of NHEJ events. After the assay, the samples were analyzed with a Bioanalyzer. The appearance of a second band indicates the presence of indels resulting from NHEJ events. (D) Targeting of the Rosa26 locus with the gRNA Rosa26#1. Indels were detected only when wild-type SpCas9 or both SpCas9 moieties were transfected. (E) Targeting of the Rosa26 locus with the gRNA Rosa26#3. Indels were detected by RFLP analysis. The XbaI-resistant product was observed only when wild-type SpCas9 or both SpCas9 moieties were transfected. (F) Targeting of the Rab38 locus with the gRNA Rab38#2. Indels were detected by RFLP analysis. The XcmI-resistant product was observed only when wild-type SpCas9 or both SpCas9 moieties were transfected. (G) Quantification of the nuclease activity at each target sequence. FU: fluorescence units; s: seconds.

Targeting efficiency of endogenous genes with intein-mediated split–Cas9 system is comparable to wild-type SpCas9

In subsequent experiments, we validated our intein–Cas9 system at several endogenous loci (Fus, Rosa26 and Rab38) in Neuro-2a cells. The first gene chosen was Fus (fused in sarcoma) because of its importance as a model for amyotrophic lateral sclerosis (27). The DNA of the entire cell population was isolated 48 h after the transient transfection. The region containing the CRISPR/Cas9 target site was PCR-amplified and the T7 endonuclease I assay was performed. Indels form mismatched duplexes after the denaturing and reannealing step; T7 endonuclease I cleaves these mismatched regions, so indels are revealed by the presence of digested PCR products (Figure 2c). As expected, only samples from cells exposed to SpCas9 or to both moieties showed the predicted digestion products, with similar percentages of indels (23.1 and 22.7%, respectively), while none were detected in the negative controls (Figure 2c and g).
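For readers unfamiliar with how a T7 endonuclease I gel or Bioanalyzer trace is turned into an indel percentage, a widely used estimate is sketched below. The text does not state which formula the authors applied, so this is an assumption; the function name and the band intensities are hypothetical.

```python
import math


def t7e1_indel_fraction(uncut, cut_a, cut_b):
    """Estimate the indel fraction from T7E1 band intensities.

    Assumes the commonly used relation
        indel = 1 - sqrt(1 - fraction_cleaved),
    where fraction_cleaved = (cut_a + cut_b) / (uncut + cut_a + cut_b).
    The square root accounts for the fact that only heteroduplexes
    (mutant strand annealed to wild-type strand) are cleaved.
    """
    total = uncut + cut_a + cut_b
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    fraction_cleaved = (cut_a + cut_b) / total
    return 1.0 - math.sqrt(1.0 - fraction_cleaved)


# Hypothetical band intensities in arbitrary units, not data from the study.
estimate = t7e1_indel_fraction(uncut=60.0, cut_a=20.0, cut_b=20.0)
print(f"estimated indel frequency: {100 * estimate:.1f}%")
```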

To validate the results obtained by targeting the Fus gene, two other gene loci were targeted. We targeted Rosa26 as a second genomic locus with two different gRNAs (Figure 2d and e). Rosa26 is widely used as a safe locus in mouse for knocking in foreign DNA (28). As in the previous experiment, the region containing the predicted target sequence was amplified by PCR. With gRNA-Rosa26#1, indels were revealed by the T7 endonuclease I assay; with gRNA-Rosa26#3, an XbaI site within the gRNA binding site allowed us to detect indels by RFLP. When gRNA-Rosa26#1 was used, the expected digestion products were detected only when SpCas9 or the two SpCas9 moieties were added (64.0 and 48.1%, respectively) (Figure 2d, g). These data were confirmed using the second gRNA, gRNA-Rosa26#3. In this case, indels disrupt the XbaI site, resulting in XbaI-resistant mutant DNA fragments (Figure 2e). Again, the nuclease activity of the intein-split SpCas9 (42.9%) was comparable to that of wild-type SpCas9 (50.2%) (Figure 2e and g).

As a third locus, we targeted Rab38 (Figure 2f). We have previously used this locus as a proof-of-principle genomic region to test the nuclease activity of ZFNs, TALENs and CRISPR/Cas9 (21, 29). The region containing the target sequence of gRNA-Rab38#2 was PCR-amplified and analyzed by RFLP. Indels were revealed by loss of the XcmI site, whereas the wild-type DNA was digested (Figure 2f). The undigested product was detectable only when both split SpCas9 moieties (24.9%) or SpCas9 (22.7%) were used (Figure 2f and g).

The intein-mediated split–Cas9 system can be efficiently delivered via rAAV

The goal of developing an intein-mediated split–Cas9 system was to permit efficient delivery via rAAVs. To demonstrate this, two recombinant AAVs were produced, each carrying one half of the system (pAAV_crTLR#1_Nv1 and pAAV_crTLR#1_Cv1) (Figure 3a). A human cell line, AAVS1 TLR/+ (18), and a mouse cell line, Neuro-2a TLR, were used to validate our system. When the AAVS1 TLR/+ cell line was transduced with both rAAVs, the relative nuclease activity was two orders of magnitude higher than in the negative control or when only one rAAV was used (Figure 3b). A similar result was observed in Neuro-2a TLR cells, where the relative nuclease activity was one order of magnitude higher than in the negative control and the single-rAAV experiments (Figure 3c). In both cases, the actual activity was even higher because, with this reporter system, only one third of the theoretical total nuclease activity is detected. These results demonstrate that a complete CRISPR/Cas9 system can be delivered using two rAAVs without having to truncate any of its elements.
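The one-third detection limit of the reporter can be written as a small correction. The sketch below assumes, as the text states, that NHEJ puts TagRFP in frame in one third of events; the function name and the input value are hypothetical, and the correction applies only to the NHEJ (red) readout.

```python
def estimated_total_nhej(observed_rfp_fraction, in_frame_fraction=1 / 3):
    """Scale the RFP-positive cell fraction up to an estimate of all NHEJ events.

    `in_frame_fraction` is the share of indels expected to restore the
    TagRFP reading frame (one third, per the reporter design).
    The result is capped at 1.0 because it is a fraction of cells.
    """
    if not 0.0 <= observed_rfp_fraction <= 1.0:
        raise ValueError("observed fraction must lie between 0 and 1")
    return min(1.0, observed_rfp_fraction / in_frame_fraction)


# Hypothetical readout: 2% RFP-positive cells, not a value from the study.
print(estimated_total_nhej(0.02))
```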

Figure 3. Demonstration that the split-intein split-SpCas9 system can be delivered by rAAV. (A) Overview of the split–Cas9 rAAV constructs (pAAV_crTLR#1_Nv1, pAAV_crTLR#1_Cv1). To ensure high expression, a strong synthetic mammalian promoter (CBh, green) and a bovine growth hormone (bGH, red) polyadenylation site were used. Split Cas9 cDNA is shown in orange, N-intein in dark brown and C-intein in light brown. NLS (dark red): nuclear localization signal. FLAG and HA tags are shown in light and dark grey, respectively. For gRNA expression (turquoise), the U6 promoter was chosen (dark green). The inverted terminal repeats (ITR) are shown in light blue and light green. (B and C) Nuclease activity was detectable only when the two rAAVs carrying the two moieties were added to the AAVS1 TLR/+ cells (B) or to the Neuro-2a TLR cells (C). In the experiments with only one of the moieties, the nuclease activity was indistinguishable from the negative control. Shown are means ± SD of three independent experiments.

An intein-mediated Cas9 D10A nickase is functional and comparable to SpCas9 D10A

SpCas9 D10A nickases have been reported to significantly reduce genomic off-targets in vitro, which is of great importance for the use of CRISPR-Cas9 in gene therapy (30). Therefore, we designed an intein-mediated split–Cas9 nickase system (Figure 4a) and compared its HDR-to-NHEJ ratio with that of the unmodified SpCas9D10A nickase and of wild-type SpCas9, using the traffic light reporter cell line and two gRNAs (crTLR#1 and crTLR#3) (Figure 1c). Interestingly, the nickases showed a higher preference for HDR than wild-type SpCas9, with reduced NHEJ values (Figure 4c). The split nickase showed the same HDR-to-NHEJ ratio as the SpCas9D10A nickase, but with slightly lower overall activity than wild-type. Its HDR efficiency was nevertheless comparable to that of wild-type SpCas9, with greatly reduced NHEJ (Figure 4c).

Figure 4. Plasmid transfection experiment comparing wild-type SpCas9 and the nickase with their respective split versions, and donor DNA supplied as a separate plasmid or accommodated directly on the AAV plasmid. (A) Overview of the split–Cas9 plasmids used to express SpCas9, SpCas9D10A and its split version (Cas9, Cas9D10A, N-Cas9_N-Intein_v1, C-Intein_C-Cas9_v1, gRNA crTLR#1/#2). (B) rAAV plasmids used: without donor sequence (pAAV_crTLR#1_Nv1), carrying the donor DNA flanked by CRISPR sites (pAAV_crTLR#1_CRISPR-Donor_Nv1) or not flanked (pAAV_crTLR#1_Donor_Nv1), and the C-Cas9 expression plasmid (pAAV_crTLR#1_Cv1). The CBh promoter is shown in green and the bovine growth hormone (bGH) polyadenylation site in red. Split Cas9 cDNA is shown in orange, N-intein in dark brown and C-intein in light brown. NLS (dark red): nuclear localization signal. FLAG and HA tags are shown in light and dark grey, respectively. For gRNA expression (turquoise), the U6 promoter was chosen (dark green). The inverted terminal repeats (ITR) are shown in light blue and light green. Donor DNA sequence (Donor, dark green). (C) The double nicking strategy, SpCas9D10A in combination with two gRNAs, showed a preference for HDR compared to wild-type. This effect was also observed with split-SpCas9D10A but with decreased activity. With the different DNA donor strategies no differences were observed, but with donor DNA flanked by CRISPR sites reduced HDR was observed. Shown are means ± SD of three independent experiments.

The donor DNA can be accommodated on the same plasmid encoding the split–CRISPR/Cas9 system

Many inherited diseases are monogenic, originating from a point mutation in the corresponding gene, which can therefore, in theory, be corrected if a template region is provided after a DSB event. About 0.9 kb remains on the dual-vector system for additional sequences before the 5 kb limit is reached. Thus, we included a donor sequence in one of the split–Cas9 plasmids (Figure 4b) and tested its efficiency using our TLR reporter system, in comparison to adding the donor as an independent plasmid. Two versions of the rAAV-vector-encoded repair template were created: the first contained the donor region alone, while in the second the donor region was additionally flanked by CRISPR recognition sites (pAAV_crTLR#1_Donor_Nv1 and pAAV_crTLR#1_CRISPR-Donor_Nv1, respectively; Figure 4b). When this system was tested by plasmid transfection, the HDR values observed with the non-flanked donor were comparable to those of the wild-type and split SpCas9 variants in the presence of an extra donor plasmid (Figure 4c). With the flanked donor, the nuclease activity was lower, probably because of DNA instability after cutting by SpCas9 (Figure 4c). These results can be useful in designing a vector system for genome engineering with fewer plasmids, making it easier to transfect all of them.
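The size budget mentioned above amounts to simple bookkeeping. In the sketch below, the 5 kb limit and the roughly 0.9 kb of free space come from the text; the occupied length of 4100 bp is merely derived from those two numbers, and the function names and candidate insert sizes are hypothetical.

```python
LIMIT_BP = 5000   # limit stated in the text (5 kb)
USED_BP = 4100    # derived: 5000 bp limit minus about 900 bp of free space


def remaining_capacity_bp(used_bp=USED_BP, limit_bp=LIMIT_BP):
    """Return how many base pairs can still be added before the limit."""
    return limit_bp - used_bp


def fits(insert_bp, used_bp=USED_BP, limit_bp=LIMIT_BP):
    """Return True if an insert of `insert_bp` stays within the limit."""
    return insert_bp <= remaining_capacity_bp(used_bp, limit_bp)


# Hypothetical donor sizes, used only to show the check.
for donor_bp in (800, 1800):
    print(donor_bp, "bp donor fits:", fits(donor_bp))
```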


Inteins

In most cases, each gene encodes a single protein, but cells have found ways around this limitation. Viruses, with their tiny genomes, often contain genes that encode long polyproteins, which are then chopped into a bunch of functional pieces by enzymes. Inteins are another way that cells make several proteins from one gene. The first example of an intein was discovered in the yeast vacuolar ATPase (shown here from PDB entry 1jva). The gene encodes the ATPase protein along with additional protein embedded in the middle. This embedded protein is termed an intein (shown here in green), and the two halves of the ATPase are termed exteins (shown here in red and blue; note that this structure only includes a small portion of the exteins). When the protein is made, the intein splices itself out of the chain and connects together the two exteins, creating the functional ATPase.

Homing Endonucleases

Many inteins include two parts: a portion that splices the intein out of the overall protein chain, and a portion that acts as a DNA-cutting enzyme. This enzyme is often termed a homing endonuclease because it homes in on DNA that doesn't encode an intein. Homing endonucleases act as selfish genetic elements, with an insidious method for propagating themselves between different copies of genes. They don't attack genes that already encode inteins, but they cleave genes without them. Then, the cell's normal repair mechanism will fix the break, but using the intein-including gene as the template. So, when the damage is corrected, the gene is left with an intein. This isn't a problem, though, since the intein will splice itself out of the protein when it's made.

Inteins in the Lab

Inteins are modular protein splicing machines, and thus have been useful for biotechnology. Working inteins have been isolated from cells, and then engineered into new proteins to create self-splicing proteins for specific functions. For instance, engineered inteins have been used to connect together peptides with different types of labels to assist with NMR experiments, or to add non-natural amino acids to a protein, or even to connect proteins to quantum dots. Inteins have also been used to connect the two ends of a chain, creating a cyclic protein.

Large and Small

Not all inteins include a homing endonuclease. Some forms, termed mini-inteins, include only the splicing portion. Examples of each are shown here. On the left is a typical large intein that includes a homing endonuclease, shown here spliced out of its protein and bound to DNA (PDB entry 1lws). On the right is a mini-intein that is found in the gene for a mycobacterial gyrase (PDB entry 1am2). It includes just enough protein to perform the splicing reaction.

Exploring the Structure

Two structures of the vacuolar ATPase intein show the protein before and after its splicing reaction. In both cases, a few of the catalytic amino acids were mutated to slow the reaction, so that the structure could be solved. PDB entry 1jva (left) shows the intein (green) before the reaction, with small segments of the exteins (red and blue) attached. PDB entry 1um2 (right) is after the reaction, and the two exteins have been connected together. In the actual protein, the exteins are much larger, and when joined they form part of a large proton pump.

Topics for Further Discussion

  1. Structures are available for many different inteins. Can you find them in the PDB? Do the structures include the homing endonuclease, or are they mini-inteins?
  2. Do the structures include portions of the exteins? Why is it difficult to determine structures that include exteins?

November 2010, David Goodsell


BRIEF SUMMARY OF THE INVENTION

The present invention provides robust split inteins and methods of using the same. The split inteins are active over a large temperature range, over a wide pH range, and in the presence of chaotropic salts. They also show high tolerance to sequence variability in fused heterologous polypeptides. These features make the split inteins especially useful in protein purification and engineering techniques.

In particular, fusion proteins comprising (i) an intein domain at least 75% identical to a sequence selected from the group consisting of SEQ ID NOs: 7, 16, 24, 38 and 65 and (ii) a heterologous polypeptide, wherein the heterologous polypeptide is C-terminal to the intein domain, are provided. In some embodiments, the last amino acid of the intein domain is asparagine or glutamine. In some embodiments, the last amino acid of the intein domain is an amino acid other than asparagine or glutamine, e.g., an alanine. In some embodiments, the penultimate amino acid of the intein domain is an amino acid other than histidine. In some embodiments, the heterologous polypeptide is directly linked to the intein domain via a peptide bond. In some embodiments, the first amino acid of the heterologous polypeptide is serine, cysteine, or threonine. In some embodiments, the last amino acid of the intein domain is an amino acid other than asparagine or glutamine, e.g., an alanine, and the first amino acid of the heterologous polypeptide is other than serine, threonine or cysteine, e.g., an alanine. In some embodiments, the fusion protein further comprises a linker between the heterologous polypeptide and the intein domain. In some embodiments, the first amino acid of the linker is serine, cysteine, or threonine. In some embodiments, the first amino acid of the linker is an amino acid other than serine, cysteine, or threonine, e.g., an alanine. In some embodiments, the last amino acid of the intein domain is an amino acid other than asparagine or glutamine, e.g., an alanine, and the first amino acid of the linker is an amino acid other than serine, threonine or cysteine, e.g., an alanine. In some embodiments, the linker comprises 1-5 amino acids of a native extein sequence. Fusion proteins comprising (i) an intein domain having a sequence selected from the group consisting of SEQ ID NOs: 7, 16, 24, 38 and 65 and (ii) a heterologous polypeptide, wherein the heterologous polypeptide is C-terminal to the intein domain, are also provided.

In addition, fusion proteins comprising (i) an intein domain at least 75% identical to a sequence selected from the group consisting of SEQ ID NOs: 3, 12, 20, 34 and 64 and (ii) a heterologous polypeptide, wherein the heterologous polypeptide is N-terminal to the intein domain, are provided. In some embodiments, the first amino acid of the intein domain is a cysteine. In some embodiments, the first amino acid of the intein domain is an amino acid other than serine or cysteine, e.g., an alanine. In some embodiments, the heterologous polypeptide is directly linked to the intein domain via a peptide bond. In some embodiments, the fusion protein further comprises a linker between the heterologous polypeptide and the intein domain. In some embodiments, the linker comprises 1-5 amino acids of a native extein sequence. Fusion proteins comprising an intein domain having a sequence selected from the group consisting of SEQ ID NOs: 3, 12, 20, 34 and 64 and a heterologous polypeptide, wherein the heterologous polypeptide is N-terminal to the intein domain, are also provided.
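
The terminal-residue embodiments in the two paragraphs above can be summarized as a small check on one-letter sequences. This is only an illustrative sketch: the function name and the example sequences are hypothetical (they are not any of the SEQ ID NOs), and a `False` entry simply means the sequence falls under one of the alternative embodiments (for example, an alanine at that position).

```python
# Residues named in the embodiments above (one-letter codes).
N_INTEIN_FIRST = {"C"}            # first residue of the N-terminal intein domain: cysteine
C_INTEIN_LAST = {"N", "Q"}        # last residue of the C-terminal intein domain: Asn or Gln
C_EXTEIN_FIRST = {"S", "C", "T"}  # first residue after the C-terminal intein domain: Ser, Cys or Thr


def junction_report(n_intein: str, c_intein: str, c_extein: str) -> dict:
    """Report which of the listed terminal residues are present.

    n_intein, c_intein: intein domain sequences; c_extein: the linker or
    heterologous polypeptide that follows the C-terminal intein domain.
    """
    return {
        "n_intein_starts_with_Cys": n_intein[:1] in N_INTEIN_FIRST,
        "c_intein_ends_with_Asn_or_Gln": c_intein[-1:] in C_INTEIN_LAST,
        "c_extein_starts_with_Ser_Cys_Thr": c_extein[:1] in C_EXTEIN_FIRST,
    }


# Made-up sequences, for illustration only.
print(junction_report("CLSYDTEI", "MIKIATRKYLGKQNVYDIGVERDHNFALKNGFIASN", "SGHHHHHH"))
print(junction_report("ALSYDTEI", "MIKIATRKYLGKQNVYDIGVERDHNFALKNGFIASA", "AGHHHHHH"))
```

The second call mimics the alanine variants mentioned in the text, so all three entries come back `False`.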

Furthermore, fusion proteins comprising a first intein domain, a second intein domain, and a heterologous polypeptide are provided. Furthermore, fusion proteins comprising a first intein domain, a second intein domain, and a heterologous polypeptide are provided wherein the heterologous polypeptide is N-terminal to the first intein domain (N-terminal splicing domain), and the heterologous polypeptide is C-terminal to the second intein domain (C-terminal splicing domain). In some embodiments, (a) the first intein domain is at least 75% identical to SEQ ID NO:3 and the second intein domain is at least 75% identical to SEQ ID NO:7; (b) the first intein domain is at least 75% identical to SEQ ID NO:12 and the second intein domain is at least 75% identical to SEQ ID NO:16; (c) the first intein domain is at least 75% identical to SEQ ID NO:20 and the second intein domain is at least 75% identical to SEQ ID NO:24; (d) the first intein domain is at least 75% identical to SEQ ID NO:34 and the second intein domain is at least 75% identical to SEQ ID NO:38; or (e) the first intein domain is at least 75% identical to SEQ ID NO:64 and the second intein domain is at least 75% identical to SEQ ID NO:65. In some embodiments, the first amino acid of the heterologous polypeptide is serine, cysteine, or threonine. In some embodiments, the fusion protein further comprises a linker between the heterologous polypeptide and the second intein domain, wherein the first amino acid of the linker is serine, cysteine, or threonine. In some embodiments, the first amino acid of the linker is serine.
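
The five pairings of first (N-terminal splicing) and second (C-terminal splicing) intein domains listed above amount to a lookup table. A minimal sketch follows; the dictionary and function names are hypothetical, and the code only records which SEQ ID NOs are paired, not the 75% identity criterion.

```python
# First (N-terminal splicing) intein SEQ ID NO -> second (C-terminal splicing)
# intein SEQ ID NO, copied from the five pairings listed in the paragraph above.
INTEIN_PAIRS = {3: 7, 12: 16, 20: 24, 34: 38, 64: 65}


def partner_c_intein(n_intein_seq_id: int) -> int:
    """Return the SEQ ID NO of the C-terminal partner for a listed N-terminal domain."""
    try:
        return INTEIN_PAIRS[n_intein_seq_id]
    except KeyError:
        raise ValueError(f"SEQ ID NO:{n_intein_seq_id} is not a listed N-terminal domain") from None


def is_listed_pair(n_intein_seq_id: int, c_intein_seq_id: int) -> bool:
    """True if the two SEQ ID NOs form one of the listed pairings."""
    return INTEIN_PAIRS.get(n_intein_seq_id) == c_intein_seq_id


print(partner_c_intein(12))
print(is_listed_pair(3, 16))
```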

Polynucleotides encoding the fusion proteins according to the invention are also provided herein.

Compositions comprising fusion proteins are also provided. Such compositions are useful, for example, for C-terminal cleavage reactions, N-terminal cleavage reactions, trans-splicing reactions, and protein-cyclization methods.

Host cells comprising the proteins, fusion proteins, polynucleotides, or compositions are also provided.

Methods of using polypeptides and fusion proteins provided herein in, for example, C-terminal cleavage reactions, N-terminal cleavage reactions, trans-splicing reactions, and protein-cyclization are provided. Such methods can occur at temperatures of about 0 °C to about 60 °C, at a pH of about 6 to about 10, and/or in the presence of about 0.5 M to about 6 M urea.

In some embodiments, the reaction rate constant of the reactions provided herein is at least about 1×10⁻¹ s⁻¹, or at least about 2×10⁻¹ s⁻¹. In some embodiments, the reaction half-life is less than about 100 seconds, less than about 50 seconds, less than about 25 seconds, or less than about 15 seconds.
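
To see how a rate constant and a half-life relate, one can assume simple first-order kinetics, where t½ = ln 2 / k. That kinetic model is an assumption made here for illustration and is not stated in the text above; the function name is likewise hypothetical.

```python
import math


def half_life_seconds(rate_constant_per_s: float) -> float:
    """Half-life of a first-order process, t_1/2 = ln(2) / k (assumed model)."""
    if rate_constant_per_s <= 0:
        raise ValueError("rate constant must be positive")
    return math.log(2) / rate_constant_per_s


# The two rate constants quoted above, in s^-1.
for k in (1e-1, 2e-1):
    print(f"k = {k:.1e} s^-1 -> t1/2 = {half_life_seconds(k):.1f} s")
```

Under this assumption, a rate constant of 1×10⁻¹ s⁻¹ corresponds to a half-life of roughly 7 seconds.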

The reactions can be initiated, for example, by a shift in temperature or pH or mixing proteins.

The invention also provides a vector which comprises a polynucleotide encoding an intein domain at least 75% identical to a sequence selected from the group consisting of SEQ ID NOs: 7, 16, 24, 38 and 65 and at least a cloning site downstream of said polynucleotide which allows the cloning of a polynucleotide of interest such that a polynucleotide is formed which encodes a fusion protein comprising the intein domain and the polypeptide encoded by the polynucleotide of interest.

The invention also provides a vector which comprises a polynucleotide encoding an intein domain at least 75% identical to a sequence selected from the group consisting of SEQ ID NOs: 3, 12, 20, 34 and 64 and at least a cloning site upstream of said polynucleotide which allows the cloning of a polynucleotide of interest such that a polynucleotide is formed which encodes a fusion protein comprising the polypeptide encoded by the polynucleotide of interest and the intein domain.

    • a. if the first intein domain is at least 75% identical to SEQ ID NO:7, then the second intein domain is at least 75% identical to SEQ ID NO:3;
    • b. if the first intein domain is at least 75% identical to SEQ ID NO:16, then the second intein domain is at least 75% identical to SEQ ID NO:12;
    • c. if the first intein domain is at least 75% identical to SEQ ID NO:24, then the second intein domain is at least 75% identical to SEQ ID NO:20;
    • d. if the first intein domain is at least 75% identical to SEQ ID NO:38, then the second intein domain is at least 75% identical to SEQ ID NO:34; or
    • e. if the first intein domain is at least 75% identical to SEQ ID NO:65, then the second intein domain is at least 75% identical to SEQ ID NO:64.

      From the Back Cover

      This volume focuses on applications of split inteins, and the progress that has been made in the past 5 years on discovery and engineering of faster and more efficient split inteins. The first few chapters in Split Inteins: Methods and Protocols explore new techniques on how to use split inteins for affinity purification of overproduced proteins, and split-intein based technologies to prepare cyclic peptides and proteins. The next few chapters discuss semisynthetic protein trans-splicing using one synthetic intein piece, synthetic intein-extein pieces used to deliver other cargos for chemical modification both of purified proteins and of proteins in living cells, as well as isotopic labeling of proteins for NMR studies, and a discussion on how protein block copolymers can be generated by protein trans-splicing to form protein hydrogels. The last few chapters deal with intein applications in transgenic plants and conditional inteins that can be regulated in artificial ways by small molecules or light, a cassette-based approach to quickly test many intein insertion positions, and a computational approach to predict new intein split sites (the approach also works for other proteins). Written in the highly successful Methods in Molecular Biology series format, chapters include introductions to their respective topics, lists of the necessary materials and reagents, step-by-step, readily reproducible laboratory protocols, and tips on troubleshooting and avoiding known pitfalls.

      Cutting-edge and thorough, Split Inteins: Methods and Protocols is a valuable resource that will provide guidance toward possibilities of split intein applications, explore proven and detailed protocols adaptable to various research projects, and inspire new method developments.


      Group I Introns and Inteins: Disparate Origins but Convergent Parasitic Strategies

      FIG. 1. (A) Predicted secondary structure of a group I intron (Cbu.L1951) (93). Paired, conserved helices common to group I introns are designated P1 to P10. The 5′- and 3′-terminal intron bases are encircled. The intron sequence is in uppercase; 5′ and 3′ exons are in lowercase and colored red and blue, respectively. P1 and P10 together form the IGS. The site of HE insertion in P8 is indicated in green. (B) Mechanism of group I intron splicing (110). 5′ and 3′ exons are in red and blue, respectively. ΩG, terminal intron guanine. G*, exogenous guanosine. (Step 1) Nucleophilic attack on the 5′ splice site by the 3′-OH of G* in GBS. (Step 2) Nucleophilic attack on the 3′ splice site by the free 3′-OH of the 5′ exon. (Step 3) Free intron and spliced exons.

      FIG. 2. (A) Intein modular structure (87, 90). An intein with flanking exteins is shown. The N- and C-terminal splicing domains and the optional HE domain with their conserved motifs are shown. Conserved amino acids Cys or Ser at the 5′ end of the intein, Asn at the 3′ end of the intein, and Cys, Ser, or Thr at the first position on the 3′ extein are also indicated. (B) Intein splicing chemistry (39, 87). N and C exteins are in red and blue, respectively.

      FIG. 3. (A) Intein or group I intron homing through HE-mediated DNA double-strand-break repair and recombination. (B) Cycle model of HE gain, degeneration, and loss within host populations (37, 38, 39).

      FIG. 4. Convergence of evolutionary paths of group I introns, inteins, and HEs.

      Supporting information

      S1 Fig

      (A) A phylogenetic tree of Prp8 inteins was reconstructed based on an amino acid multiple sequence alignment of the splicing blocks (A, B, F, G) using the NJ algorithm and an interior-branch test with 1,000 replicates. Fifty representatives covering Prp8 intein diversity were selected, and the full name of each intein-containing organism is listed. Colored symbols represent the insertion site and correspond to colors in Fig 1A. Letters (a1, a2, b, c, d, e, f, g) represent each of the 7 unique insertion sites. (B) A phylogenetic tree of Prp8 inteins was reconstructed based on an amino acid multiple sequence alignment of the splicing blocks (A, B, F, G) using the ML method and evaluated with SH-aLRT. The substitution model, WAG+G+I, was selected using ProtTest 3 (https://github.com/ddarriba/prottest3). The ML tree follows the same formatting as in panel A and shows similar architecture to the NJ tree. Amoebo, Amoebozoa; Asco, Ascomycota; Basidio, Basidiomycota; Blasto, Blastocladiomycota; Choano, Choanoflagellida; Chloro Viridipl, Chlorophyta Viridiplantae; Chytridio, Chytridiomycota; ML, maximum likelihood; Mucoro, Mucoromycota; NJ, neighbor-joining; Opistho, Opisthokonta; Prp8, pre-mRNA processing factor 8; SH-aLRT, Shimodaira–Hasegawa nonparametric approximate likelihood-ratio test.

      S2 Fig

      Comparative analysis of amino acid residues found in Blocks A, B, F, and G from the selected 50 representative Prp8 inteins, shown with abbreviated species names (full names in S1 Fig). Letters (a1, a2, b, c, d, e, f, g) represent each of the 7 unique insertion sites. Shading is as follows: black, identical amino acid; dark gray, conserved amino acid; light gray, similar amino acid substitution. Prp8, pre-mRNA processing factor 8.

      S3 Fig

      In the amoeba Asu, an intein was identified at a new site in Prp8, here termed g. This is the seventh site in which a Prp8 intein has been found. The full site g intein sequence is shown, plus 10 flanking N-extein (blue) and C-extein (green) amino acids. The Asu C1 (yellow) and terminal asparagine (red) are highlighted. Residue numbering corresponds to the Asu Prp8 exteins. Accession number: XP_0127532. Asu, Acytostelium subglobosum; Prp8, pre-mRNA processing factor 8.

      S4 Fig

      (A) A phylogenetic tree of Prp8 exteins corresponding to inteins (see S1 Fig) was reconstructed based on an amino acid multiple sequence alignment using the NJ algorithm and an interior-branch test with 1,000 replicates. Extreme conservation among Prp8 exteins is observed along with grouping by host organism phylogeny. Colored symbols represent the intein insertion site of the exteins and correspond to colors in Fig 1A. Letters (a1, a2, b, c, d, e, f, and g) represent each of the 7 unique insertion sites. Phylum abbreviations are listed in the S1 Fig legend. (B) A phylogenetic tree of Prp8 exteins was reconstructed based on an amino acid multiple sequence alignment of the splicing blocks (A, B, F, G) using the ML method and evaluated with SH-aLRT. The substitution model, LG+G, was selected using ProtTest 3 (https://github.com/ddarriba/prottest3). The tree follows the same formatting as in panel A. ML, maximum likelihood; NJ, neighbor-joining; Prp8, pre-mRNA processing factor 8; SH-aLRT, Shimodaira–Hasegawa nonparametric approximate likelihood-ratio test.

      S5 Fig

      (A) Overlay of the Sce VMA1 intein and Cne Prp8 intein active sites. The Sce VMA1 intein (cyan, PDB 1GPP) was overlaid with the Cne Prp8 intein (red). The active site residues, crucial to protein splicing, are shown as sticks and labeled. A majority of these conserved residues overlap exactly, such as the catalytic C1 and the Block B TxxH motif. The Sce VMA1 intein uses an asparagine (N76) rather than threonine in the TxxH motif, but the positioning is similar to the threonine (T62) of the Cne Prp8 intein. The penultimate histidines (H170 and H453) are in comparable positions except for the side chains, whose chi angles differ by 45°. The Sce VMA1 intein was not solved with the terminal asparagine. (B) Structural comparison of bacterial Mtu RecA intein and fungal Cne Prp8 intein. Overlay of the Mtu RecA intein (brown, PDB 2IMZ) and the Cne Prp8 intein (red) reveals structural similarities in major intein features, such as the anti-parallel β-sheet folding, that contribute to the horseshoe shape. The Hint domain, composed of splicing Blocks A, B, F, and G, is generally aligned between the 2 inteins. The structures deviate at sequences between Blocks B and F, where the Cne Prp8 intein encoded a linker or endonuclease domain. The 2 structures have an RMSD value of 2.22 Å. Cne, C. neoformans; Mtu, Mycobacterium tuberculosis; PDB, Protein Data Bank; Prp8, pre-mRNA processing factor 8; RMSD, root-mean-square deviation; Sce, Saccharomyces cerevisiae.

      S6 Fig

      (A) Diverse Prp8 intein splicing patterns. Several Prp8 inteins from other fungal pathogens Afu, Bde, and Hca were cloned into MIG. Splicing was observed over time by the loss of precursor (P) and increase in LE, or simply by the presence of ligated exteins (for Afu). The gel shows that not all Prp8 inteins splice similarly, despite being placed in an identical extein context. (B) Precursor amounts vary greatly. A quantitation of precursor (P) at each time point shows that these Prp8 inteins are active but splice at variable rates. The Afu Prp8 intein is almost entirely spliced at the start of the assay (0 h), whereas Bde has 31% precursor at 0 h and Hca has 14% precursor at 0 h. Initial splicing rates were determined by calculating the loss of precursor over time ((Pt0 − Pt1)/60 min) with standard error for MIG Bde Prp8 and MIG Hca Prp8, and are (5.9 ± 0.4) × 10⁻² % per min and (2.7 ± 0.9) × 10⁻² % per min, respectively. This suggests intein-mediated control of protein splicing. Data are representative of 3 biological replicates and mean standard deviations are shown. Trend lines are fit to show the decay curve. Data available in S1 Data. Afu, Aspergillus fumigatus; Bde, Batrachochytrium dendrobatidis; Hca, Histoplasma capsulatum; LE, ligated exteins; MIG, MBP-Intein-GFP; Prp8, pre-mRNA processing factor 8.
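
The initial-rate expression in this legend can be read as the drop in percent precursor between the first two time points divided by the 60-minute interval; that reading of the formula is an assumption. In the sketch below the function name is hypothetical, the starting value is the 31% quoted for Bde, and the second value is a made-up number used only to show the arithmetic.

```python
def initial_splicing_rate(p_t0: float, p_t1: float, interval_min: float = 60.0) -> float:
    """Loss of precursor per minute, (P_t0 - P_t1) / interval, in % per min."""
    if interval_min <= 0:
        raise ValueError("interval must be positive")
    return (p_t0 - p_t1) / interval_min


# p_t0 = 31.0 is the Bde precursor level at 0 h quoted above;
# p_t1 = 28.0 is a hypothetical 1 h reading, not a value from the study.
rate = initial_splicing_rate(31.0, 28.0)
print(f"{rate:.3f} % per min")
```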

      S7 Fig

      (A) Copper treatment causes inhibition. Induced MIG Prp8 A-1V cells were lysed and treated with 0 or 1 mM CuSO4. The lysates were incubated for the indicated time at 30 °C and then frozen. Samples were separated on SDS-PAGE and scanned for GFP fluorescence. In the absence of copper, MIG Prp8 A-1V spliced well over 30 h, converting P into LE. There was little to no conversion of P to LE over time with copper addition. Quantitation is shown below in a stacked plot. Data are representative of 3 biological replicates and mean standard deviations are shown. Data available in S1 Data. (B) Relative position of 2 cysteines. There are only 2 cysteines present in the Cne Prp8 intein. Using the solved structure, a measurement of the distance between C1 and C61 (shown as sticks) was calculated to be 8.9 Å. (C) Valine is the preferred residue at position 61. A sequence logo was constructed of Block B from the 50 representative Prp8 inteins (S1 Fig). This shows absolute conservation of the histidine (position 10) and a strong preference for threonine (position 7) in the TxxH motif. However, the Block B cysteine (position 6, red box) is not highly conserved across Prp8 inteins, and most encode valine at this site. Cne, C. neoformans; GFP, green fluorescent protein; LE, ligated exteins; MIG, MBP-Intein-GFP; P, precursor; Prp8, pre-mRNA processing factor 8.

      S8 Fig

      (A) Mutations to C61 in MIG Prp8 A-1V slow down splicing. The B block C61 was mutated to valine (C61V), alanine (C61A), and serine (C61S), and splicing was observed over time in MIG. Initial splicing rates were determined by calculating the loss of precursor over time ((Pt0 − Pt1)/60 min) with standard error and are as follows: WT, (1.01 ± 0.07) × 10⁻¹ % per min; C61V, (1.07 ± 0.08) × 10⁻¹ % per min; C61A, (6.22 ± 0.50) × 10⁻² % per min; and C61S, (2.92 ± 1.04) × 10⁻² % per min. The C61V mutant splices similarly to WT, whereas C61A and C61S are slower. A quantitation is shown to the right with the amount of precursor (P) at each time point. Data are representative of 3 biological replicates and mean standard deviations are shown. Trend lines are fit to show the decay curve. Data available in S1 Data. (B) MIG Prp8 A-1V B block cysteine mutants are inhibited by copper. To test whether copper inhibition was caused by C1 oxidation, C61 mutants were treated with CuSO4. After induction of MIG, the cells were lysed, and 1 mM CuSO4 was added. The lysates were incubated at 30 °C, and aliquots were collected at the indicated time. Samples were run on SDS-PAGE and scanned for GFP fluorescence. None of the C61 mutants show an increase in LE over time, with little loss of precursor (P). This indicates that at least C1 oxidation by copper is sufficient to cause the observed splicing inhibition and that disulfide bonds are not involved. Quantitation is shown below in a stacked plot. Data are representative of 3 biological replicates, and mean standard deviations are shown. Data available in S1 Data. GFP, green fluorescent protein; LE, ligated exteins; MIG, MBP-Intein-GFP; Prp8, pre-mRNA processing factor 8; WT, wild type.

      S9 Fig

      (A) Intact Cne Prp8 intein shows small mass shift. Purified Cne Prp8 intein was untreated or treated with 10× excess copper and separated and analyzed using LC-MS. The peaks were deconvoluted, and the expected mass of the Prp8 intein, 19,588 Da, is seen as the largest peak. A small, 32 Da shift (19,620 Da) was visible with both no treatment and copper treatment only (arrow). This suggests that highly reactive cysteines are modified by atmospheric oxygen alone. (B) C1 and C61 are oxidized with copper treatment. Trypsin-digested fragments of copper-treated Cne Prp8 intein were separated and sprayed using LC-MS/MS (insets). Peptides (red peaks) containing C1 or C61 were detected and further analyzed using multiple reaction MIDAS to confirm the identity and location of oxidation. The chromatogram shows elution time for both cysteines consistent with a single additional oxygen or a sulfenic acid modification. Cne, C. neoformans; LC-MS, liquid chromatography-mass spectrometry; LC-MS/MS, liquid chromatography-mass spectrometry/mass spectrometry; MIDAS, monitoring-initiated detection and sequencing; Prp8, pre-mRNA processing factor 8.

      S10 Fig

      The 7 unique insertion sites (a–g) were mapped to a solved structure of Prp8 from a S. cerevisiae C complex spliceosome (PDB 5GMK, chain A from Wan and colleagues, 2016) by locating the +1 residue. This Prp8 structure was used because the insertion sites are all resolved. The +1 residues are shown as red spheres and labeled a through g. Most Prp8 inteins localize close to the active center of Prp8. Some insertions are in the N-terminal domain, which provides structural integrity to the spliceosome. A corresponding line diagram of Prp8 exteins shows the domains of the host protein from amino acid residues 127 to 2084 with arrows indicating the site of intein insertion with the residue number and insertion site letter. The domains are as follows: N-terminal domain, gray; RT Palm/Finger, dark blue; Thumb/X, light blue; linker, green; endonuclease, yellow; and RNase H-like, orange. PDB, Protein Data Bank; Prp8, pre-mRNA processing factor 8.

      S11 Fig

      The Prp8 intein-containing Prp8 precursor model was docked into a cryo-EM tri-snRNP structure from Sce (PDB 5GAN) to look for intein-spliceosome disruptions. Prp8 is shown in lavender, the Prp8 intein is shown in red, and the rest of the tri-snRNP components are colored by chain. This reveals that the Prp8 intein would occupy a relatively crowded, centralized location of the tri-snRNP (circled). The intein clashes are shown here (with labels) and noted in Fig 7B. Cne, C. neoformans; cryo-EM, cryogenic electron microscopy; PDB, Protein Data Bank; Prp8, pre-mRNA processing factor 8; Sce, S. cerevisiae; tri-snRNP, triple small nuclear ribonucleoprotein.

      S1 Table

      A list of bacterial strains used for various cloning, overexpression, and purification studies is provided. Strains of fungi and yeast used for in vivo studies are also listed.

      S2 Table

      A list of MIG constructs and purification vectors with corresponding backbones is provided. MIG, MBP-Intein-GFP.

      S3 Table

      Primers used for the construction of various vectors or for the mutation of plasmids are provided.

      S4 Table

      Data collection, refinement statistics, and model details for (A) the unbound and (B) the Zn²⁺-bound Cne Prp8 intein crystal structures. Cne, C. neoformans; Prp8, pre-mRNA processing factor 8.

      S1 Data

      Individual numerical values that underlie any graphs (Figs 4B, 4C, 5A, 5B and S6B, S7A, S8A and S8B Figs) are provided in separate sheets. Values were calculated from biological triplicate gel images using ImageJ software. Levels of precursor (P), LE, and OPC products are given out of a total of 100. Some graphs use percent precursor as a proxy for splicing. Time points are indicated. LE, ligated exteins; MIG, MBP-Intein-GFP; OPC, off-pathway cleavage; Prp8, pre-mRNA processing factor 8.


      Suffix: -dactyl

      Adactyly (a - dactyl - y) - a condition characterized by the absence of fingers or toes at birth.

      Anisodactyly (aniso - dactyl - y) - describes a condition in which corresponding fingers or toes are unequal in length.

      Artiodactyl (artio - dactyl) - even-toed hoofed mammals which include animals such as sheep, giraffes, and pigs.

      Brachydactyly (brachy - dactyl - y) - a condition in which fingers or toes are unusually short.

      Camptodactyly (campto - dactyl - y) - describes the abnormal bending of one or more fingers or toes. Camptodactyly is usually congenital and most often occurs in the little finger.

      Clinodactyly (clino - dactyl - y) - of or relating to the curvature of a digit, whether a finger or a toe. In humans, the most common form is the smallest finger curving toward the adjacent finger.

      Didactyl (di - dactyl) - an organism that only has two fingers per hand or two toes per foot.

      Ectrodactyly (ectro - dactyl - y) - a congenital condition in which all or part of a finger (fingers) or toe (toes) is missing. Ectrodactyly is also known as a split hand or split foot deformity.

      Hexadactylism (hexa - dactyl - ism) - the condition of having six toes per foot or six fingers per hand.

      Macrodactyly (macro - dactyl - y) - possessing overly large fingers or toes. It is typically due to an overgrowth of bone tissue.

      Monodactyl (mono - dactyl) - an organism with only one digit per foot. A horse is an example of a monodactyl.

      Oligodactyly (oligo - dactyl - y) - having fewer than five fingers on the hand or five toes on the foot.

      Pentadactyl (penta - dactyl) - an organism with five fingers per hand and five toes per foot.

      Perissodactyl (perisso - dactyl) - odd-toed hoofed mammals such as horses, zebras, and rhinoceroses.

      Polydactyly (poly - dactyl - y) - the development of extra fingers or toes.

      Pterodactyl (ptero - dactyl) - an extinct flying reptile that had wings covering an elongated digit.

      Syndactyly (syn - dactyl - y) - a condition in which some or all of the fingers or toes are fused together at the skin and not bone. It is commonly referred to as webbing.

      Zygodactyly (zygo - dactyl - y) - a type of syndactyly in which all the fingers or toes are fused together.

