Research article
Novel High-Throughput DNA Part Characterization Technique for Synthetic Biology
1Synthetic Biology Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon 34141, Republic of Korea
2Biosystems and Bioengineering Program, University of Science and Technology, Daejeon 34141, Republic of Korea
J. Microbiol. Biotechnol. 2022; 32(8): 1026-1033
Published August 28, 2022 https://doi.org/10.4014/jmb.2207.07013
Copyright © The Korean Society for Microbiology and Biotechnology.
Introduction
In synthetic biology, the term DNA part refers to a DNA fragment with a specific function. Typical parts include promoters, ribosome-binding sites (RBSs), terminators, and insulators, which together regulate the expression of a coding DNA sequence. Characterized DNA parts and standardized assembly methods provide the basis for optimized genetic circuits and metabolic pathways. DNA part characterization is commonly defined as quantification of the expression rate of the part. Determining the quantitative characteristics of DNA parts in a given strain is important for designing and constructing a predictable genetic circuit. To this end, synthetic biologists have characterized biologically standardized DNA parts and stored the information in databases such as the iGEM Parts Registry [1-4]. Currently, the Registry holds data on more than 20,000 identified DNA parts, but many are yet to be characterized, and those that have been characterized are mostly limited to
In general, it is difficult to measure the attributes of hundreds of thousands of DNA parts as well as determine optimal culture conditions for the parts. The most well-known DNA part characterization method is the measurement of the relative strength of the DNA part through the expression of fluorescent proteins. This method involves manipulating the genotype (including the given DNA part) for fluorescent protein expression and subsequent cloning and characterization of the part’s strength based on its corresponding phenotype (measured via fluorescence intensity). However, this method requires expensive devices such as flow cytometers to measure single-cell level signals. In addition, the time taken for cloning and single-cell fluorescence measurements increases exponentially as the number of DNA parts increases. Moreover, if the respective DNA parts are used in a different host strain, the strength of the parts may change from what was determined originally, demonstrating the need to develop a technique to rapidly characterize multiple DNA parts in many different strains [8, 9]. In this study, we developed an imaging-based, high-throughput DNA part characterization technique that uses solid plate-based phenotyping and in-house, long-read sequencing techniques [10-12].
Materials and Methods
Strain and DNA Preparation
Golden Gate Assembly for Combinatorial Library Assembly
For Golden Gate assembly, the selected DNA parts were designed with a BsaI recognition site (GGTCTC) on either end of the sequence, together with a 1 bp spacer sequence (A) for enzyme function and a 4 bp designed overhang sequence for each part type (Table S2). Golden Gate assembly was performed using a modified pACBB vector that had an inserted BsaI site [13, 14]. Five 4 bp sticky-end overhangs were selected on the basis of DNA ligation efficiency so that up to four different DNA parts could be assembled in the destination vector (pACBB) with high accuracy [15]. For the Golden Gate assembly reaction, 56 fmol of DNA part solution was added to 112 fmol of destination vector solution, and the DNA solution was brought to a volume of 16 μl with distilled water (DW). Next, 1 μl of BsaI-HFv2 (NEB #R3733), 1 μl of T4 DNA ligase (HC), and 2 μl of T4 DNA ligase (HC) buffer (Promega, USA) were added to the DNA to create a 20 μl reaction solution. The DNA concentrations were measured using a Qubit 4 Fluorometer (Invitrogen, USA) and a 1× dsDNA HS Assay Kit (Invitrogen).
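The molar amounts above have to be converted into pipetting volumes from the measured Qubit concentrations. The following is a minimal sketch of that arithmetic, assuming an average mass of about 650 g/mol per base pair of double-stranded DNA; the fragment lengths and concentrations in the example are hypothetical and are not taken from this study.

```python
# Assumption: ~650 g/mol per base pair is a common approximation for dsDNA.
AVG_BP_MASS = 650.0  # g/mol per bp


def fmol_to_ng(fmol: float, length_bp: int) -> float:
    """Mass in ng of a dsDNA fragment for a given molar amount in fmol."""
    # fmol * 1e-15 mol/fmol * (bp * 650 g/mol) * 1e9 ng/g
    return fmol * length_bp * AVG_BP_MASS * 1e-6


def volume_ul(fmol: float, length_bp: int, conc_ng_per_ul: float) -> float:
    """Volume in microliters that contains the requested molar amount."""
    return fmol_to_ng(fmol, length_bp) / conc_ng_per_ul


# Hypothetical example: a 100 bp part mix at 2 ng/ul, a 3,000 bp vector at 50 ng/ul.
part_ul = volume_ul(56, 100, 2.0)
vector_ul = volume_ul(112, 3000, 50.0)
water_ul = 16.0 - part_ul - vector_ul  # fill the DNA solution up to 16 ul

print(f"part: {part_ul:.2f} ul, vector: {vector_ul:.2f} ul, water: {water_ul:.2f} ul")
```

The remaining 4 μl of the 20 μl reaction is taken up by the enzyme, ligase, and buffer volumes given in the text.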
High-Throughput Phenotyping with Fluorescent Colony Image
The combinatorial part characterization circuit library assembled via Golden Gate cloning was transformed into each
Assembled Plasmid Library Sequencing
The circular DNA library prepared through Golden Gate assembly was immediately processed using the Rapid Barcoding Sequencing protocol from Oxford Nanopore Technologies (ONT, UK). The DNA concentrations were determined using a Qubit fluorometer, and the DNA was concentrated to suitable levels using AMPure XP beads (Beckman Coulter, USA). The complete sequencing library was loaded onto an R9.4.1 Flongle Flow Cell (ONT) for sequencing in a MinION Mk1C device. The raw sequencing data were basecalled and pre-processed into fastq files using the Guppy basecaller (v5.0.1) from ONT and NanoFilt [17, 18]. To identify the DNA parts in each selected read, minimap2 was used to map the reads against a multi-reference set that included all DNA part sequences available for combination [19]. For the final result, primary alignments were selected using SAMtools [20]. The “pysam” package for Python (v3.7.0) was used to select reads with ≥ 95% query coverage. From these processed reads, the number of reads for each genotype was counted, and the profile of the whole sequencing library was expressed as the ratio of each genotype.
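The filtering and counting steps can be illustrated with a short, library-free sketch. It assumes that query coverage means the fraction of read bases that are aligned (not clipped), computed from the CIGAR string of each primary alignment, and that each reference name encodes one promoter-RBS combination; both are assumptions for illustration, and the reference names below are hypothetical.

```python
import re
from collections import Counter

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def query_coverage(cigar: str) -> float:
    """Fraction of the read's bases that take part in the alignment."""
    aligned = 0
    total = 0
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "MI=X":   # read bases inside the aligned region
            aligned += n
            total += n
        elif op in "SH":   # clipped read bases, outside the aligned region
            total += n
    return aligned / total if total else 0.0


def genotype_ratios(alignments, min_coverage=0.95):
    """alignments: iterable of (reference_name, cigar) for primary alignments."""
    counts = Counter(
        ref for ref, cigar in alignments if query_coverage(cigar) >= min_coverage
    )
    n = sum(counts.values())
    return {genotype: c / n for genotype, c in counts.items()} if n else {}


# Hypothetical primary alignments: (reference name, CIGAR string)
example = [
    ("P1-R1", "980M20S"),
    ("P1-R1", "1000M"),
    ("P2-R3", "500M500S"),  # only 50% of the read aligned, so it is discarded
    ("P2-R3", "10S990M"),
]
ratios = genotype_ratios(example)
print(ratios)
```

In practice the reference name and CIGAR string of each alignment would come from the BAM file produced by minimap2 and SAMtools.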
Pooled Colony Genotyping with Tagged Sequencing
The GFP module of the characterization circuit was amplified for sequencing through colony PCR using AccuPower PCR Premix (Bioneer, Korea). The colony PCR was performed using forward and reverse tag primers that bind the tagging primer-binding sites at either end of the GFP module. Each tag primer consists of a 7 bp tag (barcode) sequence and a 17–20 bp binding sequence. The tag sequences were generated with the “create.dnabarcodes” function of the “DNABarcodes” package in R Bioconductor [21]. Eight forward and twelve reverse primers were designed to confer different tag sequence combinations to as many as 96 colonies. The tag primer sequences are presented in Table S3. The tagged amplicons generated via colony PCR were collected in a single tube for sequencing. The pooled DNA samples were prepared using the Ligation Sequencing Kit (ONT). Sequencing and pre-processing were performed as described above. Selected reads were demultiplexed by their forward and reverse tags using R software (v4.0.5) with the “vmatchPattern” function of the R Bioconductor “Biostrings” package [22].
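The logic of two-sided tag demultiplexing can be sketched in a few lines. The study used R with Biostrings; the Python version below is only an illustrative stand-in. It assumes that the 7 bp tags sit at the very ends of each read, that the reverse tag appears as its reverse complement, and that one mismatch per tag is tolerated. The tag sequences are hypothetical, not those of Table S3.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_tag(segment: str, tags: dict, max_mismatch: int = 1):
    """Name of the single tag within max_mismatch of segment, else None."""
    hits = [
        name for name, seq in tags.items()
        if len(seq) == len(segment) and hamming(seq, segment) <= max_mismatch
    ]
    return hits[0] if len(hits) == 1 else None


def demultiplex(read: str, fwd_tags: dict, rev_tags: dict, tag_len: int = 7):
    """Return (forward_tag, reverse_tag) for a read, or None if unassigned."""
    fwd = best_tag(read[:tag_len], fwd_tags)
    rev = best_tag(revcomp(read[-tag_len:]), rev_tags)
    return (fwd, rev) if fwd and rev else None


# Hypothetical 7 bp tags
FWD_TAGS = {"F1": "ACGTACG", "F2": "TGCATGC"}
REV_TAGS = {"R1": "GATTACA", "R2": "CCGGAAT"}

insert = "TTTTGGGGCCCC"
read_clean = "ACGTACG" + insert + revcomp("GATTACA")
read_one_error = "ACGTACC" + insert + revcomp("GATTACA")
read_unknown = "GGGGGGG" + insert + revcomp("GATTACA")

print(demultiplex(read_clean, FWD_TAGS, REV_TAGS))
print(demultiplex(read_one_error, FWD_TAGS, REV_TAGS))
print(demultiplex(read_unknown, FWD_TAGS, REV_TAGS))
```

Real nanopore reads carry adapter sequence and indels near the ends, so a production pipeline would search a window with an alignment-based matcher rather than compare fixed positions.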
Calculating the Relative DNA Part Unit
The quantified values of the DNA parts were expressed as relative promoter unit (RPU) and relative RBS unit (RRU), respectively, in reference to [5]. For the calculation, the promoter (J23119) and RBS (B0030) were used as the standard promoter and RBS, respectively. These standard parts were cloned and the standard characterization circuit (J23119 – B0030 – sfGFP – L3S2P56) was prepared. The colony fluorescence intensity of the standard circuit was used as the baseline. The promoter or RBS of the standard circuit was replaced with the part to be measured, and the colony fluorescence intensity was measured to calculate the relative promoter unit or relative RBS unit (RPU and RRU, respectively) as follows:
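As a minimal sketch of this calculation, assuming that the relative unit is the plain ratio of the mean colony fluorescence of the tested circuit to that of the standard circuit (any background subtraction or RFP-based normalization is not specified here and is left out), the computation could look as follows. The intensity values are hypothetical.

```python
from statistics import mean


def relative_unit(part_intensities, standard_intensities):
    """Relative promoter or RBS unit as a ratio of mean colony intensities.

    Assumption: a simple ratio of means against the standard circuit
    (J23119 - B0030 - sfGFP - L3S2P56); no background correction is applied.
    """
    return mean(part_intensities) / mean(standard_intensities)


# Hypothetical colony fluorescence intensities
standard = [110.0, 120.0, 130.0]  # standard circuit colonies
test_part = [50.0, 60.0, 70.0]    # circuit with the promoter replaced

rpu = relative_unit(test_part, standard)
print(f"RPU = {rpu:.2f}")
```

The same function yields an RRU when the RBS rather than the promoter of the standard circuit is replaced.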
Results
Building a DNA Part Characterization Library
A genetic circuit for part characterization was constructed to quantify part strength by fluorescent signal measurements. The part characterization circuit consisted of GFP and RFP modules for fluorescence-based phenotyping and a tag primer-binding site for genotyping (Fig. 1A). The GFP module, which produces the quantifiable fluorescence signal, was generated from a combinatorial part library. The RFP module was expressed from a single fixed part combination to account for the effects of microbial growth rate [5]. The two modules were placed in opposite orientations to minimize transcriptional read-through. Two fluorescent proteins, sfGFP and tdTomato, were selected as they showed minimal interference and similar ranges of fluorescence intensity [23, 24]. The tag primer-binding site was used to attach a tag sequence, which allows differently tagged colonies to be pooled into one sample for high-throughput sequencing. The combinatorial part library was constructed using Golden Gate assembly and inserted into the GFP module, while the RFP module contained a single fixed combination of parts (Fig. 1B). To investigate the part library, two types of part libraries were prepared with four promoters, five RBSs, and one terminator. The names of the parts are presented as numbers in Table S1. One library used the same molar ratio for all DNA parts of each type. The other used random molar ratios for all parts of each type. We used a nanopore long-read sequencing technique to profile the distribution of the assembled DNA parts [25, 26]. The results showed that the frequency of each DNA part combination was strongly positively correlated with the molar ratio of each part introduced in the assembly (Figs. 1C and 1D). They also showed that evenly distributed combinatorial part libraries can be built when the parts are combined at identical ratios.
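The relationship between input molar ratios and library composition can be made concrete with a small calculation. The sketch below assumes that parts of different types are ligated independently, so that the expected frequency of a promoter-RBS combination is the product of the normalized input fractions; this independence is an assumption for illustration, and the part names are placeholders.

```python
from itertools import product


def normalize(ratios: dict) -> dict:
    """Scale molar ratios so that they sum to 1."""
    total = sum(ratios.values())
    return {name: value / total for name, value in ratios.items()}


def expected_combination_freqs(promoter_ratios: dict, rbs_ratios: dict) -> dict:
    """Expected frequency of each (promoter, RBS) pair under independence."""
    p = normalize(promoter_ratios)
    r = normalize(rbs_ratios)
    return {(pn, rn): p[pn] * r[rn] for pn, rn in product(p, r)}


# Equal molar ratios: four promoters and five RBSs give 20 combinations
equal = expected_combination_freqs(
    {f"P{i}": 1 for i in range(1, 5)},
    {f"R{i}": 1 for i in range(1, 6)},
)

# Unequal molar ratios skew the expected library composition
skewed = expected_combination_freqs({"P1": 1, "P2": 3}, {"R1": 1, "R2": 1})

print(len(equal), round(equal[("P1", "R1")], 3))
print(skewed[("P2", "R1")])
```

Comparing such expected frequencies with the read-count ratios from long-read sequencing is one way to check how closely an assembled library follows its input design.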
Fig. 1. Combinatorial part library profiles using long-read sequencing.
(A) Simple diagram of the combinatorial DNA library for part characterization. The part details used for each standard circuit are described below the circuit diagram. (B) Fluorescent image of cells containing the standard characterization circuit, which expresses GFP and RFP. (C) Heatmap of the input DNA part molar ratio in the two combinatorial libraries and the measured DNA part profiles from long-read sequencing. The promoters in combination are presented horizontally, and the RBSs are presented vertically. (D) Details of the DNA part profiles obtained by long-read sequencing. The barplots show the part combination ratio of the two combinatorial libraries, with the y-axis representing the ratio in the library and the x-axis representing the promoter-RBS combinations. The pie charts at the bottom show the ratio of each part (promoter, RBS) in the libraries.
High-Throughput Phenotyping on a Solid Plate
For rapid phenotyping of the combinatorial part library, we used an imaging-based computerized method that can obtain the fluorescence value of each colony on a solid plate. The combinatorial library was transformed into
Fig. 2. Image-based high-throughput phenotyping.
(A) Indexed colony image (left) and fluorescence data converted by the computational method (right). The shade of green (RFU) of each dot shows colony intensity, and the size of the dot represents colony size. (B) Comparison of fluorescence values obtained using single cell-based FACS and the microscopy-based method developed in this study. The x- and y-axes represent the values from FACS and microscopy, respectively. The red line is the regression-fitted line with a 95% confidence interval (gray). (C) Examination of colony size-driven bias. Blue, green, and red represent high-, middle-, and low-intensity colonies in the same plate, respectively. The x-axis represents colony size (number of pixels in the image), and the y-axis represents colony intensity calculated with Eq. (1). (D) Biological replicates with low-, middle-, and high-intensity parts performed on different plates on different days. Each boxplot shows the mean and deviation calculated with colonies on different plates.
To evaluate the proposed method, twenty colonies harboring the part characterization circuit were analyzed using a fluorescence microscope as well as a conventional flow cytometer for single-cell analysis. Fig. 2B shows that the two sets of measured values were strongly positively correlated. To examine cell variation and reproducibility of the image analysis, three characterized circuits with varying GFP expression were transformed into
Multiplexed Genotyping with Nanopore Long-Read Sequencing
Solid plate-based phenotyping can be conducted in a high-throughput manner, but subsequent genotyping of the colonies is a time-consuming and labor-intensive process, which inevitably slows down the workflow of part characterization. To increase the throughput of genotyping multiple colonies, designed barcode tags were attached to each combinatorial DNA sequence in the GFP modules by colony PCR. The barcode tags enable sequencing of all the DNA sequences at once by pooling the colony PCR products. After phenotyping through imaging, the GFP module sequence was amplified through tagged colony PCR with 8 forward and 12 reverse primers to generate up to 96 barcode sequence combinations. This multiplexing number can increase up to 2304 (ONT barcodes × forward tags × reverse tags), as the additional 24 barcodes provided by ONT could be used. In this study, we picked around 200 colonies manually, and their amplified DNAs were pooled in a single tube after tag attachment with 5 forward and 8 reverse tags. The complete sequencing library was loaded onto an R9.4.1 Flongle Flow Cell for sequencing in a MinION Mk1C device. Fig. 3A shows the numbers of reads for the barcode-tagged samples after demultiplexing the pooled sequencing results. Despite some bias, each sample had more than 200 reads, which is sufficient for further analysis. To evaluate the demultiplexing of the reads, the quality scores score1 and score2 were calculated using the “qcat” program, an official demultiplexing program of ONT (Eq. 2). Figs. 3B and 3C show the tagged data satisfying the empirical QC criteria (total read count > 15, score1 > 0.4, score2 > 0.65).
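The multiplexing arithmetic and the empirical QC thresholds quoted above can be written down directly. The sketch below only restates those numbers; the per-sample records are hypothetical, and the definitions of score1 and score2 (Eq. 2) are not reproduced, so the scores are treated as given inputs.

```python
ONT_BARCODES = 24  # additional barcodes provided by ONT
FWD_TAGS = 8       # forward tag primers
REV_TAGS = 12      # reverse tag primers

tags_per_pool = FWD_TAGS * REV_TAGS          # colonies per pooled sample
max_multiplex = ONT_BARCODES * tags_per_pool  # with ONT barcodes on top


def passes_qc(read_count, score1, score2,
              min_reads=15, min_score1=0.4, min_score2=0.65):
    """Empirical QC criteria from the text, applied to one tagged sample."""
    return read_count > min_reads and score1 > min_score1 and score2 > min_score2


# Hypothetical demultiplexed samples: (tag pair, read count, score1, score2)
samples = [
    ("A-1", 250, 0.62, 0.81),
    ("A-2", 12, 0.70, 0.90),   # too few reads
    ("B-1", 300, 0.35, 0.80),  # score1 too low
    ("B-2", 410, 0.55, 0.66),
]
kept = [name for name, n, s1, s2 in samples if passes_qc(n, s1, s2)]

print(tags_per_pool, max_multiplex)
print(kept)
```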
Fig. 3. Evaluation of multiplexed genotyping.
(A) Read count distribution for each barcoded sample. The numbers at the top of the graph and the letters represent reverse and forward tags, respectively. The y-axis represents the number of reads demultiplexed with the specific tag combination. (B) Histogram of tag score1 values. The y-axis represents the number of tags. (C) Histogram of tag score2 values. Most of the tagged data satisfy the empirical QC criteria.
Promoter and RBS Characterization in Three E. coli Strains
In this study, we characterized a total of 44
Fig. 4. Part strengths of the promoter and RBS libraries.
(A) RFU of the promoter library in three different E. coli strains. (B) Comparisons between the biological replicates of the promoter library. The biological replicates of 2 promoters could not be obtained, so they were removed. (C) RFU of the RBS library in three different E. coli strains. All results were obtained from 240 colonies according to strain and part type (promoter or RBS). Colonies were obtained from three different 9 cm plates generated at the same time.
Discussion
This study introduces a high-throughput DNA part characterization method based on imaging and long-read sequencing techniques. Our method is advantageous as it allows the characterization of multiple DNA parts rapidly with commonly used imaging tools such as a low-cost fluorescent microscope instead of high-cost devices, such as a microplate reader or flow cytometer used in conventional methods. ONT sequencing devices are also increasingly available in small-scale laboratories at much lower cost than conventional NGS devices. We were able to characterize 21 promoters and 23 RBSs in approximately 72 h in three
It is necessary to compute the absolute unit and the relative unit of a DNA part in the context of a large set of possible DNA part combinations since the strength of a DNA part can change significantly as its combinatorial DNA partners are changed. The combination of such parts in set conditions may increase exponentially as the parts increase in number, while the use of a few well-defined parts could increase the functional instability due to the increase in the proportion of repeated sequences within the system [27]. Our approach using combinatorial library assembly, high-throughput phenotyping, and pooled genotyping is able to speed up part characterization and promote efficiency in synthetic biology research, especially in the field of genetic circuit design and prediction [28-31].
However, a more accurate strategy is required for measuring fluorescence in different strains due to their differences in colony formation, fluorescent protein expression, maturation times, and growth rate (Fig. 5). In addition, pixel saturation (where the measurement capabilities of the imaging device were exceeded due to too strong or too weak fluorescence values) was an issue [32]. This issue occurs when various colonies with large deviations in fluorescence coexist on a single plate, which is common for libraries with varying genotypes. The light intensity that allows photographing in imaging analysis is influenced by the exposure time, which also influences the range of measurable fluorescence. If the exposure time is too short, the fluorescence of weak intensity cannot be measured, and if the exposure time is too long, the actual fluorescence exceeds the level of measurable fluorescence, causing pixel saturation and resulting distortion. In this study, an exposure time of 0.3 s was selected as our empirical criterion. The novel techniques developed in this study are anticipated to contribute to expanding the capabilities of DNA part characterization beyond
Fig. 5. Maturation time of fluorescent proteins.
Time-course fluorescence values of the GFP and RFP used in this study were measured on plates. The y-axis and x-axis represent the fluorescence value of the colonies and time (h), respectively; extra black dots represent outliers.
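The pixel saturation issue raised in the Discussion can be screened for before colony intensities are interpreted. The following is a minimal sketch, assuming an 8-bit image in which saturated pixels sit at the maximum value of 255 and an arbitrary tolerance of 1% saturated pixels per colony; the bit depth, the tolerance, and the pixel values are illustrative assumptions, not settings from this study.

```python
def saturation_fraction(pixels, bit_depth=8):
    """Fraction of a colony's pixels that are at the sensor maximum."""
    max_value = 2 ** bit_depth - 1
    pixels = list(pixels)
    if not pixels:
        return 0.0
    return sum(1 for p in pixels if p >= max_value) / len(pixels)


def is_reliable(pixels, bit_depth=8, max_saturated=0.01):
    """True if the colony has at most max_saturated saturated pixels."""
    return saturation_fraction(pixels, bit_depth) <= max_saturated


# Hypothetical pixel intensities for two colonies on the same plate
dim_colony = [10, 12, 15, 11]
bright_colony = [10, 200, 255, 255]

print(saturation_fraction(dim_colony), is_reliable(dim_colony))
print(saturation_fraction(bright_colony), is_reliable(bright_colony))
```

Colonies flagged in this way could be re-imaged at a shorter exposure time or excluded from the strength calculation.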
Supplemental Materials
Acknowledgments
This research was supported by a grant from the Next-Generation BioGreen 21 Program (Grant No. PJ015808022022), Rural Development Administration, Republic of Korea, the Bio & Medical Technology Development Program (Grant No. 2021M3A9I4022731) of the National Research Foundation funded by the Ministry of Science and ICT of the Republic of Korea and the Korea Research Institute of Bioscience and Biotechnology (KRIBB) Research Initiative Program (KGM5402221).
Conflict of Interest
The authors have no financial conflicts of interest to declare.
References
1. Voigt CA. 2006. Genetic parts to program bacteria. Curr. Opin. Biotechnol. 17: 548-557.
2. Endy D. 2005. Foundations for engineering biology. Nature 438: 449-453.
3. Canton B, Labno A, Endy D. 2008. Refinement and standardization of synthetic biological parts and devices. Nat. Biotechnol. 26: 787-793.
4. Chen YJ, Liu P, Nielsen AAK, Brophy JAN, Clancy K, Peterson T, et al. 2013. Characterization of 582 natural and synthetic terminators and quantification of their design constraints. Nat. Methods 10: 659-664.
5. Kelly JR, Rubin AJ, Davis JH, Ajo-Franklin CM, Cumbers J, Czar MJ, et al. 2009. Measuring the activity of BioBrick promoters using an in vivo reference standard. J. Biol. Eng. 3: 4.
6. Yeom SJ, Kim M, Kwon KK, Fu Y, Rha E, Park SH, et al. 2018. A synthetic microbial biosensor for high-throughput screening of lactam biocatalysts. Nat. Commun. 9: 5053.
7. Shao B, Rammohan J, Anderson DA, Alperovich N, Ross D, Voigt CA. 2021. Single-cell measurement of plasmid copy number and promoter activity. Nat. Commun. 12: 1475.
8. Huang HH, Camsund D, Lindblad P, Heidorn T. 2010. Design and characterization of molecular tools for a synthetic biology approach towards developing cyanobacterial biotechnology. Nucleic Acids Res. 38: 2577-2593.
9. Schaumberg KA, Antunes MS, Kassaw TK, Xu W, Zalewski CS, Medford JI, et al. 2015. Quantitative characterization of genetic parts and circuits for plant synthetic biology. Nat. Methods 13: 94-100.
10. Nuñez I, Matute T, Herrera R, Keymer J, Marzullo T, Rudge T, et al. 2017. Low cost and open source multi-fluorescence imaging system for teaching and research in biology and bioengineering. PLoS One 12: e0187163.
11. French S, Coutts BE, Brown ED. 2018. Open-source high-throughput phenomics of bacterial promoter-reporter strains. Cell Syst. 7: 339-346.e3.
12. Esling P, Lejzerowicz F, Pawlowski J. 2015. Accurate multiplexing and filtering for high-throughput amplicon-sequencing. Nucleic Acids Res. 43: 2513-2524.
13. Engler C, Kandzia R, Marillonnet S. 2008. A one pot, one step, precision cloning method with high throughput capability. PLoS One 3: e3647.
14. Vick JE, Johnson ET, Choudhary S, Bloch SE, Lopez-Gallego F, Srivastava P, et al. 2011. Optimized compatible set of BioBrick™ vectors for metabolic pathway engineering. Appl. Microbiol. Biotechnol. 92: 1275-1286.
15. Potapov V, Ong JL, Kucera RB, Langhorst BW, Bilotti K, Pryor JM, et al. 2018. Comprehensive profiling of four base overhang ligation fidelity by T4 DNA ligase and application to DNA assembly. ACS Synth. Biol. 7: 2665-2674.
16. Geissmann Q. 2013. OpenCFU, a new free and open-source software to count cell colonies and other circular objects. PLoS One 8: e54072.
17. De Coster W, D'Hert S, Schultz DT, Cruts M, Van Broeckhoven C. 2018. NanoPack: visualizing and processing long-read sequencing data. Bioinformatics 34: 2666-2669.
18. Wick RR, Judd LM, Holt KE. 2019. Performance of neural network basecalling tools for Oxford Nanopore sequencing. Genome Biol. 20: 129.
19. Li H. 2018. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34: 3094-3100.
20. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25: 2078-2079.
21. Buschmann T, Bystrykh LV. 2013. Levenshtein error-correcting barcodes for multiplexed DNA sequencing. BMC Bioinformatics 14: 272.
22. Pagès H, Aboyoun P, Gentleman R, DebRoy S. 2020. Biostrings: Efficient manipulation of biological strings. doi: 10.18129/B9.bioc.Biostrings.
23. Day RN, Davidson MW. 2009. The fluorescent protein palette: tools for cellular imaging. Chem. Soc. Rev. 38: 2887-2921.
24. Henkin TM. 2000. Transcription termination control in bacteria. Curr. Opin. Microbiol. 3: 149-153.
25. Feng Y, Zhang Y, Ying C, Wang D, Du C. 2015. Nanopore-based fourth-generation DNA sequencing technology. Genomics Proteomics Bioinformatics 13: 4-16.
26. Fuller CW, Kumar S, Porel M, Chien M, Bibillo A, Stranges PB, et al. 2016. Real-time single-molecule electronic DNA sequencing by synthesis using polymer-tagged nucleotides on a nanopore array. Proc. Natl. Acad. Sci. USA 113: 5233-5238.
27. Hossain A, Lopez E, Halper SM, Cetnar DP, Reis AC, Strickland D, et al. 2020. Automated design of thousands of nonrepetitive parts for engineering stable genetic systems. Nat. Biotechnol. 38: 1466-1475.
28. Huynh L, Tagkopoulos I. 2014. Optimal part and module selection for synthetic gene circuit design automation. ACS Synth. Biol. 3: 556-564.
29. Nielsen AAK, Segall-Shapiro TH, Voigt CA. 2013. Advances in genetic circuit design: novel biochemistries, deep part mining, and precision gene expression. Curr. Opin. Chem. Biol. 17: 878-892.
30. Nielsen AAK, Der BS, Shin J, Vaidyanathan P, Paralanov V, Strychalski EA, et al. 2016. Genetic circuit design automation. Science 352: aac7341.
31. Appleton E, Madsen C, Roehner N, Densmore D. 2017. Design automation in synthetic biology. Cold Spring Harbor Perspect. Biol. 9: a023978.
32. Zhang X, Brainard DH. 2004. Estimation of saturated pixel values in digital color imaging. J. Opt. Soc. Am. A Opt. Image Sci. Vis. 21: 2301-2310.
Seong-Kun Bak1,2, Wonjae Seong1, Eugene Rha1, Hyewon Lee1, Seong Keun Kim1, Kil Koang Kwon1, Haseong Kim1,2*, and Seung-Goo Lee1,2*
Correspondence to: Haseong Kim, haseong@kribb.re.kr
Seung-Goo Lee, sglee@kribb.re.kr
Abstract
This study presents a novel DNA part characterization technique that increases throughput by combinatorial DNA part assembly, solid plate-based quantitative fluorescence assay for phenotyping, and barcode tagging-based long-read sequencing for genotyping. We confirmed that the fluorescence intensities of colonies on plates were comparable to fluorescence at the single-cell level from a high-end, flow-cytometry device and developed a high-throughput image analysis pipeline. The barcode tagging-based long-read sequencing technique enabled rapid identification of all DNA parts and their combinations with a single sequencing experiment. Using our techniques, forty-four DNA parts (21 promoters and 23 RBSs) were successfully characterized in 72 h without any automated equipment. We anticipate that this high-throughput and easy-to-use part characterization technique will contribute to increasing part diversity and be useful for building genetic circuits and metabolic pathways in synthetic biology.
Keywords: Synthetic biology, DNA parts, circuit design, long-read sequencing, image analysis
Introduction
In synthetic biology, the term DNA part refers to a DNA fragment with a specific function. Each fragment is composed of a promoter, ribosome-binding site (RBS), terminator, and insulator that together regulate the expression of the coding DNA sequence. Such characterized DNA parts and the standardized assembly methods provide the basis of optimized genetic circuits or metabolic pathways. DNA part characterization is commonly defined as quantification of expression rate of the part. Determining the quantitative characteristics of DNA from a given strain is important for designing and constructing a predictable genetic circuit. To this end, synthetic biologists have characterized the biologically standardized DNA parts and stored the information in databases such as the iGEM Parts Registry [1-4]. Currently, the Registry holds data on more than 20,000 identified DNA parts, but many are yet to be characterized, and those that have been characterized are mostly limited to
In general, it is difficult to measure the attributes of hundreds of thousands of DNA parts as well as determine optimal culture conditions for the parts. The most well-known DNA part characterization method is the measurement of the relative strength of the DNA part through the expression of fluorescent proteins. This method involves manipulating the genotype (including the given DNA part) for fluorescent protein expression and subsequent cloning and characterization of the part’s strength based on its corresponding phenotype (measured via fluorescence intensity). However, this method requires expensive devices such as flow cytometers to measure single-cell level signals. In addition, the time taken for cloning and single-cell fluorescence measurements increases exponentially as the number of DNA parts increases. Moreover, if the respective DNA parts are used in a different host strain, the strength of the parts may change from what was determined originally, demonstrating the need to develop a technique to rapidly characterize multiple DNA parts in many different strains [8, 9]. In this study, we developed an imaging-based, high-throughput DNA part characterization technique that uses solid plate-based phenotyping and in-house, long-read sequencing techniques [10-12].
Materials and Methods
Strain and DNA Preparation
Golden Gate Assembly for Combinatorial Library Assembly
For Golden Gate assembly, the selected DNA parts were designed with the addition of a BsaI-specific site (GGTCTC) on either end of the sequence as well as a 1 bp spacer sequence (A) for enzyme function and a 4 bp designed overhang sequence for each part type (Table S2). Golden Gate assembly was performed using a modified pACBB vector that had an inserted BsaI site [13, 14]. Five stretches of 4 bp sticky end overhangs were selected considering the DNA ligation efficiency to allow up to four different DNA parts to assemble in the destination vector (pACBB) with high accuracy [15]. For the Golden Gate assembly reaction, 56 fmol of DNA part solution was added to 112 fmol of destination vector solution, and the volume of the DNA solution amounted to 16 μl using DW. Next, 1 μl of BsaI_HFv2 (NEB #R3733), 1 μl of T4 DNA ligase (HC), and 2 μl of T4 DNA ligase (HC) buffer (Promega, USA) were added to the DNA to create a 20 μl reaction solution. The DNA concentrations were measured using a Qubit Fluorometer 4 (Invitrogen, USA) and a 1× dsDNA HS Assay Kit (Invitrogen).
High-Throughput Phenotyping with Fluorescent Colony Image
The combinatorial part characterization circuit library assembled via Golden Gate cloning was transformed into each
Assembled Plasmid Library Sequencing
The circular DNA library prepared through Golden Gate assembly was immediately treated using the Rapid Barcoding Sequencing protocol from Oxford Nanopore Technologies (ONT, UK). The DNA concentrations were determined using a Qubit fluorometer and then increased to suitable levels using an AMpure XP Bead (Beckman Coulter, USA). The complete sequencing library was loaded onto an R9.4.1 Flongle Flow Cell (ONT) for sequencing in a MiniON MK1C device. The raw sequencing data were converted and pre-processed into fastq files with NanoFilt, the guppy basecaller (v5.0.1) from ONT [17, 18]. To verify the DNA parts of each selected read, minimap2 was used for multi-reference mapping of the reference sequence, which included all DNA part sequences available for combination [19]. For the final result, primary mapping data were selected using SAMtools [20]. The “pysam” package of Python software (v3.7.0) was used to determine the reads at ≥ 95% query coverage. With upper processed reads, the count number of reads by each genotype and profile for the whole sequencing library was based on the ratio for each genotype.
Pooled Colony Genotyping with Tagged Sequencing
The GFP module of the characterization circuit was amplified for sequencing through colony PCR using AccuPower PCR Premix (Bioneer, Korea). The colony PCR was performed using the forward and reverse tag primers that bound the tagging primer-binding site at either end of the GFP module. The tag primer consists of a 7 bp sequence as a tagged barcode and a 17–20 bp binding sequence. To produce the tag sequence, the “create.dnabarcodes” of the “DNABarcodes” package in R Bioconductor was used [21]. Eight forward and twelve reverse primers were designed to confer different tag sequence combinations to as many as 96 colonies. The tag primer sequences are presented in Table S3. The tag-attached amplified DNA generated via colony PCR was collected in a single tube for sequencing. The pooled amplified DNA samples were prepared using the Ligation Sequencing Kit (ONT). Running sequencing and pre-processing methods were the same as described above. Selected reads were demultiplexed by their own forward and reverse tag using R software (v4.0.5) with “VmatchPattern” of the R Bioconductor “Biostrings” package [22].
Calculating the Relative DNA Part Unit
The quantified values of the DNA parts were expressed as relative promoter unit (RPU) and relative RBS unit (RRU), respectively, in reference to [5]. For the calculation, the promoter (J23119) and RBS (B0030) were used as the standard promoter and RBS, respectively. These standard parts were cloned and the standard characterization circuit (J23119 – B0030 – sfGFP – L3S2P56) was prepared. The colony fluorescence intensity of the standard circuit was used as the baseline. The promoter or RBS of the standard circuit was replaced with the part to be measured, and the colony fluorescence intensity was measured to calculate the relative promoter unit or relative RBS unit (RPU and RRU, respectively) as follows:
Results
Building a DNA Part Characterization Library
A genetic circuit for part characterization was constructed to quantify part strength by fluorescent signal measurements. The part characterization circuit consisted of GFP and RFP modules for fluorescence-based phenotyping and a tag primer-binding site for genotyping (Fig. 1A). The GFP module responsible for the quantifiable fluorescence signals was generated by a combinatorial part library. The RFP module was expressed as a single fixed combination to account for the effects of microbial growth rate [5]. The two modules were designed in opposite directions to minimize the transcriptional read-through effect. Two fluorescent proteins, sfGFP and tdTomato, were selected as they showed minimum interference and similar ranges of fluorescence intensity [23, 24]. The tag primer-binding site was used to attach a tag sequence which allows pooling of differently tagged colonies into one sample for high-throughput sequencing. The combinatorial part library was constructed using Golden Gate assembly and inserted into the GFP module while the RFP module contained a single fixed combination of the part (Fig. 1B). To investigate the part library, two types of part libraries were prepared with four promoters, five RBSs, and one terminator. The names of the parts are presented as numbers in Table S1. One library used the same molar ratio for all DNA parts of each type. The other used random molar ratios for all parts of each type. We used a nanopore long-read sequencing technique and profiled the distribution of the assembled DNA parts [25, 26]. The results showed that the frequency of DNA part combination was strongly positively correlated with the molar ratio of each part introduced in the assembly (Figs. 1C and 1D). Also shown is that evenly distributed combinatorial part libraries can be built when the parts were combined at an identical ratio.
Figure 1. Combinatorial part library profiles using long-read sequencing.
(A) Simple diagram of the combinatorial DNA library for part characterization. The part details used for each standard circuit are described below the circuit diagram. (B) Fluorescent image of cells containing the standard characterization circuit, which expresses GFP and RFP. (C) Heatmap of the input DNA part molar ratios in the two combinatorial libraries and the measured DNA part profiles from long-read sequencing. The promoters in each combination are presented horizontally, and the RBSs are presented vertically. (D) Details of the DNA part profiles obtained by long-read sequencing. The bar plots show the part combination ratios of the two combinatorial libraries, with the y-axis representing the ratio in the library and the x-axis representing the promoter-RBS combinations. The pie charts at the bottom show the ratio of each part (promoter, RBS) in the libraries.
High-Throughput Phenotyping on a Solid Plate
For rapid phenotyping of the combinatorial part library, we used an imaging-based computerized method that can obtain the fluorescence value of each colony on a solid plate. The combinatorial library was transformed into
Figure 2. Image-based high-throughput phenotyping.
(A) Indexed colony image (left) and fluorescence data converted by the computational method (right). The shade of green of each dot shows colony intensity (RFU), and the size of the dot represents colony size. (B) Comparison of fluorescence values obtained using single cell-based FACS and the microscopy-based method developed in this study. The x- and y-axes represent the values from FACS and microscopy, respectively. The red line is the regression-fitted line with a 95% confidence interval (gray). (C) Examination of colony size-driven bias. Blue, green, and red represent high-, middle-, and low-intensity colonies in the same plate, respectively. The x-axis represents colony size (number of pixels in the image), and the y-axis represents colony intensity calculated with Eq. (1). (D) Biological replicates with low-, middle-, and high-intensity parts performed on different plates on different days. Each boxplot shows the mean and deviation calculated with colonies on different plates.
To evaluate the proposed method, twenty colonies harboring the part characterization circuit were analyzed using a fluorescent microscope as well as a conventional flow cytometry single cell analytic device. Fig. 2B shows that the two groups of measured values were strongly positively correlated. To examine cell variation and reproducibility of the image analysis, three characterized circuits with varying GFP expression were transformed into
Multiplexed Genotyping with Nanopore Long-Read Sequencing
Solid plate-based phenotyping can be conducted in a high-throughput manner, but subsequent genotyping of the colonies is a time-consuming and labor-intensive process, which inevitably slows down the workflow of part characterization. To increase the throughput of genotyping of multiple colonies, designed barcode tags were attached to each combinatorial DNA sequence in the GFP modules by colony PCR. The barcode tags enable sequencing of all the DNA sequences at once by pooling the colony PCR products. After phenotyping through imaging, the GFP module sequence was amplified through tagged colony PCR with 8 forward and 12 reverse primers to generate up to 96 barcode sequence combinations. This multiplexing number can increase up to 2304 (ONT barcodes × forward tags × reverse tags), as the additional 24 barcodes provided by ONT could be used. In this study, we picked around 200 colonies manually, and their amplified DNAs were pooled in a single tube after tag attachment with 5 forward and 8 reverse tags. The complete sequencing library was loaded onto an R9.4.1 Flongle Flow Cell for sequencing in a MinION Mk1C device. Fig. 3A shows the numbers of reads for the barcode-tagged samples after demultiplexing the pooled sequencing results. Despite some bias, each sample had more than 200 reads, which is enough for further analysis. To evaluate the demultiplexing of the reads, the quality scores score1 and score2 were calculated using the “qcat” program, an official demultiplexing program of ONT (Eq. 2). Figs. 3B and 3C show the tagged data satisfying the empirical QC criteria (total read count > 15, score1 > 0.4, score2 > 0.65).
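The multiplexing arithmetic and the empirical QC filter described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: score1 and score2 are treated as precomputed inputs because their definition (Eq. 2) is not reproduced in this text, and the function names and sample records are hypothetical.

```python
from itertools import product


def tag_combinations(n_forward, n_reverse, n_ont_barcodes=1):
    """Enumerate (ONT barcode, forward tag, reverse tag) index triples."""
    return list(product(range(n_ont_barcodes), range(n_forward), range(n_reverse)))


def passes_qc(read_count, score1, score2,
              min_reads=15, min_score1=0.4, min_score2=0.65):
    """Apply the empirical thresholds quoted in the text (all strict inequalities)."""
    return read_count > min_reads and score1 > min_score1 and score2 > min_score2


# Hypothetical demultiplexed samples: tag pair, read count, and the two scores.
samples = [
    {"tag": ("A", 1), "reads": 230, "score1": 0.82, "score2": 0.91},
    {"tag": ("A", 2), "reads": 12, "score1": 0.75, "score2": 0.88},   # too few reads
    {"tag": ("B", 1), "reads": 310, "score1": 0.35, "score2": 0.70},  # score1 too low
]
kept = [s["tag"] for s in samples if passes_qc(s["reads"], s["score1"], s["score2"])]
```

With 8 forward and 12 reverse tags, `tag_combinations(8, 12)` yields 96 combinations, and adding 24 ONT barcodes yields 2304, matching the numbers given in the text.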
Figure 3. Evaluation of multiplexed genotyping.
(A) Read count distribution for each barcoded sample. The number at the top of each graph represents the reverse tag, and the letter represents the forward tag. The y-axis represents the number of reads demultiplexed with the specific tag combination. (B) Histogram of tag score1 values. The y-axis represents the number of tags. (C) Histogram of tag score2 values. It shows that most of the tagged data satisfy the empirical QC criteria.
Promoter and RBS Characterization in Three E. coli Strains
In this study, we characterized a total of 44
Figure 4. Part strengths of the promoter and RBS libraries.
(A) RFU of the promoter library on three different E. coli strains. (B) Comparisons between the biological replicates of the promoter library. The biological replicates of 2 promoters could not be obtained, so they were removed. (C) RFU of the RBS library on three different E. coli strains. All results were obtained from 240 colonies according to strain and part type (promoter or RBS). Colonies were obtained from three different 9 cm plates generated at the same time.
Discussion
This study introduces a high-throughput DNA part characterization method based on imaging and long-read sequencing techniques. Our method is advantageous as it allows the characterization of multiple DNA parts rapidly with commonly used imaging tools such as a low-cost fluorescent microscope instead of high-cost devices, such as a microplate reader or flow cytometer used in conventional methods. ONT sequencing devices are also increasingly available in small-scale laboratories at much lower cost than conventional NGS devices. We were able to characterize 21 promoters and 23 RBSs in approximately 72 h in three
It is necessary to compute the absolute unit and the relative unit of a DNA part in the context of a large set of possible DNA part combinations since the strength of a DNA part can change significantly as its combinatorial DNA partners are changed. The combination of such parts in set conditions may increase exponentially as the parts increase in number, while the use of a few well-defined parts could increase the functional instability due to the increase in the proportion of repeated sequences within the system [27]. Our approach using combinatorial library assembly, high-throughput phenotyping, and pooled genotyping is able to speed up part characterization and promote efficiency in synthetic biology research, especially in the field of genetic circuit design and prediction [28-31].
However, a more accurate strategy is required for measuring fluorescence in different strains due to their differences in colony formation, fluorescent protein expression, maturation times, and growth rate (Fig. 5). In addition, pixel saturation (where the measurement capabilities of the imaging device were exceeded due to too strong or too weak fluorescence values) was an issue [32]. This issue occurs when various colonies with large deviations in fluorescence coexist on a single plate, which is common for libraries with varying genotypes. The light intensity that allows photographing in imaging analysis is influenced by the exposure time, which also influences the range of measurable fluorescence. If the exposure time is too short, the fluorescence of weak intensity cannot be measured, and if the exposure time is too long, the actual fluorescence exceeds the level of measurable fluorescence, causing pixel saturation and resulting distortion. In this study, an exposure time of 0.3 s was selected as our empirical criterion. The novel techniques developed in this study are anticipated to contribute to expanding the capabilities of DNA part characterization beyond
Figure 5. Maturation time of fluorescent proteins.
Time-course fluorescence values of GFP and RFP used in this study were measured on plates. The y-axis and x-axis represent the fluorescence value of the colonies and time (h), respectively; extra black dots represent outliers.
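The pixel saturation issue raised in the Discussion can be screened for with a simple check on the pixels of each colony. The sketch below is illustrative only: it assumes 8-bit pixel values (0 to 255) and a hypothetical tolerance of 1% saturated pixels per colony, neither of which is stated in the text.

```python
def saturation_fraction(pixels, max_value=255, min_value=0):
    """Return (fraction of pixels at or above max_value, fraction at or below min_value)."""
    n = len(pixels)
    if n == 0:
        raise ValueError("colony has no pixels")
    high = sum(1 for p in pixels if p >= max_value)
    low = sum(1 for p in pixels if p <= min_value)
    return high / n, low / n


def is_reliable(pixels, max_value=255, min_value=0, tolerance=0.01):
    """Flag a colony as measurable if few pixels sit at either end of the range.

    The tolerance is a hypothetical cut-off chosen for illustration.
    """
    high, low = saturation_fraction(pixels, max_value, min_value)
    return high <= tolerance and low <= tolerance
```

Colonies that fail such a check at one exposure time could be re-imaged at a shorter or longer exposure, which is the trade-off the Discussion describes.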
Supplemental Materials
Acknowledgments
This research was supported by a grant from the Next-Generation BioGreen 21 Program (Grant No. PJ015808022022), Rural Development Administration, Republic of Korea, the Bio & Medical Technology Development Program (Grant No. 2021M3A9I4022731) of the National Research Foundation funded by the Ministry of Science and ICT of the Republic of Korea and the Korea Research Institute of Bioscience and Biotechnology (KRIBB) Research Initiative Program (KGM5402221).
Conflict of Interest
The authors have no financial conflicts of interest to declare.
References
- Voigt CA. 2006. Genetic parts to program bacteria. Curr. Opin. Biotechnol. 17: 548-557.
- Endy D. 2005. Foundations for engineering biology. Nature 438: 449-453.
- Canton B, Labno A, Endy D. 2008. Refinement and standardization of synthetic biological parts and devices. Nat. Biotechnol. 26: 787-793.
- Chen YJ, Liu P, Nielsen AAK, Brophy JAN, Clancy K, Peterson T, et al. 2013. Characterization of 582 natural and synthetic terminators and quantification of their design constraints. Nat. Methods 10: 659-664.
- Kelly JR, Rubin AJ, Davis JH, Ajo-Franklin CM, Cumbers J, Czar MJ, et al. 2009. Measuring the activity of BioBrick promoters using an in vivo reference standard. J. Biol. Eng. 3: 4.
- Yeom SJ, Kim M, Kwon KK, Fu Y, Rha E, Park SH, et al. 2018. A synthetic microbial biosensor for high-throughput screening of lactam biocatalysts. Nat. Commun. 9: 5053.
- Shao B, Rammohan J, Anderson DA, Alperovich N, Ross D, Voigt CA. 2021. Single-cell measurement of plasmid copy number and promoter activity. Nat. Commun. 12: 1475.
- Huang HH, Camsund D, Lindblad P, Heidorn T. 2010. Design and characterization of molecular tools for a synthetic biology approach towards developing cyanobacterial biotechnology. Nucleic Acids Res. 38: 2577-2593.
- Schaumberg KA, Antunes MS, Kassaw TK, Xu W, Zalewski CS, Medford JI, et al. 2015. Quantitative characterization of genetic parts and circuits for plant synthetic biology. Nat. Methods 13: 94-100.
- Nuñez I, Matute T, Herrera R, Keymer J, Marzullo T, Rudge T, et al. 2017. Low cost and open source multi-fluorescence imaging system for teaching and research in biology and bioengineering. PLoS One 12: e0187163.
- French S, Coutts BE, Brown ED. 2018. Open-source high-throughput phenomics of bacterial promoter-reporter strains. Cell Syst. 7: 339-346.e3.
- Esling P, Lejzerowicz F, Pawlowski J. 2015. Accurate multiplexing and filtering for high-throughput amplicon-sequencing. Nucleic Acids Res. 43: 2513-2524.
- Engler C, Kandzia R, Marillonnet S. 2008. A one pot, one step, precision cloning method with high throughput capability. PLoS One 3: e3647.
- Vick JE, Johnson ET, Choudhary S, Bloch SE, Lopez-Gallego F, Srivastava P, et al. 2011. Optimized compatible set of BioBrick™ vectors for metabolic pathway engineering. Appl. Microbiol. Biotechnol. 92: 1275-1286.
- Potapov V, Ong JL, Kucera RB, Langhorst BW, Bilotti K, Pryor JM, et al. 2018. Comprehensive profiling of four base overhang ligation fidelity by T4 DNA ligase and application to DNA assembly. ACS Synth. Biol. 7: 2665-2674.
- Geissmann Q. 2013. OpenCFU, a new free and open-source software to count cell colonies and other circular objects. PLoS One 8: e54072.
- De Coster W, D'Hert S, Schultz DT, Cruts M, Van Broeckhoven C. 2018. NanoPack: visualizing and processing long-read sequencing data. Bioinformatics 34: 2666-2669.
- Wick RR, Judd LM, Holt KE. 2019. Performance of neural network basecalling tools for Oxford Nanopore sequencing. Genome Biol. 20: 129.
- Li H. 2018. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34: 3094-3100.
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. 2009. The Sequence alignment/Map format and SAMtools. Bioinformatics 25: 2078-2079.
- Buschmann T, Bystrykh LV. 2013. Levenshtein error-correcting barcodes for multiplexed DNA sequencing. BMC Bioinformatics 14: 272.
- Pagès H, Aboyoun P, Gentleman R, DebRoy S. 2020. Efficient manipulation of biological strings. Biostrings. doi: 10.18129/B9.bioc.Biostrings.
- Day RN, Davidson MW. 2009. The fluorescent protein palette: tools for cellular imaging. Chem. Soc. Rev. 38: 2887-2921.
- Henkin TM. 2000. Transcription termination control in bacteria. Curr. Opin. Microbiol. 3: 149-153.
- Feng Y, Zhang Y, Ying C, Wang D, Du C. 2015. Nanopore-based fourth-generation DNA sequencing technology. Genomics Proteomics Bioinformatics 13: 4-16.
- Fuller CW, Kumar S, Porel M, Chien M, Bibillo A, Stranges PB, et al. 2016. Real-time single-molecule electronic DNA sequencing by synthesis using polymer-tagged nucleotides on a nanopore array. Proc. Natl. Acad. Sci. USA 113: 5233-5238.
- Hossain A, Lopez E, Halper SM, Cetnar DP, Reis AC, Strickland D, et al. 2020. Automated design of thousands of nonrepetitive parts for engineering stable genetic systems. Nat. Biotechnol. 38: 1466-1475.
- Huynh L, Tagkopoulos I. 2014. Optimal part and module selection for synthetic gene circuit design automation. ACS Synth. Biol. 3: 556-564.
- Nielsen AAK, Segall-Shapiro TH, Voigt CA. 2013. Advances in genetic circuit design: novel biochemistries, deep part mining, and precision gene expression. Curr. Opin. Chem. Biol. 17: 878-892.
- Nielsen AAK, Der BS, Shin J, Vaidyanathan P, Paralanov V, Strychalski EA, et al. 2016. Genetic circuit design automation. Science 352: aac7341.
- Appleton E, Madsen C, Roehner N, Densmore D. 2017. Design automation in synthetic biology. Cold Spring Harbor Perspect. Biol. 9: a023978.
- Zhang X, Brainard DH. 2004. Estimation of saturated pixel values in digital color imaging. J. Opt. Soc. Am. A Opt. Image Sci. Vis. 21: 2301-2310.