Genome-Wide Transcriptomic Analysis of n-Caproic Acid Production in Ruminococcaceae Bacterium CPB6 with Lactate Supplementation

n-Caproic acid (CA) is gaining increased attention due to its high value as a chemical feedstock. Ruminococcaceae bacterium strain CPB6 is an anaerobic mesophilic bacterium that is highly prolific in its ability to perform chain elongation of lactate to CA. However, little is known about the genome-wide transcriptional analysis of strain CPB6 for CA production triggered by the supplementation of exogenous lactate. In this study, cultivation of strain CPB6 was carried out in the absence and presence of lactate. Transcriptional profiles were analyzed using RNA-seq, and differentially expressed genes (DEGs) between the lactate-supplemented cells and control cells without lactate were analyzed. The results showed that lactate supplementation led to earlier CA p,roduction, and higher final CA titer and productivity. 295 genes were substrate and/or growth dependent, and these genes cover crucial functional categories. Specifically, 5 genes responsible for the reverse β-oxidation pathway, 11 genes encoding ATP-binding cassette (ABC) transporters, 6 genes encoding substrate-binding protein (SBP), and 4 genes encoding phosphotransferase system (PTS) transporters were strikingly upregulated in response to the addition of lactate. These genes would be candidates for future studies aiming at understanding the regulatory mechanism of lactate conversion into CA, as well as for the improvement of CA production in strain CPB6. The findings presented herein reveal unique insights into the biomolecular effect of lactate on CA production at the transcriptional level.


RNA Isolation, Library Construction, and Sequencing
In preparation for RNA isolation, 10 ml cell culture was harvested at each time point, and centrifuged at 8,000 ×g for 10 min at 4°C. Cells were then frozen in liquid nitrogen prior to storage at -80°C. The RNA was extracted and purified using a RNA extraction kit (DP430, Tiangen Biotech, China) following the manufacturer's protocol. RNA quality and quantity were characterized using a NanoDrop2000 (NanoDrop Technologies, USA), agarose gel electrophoresis (RNA integrity detection) and Agilent 2100 (RIN value measurement). Only the RNA samples with high-quality (≥ 5 μg; ≥ 200 ng/μl; OD260/280=1.8~2.2; RIN > 6.0) were used for the cDNA library construction and sequencing. Before library construction, rRNAs were removed with the Ribo-Zero rRNA Removal Kit (Epicentre, USA) following the manufacturer's protocol. The enriched mRNA was randomly fragmented into 200 bp fragments by fragmentation buffer. The mRNA was then the first strand cDNA was synthesized using the random hexamer-primer with the mRNA fragment as the template. After synthesizing the second strand cDNA using DNA polymerase I and RNase H, double-stranded cDNA was further end repaired, Atailed, and indexed adapters ligated. The final cDNA library was constructed using TruSeq RNA sample preparation Kit (Illumina Inc., USA), and then sequenced by Illumin Hiseq 4000 (Illumina Inc.) with 2 × 150 bp.
Sequencing trimming and quality control methods are as follows: (1) Remove the Adapter sequence in reads; (2) The bases containing non-A, G, C and T at the 5 'end were removed by shear; (3) Trim the ends of reads with low sequencing quality (< Q20); (4) Reads containing 10 % N were removed; (5) Remove Adapter and small segments with length less than 25 bp after quality pruning.

RNA-Seq Data Analysis
Raw data were processed, and reads containing adapter and poly-N sequences and low-quality reads were removed using Sickle and SeqPrep to obtain clean data [19,20]. The trimmed reads in each sample were aligned to strain CPB6 genome (CP020705.1) using Bowtie2, and those that did not align uniquely to the genome were discarded using the default quality parameters [21]. Each base was assigned a value based on the number of mapped sequence coverage. Gene expression levels were defined using the number of transcripts per million (TPM), which is proportional to the quantity of cDNA fragments derived from the gene transcripts. The quantitative gene expression values between samples were identified by calculating the number of unambiguous tags for each gene and then normalizing this to TPM, which was calculated following the method reported by Parto et al. [22]. The gene expression results were visualized as a heat-map plot using ggplot2 package. The general changes in gene expression among different treatments were evaluated by permutational multivariate analysis of variance using the function Adonis in the R vegan package. Gene annotation was performed based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (http://www.genome.jp/kegg/).

Reverse Transcription-Quantitative PCR (RT-qPCR)
In order to validate the results of the RNA-Seq analysis, 5 candidate reference genes were selected for RT-qPCR confirmation. Primers used are listed in Supporting Information Table S3. Total RNA was extracted from three sets of independent cultures grown on cultures with or without lactate supplementation, and then converted to cDNA by random priming, using the Maxima Reverse Transcriptase (Thermo Scientific). PCR reactions were run in triplicate using procedure as follows: initial denaturation (3 min at 95 o C), followed by 45 cycles of denaturation (5 s at 95 o C), annealing and elongation (30 s at 60 o C). The transcription level of genes was determined according to the 2 -(ΔΔCt) method, using 16S rRNA as a reference gene for the normalization of gene expression levels [23].

Statistical Analysis
Significant differences of the gene expression between the culture with lactate supplementation and the control were determined using ANOVA in R software (version 3.5.2). TPM values were first transformed to log10-scale. The log10-transformed TPM values were then properly centered for better representation of the data using the heatmap plots. Fold changes (FCs) as the ratio of the TPM values were calculated following the method reported by Love et al. [24], and were used to compare the differentially expressed genes (DEGs) between the culture from fermentation with lactate supplementation and the control.

RNA-Seq Data
The RNA-Seq sequencing data have been deposited in the NCBI Sequence Read Archive (SRA) under the accession number PRJNA564589

Cell Growth and the Production of CA
As shown in Fig. 1A, cells took approximately 18 h to grow to the stationary phase. Although the maximum OD 600 of the lactate-supplemented cultures was slightly higher than that of control cultures without lactate at the stationary phase (1.25 vs 1.16), both cultures showed similar growth kinetics. The lactate was completely consumed in the lactate supplemented culture after 21 h of cultivation. Moreover, no lactate was detected throughout the control group. CA production was started to be observed in the lactate-supplemented cultures after 6 h of cultivation, and the CA titer continued to increase and reached 1,717.2 mg/l at 21 h (Fig. 1B), while CA was not detected in control cells until 15 h of cultivation, and the CA titer of which only reached 618 mg/l at 21 h (Fig. 1C). These results suggest that lactate supplementation had little effect on the cell growth, but led to earlier initiation for CA production (6 vs 15 h), higher final CA titer (1,717 vs 618 mg/l), and higher CA productivity (81.8 vs 29.4 mg/l h).

RNA-Seq Statistics
Samples were taken for RNA-Seq analysis from both growth (12 h) and stationary (18 h) phases for the lactate- supplemented cultures and control cultures. For each culture, independent biological triplicates (a, b, and c) were included (Table 1). Therefore, a total of twelve samples were taken for cDNA libraries construction and sequencing on the Illumina HiSeq 4000 (Illumina). The number of raw reads generated from the sequencing for each library was from 15.7 to 23.5 million (Table S1). A total of 224 Mb sequence reads from 12 cDNA libraries were mapped to strain CPB6 genome. Only those reads that mapped unambiguously to strain CPB6 genome were used for further analysis.
Overall, out of the reads derived from all samples, 15.1 to 22.7 million reads were unambiguously mapped to strain CPB6 genome, and over 98% reads were mapped (Table 1). A total of 1968/1969 out of 2045 protein-coding genes had detectable expression in all cells, covering 96% of strain CPB6 genome. This result indicated that the RNA-Seq analysis achieved comprehensive coverage of strain CPB6 transcriptome. The transcription levels (the number of transcripts per million, TPM) of most active protein-coding genes were in the range of 3.2 × 10 4 -7.3 × 10 4 .
As illustrated in Fig. 2, the gene expression could be classified into four levels: low (TPM < 30), moderate (TPM: 30-150), high (TPM: 150-1000), and very high (TPM > 1000). The number of genes at some specific expression levels was significantly different for the two cultures. For the growth phase, there were slightly more genes in the moderate, high and very high expression level in the lactate-treated cells than in control cells, but lowly expressed genes were significantly decreased. While for the stationary phase, the lactate-treated cells had more genes in the moderate expression level, but fewer genes in the high and very high expression level.

Functional Annotation and Classification
In the transcriptome of strain CPB6, a total of 1122 expressed genes were allocated into three primary Gene Ontology (GO) categories (Fig. 3), including the category of biological process (601 genes), cellular component (524 genes), and molecular function (916 genes). In each category, the genes were further assigned into 28 functional groups, such as metabolic process (478 genes), cellular process (440 genes), cell part (307 genes),  The percentage value above each bar indicates the genes at the specific expression level accounting for the proportion of the total number of genes. The '*' mark indicates that significantly different frequencies (i.e., numbers of genes) were observed between the two RNA-Seq data sets from the lactate-supplemented fermentation (L) vs. the control (C), respectively. membrane part (297 genes), catalytic activity (654 genes), binding (561genes), and etc. The analysis of the genes based on the KEGG annotation identified a total of 1046 unigenes allocated into six primary KEGG categories including 35 subcategories (Fig. S1). The analysis based on the Clusters of Orthologous Groups (COGs) showed that 1785 unigenes were allocated to four primary COG categories containing 20 COG functional clusters (Fig. S2).

DEGs Affected by Lactate Supplementation
The correct identification of DEGs between specific conditions is a key in understanding phenotypic variation of organisms under environmental stress. As shown in Table 2, only 34 DEGs (FC ≥2 or ≤ 0.5 with p-value < 0.05) were found in the lactate-supplemented cells compared to control cells at the growth phase, of which 15 genes were upregulated, and 19 genes were downregulated. At the stationary phase, a total of 245 DEGs were identified in both cultures, of which 123 genes were significantly upregulated and 122 genes were downregulated (Table S2). These results demonstrated that the addition of lactate led to differences in gene expression during different growth phases.
The COG distribution of DEGs is illustrated in Fig. S3. It revealed potential genes, processes and pathways that  may participate in the utilization of lactate and CA production. Cluster analysis of the DEGs between the lactatesupplemented cells and control cells is showed in Fig. S4. Obviously, more DEGs was observed in the stationary phase (L2 vs C2) than in the growth phase (L1 vs C1). As shown in Fig. 4, a total of 295 DEGs in expression pattern were substrate and/or growth dependent, of which 31 genes were substrate (lactate) dependent, 228 genes were growth dependent, and 36 genes were substrategrowth dependent (Fig. 4A). Specifically, 11 and 20 lactate-dependent genes were significantly upregulated and downregulated, as well as 98 and 130 growth-dependent genes were significantly upregulated and downregulated, respectively (Fig. 4B). It was suggested that the differences in gene expression are stronger for stationary phase vs growth phase than for plus/minus lactate. Similar results was observed for C. thermocellum, in which growth rate had stronger effects on gene expression than substrate type [25].

Expression of Glycolysis Genes
An overview of the metabolic pathway in strain CPB6, and the expression levels of genes involved in key metabolic processes with their fold change (FC) were shown in Fig. 5 and Table 3. Most glycolytic genes were expressed at a relatively high level (TPM>150) between the lactate-supplemented cells and control cells, but there was no significant difference (p > 0.05) between them at the growth phase. Three glycolytic genes exhibited different expression patterns at the stationary phase. Gene encoding phosphofructokinase (Pfk, B6259_RS06095) was significantly downregulated (p < 0.05), while genes encoding glucose-1-phosphate adenylyltransferase (GlgC, B6259_RS09035) and 1, 4-alpha-glucan branching enzyme (GlgB, B6259_RS09040) were upregulated by 4.58 and 3.42-fold (p < 0.05) in the lactate-supplemented cells compared with control cells, respectively. GlgB and GlgC are typically associated with glycogen synthesis, why expression of these genes be affected by lactate Two ldh genes (B6259_RS09845 and RS06770), encoding L-lactate dehydrogenase and D-lactate dehydrogenase, were detected in strain CPB6 transcriptome, respectively. The two ldh genes were expressed at low levels in the lactate-supplemented cells and control cells (Table 3), and there was no significant difference in the expression level between the two groups. The gene encoding pyruvate: ferredoxin (flavodoxin) oxidoreductase (Pfor, B6259_RS09135) was upregulated by 1.83-and 3.26-fold (p < 0.05) in the lactate supplemented cells than in control cells during the growth and stationary phases, respectively.

Expression of Butyrate-and CA-Producing Genes
The enzymes involved in the butyrate formation include acetyl-CoA acetyltransferase (AtoB), 3-hydroxybutyryl-CoA dehydrogenase (Hbd), enoyl-CoA hydratase (Crt), NAD-dependent butyryl-CoA dehydrogenase/Electron   [7,16]. In the present study, genes encoding AtoB (B6259_RS06365), Crt (B6259_RS06360) and Hbd (B6259_RS06355) maintained at very high expression levels (TPM>3000) in the lactate-supplemented cells, and were upregulated by 3.5-8.6 folds (p < 0.05) compared with control cells without lactate. Bcd (B6259_RS01790) was expressed at a very high level in the lactate-supplemented cells and control cells throughout the growth and stationary phases, but there was no difference between two groups. EtfAB (alpha unit, B6259_RS01785 and beta unit, B6259_RS01780) showed the Bcd-like expression profile. Another Bcd (B6259_RS02600) was expressed at relatively low level at the growth phase, but its expression was upregulated 4.5-fold (p < 0.05) at the stationary phase. One CoAT gene (B6259_RS06345) showed high expression level in the two cultures (TPM>150), and it was markedly upregulated by 4-fold (p < 0.05) in the lactate-supplemented cells than in the control at the stationary phase. Additionally, the gene encoding phosphate acetyltransferase (Pta, B6259_RS07830) was remarkably upregulated (p < 0.05) at the two phases with the addition of lactate (Table 3). The expression of acetate kinase (Ack, B6259_RS03430) showed no change (p < 0.05) in response to the addition of lactate. The two genes can produce acetate from acetyl-CoA (sourced from glycolysis or lactate oxidation), contributing to a dynamic equilibrium of acetate in cultures to some extent. By including the production of H 2 and CO 2 into the loop, it could provide a whole picture for carbon balance for the substrate utilization and cell biomass production. Unfortunately, the production of H 2 and CO 2 was not monitored in this study. In the future studies, this should be taken into consideration for improvement.

Expression of Putative ABC Transporter and Sporulation Genes
As shown in Table 3 and Fig. 5, sporulation genes showed similar expression patterns in the lactatesupplemented cells and control cells, e.g., spo0, spoIIID, spoVAE, and spoYtfJ, were induced to high expression at the growth and stationary phases, while spoIID, spoIIIAD, spoIVA, spoVAC, and spoVAD were expressed at low or moderate levels.
Notably, most genes for ABC transporter and substrate-binding protein (SBP) were no significant changes (p > 0.05) in the two groups at growth phase, except two ABC transporter genes (B6259_RS00445, B6259_RS00450), and one SBP gene (B6259_RS00440) which were upregulated by more than 2-fold (p < 0.05, Table 3) with the addition of lactate. However, many of these genes were markedly upregulated at the stationary phase (p <0.05). Specially, B6259_RS07905, _RS07910, _RS00320, _RS00325 and B6259_RS07915 were increased over 10-fold (p < 0.05) in the lactate-supplemented cells compared with control cells.
In addition, four phosphotransferase system (PTS) transporter genes, including PTS fructose transporter subunit IIC (B6259_RS00095), PTS glucose transporter subunit IIA (B6259_RS09280), PTS β-glucoside transporter subunit IIABC (B6259_RS01415), and PTS mannitol transporter subunit IICBA (B6259_RS00370), were detected in the transcriptome of strain CPB6. Genes encoding PTS fructose and glucose transporters were expressed at high levels under both groups, but the two genes were significantly downregulated (p < 0.05) in the lactate-supplemented cells than in control cells at the growth phase, indicating that the two PTS transporters are sensitive to lactate supplementation. Moreover, the three PTS transporter genes (B6259_RS0095, RS01415 and RS00370) and one ferrous iron transporter gene (B6259_RS03880) were upregulated by 2-to 4-fold at stationary phase (p <0.05, Table 3).

RT-qPCR Verification
The fold-changes in expression of 5 genes (Pfor, AtoB, Hbd, Crt, and CoAT) were measured by RT-qPCR with 16S rRNA as reference gene. The five genes were significantly upregulated in the lactate-supplemented cells compared with control cells (Fig. S5). The RT-qPCR data mainly matched the RNA-Seq of 5 selected genes based FC values, which indicated that our RNA-Seq result is accurate and the conclusion from RNA-Seq should be reliable.

Discussion
Lactate is a major end-product of glycolysis or energy substrate for many anaerobic bacteria such as Acetobacterium woodii, C. botulinum and Desulfotomaculum reducens [26,27]. The recent studies show that lactate as electron donor can be transformed into CA in either mixed anaerobes [3,12,13], or in the pure anaerobic bacterium [14], but the biochemistry of lactate oxidation to CA and underlying regulatory mechanisms are still obscure. Lactate dehydrogenase (LDH) is the key enzyme in lactate production from pyruvate. LDH catalyzes the reaction converts pyruvate to lactate or the reverse reaction that converts lactate to pyruvate coupled to NADH/ NAD + redox [28]. Generally, bacteria that grow on lactate as sole energy and carbon source have a serious energetic problem because of the high redox potential of the pyruvate/lactate pair. Recently, a novel mode of lactate metabolism is proposed for strictly anaerobic bacteria [27], in which the LDH/ Etf complex uses flavinbased electron confurcation to drive endergonic lactate oxidation with NAD + as oxidant at the expense of simultaneous exergonic electron flow from reduced ferredoxin. And that, the lactate metabolism in these strictly anaerobic bacteria is negatively regulated by the transcriptional regulator [29]. In this study, the upregulation of LDHs was not observed with the addition of lactate, indicating that lactate supplementation does not trigger increased expression of LDH. Moreover, the L-ldh (B6259_RS09845) heterologously expressed in Escherichia coli BL21 (DE3) exhibits high LDH activity of driving endergonic lactate oxidation in the absence of Fd 2− , and LDH oxidative activity predominates over reductive activity [30]. These results indicate that the lactate metabolism in strain CPB6 is different from other strict anaerobes. It warrants further investigation concerning the detailed regulatory mechanism of lactate oxidation in strain CPB6.
The bioproduction of CA is a well-known chain elongation process from acetate (C2) to butyrate (C4), and then to caproate (C6) via the reverse β-oxidation pathway [6]. The conversion of C2 to C4 is well understood, but little is known about the key enzymes responsible for caproyl-CoA or CA synthesis. Enzymes (e.g., AtoB, Crt, Hbd, Bcd/EtfAB complex and CoAT) responsible for butyrate synthesis via the reverse β-oxidation are assumed to have the function in the formation of caproyl-CoA and CA [7]. However, C. tyrobutyricum, which contains these genes, only produce butyric acid instead of CA [31], while C. kluyveri and strain CPB6, which contain these genes, can further elongate C4 to C6 [7,15]. It indicates that there are differences in structure and function between these genes from different organisms. In this study, the three genes (AtoB, Crt, Hbd) responsible for the conversion of acetyl-CoA to crotonyl-CoA were markedly upregulated (p < 0.05) throughout the exponential and stationary phases with the addition of lactate. However, the Bcd (B6259_RS02600) and CoAT (B6259_RS06345) genes were only significantly upregulated at the stationary phase. Provided that the rate of CA accumulation was significantly higher during the stationary phase than the growth phase, the two genes are likely involved in the formation of caproyl-CoA and CA, The CoAT is the key enzyme responsible for the last step of the butyrate formation [31]. Theoretically, high-level expression of the CoAT gene should result in the accumulation of butyric acid, but significant accumulation of CA instead of butyric acid was observed in the lactate-supplemented cultures, suggesting that the CoAT prefers to convert caproyl-CoA to caproate than butyryl-CoA to butyrate. This speculate was verified by expression of the CoAT (B6259_RS06345) in E. coli BL21 (DE3). This CoAT protein could catalyze the conversion of both butyryl-CoA to butyrate and caproyl-CoA to caproate, but its catalytic efficiency with caproyl-CoA as the substrate was 3.8 times higher than that with butyryl-CoA [32]. Thus, the CoAT is a key gene that determines whether the final product is butyric acid or caproic acid.
Some bacteria develop into highly resistant spores to protect their genome and cell from certain doom when living conditions become intolerable [33]. It ensures bacterial survival under adverse environmental conditions. Sporulation in Clostridium spp. is ordinarily not triggered by starvation but by cessation of growth in the presence of excess carbon source or exposure to oxygen [33]. The two most critical factors involved in the shift to solventogenesis, a decrease in external pH and accumulation of acidic fermentation products, are generally assumed to be associated with the initiation of sporulation in Clostridium spp., to some extent [34]. Recent studies show that the sporulation events are uncoupled from the induction of solventogenesis in C. beijerinckii [35]. In this study, the sporulation genes showed no significant difference between the lactate supplemented cells and control cells, indicating that the sporulation events are not associated with the production of CA in strain CPB6 until the stationary phase. This may be because low concentrations of CA (1,717 mg/l) are not sufficient to initiate sporulation for strain CPB6.
ABC transporters are ubiquitous membrane proteins that couple the transport of diverse substrates across cellular membranes to the hydrolysis of ATP [36]. ABC transporters are generally divided into importers and exporters on the basis of the polarity of solute movement. ABC importers are found mostly in bacteria and are crucial in mediating the uptake of solutes including sugar, metal ions, and vitamins [37]. ABC transporters play important roles in response to lactate stress. High expression of ABC transporter genes may be of benefit to organisms to maintain intercellular homeostasis under lactate stress [38] or increase intracellular ATP concentrations to protect cells against acidic damage in the initial stage of acid stress [39]. Here, nine ABC transporter genes and six SBP genes were markedly upregulated at the stationary phase. Specially, B6259_RS07905, _RS07910, _RS00320, _RS00325, and B6259_RS07915 were increased over 10-fold in the lactate supplemented cells compared to control cells, demonstrating that these genes are associated with the extrusion of CA from the cell, and the maintenance of osmotic homeostasis in cytoplasm [40].
PTS is a multiple-component carbohydrate uptake system that drives specific saccharides across the bacterial inner membrane while simultaneously catalyzing sugar phosphorylation [41]. Five distinct subfamilies of proteins related to PTS have been identified within the glucose superfamily: the lactose family, the glucose family, the β-glucoside family, the mannitol family, and the fructose family [42]. In this study, genes encoding PTS fructose, β-glucoside, and mannitol transporters were all strikingly upregulated in the lactate-supplemented cells than in control cells at the stationary phase, suggesting that these transporters may be involved in the extrusion of intracellular CA in strain CPB6, similar to the role of ABC transporters [43].
In sum, this study showed that lactate supplementation induced earlier CA production, higher CA titer, and productivity. The gene transcriptional profiles based on RNA-Seq demonstrated that supplemented lactate promoted CA production by altering the expression patterns of genes responsible for crucial metabolic pathways. Specifically, 5 genes (AtoB, Hbd, Crt, Bcd/EtfAB, and CoAT) involved in the reverse β-oxidation pathway, 11 genes encoding ABC transporter, 6 SBP genes, and 4 PTS transporter genes showed high correlation with utilization of lactate and CA production. The findings presented herein provide unique insights into the metabolic effects of lactate on CA production at the gene regulation level.