Deciphering the Core Metabolites of Fanconi Anemia by Using a Multi-Omics Composite Network

Deciphering the metabolites of human diseases is an important objective of biomedical research. Here, we aimed to capture the core metabolites of Fanconi anemia (FA) using a bioinformatics method based on a multi-omics composite network. Based on the assumption that metabolite levels can directly mirror the physiological state of the human body, we used a multi-omics composite network that integrates six types of interactions in humans (gene-gene, disease phenotype-phenotype, disease-related metabolite-metabolite, gene-phenotype, gene-metabolite, and metabolite-phenotype) to identify the core metabolites of FA. This method can predict and prioritize candidate disease metabolites and remains effective in a network without known disease metabolites. In this report, we first identified the genes that were differentially expressed between FA-related groups and then constructed the multi-omics composite network of FA by integrating the aforementioned six networks. Finally, we used random walk with restart (RWR) to screen the prioritized candidate metabolites of FA and also obtained the co-expression gene network of FA. As a result, the top 5 metabolites of FA were tenormin (TN), guanosine 5'-triphosphate, guanosine 5'-diphosphate, triphosadenine (DCF) and adenosine 5'-diphosphate, all of which were reported to have a direct or indirect relationship with FA. Furthermore, the top 5 co-expressed genes were CASP3, BCL2, HSPD1, RAF1 and MMP9. By prioritizing the metabolites, the multi-omics composite network may provide additional indicators closely linked to FA.


Construction of the Multi-Omics Composite Network
In order to establish a multi-omics composite network, the above-mentioned six types of network data were integrated into composite network A.
$W$ represents the transition matrix of composite network A and is derived from the adjacency matrix $A$. $W_{ij}$ denotes the probability of a transition from node $i$ to node $j$. The parameters $x$, $y$ and $z$ are the probabilities of a transition between the gene network and the phenotype network, between the gene network and the metabolite network, and between the phenotype network and the metabolite network, respectively. The default value of each is 1/3. A transition probability is defined for each combination of source and target network:
In the gene network, from gene $i$ ($g_i$) to gene $j$ ($g_j$), from gene $i$ ($g_i$) to phenotype $j$ ($p_j$), and from gene $i$ ($g_i$) to metabolite $j$ ($m_j$).
In the phenotype network, from phenotype $i$ ($p_i$) to phenotype $j$ ($p_j$), from phenotype $i$ ($p_i$) to gene $j$ ($g_j$), and from phenotype $i$ ($p_i$) to metabolite $j$ ($m_j$).
In the metabolite network, from metabolite $i$ ($m_i$) to metabolite $j$ ($m_j$), from metabolite $i$ ($m_i$) to gene $j$ ($g_j$), and from metabolite $i$ ($m_i$) to phenotype $j$ ($p_j$).
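As an illustration of how such a transition matrix could be assembled, the following is a minimal sketch and not the exact definition used in this study. It assumes that each adjacency block is row-normalized, that inter-network blocks are weighted by the jump probabilities x, y and z, that intra-network blocks receive the remaining weight, and that each row is rescaled to sum to 1 when a node lacks links to one of the other networks. The function names and the node ordering (genes, then phenotypes, then metabolites) are hypothetical.

```python
def row_normalize(block):
    """Scale each row to sum to 1; all-zero rows stay zero."""
    out = []
    for row in block:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def transpose(block):
    """Return the transpose of a rectangular list-of-lists matrix."""
    return [list(col) for col in zip(*block)]


def composite_transition(gg, pp, mm, gp, gm, pm, x=1/3, y=1/3, z=1/3):
    """Assemble a row-stochastic W over [genes | phenotypes | metabolites].

    gg, pp, mm: intra-network adjacency blocks.
    gp, gm, pm: gene-phenotype, gene-metabolite, phenotype-metabolite blocks.
    ASSUMPTION: block weights and per-row rescaling are illustrative only.
    """
    layers = [
        # Each row lists (block, weight) toward genes, phenotypes, metabolites.
        [(gg, 1 - x - y), (gp, x), (gm, y)],
        [(transpose(gp), x), (pp, 1 - x - z), (pm, z)],
        [(transpose(gm), y), (transpose(pm), z), (mm, 1 - y - z)],
    ]
    w = []
    for blocks in layers:
        normed = [(row_normalize(b), weight) for b, weight in blocks]
        for i in range(len(normed[0][0])):
            row = []
            for block, weight in normed:
                row.extend(weight * v for v in block[i])
            # Rescale so that missing inter-network links do not lose mass.
            total = sum(row)
            w.append([v / total if total else 0.0 for v in row])
    return w
```

With the default x = y = z = 1/3, a node linked to all three networks spreads one third of its outgoing probability over each of them, whereas a node with only intra-network neighbors keeps all of its probability inside its own network.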

Random Walk with Restart (RWR)
To obtain the preferred candidate metabolites in the complex network, we applied the RWR method to the multi-omics composite network. The method ranks each candidate metabolite by its proximity to the seed nodes (i.e., the known metabolites) by simulating a random walk that starts from the seed nodes. At every step, the walker moves from the current node to one of its immediate neighbors with probability $1-\alpha$, or returns to the seed nodes with probability $\alpha$. The formula for the calculation is as follows: $P_{k+1} = (1-\alpha) W^{T} P_k + \alpha P_0$.
Here, $P_0$ represents the initial probability vector. $P_k$ is the probability vector whose $i$-th element is the probability that the walker is at node $i$ at step $k$, and the default value of $\alpha$ is 0.7. $W$ is the transition matrix of composite network A.
(1) Initial probability vector $P_0$. $u_0$, $v_0$ and $w_0$ denote the initial probabilities of the gene network, the phenotype network and the metabolite network, respectively. For a phenotype (i.e., disease), the seed nodes comprised the phenotypes, the corresponding known metabolites and the known genes. The phenotypes associated with FA included FA of complementation group V (FANCV, caused by mutation in the REV7 gene on chromosome 1p36), FA of complementation group T (FANCT, caused by compound heterozygous mutation in the UBE2T gene on chromosome 1q32), FA of complementation group L (FANCL, caused by homozygous or compound heterozygous mutation in the PHF9 gene on chromosome 2p16), FA of complementation group D2 (FANCD2, caused by compound heterozygous or homozygous mutation in the FANCD2 gene on chromosome 3p25), etc. The initial probability of the gene network, $u_0$, was computed by assigning an equal probability to the seed gene nodes so that the probabilities sum to 1, meaning that the random walk began with the same probability from each seed node. The initial probabilities $v_0$ and $w_0$ were calculated in the same way, with the parameters $a = 1/3$ and $b = 1/3$.
(2) k-step probability vector $P_k$. The iteration was repeated until the change between $P_{k+1}$ and $P_k$ was less than $10^{-10}$, at which point the probability had reached a steady state and the iteration stopped.
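The iteration described in steps (1) and (2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the initial vector is assumed to take the form $P_0 = [a u_0, b v_0, (1-a-b) w_0]$ and is rescaled when a network has no seeds, the change between iterations is assumed to be measured with the L1 norm, and the function names are hypothetical.

```python
def initial_vector(sizes, seeds, weights=(1/3, 1/3, 1/3)):
    """Build P_0 over [genes | phenotypes | metabolites].

    sizes:   number of nodes in each of the three networks.
    seeds:   per-network lists of seed indices (local to each network).
    weights: ASSUMED network weights (a, b, 1 - a - b).
    """
    p0 = []
    for size, seed_idx, weight in zip(sizes, seeds, weights):
        layer = [0.0] * size
        for i in seed_idx:
            # Equal probability for every seed node within a network.
            layer[i] = weight / len(seed_idx)
        p0.extend(layer)
    total = sum(p0)
    if total == 0:
        raise ValueError("at least one seed node is required")
    # Rescale so P_0 sums to 1 even if one network has no seeds.
    return [v / total for v in p0]


def random_walk_with_restart(w, p0, alpha=0.7, tol=1e-10, max_iter=10000):
    """Iterate P_{k+1} = (1 - alpha) * W^T P_k + alpha * P_0.

    w[i][j] is the probability of moving from node i to node j.
    Stops when the L1 change between iterations drops below tol
    (the choice of norm is an assumption).
    """
    n = len(p0)
    p = list(p0)
    for _ in range(max_iter):
        nxt = [alpha * p0[j] for j in range(n)]
        for i in range(n):
            if p[i]:
                for j, wij in enumerate(w[i]):
                    nxt[j] += (1 - alpha) * p[i] * wij
        change = sum(abs(a - b) for a, b in zip(nxt, p))
        p = nxt
        if change < tol:
            break
    return p
```

Because the restart probability is positive and every row of W sums to 1, the update is a contraction, so the loop converges and the returned vector still sums to 1.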

Identifying Preferred Metabolites and Their Co-Expressed Genes
After the random walk reached the steady state, each metabolite in the composite network had a corresponding probability. After removal of the seed metabolite nodes, the metabolites were sorted by probability in descending order, and the top 50 were taken as the preferred metabolites. The disease-preferred metabolite information was derived from the multi-omics composite network, and the disease-preferred metabolite network was extracted. We then identified the genes that interacted with the preferred metabolites and selected, as co-expressed genes, those with a score greater than the mean that also ranked in the top 100.
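The selection rules above can be sketched as follows. This is a minimal illustration that assumes the steady-state scores are available as dictionaries keyed by metabolite or gene identifier; the function names and the dictionary layout are hypothetical.

```python
from statistics import mean


def rank_candidates(scores, seeds, top_n=50):
    """Drop seed metabolites, then return the top_n by steady-state score."""
    candidates = {m: s for m, s in scores.items() if m not in seeds}
    return sorted(candidates, key=candidates.get, reverse=True)[:top_n]


def select_coexpressed_genes(gene_scores, top_n=100):
    """Keep genes scoring above the mean, then take the top_n of those."""
    cutoff = mean(gene_scores.values())
    above = [g for g, s in gene_scores.items() if s > cutoff]
    return sorted(above, key=gene_scores.get, reverse=True)[:top_n]
```

Applying the mean cutoff before the rank cutoff means that fewer than 100 genes are returned whenever fewer than 100 genes score above the mean.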

Identifying the FA-Related Metabolite Prioritization
The multi-omics information of FA was obtained by mapping the enriched differentially expressed genes onto the human multi-omics composite network, and the result is described in Table 3. The FA-related metabolites can be prioritized by evaluating the relation score of each metabolite in the multi-omics composite network. By combining the original weight score with the result of the RWR method, a new relation score was computed for each metabolite, and the metabolites were then ranked by this score. In the present study, we selected the 50 metabolites with the highest relation scores as prioritized candidate metabolites. The top 10 metabolites are displayed in Table 4. The disease-preferred metabolite information was collected from the multi-omics composite network, and the disease-preferred metabolite network was extracted (Fig. 2), where the top 5 metabolites are marked in red: tenormin (TN) (score = 0.000677293), guanosine 5'-triphosphate (score = 0.002984), guanosine 5'-diphosphate (score = 0.002867), triphosadenine (DCF) (score = 0.002808), and adenosine 5'-diphosphate (score = 0.002593). Seed nodes are marked in yellow and other metabolites in pink.

Identification of Co-Expressed Genes in the Composite Network
Clustergrammer was applied to visualize the co-expressed gene expression data. Genes interacting with the 50 prioritized candidate metabolites in the composite network were chosen, and their score values were analyzed to single out those with a score greater than the mean and ranking in the top 100 as co-expressed genes. The expression of these 100 preferred-metabolite co-expressed genes (rows) in the two groups of samples is shown in a heatmap of FA (Fig. 3). Of those, 22 genes were upregulated and 77 were downregulated in the patient group. The SLX4 gene was not found in the dataset and is therefore absent from Fig. 3.
Fig. 3. Heatmap of FA-preferred metabolite co-expressed genes. A heatmap of the gene expression was generated using the R package pheatmap. Gene expression profiles are displayed with 100 co-expressed genes in rows and samples in columns. Green to red represents the spectrum from low to high expression. Blue represents the normal group. Red represents the patient group.
The degree is the number of edges connected to each node; closeness represents how close a node is to the other nodes in the network; betweenness is a centrality measure of a node based on the shortest paths that pass through it; transitivity measures the probability that the adjacent nodes of a node are connected to each other.
As a result, a co-expressed network was obtained (Fig. 4). In this composite network, yellow nodes represent the seed nodes, blue nodes represent the co-expressed genes, pink nodes represent the prioritized candidate metabolites and red nodes indicate the top 4 metabolites. The parameter information for the 10 co-expressed genes with a degree greater than 30 is listed in Table 5; these genes are CASP3, BCL2, TNF, HSPD1, RAF1, MMP9, IFNG, HPRT1, LDHA, and DUT.
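The degree filter and the transitivity measure mentioned above can be sketched for an undirected network as follows. This is a minimal illustration, not the tool used to produce Table 5; the edge-list input, the function names and the node labels in any example are hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs, ignoring self-loops."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj):
    """Number of edges connected to each node."""
    return {node: len(nbrs) for node, nbrs in adj.items()}


def local_transitivity(adj, node):
    """Fraction of neighbor pairs of `node` that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    linked = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return linked / (k * (k - 1) / 2)


def hubs(adj, min_degree=30):
    """Nodes with degree strictly greater than min_degree, highest first."""
    deg = degree(adj)
    selected = [n for n, d in deg.items() if d > min_degree]
    return sorted(selected, key=lambda n: (-deg[n], n))
```

Closeness and betweenness require shortest-path computations and are omitted from this sketch.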

Discussion
With the assistance of a combined multi-omics analysis, we screened out the preferred metabolites of FA and analyzed the relationship between the preferred metabolites and the co-expressed genes of FA. The top 5 metabolites of FA were TN, guanosine 5'-triphosphate, guanosine 5'-diphosphate, DCF, and adenosine 5'-diphosphate. The top 5 co-expressed genes were CASP3, BCL2, HSPD1, RAF1 and MMP9. The successful sorting of preferred metabolites can be attributed to the multi-omics composite network model, which depends on several aspects. First, we used a composite network comprising six networks, that is, the genome, phenome, metabolome, gene-metabolite interaction network, phenotype-gene interaction network, and phenotype-metabolite interaction network. Second, this multi-omics composite network exploited the advantage of the RWR method to capture global multi-omics information. This ensured that the candidate metabolites were ordered according to the interaction information in the whole composite network, and not only according to their local environment.
A biological system can be described from several aspects, such as genome, metabolome, phenome, and interactome information, which can be integrated into a composite network. Therefore, constructing a network based on multi-omics information might be useful in finding disease-related risk metabolites. A multi-omics composite network, named MetPriCNet, was first used for predicting and prioritizing candidate metabolites by Yao et al. [11]. The novel integrated network was established on the basis of six data sources. From the perspective of integrating multi-omics information, MetPriCNet has an advantage over PROFANCY, which applies RWR to the metabolite network only [12]. MetPriCNet could achieve a higher AUC value than PROFANCY, which indicated that MetPriCNet improved the performance by integrating the multi-omics information. Furthermore, MetPriCNet could prioritize the candidate metabolites, even in the absence of known disease metabolites, using other known information such as related disease genes and phenotypes [13]. The robustness of MetPriCNet was assessed by introducing noise into the network weight score. As a result, MetPriCNet achieved an AUC value above 0.79 even when the relation score was disturbed by up to 30% noise, which implied that this multi-omics composite network was more robust to noise than a single PPI network [9]. Thus, in this report, we utilized the MetPriCNet method to select preferred metabolites and their co-expressed genes.
Fig. 4. Disease-preferred metabolite co-expressed gene distribution. Co-expressed network related to the top 50 metabolites, seed nodes and co-expressed genes. Blue nodes represent the co-expressed genes, pink nodes represent the prioritized candidate metabolites, yellow nodes represent the seed nodes and red nodes are the top 4 metabolites.
TN is a beta blocker that is used to treat angina (chest pain) and hypertension (high blood pressure), and is also prescribed to lower the risk of death after a heart attack [14]. This result suggests that patients with angina or hypertension may have a higher likelihood of suffering from FA. Guanosine could reverse the anemia induced by mycophenolic acid, which hints that guanosine may be an inhibitor of AA [9]. Recently, altered morphology of the mitochondrion, the principal site of aerobic ATP (adenosine triphosphate) production, and a deficiency of its energetic activity were reported in FA cells [9]. DCF and adenosine 5'-diphosphate are adenine nucleotides closely related to ATP. Accordingly, increased energy supply from red blood cells in the form of ATP could improve anemia-induced hemolysis [9]. In this report, although the selected top 5 metabolites have rarely been reported in relation to FA, their relationships with anemia have mostly been investigated. Some of the top 5 co-expressed genes were also directly or indirectly related to AA. Dioscorea nipponica Makino could alleviate AA by suppressing the expression of the intracellular apoptosis protein caspase-3 [15]. Upregulation of Bcl-2 protein denotes chemoresistance in acute myeloid leukemia. HSPD1 and RAF1 have not been reported to be related to AA or relevant diseases. Salidroside could facilitate the recovery of hematopoietic function in anemic mice with bone marrow depression by promoting the expression and activity of MMPs [13]. MMP inhibitors were used in the management of FA immortalized fibroblasts [16]. However, despite these contributions, the present study has limitations. As Fanconi anemia is a congenital form of anemia, it would be significantly associated with genetic variants, but an ideal data set was lacking in this study. Despite this lack of data, this study has important implications.
The multi-omics composite network used to screen the metabolites was able to provide additional indicators closely linked to FA. Such indicators of FA are not usually easy to find. However, in this report, by introducing multi-omics composite network methods and combining computational power with statistics, we could comprehensively find FA-related metabolites and further screen out the 5 most closely linked metabolites. This provides a more reliable basis for the future diagnosis of FA and opens the possibility of noninvasive diagnosis of FA. However, biological experiments are urgently needed to facilitate the translation of our results into clinical application.