MANCIE can be applied to two column-matched data matrices: it adjusts one main matrix using the other, associated matrix by identifying and reinforcing the concordant information in the two matrices and reducing the discordant information between them. For example, one can measure the same omic profile using different experimental platforms, or one can consider different omic types.
The resulting adjusted matrix was used in our algorithm in the case of two-omics analysis. We associated with each gene the list of KEGG pathways in which it is annotated and the number of publications that relate it to breast cancer. We represent our results as a network: each node represents a gene, and an edge between two nodes means that the corresponding genes belong to the same KEGG pathway.
Triangular-shaped nodes correspond to genes that have already been identified in the literature as breast-cancer-associated genes. This step was done using the database available in Cotterill. The number of papers that associate such genes with breast cancer is also reported in the triangular nodes. The statistical approach presented in Figure 1 and described in Algorithm 1 has been implemented as a comprehensive R script that allows all methods to be executed within the same R environment.
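The network construction described above (genes as nodes, an edge whenever two genes share a KEGG pathway) can be sketched in a few lines. This is an illustrative Python sketch, not the R script used for the analysis; the function name `build_pathway_network` and the gene and pathway labels are hypothetical placeholders rather than real KEGG annotations.

```python
from itertools import combinations


def build_pathway_network(gene_pathways):
    """Map each gene pair sharing at least one pathway to the shared pathways.

    gene_pathways: dict mapping a gene name to the set of pathways
    in which the gene is annotated.
    """
    edges = {}
    for gene_a, gene_b in combinations(sorted(gene_pathways), 2):
        shared = set(gene_pathways[gene_a]) & set(gene_pathways[gene_b])
        if shared:
            edges[(gene_a, gene_b)] = shared
    return edges


# Hypothetical annotations, for illustration only (not taken from KEGG).
toy_annotations = {
    "GENE_A": {"pathway_1", "pathway_2"},
    "GENE_B": {"pathway_2"},
    "GENE_C": {"pathway_3"},
}

network = build_pathway_network(toy_annotations)

# Genes that appear in no edge are the isolated ones.
connected = {gene for edge in network for gene in edge}
isolated = set(toy_annotations) - connected
```

Storing the shared pathways on each edge keeps the information needed to label or color the links when the network is drawn.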
The Illumina probes were annotated with the mappings from the Bioconductor package illuminaHumanv4.
For the BMD-screening, we select a subset of genes that are involved in breast cancer by using a functional map that summarizes the most relevant interactions in the cancer area of interest (Huttenhower et al.). This map is used to build the network-matrix and to identify the weights of the edges among genes. It is implemented in Coxnet package version 0. Then we take the mean of this estimate as the optimal tuning parameter value (see Algorithm 1).
Then, the Survival package in R is used to compare the Kaplan-Meier survival curves and to derive the p-value indicating the significance of the difference between two survival curves. RCytoscape www. Note that all the scripts are available upon request from the first two authors.
For this purpose, we divided the dataset into two parts, a training set T and a testing set D, as described in section 3. The screened genes and the potential biomarkers were evaluated on the training set, the latter resulting in a gene signature able to subdivide patients into high- and low-risk groups. The prediction capabilities were evaluated using Kaplan-Meier curves and log-rank tests on the testing set.
The list of potential biomarkers then underwent a pathway analysis in order to provide a biological interpretation of the results and to illustrate the relationship with already available biological information. Moreover, we also demonstrate that integrating two omic data types improves the predictions.
Both matrices are normalized as discussed in Curtis et al. As a first step, we divided the patients into two subsets: a training set T samples and testing set D samples. When performing the analysis using only the mRNA expression data a total of 19, genes was retrieved from 48, Illumina expression probes by using a Bioconductor annotation data package Dunning et al. Table 1. First, we describe the results obtained using the BMD-screening.
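For readers unfamiliar with the Kaplan-Meier curves used in this evaluation, the estimator can be written out directly. This is a minimal Python sketch of the standard product-limit estimator, not the R Survival package code used in the analysis; the function name `kaplan_meier` and the toy follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times:  follow-up time of each patient.
    events: 1 if the event was observed at that time, 0 if censored.
    Returns a list of (time, survival probability) at each event time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up data: the second patient is censored at time 2.
toy_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

Censored patients leave the risk set without lowering the curve, which is why the drop at time 3 is computed over two patients rather than three.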
Such subsets of genes reflect the bio-medical knowledge about breast cancer markers available from previous studies. HEFalMp was also used to build the gene network to be used in the network-penalized Cox regression method.
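To make the idea of a gene network feeding a network-penalized regression concrete, the sketch below turns weighted gene-gene links into an adjacency matrix and a normalized graph Laplacian. The normalized Laplacian is one common choice in network-penalized regression; whether the analysis here uses exactly this form is an assumption, and the function names, gene labels, and edge weight are hypothetical.

```python
import math


def weighted_adjacency(genes, weighted_edges):
    """Symmetric adjacency matrix from {(gene_a, gene_b): weight}."""
    index = {gene: i for i, gene in enumerate(genes)}
    n = len(genes)
    adjacency = [[0.0] * n for _ in range(n)]
    for (gene_a, gene_b), weight in weighted_edges.items():
        i, j = index[gene_a], index[gene_b]
        adjacency[i][j] = adjacency[j][i] = weight
    return adjacency


def normalized_laplacian(adjacency):
    """Normalized Laplacian (assumed form): 1 on the diagonal for connected
    nodes, -w_ij / sqrt(d_i * d_j) for linked pairs, 0 elsewhere."""
    n = len(adjacency)
    degree = [sum(row) for row in adjacency]
    laplacian = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                laplacian[i][j] = 1.0 if degree[i] > 0 else 0.0
            elif adjacency[i][j] > 0:
                laplacian[i][j] = -adjacency[i][j] / math.sqrt(degree[i] * degree[j])
    return laplacian


# Hypothetical functional-map weight between two genes; GENE_C is isolated.
toy_genes = ["GENE_A", "GENE_B", "GENE_C"]
toy_adjacency = weighted_adjacency(toy_genes, {("GENE_A", "GENE_B"): 0.8})
toy_laplacian = normalized_laplacian(toy_adjacency)
```

A penalty built from such a matrix encourages linked genes to receive similar coefficients, which is the intuition behind using prior network information in the regression.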
Then, the network-based Cox regression methods, applied to the training dataset T, allowed us to select high-risk genes or potential biomarkers i. BMD-genes were used to compute the prognostic index of each patient and to classify patients into low- and high-risk groups. An optimal cut-off for the prognostic index was estimated for this purpose.
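The classification step can be illustrated as follows, taking the prognostic index to be the usual linear predictor of a Cox model (the sum of each signature gene's coefficient times its expression). The median cut-off used in the example is an assumption for illustration only and stands in for the estimated optimal cut-off; the function names, gene labels, coefficients, and expression values are hypothetical.

```python
from statistics import median


def prognostic_index(expression, coefficients):
    """Linear predictor: sum of coefficient * expression over signature genes."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


def risk_groups(patients, coefficients, cutoff):
    """Label each patient 'high' if the prognostic index exceeds the cut-off."""
    groups = {}
    for patient_id, expression in patients.items():
        index = prognostic_index(expression, coefficients)
        groups[patient_id] = "high" if index > cutoff else "low"
    return groups


# Hypothetical coefficients and expression values, for illustration only.
toy_coefficients = {"GENE_A": 0.5, "GENE_B": -1.0}
toy_patients = {
    "P1": {"GENE_A": 2.0, "GENE_B": 0.0},
    "P2": {"GENE_A": 0.0, "GENE_B": 1.0},
    "P3": {"GENE_A": 1.0, "GENE_B": 0.5},
}

# Assumption: the median index is used as the cut-off in this sketch.
toy_cutoff = median(
    prognostic_index(expr, toy_coefficients) for expr in toy_patients.values()
)
toy_groups = risk_groups(toy_patients, toy_coefficients, toy_cutoff)
```

In a train/test setting, both the coefficients and the cut-off would be fixed on the training set and then applied unchanged to the testing patients.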
The significance of the BMD-gene lists was evaluated on the testing dataset, D, in terms of p-values of the log-rank test, where novel patients were divided into low- and high-risk groups according to their prognostic index. Table 2 shows additional results of our procedure in terms of identified markers in the training set T and log-rank test p-value obtained from the testing set D.
Overall, such results confirm those obtained in Iuliano et al. Table 2. Figure 2. The results refer to the testing set D. We use blue to indicate the high-risk group and red to indicate the low-risk group. The p-value is calculated by applying the log-rank test to the testing set. The high-risk group is better separated from the low-risk group when integrating mRNA expression data and CNA profiles (right) than when using single-omics data (left).
The X-axis represents time and the Y-axis represents survival rate. In particular, we ordered the patients with respect to the prognostic index PI and divided them into two risk classes i. By inspecting the heatmaps in Figure S1, we identified two groups of genes e. The first group contains genes for which lower expression corresponds to a worse patient prognosis; the other group contains genes for which higher expression corresponds to a worse patient prognosis.
Figure S2 shows similar behavior and groups of genes, with reduced noise in the heatmaps. In this case we identified the same group of genes and a few others of interest. Second, we show the results obtained using the DAD-screening. We called DAD-genes the high-risk gene signature i. As before, the significance of the DAD-gene lists was assessed on the testing dataset, D. From our analysis we observed that the log-rank test separated the high- and low-risk groups of patients with a significance lower than 0. As expected, the log-rank p-values associated with the DAD-genes are not as strong as the corresponding p-values associated with the BMD-genes, suggesting that DAD-screening is not competitive with BMD-screening in terms of predictive power.
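The two-group log-rank test used for this evaluation can be written out explicitly. This is a minimal Python sketch of the standard test, not the R Survival package code used in the analysis; the function name `logrank_test` and the toy survival times are hypothetical. The p-value uses the chi-square distribution with one degree of freedom, whose upper tail equals erfc(sqrt(x / 2)).

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; events are 1 (observed) or 0 (censored).

    Returns (chi-square statistic, p-value with one degree of freedom).
    """
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e == 1})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for ti in times_a if ti >= t)  # at risk in group A
        n_b = sum(1 for ti in times_b if ti >= t)  # at risk in group B
        d_a = sum(1 for ti, ei in zip(times_a, events_a) if ti == t and ei == 1)
        d_b = sum(1 for ti, ei in zip(times_b, events_b) if ti == t and ei == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:  # hypergeometric variance of the deaths in group A
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0.0:
        return 0.0, 1.0
    statistic = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical groups: identical curves, then clearly separated curves.
stat_same, p_same = logrank_test([1, 2, 3], [1, 1, 1], [1, 2, 3], [1, 1, 1])
stat_sep, p_sep = logrank_test(
    [1, 2, 3, 4, 5], [1, 1, 1, 1, 1], [6, 7, 8, 9, 10], [1, 1, 1, 1, 1]
)
```

Identical groups give a statistic of zero, while groups whose events do not overlap in time give a small p-value, which is the behavior the risk-group comparison relies on.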
Therefore, the information available from the literature should not be neglected and DAD-screening should be used to find potential candidate biomarkers and predict survival only when no other or very limited information is available.
Such subsets of genes reflect the bio-medical knowledge available from previous studies (BMD part) and also incorporate additional information contained in the data under analysis (DAD part). However, our analysis reinforces the evidence that they could be related to breast cancer. By contrast, genes identified in group c might be important for the process of novel biomarker discovery, since they represent potential biomarkers not previously identified as associated with breast cancer.
Tables S5 and S6 also show the number of times each gene in the signature was selected when changing the threshold and the network methods. For these genes the frequency of occurrence is equal to 20, corresponding to the number of thresholds used in our analysis. Finally, to further evaluate the robustness of the gene signatures we used Venn diagrams (see Figure 3).
From this figure we observed that the overlap between screening and network methods is quite good, although there are specificities that explain the better performance of one combination with respect to another. Figure 3. A more comprehensive analysis of these candidate genes is described in the following section.
In order to better understand and interpret the inferred gene signatures, in this section we report the results of the KEGG pathway analysis performed on the non-isolated genes in the signature, as described in section 2. We used such networks to easily visualize the gene-gene interactions and the KEGG pathways involved in such interactions. Each node corresponds to a gene and the edges represent the KEGG pathways shared by the linked genes.
Note that some of the genes colored in orange might also be retrieved from the data under analysis (as DAD-genes); however, in this context we want to underline and make sense of the novel information not yet considered. Figure 4. Non-isolated genes are represented as nodes in the network, and a link is drawn between two genes when they belong to the same KEGG pathway.
Triangular-shaped nodes indicate the genes identified in the literature as breast-cancer-associated genes. The number of papers is also reported in the triangular nodes. Figure 5. From the color of the nodes, we can infer that most, but not all, of the genes come from the BMD contribution i. Moreover, our analysis allows us to further investigate the KEGG pathways to which the involved genes belong.
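The selection counts described above amount to tallying, for each gene, the number of signatures in which it appears as the threshold varies. A minimal Python sketch follows; the function names and the gene labels are hypothetical, and three toy signatures stand in for the thresholds used in the analysis.

```python
from collections import Counter


def selection_frequency(signatures):
    """Count in how many signatures each gene appears."""
    counts = Counter()
    for signature in signatures:
        counts.update(set(signature))  # each gene counts once per signature
    return counts


def always_selected(signatures):
    """Genes whose frequency equals the number of signatures."""
    counts = selection_frequency(signatures)
    return {gene for gene, count in counts.items() if count == len(signatures)}


# Hypothetical signatures, one per threshold, for illustration only.
toy_signatures = [
    ["GENE_A", "GENE_B"],
    ["GENE_A", "GENE_C"],
    ["GENE_A", "GENE_B", "GENE_C"],
]
toy_counts = selection_frequency(toy_signatures)
toy_stable = always_selected(toy_signatures)
```

A gene whose count equals the number of thresholds is selected regardless of the threshold, which is the sense in which such genes are considered robust.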
In particular, a gene shown in both networks is BCL2, which according to Cotterill has already been mentioned in publications showing its importance in breast cancer. BCL2 functions to prevent apoptosis, and it is a tumor-related gene that has the potential to further improve individualization of patient management by predicting response to chemotherapy, hormonal therapy and radiotherapy Joensuu et al.
Extensive studies relate the KEGG focal adhesion pathway to breast cancer since it plays critical roles in integrin-mediated signal transduction and also participates in signaling by other cell surface receptors.