1. The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) project have generated a wealth of RNA-seq data, but integrating these data across studies is challenging due to differences in sample handling and processing.
2. A pipeline has been developed to process and unify RNA-seq data from different studies, removing batch effects by uniformly reprocessing raw sequencing reads.
3. The data generated by this pipeline are available on figshare; gene expression levels are calculated from the FPKM values in RSEM’s output, then quantile normalized and corrected for batch effects.
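The quantile normalization step mentioned in point 3 can be illustrated with a minimal sketch. This is a generic version of the technique, not the authors' implementation: the function name `quantile_normalize`, the genes-by-samples layout, and the tie handling (ties are broken by position rather than averaged) are all assumptions made for illustration.

```python
import numpy as np

def quantile_normalize(expr):
    """Quantile-normalize a genes x samples matrix (hypothetical helper).

    After normalization, every sample (column) has the same distribution:
    the mean of the sorted columns.
    """
    expr = np.asarray(expr, dtype=float)
    # Sort each sample independently.
    sorted_cols = np.sort(expr, axis=0)
    # Reference distribution: mean across samples at each rank.
    reference = sorted_cols.mean(axis=1)
    # 0-based rank of each value within its own column.
    # Assumption: ties are broken by position, not averaged.
    ranks = np.argsort(np.argsort(expr, axis=0, kind="stable"), axis=0)
    # Replace each value with the reference value at its rank.
    return reference[ranks]

# Toy FPKM-like matrix: 3 genes x 2 samples (made-up numbers).
fpkm = np.array([[2.0, 4.0],
                 [6.0, 8.0],
                 [4.0, 12.0]])
normalized = quantile_normalize(fpkm)
```

In this toy input the second sample has systematically higher values than the first. After normalization both columns contain the same set of values, so the difference in overall scale is removed while the within-sample ordering of genes is preserved. Batch-effect correction is a separate step and is not covered by this sketch.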
The article “Unifying cancer and normal RNA sequencing data from different sources” provides an overview of a pipeline developed to process and unify RNA-seq data from different studies, removing batch effects by uniformly reprocessing raw sequencing reads. The article is well written and provides a detailed description of the methods used in the pipeline as well as technical validation results that demonstrate its effectiveness in correcting for study-specific batch effects.
However, several potential biases should be noted when evaluating the trustworthiness and reliability of this article. First, the authors do not discuss the risks of their approach, such as errors that the realignment or quantification tools in the pipeline might introduce. Second, although they describe how the approach can be used to compare expression levels between TCGA tumors and GTEx normal samples, they do not consider counterarguments or alternative approaches to that comparison. Third, although the technical validation results show that the approach corrects for study-specific batch effects, the authors offer no evidence of how well it performs on tissues or datasets beyond those covered in the article. Finally, although the data generated by the pipeline are deposited on figshare, the authors give no guidance on how other researchers can access or use these data.
In conclusion, the article gives a detailed description of a pipeline for processing and unifying RNA-seq data from different studies, but these potential biases should be kept in mind when evaluating its trustworthiness and reliability.