SYNTHETIC Norwegian Colorectal Cancer genomic dataset generated in EOSC4Cancer

SYNTHETIC: This dataset contains synthetic whole-genome sequencing (WGS) data for 10 tumor-normal pairs of colorectal cancer, simulated in the standard Illumina paired-end read format. The NEAT read simulator (version 3.0, https://github.com/zstephens/neat-genreads) was used to synthesize these 10 tumor-normal pairs. During data generation, the simulation parameters (i.e., sequencing error statistics, read fragment length distribution, and GC% coverage bias) are taken from the data models provided by NEAT. The target average sequencing depth is around 110X for tumor samples and 60X for normal samples.
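As a rough illustration of what the depth targets above imply, the following sketch estimates how many read pairs are needed to reach a given average depth. The genome size (3.1 Gbp) and read length (150 bp) are assumptions made for this example only; the dataset description does not state them, and the function name is hypothetical.

```python
def read_pairs_for_depth(target_depth, genome_size, read_length):
    """Smallest number of read pairs such that
    (pairs * 2 * read_length) / genome_size >= target_depth."""
    bases_needed = target_depth * genome_size
    bases_per_pair = 2 * read_length
    # Integer ceiling division avoids floating-point rounding.
    return (bases_needed + bases_per_pair - 1) // bases_per_pair


# ASSUMED values, not taken from the dataset description.
GENOME_SIZE = 3_100_000_000  # approximate human genome length in bp
READ_LENGTH = 150            # bp per read in a 2x150 paired-end run

tumor_pairs = read_pairs_for_depth(110, GENOME_SIZE, READ_LENGTH)
normal_pairs = read_pairs_for_depth(60, GENOME_SIZE, READ_LENGTH)
print(f"tumor:  {tumor_pairs:,} read pairs")
print(f"normal: {normal_pairs:,} read pairs")
```

Under these assumptions, a tumor sample needs a little under twice as many read pairs as its matched normal, in line with the 110X versus 60X targets.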

For each sample, the synthetic normal WGS data is generated from the germline variant profile of a real patient, which is randomly down-sampled to retain 50% of that patient's germline variants. This subset is then combined with an in silico germline variant profile that is modelled randomly using an average mutation rate of 0.001. Together they constitute the full germline profile for the normal synthetic WGS data.
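The germline construction described above can be sketched on a toy reference as follows. This is a minimal illustration under stated assumptions, not the pipeline actually used: the function name, the tuple layout `(chrom, pos, ref, alt)`, the restriction to SNVs, and the rule that a retained patient variant takes precedence at a shared position are all hypothetical choices made for the example.

```python
import random


def build_germline_profile(patient_variants, reference, chrom="chr1",
                           keep_fraction=0.5, mutation_rate=0.001, seed=0):
    """Combine a random subset of real variants with random in silico SNVs.

    patient_variants: list of (chrom, pos, ref, alt) tuples, 1-based positions.
    reference: reference sequence of `chrom` as a string (toy-sized here).
    """
    rng = random.Random(seed)

    # Step 1: randomly retain keep_fraction of the patient's variants.
    n_keep = round(len(patient_variants) * keep_fraction)
    kept = rng.sample(list(patient_variants), n_keep)
    profile = {(c, p): (c, p, r, a) for c, p, r, a in kept}

    # Step 2: add in silico SNVs, each position mutating independently
    # with probability mutation_rate (ASSUMED model, SNVs only).
    for i, ref_base in enumerate(reference):
        pos = i + 1
        if ref_base not in "ACGT" or (chrom, pos) in profile:
            continue  # skip ambiguous bases and positions already taken
        if rng.random() < mutation_rate:
            alt = rng.choice([b for b in "ACGT" if b != ref_base])
            profile[(chrom, pos)] = (chrom, pos, ref_base, alt)

    # Step 3: return one merged, position-sorted profile.
    return sorted(profile.values(), key=lambda v: (v[0], v[1]))


toy_reference = "ACGT" * 2500  # 10,000 bp toy sequence
toy_patient = [("chr1", 10, "C", "T"), ("chr1", 20, "T", "G"),
               ("chr1", 30, "C", "A"), ("chr1", 40, "T", "C")]
germline = build_germline_profile(toy_patient, toy_reference)
print(len(germline), "variants in the combined germline profile")
```

With a mutation rate of 0.001, roughly one in silico variant per 1,000 reference bases is expected, so on a real genome the simulated variants would be far more numerous than in this toy example.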

For each sample, the synthetic tumor WGS data is generated by adding a pre-defined somatic short variant profile (SNVs and indels), learned from a real CRC patient, to the germline variant profile used to create the normal synthetic WGS data of the same patient. The combined profile is then used to produce the simulated sequences. Neither a copy number profile nor a structural variation profile is introduced into the tumor synthetic WGS data. Tumor content and ploidy are assumed to be 100% and 2, respectively.
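To make the consequences of these assumptions concrete, the sketch below merges a germline and a somatic profile and computes the expected variant allele fraction (VAF) of a somatic variant. The function names and the tuple layout are hypothetical, and the VAF calculation additionally assumes that the variant sits on one of the two copies (heterozygous), which the description does not state.

```python
def tumor_profile(germline, somatic):
    """Union of germline and somatic variants, sorted by position.

    Variants are (chrom, pos, ref, alt) tuples; exact duplicates collapse.
    """
    return sorted(set(germline) | set(somatic), key=lambda v: (v[0], v[1]))


def expected_vaf(purity, tumor_ploidy, mutated_copies=1, normal_ploidy=2):
    """Expected fraction of reads carrying a somatic variant.

    Mutated copies contributed by tumor cells, divided by all copies
    contributed by tumor and normal cells at that locus.
    """
    total_copies = purity * tumor_ploidy + (1 - purity) * normal_ploidy
    return purity * mutated_copies / total_copies


germline = [("chr1", 100, "A", "G"), ("chr1", 250, "C", "T")]
somatic = [("chr1", 180, "G", "A")]
print(tumor_profile(germline, somatic))

# Values stated in the description: tumor content 100%, ploidy 2.
vaf = expected_vaf(purity=1.0, tumor_ploidy=2)
print("expected VAF:", vaf)
print("expected alt reads at 110X:", 110 * vaf)
```

With 100% tumor content, ploidy 2, and no copy number changes, a heterozygous somatic variant is expected at a VAF of 0.5, the same as a heterozygous germline variant. Allele fraction alone therefore does not separate the two classes in this dataset; the matched normal sample is needed for that.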

Data and resources

This dataset has no data

Additional information

Field	Value
Title	SYNTHETIC Norwegian Colorectal Cancer genomic dataset generated in EOSC4Cancer
Description	SYNTHETIC: This dataset contains synthetic whole-genome sequencing (WGS) data for 10 tumor-normal pairs of colorectal cancer, simulated in the standard Illumina paired-end read format. The NEAT read simulator (version 3.0, https://github.com/zstephens/neat-genreads) was used to synthesize these 10 tumor-normal pairs. During data generation, the simulation parameters (i.e., sequencing error statistics, read fragment length distribution, and GC% coverage bias) are taken from the data models provided by NEAT. The target average sequencing depth is around 110X for tumor samples and 60X for normal samples. For each sample, the synthetic normal WGS data is generated from the germline variant profile of a real patient, which is randomly down-sampled to retain 50% of that patient's germline variants. This subset is then combined with an in silico germline variant profile that is modelled randomly using an average mutation rate of 0.001. Together they constitute the full germline profile for the normal synthetic WGS data. For each sample, the synthetic tumor WGS data is generated by adding a pre-defined somatic short variant profile (SNVs and indels), learned from a real CRC patient, to the germline variant profile used to create the normal synthetic WGS data of the same patient. The combined profile is then used to produce the simulated sequences. Neither a copy number profile nor a structural variation profile is introduced into the tumor synthetic WGS data. Tumor content and ploidy are assumed to be 100% and 2, respectively.
Keywords	colorectal_tumor ngs normal_tumor_pair synthetic wgs
Contact points	Name: Sen Zhao; Email: senz@ifi.uio.no; Identifier: https://www.researchgate.net/profile/Sen-Zhao-7
Publisher	Name: University of Oslo; Identifier: https://www.uio.no/english/index.html
Creator
Landing page
Release date	2024-04-30T00:00:00+00:00
Modification date
In Series
Version
Version notes
Identifier	GDI-NO-D-T0001
Frequency
Provenance
Type
Temporal coverage
Temporal resolution
Spatial coverage
Spatial resolution in meters
Access rights
Other identifier
Theme	https://en.wikipedia.org/wiki/Colorectal_cancer
Language
Documentation
Conforms to
Is referenced by
Distribution
Sample
Analytics
Applicable legislation
Has version
Code values
Coding system
Purpose
Health category
Health theme
Legal basis
Minimum typical age
Maximum typical age
Number of records
Number of records for unique individuals
Personal data
Publisher note
Publisher type
Trusted Data Holder
Population coverage
Retention period
Health data access body
Qualified relation
Provenance activity
Qualified attribution
Quality annotations
URI	http://gdi-norway.onemilliongenomes.eu/dataset/GDI-NO-D-T0001