SYNTHETIC Norwegian Colorectal Cancer genomic dataset generated in EOSC4Cancer
Data and resources
This dataset has no data
Additional information
Field | Value |
---|---|
Title | SYNTHETIC Norwegian Colorectal Cancer genomic dataset generated in EOSC4Cancer |
Description | SYNTHETIC. This dataset contains synthetic whole-genome sequencing (WGS) data for 10 tumor-normal pairs of colorectal cancer (CRC), simulated in the standard format of Illumina paired-end reads. The NEAT read simulator (version 3.0, https://github.com/zstephens/neat-genreads) is used to synthesize these 10 tumor-normal WGS pairs. During data generation, the simulation parameters (sequencing error statistics, read fragment length distribution, and GC% coverage bias) are learned from data models provided by NEAT. The target average sequencing depth is around 110X for tumor samples and 60X for normal samples. To generate the synthetic normal WGS data for each sample, the germline variant profile of a real patient is randomly down-sampled so that it retains 50% of that patient's germline variants. This subset is then combined with an in silico germline variant profile that is modelled randomly using an average mutation rate of 0.001, yielding the full germline profile for the synthetic normal WGS data. To generate the synthetic tumor WGS data for each sample, a pre-defined somatic short variant profile (SNVs and indels) learned from a real CRC patient is added to the germline variant profile used for the synthetic normal WGS data of the same patient, and the combined profile is used to produce the simulated reads. Neither a copy number profile nor a structural variation profile is introduced into the synthetic tumor WGS data. Tumor content and ploidy are assumed to be 100% and 2, respectively. |
Keywords | |
Contact points | |
Publisher | |
Creator | |
Landing page | |
Release date | 2024-04-30T00:00:00+00:00 |
Modification date | |
Temporal start date | |
Temporal end date | |
In Series | |
Version | |
Version notes | |
Identifier | GDI-NO-D-T0001 |
Frequency | |
Provenance | |
Type | |
Temporal coverage | |
Temporal resolution | |
Spatial coverage | |
Spatial resolution in meters | |
Access rights | |
Other identifier | |
Theme | |
Language | |
Documentation | |
Conforms to | |
Is referenced by | |
Analytics | |
Applicable legislation | |
Has version | |
Code values | |
Coding system | |
Purpose | |
Health category | |
Health theme | |
Legal basis | |
Minimum typical age | |
Maximum typical age | |
Number of records | |
Number of records for unique individuals | |
Personal data | |
Publisher note | |
Publisher type | |
Trusted Data Holder | |
Population coverage | |
Retention period | |
Health data access body | |
Qualified relation | |
Provenance activity | |
Qualified attribution | |
Quality annotations | |
URI | http://gdi-norway.onemilliongenomes.eu/dataset/GDI-NO-D-T0001 |
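
The variant-profile construction in the Description can be illustrated with a minimal Python sketch. This is not NEAT's API and not the pipeline used to produce the dataset: the function names (`downsample`, `random_germline`, `build_profiles`), the toy contig name `chrSim`, and the representation of a variant as a `(contig, position)` tuple are all hypothetical, chosen only to show how the 50% down-sampled real germline profile, the random in silico profile (average rate 0.001), and the somatic profile might be combined.

```python
import random

def downsample(variants, fraction, rng):
    """Keep a random subset containing the given fraction of the variants."""
    k = round(len(variants) * fraction)
    # sorted() gives a stable order so a seeded rng is reproducible
    return set(rng.sample(sorted(variants), k))

def random_germline(genome_length, mutation_rate, rng, contig="chrSim"):
    """Draw in silico variant positions at an average per-base rate (toy model)."""
    return {(contig, pos) for pos in range(genome_length)
            if rng.random() < mutation_rate}

def build_profiles(real_germline, somatic, genome_length, rng,
                   fraction=0.5, mutation_rate=0.001):
    """Return (normal_profile, tumor_profile) as sets of variants."""
    # Normal: down-sampled real germline plus random in silico germline
    normal = (downsample(real_germline, fraction, rng)
              | random_germline(genome_length, mutation_rate, rng))
    # Tumor: the same germline profile plus the somatic short variants
    tumor = normal | set(somatic)
    return normal, tumor

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, illustration only
    real = {("chrSim", p) for p in range(0, 1000, 10)}   # toy "real" germline
    som = {("chrSim", 5), ("chrSim", 15)}                # toy somatic variants
    normal, tumor = build_profiles(real, som, 10_000, rng)
    print(len(normal), len(tumor))
```

In this sketch the tumor profile is always a superset of the normal profile, which mirrors the statement that the somatic variants are added on top of the germline profile used for the matched normal sample. Copy number and structural variation are deliberately absent, as in the dataset.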