SYNTHETIC Norwegian Colorectal Cancer genomic dataset generated in EOSC4Cancer

SYNTHETIC This dataset contains synthetic whole-genome sequencing (WGS) data for 10 tumor-normal pairs of colorectal cancer, simulated as standard Illumina paired-end reads. The NEAT read simulator (version 3.0, https://github.com/zstephens/neat-genreads) was used to synthesize the 10 tumor-normal pairs. During data generation, the simulation parameters (i.e., sequencing error statistics, read fragment length distribution, and GC% coverage bias) were taken from the data models provided with NEAT. The target average sequencing depth is approximately 110X for tumor samples and 60X for normal samples.
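
As a rough illustration of what these depth targets imply, the sketch below estimates the number of read pairs needed per sample. The genome size and read length are assumptions, not values stated in this description, and the function name is hypothetical.

```python
def read_pairs_for_depth(depth, genome_size, read_length):
    """Estimate the read pairs needed so that total bases ~ depth x genome_size."""
    # Each pair contributes two reads of read_length bases.
    return round(depth * genome_size / (2 * read_length))


GENOME_SIZE = 3_100_000_000  # assumption: approximate human genome length in bp
READ_LENGTH = 150            # assumption: read length is not given in the description

tumor_pairs = read_pairs_for_depth(110, GENOME_SIZE, READ_LENGTH)
normal_pairs = read_pairs_for_depth(60, GENOME_SIZE, READ_LENGTH)
print(f"tumor:  ~{tumor_pairs:,} read pairs")
print(f"normal: ~{normal_pairs:,} read pairs")
```

Under these assumptions, each tumor sample needs roughly 1.1 billion read pairs and each normal sample about 620 million.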

To generate the synthetic normal WGS data for each sample, the germline variant profile of a real patient is randomly down-sampled to retain 50% of that patient's germline variants. This subset is then combined with an in silico germline variant profile generated randomly at an average mutation rate of 0.001, and together they form the full germline profile for the synthetic normal WGS data.
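
The down-sample-and-mix step can be sketched on a toy example. This is a minimal illustration, not the procedure NEAT or the dataset authors actually ran: the function name, the tuple representation of variants, the toy reference, and the restriction to SNVs are all assumptions made for clarity.

```python
import random

BASES = "ACGT"


def build_germline_profile(real_variants, reference, keep_fraction=0.5,
                           mutation_rate=0.001, seed=0):
    """Keep a random fraction of real variants and add random in silico SNVs.

    Variants are (position, ref, alt) tuples with 0-based positions
    (a hypothetical representation chosen for this sketch).
    """
    rng = random.Random(seed)

    # Step 1: randomly retain keep_fraction of the real germline variants.
    n_keep = round(len(real_variants) * keep_fraction)
    kept = rng.sample(real_variants, n_keep)

    # Step 2: draw in silico SNVs at the average mutation rate,
    # avoiding positions already occupied by a retained variant.
    occupied = {pos for pos, _ref, _alt in kept}
    candidates = [p for p in range(len(reference)) if p not in occupied]
    n_random = round(mutation_rate * len(reference))
    simulated = []
    for pos in rng.sample(candidates, n_random):
        ref = reference[pos]
        alt = rng.choice([b for b in BASES if b != ref])
        simulated.append((pos, ref, alt))

    # Step 3: the union is the full germline profile for the normal sample.
    return sorted(kept + simulated)


reference = "ACGT" * 2500  # toy 10 kb reference, an assumption
real_variants = [(p, reference[p], "T" if reference[p] != "T" else "A")
                 for p in range(0, 10000, 500)]  # 20 toy "real" variants
profile = build_germline_profile(real_variants, reference)
print(len(profile), "variants in the combined germline profile")
```

With 20 toy variants and a 10 kb reference, half of the real variants are retained and about 10 random SNVs are added at the 0.001 rate.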

To generate the synthetic tumor WGS data for each sample, a pre-defined somatic short variant profile (SNVs and indels) derived from a real colorectal cancer (CRC) patient is added to the germline variant profile used to create the synthetic normal WGS data of the same patient; the combined profile is then used to produce the simulated reads. Neither a copy number profile nor a structural variation profile is introduced into the synthetic tumor WGS data. Tumor content and ploidy are assumed to be 100% and 2, respectively.
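
A minimal sketch of how the tumor profile relates to the normal one, and of what the purity and ploidy assumptions imply for allele fractions. The function names and toy variants are hypothetical, and the assumption that a somatic variant sits on one of the two copies (heterozygous) is ours, not stated in the description.

```python
def build_tumor_profile(germline, somatic):
    """Union of germline and somatic (position, ref, alt) variants, sorted."""
    return sorted(set(germline) | set(somatic))


def expected_vaf(mutated_copies, purity=1.0, tumor_ploidy=2, normal_ploidy=2):
    """Expected variant allele fraction of a somatic variant.

    Standard purity/ploidy relation: mutated alleles from tumor cells
    divided by all alleles contributed by tumor and normal cells.
    """
    tumor_alleles = purity * tumor_ploidy
    normal_alleles = (1.0 - purity) * normal_ploidy
    return purity * mutated_copies / (tumor_alleles + normal_alleles)


# Toy profiles (hypothetical values, for illustration only).
germline = [(100, "A", "G"), (250, "C", "T")]
somatic = [(180, "G", "A"), (300, "T", "TA")]

tumor_profile = build_tumor_profile(germline, somatic)
het_vaf = expected_vaf(mutated_copies=1)  # purity 100%, ploidy 2
print(tumor_profile)
print(het_vaf)
```

With 100% tumor content and ploidy 2, a somatic variant present on one copy is expected at an allele fraction of 0.5, before sequencing noise.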

Data and resources

This dataset has no data

Additional information

Field Value
Title SYNTHETIC Norwegian Colorectal Cancer genomic dataset generated in EOSC4Cancer
Description

Keywords
Contact points
Contact point 1
URI
Name
Name (translations)
Email
senz@ifi.uio.no
Identifier
https://www.researchgate.net/profile/Sen-Zhao-7
Publisher
Publisher 1
URI
Name
Name (translations)
Email
URL
Type
Identifier
https://www.uio.no/english/index.html
Creator
Creator 1
URI
Name
Name (translations)
Email
URL
Type
Identifier
Landing page
Release date 2024-04-30T00:00:00+00:00
Modification date
Temporal start date
Temporal end date
In Series
    Version
    Version notes
    Identifier GDI-NO-D-T0001
    Frequency
    Provenance
    Type
    Temporal coverage
    Temporal resolution
    Spatial coverage
    Spatial resolution in meters
    Access rights
    Other identifier
    Theme
    1. https://en.wikipedia.org/wiki/Colorectal_cancer
    Language
    Documentation
    Conforms to
    Is referenced by
    Analytics
    Applicable legislation
    Has version
    Code values
    Coding system
    Purpose
    Health category
    Health theme
    Legal basis
    Minimum typical age
    Maximum typical age
    Number of records
    Number of records for unique individuals.
    Personal data
    Publisher note
    Publisher type
    Trusted Data Holder
    Population coverage
    Retention period
    Health data access body
    Qualified relation
    Provenance activity
    Qualified attribution
    Quality annotations
    URI http://gdi-norway.onemilliongenomes.eu/dataset/GDI-NO-D-T0001