SYNTHETIC Norwegian Colorectal Cancer genomic dataset generated in EOSC4Cancer

SYNTHETIC This dataset contains 10 tumor-normal pairs of synthetic colorectal cancer whole-genome sequencing (WGS) data, simulated in the standard Illumina paired-end read format. The NEAT read simulator (version 3.0, https://github.com/zstephens/neat-genreads) is used to synthesize all 10 tumor-normal pairs. During data generation, the simulation parameters (sequencing error statistics, read fragment length distribution and GC% coverage bias) are derived from the data models provided by NEAT. The target average sequencing depth is about 110X for tumor samples and 60X for normal samples.
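To relate the target depths to the volume of simulated reads, the standard coverage relation (Lander-Waterman) gives the number of read pairs N needed for a mean depth C over a genome of size G with reads of length L as N = C·G/(2L). The sketch below is illustrative only: the description does not state the read length or the reference genome size, so the 150 bp reads and the 3.1 Gb genome used here are assumptions, not parameters of this dataset.

```python
def read_pairs_for_depth(depth: float, genome_size: int, read_length: int) -> int:
    """Approximate number of read pairs needed for a target mean depth.

    Uses depth = N * 2 * L / G (each pair contributes two reads of length L),
    solved for N. Ignores unmappable regions and coverage bias.
    """
    if depth <= 0 or genome_size <= 0 or read_length <= 0:
        raise ValueError("depth, genome_size and read_length must be positive")
    return round(depth * genome_size / (2 * read_length))


# Assumed values for illustration only; not taken from the dataset description.
GENOME_SIZE = 3_100_000_000  # approximate human reference length (assumption)
READ_LENGTH = 150            # hypothetical read length in bp

tumor_pairs = read_pairs_for_depth(110, GENOME_SIZE, READ_LENGTH)
normal_pairs = read_pairs_for_depth(60, GENOME_SIZE, READ_LENGTH)
print(f"tumor:  {tumor_pairs:,} read pairs")
print(f"normal: {normal_pairs:,} read pairs")
```

With those assumed values, the 110X tumor target corresponds to roughly 1.14 billion read pairs and the 60X normal target to about 620 million, so each tumor sample carries a little under twice the reads of its matched normal.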

To generate the synthetic normal WGS data for each sample, the germline variant profile of a real patient is randomly down-sampled so that it retains 50% of that patient's germline variants. This subset is then combined with an in silico germline variant profile generated at random with an average mutation rate of 0.001, and the two together form the complete germline profile used for the normal synthetic WGS data.
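The germline construction described in the previous paragraph can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: variants are plain (chrom, pos, ref, alt) records, genotypes (heterozygous vs homozygous) are ignored, and the in silico variants are uniform random SNVs drawn independently at each base with the stated rate of 0.001. NEAT's own random-mutation model is more detailed (for example, it also introduces small indels). The function names, the seed and the toy reference are hypothetical.

```python
import random
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based position
    ref: str
    alt: str


def downsample(variants: list[Variant], fraction: float, rng: random.Random) -> list[Variant]:
    """Keep a random subset containing `fraction` of the input variants."""
    k = round(len(variants) * fraction)
    return sorted(rng.sample(variants, k))


def random_snvs(reference: dict[str, str], rate: float, rng: random.Random,
                exclude: set[tuple[str, int]]) -> list[Variant]:
    """Draw uniform random SNVs at `rate` per base, skipping sites in `exclude`."""
    out = []
    for chrom, seq in reference.items():
        for pos, base in enumerate(seq, start=1):
            if (chrom, pos) in exclude or base not in "ACGT":
                continue
            if rng.random() < rate:
                alt = rng.choice([b for b in "ACGT" if b != base])
                out.append(Variant(chrom, pos, base, alt))
    return out


def build_germline_profile(real: list[Variant], reference: dict[str, str],
                           fraction: float = 0.5, rate: float = 0.001,
                           seed: int = 0) -> list[Variant]:
    """Down-sample real germline variants, then add random in silico SNVs."""
    rng = random.Random(seed)
    kept = downsample(real, fraction, rng)
    occupied = {(v.chrom, v.pos) for v in kept}
    synthetic = random_snvs(reference, rate, rng, occupied)
    return sorted(kept + synthetic)


# Toy example: a 10 kb reference and ten "real" variants at positions whose
# reference base is A (hypothetical data, for illustration only).
reference = {"chr1": "ACGT" * 2500}
real = [Variant("chr1", p, "A", "G") for p in range(1, 41, 4)]
profile = build_germline_profile(real, reference)
print(len(profile), "variants in the combined germline profile")
```

At a rate of 0.001 per base, the toy 10 kb reference receives about ten random SNVs on top of the five retained real variants; a seeded `random.Random` keeps the draw reproducible.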

To generate the synthetic tumor WGS data for each sample, a predefined somatic short-variant profile (SNVs and indels), learned from a real colorectal cancer (CRC) patient, is added to the germline variant profile used for the normal synthetic WGS data of the same patient, and the combined profile is used to simulate the tumor reads. No copy number profile or structural variation profile is introduced into the tumor synthetic WGS data. Tumor content (purity) is assumed to be 100% and ploidy to be 2.
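Because the tumor data are simulated without copy number changes or structural variants, at 100% tumor content and ploidy 2, the expected variant allele fraction (VAF) of each variant follows directly from its copy count. The sketch below uses the standard purity/ploidy relation VAF = p·m / (p·P_t + (1 - p)·P_n), where p is tumor purity, m the number of tumor copies carrying the variant, P_t the tumor ploidy and P_n the normal ploidy (2). It also shows a hypothetical merge step that rejects somatic variants placed on germline sites; that check is an illustrative choice, not something the description specifies.

```python
Variant = tuple[str, int, str, str]  # (chrom, pos, ref, alt); hypothetical record type


def merge_profiles(germline: list[Variant], somatic: list[Variant]) -> list[Variant]:
    """Combine germline and somatic variants into one sorted tumor profile.

    Raises ValueError if a somatic variant falls on a germline site
    (an illustrative safeguard, not part of the published procedure).
    """
    germline_sites = {(chrom, pos) for chrom, pos, _, _ in germline}
    clashes = [v for v in somatic if (v[0], v[1]) in germline_sites]
    if clashes:
        raise ValueError(f"somatic variants overlap germline sites: {clashes}")
    return sorted(germline + somatic)


def expected_vaf(purity: float, tumor_ploidy: int, multiplicity: int,
                 normal_ploidy: int = 2) -> float:
    """Expected VAF of a somatic variant: p*m / (p*P_t + (1 - p)*P_n)."""
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be in [0, 1]")
    if not 0 <= multiplicity <= tumor_ploidy:
        raise ValueError("multiplicity must be between 0 and tumor_ploidy")
    denominator = purity * tumor_ploidy + (1.0 - purity) * normal_ploidy
    return purity * multiplicity / denominator


# Settings stated in the description: 100% tumor content, ploidy 2, ~110X depth.
het_vaf = expected_vaf(purity=1.0, tumor_ploidy=2, multiplicity=1)
print(f"heterozygous somatic VAF: {het_vaf}")
print(f"expected alt reads at 110X: {het_vaf * 110}")
```

With purity fixed at 100% and no copy number changes, heterozygous somatic and heterozygous germline variants share the same expected VAF of 0.5 in the tumor reads, so about 55 of roughly 110 reads at such a site are expected to carry the alternative allele; only their absence from the matched normal distinguishes the somatic ones.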

Data and resources

This dataset has no data resources attached.

Additional Info

Field Value
Title SYNTHETIC Norwegian Colorectal Cancer genomic dataset generated in EOSC4Cancer
Description

Contact points
  Contact point 1
    Name Sen Zhao
    Email senz@ifi.uio.no
    Identifier https://www.researchgate.net/profile/Sen-Zhao-7
Publisher
  Publisher 1
    Name University of Oslo
    Identifier https://www.uio.no/english/index.html
Release date 2024-04-30T00:00:00+00:00
Identifier GDI-NO-D-T0001
Theme https://en.wikipedia.org/wiki/Colorectal_cancer
URI http://gdi-norway.onemilliongenomes.eu/dataset/GDI-NO-D-T0001