
hCNV Parameters and Mappings to Output format

Collected Parameters

Variants data

  • CNV type (DUP, DEL...)
    • translated to EFO terms using CN count values
  • referenceName
    • chromosome translated to a RefSeq ID with prefix
  • start
    • shifted by -1, since coordinates are 0-based instead of VCF's 1-based
  • end
    • from INFO
  • CN count
    • from call field
  • assemblyId
    • from header (GRCh38)
  • ...
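
The conversions listed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual conversion code: the function and class names are hypothetical, the RefSeq lookup is a deliberately partial subset, a diploid baseline of 2 is assumed for every chromosome, and the EFO identifiers for copy number gain and loss should be checked against the current EFO release before use.

```python
from dataclasses import dataclass

# Partial GRCh38 chromosome -> RefSeq accession lookup (illustrative subset only).
GRCH38_REFSEQ = {
    "1": "NC_000001.11",
    "2": "NC_000002.12",
    "X": "NC_000023.11",
}

# Assumed EFO terms for the two coarse CNV classes; verify before relying on them.
EFO_GAIN = ("EFO:0030070", "copy number gain")
EFO_LOSS = ("EFO:0030067", "copy number loss")


@dataclass
class CnvRecord:
    variant_state_id: str
    variant_state_label: str
    reference_name: str
    start: int
    end: int
    cn_count: int
    assembly_id: str


def parse_info(info):
    """Split a VCF INFO string such as 'SVTYPE=CNV;END=20000' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def vcf_cnv_to_record(chrom, pos, info, fmt, call, assembly_id="GRCh38", baseline=2):
    """Build one CNV record from the relevant columns of a VCF line."""
    # CN count comes from the per-sample call field, keyed by the FORMAT column.
    cn = int(dict(zip(fmt.split(":"), call.split(":")))["CN"])
    if cn == baseline:
        raise ValueError("copy number equals the baseline; not a CNV")
    state_id, state_label = EFO_GAIN if cn > baseline else EFO_LOSS
    chrom_key = chrom[3:] if chrom.startswith("chr") else chrom
    return CnvRecord(
        variant_state_id=state_id,
        variant_state_label=state_label,
        reference_name="refseq:" + GRCH38_REFSEQ[chrom_key],
        start=pos - 1,  # VCF POS is 1-based; the output uses 0-based starts
        end=int(parse_info(info)["END"]),  # taken from INFO as-is
        cn_count=cn,
        assembly_id=assembly_id,
    )
```

For example, `vcf_cnv_to_record("chr1", 10001, "SVTYPE=CNV;END=20000", "GT:CN", "0/1:3")` yields a record with `start` 10000, `end` 20000 and the gain term. Sex chromosomes would need a sex-aware baseline, which this sketch leaves out.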

Metadata

  • sample id
  • donor id (different from the sample id?)
  • sequencing platform
  • sequencing library / model (?)
  • sex
  • ethnicity ...
  • geographic provenance
  • external references, e.g. biosamples collection ID (as CURIE) and associated publication(s)
  • ...

Parameter Output Mappings

Output Model

The information below only indicates how these parameters are handled in the Beacon default model and its Progenetix variant. However, the current plan is to represent them directly through Phenopackets, which has many similarities to Beacon v2 but uses a different, unified wrapper model.

For comparison please see the Phenopacket example from Progenetix. The same can be accessed through Progenetix using progenetix.org/beacon/phenopackets/onekgind-HG00320. Please be aware that this example doesn't contain some of the "interesting" parameters like technical provenance or population background.

Beacon v2 Default Model for genomicVariation

Beacon v2 provides a default model with its main data entities individual, biosample, analysis, run and genomicVariation. The parameters needed for an hCNV reference resource potentially map to all of those entities; e.g.

  • donor sex => individual.sex
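
To make the idea of routing collected metadata to different entities concrete, here is a minimal sketch. Only `donor sex => individual.sex` comes from the text above; every other target in the table is an assumption that should be checked against the Beacon v2 default schemas, and the input field names and the function name are hypothetical.

```python
# Hypothetical routing table: collected metadata field -> (entity, attribute).
# Only the "sex" entry is taken from the text; the rest are assumptions.
METADATA_TARGETS = {
    "sample_id": ("biosample", "id"),
    "donor_id": ("individual", "id"),
    "sex": ("individual", "sex"),
    "ethnicity": ("individual", "ethnicity"),
    "geographic_provenance": ("individual", "geographicOrigin"),
    "sequencing_platform": ("run", "platform"),
    "sequencing_model": ("run", "platformModel"),
}


def route_metadata(flat):
    """Group flat metadata values into per-entity dictionaries."""
    entities = {}
    for field, value in flat.items():
        # Skip fields without a known target and fields without a value.
        if field not in METADATA_TARGETS or value in (None, ""):
            continue
        entity, attribute = METADATA_TARGETS[field]
        entities.setdefault(entity, {})[attribute] = value
    return entities
```

Values are passed through as plain strings here. In a real Beacon document, attributes such as sex are ontology terms, so an additional translation step would be needed.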

Progenetix bycon parameter mappings

The Progenetix implementation of the Beacon API, through the bycon stack, closely adheres internally to the Beacon v2 default model. Specifically, records follow document formats that are described in JSON Schema and correspond overall to the standard Beacon models. They are stored in a MongoDB database with per-schema collections (individuals, biosamples, callsets and variants, as well as helper collections for e.g. ontologies or genome lookups).

For data I/O the bycon package contains a mapping file that maps data from columnar (i.e. tab-delimited) input files to the corresponding attributes in the document schemas.

Example bycon parameter mappings

  genomicVariant:
    type: object
    parameters:
      variant_id:
        db_key: id
        indexed: True
        compact: True
        computed: True
      variant_internal_id:
        type: string
        db_key: variant_internal_id
        indexed: True
        compact: True
        computed: True
      callset_id:
        description: |
            The bycon model uses `callset` to store
            information corresponding to Beacon's `analysis`
            and `run` entities.
        type: string
        db_key: callset_id
        indexed: True
        compact: True
      biosample_id:
        type: string
        db_key: biosample_id
        indexed: True
        compact: True
      individual_id:
        type: string
        db_key: individual_id
        indexed: True
        compact: True
      sequence_id:
        type: string
        db_key: location.sequence_id
        indexed: True
        compact: True
      reference_name:
        type: string
        db_key: location.chromosome
        indexed: True
        compact: True
      start:
        type: integer
        db_key: location.start
        indexed: True
        compact: True
      end:
        type: integer
        db_key: location.end
        indexed: True
        compact: True
      variant_state_id:
        type: string
        db_key: variant_state.id
        indexed: True
        compact: True
      variant_state_label:
        type: string
        db_key: variant_state.label
        compact: True
      reference_bases:
        type: string
        db_key: reference_sequence
        indexed: True
        compact: True
      alternate_bases:
        type: string
        db_key: sequence
        indexed: True
        compact: True
      annotation_derived:
        type: boolean
        db_key: info.annotation_derived
        default: False
        indexed: True
      aminoacid_changes:
        type: array
        items: string
        db_key: molecular_attributes.aminoacid_changes
        indexed: True
      genomic_hgvs_id:
        type: string
        db_key: identifiers.genomicHGVS_id
        indexed: True

      # special pgxseg columns

      log2:
        db_key: info.cnv_value
        type: number
      variant_type:
        type: string
        db_key: variant_type
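
As a rough illustration of how such a mapping could be applied to a tab-delimited file, the sketch below casts each column by its `type` and writes it to the nested location named by `db_key`. It is not bycon's own import code: the function names are hypothetical, the mapping is a hand-copied subset written as a Python dict to avoid a YAML dependency, and the treatment of empty cells and defaults is an assumption.

```python
import csv
import io

# Hand-copied subset of the mapping above, as a plain Python dict.
MAPPING = {
    "biosample_id": {"type": "string", "db_key": "biosample_id"},
    "reference_name": {"type": "string", "db_key": "location.chromosome"},
    "start": {"type": "integer", "db_key": "location.start"},
    "end": {"type": "integer", "db_key": "location.end"},
    "variant_state_id": {"type": "string", "db_key": "variant_state.id"},
    "annotation_derived": {
        "type": "boolean",
        "db_key": "info.annotation_derived",
        "default": False,
    },
    "log2": {"type": "number", "db_key": "info.cnv_value"},
}

# Converters from the cell text to the declared type.
CASTS = {
    "string": str,
    "integer": int,
    "number": float,
    "boolean": lambda v: str(v).lower() in ("true", "1", "yes"),
}


def set_nested(doc, dotted_key, value):
    """Set doc['a']['b'] = value for a dotted key such as 'a.b'."""
    keys = dotted_key.split(".")
    for key in keys[:-1]:
        doc = doc.setdefault(key, {})
    doc[keys[-1]] = value


def row_to_document(row, mapping=MAPPING):
    """Convert one columnar row (dict of strings) into a nested document."""
    doc = {}
    for column, spec in mapping.items():
        raw = row.get(column)
        if raw in (None, ""):
            # Missing or empty cell: fall back to the default, if one is declared.
            if "default" in spec:
                set_nested(doc, spec["db_key"], spec["default"])
            continue
        cast = CASTS.get(spec.get("type", "string"), str)
        set_nested(doc, spec["db_key"], cast(raw))
    return doc


def read_documents(tsv_text):
    """Parse tab-delimited text with a header line into a list of documents."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row_to_document(row) for row in reader]
```

A row with `reference_name`, `start` and `end` columns ends up as one nested `location` object, which is the shape implied by db_key values such as `location.start`.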