Grande et al. MAF. — grande

A MAF (data frame) drawn from the Grande et al. dataset.

grande_maf

Format

`grande_maf`

A MAF in data frame format. 12251 rows and 125 columns.

Hugo_Symbol: HUGO symbol for the gene (HUGO symbols are always in all caps). "Unknown" is used for regions that do not correspond to a gene
Entrez_Gene_ID: Entrez gene ID (an integer). "0" is used for regions that do not correspond to a gene region or Ensembl ID
Center: One or more genome sequencing centers reporting the variant
NCBI_Build: The reference genome used for the alignment
Chromosome: The affected chromosome
Start_Position: Lowest numeric position of the reported variant on the genomic reference sequence. Mutation start coordinate
End_Position: Highest numeric genomic position of the reported variant on the genomic reference sequence. Mutation end coordinate
Strand: Genomic strand of the reported allele. Currently, all variants will report the positive strand: '+'
Variant_Classification: Translational effect of variant allele
Variant_Type: Type of mutation (SNP, DNP, TNP, ONP, INS, DEL, or Consolidated). TNP (tri-nucleotide polymorphism) is analogous to DNP (di-nucleotide polymorphism) but for three consecutive nucleotides. ONP (oligo-nucleotide polymorphism) is analogous to TNP but for consecutive runs of four or more nucleotides
Reference_Allele: The plus strand reference allele at this position. Includes the deleted sequence for a deletion or "-" for an insertion
Tumor_Seq_Allele1: Primary data genotype for tumor sequencing (discovery) allele 1. A "-" symbol for a deletion represents a variant. A "-" symbol for an insertion represents wild-type allele. Novel inserted sequence for insertion does not include flanking reference bases
Tumor_Seq_Allele2: Tumor sequencing (discovery) allele 2
dbSNP_RS: The rs-IDs from the dbSNP database, "novel" if not found in any database used, or null if there is no dbSNP record, but it is found in other databases
dbSNP_Val_Status: The dbSNP validation status is reported as a semicolon-separated list of statuses. The union of all rs-IDs is taken when there are multiple
Tumor_Sample_Barcode: Aliquot barcode for the tumor sample
Matched_Norm_Sample_Barcode: Aliquot barcode for the matched normal sample
Match_Norm_Seq_Allele1: Primary data genotype. Matched normal sequencing allele 1. A "-" symbol for a deletion represents a variant. A "-" symbol for an insertion represents wild-type allele. Novel inserted sequence for insertion does not include flanking reference bases (cleared in somatic MAF)
Match_Norm_Seq_Allele2: Matched normal sequencing allele 2
Tumor_Validation_Allele1: Secondary data from orthogonal technology. Tumor genotyping (validation) for allele 1. A "-" symbol for a deletion represents a variant. A "-" symbol for an insertion represents wild-type allele. Novel inserted sequence for insertion does not include flanking reference bases
Tumor_Validation_Allele2: Secondary data from orthogonal technology. Tumor genotyping (validation) for allele 2
Verification_Status: Second pass results from independent attempt using same methods as primary data source. Generally reserved for 3730 Sanger Sequencing
Validation_Status: Second pass results from orthogonal technology
Mutation_Status: An assessment of the mutation as somatic, germline, LOH, post transcriptional modification, unknown, or none. The values allowed in this field are constrained by the value in the Validation_Status field
Sequencing_Phase: TCGA sequencing phase (if applicable). Phase should change under any circumstance that the targets under consideration change
Sequence_Source: Molecular assay type used to produce the analytes used for sequencing. Allowed values are a subset of the SRA 1.5 library_strategy field values. This subset matches those used at CGHub
Validation_Method: The assay platforms used for the validation call
Score: Boolean variable
BAM_File: Boolean column stating if BAM file exists or not
Sequencer: Instrument used to produce primary sequence data
Tumor_Sample_UUID: GDC aliquot UUID for tumor sample
Matched_Norm_Sample_UUID: GDC aliquot UUID for matched normal sample
HGVSc: The coding sequence of the variant in HGVS recommended format
HGVSp: The protein sequence of the variant in HGVS recommended format. "p.=" signifies no change in the protein
HGVSp_Short: Same as the HGVSp column, but using 1-letter amino-acid codes
Transcript_ID: Ensembl ID of the transcript affected by the variant
Exon_Number: The exon number (out of total number)
t_depth: Read depth across this locus in tumor BAM
t_ref_count: Read depth supporting the reference allele in tumor BAM
t_alt_count: Read depth supporting the variant allele in tumor BAM
n_depth: Read depth across this locus in normal BAM
n_ref_count: Read depth supporting the reference allele in normal BAM (cleared in somatic MAF)
n_alt_count: Read depth supporting the variant allele in normal BAM (cleared in somatic MAF)
all_effects: A semicolon-delimited list of all possible variant effects, sorted by priority
Allele: The variant allele used to calculate the consequence
Gene: Stable Ensembl ID of affected gene
Feature: Stable Ensembl ID of feature (transcript, regulatory, motif)
Feature_type: Type of feature. Currently one of Transcript, RegulatoryFeature, MotifFeature (or blank)
Consequence: Consequence type of this variant; sequence ontology terms
cDNA_position: Relative position of base pair in the cDNA sequence as a fraction. A "-" symbol is displayed as the numerator if the variant does not appear in cDNA
CDS_position: Relative position of base pair in coding sequence. A "-" symbol is displayed as the numerator if the variant does not appear in coding sequence
Protein_position: Relative position of affected amino acid in protein. A "-" symbol is displayed as the numerator if the variant does not appear in coding sequence
Amino_acids: Only given if the variation affects the protein-coding sequence
Codons: The alternative codons with the variant base in upper case
Existing_variation: Known identifier of existing variation
ALLELE_NUM: Allele number from input; 0 is the reference, 1 is the first alternate, etc.
DISTANCE: Shortest distance from the variant to transcript
STRAND_VEP: The DNA strand (1 or -1) on which the transcript/feature lies
SYMBOL: The gene symbol
SYMBOL_SOURCE: The source of the gene symbol
HGNC_ID: Gene identifier from the HUGO Gene Nomenclature Committee if applicable
BIOTYPE: Biotype of transcript
CANONICAL: A flag (YES) indicating that the VEP-based canonical transcript, the longest translation, was used for this gene. If not, the value is null
CCDS: The CCDS identifier for this transcript, where applicable
ENSP: The Ensembl protein identifier of the affected transcript
SWISSPROT: UniProtKB/Swiss-Prot accession
TREMBL: UniProtKB/TrEMBL identifier of protein product
UNIPARC: UniParc identifier of protein product
RefSeq: RefSeq identifier for this transcript
SIFT: The SIFT prediction and/or score, with both given as prediction (score)
PolyPhen: The PolyPhen prediction and/or score
EXON: The exon number (out of total number)
INTRON: The intron number (out of total number)
DOMAINS: The source and identifier of any overlapping protein domains
GMAF: Non-reference allele and frequency of existing variant in 1000 Genomes
GMAF_Allele: Non-reference allele and frequency of existing variant in 1000 Genomes
GMAF_AF: Non-reference allele and frequency of existing variant in 1000 Genomes
AFR_MAF: Non-reference allele and frequency of existing variant in 1000 Genomes combined African population
AMR_MAF: Non-reference allele and frequency of existing variant in 1000 Genomes combined American population
ASN_MAF: Non-reference allele and frequency of existing variant in 1000 Genomes combined Asian population
EAS_MAF: Non-reference allele and frequency of existing variant in 1000 Genomes combined East Asian population
EUR_MAF: Non-reference allele and frequency of existing variant in 1000 Genomes combined European population
SAS_MAF: Non-reference allele and frequency of existing variant in 1000 Genomes combined South Asian population
AA_MAF: Non-reference allele and frequency of existing variant in NHLBI-ESP African American population
EA_MAF: Non-reference allele and frequency of existing variant in NHLBI-ESP European American population
CLIN_SIG: Clinical significance of variant from dbSNP as annotated in ClinVar
SOMATIC: Somatic status of each ID reported under Existing_variation (0, 1, or null)
PUBMED: PubMed ID(s) of publications that cite the existing variant
MOTIF_NAME: The source and identifier of a transcription factor binding profile aligned at this position
MOTIF_POS: The relative position of the variation in the aligned TFBP
HIGH_INF_POS: A flag indicating if the variant falls in a high information position of a transcription factor binding profile (TFBP) (Y, N, or null)
MOTIF_SCORE_CHANGE: The difference in motif score of the reference and variant sequences for the TFBP
IMPACT: The impact modifier for the consequence type
PICK: Indicates if this block of consequence data was picked by VEP's pick feature (1 or null)
VARIANT_CLASS: Sequence Ontology variant class
TSL: Transcript support level, which is based on independent RNA analyses
HGVS_OFFSET: Indicates by how many bases the HGVS notations for this variant have been shifted
PHENO: Indicates if existing variant is associated with a phenotype, disease or trait (0, 1, or null)
MINIMISED: Alleles in this variant have been converted to minimal representation before consequence calculation (1 or null)
ExAC_AF: Global Allele Frequency from ExAC
ExAC_AF_AFR: African/African American Allele Frequency from ExAC
ExAC_AF_AMR: American Allele Frequency from ExAC
ExAC_AF_EAS: East Asian Allele Frequency from ExAC
ExAC_AF_FIN: Finnish Allele Frequency from ExAC
ExAC_AF_NFE: Non-Finnish European Allele Frequency from ExAC
ExAC_AF_OTH: Other Allele Frequency from ExAC
ExAC_AF_SAS: South Asian Allele Frequency from ExAC
GENE_PHENO: Indicates if gene that the variant maps to is associated with a phenotype, disease or trait (0, 1, or null)
FILTER: Copied from input VCF. This includes filters implemented directly by the variant caller and other external software used in the DNA-Seq pipeline
flanking_bps: The flanking basepairs
variant_id: Variant ID
variant_qual: Variant quality
ExAC_AF_Adj: Adjusted Global Allele Frequency from ExAC
ExAC_AC_AN_Adj: Adjusted Global Allele Frequency from ExAC
ExAC_AC_AN: Global Allele Frequency from ExAC
ExAC_AC_AN_AFR: African/African American Allele Frequency from ExAC
ExAC_AC_AN_AMR: American Allele Frequency from ExAC
ExAC_AC_AN_EAS: East Asian Allele Frequency from ExAC
ExAC_AC_AN_FIN: Finnish Allele Frequency from ExAC
ExAC_AC_AN_NFE: Non-Finnish European Allele Frequency from ExAC
ExAC_AC_AN_OTH: Other Allele Frequency from ExAC
ExAC_AC_AN_SAS: South Asian Allele Frequency from ExAC
ExAC_FILTER: Filter
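
Several of the fields above pack more than one value into a string, or are usually combined with each other. The sketch below is only an illustration, written in Python with the standard library and assuming each row has been exported from the data frame as a dictionary keyed by column name. The helper names (`tumor_vaf`, `parse_position`, `parse_prediction`) are hypothetical and not part of any package. The variant allele fraction is not a column of `grande_maf`; it is a commonly derived quantity, computed here as `t_alt_count / t_depth`. The exact string layouts (for example `123/4567`, `-/4567`, or `deleterious(0.01)`) are assumptions based on the field descriptions and should be checked against the actual data.

```python
import re


def tumor_vaf(row):
    """Variant allele fraction in the tumor: t_alt_count / t_depth.

    Returns None when either count is missing, non-numeric, or depth is zero.
    """
    try:
        depth = int(row["t_depth"])
        alt = int(row["t_alt_count"])
    except (KeyError, TypeError, ValueError):
        return None
    return alt / depth if depth > 0 else None


def parse_position(value):
    """Split a cDNA_position / CDS_position / Protein_position string.

    Assumed layout: "<position or range>/<total length>", with "-" as the
    numerator when the variant does not fall in that sequence.
    Returns (start, end, total); unknown parts are None.
    """
    if not value:
        return None
    numerator, _, total = value.partition("/")
    total_n = int(total) if total.isdigit() else None
    if numerator in ("", "-"):
        return (None, None, total_n)
    start, _, end = numerator.partition("-")
    start_n = int(start) if start.isdigit() else None
    end_n = int(end) if end.isdigit() else start_n
    return (start_n, end_n, total_n)


# Optional label, then an optional score with or without parentheses.
_PREDICTION = re.compile(
    r"\s*([A-Za-z_]+)?\s*(?:\(?\s*([0-9]*\.?[0-9]+)\s*\)?)?\s*"
)


def parse_prediction(value):
    """Split a SIFT or PolyPhen value into (prediction, score).

    Either part may be absent, since the fields hold the prediction
    and/or the score.
    """
    if value is None:
        return (None, None)
    match = _PREDICTION.fullmatch(value)
    if match is None:
        return (None, None)
    label, score = match.group(1), match.group(2)
    return (label, float(score) if score is not None else None)


if __name__ == "__main__":
    # Made-up example row, not taken from grande_maf.
    row = {"t_depth": "40", "t_alt_count": "10",
           "cDNA_position": "123/4567", "SIFT": "deleterious(0.01)"}
    print(tumor_vaf(row))
    print(parse_position(row["cDNA_position"]))
    print(parse_prediction(row["SIFT"]))
```

Each helper returns `None` (or a tuple containing `None`) instead of raising an error on missing or unexpected values, because many annotation columns are blank for variants that do not overlap a transcript.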