A MAF (data frame) drawn from the Grande et al. dataset.

grande_maf

Format

grande_maf

A MAF in data frame format. 12251 rows and 125 columns.

Hugo_Symbol

HUGO symbol for the gene (HUGO symbols are always in all caps). "Unknown" is used for regions that do not correspond to a gene

Entrez_Gene_ID

Entrez gene ID (an integer). "0" is used for regions that do not correspond to a gene region or Ensembl ID

Center

One or more genome sequencing centers reporting the variant

NCBI_Build

The reference genome used for the alignment

Chromosome

The affected chromosome

Start_Position

Lowest numeric position of the reported variant on the genomic reference sequence. Mutation start coordinate

End_Position

Highest numeric genomic position of the reported variant on the genomic reference sequence. Mutation end coordinate

Strand

Genomic strand of the reported allele. Currently, all variants will report the positive strand: '+'

Variant_Classification

Translational effect of variant allele

Variant_Type

Type of mutation: one of SNP, DNP, TNP, ONP, INS, DEL, or Consolidated. TNP (tri-nucleotide polymorphism) is analogous to DNP (di-nucleotide polymorphism) but for three consecutive nucleotides. ONP (oligo-nucleotide polymorphism) is analogous to TNP but for runs of four or more consecutive nucleotides

Reference_Allele

The plus strand reference allele at this position. Includes the deleted sequence for a deletion or "-" for an insertion

Tumor_Seq_Allele1

Primary data genotype for tumor sequencing (discovery) allele 1. A "-" symbol for a deletion represents a variant. A "-" symbol for an insertion represents wild-type allele. Novel inserted sequence for insertion does not include flanking reference bases

Tumor_Seq_Allele2

Tumor sequencing (discovery) allele 2
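The allele conventions above (a "-" for the missing side of an insertion or deletion, and SNP/DNP/TNP/ONP named by the number of consecutive bases) can be illustrated with a small sketch. The function name `infer_variant_type` is hypothetical, and the sketch assumes that substitutions carry reference and variant alleles of equal length; it does not attempt to handle "Consolidated" records.

```python
def infer_variant_type(reference_allele, tumor_allele):
    """Guess a Variant_Type label from a reference/variant allele pair.

    Assumption: "-" marks the missing side of an insertion or deletion,
    and substitutions have alleles of equal length. Anything else
    returns None rather than guessing.
    """
    if not reference_allele or not tumor_allele:
        return None
    if reference_allele == "-" and tumor_allele != "-":
        return "INS"
    if tumor_allele == "-" and reference_allele != "-":
        return "DEL"
    if reference_allele == "-" or len(reference_allele) != len(tumor_allele):
        return None
    # 1, 2 and 3 consecutive bases have their own names; 4 or more is ONP.
    return {1: "SNP", 2: "DNP", 3: "TNP"}.get(len(reference_allele), "ONP")
```

Comparing the result with the stored Variant_Type value may be a quick way to spot rows whose alleles do not follow the expected conventions.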

dbSNP_RS

The rs-IDs from the dbSNP database, "novel" if not found in any database used, or null if there is no dbSNP record, but it is found in other databases

dbSNP_Val_Status

The dbSNP validation status is reported as a semicolon-separated list of statuses. The union of all rs-IDs is taken when there are multiple

Tumor_Sample_Barcode

Aliquot barcode for the tumor sample

Matched_Norm_Sample_Barcode

Aliquot barcode for the matched normal sample

Match_Norm_Seq_Allele1

Primary data genotype. Matched normal sequencing allele 1. A "-" symbol for a deletion represents a variant. A "-" symbol for an insertion represents wild-type allele. Novel inserted sequence for insertion does not include flanking reference bases (cleared in somatic MAF)

Match_Norm_Seq_Allele2

Matched normal sequencing allele 2

Tumor_Validation_Allele1

Secondary data from orthogonal technology. Tumor genotyping (validation) for allele 1. A "-" symbol for a deletion represents a variant. A "-" symbol for an insertion represents wild-type allele. Novel inserted sequence for insertion does not include flanking reference bases

Tumor_Validation_Allele2

Secondary data from orthogonal technology. Tumor genotyping (validation) for allele 2

Verification_Status

Second pass results from independent attempt using same methods as primary data source. Generally reserved for 3730 Sanger Sequencing

Validation_Status

Second pass results from orthogonal technology

Mutation_Status

An assessment of the mutation as somatic, germline, LOH, post transcriptional modification, unknown, or none. The values allowed in this field are constrained by the value in the Validation_Status field

Sequencing_Phase

TCGA sequencing phase (if applicable). Phase should change under any circumstance that the targets under consideration change

Sequence_Source

Molecular assay type used to produce the analytes used for sequencing. Allowed values are a subset of the SRA 1.5 library_strategy field values. This subset matches those used at CGHub

Validation_Method

The assay platforms used for the validation call

Score

Boolean variable

BAM_File

Boolean column stating if BAM file exists or not

Sequencer

Instrument used to produce primary sequence data

Tumor_Sample_UUID

GDC aliquot UUID for tumor sample

Matched_Norm_Sample_UUID

GDC aliquot UUID for matched normal sample

HGVSc

The coding sequence of the variant in HGVS recommended format

HGVSp

The protein sequence of the variant in HGVS recommended format. "p.=" signifies no change in the protein

HGVSp_Short

Same as the HGVSp column, but using 1-letter amino-acid codes
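As an illustration of how the 1-letter form relates to the HGVSp value, the following sketch rewrites 3-letter amino-acid codes as 1-letter codes. The helper name `shorten_hgvsp` is hypothetical, the mapping covers the 20 standard amino acids plus `Ter` (written as `*`), and other codes are left unchanged; this is not necessarily how the column was produced.

```python
import re

# Standard 3-letter to 1-letter amino-acid codes, plus Ter for a stop codon.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Ter": "*",
}

# Matching is case-sensitive, so lowercase tokens such as "fs" or "del"
# are never mistaken for amino-acid codes.
_CODE = re.compile("|".join(THREE_TO_ONE))


def shorten_hgvsp(hgvsp):
    """Replace every 3-letter amino-acid code in an HGVSp string."""
    return _CODE.sub(lambda match: THREE_TO_ONE[match.group(0)], hgvsp)
```

For example, `shorten_hgvsp("p.Val600Glu")` gives `"p.V600E"`, while `"p.="` passes through unchanged.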

Transcript_ID

Ensembl ID of the transcript affected by the variant

Exon_Number

The exon number (out of total number)

t_depth

Read depth across this locus in tumor BAM

t_ref_count

Read depth supporting the reference allele in tumor BAM

t_alt_count

Read depth supporting the variant allele in tumor BAM

n_depth

Read depth across this locus in normal BAM

n_ref_count

Read depth supporting the reference allele in normal BAM (cleared in somatic MAF)

n_alt_count

Read depth supporting the variant allele in normal BAM (cleared in somatic MAF)
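The read-depth columns are commonly combined into a variant allele fraction (VAF). The sketch below assumes the VAF is defined as the variant-supporting depth divided by the total depth at the locus; the function name `variant_allele_fraction` is hypothetical and is not part of the dataset.

```python
def variant_allele_fraction(alt_count, depth):
    """Return alt_count / depth, or None when the fraction is undefined.

    Assumption: alt_count plays the role of t_alt_count (or n_alt_count)
    and depth the role of t_depth (or n_depth). Missing or zero depth
    yields None instead of raising.
    """
    if alt_count is None or depth is None or depth <= 0:
        return None
    return alt_count / depth
```

Total depth is used as the denominator rather than the sum of reference and variant counts, because reads supporting neither allele may also contribute to the depth at a locus.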

all_effects

A semicolon delimited list of all possible variant effects, sorted by priority

Allele

The variant allele used to calculate the consequence

Gene

Stable Ensembl ID of affected gene

Feature

Stable Ensembl ID of feature (transcript, regulatory, motif)

Feature_type

Type of feature. Currently one of Transcript, RegulatoryFeature, MotifFeature (or blank)

Consequence

Consequence type of this variant; sequence ontology terms

cDNA_position

Relative position of base pair in the cDNA sequence as a fraction. A "-" symbol is displayed as the numerator if the variant does not appear in cDNA

CDS_position

Relative position of base pair in coding sequence. A "-" symbol is displayed as the numerator if the variant does not appear in coding sequence

Protein_position

Relative position of affected amino acid in protein. A "-" symbol is displayed as the numerator if the variant does not appear in coding sequence
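The three position columns above share the same "position/total" layout, with "-" as the numerator when the variant lies outside the sequence. A minimal parsing sketch follows; the function name `parse_relative_position` is hypothetical. The numerator is kept as a string on the assumption that it may be a range (such as "123-125") rather than a single number.

```python
def parse_relative_position(value):
    """Split a "position/total" string into (position, total).

    position is None when the numerator is "-", otherwise the raw
    numerator string. total is an int, or None if it is not a plain
    number.
    """
    position, _, total = value.partition("/")
    return (
        None if position == "-" else position,
        int(total) if total.isdigit() else None,
    )
```

For example, `parse_relative_position("-/2480")` gives `(None, 2480)`.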

Amino_acids

Only given if the variation affects the protein-coding sequence

Codons

The alternative codons with the variant base in upper case

Existing_variation

Known identifier of existing variation

ALLELE_NUM

Allele number from input; 0 is reference, 1 is first alternate etc.

DISTANCE

Shortest distance from the variant to transcript

STRAND_VEP

The DNA strand (1 or -1) on which the transcript/feature lies, as reported by VEP

SYMBOL

The gene symbol

SYMBOL_SOURCE

The source of the gene symbol

HGNC_ID

Gene identifier from the HUGO Gene Nomenclature Committee if applicable

BIOTYPE

Biotype of transcript

CANONICAL

A flag (YES) indicating that the VEP-based canonical transcript, the longest translation, was used for this gene. If not, the value is null

CCDS

The CCDS identifier for this transcript, where applicable

ENSP

The Ensembl protein identifier of the affected transcript

SWISSPROT

UniProtKB/Swiss-Prot accession

TREMBL

UniProtKB/TrEMBL identifier of protein product

UNIPARC

UniParc identifier of protein product

RefSeq

RefSeq identifier for this transcript

SIFT

The SIFT prediction and/or score, with both given as prediction (score)

PolyPhen

The PolyPhen prediction and/or score
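A value of the form "prediction (score)" can be split into its two parts as sketched below. The function name `parse_prediction` is hypothetical, and treating PolyPhen values as following the same layout as SIFT is an assumption. A value holding only a prediction or only a score is also accepted.

```python
import re

# A label made of letters/underscores, then a number in parentheses.
_BOTH = re.compile(r"([A-Za-z_]+)\s*\(\s*([0-9]*\.?[0-9]+)\s*\)")


def parse_prediction(value):
    """Return (prediction, score); either part may be None."""
    value = value.strip()
    match = _BOTH.fullmatch(value)
    if match:
        return match.group(1), float(match.group(2))
    try:
        # Score only, e.g. "0.5".
        return None, float(value)
    except ValueError:
        # Prediction only, or an empty string.
        return (value or None), None
```

For example, `parse_prediction("deleterious(0.01)")` gives `("deleterious", 0.01)`.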

EXON

The exon number (out of total number)

INTRON

The intron number (out of total number)

DOMAINS

The source and identifier of any overlapping protein domains

GMAF

Non-reference allele and frequency of existing variant in 1000 Genomes

GMAF_Allele

Non-reference allele and frequency of existing variant in 1000 Genomes

GMAF_AF

Non-reference allele and frequency of existing variant in 1000 Genomes

AFR_MAF

Non-reference allele and frequency of existing variant in 1000 Genomes combined African population

AMR_MAF

Non-reference allele and frequency of existing variant in 1000 Genomes combined American population

ASN_MAF

Non-reference allele and frequency of existing variant in 1000 Genomes combined Asian population

EAS_MAF

Non-reference allele and frequency of existing variant in 1000 Genomes combined East Asian population

EUR_MAF

Non-reference allele and frequency of existing variant in 1000 Genomes combined European population

SAS_MAF

Non-reference allele and frequency of existing variant in 1000 Genomes combined South Asian population

AA_MAF

Non-reference allele and frequency of existing variant in NHLBI-ESP African American population

EA_MAF

Non-reference allele and frequency of existing variant in NHLBI-ESP European American population

CLIN_SIG

Clinical significance of variant from dbSNP as annotated in ClinVar

SOMATIC

Somatic status of each ID reported under Existing_variation (0, 1, or null)

PUBMED

PubMed ID(s) of publications that cite existing variant

MOTIF_NAME

The source and identifier of a transcription factor binding profile aligned at this position

MOTIF_POS

The relative position of the variation in the aligned TFBP

HIGH_INF_POS

A flag indicating if the variant falls in a high information position of a transcription factor binding profile (TFBP) (Y, N, or null)

MOTIF_SCORE_CHANGE

The difference in motif score of the reference and variant sequences for the TFBP

IMPACT

The impact modifier for the consequence type

PICK

Indicates if this block of consequence data was picked by VEP's pick feature (1 or null)

VARIANT_CLASS

Sequence Ontology variant class

TSL

Transcript support level, which is based on independent RNA analyses

HGVS_OFFSET

Indicates by how many bases the HGVS notations for this variant have been shifted

PHENO

Indicates if existing variant is associated with a phenotype, disease or trait (0, 1, or null)

MINIMISED

Alleles in this variant have been converted to minimal representation before consequence calculation (1 or null)

ExAC_AF

Global Allele Frequency from ExAC

ExAC_AF_AFR

African/African American Allele Frequency from ExAC

ExAC_AF_AMR

American Allele Frequency from ExAC

ExAC_AF_EAS

East Asian Allele Frequency from ExAC

ExAC_AF_FIN

Finnish Allele Frequency from ExAC

ExAC_AF_NFE

Non-Finnish European Allele Frequency from ExAC

ExAC_AF_OTH

Other Allele Frequency from ExAC

ExAC_AF_SAS

South Asian Allele Frequency from ExAC

GENE_PHENO

Indicates if gene that the variant maps to is associated with a phenotype, disease or trait (0, 1, or null)

FILTER

Copied from input VCF. This includes filters implemented directly by the variant caller and other external software used in the DNA-Seq pipeline

flanking_bps

The flanking basepairs

variant_id

Variant ID

variant_qual

Variant quality

ExAC_AF_Adj

Adjusted Global Allele Frequency from ExAC

ExAC_AC_AN_Adj

Adjusted Global Allele Frequency from ExAC

ExAC_AC_AN

Global Allele Frequency from ExAC

ExAC_AC_AN_AFR

African/African American Allele Frequency from ExAC

ExAC_AC_AN_AMR

American Allele Frequency from ExAC

ExAC_AC_AN_EAS

East Asian Allele Frequency from ExAC

ExAC_AC_AN_FIN

Finnish Allele Frequency from ExAC

ExAC_AC_AN_NFE

Non-Finnish European Allele Frequency from ExAC

ExAC_AC_AN_OTH

Other Allele Frequency from ExAC

ExAC_AC_AN_SAS

South Asian Allele Frequency from ExAC

ExAC_FILTER

Filter