Cytognomix is presenting the following paper at the Joint HGM 2013 and 21st International Congress of Genetics meeting in Singapore in the April 15 poster session (9:30-10 AM, 12:30-1:30 PM) on Cancer Genetics and Genomics (T05):
STRATEGY FOR IDENTIFICATION, PREDICTION, AND PRIORITIZATION OF NON-CODING VARIANTS OF UNCERTAIN SIGNIFICANCE IN HERITABLE BREAST CANCER.
P. Rogan 1,*, E. Mucaki 1, A. Stuart 2, E. Dovigi 1, C. Viner 3, B. Shirley 3, J. Knoll 2, P. Ainsworth 2
Departments of 1 Biochemistry, 2 Pathology, and 3 Computer Science, University of Western Ontario, London, Canada
Objectives: Non-coding sequence variants have been proven to contribute significantly to the phenotypes of high-penetrance disorders. We develop an approach to predict pathogenic non-coding variants of uncertain significance (VUS) using information theory-based analysis of changes in DNA and RNA sequences bound by regulatory factors.
Methods: Complete gene sequences are captured, enriching for non-coding variants in genes known to harbor mutations that increase breast cancer risk. Oligo baits covering the complete coding and intergenic regions 10 kb up- and downstream of ATM, BRCA1, BRCA2, CDH1, CHEK2, PALB2 and TP53 were used in solution hybridization. Probe design captures both repeat-free and divergent repeat sequences that are effectively single copy. After Illumina sequencing of 21 high-risk patient samples lacking coding mutations, information analysis prioritized non-coding variants within sequence elements recognized by proteins or protein complexes. VUS are being screened for mutations affecting essential binding sites recognized in mRNA splicing, by transcription factors (TFBS), and by proteins interacting with untranslated regions (UTR). Information models for exon recognition predict the relative abundance of natural, cryptic, and mutant splice isoforms resulting from predicted mutations. A similar approach is introduced to detect mutations that alter the strengths of TFBS and UTR binding sites. Information weight matrices were determined by entropy minimization of ENCODE ChIP-seq regions for 60 transcription factors embedded within DNase I hypersensitive domains.
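As an illustration of how an information weight matrix can score a variant within a binding site, here is a minimal Python sketch. It is not the pipeline used in this study. It assumes the binding sites are already aligned (the entropy minimization step applied to the ChIP-seq regions is not reproduced), an equiprobable background of 2 bits per base, and a simple pseudocount in place of a small-sample correction. The function names and the toy sites are hypothetical.

```python
import math

BASES = "ACGT"

def weight_matrix(sites, pseudocount=0.5):
    """Build R_i(b, l) = 2 + log2 f(b, l) from aligned, equal-length sites.

    Assumes an equiprobable background (2 bits per base); the pseudocount
    is a simplification standing in for a small-sample correction.
    """
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    matrix = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        matrix.append({
            base: 2.0 + math.log2((column.count(base) + pseudocount) / total)
            for base in BASES
        })
    return matrix

def site_information(matrix, seq):
    """Individual information (bits) of one sequence under the matrix."""
    return sum(column[base] for column, base in zip(matrix, seq))

def delta_information(matrix, ref, alt):
    """Change in bits caused by a variant; negative means a weaker site."""
    return site_information(matrix, alt) - site_information(matrix, ref)

# Toy aligned sites, invented for illustration only.
toy_sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
toy_matrix = weight_matrix(toy_sites)
change = delta_information(toy_matrix, "TATAAT", "TAGAAT")
print(f"Change in site strength: {change:.2f} bits")
```

In a sketch like this, variants would be ranked by the size of the change in bits, with large decreases suggesting a weakened or abolished site and increases at other positions suggesting a strengthened or newly created one.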
Results: The matrices were used to evaluate novel variants discovered by sequence analysis of breast cancer patients for alterations in TFBS binding strengths. This analysis prioritized 9 splicing, 8 TFBS, and 2 UTR variants as most likely to affect gene expression, potentially affecting 6 protein-coding genes in the patient samples (from 7,909 variants in 7 genes).
Conclusion: This strategy more comprehensively covers non-coding regions in breast cancer genes than repeat masking, and introduces a unified framework for systematic interpretation of VUS that affect expression.