The Demonstration Project: CAGE

John Thompson

29th March 2023


CAGE

Classification After Gene Expression

Aim

  • Find gene expression features for use in patient classification

  • Investigate the pros and cons of features based on principal components

Data Source


The gene expression data come from:

Xu K, Shi X, Husted C, Hong R et al.
Smoking modulates different secretory subpopulations expressing SARS-CoV-2 entry genes in the nasal and bronchial airways.
Sci Rep 2022 Oct 28;12(1):18168.

Deposited in the GEO archive as GSE210271. The Series Matrix File was downloaded from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE210271.
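A minimal sketch of reading the downloaded Series Matrix File, assuming the standard GEO layout: `!`-prefixed header lines followed by a tab-separated table between `!series_matrix_table_begin` and `!series_matrix_table_end`. The function names are hypothetical, and missing cells are assumed to appear as empty or `null`.

```python
import gzip
import math
from typing import Dict, Iterable, List, Tuple


def _to_float(value: str) -> float:
    # Assumption: missing cells are empty or "null"; everything else parses as a number
    return math.nan if value in ("", "null") else float(value)


def parse_series_matrix(
    lines: Iterable[str],
) -> Tuple[Dict[str, List[List[str]]], List[str], Dict[str, List[float]]]:
    """Split Series Matrix text into header metadata, sample IDs and an expression table."""
    metadata: Dict[str, List[List[str]]] = {}  # e.g. "!Sample_title" -> rows of values
    samples: List[str] = []                     # GSM accessions from the table header row
    expression: Dict[str, List[float]] = {}     # probe ID -> one value per sample
    in_table = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("!series_matrix_table_begin"):
            in_table = True
            continue
        if line.startswith("!series_matrix_table_end"):
            break
        fields = [f.strip('"') for f in line.split("\t")]
        if not in_table:
            # Header keys such as "!Sample_characteristics_ch1" can repeat, so keep every row
            if line.startswith("!"):
                metadata.setdefault(fields[0], []).append(fields[1:])
        elif not samples:
            samples = fields[1:]  # first column of the table header is "ID_REF"
        elif fields[0]:
            expression[fields[0]] = [_to_float(v) for v in fields[1:]]
    return metadata, samples, expression


def load_series_matrix(path: str):
    # Series Matrix files are distributed gzip-compressed
    with gzip.open(path, "rt") as handle:
        return parse_series_matrix(handle)
```

For example, `load_series_matrix("GSE210271_series_matrix.txt.gz")` (a hypothetical local file name) would return the header metadata, the sample accessions and a probe-by-sample table; the annotations needed for the cancer/benign labels and the AEGIS-1/AEGIS-2 split would still have to be located among the `!Sample_...` rows.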


Study Design


  • mRNA gene expression from 505 nasal epithelial brushings

  • Profiled using Affymetrix Gene 1.0 ST microarrays .. 21,685 probes

  • The 505 patients came from two Airway Epithelial Gene Expression in the Diagnosis of Lung Cancer (AEGIS) trials

  • AEGIS-1: 243 with lung cancer, 132 with benign lung disease
  • AEGIS-2: 66 with lung cancer, 64 with benign lung disease
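The counts above can be checked, and the training/validation split summarised, with a few lines of Python. The dictionary simply restates the numbers on this slide.

```python
# Sample counts as stated on the Study Design slide
cohorts = {
    "AEGIS-1": {"cancer": 243, "benign": 132},  # planned training set
    "AEGIS-2": {"cancer": 66, "benign": 64},    # planned validation set
}


def summarise(cohorts):
    """Return {trial: (number of patients, proportion with cancer)}."""
    rows = {}
    for name, counts in cohorts.items():
        n = counts["cancer"] + counts["benign"]
        rows[name] = (n, counts["cancer"] / n)
    return rows


summary = summarise(cohorts)
total = sum(n for n, _ in summary.values())
for name, (n, prop) in summary.items():
    print(f"{name}: n = {n}, proportion with cancer = {prop:.3f}")
print("total:", total)
# AEGIS-1: n = 375, proportion with cancer = 0.648
# AEGIS-2: n = 130, proportion with cancer = 0.508
# total: 505
```

The two trials differ in case mix (about 65% cancer in AEGIS-1 against about 51% in AEGIS-2), which may affect the calibration of predicted probabilities, and hence the cross-entropy, when a model trained on AEGIS-1 is validated on AEGIS-2.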

Project Plan


  • Use AEGIS-1 for training and AEGIS-2 for validation

  • Develop the analysis on a sub-sample of 1000 probes

  • Predict Cancer/Benign using Logistic Regression (LR)

  • Compare 5 feature selection methods
    • Rank Probes .. Select the M best probes for LR
    • PCA of Covariances .. Select the first M PCs for LR
    • Rank PCs of Covariances .. Select the M best PCs for LR
    • PCA of Correlations .. Select the first M PCs for LR
    • Rank PCs of Correlations .. Select the M best PCs for LR

  • Once the code is working, apply the same workflow to all 21,685 probes
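The slides do not say how probes or PCs are ranked. A minimal sketch, assuming a hypothetical criterion of absolute Welch t-statistic between the cancer and benign groups: `rank_features` takes one list of values per feature, so the same function could rank raw probes or PC scores. `subsample_probes` draws a reproducible random development subset; random sampling and the seed are also assumptions.

```python
import random
import statistics
from typing import List, Sequence


def welch_t(values: Sequence[float], labels: Sequence[int]) -> float:
    """Welch two-sample t statistic, cancer (label 1) minus benign (label 0)."""
    cancer = [v for v, y in zip(values, labels) if y == 1]
    benign = [v for v, y in zip(values, labels) if y == 0]
    se = (statistics.variance(cancer) / len(cancer)
          + statistics.variance(benign) / len(benign)) ** 0.5
    if se == 0:
        return 0.0  # constant feature: no evidence of a difference
    return (statistics.mean(cancer) - statistics.mean(benign)) / se


def rank_features(columns: Sequence[Sequence[float]], labels: Sequence[int], m: int) -> List[int]:
    """Indices of the m features (probes or PC scores) with the largest |t|."""
    scores = [abs(welch_t(col, labels)) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)[:m]


def subsample_probes(probe_ids, k: int = 1000, seed: int = 1) -> list:
    """Random development subset of k probe IDs; the fixed seed makes it reproducible."""
    rng = random.Random(seed)
    return rng.sample(list(probe_ids), k)
```

Whatever criterion is used, ranking should be computed on AEGIS-1 only, so that the AEGIS-2 validation loss is not optimistically biased by feature selection. Note also that PCA of correlations is the same as PCA of covariances after each probe has been standardised to unit variance.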

Loss Function

Compare models using the Cross-Entropy Loss

$$-\frac{1}{N}\sum_{i=1}^{N}\left[\,y_i\log(\hat{y}_i) + (1-y_i)\log(1-\hat{y}_i)\,\right]$$

$N$ is the number of subjects,
$y_i$ is 0 for benign cases and 1 for cancer cases,
$\hat{y}_i$ is the predicted probability that case $i$ is cancer under the model being evaluated

  • Measures the average prediction error; lower is better, and confident wrong predictions are heavily penalised
  • Equal to the negative log-likelihood of the binomial distribution used in logistic regression, divided by $N$
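A minimal sketch of the cross-entropy loss on this slide, using only the standard library. Predicted probabilities are clipped to [eps, 1 - eps] (an assumption, to avoid log(0)); `neg_log_likelihood` is included only to illustrate that the average cross-entropy equals the binomial negative log-likelihood divided by $N$.

```python
import math
from typing import Sequence


def cross_entropy(y: Sequence[int], p: Sequence[float], eps: float = 1e-15) -> float:
    """Average cross-entropy for 0/1 outcomes y (1 = cancer) and predicted P(cancer) p."""
    if len(y) != len(p):
        raise ValueError("y and p must have the same length")
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1 - eps)  # clip so log() never receives 0
        total += yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
    return -total / len(y)


def neg_log_likelihood(y: Sequence[int], p: Sequence[float]) -> float:
    """-log of the likelihood  prod p_i^y_i (1 - p_i)^(1 - y_i)."""
    likelihood = 1.0
    for yi, pi in zip(y, p):
        likelihood *= pi if yi == 1 else 1 - pi
    return -math.log(likelihood)


y = [1, 0, 1, 0]
p = [0.9, 0.2, 0.6, 0.4]
print(f"{cross_entropy(y, p):.3f}")                      # 0.338
print(f"{neg_log_likelihood(y, p) / len(y):.3f}")        # 0.338
```

A model that predicts 0.5 for every case scores log 2 ≈ 0.693, which gives a simple reference point when comparing the five feature selection methods.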

Documentation


  • Commented scripts, diary, dashboard and reports on all intermediate stages

  • Slide presentation on the final analysis

  • Draft journal article (PLoS)

  • Website describing the project