Cell type-specific differential expression analysis with prior derived from single-cell data

It conducts Bayesian testing of cell-type-specific (CTS) differential gene expression analysis, via MCMC.

bmind_de(
  bulk,
  frac = NULL,
  sample_id = NULL,
  ncore = NULL,
  profile = NULL,
  covariance = NULL,
  profile_co = NULL,
  covariance_co = NULL,
  profile_ca = NULL,
  covariance_ca = NULL,
  y = NULL,
  covariate = NULL,
  covariate_bulk = NULL,
  covariate_cts = NULL,
  np = F,
  nu = 50,
  nitt = 1300,
  burnin = 300,
  thin = 1,
  max_samp = 1e+06,
  frac_method = NULL,
  sc_count = NULL,
  sc_meta = NULL,
  signature = NULL,
  signature_case = NULL,
  case_bulk = NULL
)

Arguments

bulk: bulk gene expression (gene x sample).
frac: sample-specific cell type fraction (sample x cell type). If not specified (NULL), it will be estimated by non-negative least squares (NNLS) by providing signature matrix or Bisque by providing single-cell reference.
sample_id: sample/subject ID vector. The default is that sample ID will be automatically provided for sample-level bMIND analysis, otherwise subject ID should be provided for subject-level bMIND analysis. Note that the subject ID will be sorted in the output and different sample_id would produce slightly different results in MCMCglmm.
ncore: number of cores to run in parallel. The default is all available cores.
profile: prior profile matrix (gene by cell type). Gene names should be in the same order of bulk, and cell type names should be in the same order as frac.
covariance: prior covariance array (gene by cell type by cell type). Gene names should be in the same order of bulk, and cell type names should be in the same order as frac. The default is 0.5 * Identity matrix for covariance of fixed effects.
profile_co: prior profile matrix (gene by cell type) for controls.
covariance_co: prior covariance array (gene by cell type by cell type) for controls.
profile_ca: prior profile matrix (gene by cell type) for cases.
covariance_ca: prior covariance array (gene by cell type by cell type) for cases.
y: binary (0-1) outcome/phenotype vector for CTS DE analysis (0 for controls, 1 for cases). Should be the same length and order as sample_id or sort(unique(sample_id)) and row names of covariate.
covariate: matrix for covariates to be adjusted in deconvolution model.
covariate_bulk: colnames of covariate denoting variables that affect bulk expression
covariate_cts: colnames of covariate denoting variables that affect CTS expression
np: option to use non-informative prior. Default is FALSE
nu: hyper-parameter for the prior covariance matrix. The larger the nu, the higher the certainty about the information in covariance, and the more informative is the distribution. The default is 50.
nitt: number of MCMC iterations.
burnin: burn-in iterations for MCMC.
thin: thinning interval for MCMC.
max_samp: max number of posterior samples to generate in testing. An adaptive procedure is used to increase nitt for those genes with p-values = 1/number of posterior samples.
frac_method: method to be used for estimating cell type fractions, either 'NNLS' or 'Bisque'. **All arguments starting from this one will be used to estimate cell-type fractions only, if those fractions are not pre-estimated.**
sc_count: sc/snRNA-seq raw count as reference for Bisque to estimate cell type fractions.
sc_meta: meta data frame for sc/snRNA-seq reference. A binary (0-1) column of 'case' is expected to indicate case/control status.
signature: signature matrix for NNLS to estimate cell type fractions. Log2 transformation is recommended.
signature_case: signature matrix from case samples for NNLS to estimate cell type fractions. Log2 transformation is recommended. If this is provided, signature will be treated as signature matrix for unaffected controls.
case_bulk: case/control status vector for bulk data when using case/control reference to estimate the cell type fractions for case/control subjects separately.

Value

A list containing the output of the bMIND algorithm (some genes with error message in MCMCglmm will not be outputted, e.g., with constant expression)

coef: the estimated coefficients matrix (gene x variables).
frac: the estimated cell type fractions (sample x cell type) if fractions are not provided.
pval: the p-values of CTS-DE testing (gene x cell type).
qval: the q-values of CTS-DE testing by BH FDR adjustment (gene x cell type).

Examples


data(example)
bulk = t(na.omit(apply(example$X, 1, as.vector)))
frac = na.omit(apply(example$W, 3, as.vector))
colnames(bulk) = rownames(frac) = 1:nrow(frac)
y = rbinom(n = nrow(frac), size = 1, prob = 0.5)
covariate = data.frame(c1 = rnorm(length(y)), c2 = rnorm(length(y)))

bulk[1:5, 1:5]
#>              1        2       3        4        5
#> ANK2  4.698981 4.703759 4.88463 5.192639 4.820824
#> POGZ  4.296181 4.502559 4.42363 4.337839 4.102224
#> TRIO  3.143821 3.007979 2.98249 3.508639 2.954794
#> AKAP9 2.648231 2.303789 2.91275 2.900519 1.741124
#> ASH1L 3.738681 3.646859 3.75403 4.000939 3.312794
head(frac)
#>    astrocytes mature neurons immature neurons oligodendrocytes       OPC
#> 1 0.000000000     0.07947864        0.6593213      0.000000000 0.2612001
#> 2 0.004001667     0.17504983        0.6031219      0.022518815 0.1953078
#> 3 0.004323026     0.10218298        0.5877968      0.018830465 0.2868668
#> 4 0.000000000     0.20181933        0.5692321      0.001399302 0.2275493
#> 5 0.000000000     0.18349406        0.5851250      0.026766014 0.2046149
#> 6 0.005763292     0.07574323        0.7951380      0.000000000 0.1233555

# CTS-DE (np = TRUE: use non-informative prior)
deconv = bmind_de(bulk, frac, y = y, covariate = covariate, covariate_bulk = 'c1', covariate_cts = 'c2', np = T)

# If you have informative prior, use get_prior() in https://randel.github.io/MIND/reference/get_prior.html to generate prior `profile` and `covariance` matrices. You can do this sepately for cases (profile_ca, covariance_ca) and controls (profile_co, covariance_co). Note that covariance matrix is required to be positive-definite.
deconv = bmind_de(bulk, frac, np = FALSE)
#> [1] "Prior profile matrix is required, otherwise set np = TRUE to use non-informative pror"
#> Error in bmind_de(bulk, frac, np = FALSE):