It conducts Bayesian testing of cell-type-specific (CTS) differential gene expression analysis, via MCMC.

bmind_de(
  bulk,
  frac = NULL,
  sample_id = NULL,
  ncore = NULL,
  profile = NULL,
  covariance = NULL,
  profile_co = NULL,
  covariance_co = NULL,
  profile_ca = NULL,
  covariance_ca = NULL,
  y = NULL,
  covariate = NULL,
  covariate_bulk = NULL,
  covariate_cts = NULL,
  np = F,
  nu = 50,
  nitt = 1300,
  burnin = 300,
  thin = 1,
  max_samp = 1e+06,
  frac_method = NULL,
  sc_count = NULL,
  sc_meta = NULL,
  signature = NULL,
  signature_case = NULL,
  case_bulk = NULL
)

Arguments

bulk

bulk gene expression (gene x sample).

frac

sample-specific cell type fraction (sample x cell type). If not specified (NULL), it will be estimated by non-negative least squares (NNLS) by providing signature matrix or Bisque by providing single-cell reference.

sample_id

sample/subject ID vector. The default is that sample ID will be automatically provided for sample-level bMIND analysis, otherwise subject ID should be provided for subject-level bMIND analysis. Note that the subject ID will be sorted in the output and different sample_id would produce slightly different results in MCMCglmm.

ncore

number of cores to run in parallel. The default is all available cores.

profile

prior profile matrix (gene by cell type). Gene names should be in the same order of bulk, and cell type names should be in the same order as frac.

covariance

prior covariance array (gene by cell type by cell type). Gene names should be in the same order of bulk, and cell type names should be in the same order as frac. The default is 0.5 * Identity matrix for covariance of fixed effects.

profile_co

prior profile matrix (gene by cell type) for controls.

covariance_co

prior covariance array (gene by cell type by cell type) for controls.

profile_ca

prior profile matrix (gene by cell type) for cases.

covariance_ca

prior covariance array (gene by cell type by cell type) for cases.

y

binary (0-1) outcome/phenotype vector for CTS DE analysis (0 for controls, 1 for cases). Should be the same length and order as sample_id or sort(unique(sample_id)) and row names of covariate.

covariate

matrix for covariates to be adjusted in deconvolution model.

covariate_bulk

colnames of covariate denoting variables that affect bulk expression

covariate_cts

colnames of covariate denoting variables that affect CTS expression

np

option to use non-informative prior. Default is FALSE

nu

hyper-parameter for the prior covariance matrix. The larger the nu, the higher the certainty about the information in covariance, and the more informative is the distribution. The default is 50.

nitt

number of MCMC iterations.

burnin

burn-in iterations for MCMC.

thin

thinning interval for MCMC.

max_samp

max number of posterior samples to generate in testing. An adaptive procedure is used to increase nitt for those genes with p-values = 1/number of posterior samples.

frac_method

method to be used for estimating cell type fractions, either 'NNLS' or 'Bisque'. **All arguments starting from this one will be used to estimate cell-type fractions only, if those fractions are not pre-estimated.**

sc_count

sc/snRNA-seq raw count as reference for Bisque to estimate cell type fractions.

sc_meta

meta data frame for sc/snRNA-seq reference. A binary (0-1) column of 'case' is expected to indicate case/control status.

signature

signature matrix for NNLS to estimate cell type fractions. Log2 transformation is recommended.

signature_case

signature matrix from case samples for NNLS to estimate cell type fractions. Log2 transformation is recommended. If this is provided, signature will be treated as signature matrix for unaffected controls.

case_bulk

case/control status vector for bulk data when using case/control reference to estimate the cell type fractions for case/control subjects separately.

Value

A list containing the output of the bMIND algorithm (some genes with error message in MCMCglmm will not be outputted, e.g., with constant expression)

coef

the estimated coefficients matrix (gene x variables).

frac

the estimated cell type fractions (sample x cell type) if fractions are not provided.

pval

the p-values of CTS-DE testing (gene x cell type).

qval

the q-values of CTS-DE testing by BH FDR adjustment (gene x cell type).

Examples


data(example)
bulk = t(na.omit(apply(example$X, 1, as.vector)))
frac = na.omit(apply(example$W, 3, as.vector))
colnames(bulk) = rownames(frac) = 1:nrow(frac)
y = rbinom(n = nrow(frac), size = 1, prob = 0.5)
covariate = data.frame(c1 = rnorm(length(y)), c2 = rnorm(length(y)))

bulk[1:5, 1:5]
#>              1        2       3        4        5
#> ANK2  4.698981 4.703759 4.88463 5.192639 4.820824
#> POGZ  4.296181 4.502559 4.42363 4.337839 4.102224
#> TRIO  3.143821 3.007979 2.98249 3.508639 2.954794
#> AKAP9 2.648231 2.303789 2.91275 2.900519 1.741124
#> ASH1L 3.738681 3.646859 3.75403 4.000939 3.312794
head(frac)
#>    astrocytes mature neurons immature neurons oligodendrocytes       OPC
#> 1 0.000000000     0.07947864        0.6593213      0.000000000 0.2612001
#> 2 0.004001667     0.17504983        0.6031219      0.022518815 0.1953078
#> 3 0.004323026     0.10218298        0.5877968      0.018830465 0.2868668
#> 4 0.000000000     0.20181933        0.5692321      0.001399302 0.2275493
#> 5 0.000000000     0.18349406        0.5851250      0.026766014 0.2046149
#> 6 0.005763292     0.07574323        0.7951380      0.000000000 0.1233555

# CTS-DE (np = TRUE: use non-informative prior)
deconv = bmind_de(bulk, frac, y = y, covariate = covariate, covariate_bulk = 'c1', covariate_cts = 'c2', np = T)

# If you have informative prior, use get_prior() in https://randel.github.io/MIND/reference/get_prior.html to generate prior `profile` and `covariance` matrices. You can do this sepately for cases (profile_ca, covariance_ca) and controls (profile_co, covariance_co). Note that covariance matrix is required to be positive-definite.
deconv = bmind_de(bulk, frac, np = FALSE)
#> [1] "Prior profile matrix is required, otherwise set np = TRUE to use non-informative pror"
#> Error in bmind_de(bulk, frac, np = FALSE):