The bMIND algorithm to estimate sample-level cell-type-specific expression

It computes Bayesian estimates of sample- and cell-type-specific (CTS) gene expression via MCMC. Dimension names (row and column names) are recommended for all inputs where applicable.

bMIND(
  bulk,
  frac = NULL,
  sample_id = NULL,
  ncore = NULL,
  profile = NULL,
  covariance = NULL,
  nu = 50,
  V_fe = NULL,
  nitt = 1300,
  burnin = 300,
  thin = 1,
  frac_method = NULL,
  sc_count = NULL,
  sc_meta = NULL,
  signature = NULL,
  signature_case = NULL,
  case_bulk = NULL
)

Arguments

bulk: bulk gene expression (gene x sample). Log2-transformed data are recommended for better performance, except when Bisque is used to estimate cell type fractions, in which case raw counts are expected. If max(bulk) > 50, bulk is transformed to log2(counts per million + 1) before bMIND is run.
frac: sample-specific cell type fractions (sample x cell type). If NULL, fractions are estimated either by non-negative least squares (NNLS), when a signature matrix is provided, or by Bisque, when a single-cell reference is provided.
sample_id: sample/subject ID vector. By default, sample IDs are generated automatically for sample-level bMIND analysis; for subject-level bMIND analysis, provide subject IDs instead. Note that subject IDs are sorted in the output, and different sample_id inputs may produce slightly different results in MCMCglmm.
ncore: number of cores used in parallel to compute the sample/subject-level CTS estimates. The default is all available cores.
profile: prior profile matrix (gene x cell type). Gene names should be in the same order as in bulk, and cell type names in the same order as in frac. If NULL, the bulk mean is used.
covariance: prior covariance array (gene x cell type x cell type). Gene names should be in the same order as in bulk, and cell type names in the same order as in frac. If NULL, bulk variance / sum(colMeans(frac)^2) is used.
nu: hyper-parameter for the prior covariance matrix. A larger nu expresses greater certainty in the information in covariance and makes the prior more informative. The default is 50.
V_fe: hyper-parameter for the covariance matrix of the fixed effects. The default is 0.5 times the identity matrix.
nitt: number of MCMC iterations.
burnin: burn-in iterations for MCMC.
thin: thinning interval for MCMC.
frac_method: method used to estimate cell type fractions, either 'NNLS' or 'Bisque'. **This argument and all arguments after it are used only to estimate cell type fractions, and only when the fractions are not pre-estimated.**
sc_count: sc/snRNA-seq raw counts used as the reference for Bisque to estimate cell type fractions.
sc_meta: metadata data frame for the sc/snRNA-seq reference. A binary (0/1) column named 'case' is expected to indicate case/control status.
signature: signature matrix for NNLS to estimate cell type fractions. Log2 transformation is recommended.
signature_case: signature matrix derived from case samples, for NNLS to estimate cell type fractions. Log2 transformation is recommended. If provided, signature is treated as the signature matrix for unaffected controls.
case_bulk: case/control status vector for the bulk samples, used when case and control references are applied to estimate the cell type fractions of case and control subjects separately.
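The automatic preprocessing rule for bulk (log2(CPM + 1) when max(bulk) > 50) can be sketched in base R as below. The helper name to_log2_cpm_if_needed is hypothetical and not part of the bMIND package; CPM is computed with the standard definition (each count divided by its sample's library size, times 10^6), which may differ in detail from bMIND's internal code.

```r
# Hypothetical helper (not exported by bMIND): mimics the documented rule
# "if max(bulk) > 50, transform bulk to log2(counts per million + 1)".
to_log2_cpm_if_needed <- function(bulk) {
  if (max(bulk) > 50) {
    # counts per million: scale each sample (column) to a library size of 1e6
    cpm <- sweep(bulk, 2, colSums(bulk), "/") * 1e6
    bulk <- log2(cpm + 1)
  }
  bulk
}

# Toy raw counts (gene x sample)
counts <- matrix(c(100, 300, 600, 50, 150, 800), nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))
x <- to_log2_cpm_if_needed(counts)   # transformed, since max(counts) > 50
```

Data that already look log-scale (max(bulk) <= 50) are returned unchanged.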
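The documented defaults for profile and covariance (bulk mean, and bulk variance / sum(colMeans(frac)^2)) can be illustrated with a minimal sketch. The function name default_prior_sketch is hypothetical, and two details are assumptions not stated in the documentation: the bulk mean is repeated for every cell type, and the covariance for each gene is diagonal with the scaled bulk variance on the diagonal.

```r
# Hypothetical sketch of the default priors; not bMIND's actual code.
default_prior_sketch <- function(bulk, frac) {
  ct <- colnames(frac)
  K <- length(ct)
  # profile: each gene's bulk mean, repeated for every cell type (assumption)
  profile <- matrix(rowMeans(bulk), nrow = nrow(bulk), ncol = K,
                    dimnames = list(rownames(bulk), ct))
  scale <- sum(colMeans(frac)^2)
  gene_var <- apply(bulk, 1, var)
  covariance <- array(0, dim = c(nrow(bulk), K, K),
                      dimnames = list(rownames(bulk), ct, ct))
  for (g in seq_len(nrow(bulk))) {
    # assumption: diagonal K x K covariance with the scaled bulk variance
    covariance[g, , ] <- diag(gene_var[g] / scale, K)
  }
  list(profile = profile, covariance = covariance)
}

bulk <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2,
               dimnames = list(c("g1", "g2"), c("s1", "s2", "s3")))
frac <- matrix(0.5, nrow = 3, ncol = 2, dimnames = list(NULL, c("A", "B")))
p <- default_prior_sketch(bulk, frac)
```

Here sum(colMeans(frac)^2) = 0.5, so a gene with bulk variance 4 gets prior variance 8 for each cell type.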
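The nitt, burnin and thin arguments determine how many MCMC draws are retained. With the defaults (nitt = 1300, burnin = 300, thin = 1), roughly (1300 - 300) / 1 = 1000 draws remain. The sketch below shows this arithmetic on a toy chain; kept_iterations is a hypothetical helper, and MCMCglmm's exact indexing convention may differ by an iteration.

```r
# Hypothetical illustration (not bMIND or MCMCglmm code): indices of the draws
# kept after discarding `burnin` iterations and keeping every `thin`-th one.
kept_iterations <- function(nitt, burnin, thin) {
  seq(burnin + thin, nitt, by = thin)
}

set.seed(1)
chain <- cumsum(rnorm(1300))   # a toy random-walk "chain" of length nitt
kept <- chain[kept_iterations(nitt = 1300, burnin = 300, thin = 1)]
length(kept)                   # 1000
```

Increasing thin to 10 with the same nitt and burnin would keep 100 draws.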
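When frac is NULL and a signature matrix is supplied, fractions are estimated by NNLS, that is, by minimizing ||S w - b||^2 subject to w >= 0 for each bulk sample b. The sketch below solves this with base R's optim() and L-BFGS-B bounds as a stand-in for a dedicated NNLS solver (for example, the CRAN nnls package); it is not bMIND's implementation. The helper name nnls_fractions and the final rescaling so fractions sum to 1 are assumptions.

```r
# Hypothetical helper: NNLS estimate of one sample's cell type fractions,
# min ||signature %*% w - b||^2 subject to w >= 0, via bounded L-BFGS-B.
nnls_fractions <- function(signature, b) {
  K <- ncol(signature)
  obj  <- function(w) sum((signature %*% w - b)^2)
  grad <- function(w) as.vector(2 * t(signature) %*% (signature %*% w - b))
  fit <- optim(rep(1 / K, K), obj, grad, method = "L-BFGS-B", lower = 0)
  w <- fit$par
  # assumption: rescale so the estimated fractions sum to 1
  setNames(w / sum(w), colnames(signature))
}

# Toy signature (gene x cell type) and a bulk sample mixed 20% / 30% / 50%
S <- matrix(c(10, 1, 1,
              1, 10, 1,
              1, 1, 10,
              5, 5, 5), nrow = 4, byrow = TRUE,
            dimnames = list(paste0("g", 1:4), c("A", "B", "C")))
b <- as.vector(S %*% c(0.2, 0.3, 0.5))
w <- nnls_fractions(S, b)   # approximately c(A = 0.2, B = 0.3, C = 0.5)
```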

Value

A list containing the output of the bMIND algorithm. Genes for which MCMCglmm returns an error (e.g., genes with constant expression) are omitted from the output.

A: the deconvolved cell-type-specific gene expression (gene x cell type x sample).
SE: the standard error of cell-type-specific gene expression (gene x cell type x sample).
Sigma_c: the covariance matrix for the deconvolved cell-type-specific expression (gene x cell type x cell type).
mu: the estimated profile matrix (gene x cell type).
frac: the estimated cell type fractions (sample x cell type) if fractions are not provided.
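A and SE are three-dimensional arrays (gene x cell type x sample), so standard array indexing extracts per-cell-type or per-gene views. The sketch below uses a mock array with random placeholder values in place of real bMIND output; only the dimension layout follows the documentation.

```r
# Mock array with the documented layout of A (gene x cell type x sample);
# the values are random placeholders, not bMIND results.
set.seed(1)
genes   <- c("ANK2", "POGZ", "TRIO")
cts     <- c("astrocytes", "OPC")
samples <- as.character(1:4)
A <- array(rnorm(24), dim = c(3, 2, 4),
           dimnames = list(genes, cts, samples))

astro <- A[, "astrocytes", ]   # gene x sample matrix for one cell type
trio  <- A["TRIO", , ]         # cell type x sample matrix for one gene
```

The same indexing applies to SE, for example SE[, "astrocytes", ] for the matching standard errors.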

Examples


data(example)
bulk = t(na.omit(apply(example$X, 1, as.vector)))
frac = na.omit(apply(example$W, 3, as.vector))
colnames(bulk) = rownames(frac) = 1:nrow(frac)

bulk[1:5, 1:5]
#>              1        2       3        4        5
#> ANK2  4.698981 4.703759 4.88463 5.192639 4.820824
#> POGZ  4.296181 4.502559 4.42363 4.337839 4.102224
#> TRIO  3.143821 3.007979 2.98249 3.508639 2.954794
#> AKAP9 2.648231 2.303789 2.91275 2.900519 1.741124
#> ASH1L 3.738681 3.646859 3.75403 4.000939 3.312794
head(frac)
#>    astrocytes mature neurons immature neurons oligodendrocytes       OPC
#> 1 0.000000000     0.07947864        0.6593213      0.000000000 0.2612001
#> 2 0.004001667     0.17504983        0.6031219      0.022518815 0.1953078
#> 3 0.004323026     0.10218298        0.5877968      0.018830465 0.2868668
#> 4 0.000000000     0.20181933        0.5692321      0.001399302 0.2275493
#> 5 0.000000000     0.18349406        0.5851250      0.026766014 0.2046149
#> 6 0.005763292     0.07574323        0.7951380      0.000000000 0.1233555

# with provided cell type fractions
deconv1 = bMIND(bulk, frac = frac, ncore = 12)

# set.seed(1)
# data(signature)
# bulk = matrix(rnorm(300 * ncol(bulk), 10), ncol = ncol(bulk))
# rownames(bulk) = rownames(signature)[1:nrow(bulk)]
# colnames(bulk) = 1:ncol(bulk)

## without provided cell type fractions: use the built-in NNLS or Bisque method to estimate fractions
# deconv2 = bMIND(bulk, signature = signature[, -6], ncore = 12)

# If you have an informative prior, use get_prior() (https://randel.github.io/MIND/reference/get_prior.html) to generate the prior `profile` and `covariance`. Note that the covariance matrix of each gene is required to be positive-definite.
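Before passing a custom covariance array, it can help to confirm that every gene's cell type x cell type slice is positive-definite. The sketch below uses base R's chol(), which fails on a non-positive-definite matrix, together with isSymmetric(), since chol() reads only the upper triangle. The helper names all_slices_pd and is_pd_slice are hypothetical.

```r
# Hypothetical helpers: check each gene's K x K covariance slice.
is_pd_slice <- function(m) {
  # chol() errors on a matrix that is not positive-definite
  isSymmetric(m) && !inherits(try(chol(m), silent = TRUE), "try-error")
}

all_slices_pd <- function(covariance) {
  K <- dim(covariance)[2]
  vapply(seq_len(dim(covariance)[1]),
         function(g) is_pd_slice(matrix(covariance[g, , ], K, K)),
         logical(1))
}

cov_prior <- array(0, dim = c(2, 2, 2))
cov_prior[1, , ] <- diag(2)                   # positive-definite
cov_prior[2, , ] <- matrix(c(1, 2, 2, 1), 2)  # eigenvalues 3 and -1
all_slices_pd(cov_prior)                      # TRUE FALSE
```

Genes flagged FALSE should have their covariance slice corrected before running bMIND.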