Calculates Bayesian estimates of sample- and cell-type-specific (CTS) gene expression via MCMC. Dimension names are recommended for all inputs where applicable.

bMIND(
  bulk,
  frac = NULL,
  sample_id = NULL,
  ncore = NULL,
  profile = NULL,
  covariance = NULL,
  nu = 50,
  V_fe = NULL,
  nitt = 1300,
  burnin = 300,
  thin = 1,
  frac_method = NULL,
  sc_count = NULL,
  sc_meta = NULL,
  signature = NULL,
  signature_case = NULL,
  case_bulk = NULL
)

Arguments

bulk

bulk gene expression (gene x sample). We recommend log2-transformed data for better performance, except when Bisque is used to estimate cell type fractions, in which case raw counts are expected. If max(bulk) > 50, bulk will be transformed to log2(counts per million + 1) before running bMIND.
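
The automatic transformation described above can be sketched as follows. This is a minimal illustration, assuming counts are scaled by each sample's library size; the helper name `to_log2_cpm` is hypothetical and not part of the package, and the package's internal code may differ in detail.

```r
# Hypothetical helper (not part of MIND): convert raw counts (gene x sample)
# to log2(counts per million + 1).
to_log2_cpm <- function(count) {
  lib_size <- colSums(count)                    # total counts per sample
  cpm <- sweep(count, 2, lib_size, "/") * 1e6   # scale each column to one million
  log2(cpm + 1)
}

# Toy raw counts: 3 genes x 2 samples
count <- matrix(c(10, 30, 60, 0, 50, 50), nrow = 3,
                dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))

# Transform only when the data look like raw counts,
# using the max(bulk) > 50 rule stated above.
bulk <- if (max(count) > 50) to_log2_cpm(count) else count
```

Data that are already on the log2 scale (maximum of at most 50) pass through unchanged under this rule.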

frac

sample-specific cell type fractions (sample x cell type). If not specified (NULL), they will be estimated either by non-negative least squares (NNLS), which requires a signature matrix, or by Bisque, which requires a single-cell reference.
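
To illustrate what NNLS-based fraction estimation does, here is a minimal base-R sketch for a single bulk sample. It is not the package's implementation: it substitutes box-constrained optimization (`optim` with `method = "L-BFGS-B"` and `lower = 0`) for a dedicated NNLS solver, the helper name `nnls_frac` is hypothetical, and rescaling the solution to sum to 1 is an assumption.

```r
# Hypothetical helper: estimate fractions for one bulk sample by minimizing
# ||bulk_sample - signature %*% w||^2 subject to w >= 0.
nnls_frac <- function(signature, bulk_sample) {
  K <- ncol(signature)
  obj <- function(w) sum((bulk_sample - signature %*% w)^2)
  grad <- function(w) {
    as.vector(-2 * crossprod(signature, bulk_sample - signature %*% w))
  }
  fit <- optim(rep(1 / K, K), obj, grad, method = "L-BFGS-B", lower = 0)
  w <- fit$par
  # Assumption: rescale so that the fractions sum to 1
  if (sum(w) > 0) w <- w / sum(w)
  setNames(w, colnames(signature))
}

# Toy signature matrix (gene x cell type) and a noise-free mixture
set.seed(1)
signature <- matrix(runif(150, 0, 10), nrow = 50,
                    dimnames = list(paste0("gene", 1:50), c("A", "B", "C")))
true_frac <- c(A = 0.6, B = 0.3, C = 0.1)
bulk_sample <- as.vector(signature %*% true_frac)

est <- nnls_frac(signature, bulk_sample)
```

With noise-free data, `est` should closely recover `true_frac`; with real data, the quality of the estimate depends on how well the signature matrix matches the bulk samples.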

sample_id

sample/subject ID vector. By default (NULL), sample IDs are generated automatically for sample-level bMIND analysis; for subject-level analysis, subject IDs should be provided. Note that subject IDs are sorted in the output, and that different sample_id values produce slightly different results in MCMCglmm.

ncore

number of cores to run in parallel for providing sample/subject-level CTS estimates. The default is all available cores.

profile

prior profile matrix (gene by cell type). Gene names should be in the same order as in bulk, and cell type names should be in the same order as in frac. If not specified (NULL), the bulk mean will be used.

covariance

prior covariance array (gene by cell type by cell type). Gene names should be in the same order as in bulk, and cell type names should be in the same order as in frac. If not specified (NULL), bulk variance / sum(colMeans(frac)^2) will be used.
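
The default quantity above can be computed on toy data as shown below. This sketch assumes that "bulk variance" means the per-gene variance across samples; how that per-gene value is placed into each cell type by cell type slice of the array is not stated here, so the sketch stops at the scalar.

```r
set.seed(1)
# Toy bulk expression (gene x sample)
bulk <- matrix(rnorm(4 * 6, mean = 5), nrow = 4,
               dimnames = list(paste0("gene", 1:4), paste0("s", 1:6)))
# Toy fractions (sample x cell type); every sample has fractions 0.5, 0.3, 0.2
frac <- matrix(c(0.5, 0.3, 0.2), nrow = 6, ncol = 3, byrow = TRUE,
               dimnames = list(paste0("s", 1:6), c("A", "B", "C")))

# Assumption: per-gene variance across samples
bulk_var <- apply(bulk, 1, var)

# Per-gene default value: bulk variance / sum(colMeans(frac)^2)
default_var <- bulk_var / sum(colMeans(frac)^2)
```

Because sum(colMeans(frac)^2) is at most 1 when fractions sum to 1, the resulting value is never smaller than the bulk variance itself.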

nu

hyper-parameter for the prior covariance matrix. The larger nu is, the greater the certainty about the information in covariance, and the more informative the prior distribution. The default is 50.

V_fe

hyper-parameter for the covariance matrix of fixed-effects. The default is 0.5 * Identity matrix.

nitt

number of MCMC iterations.

burnin

burn-in iterations for MCMC.

thin

thinning interval for MCMC.
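
As a quick check on how nitt, burnin, and thin interact, the sketch below counts the posterior draws retained under the usual MCMC convention (discard the first burnin iterations, then keep every thin-th one). The exact iteration indices stored by MCMCglmm are an assumption here; only the count is the point.

```r
nitt   <- 1300  # total MCMC iterations (default)
burnin <- 300   # iterations discarded at the start (default)
thin   <- 1     # keep every thin-th iteration after burn-in (default)

# Iterations retained after burn-in and thinning
kept_iterations <- seq(burnin + 1, nitt, by = thin)
n_kept <- length(kept_iterations)
n_kept
#> [1] 1000
```

Increasing thin without increasing nitt reduces the number of retained draws proportionally.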

frac_method

method used to estimate cell type fractions, either 'NNLS' or 'Bisque'. **This argument and all arguments after it are used only to estimate cell type fractions, and only when those fractions are not provided.**

sc_count

sc/snRNA-seq raw count as reference for Bisque to estimate cell type fractions.

sc_meta

metadata data frame for the sc/snRNA-seq reference. A binary (0-1) column named 'case' is expected to indicate case/control status.

signature

signature matrix for NNLS to estimate cell type fractions. Log2 transformation is recommended.

signature_case

signature matrix from case samples for NNLS to estimate cell type fractions. Log2 transformation is recommended. If this is provided, signature will be treated as signature matrix for unaffected controls.

case_bulk

case/control status vector for bulk data when using case/control reference to estimate the cell type fractions for case/control subjects separately.

Value

A list containing the output of the bMIND algorithm. Genes that produce an error in MCMCglmm (e.g., genes with constant expression) are omitted from the output.

A

the deconvolved cell-type-specific gene expression (gene x cell type x sample).

SE

the standard error of cell-type-specific gene expression (gene x cell type x sample).

Sigma_c

the covariance matrix for the deconvolved cell-type-specific expression (gene x cell type x cell type).

mu

the estimated profile matrix (gene x cell type).

frac

the estimated cell type fractions (sample x cell type) if fractions are not provided.
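
The array layout of the output can be illustrated with a mock object. The list below is a hypothetical stand-in built by hand, not real bMIND output; it only mirrors the documented dimensions of A (gene x cell type x sample) to show how slices are extracted.

```r
genes      <- c("ANK2", "POGZ")
cell_types <- c("astrocytes", "OPC")
samples    <- c("1", "2", "3")

# Mock stand-in for the A element of the returned list (values are arbitrary)
A <- array(seq_len(2 * 2 * 3), dim = c(2, 2, 3),
           dimnames = list(genes, cell_types, samples))
deconv <- list(A = A)

# Expression of one cell type across samples (gene x sample)
opc_expr <- deconv$A[, "OPC", ]

# All cell types for one sample (gene x cell type)
sample1 <- deconv$A[, , "1"]
```

Indexing by dimension names, as here, is the reason dimension names are recommended for all inputs.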

Examples


data(example)
bulk = t(na.omit(apply(example$X, 1, as.vector)))
frac = na.omit(apply(example$W, 3, as.vector))
colnames(bulk) = rownames(frac) = 1:nrow(frac)

bulk[1:5, 1:5]
#>              1        2       3        4        5
#> ANK2  4.698981 4.703759 4.88463 5.192639 4.820824
#> POGZ  4.296181 4.502559 4.42363 4.337839 4.102224
#> TRIO  3.143821 3.007979 2.98249 3.508639 2.954794
#> AKAP9 2.648231 2.303789 2.91275 2.900519 1.741124
#> ASH1L 3.738681 3.646859 3.75403 4.000939 3.312794
head(frac)
#>    astrocytes mature neurons immature neurons oligodendrocytes       OPC
#> 1 0.000000000     0.07947864        0.6593213      0.000000000 0.2612001
#> 2 0.004001667     0.17504983        0.6031219      0.022518815 0.1953078
#> 3 0.004323026     0.10218298        0.5877968      0.018830465 0.2868668
#> 4 0.000000000     0.20181933        0.5692321      0.001399302 0.2275493
#> 5 0.000000000     0.18349406        0.5851250      0.026766014 0.2046149
#> 6 0.005763292     0.07574323        0.7951380      0.000000000 0.1233555

# with provided cell type fractions
deconv1 = bMIND(bulk, frac = frac, ncore = 12)

# set.seed(1)
# data(signature)
# bulk = matrix(rnorm(300 * ncol(bulk), 10), ncol = ncol(bulk))
# rownames(bulk) = rownames(signature)[1:nrow(bulk)]
# colnames(bulk) = 1:ncol(bulk)

## without provided cell type fractions: use built-in deconvolution methods NNLS or Bisque to estimate fractions
# deconv2 = bMIND(bulk, signature = signature[, -6], ncore = 12)

# If you have an informative prior, use get_prior() (https://randel.github.io/MIND/reference/get_prior.html) to generate the prior `profile` and `covariance`. Note that each covariance matrix is required to be positive-definite.