It calculates prior CTS profile and covariance matrix from single-cell data. The output can serve as hyper-parameters in bMIND. Only genes with positive definite covariance matrix are outputted.

get_prior(sc, sample = NULL, cell_type = NULL, meta_sc = NULL, filter_pd = T)

Arguments

sc

single-cell count matrix, gene x cell.

sample

variable for sample ID. It is used to generate pseudo-bulk data and calculate cell-type covariance matrix across samples for each gene.

cell_type

variable for cell type labels.

meta_sc

data.frame for meta of cells (cell x features, including columns `sample` (sample ID), `cell_type`). Not required if sample and cell_type are provided.

filter_pd

option to filer out genes without positive-definite cell-type covariance matrix, default is TRUE.

Value

A list containing

profile

CTS profile matrix (gene x cell type), in log2(CPM + 1) scale.

covariance

CTS covariance matrix (gene x cell type x cell type).

Examples

suppressMessages(library(scRNAseq))

suppressMessages(dar <- DarmanisBrainData())

# focus on adult cells with non-hybrid lables
dar = dar[, !dar$cell.type %in% c('fetal_quiescent', 'fetal_replicating', 'hybrid')]
dar = dar[rowSums(counts(dar))!=0,]

table(dar$cell.type, dar$experiment_sample_name)
#>                   
#>                    AB_S1 AB_S11 AB_S2 AB_S3 AB_S4 AB_S5 AB_S7 AB_S8
#>   OPC                  0      0     0     0     0     0     0    18
#>   astrocytes          11      7     2     0    38     0     3     1
#>   endothelial          2      1     0     1     0    16     0     0
#>   microglia            5      1     2     0     0     1     0     7
#>   neurons              6     50     1     2    19    18    35     0
#>   oligodendrocytes     0      1     0     0     1     9    12    15

# the error is caused by OPC is only from one sample
prior = get_prior(sc = counts(dar), sample = dar$experiment_sample_name, cell_type = dar$cell.type)
#> Error in if (!is.symmetric.matrix(x)) stop("argument x is not a symmetric matrix"): missing value where TRUE/FALSE needed

# drop cells only from one sample
dar = dar[, !dar$cell.type %in% c('OPC')]
dar = dar[rowSums(counts(dar))!=0,]

prior = get_prior(sc = counts(dar), sample = dar$experiment_sample_name, cell_type = dar$cell.type)
#> [1] "20976 genes are filtered out because cell-type covariance matrix is not positive-definite (PD);"
#> [1] "The filtering can be disabled by setting filter_pd = FALSE. Note the prior cell-type covariance matrix for each gene is required to be PD."

dim(prior$profile)
#> [1] 328   5
head(prior$profile)
#>        astrocytes endothelial microglia  neurons oligodendrocytes
#> ABL1    3.9444509    6.685126  1.425422 4.350492       5.68671293
#> ACBD5   6.5678615    3.946924  4.355075 5.738703       6.81941440
#> ADARB1  4.3358691    5.254009  1.505483 4.783873       0.05764407
#> AFTPH   5.1665212    7.679581  7.632321 7.114015       4.65149718
#> AGPAT4  0.5202005    2.990566  1.051626 6.240137       6.80423368
#> AGT    10.0188773    6.699810  2.403666 4.833541       4.00746237

dim(prior$covariance)
#> [1] 328   5   5
prior$covariance[1,,]
#>                  astrocytes endothelial  microglia   neurons oligodendrocytes
#> astrocytes         5.027003   -2.017456 -1.0778568 -2.120282       -2.0876742
#> endothelial       -2.017456   19.984533  0.0000000  7.774008      -10.2333931
#> microglia         -1.077857    0.000000  0.9088396  0.000000       -0.2422067
#> neurons           -2.120282    7.774008  0.0000000  5.120625       -3.3872491
#> oligodendrocytes  -2.087674  -10.233393 -0.2422067 -3.387249       10.5114827

# Note bMIND requires positive-definite prior covariance matrix
prior_all = get_prior(sc = counts(dar), sample = dar$experiment_sample_name, cell_type = dar$cell.type, filter_pd = F)