Microglia Cell proportions

Published

March 18, 2026

Code

# BiocManager::install("speckle")
# https://bioconductor.org/packages/devel/bioc/vignettes/speckle/inst/doc/speckle.html
library(qs2)
library(Seurat)
library(tidyverse) 
library(speckle)
library(kableExtra)

immune <- UpdateSeuratObject(qs_read("seurat_objects/20260225-mg_kai.qs2"))
Idents(immune) <- "sub.cluster"
# Order samples by Treatment.Group so groups are adjacent
immune$sample_ID <- factor(
  immune$sample_ID,
  levels = c("KK4_465", "KK4_504", "KK4_496", "KK4_492", "KK4_502", "KK4_464"))

Sample ID	Treatment Group
KK4_465	IgG
KK4_504	IgG
KK4_496	IgG
KK4_492	Adu
KK4_502	Adu
KK4_464	Adu

Code

p <- plotCellTypeProps(immune,
                  clusters = Idents(immune),
                  sample = immune$sample_ID)

# Enforce sample_ID factor order on x-axis
sample_order <- c("KK4_465", "KK4_504", "KK4_496", "KK4_492", "KK4_502", "KK4_464")
p + scale_x_discrete(limits = sample_order) +
  theme_minimal() +
  theme(axis.title = element_text(size = 14),
        axis.text.x = element_text(size = 13, angle = 30, hjust = 1),
        axis.text.y = element_text(size = 13),
        legend.text = element_text(size = 14))

The function propeller from the speckl package tests whether cell-type proportions differ between experimental conditions, while properly accounting for biological replication and sample-to-sample variability.

aggregates single cells into sample-level cell-type proportions,
applies a variance-stabilizing transformation (e.g. logit or arcsin),
fits a linear model for each cell type,
uses empirical Bayes moderation to stabilize variance estimates across cell types, and
controls for multiple testing.

As a result, propeller identifies true compositional changes in cell populations and avoids false positives driven by uneven cell capture or outlier samples.

Transform In propeller, transform refers to converting raw cell-type proportions into a scale where statistical testing is valid. Proportions are bounded between 0 and 1 and have unequal variance, so transforming them stabilizes variance and allows linear models to be used appropriately.

Logit The logit transform maps proportions from [0,1] to \((-\infty, +\infty)\) using

\(\log\left(\frac{p}{1-p}\right)\)

This reduces heteroskedasticity, improves power to detect differences in cell-type abundance, and makes effects interpretable as differences in log-odds, while small offsets are used to handle zeros. ***

Robust In propeller, robust refers to using robust empirical Bayes variance estimation in the linear model. This down-weights the influence of outlier samples with extreme cell-type proportions, preventing them from artificially inflating significance.

In practice, robust = TRUE makes the test more resistant to technical or biological outliers while preserving group means, leading to more reliable inference on cell-type composition differences.

Test for differences in immune cell type proportions between treatment groups in total brain

Code

# The propeller function can take a SingleCellExperiment object or Seurat object as input 
# and extract the three necessary  pieces of information from the cell information stored in colData.
#  The three essential pieces of information are

# cluster (Idents function by default)
# sample
# group

prop <- propeller(
  x = immune,
  clusters = Idents(immune),
  sample = immune$sample_ID,
  group = immune$Treatment.Group,
  trend = FALSE,
  robust = TRUE,
  transform = "logit")

prop |> as_tibble () |> 
  kbl(digits = 3) |> 
  kable_styling("striped")

BaselineProp.clusters	BaselineProp.Freq	PropMean.Adu	PropMean.IgG	PropRatio	Tstatistic	P.Value	FDR
Metabolic	0.203	0.179	0.229	0.783	-2.291	0.051	0.564
Il1b+	0.014	0.017	0.012	1.387	1.385	0.205	0.923
ARM	0.054	0.063	0.045	1.390	1.240	0.252	0.923
DAM_1	0.068	0.071	0.065	1.093	0.725	0.489	0.949
Hm1	0.094	0.089	0.099	0.908	-0.658	0.529	0.949
DAM_0	0.247	0.253	0.236	1.073	0.648	0.536	0.949
IRM	0.017	0.018	0.016	1.153	0.541	0.604	0.949
Hm2	0.131	0.133	0.130	1.028	0.259	0.803	0.953
Spp1+	0.080	0.085	0.079	1.086	0.227	0.827	0.953
Proliferating	0.006	0.006	0.006	0.974	-0.167	0.872	0.953
Gpnmb+	0.084	0.086	0.085	1.008	0.061	0.953	0.953

Interpretation

Column	Brief explanation
`BaselineProp.clusters`	Cell type or cluster being tested for proportional differences between conditions.
`BaselineProp.Freq`	Overall mean proportion of the cluster across all samples, independent of condition.
`PropMean.Adu`	Mean sample-level proportion of the cluster in the Adu group.
`PropMean.IgG`	Mean sample-level proportion of the cluster in the IgG group.
`PropRatio`	Ratio of mean proportions between groups (Adu / IgG); values >1 indicate enrichment in Adu, <1 depletion.
`Tstatistic`	Moderated t-statistic testing whether transformed proportions differ between conditions; sign indicates direction.
`P.Value`	Raw p-value for the difference in proportions for that cluster.
`FDR`	False discovery rate–adjusted p-value accounting for testing across all clusters.

Immune cell type proportions by brain region

Code

md <- immune@meta.data |> 
  group_by(Idents(immune),Region) |> 
  summarise(n.cells = n()) |> 
  pivot_wider(names_from = Region, values_from = n.cells)

kbl(md, caption = "number of cells per cluster per region") |> 
  kable_styling("striped")

number of cells per cluster per region
Idents(immune)	Cortex	Hippocampus	Hypothalamus	Thalamus	WM	NA
DAM_0	1874	769	222	768	414	1637
Metabolic	1992	276	325	766	91	1223
DAM_1	534	161	104	373	78	323
Gpnmb+	317	176	216	853	139	240
Hm2	410	313	544	710	533	508
Hm1	525	576	207	247	57	548
Spp1+	400	199	259	598	130	264
Proliferating	41	21	10	32	12	23
ARM	384	153	64	129	45	467
IRM	143	41	22	83	34	75
Il1b+	125	26	6	23	12	133

Code

# Wrap propeller so that regions where the model fails return NULL instead of an error
safe_propeller <- possibly(propeller, otherwise = NULL)

# Get all annotated regions, dropping NA entries
regions <- unique(immune$Region) |> na.omit() |> as.character()

# Run propeller separately for each brain region
prop_by_region <- map(regions, \(reg) {
  # Subset to cells from this region only
  sub <- subset(immune, Region == reg)

  # Skip if either treatment group has < 2 samples (propeller requires replication)
  n_per_group <- table(
    distinct(sub@meta.data, sample_ID, Treatment.Group)$Treatment.Group
  )
  if (any(n_per_group < 2)) return(NULL)

  # Test for proportion differences between Adu and IgG within this region
  safe_propeller(
    x         = sub,
    clusters  = Idents(sub),
    sample    = sub$sample_ID,
    group     = sub$Treatment.Group,
    trend     = FALSE,
    robust    = TRUE,       # down-weights outlier samples
    transform = "logit"     # variance-stabilising transform for bounded proportions
  )
}) |>
  # Name list elements by region, then drop NULLs (skipped/failed regions)
  set_names(regions) |>
  compact()

# Combine per-region results into a single data frame with a Region column
prop_region_df <- imap(prop_by_region, \(tbl, reg) {
  as_tibble(tbl) |> mutate(Region = reg)
}) |>
  list_rbind()

Code

prop_heatmap <- prop_region_df |>
  mutate(
    # Convert PropRatio (Adu/IgG) to log2 scale; set Inf/NaN (zero in one group) to NA
    log2ratio = case_when(
      is.nan(log2(PropRatio)) | is.infinite(log2(PropRatio)) ~ NA_real_,
      TRUE ~ log2(PropRatio)
    ),
    # Overlay significance stars on coloured tiles, and × on grey (zero-proportion) tiles
    stars = case_when(
      is.na(log2ratio)   ~ "\u00d7",   # cross for 0-proportion tiles
      P.Value < 0.001    ~ "***",
      P.Value < 0.01     ~ "**",
      P.Value < 0.05     ~ "*",
      TRUE               ~ ""
    )
  )

# Order cell types by mean log2ratio across regions (Adu-enriched at top)
celltype_order <- prop_heatmap |>
  group_by(BaselineProp.clusters) |>
  summarise(mean_r = mean(log2ratio, na.rm = TRUE)) |>
  arrange(desc(mean_r)) |>
  pull(BaselineProp.clusters)

prop_heatmap <- prop_heatmap |>
  mutate(
    CellType  = factor(BaselineProp.clusters, levels = rev(celltype_order)),
    Region    = factor(Region),
    # Use a darker grey for × marks so they remain visible on the grey background
    cross_col = ifelse(is.na(log2ratio), "grey30", "grey15")
  )

# Symmetric colour scale limit (rounded up to nearest integer)
clim <- ceiling(max(abs(prop_heatmap$log2ratio), na.rm = TRUE))

ggplot(prop_heatmap, aes(x = Region, y = CellType, fill = log2ratio)) +
  # Tile fill encodes direction and magnitude of the proportion shift
  geom_tile(colour = "white", linewidth = 0.5) +
  coord_equal() +
  # Significance stars (or ×) overlaid on each tile
  geom_text(aes(label = stars, colour = cross_col), size = 10, vjust = 0.75) +
  # Pass colour strings directly without mapping to a scale
  scale_colour_identity() +
  # Diverging palette: red = Adu-enriched, blue = IgG-enriched, grey = undefined
  scale_fill_distiller(
    palette   = "RdBu",
    direction = -1,
    limits    = c(-clim, clim),
    na.value  = "grey88",
    name      = "log2(Adu/IgG)"
  ) +
  labs(
    title    = "Cell type proportions by brain region",
    subtitle = "Fill: log2(PropRatio Adu/IgG);  * p<0.05  ** p<0.01",
    x = NULL, y = NULL
  ) +
  theme_minimal(base_size = 13) +
  theme(
    axis.text.x     = element_text(angle = 40, hjust = 1, size = 18),
    axis.text.y     = element_text(size = 18),
    panel.grid      = element_blank(),
    legend.position = "right",
    plot.title      = element_text(face = "bold", size = 18),
    plot.subtitle   = element_text(size = 16, colour = "grey40")
  )

Code

iwalk(prop_by_region, \(tbl, reg) {
  cat("\n\n<details><summary>**", reg, "**</summary>\n\n")
  tbl |>
    as_tibble() |>
    arrange(FDR) |>
    kbl(digits = 3, caption = paste("Region:", reg)) |>
    kable_styling("striped", full_width = FALSE) |>
    print()
  cat("\n\n</details>\n\n")
})

** Thalamus **

Region: Thalamus
BaselineProp.clusters	BaselineProp.Freq	PropMean.Adu	PropMean.IgG	PropRatio	Tstatistic	P.Value	FDR
Metabolic	0.167	0.149	0.197	0.753	-1.395	0.211	0.939
Spp1+	0.131	0.144	0.117	1.231	0.975	0.366	0.939
DAM_1	0.081	0.090	0.077	1.174	0.894	0.405	0.939
Hm2	0.155	0.143	0.159	0.900	-0.600	0.570	0.939
Gpnmb+	0.186	0.188	0.176	1.067	0.445	0.672	0.939
Hm1	0.054	0.054	0.057	0.943	-0.429	0.683	0.939
Proliferating	0.007	0.007	0.007	0.930	-0.406	0.699	0.939
ARM	0.028	0.031	0.023	1.376	0.237	0.821	0.939
Il1b+	0.005	0.005	0.006	0.864	-0.202	0.847	0.939
IRM	0.018	0.021	0.015	1.466	0.192	0.854	0.939
DAM_0	0.168	0.168	0.166	1.010	0.063	0.952	0.952

** Hypothalamus **

Region: Hypothalamus
BaselineProp.clusters	BaselineProp.Freq	PropMean.Adu	PropMean.IgG	PropRatio	Tstatistic	P.Value	FDR
Metabolic	0.164	0.113	0.231	0.486	-2.224	0.053	0.585
Spp1+	0.131	0.141	0.095	1.487	1.279	0.235	0.635
Il1b+	0.003	0.004	0.002	1.911	1.195	0.265	0.635
Proliferating	0.005	0.004	0.007	0.550	-1.179	0.271	0.635
Gpnmb+	0.109	0.120	0.087	1.380	1.068	0.315	0.635
Hm1	0.105	0.089	0.135	0.659	-0.976	0.355	0.635
Hm2	0.275	0.309	0.251	1.231	0.801	0.444	0.635
IRM	0.011	0.012	0.007	1.584	0.771	0.462	0.635
DAM_1	0.053	0.053	0.049	1.081	0.361	0.727	0.888
ARM	0.032	0.039	0.025	1.569	0.137	0.894	0.984
DAM_0	0.112	0.116	0.110	1.054	0.004	0.997	0.997

** Hippocampus **

Region: Hippocampus
BaselineProp.clusters	BaselineProp.Freq	PropMean.Adu	PropMean.IgG	PropRatio	Tstatistic	P.Value	FDR
IRM	0.015	0.022	0.006	3.855	3.956	0.006	0.064
Il1b+	0.010	0.013	0.005	2.850	2.320	0.056	0.307
Metabolic	0.102	0.086	0.125	0.688	-1.740	0.126	0.461
DAM_1	0.059	0.067	0.050	1.357	1.350	0.219	0.555
Hm2	0.115	0.102	0.135	0.756	-1.248	0.252	0.555
Hm1	0.212	0.190	0.242	0.784	-0.871	0.414	0.670
DAM_0	0.284	0.304	0.260	1.168	0.846	0.426	0.670
Gpnmb+	0.065	0.068	0.056	1.215	0.492	0.639	0.878
Proliferating	0.008	0.008	0.007	1.028	-0.269	0.796	0.941
Spp1+	0.073	0.081	0.064	1.264	-0.112	0.914	0.941
ARM	0.056	0.059	0.051	1.168	0.077	0.941	0.941

** Cortex **

Region: Cortex
BaselineProp.clusters	BaselineProp.Freq	PropMean.Adu	PropMean.IgG	PropRatio	Tstatistic	P.Value	FDR
ARM	0.057	0.073	0.042	1.734	2.255	0.043	0.473
Gpnmb+	0.047	0.038	0.051	0.746	-1.137	0.277	0.956
Metabolic	0.295	0.278	0.330	0.842	-0.914	0.378	0.956
DAM_1	0.079	0.081	0.072	1.138	0.643	0.532	0.956
DAM_0	0.278	0.288	0.261	1.103	0.554	0.589	0.956
IRM	0.021	0.020	0.019	1.056	0.526	0.609	0.956
Proliferating	0.006	0.006	0.005	1.183	0.437	0.670	0.956
Spp1+	0.059	0.055	0.058	0.947	-0.202	0.843	0.956
Il1b+	0.019	0.021	0.018	1.142	0.178	0.862	0.956
Hm1	0.078	0.079	0.086	0.922	-0.128	0.901	0.956
Hm2	0.061	0.060	0.058	1.044	0.056	0.956	0.956

** WM **

Region: WM
BaselineProp.clusters	BaselineProp.Freq	PropMean.Adu	PropMean.IgG	PropRatio	Tstatistic	P.Value	FDR
Metabolic	0.059	0.029	0.096	0.299	-3.325	0.003	0.032
IRM	0.022	0.028	0.015	1.916	1.615	0.120	0.658
ARM	0.029	0.024	0.054	0.441	-1.371	0.185	0.679
Gpnmb+	0.090	0.089	0.056	1.596	1.095	0.285	0.730
Proliferating	0.008	0.006	0.011	0.580	-0.898	0.382	0.730
DAM_1	0.050	0.042	0.057	0.738	-0.861	0.398	0.730
Il1b+	0.008	0.008	0.008	0.931	-0.640	0.529	0.831
Hm2	0.345	0.377	0.338	1.116	0.487	0.631	0.867
Spp1+	0.084	0.104	0.092	1.137	0.246	0.808	0.892
DAM_0	0.268	0.252	0.236	1.066	0.138	0.892	0.892
Hm1	0.037	0.041	0.037	1.097	0.137	0.892	0.892