Astrocyte subcluster proportions

Published

March 18, 2026

Code
library(qs2)
library(Seurat)
library(tidyverse)
library(speckle)
library(kableExtra)

astro <- UpdateSeuratObject(qs_read("seurat_objects/20260318-astro_cleaned2_0.3.qs2"))
Idents(astro) <- "seurat_clusters"
# Order samples by Treatment.Group so groups are adjacent
astro$sample_ID <- factor(
  astro$sample_ID,
  levels = c("KK4_465", "KK4_504", "KK4_496", "KK4_492", "KK4_502", "KK4_464"))
Sample ID Treatment Group
KK4_465 IgG
KK4_504 IgG
KK4_496 IgG
KK4_492 Adu
KK4_502 Adu
KK4_464 Adu
Code
p <- plotCellTypeProps(astro,
                  clusters = Idents(astro),
                  sample = astro$sample_ID)

# Enforce sample_ID factor order on x-axis
sample_order <- c("KK4_465", "KK4_504", "KK4_496", "KK4_492", "KK4_502", "KK4_464")
p + scale_x_discrete(limits = sample_order) +
  theme_minimal() +
  theme(axis.title = element_text(size = 14),
        axis.text.x = element_text(size = 13, angle = 30, hjust = 1),
        axis.text.y = element_text(size = 13),
        legend.text = element_text(size = 14))

The function propeller from the speckl package tests whether cell-type proportions differ between experimental conditions, while properly accounting for biological replication and sample-to-sample variability.

As a result, propeller identifies true compositional changes in cell populations and avoids false positives driven by uneven cell capture or outlier samples.

Transform In propeller, transform refers to converting raw cell-type proportions into a scale where statistical testing is valid. Proportions are bounded between 0 and 1 and have unequal variance, so transforming them stabilizes variance and allows linear models to be used appropriately.

Logit The logit transform maps proportions from [0,1] to \((-\infty, +\infty)\) using

\(\log\left(\frac{p}{1-p}\right)\)

This reduces heteroskedasticity, improves power to detect differences in cell-type abundance, and makes effects interpretable as differences in log-odds, while small offsets are used to handle zeros. ***

Robust In propeller, robust refers to using robust empirical Bayes variance estimation in the linear model. This down-weights the influence of outlier samples with extreme cell-type proportions, preventing them from artificially inflating significance.

In practice, robust = TRUE makes the test more resistant to technical or biological outliers while preserving group means, leading to more reliable inference on cell-type composition differences.

Test for differences in astrocyte subcluster proportions between treatment groups in total brain

Code
# The propeller function can take a SingleCellExperiment object or Seurat object as input
# and extract the three necessary  pieces of information from the cell information stored in colData.
#  The three essential pieces of information are

# cluster (Idents function by default)
# sample
# group

prop <- propeller(
  x = astro,
  clusters = Idents(astro),
  sample = astro$sample_ID,
  group = astro$Treatment.Group,
  trend = FALSE,
  robust = TRUE,
  transform = "logit")

prop |> as_tibble () |>
  kbl(digits = 3) |>
  kable_styling("striped")
BaselineProp.clusters BaselineProp.Freq PropMean.Adu PropMean.IgG PropRatio Tstatistic P.Value FDR
6 0.023 0.027 0.019 1.431 1.882 0.089 0.451
5 0.061 0.054 0.068 0.789 -1.601 0.140 0.451
7 0.020 0.017 0.023 0.764 -1.341 0.209 0.451
2 0.145 0.157 0.129 1.214 1.290 0.226 0.451
4 0.080 0.077 0.083 0.930 -0.302 0.769 0.956
3 0.137 0.137 0.139 0.984 -0.176 0.863 0.956
1 0.252 0.249 0.258 0.964 -0.167 0.871 0.956
0 0.281 0.282 0.280 1.005 0.057 0.956 0.956

Interpretation

Column Brief explanation
BaselineProp.clusters Cell type or cluster being tested for proportional differences between conditions.
BaselineProp.Freq Overall mean proportion of the cluster across all samples, independent of condition.
PropMean.Adu Mean sample-level proportion of the cluster in the Adu group.
PropMean.IgG Mean sample-level proportion of the cluster in the IgG group.
PropRatio Ratio of mean proportions between groups (Adu / IgG); values >1 indicate enrichment in Adu, <1 depletion.
Tstatistic Moderated t-statistic testing whether transformed proportions differ between conditions; sign indicates direction.
P.Value Raw p-value for the difference in proportions for that cluster.
FDR False discovery rate–adjusted p-value accounting for testing across all clusters.

Astrocyte subcluster proportions by brain region

Code
md <- astro@meta.data |>
  group_by(Idents(astro),Region) |>
  summarise(n.cells = n()) |>
  pivot_wider(names_from = Region, values_from = n.cells)

kbl(md, caption = "number of cells per cluster per region") |>
  kable_styling("striped")
number of cells per cluster per region
Idents(astro) Cortex Hippocampus Hypothalamus Thalamus WM NA
0 6045 2087 213 40 395 4025
1 54 100 3834 5790 18 1684
2 1051 1441 472 805 1405 1432
3 1055 892 1002 1249 372 1669
4 696 441 314 1381 175 623
5 1703 415 33 85 61 471
6 184 155 167 250 59 228
7 238 65 81 355 38 153
Code
# Wrap propeller so that regions where the model fails return NULL instead of an error
safe_propeller <- possibly(propeller, otherwise = NULL)

# Get all annotated regions, dropping NA entries
regions <- unique(astro$Region) |> na.omit() |> as.character()

# Run propeller separately for each brain region
prop_by_region <- map(regions, \(reg) {
  # Subset to cells from this region only
  sub <- subset(astro, Region == reg)

  # Skip if either treatment group has < 2 samples (propeller requires replication)
  n_per_group <- table(
    distinct(sub@meta.data, sample_ID, Treatment.Group)$Treatment.Group
  )
  if (any(n_per_group < 2)) return(NULL)

  # Test for proportion differences between Adu and IgG within this region
  safe_propeller(
    x         = sub,
    clusters  = Idents(sub),
    sample    = sub$sample_ID,
    group     = sub$Treatment.Group,
    trend     = FALSE,
    robust    = TRUE,       # down-weights outlier samples
    transform = "logit"     # variance-stabilising transform for bounded proportions
  )
}) |>
  # Name list elements by region, then drop NULLs (skipped/failed regions)
  set_names(regions) |>
  compact()

# Combine per-region results into a single data frame with a Region column
prop_region_df <- imap(prop_by_region, \(tbl, reg) {
  as_tibble(tbl) |> mutate(Region = reg)
}) |>
  list_rbind()
Code
prop_heatmap <- prop_region_df |>
  mutate(
    # Convert PropRatio (Adu/IgG) to log2 scale; set Inf/NaN (zero in one group) to NA
    log2ratio = case_when(
      is.nan(log2(PropRatio)) | is.infinite(log2(PropRatio)) ~ NA_real_,
      TRUE ~ log2(PropRatio)
    ),
    # Overlay significance stars on coloured tiles, and × on grey (zero-proportion) tiles
    stars = case_when(
      is.na(log2ratio)   ~ "\u00d7",   # cross for 0-proportion tiles
      P.Value < 0.001    ~ "***",
      P.Value < 0.01     ~ "**",
      P.Value < 0.05     ~ "*",
      TRUE               ~ ""
    )
  )

# Order cell types by mean log2ratio across regions (Adu-enriched at top)
celltype_order <- prop_heatmap |>
  group_by(BaselineProp.clusters) |>
  summarise(mean_r = mean(log2ratio, na.rm = TRUE)) |>
  arrange(desc(mean_r)) |>
  pull(BaselineProp.clusters)

prop_heatmap <- prop_heatmap |>
  mutate(
    CellType  = factor(BaselineProp.clusters, levels = rev(celltype_order)),
    Region    = factor(Region),
    # Use a darker grey for × marks so they remain visible on the grey background
    cross_col = ifelse(is.na(log2ratio), "grey30", "grey15")
  )

# Symmetric colour scale limit (rounded up to nearest integer)
clim <- ceiling(max(abs(prop_heatmap$log2ratio), na.rm = TRUE))

ggplot(prop_heatmap, aes(x = Region, y = CellType, fill = log2ratio)) +
  # Tile fill encodes direction and magnitude of the proportion shift
  geom_tile(colour = "white", linewidth = 0.5) +
  coord_equal() +
  # Significance stars (or ×) overlaid on each tile
  geom_text(aes(label = stars, colour = cross_col), size = 10, vjust = 0.75) +
  # Pass colour strings directly without mapping to a scale
  scale_colour_identity() +
  # Diverging palette: red = Adu-enriched, blue = IgG-enriched, grey = undefined
  scale_fill_distiller(
    palette   = "RdBu",
    direction = -1,
    limits    = c(-clim, clim),
    na.value  = "grey88",
    name      = "log2(Adu/IgG)"
  ) +
  labs(
    title    = "Astrocyte subcluster proportions by brain region",
    subtitle = "Fill: log2(PropRatio Adu/IgG);  * p<0.05  ** p<0.01",
    x = NULL, y = NULL
  ) +
  theme_minimal(base_size = 13) +
  theme(
    axis.text.x     = element_text(angle = 40, hjust = 1, size = 18),
    axis.text.y     = element_text(size = 18),
    panel.grid      = element_blank(),
    legend.position = "right",
    plot.title      = element_text(face = "bold", size = 18),
    plot.subtitle   = element_text(size = 16, colour = "grey40")
  )

Code
iwalk(prop_by_region, \(tbl, reg) {
  cat("\n\n<details><summary>**", reg, "**</summary>\n\n")
  tbl |>
    as_tibble() |>
    arrange(FDR) |>
    kbl(digits = 3, caption = paste("Region:", reg)) |>
    kable_styling("striped", full_width = FALSE) |>
    print()
  cat("\n\n</details>\n\n")
})
** Thalamus **
Region: Thalamus
BaselineProp.clusters BaselineProp.Freq PropMean.Adu PropMean.IgG PropRatio Tstatistic P.Value FDR
5 0.009 0.004 0.014 0.261 -2.302 0.041 0.176
0 0.004 0.002 0.006 0.380 -2.268 0.044 0.176
6 0.025 0.029 0.023 1.289 0.520 0.613 0.948
3 0.125 0.114 0.132 0.866 -0.363 0.723 0.948
7 0.036 0.033 0.039 0.859 -0.313 0.760 0.948
2 0.081 0.090 0.074 1.211 0.275 0.789 0.948
1 0.582 0.584 0.576 1.016 0.119 0.907 0.948
4 0.139 0.143 0.136 1.045 0.067 0.948 0.948
** Hypothalamus **
Region: Hypothalamus
BaselineProp.clusters BaselineProp.Freq PropMean.Adu PropMean.IgG PropRatio Tstatistic P.Value FDR
3 0.164 0.203 0.125 1.617 1.508 0.176 0.382
5 0.005 0.004 0.006 0.602 -1.419 0.199 0.382
6 0.027 0.030 0.017 1.721 1.401 0.204 0.382
0 0.035 0.023 0.047 0.483 -1.393 0.207 0.382
1 0.627 0.579 0.679 0.852 -1.203 0.268 0.382
2 0.077 0.094 0.063 1.489 1.154 0.287 0.382
4 0.051 0.057 0.048 1.169 0.618 0.556 0.636
7 0.013 0.012 0.013 0.884 -0.333 0.749 0.749
** Hippocampus **
Region: Hippocampus
BaselineProp.clusters BaselineProp.Freq PropMean.Adu PropMean.IgG PropRatio Tstatistic P.Value FDR
6 0.028 0.043 0.011 4.049 2.284 0.059 0.338
1 0.018 0.027 0.007 3.728 1.971 0.093 0.338
0 0.373 0.322 0.424 0.759 -1.751 0.127 0.338
2 0.258 0.285 0.234 1.220 0.961 0.371 0.742
5 0.074 0.071 0.080 0.888 -0.632 0.549 0.792
3 0.159 0.165 0.151 1.089 0.560 0.594 0.792
4 0.079 0.076 0.082 0.932 -0.313 0.764 0.824
7 0.012 0.012 0.012 0.992 -0.232 0.824 0.824
** Cortex **
Region: Cortex
BaselineProp.clusters BaselineProp.Freq PropMean.Adu PropMean.IgG PropRatio Tstatistic P.Value FDR
6 0.017 0.020 0.012 1.706 2.113 0.043 0.340
1 0.005 0.004 0.006 0.614 -1.525 0.137 0.549
5 0.154 0.142 0.172 0.826 -0.759 0.453 0.896
3 0.096 0.089 0.104 0.862 -0.641 0.526 0.896
0 0.548 0.577 0.535 1.080 0.547 0.588 0.896
7 0.022 0.019 0.022 0.843 -0.427 0.672 0.896
2 0.095 0.093 0.087 1.070 0.248 0.806 0.899
4 0.063 0.056 0.063 0.891 -0.129 0.899 0.899
** WM **
Region: WM
BaselineProp.clusters BaselineProp.Freq PropMean.Adu PropMean.IgG PropRatio Tstatistic P.Value FDR
1 0.007 0.009 0.004 2.422 1.238 0.245 0.706
0 0.157 0.154 0.203 0.759 -0.995 0.344 0.706
2 0.557 0.577 0.491 1.177 0.973 0.355 0.706
4 0.069 0.059 0.078 0.758 -0.860 0.411 0.706
5 0.024 0.019 0.030 0.633 -0.804 0.441 0.706
7 0.015 0.009 0.016 0.602 -0.530 0.608 0.811
3 0.147 0.147 0.161 0.917 -0.244 0.812 0.838
6 0.023 0.025 0.019 1.328 0.211 0.838 0.838