Bivariate Spatial Cross-Correlation

Lee’s L per sample (brain) for treatment comparison — kNN(15) spatial weights

Published

March 24, 2026

Setup

Code
source("R/setup.R")
library(spdep)

data   <- load_slides_and_distances("full")
slide1 <- data$slide1
slide2 <- data$slide2

## Identify cell-types of interest

ct_astro <- "Astrocyte"
ct_oligo <- "Oligodendrocyte"
ct_micro <- "Microglia"

Hypothesis

The three-layer glial response model proposes a spatial relay: microglia activation (Layer 1) drives astrocyte amplification (Layer 2), which in turn triggers oligodendrocyte remodeling (Layer 3).

If this relay is spatially structured rather than three independent regional responses, then activated microglia and reactive astrocytes should be spatially co-located: where one finds a microglial Ccl12 hotspot, one should also find a Cxcl10-expressing astrocyte hotspot nearby.

Lee’s L statistic is the bivariate generalisation of Moran’s I: it quantifies whether two spatial variables are jointly clustered. We use it to test the following pairs:

Pair Gene X Cell type X Gene Y Cell type Y Hypothesis
1 Ccl12 Microglia Cxcl10 Astrocytes Layer 1 → Layer 2 co-activation
2 Cxcl10 Astrocytes Opcml Oligodendrocytes Layer 2 → Layer 3 relay
3 Ccl12 Microglia Opcml Oligodendrocytes Long-range Layer 1 → Layer 3

For each pair we create two per-cell expression vectors: the gene’s raw count if the cell belongs to the relevant cell type, zero otherwise. Lee’s L of these two vectors tests whether the spatial patterns are concordant (L > 0) or discordant (L < 0) beyond what is expected by chance. Significance is assessed by Monte Carlo permutation (n = 499 shuffles) while keeping cell coordinates fixed.

Gene selection rationale

Each gene was chosen as the strongest spatially-validated marker for its layer in this dataset, based on the upstream plaque-proximity and pseudobulk DE analyses:

  • Ccl12 (Microglia — Layer 1): the most statistically significant gene enriched in the proximity of parenchymal plaques in Aducanumab-treated brains (notebook 8, pseudobulk DESeq2 in plaque-proximal microglia). It also appears as the most cross-regionally consistent microglial signal in the DEG_consolidated regional model, significant in both hippocampus and white matter microglia.

  • Cxcl10 (Astrocytes — Layer 2): a marker of astrocyte subcluster 6, which is expanded in Aducanumab-treated brains relative to IgG controls. Cxcl10 encodes an interferon-γ-inducible chemokine (IP-10) produced by reactive astrocytes, linking microglial inflammatory signalling to an astrocyte activation state that is treatment-sensitive. This makes it the most informative astrocyte gene to test for spatial co-localization with microglial Ccl12.

  • Opcml (Oligodendrocytes — Layer 3): the most statistically significant gene enriched near parenchymal plaques upon Aducanumab treatment in the oligodendrocyte gene-distance correlation analysis (notebook 12). It is an established oligodendrocyte lineage marker and its plaque-proximal upregulation provides the strongest Layer 3 spatial signal in this dataset.

Functions

Code
# Build a per-cell expression vector for gene G in cell type T:
# value = raw count if cell is of type T, 0 otherwise.
# cells_ordered must match the order of nodes in the spatial weight matrix (lw).
build_ct_expr_vector <- function(seurat_obj, gene, cell_types, cells_ordered = colnames(seurat_obj)) {
  meta       <- seurat_obj@meta.data
  counts_mat <- LayerData(seurat_obj, assay = DefaultAssay(seurat_obj), layer = "counts")

  n    <- length(cells_ordered)
  expr <- numeric(n)

  if (gene %in% rownames(counts_mat)) {
    in_mat       <- cells_ordered %in% colnames(counts_mat)
    expr[in_mat] <- as.numeric(counts_mat[gene, cells_ordered[in_mat]])
  }

  ct_cells <- rownames(meta)[meta$annotatedclusters %in% cell_types]
  expr[!cells_ordered %in% ct_cells] <- 0
  expr
}
Code
# Run Lee's L with Monte Carlo permutation for one gene pair.
run_lee <- function(seurat_obj, lw, gene_x, ct_x, gene_y, ct_y, nsim = 499) {
  lw_cells <- attr(lw$neighbours, "region.id")
  x <- build_ct_expr_vector(seurat_obj, gene_x, ct_x, lw_cells)
  y <- build_ct_expr_vector(seurat_obj, gene_y, ct_y, lw_cells)

  if (sum(x > 0) < 10 || sum(y > 0) < 10) {
    return(tibble(
      gene_x = gene_x, ct_x = paste(ct_x, collapse = "/"),
      gene_y = gene_y, ct_y = paste(ct_y, collapse = "/"),
      lee_L  = NA_real_, pval = NA_real_,
      n_x    = sum(x > 0), n_y = sum(y > 0)
    ))
  }

  mc_res <- lee.mc(x, y, listw = lw, nsim = nsim, zero.policy = TRUE)
  tibble(
    gene_x = gene_x, ct_x = paste(ct_x, collapse = "/"),
    gene_y = gene_y, ct_y = paste(ct_y, collapse = "/"),
    lee_L  = mc_res$statistic,
    pval   = mc_res$p.value,
    n_x    = sum(x > 0),
    n_y    = sum(y > 0)
  )
}

Run Lee’s L per sample

Each of the 6 brains is processed independently: subset the slide-level Seurat object to one sample’s cells, rebuild a kNN(15) spatial weight matrix from that sample’s cell coordinates, then run Lee’s L for the 3 relay pairs. This gives 18 rows (6 samples x 3 pairs) and enables direct Adu vs IgG comparison (n = 3 per group).

Code
relay_pairs <- list(
  list(gene_x = "Ccl12", ct_x = ct_micro, gene_y = "Cxcl10", ct_y = ct_astro),
  list(gene_x = "Cxcl10", ct_x = ct_astro, gene_y = "Opcml", ct_y = ct_oligo),
  list(gene_x = "Ccl12", ct_x = ct_micro, gene_y = "Opcml", ct_y = ct_oligo)
)

t0 <- Sys.time()

lee_results <- map(seq_len(nrow(sample_dict)), \(i) {
  t_start <- Sys.time()
  sid       <- sample_dict$sample[i]
  treatment <- sample_dict$treatment[i]
  slide_num <- sample_dict$slide[i]
  slide_obj <- if (slide_num == 1) slide1 else slide2

  # Subset to this sample's cells
  cells_i <- colnames(slide_obj)[slide_obj$sample_ID == sid]
  sub_obj <- subset(slide_obj, cells = cells_i)

  # Build per-sample kNN(15) spatial weight matrix
  coords <- GetTissueCoordinates(sub_obj) |>
    as.data.frame() |>
    dplyr::select(x, y) |>
    as.matrix()
  nn      <- nn2(coords, coords, k = 16)
  nb_list <- map(seq_len(nrow(coords)), \(j) as.integer(nn$nn.idx[j, -1]))
  class(nb_list) <- "nb"
  attr(nb_list, "region.id") <- cells_i
  lw_i <- nb2listw(nb_list, style = "W", zero.policy = TRUE)

  res <- map(relay_pairs, \(pair) {
    run_lee(sub_obj, lw_i, pair$gene_x, pair$ct_x, pair$gene_y, pair$ct_y) |>
      mutate(sample_ID = sid, treatment = treatment)
  }) |>
    bind_rows()

  elapsed <- as.numeric(difftime(Sys.time(), t_start, units = "mins"))
  message(sprintf("[%d/%d] %s (%s) — %.1f min",
                  i, nrow(sample_dict), sid, treatment, elapsed))
  res
}) |>
  bind_rows()

message(sprintf("Done. Total: %.1f min", as.numeric(difftime(Sys.time(), t0, units = "mins"))))

# write_csv(lee_results, "spatial/lee_bivariate.csv")

Results

Table

n_x and n_y are the number of cells with non-zero raw counts for gene X and gene Y, respectively, within their designated cell type. Because the expression vectors are zeroed out for all cells outside the target cell type, these counts reflect only the spatially active signal entering Lee’s L — they are the effective sample sizes for each variable. A very small n (< 10) causes the test to be skipped (reported as NA).

Code
lee_results <- read_csv("spatial/lee_bivariate.csv")

lee_results |>
  mutate(
    pair  = paste0(gene_x, " (", ct_x, ")\n× ", gene_y, " (", ct_y, ")"),
    lee_L = round(lee_L, 4),
    pval  = formatC(pval, format = "f", digits = 3)
  ) |>
  select(pair, sample_ID, treatment, lee_L, pval, n_x, n_y) |>
  kbl(
    caption = "Bivariate Lee's L — three-layer relay pairs (per sample)",
    col.names = c("Pair", "Sample", "Treatment", "Lee's L", "p-value (MC)",
                  "n_x (expressing cells)", "n_y (expressing cells)")
  ) |>
  kable_styling("striped", full_width = FALSE)
Bivariate Lee's L — three-layer relay pairs (per sample)
Pair Sample Treatment Lee's L p-value (MC) n_x (expressing cells) n_y (expressing cells)
Ccl12 (Microglia) × Cxcl10 (Astrocyte) KK4_465 IgG 0.0069 0.002 225 272
Cxcl10 (Astrocyte) × Opcml (Oligodendrocyte) KK4_465 IgG 0.0014 0.004 272 5628
Ccl12 (Microglia) × Opcml (Oligodendrocyte) KK4_465 IgG -0.0006 0.418 225 5628
Ccl12 (Microglia) × Cxcl10 (Astrocyte) KK4_492 Adu 0.0158 0.002 701 476
Cxcl10 (Astrocyte) × Opcml (Oligodendrocyte) KK4_492 Adu -0.0040 0.998 476 6116
Ccl12 (Microglia) × Opcml (Oligodendrocyte) KK4_492 Adu -0.0044 0.998 701 6116
Ccl12 (Microglia) × Cxcl10 (Astrocyte) KK4_504 IgG 0.0202 0.002 361 466
Cxcl10 (Astrocyte) × Opcml (Oligodendrocyte) KK4_504 IgG 0.0007 0.020 466 5297
Ccl12 (Microglia) × Opcml (Oligodendrocyte) KK4_504 IgG -0.0041 0.998 361 5297
Ccl12 (Microglia) × Cxcl10 (Astrocyte) KK4_496 IgG 0.0179 0.002 411 320
Cxcl10 (Astrocyte) × Opcml (Oligodendrocyte) KK4_496 IgG -0.0048 0.998 320 5600
Ccl12 (Microglia) × Opcml (Oligodendrocyte) KK4_496 IgG -0.0034 0.998 411 5600
Ccl12 (Microglia) × Cxcl10 (Astrocyte) KK4_502 Adu 0.0113 0.002 371 383
Cxcl10 (Astrocyte) × Opcml (Oligodendrocyte) KK4_502 Adu -0.0036 0.998 383 5196
Ccl12 (Microglia) × Opcml (Oligodendrocyte) KK4_502 Adu -0.0033 0.994 371 5196
Ccl12 (Microglia) × Cxcl10 (Astrocyte) KK4_464 Adu 0.0253 0.002 720 532
Cxcl10 (Astrocyte) × Opcml (Oligodendrocyte) KK4_464 Adu -0.0011 0.592 532 5436
Ccl12 (Microglia) × Opcml (Oligodendrocyte) KK4_464 Adu -0.0014 0.620 720 5436

Dot plot

Code
lee_plot <- lee_results |>
  mutate(
    pair = paste0(gene_x, " (", ct_x, ")\n× ", gene_y, " (", ct_y, ")"),
    treatment = factor(treatment, levels = c("IgG", "Adu"))
  )

ggplot(lee_plot, aes(x = treatment, y = lee_L, colour = treatment, size = -log10(pval))) +
  geom_hline(yintercept = 0, linetype = "dashed", colour = "grey40") +
  geom_jitter(width = 0.1, height = 0) +
  stat_summary(fun = mean, geom = "crossbar", width = 0.4, linewidth = 0.5,
               colour = "black") +
  facet_wrap(~ pair, scales = "free_y") +
  scale_colour_manual(values = c("IgG" = "#4A90D9", "Adu" = "#E07B39")) +
  scale_size_continuous(name = "-log10(p)", range = c(1, 6)) +
  labs(
    title  = "Bivariate Lee's L ",
    x      = NULL,
    y      = "Lee's L",
    colour = "Treatment"
  ) +
  theme_minimal(base_size = 15) +
  theme(
    strip.text = element_text(size = 12),
    panel.grid.minor = element_blank(),
    #panel.grid.major = element_blank(),
    legend.position = "bottom"
  )

Lee’s L per brain, faceted by relay pair. Each point is one brain; colour indicates treatment group. Positive L supports spatial co-activation; dashed line at L = 0 (null). Adu brains above IgG would indicate treatment-enhanced relay coupling.

Interpretation

Running Lee’s L per sample (brain) rather than per slide enables treatment-level comparison with n = 3 biological replicates per group.

A positive and significant Lee’s L for the Ccl12(microglia) × Cxcl10(astrocyte) pair indicates that regions of microglial Ccl12 activation are spatially concordant with Cxcl10-expressing astrocytes — direct spatial evidence for the Layer 1 → Layer 2 relay. A positive L for Cxcl10(astrocyte) × Opcml(oligodendrocyte) supports the Layer 2 → Layer 3 step. If both pairs show L > 0 but the Ccl12 × Opcml pair (skipping Layer 2) shows weaker L, this provides spatial evidence for a sequential relay rather than independent parallel responses.

If Aducanumab-treated brains show consistently higher Lee’s L than IgG controls, this would indicate that anti-amyloid treatment strengthens the spatial coupling between glial response layers — the relay is not merely present but treatment-modulated.