Gene-to-distance-to-plaque correlation — microglia only
Microglia object provided by Kai Saito with Akhil HALO annotations
Published
April 27, 2026
By restricting the analysis to microglia only, we remove the composition confound. If a gene still correlates with distance within a single cell type, the association more likely reflects a genuine distance-dependent transcriptional programme rather than the fact that microglia are (or are not) concentrated near plaques.
# sample_ID and Treatment.Group are already in cell_distances_all (joined in load_slides_and_distances)
cell_distances_all |>
  group_by(sample_ID, Treatment.Group, prox_any) |>
  summarise(n = n(), .groups = "drop") |>
  pivot_wider(names_from = prox_any, values_from = n, values_fill = 0) |>
  mutate(total = proximal + distal) |>
  kbl(caption = "Microglia per sample — proximity to any plaque (50 um threshold)") |>
  kable_styling("striped", full_width = FALSE)
Microglia per sample — proximity to any plaque (50 um threshold)
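| sample_ID | Treatment.Group | distal | proximal | total |
|-----------|-----------------|--------|----------|-------|
| KK4_464   | Adu             | 1488   | 2788     | 4276  |
| KK4_465   | IgG             | 840    | 2565     | 3405  |
| KK4_492   | Adu             | 2020   | 3057     | 5077  |
| KK4_496   | IgG             | 1052   | 2682     | 3734  |
| KK4_502   | Adu             | 1059   | 2551     | 3610  |
| KK4_504   | IgG             | 925    | 1976     | 2901  |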
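The proximal and distal counts above depend on a 50 um cut-off applied to each cell's distance to its nearest plaque. The following is a minimal sketch of that labelling step on toy data, not the code used for this report. The column name `dist_any_um` is hypothetical, and treating a cell at exactly 50 um as proximal is an assumption, since the report does not state how the boundary is handled.

```r
# Toy stand-in for cell_distances_all; dist_any_um is a hypothetical column name.
toy_cells <- data.frame(
  sample_ID   = c("S1", "S1", "S1", "S2", "S2"),
  dist_any_um = c(12.5, 50, 180.2, 49.9, 75)
)

# Assumption: the boundary is inclusive (<= 50 um counts as proximal).
threshold_um <- 50
toy_cells$prox_any <- ifelse(toy_cells$dist_any_um <= threshold_um,
                             "proximal", "distal")

# Cells per sample and proximity class, analogous to the table above.
prox_counts <- table(toy_cells$sample_ID, toy_cells$prox_any)
prox_counts
```

Because the correlation analysis below uses the continuous distance rather than this binary label, the threshold only affects the descriptive table.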
Correlation function and plot
For each gene we compute a Spearman rank correlation (rho) between two vectors that share one entry per cell:
The cell’s raw transcript count for that gene.
The cell’s Euclidean distance (in um) to its nearest plaque of the specified type.
Spearman correlation operates on ranks rather than raw values, so it captures any monotonic relationship (not just linear) and is less sensitive than Pearson correlation to the skewed, zero-inflated counts typical of spatial transcriptomics data (tied zeros simply share a mid-rank). We set exact = FALSE because an exact p-value cannot be computed when the data contain ties, and most genes have many tied zeros; the p-value then comes from the asymptotic approximation.
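To illustrate the rank-based behaviour, here is a small sketch on simulated data. All values are synthetic and the decay model is an arbitrary assumption chosen only to produce zero-inflated counts that fall with distance. It shows the `cor.test()` call with `exact = FALSE` and checks that a monotone transform of the counts leaves Spearman's rho unchanged.

```r
set.seed(1)

# Simulated cells: distance to nearest plaque (um) and a zero-inflated count
# whose mean decays with distance. All values are synthetic.
n_cells  <- 2000
distance <- runif(n_cells, min = 0, max = 300)
counts   <- rpois(n_cells, lambda = 3 * exp(-distance / 50))

# exact = FALSE requests the asymptotic p-value rather than an exact one.
res <- cor.test(counts, distance, method = "spearman", exact = FALSE)
res$estimate  # negative: expression falls as distance grows

# Ranks are unchanged by strictly increasing transforms, so log1p() leaves
# Spearman's rho untouched.
rho_raw <- cor(counts, distance, method = "spearman")
rho_log <- cor(log1p(counts), distance, method = "spearman")
all.equal(rho_raw, rho_log)
```

One practical consequence is that the choice between raw and log-transformed counts does not change the ranking of genes under this method.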
A gene that is highly expressed near plaques (short distance) and lowly expressed far away (long distance) will have a negative rho (high expression co-occurs with low distance). Conversely, a gene expressed preferentially far from plaques will have a positive rho.
To make the results more intuitive, so that “plaque-enriched” genes appear at the top of a ranked list rather than the bottom, we multiply rho by -1 to obtain rho_flip. After flipping, positive values mean “enriched near plaques” and negative values mean “enriched far from plaques”.
We apply the Benjamini-Hochberg (BH) procedure to control the false discovery rate, producing an adjusted p-value (padj). A gene is called significant at padj < 0.05.
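As a small worked example of the adjustment, the sketch below applies base R's `p.adjust()` to five made-up p-values. The gene names and numbers are purely illustrative and are not results from this analysis.

```r
# Illustrative raw p-values for five hypothetical genes (not real results).
p_raw <- c(geneA = 0.001, geneB = 0.012, geneC = 0.035,
           geneD = 0.045, geneE = 0.6)

# Benjamini-Hochberg adjustment across all genes tested together.
padj <- p.adjust(p_raw, method = "BH")

data.frame(p_raw, padj, significant = padj < 0.05)
```

Four of the five raw p-values fall below 0.05, but only geneA and geneB remain below 0.05 after adjustment, which is the intended effect of controlling the false discovery rate across many genes.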
Final classification. Each gene is labelled as:
enriched_near — padj < 0.05 and rho_flip > 0 (upregulated close to plaques)
enriched_far — padj < 0.05 and rho_flip < 0 (upregulated far from plaques)
ns — not significant
compute_gene_distance_correlation() is centralised in R/setup.R (sourced in setup chunk).
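Since the function body is not shown in this report, the following is a minimal sketch of the steps described above (per-gene Spearman test, sign flip, BH adjustment, classification). It is a hypothetical re-implementation for illustration only: the name `sketch_gene_distance_correlation`, its arguments, and the cells-by-genes matrix input are assumptions, and the real function in R/setup.R may differ.

```r
# Hypothetical re-implementation for illustration; the real function lives in
# R/setup.R and may differ in inputs, outputs, and details.
sketch_gene_distance_correlation <- function(counts, distance, alpha = 0.05) {
  # counts: cells x genes matrix of raw counts; distance: one value per cell.
  stopifnot(nrow(counts) == length(distance))

  per_gene <- lapply(colnames(counts), function(gene) {
    test <- cor.test(counts[, gene], distance,
                     method = "spearman", exact = FALSE)
    data.frame(gene = gene,
               rho  = unname(test$estimate),
               p    = test$p.value)
  })
  res <- do.call(rbind, per_gene)

  res$rho_flip <- -res$rho                     # positive = enriched near plaques
  res$padj     <- p.adjust(res$p, method = "BH")
  res$class    <- ifelse(res$padj >= alpha, "ns",
                  ifelse(res$rho_flip > 0, "enriched_near", "enriched_far"))

  # Rank genes so that plaque-enriched genes come first.
  res[order(-res$rho_flip), ]
}

# Synthetic check: one gene decays with distance, one rises, one is noise.
set.seed(42)
n <- 1500
d <- runif(n, 0, 300)
toy <- cbind(
  near_gene  = rpois(n, 3 * exp(-d / 50)),
  far_gene   = rpois(n, 3 * (1 - exp(-d / 50))),
  noise_gene = rpois(n, 0.5)
)
toy_res <- sketch_gene_distance_correlation(toy, d)
toy_res
```

The sketch does not handle genes with zero counts in every cell, for which the correlation is undefined; a production version would need to filter or flag those genes before adjustment.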
The microglia-specific gene-distance correlation, split by plaque type, reveals that microglia activate fundamentally different transcriptional programmes near CAA vs parenchymal plaques.
Parenchymal plaque (right panel)
Near plaque (orange): Classic DAM (Disease-Associated Microglia) signature:
Cst7 (rho_flip ~0.45) — the canonical DAM marker, cystatin F, involved in lysosomal processing
Ctsb, Cd63 — lysosomal/endosomal genes: microglia near parenchymal plaques are actively phagocytosing amyloid
Lyz2 — lysozyme, antimicrobial/inflammatory
Lpl — lipoprotein lipase, lipid metabolism rewiring in DAM
Far from plaque (purple): Homeostatic microglia signature:
P2ry12, Tmem119 — the two defining homeostatic microglia markers. Their strongly negative rho_flip indicates that microglia lose their resting identity as they approach parenchymal plaques.
Maf, Ptgs1, Lpcat2 — transcription factor and lipid metabolism genes associated with the surveilling/quiescent state
CAA plaque (left panel)
The effect sizes are much smaller (top rho_flip ~0.08 vs ~0.45 for parenchymal plaques) and the gene signature is completely different.
Near CAA (orange): This is a lipid/cholesterol metabolism programme, not a phagocytic one. CAA sits in vessel walls, and microglia nearby appear to be responding to vascular lipid dysregulation rather than engulfing amyloid.
Glul — glutamine synthetase, typically an astrocyte gene but also involved in glutamate-glutamine cycling at the vasculature
Abhd17a — depalmitoylase, involved in protein trafficking
Srebf2, Hmgcs1 — cholesterol biosynthesis pathway (SREBP2 is the master regulator, HMGCS1 is its direct target)
Acsl6 — fatty acid activation
Far from CAA (purple):
Lgals3bp, Ftl1 — galectin-3-binding protein and ferritin light chain (iron storage)
p_caa <- plot_gene_distance_correlation(cor_caa, title = "CAA plaque")
p_paren <- plot_gene_distance_correlation(cor_paren, title = "Parenchymal plaque")

(p_caa | p_paren) +
  plot_annotation(title = "Plaque proximity gene-distance correlation") +
  plot_layout(guides = "collect") &
  theme(legend.position = "bottom")
For each gene, Spearman’s rank correlation coefficient (ρ) was calculated between per-cell gene expression levels (raw count data) and the minimum distance to the nearest amyloid plaque. To facilitate interpretation, correlation coefficients were sign-inverted such that positive values indicate higher gene expression in cells located closer to plaques, whereas negative values indicate higher expression at increasing distances from plaques. Genes were ranked by the inverted correlation coefficient. Statistical significance was assessed using two-sided Spearman correlation tests followed by Benjamini–Hochberg false discovery rate correction. Genes were classified as plaque-enriched, plaque-distal, or not significant based on adjusted P values (threshold 0.05). The five most positive and five most negative correlations are labelled in each plot.
plot_gene_distance_correlation(cor_caa_adu, title = "CAA — Aducanumab") |
  plot_gene_distance_correlation(cor_caa_igg, title = "CAA — IgG")
plot_gene_distance_correlation(cor_paren_adu, title = "Parenchymal — Aducanumab") |
  plot_gene_distance_correlation(cor_paren_igg, title = "Parenchymal — IgG")
plot_gene_distance_correlation(cor_any_adu, title = "Any plaque — Aducanumab") |
  plot_gene_distance_correlation(cor_any_igg, title = "Any plaque — IgG")
The rank plots above identified two biologically distinct distance-dependent programmes. To compare their effect sizes and significance side by side across both plaque types, we visualise the rho_flip values for the named genes in a single heatmap. Candidates were drawn directly from the labelled genes in the rank plots:
DAM / phagocytic programme (near parenchymal plaque): Cst7, Ctsb, Cd63, Lyz2, Lpl — the top-ranked enriched-near genes for parenchymal distance.
Vascular-lipid programme (near CAA): Srebf2, Hmgcs1, Acsl6, Glul, Abhd17a — the top-ranked enriched-near genes for CAA distance.
Homeostatic programme (far from parenchymal plaque): P2ry12, Tmem119, Maf, Ptgs1, Lpcat2 — the top-ranked enriched-far genes for parenchymal distance.
Cholesterol efflux / iron (far from CAA): Abca1, Abcg1, Lgals3bp, Ftl1 — the top-ranked enriched-far genes for CAA distance.
The heatmap is intended to show whether genes that are strongly associated with one plaque type also show a weaker or opposing signal for the other.
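One way such a heatmap could be assembled is sketched below, using base R only. This is an illustration under stated assumptions, not the plotting code behind the report: the two result tables are toy stand-ins with hypothetical gene names and made-up rho_flip values, and the column names `gene` and `rho_flip` are assumed to match the output of the correlation step.

```r
# Toy result tables (made-up values, hypothetical genes); in practice these
# would be the per-plaque-type correlation results.
cor_caa_toy <- data.frame(
  gene     = c("geneA", "geneB", "geneC"),
  rho_flip = c(0.08, -0.02, -0.05)
)
cor_paren_toy <- data.frame(
  gene     = c("geneC", "geneA", "geneB"),
  rho_flip = c(0.45, 0.01, -0.30)
)

# Candidate genes to display, in the row order wanted for the heatmap.
genes <- c("geneA", "geneB", "geneC")

# Gene x plaque-type matrix of rho_flip; match() aligns rows by gene name.
mat <- cbind(
  CAA         = cor_caa_toy$rho_flip[match(genes, cor_caa_toy$gene)],
  Parenchymal = cor_paren_toy$rho_flip[match(genes, cor_paren_toy$gene)]
)
rownames(mat) <- genes

# Diverging palette centred on zero: purple = far, orange = near.
lim <- max(abs(mat), na.rm = TRUE)
pal <- colorRampPalette(c("purple", "white", "orange"))(51)

image(x = seq_len(ncol(mat)), y = seq_len(nrow(mat)), z = t(mat),
      zlim = c(-lim, lim), col = pal, axes = FALSE, xlab = "", ylab = "")
axis(1, at = seq_len(ncol(mat)), labels = colnames(mat))
axis(2, at = seq_len(nrow(mat)), labels = rownames(mat), las = 1)
```

Fixing the colour limits symmetrically around zero keeps white at rho_flip = 0, so a gene with opposite signs for the two plaque types shows up as an orange cell next to a purple one.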