Microglia GSEA across brain

Aducanumab vs IgG, per-region fgsea against MSigDB M7 (Immunologic Signatures) mouse gene sets

Published

March 18, 2026

Code

library(msigdbr)
library(fgsea)
library(tidyverse)
library(kableExtra)
library(ggrepel)
library(pheatmap)

Overview

Gene Set Enrichment Analysis (GSEA) was applied to characterise the immunological pathways altered in microglia by aducanumab treatment, across individual brain regions. The analysis uses ranked gene lists derived from the pseudobulk DESeq2 results using the clean Kai´s microglia seurat object and tests them against the M7 Immunologic Signature gene sets from MSigDB.

M7 rationale. M7 gene sets represent cell states and perturbations within the immune system, with the initial release curated from Cui et al. 2023. These sets catalogue cell-type-specific transcriptional responses of mouse immune cells to 86 cytokines in vivo, making them highly appropriate for interpreting microglial responses to immunotherapy. See also the Immune Dictionary Portal for more information.

Panel constraint. The Xenium panel targets approximately 480 genes. Because GSEA only ranks genes present in the data, the effective size of each gene set reflects its intersection with this panel. To retain gene sets with as few as five panel-represented members, minSize = 5 was used (compared with the conventional threshold of 15 used for whole-transcriptome data). No explicit panel filter was applied: fgsea automatically intersects the ranked list with each pathway, so effective sizes already reflect the panel constraint.

Interpretation of NES. The Normalized Enrichment Score (NES) measures whether a gene set is over-represented at the top or bottom of the ranked gene list. It is normalized for gene set size, allowing comparison across pathways and regions. NES > 0 indicates that genes in the set are disproportionately upregulated in aducanumab-treated microglia; NES < 0 indicates disproportionate upregulation in IgG controls. Pathway redundancy within each region was reduced using collapsePathways (threshold p < 0.01) before cross-region comparison.

Setup

Load M7 gene sets

Code

gene_sets <- msigdbr(species = "Mus musculus", db_species = "MM", collection = "M7")
gene_sets_list <- split(gene_sets$gene_symbol, gene_sets$gs_name)

cat("M7 gene sets loaded:", length(gene_sets_list), "\n")

M7 gene sets loaded: 787

Load DE results

Code

de_by_region <- readRDS("results/20260303-DEG_MG_byRegion.rds")

# Panel genes = union of all genes tested across regions
panel_genes <- de_by_region |> map(\(df) rownames(df)) |> reduce(union)

cat("Regions available:", paste(names(de_by_region), collapse = ", "), "\n")

Regions available: Cortex, Hippocampus, Hypothalamus, Thalamus, WM

Code

cat("Panel genes:", length(panel_genes), "\n")

Panel genes: 370

Panel coverage in M7

The table below quantifies how many panel genes are represented in the M7 collection, and how many M7 gene sets have at least one panel member.

Code

m7_genes <- unique(gene_sets$gene_symbol)

tibble(
  panel_size            = length(panel_genes),
  m7_unique_genes       = length(m7_genes),
  panel_in_m7           = sum(panel_genes %in% m7_genes),
  pct_covered           = round(100 * sum(panel_genes %in% m7_genes) / length(panel_genes), 1),
  genesets_with_overlap = sum(map_int(gene_sets_list, \(g) sum(g %in% panel_genes)) > 0)
) |>
  kbl(caption = "Panel coverage in M7 gene sets") |>
  kable_styling("striped")

Panel coverage in M7 gene sets
panel_size	m7_unique_genes	panel_in_m7	pct_covered	genesets_with_overlap
370	5981	274	74.1	690

Per-region GSEA

For each brain region, genes are ranked by avg_log2FC (Adu vs IgG), and fgsea is run against all M7 gene sets. Results are displayed as a volcano plot of NES vs −log₁₀(adjusted p). All points are plotted; significant pathways (FDR < 0.05) are highlighted in red.

Because M7 contains many gene sets with overlapping membership, redundant significant pathways are reduced using collapsePathways (fgsea; p-value threshold 0.01). This procedure identifies a subset of representative (non-redundant) pathways (collapsed pathways), such that no two retained pathways share an unexpectedly large fraction of their leading-edge genes.

For clarity, only these collapsed pathways are labelled on the volcano; the remaining significant-but-redundant pathways remain visible as unlabelled red points. Pathway labels are cleaned by stripping the leading collection prefix (e.g. GSE12345_, GOLDRATH_) and converting underscores to spaces.

Code

gsea_by_region <- imap(de_by_region, \(de_res, reg) {

  # Build ranked list: avg_log2FC sorted descending, all panel genes
  gene_ranks <- de_res |>
    rownames_to_column("gene") |>
    arrange(desc(avg_log2FC)) |>
    (\(d) setNames(d$avg_log2FC, d$gene))()

  res <- fgsea(
    pathways = gene_sets_list,
    stats    = gene_ranks,
    eps      = 0.0,
    minSize  = 5,    # lowered from 15: panel has only ~480 genes
    maxSize  = 500
  )

  sig <- res |> filter(padj < 0.05) |> arrange(desc(NES))

  cat("\n\n##", reg, "\n\n")

  if (nrow(sig) == 0) {
    cat("_No significant pathways (padj < 0.05)._\n\n")
    return(list(fgsea = res, significant = sig, gene_ranks = gene_ranks))
  }

  # Collapse redundant pathways within region
  collapsed <- collapsePathways(
    fgseaRes       = res |> filter(padj < 0.05),
    pathways       = gene_sets_list[sig$pathway],
    pval.threshold = 0.01,
    stats          = gene_ranks
  )
  sig_collapsed <- sig |> filter(pathway %in% collapsed$mainPathways)

  # NES volcano plot
  volcano_data <- res |>
    mutate(
      neg_log10_padj = -log10(padj),
      significant    = padj < 0.05,
      pathway_label  = pathway |>
        str_remove("^[A-Z0-9]+_") |>
        str_replace_all("_", " ") |>
        str_to_sentence()
    )

  p <- ggplot(volcano_data, aes(x = NES, y = neg_log10_padj)) +
    geom_hline(yintercept = -log10(0.05), linetype = "dashed", colour = "grey50") +
    geom_vline(xintercept = 0, linetype = "dashed", colour = "grey50") +
    geom_point(aes(size = size, colour = significant), alpha = 0.6) +
    geom_text_repel(
      data = \(d) filter(d, pathway %in% collapsed$mainPathways),
      aes(label = pathway_label),
      size = 6, max.overlaps = 20, segment.colour = "grey60"
    ) +
    scale_colour_manual(
      values = c("TRUE" = "#E64B35", "FALSE" = "grey70"),
      labels = c("TRUE" = "FDR < 0.05", "FALSE" = "NS")
    ) +
    scale_size_continuous(range = c(1, 6), name = "Gene set size") +
    labs(
      title  = paste("GSEA M7 \u2014", reg),
      x      = "Normalized Enrichment Score (NES)",
      y      = expression(-log[10](p[adj])),
      colour = "Significance"
    ) +
    theme_bw(base_size = 16) +
    theme(panel.grid = element_blank(), legend.position = "bottom")
  print(p)

  list(fgsea = res, significant = sig_collapsed, gene_ranks = gene_ranks)

})

Cortex

Hippocampus

Hypothalamus

Thalamus

WM

Cross-region NES heatmap

The heatmap displays only collapsed pathways. Taking the union of each region’s collapsed main pathways yields a de-duplicated catalogue of biologically distinct immunological programmes altered by aducanumab, free of the inflation that arises when many highly overlapping gene sets reach significance simultaneously.

For each retained pathway, the NES from every region is assembled into a matrix and displayed as a hierarchically clustered heatmap. Asterisks mark cells where the pathway also meets FDR < 0.05 in that region.

Code

all_gsea_long <- imap(gsea_by_region, \(res, reg) {
  res$fgsea |> mutate(Region = reg)
}) |> list_rbind()

# Union of collapsed (non-redundant) main pathways across all regions
sig_pathways <- gsea_by_region |>
  map(\(res) res$significant$pathway) |>
  reduce(union)

if (length(sig_pathways) > 0) {
  clean_label <- function(x) {
    x |>
      str_remove("^[A-Z0-9]+_") |>
      str_replace_all("_", " ") |>
      str_to_sentence()
  }

  nes_matrix <- all_gsea_long |>
    filter(pathway %in% sig_pathways) |>
    pivot_wider(id_cols = pathway, names_from = Region, values_from = NES) |>
    mutate(pathway = clean_label(pathway)) |>
    column_to_rownames("pathway") |>
    as.matrix()

  padj_matrix <- all_gsea_long |>
    filter(pathway %in% sig_pathways) |>
    pivot_wider(id_cols = pathway, names_from = Region, values_from = padj) |>
    mutate(pathway = clean_label(pathway)) |>
    column_to_rownames("pathway") |>
    as.matrix()

  display_matrix <- ifelse(!is.na(padj_matrix) & padj_matrix < 0.05, "*", "")
  clim <- ceiling(max(abs(nes_matrix), na.rm = TRUE))

  pheatmap(nes_matrix,
    color           = colorRampPalette(c("#4575b4", "white", "#d73027"))(100),
    breaks          = seq(-clim, clim, length.out = 101),
    na_col          = "grey88",
    display_numbers = display_matrix,
    fontsize_number = 24,
    number_color    = "black",
    cluster_rows    = TRUE,
    cluster_cols    = TRUE,
    main            = "M7 GSEA NES (Adu vs IgG) by region\n",
    fontsize        = 15,
    fontsize_row    = 14,
    angle_col       = 45,
    cellwidth       = 30,
    cellheight      = 30
  )
} else {
  message("No significant pathways across any region — heatmap skipped.")
}

Leading-edge tables

Collapsible tables listing significant (after collapsePathways within each region) M7 pathways per region, with NES, adjusted p-value, effective gene set size, and the leading-edge genes driving enrichment.

Code

iwalk(gsea_by_region, \(res, reg) {
  cat("\n\n<details><summary>**", reg, "** significant pathways</summary>\n\n")

  if (nrow(res$significant) > 0) {
    res$significant |>
      select(pathway, NES, padj, size, leadingEdge) |>
      mutate(
        pathway     = str_remove(pathway, "^[A-Z0-9]+_") |>
                      str_replace_all("_", " ") |>
                      str_to_sentence(),
        padj        = signif(padj, 3),
        NES         = round(NES, 2),
        leadingEdge = map_chr(leadingEdge, \(x) paste(x, collapse = ", "))
      ) |>
      arrange(padj) |>
      kbl(caption = paste("M7 GSEA \u2014", reg)) |>
      kable_styling("striped") |>
      print()
  } else {
    cat("No significant pathways.\n\n")
  }

  cat("\n\n</details>\n\n")
})

** Cortex ** significant pathways

M7 GSEA — Cortex
pathway	NES	padj	size	leadingEdge
Cdc2 ifna1 response up	1.96	0.0251	39	C3, Ccl5, Ifitm3, Cxcl10, Lgals3bp, Ifit2, Ccl12, Ifit1, Axl, C1qb, Hif1a, Selplg, Ldlr, Bcl2a1b, Pgd, Hk3, Etnk1, Ly86, Aif1, Grn, Acer3, Clec2d, Sat1, Pnp
Langerhans ifna1 response up	1.85	0.0251	5	Ifitm3, Cxcl10, Lgals3bp, Ifit2, Ifit1

** Hippocampus ** significant pathways

M7 GSEA — Hippocampus
pathway	NES	padj	size	leadingEdge
Macrophage il15 response up	2.04	0.000101	35	Ccl12, Msr1, Cxcl10, Ifit2, Ccl2, Il1b, Ifit1, Ifitm3, Ccl5, Acod1, Ccl4, Clec2d
Cdc2 il15 response up	2.01	0.000334	39	Ccl12, Cxcl10, Ifit2, Ccl2, Ifit1, Ifitm3, Tnf, Vim, C3, Myo1e, Axl, Clec2d, Lgals3bp, Bcl2a1b, Ldlr, Mapkapk2, Ifitm1, Fas
Macrophage il12 response up	1.76	0.001440	6	Ccl12, Cxcl10, Ifit2, Ccl2, Ifit1
Macrophage il1a response up	1.79	0.008880	17	Ccl12, Msr1, Ccl2, Il1b, Nlrp3, Hmox1, Plin2, Bcl2a1b
T cell cd8 il4 response dn	1.62	0.033500	11	Cd3d, Ccl5, Nkg7, Vim

** Hypothalamus ** significant pathways

M7 GSEA — Hypothalamus
pathway	NES	padj	size	leadingEdge
Cdc2 ifna1 response up	2.22	0.00326	39	Lgals3, Ifitm3, Ccl5, C3, Cxcl10, Lgals3bp, Ccl2, Bcl2a1b, Axl, Ifit1, Ifit2, Hk3, Aif1, Ccl12, Plin2, Clec2d, Grn
Macrophage il15 response up	2.18	0.00326	35	Ifitm3, Ccl5, Il1b, Cxcl10, Msr1, Cd74, Ccl4, Ccl2, Bcl2a1b, Ifit1, Ifit2, Hk3, Ccl12, Hilpda, Junb, Acod1, Clec2d
Cdc2 il18 response up	2.09	0.00326	43	C3, Cxcl10, Tnf, Lgals3bp, Vim, Plac8, Myo1e, Ccl2, Bcl2a1b, Lpl, Ifit1, Mgl2, Ifit2, Hk3, Aif1, Ccl12
Pdc tnfa response up	1.82	0.03020	13	H2-Ab1, H2-Eb1, Lgals3bp, Cd74

** Thalamus ** significant pathways

M7 GSEA — Thalamus
pathway	NES	padj	size	leadingEdge
Cdc2 il15 response up	1.83	0.0446	39	Ccl12, Ifit2, Ifitm3, Tnf, Ldlr, Ccl2, Mapkapk2, Ifitm1, Cxcl10, Ifit1, Lgals3bp, Clec2d, C3, Axl, Myo1e

** WM ** significant pathways

M7 GSEA — WM
pathway	NES	padj	size	leadingEdge
Treg ifnb response up	2.00	0.0269	10	Ifitm3, Ifit2, Ms4a4b, St6galnac4, Cxcl10, Lgals3bp, Clec2d, Ifit1
Cdc2 il18 response up	1.82	0.0305	43	Ccl12, Ifit2, Hk3, Vim, Tnf, Cxcl10, Mgl2, Clec10a, C3, Lgals3bp, Clec2d, Nlrp3, Ifit1, Fas, Stt3a, Lpl, Cd44, Myo1e, Aif1, Slc33a1, Mapkapk2, Ldlr, Icam1, Pnp, Plac8