Inter-plaque distances

Over-segmentation assessment

Published

March 23, 2026

Problem

HALO image analysis software segments ThioS-positive objects by intensity thresholding and connected-pixel labelling, and may subdivide one physical lesion into multiple adjacent objects. The resulting inflated object counts would bias cell-to-plaque distance calculations: the same cells or genes near a heavily fragmented vessel would be classified as “proximal” (< 50 µm) many times over, once for each fragment.
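The double-counting effect can be illustrated with a small synthetic example. The coordinates and the helper `count_proximal()` below are invented for illustration and are not part of the analysis: one lesion is represented first as a single object and then as three fragments, with a single cell about 30 µm away.

```r
# Hypothetical coordinates (µm), not taken from the dataset.
cell <- c(x = 30, y = 0)

one_object <- data.frame(x = 0, y = 0)
fragments  <- data.frame(x = c(-5, 0, 5), y = c(0, 3, -3))

# Count objects whose centre lies within `cutoff` µm of the cell.
count_proximal <- function(objects, cell, cutoff = 50) {
  d <- sqrt((objects$x - cell[["x"]])^2 + (objects$y - cell[["y"]])^2)
  sum(d < cutoff)
}

hits_single     <- count_proximal(one_object, cell)
hits_fragmented <- count_proximal(fragments, cell)
```

The same cell contributes one proximal pair in the unfragmented case and three in the fragmented case, although the underlying lesion is identical.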

Code

source("R/setup.R")
library(patchwork)
library(igraph)

THRESHOLDS <- c(25, 50)   # µm — candidate merge thresholds

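plaques <- thios_all |>
  mutate(
    # Bounding-box centre, converted from pixels to µm (0.2125 µm/pixel).
    x_plaque  = ((XMin + XMax) / 2) * 0.2125,
    y_plaque  = ((YMin + YMax) / 2) * 0.2125,
    # Bounding-box width, height and diagonal in µm.
    bb_width  = (XMax - XMin) * 0.2125,
    bb_height = (YMax - YMin) * 0.2125,
    bb_diag   = sqrt(bb_width^2 + bb_height^2)
  )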

Plaque size overview

To establish the physical scale of annotated objects, we first characterise their size distribution. Two size metrics are reported: the maximum diameter (the longest axis of the object as reported by HALO) and the bounding-box diagonal (the diagonal of the smallest rectangle enclosing the object, computed from pixel coordinates converted to µm at 0.2125 µm/pixel).

Both metrics provide an upper bound on object size and are used later to determine whether inter-object distances are plausibly smaller than the objects themselves.
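As a worked example of the bounding-box diagonal, the sketch below converts one made-up bounding box from pixels to µm. The pixel coordinates are hypothetical; only the 0.2125 µm/pixel factor comes from the text.

```r
px_to_um <- 0.2125  # µm per pixel, as stated in the text

# Hypothetical bounding box in pixel coordinates.
bbox <- c(XMin = 100, XMax = 140, YMin = 200, YMax = 230)

bb_width  <- (bbox[["XMax"]] - bbox[["XMin"]]) * px_to_um  # 40 px wide
bb_height <- (bbox[["YMax"]] - bbox[["YMin"]]) * px_to_um  # 30 px high
bb_diag   <- sqrt(bb_width^2 + bb_height^2)                # 50 px diagonal
```

A 40 × 30 pixel box has a 50 pixel diagonal, which corresponds to 10.625 µm. Because the bounding box encloses the object, its diagonal is geometrically at least as long as any chord of that object when both are expressed in the same units, which makes the two metrics a useful cross-check on unit conversions.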

Code

plaques |>
  group_by(brain, plaque_type) |>
  summarise(
    n_objects       = n(),
    median_area     = median(Area),
    median_max_diam = median(`Maximum Diameter`),
    median_bb_diag  = median(bb_diag),
    .groups = "drop"
  ) |>
  kbl(
    digits = 1,
    col.names = c("Brain", "Type", "N objects", "Median area (µm²)",
                  "Median max diam (µm)", "Median BB diag (µm)")
  ) |>
  kable_styling(bootstrap_options = c("striped", "condensed"), full_width = FALSE)

Brain	Type	N objects	Median area (µm²)	Median max diam (µm)	Median BB diag (µm)
Brain1	CAA	345	58.8	12.6	6.8
Brain1	Parenchymal	3961	49.8	10.7	5.9
Brain2	CAA	446	67.8	14.4	7.3
Brain2	Parenchymal	2681	105.2	15.3	8.3
Brain3	CAA	366	73.5	14.4	7.8
Brain3	Parenchymal	2760	51.8	11.4	6.2
brain4	CAA	389	61.5	13.8	7.0
brain4	Parenchymal	3559	54.0	11.4	6.3
brain5	CAA	370	81.1	15.2	8.0
brain5	Parenchymal	3187	67.2	13.1	7.2
brain6	CAA	284	57.1	12.5	6.7
brain6	Parenchymal	2743	61.5	12.8	7.0

Code

p1 <- plaques |>
  ggplot(aes(`Maximum Diameter`, fill = plaque_type)) +
  geom_histogram(bins = 60, colour = NA, alpha = 0.8) +
  lims(x = c(0, 50))+
  facet_wrap(~plaque_type, scales = "free_y") +
  scale_fill_manual(values = c(CAA = "#d62728", Parenchymal = "#1f77b4"),
                    guide = "none") +
  labs(x = "Maximum diameter (µm)", y = "Count",
       title = "Maximum diameter distribution") +
  theme_minimal(base_size = 10)

p2 <- plaques |>
  ggplot(aes(bb_diag, fill = plaque_type)) +
  geom_histogram(bins = 60, colour = NA, alpha = 0.8) +
  lims(x = c(0, 30))+
  facet_wrap(~plaque_type, scales = "free_y") +
  scale_fill_manual(values = c(CAA = "#d62728", Parenchymal = "#1f77b4"),
                    guide = "none") +
  labs(x = "Bounding-box diagonal (µm)", y = "Count",
       title = "Bounding-box diagonal distribution") +
  theme_minimal(base_size = 10)

p1 + p2

Individual HALO objects are very small.

Across all 6 brains, the median maximum diameter is 12.5–15.2 µm for CAA objects and 10.7–15.3 µm for parenchymal objects, with bounding-box diagonals of 6.7–8.0 µm and 5.9–8.3 µm respectively. Both metrics are narrowly distributed, with the vast majority of objects falling below 50 µm. These size values serve as a reference frame for interpreting inter-object distances: any nearest-neighbour distance smaller than the median maximum diameter implies that two objects are closer to each other than their own width, making physical overlap, and thus over-segmentation, highly probable.

Nearest-neighbour distances between plaques

For each annotated object, we computed the distance to the closest other annotated object of the same type within the same brain, using a k-d tree (RANN::nn2). This nearest-neighbour (NN) distance is the primary metric for detecting over-segmentation: if two objects are fragments of the same physical lesion, their centres (here, bounding-box centres) will be separated by less than the lesion’s diameter, producing a short NN distance. Distances are computed separately for CAA and parenchymal plaques to avoid cross-type interference.

Code

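compute_nn <- function(df) {
  df |>
    group_by(brain, plaque_type) |>
    group_modify(\(grp, key) {
      # A group with a single object has no neighbour.
      if (nrow(grp) < 2) return(mutate(grp, nn_dist = NA_real_))
      mat <- as.matrix(grp[, c("x_plaque", "y_plaque")])
      # k = 2: column 1 is the query point itself (distance 0),
      # column 2 is its nearest other object.
      res <- RANN::nn2(data = mat, query = mat, k = 2)
      mutate(grp, nn_dist = res$nn.dists[, 2])
    }) |>
    ungroup()
}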

plaques_nn <- compute_nn(plaques)
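
As a sanity check on what `nn_dist` represents, the same quantity can be computed by brute force on a handful of points. This sketch uses base R `dist()` instead of the k-d tree, with invented coordinates; it is only practical for small inputs because it builds the full pairwise distance matrix.

```r
# Hypothetical object centres (µm).
pts <- data.frame(
  x_plaque = c(0, 3, 10, 100),
  y_plaque = c(0, 4,  0,   0)
)

# Full pairwise Euclidean distance matrix.
d <- as.matrix(dist(pts[, c("x_plaque", "y_plaque")]))

# Ignore the zero distance of each point to itself.
diag(d) <- Inf

# Nearest-neighbour distance = smallest remaining entry in each row.
nn_dist_brute <- apply(d, 1, min)
```

For these four points the result is 5, 5, √65 and 90 µm. On the same input, the second column of the k-d tree distances should agree with these values.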

Distance distribution

The ECDF shows, for each distance value on the x-axis, what fraction of all annotated objects have a nearest neighbour closer than that distance. A curve that rises steeply near zero indicates that most objects are tightly packed. Vertical dashed lines mark the two candidate merge thresholds (25 µm and 50 µm).
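
To make the reading of the ECDF at the two thresholds concrete, the following sketch evaluates an empirical CDF on ten made-up NN distances. The values are hypothetical; the point is only that the curve height at a threshold is the fraction of objects whose nearest neighbour lies at or below that distance.

```r
# Hypothetical NN distances (µm).
nn_dist    <- c(4, 8, 12, 18, 22, 30, 45, 60, 120, 300)
thresholds <- c(25, 50)

# ecdf() returns a step function F(d) = fraction of values <= d.
F_nn        <- ecdf(nn_dist)
frac_within <- F_nn(thresholds)

# The same fractions computed directly from the definition.
frac_direct <- sapply(thresholds, function(thr) mean(nn_dist <= thr))
```

Here five of ten distances are at or below 25 µm and seven of ten at or below 50 µm, so the curve passes through 0.5 and 0.7 at the two dashed lines.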

Code

plaques_nn |>
  filter(!is.na(nn_dist)) |>
  ggplot(aes(nn_dist, colour = brain)) +
  stat_ecdf(linewidth = 0.7) +
  geom_vline(xintercept = THRESHOLDS, linetype = "dashed", colour = "grey40") +
  annotate("text", x = THRESHOLDS[[1]], y = 0.02,
           label = paste0(THRESHOLDS[[1]], " µm"), hjust = -0.1, size = 4,
           colour = "grey30") +
  annotate("text", x = THRESHOLDS[[2]], y = 0.1,
           label = paste0(THRESHOLDS[[2]], " µm"), hjust = -0.1, size = 4,
           colour = "grey30") +
  facet_wrap(~plaque_type) +
  scale_x_continuous(limits = c(0, 500)) +
  labs(x = "NN distance (µm)", y = "Cumulative fraction",
       title = "ECDF of nearest-neighbour inter-plaque distances",
       colour = "Brain") +
  theme_minimal(base_size = 10)

The NN distance distributions differ in an informative way between the two plaque types. For CAA, the median NN distance ranges from 9.8 to 11.8 µm, below the median maximum diameter of CAA objects in every brain (12.5–15.2 µm). For parenchymal plaques it ranges from 11.1 to 20.1 µm, which is comparable to the median maximum diameter (10.7–15.3 µm) but exceeds it at the upper end of the range. The typical gap between adjacent annotated objects is therefore about one object diameter or less. The 10th percentile falls between 5.0 and 5.9 µm for CAA and between 4.5 and 5.5 µm for parenchymal plaques, distances at which two objects of median size (maximum diameter above 10 µm) would be expected to overlap physically.

The histogram below uses a logarithmic x-axis to reveal structure at very small distances (1–10 µm) that would be invisible on a linear scale. A mode well below the 25 µm threshold is consistent with over-segmentation. A bimodal distribution would indicate two co-existing populations: over-segmented fragments (first mode, short distances) and genuinely distinct adjacent lesions (second mode, longer distances). The valley between the modes identifies the natural threshold that best separates the two populations.

Code

plaques_nn |>
  filter(!is.na(nn_dist), nn_dist > 0) |>
  ggplot(aes(nn_dist, fill = plaque_type)) +
  geom_histogram(bins = 60, colour = NA, alpha = 0.8) +
  geom_vline(xintercept = THRESHOLDS, linetype = "dashed", colour = "grey30") +
  facet_wrap(~plaque_type, scales = "free_y") +
  scale_x_log10() +
  scale_fill_manual(values = c(CAA = "#d62728", Parenchymal = "#1f77b4"),
                    guide = "none") +
  labs(x = "NN distance (µm, log scale)", y = "Count",
       title = "Nearest-neighbour distance histogram (log-scale x)") +
  theme_minimal(base_size = 10)

Code

plaques_nn |>
  filter(!is.na(nn_dist)) |>
  group_by(brain, plaque_type) |>
  summarise(
    n         = n(),
    p10       = quantile(nn_dist, 0.10),
    p25       = quantile(nn_dist, 0.25),
    median_nn = median(nn_dist),
    .groups = "drop"
  ) |>
  kbl(
    digits = 1,
    col.names = c("Brain", "Type", "N", "10th pct (µm)",
                  "25th pct (µm)", "Median NN (µm)")
  ) |>
  kable_styling(bootstrap_options = c("striped", "condensed"), full_width = FALSE)

The log-scale histogram reveals a qualitative difference between the two types. For CAA, the distribution is unimodal, with a single mode between approximately 5 and 20 µm and a monotonic decay beyond it. This is consistent with a population that is almost entirely composed of over-segmented fragments: there is no second peak at longer distances that would indicate a distinct population of well-separated CAA objects. For parenchymal plaques, the distribution is bimodal: a first peak at 5–20 µm (fragments of the same plaque deposit) and a second, broader peak centred around 50–200 µm (genuine distinct plaques in the tissue). The valley between the two modes falls around 25–50 µm and provides an empirical, data-driven justification for the merge threshold: a 25 µm cutoff sits in the trough between the two populations, capturing fragments while leaving the second-mode objects intact (actual distinct plaques).
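
The valley between two modes can also be located numerically rather than by eye. The sketch below is one possible approach, not the method used in this report: it simulates a bimodal set of log10 distances (synthetic values, chosen only to resemble a fragment mode near 10 µm and a distinct-lesion mode near 100 µm), estimates a kernel density with the default bandwidth, and takes the lowest point between the two tallest peaks.

```r
set.seed(1)

# Synthetic log10 distances: one mode near 10 µm, one near 100 µm.
# These are simulated values, not the real NN distances.
log_d <- c(rnorm(2000, mean = 1, sd = 0.15),
           rnorm(2000, mean = 2, sd = 0.20))

dens <- density(log_d)

# Indices of local maxima of the density curve.
is_peak <- which(diff(sign(diff(dens$y))) == -2) + 1

# Keep the two tallest peaks and order them left to right.
top2 <- sort(is_peak[order(dens$y[is_peak], decreasing = TRUE)][1:2])

# Valley = lowest density point between those two peaks.
between   <- top2[1]:top2[2]
valley_ix <- between[which.min(dens$y[between])]
valley_um <- 10^dens$x[valley_ix]
```

Applied to real data, the location of the valley depends on the kernel bandwidth, so a threshold obtained this way is best treated as a range rather than a single value.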

Threshold impact

To quantify the practical consequence of over-segmentation, we report, for each brain and plaque type, the number and proportion of objects whose nearest same-type neighbour falls within each candidate merge threshold. An object “within threshold” has at least one nearby neighbour with which it would be merged under the corresponding merging rule. High percentages indicate that the majority of HALO objects are candidates for consolidation.

Code

map_dfr(THRESHOLDS, \(thr)
  plaques_nn |>
    group_by(brain, plaque_type) |>
    summarise(
      threshold_um = thr,
      n_total      = n(),
      n_within_thr = sum(nn_dist <= thr, na.rm = TRUE),
      pct_within   = mean(nn_dist <= thr, na.rm = TRUE) * 100,
      .groups = "drop"
    )
) |>
  arrange(plaque_type, threshold_um, brain) |>
  kbl(
    digits = 1,
    col.names = c("Brain", "Type", "Threshold (µm)", "N total",
                  "N within threshold", "% within threshold")
  ) |>
  kable_styling(bootstrap_options = c("striped", "condensed"), full_width = FALSE) |>
  collapse_rows(columns = 1:2, valign = "top")

Brain	Type	Threshold (µm)	N total	N within threshold	% within threshold
Brain1	CAA	25	345	304	88.1
Brain2		25	446	334	74.9
Brain3		25	366	290	79.2
brain4		25	389	328	84.3
brain5		25	370	304	82.2
brain6		25	284	231	81.3
Brain1		50	345	315	91.3
Brain2		50	446	379	85.0
Brain3		50	366	323	88.3
brain4		50	389	350	90.0
brain5		50	370	328	88.6
brain6		50	284	256	90.1
Brain1	Parenchymal	25	3961	3077	77.7
Brain2		25	2681	1509	56.3
Brain3		25	2760	1947	70.5
brain4		25	3559	2606	73.2
brain5		25	3187	2146	67.3
brain6		25	2743	1939	70.7
Brain1		50	3961	3654	92.2
Brain2		50	2681	2093	78.1
Brain3		50	2760	2383	86.3
brain4		50	3559	3159	88.8
brain5		50	3187	2698	84.7
brain6		50	2743	2327	84.8

At the 25 µm threshold, 74.9–88.1% of CAA objects and 56.3–77.7% of parenchymal objects have at least one neighbour of the same type within that distance.

At the less stringent 50 µm threshold, these proportions rise to 85.0–91.3% and 78.1–92.2%, respectively. These values are not driven by a subset of densely annotated regions: the proportions are consistently high across all six brains and both slides, indicating pervasive over-segmentation rather than a localised artefact. The smaller percentage for parenchymal plaques relative to CAA at 25 µm likely reflects that parenchymal plaques, while still predominantly clustered, are distributed across a wider tissue area and thus have a lower packing density than CAA objects confined to vessel trajectories.

Connected components

The threshold impact analysis counts objects with nearby neighbours but does not account for “chain-like clustering”: an object may be within 25 µm of one neighbour, which is itself within 25 µm of a further neighbour, and so on. To quantify the number of distinct lesions that would remain after merging all transitively connected objects, we construct a graph in which each HALO object is a node and edges connect pairs of objects within the threshold distance.

The number of connected components (disconnected subgraphs) equals the number of merged lesions. Singletons are components containing a single object (i.e. isolated objects with no neighbour within the threshold). The % reduction is the percentage decrease in object count relative to the original, i.e. the net share of objects eliminated by merging: a cluster of m objects collapses to one lesion and removes m − 1 objects from the count.
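
Chain-like merging can be illustrated on four made-up points. This sketch deliberately swaps in a different technique from the graph construction used in this report: it relies on base R single-linkage clustering, whose clusters at cut height `thr` coincide with the connected components of the graph linking all pairs at distance ≤ `thr`. The coordinates are hypothetical.

```r
# Hypothetical object centres (µm): a chain of three plus one isolated object.
pts <- data.frame(x = c(0, 20, 40, 200), y = c(0, 0, 0, 0))
thr <- 25

d <- dist(pts)

# Single-linkage clusters cut at `thr` are the connected components of the
# graph that links every pair of objects at distance <= thr.
comp <- cutree(hclust(d, method = "single"), h = thr)

n_objects     <- nrow(pts)
n_components  <- length(unique(comp))
n_singletons  <- sum(table(comp) == 1)
pct_reduction <- (1 - n_components / n_objects) * 100

# For comparison: share of objects with a neighbour within `thr`.
dm <- as.matrix(d)
diag(dm) <- Inf
pct_within <- mean(apply(dm, 1, min) <= thr) * 100
```

The first and third objects are 40 µm apart, beyond the threshold, yet they end up in the same component through the object between them. Four objects become two components (one singleton), a 50% reduction, while 75% of objects have a neighbour within the threshold, so the two percentages measure different things.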

Code

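count_components <- function(df, thr) {
  mat <- as.matrix(df[, c("x_plaque", "y_plaque")])
  # Search at most 50 neighbours per object (the object itself included);
  # neighbours beyond the 50th are not considered when building edges.
  k   <- min(nrow(mat), 50L)
  res <- RANN::nn2(data = mat, query = mat, k = k)
  # Keep neighbour pairs within the threshold, excluding self-matches.
  within <- res$nn.dists <= thr & res$nn.idx != row(res$nn.idx)
  from   <- row(res$nn.idx)[within]
  to     <- res$nn.idx[within]
  # An edge found from both ends appears twice (i -> j and j -> i);
  # keep the i < j copy. This assumes neighbour lists are symmetric.
  edges  <- tibble(from, to) |> filter(from < to)
  # Listing every vertex keeps isolated objects as singleton components.
  g      <- graph_from_data_frame(edges, directed = FALSE,
                                  vertices = seq_len(nrow(mat)))
  comps  <- components(g)
  tibble(
    n_objects     = nrow(mat),
    n_components  = comps$no,
    n_singletons  = sum(comps$csize == 1),
    pct_reduction = (1 - comps$no / nrow(mat)) * 100,
    threshold_um  = thr
  )
}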

component_results <- plaques_nn |>
  group_by(brain, plaque_type) |>
  group_modify(\(grp, key) {
    if (nrow(grp) < 2) return(tibble())
    map_dfr(THRESHOLDS, \(thr) count_components(grp, thr))
  }) |>
  ungroup()

component_results |>
  arrange(plaque_type, threshold_um, brain) |>
  kbl(
    digits = 1,
    col.names = c("Brain", "Type", "N objects", "N components",
                  "N singletons", "% reduction", "Threshold (µm)")
  ) |>
  kable_styling(bootstrap_options = c("striped", "condensed"), full_width = FALSE) |>
  collapse_rows(columns = 1:2, valign = "top")

Brain	Type	N objects	N components	N singletons	% reduction	Threshold (µm)
Brain1	CAA	345	113	41	67.2	25
Brain2		446	204	112	54.3	25
Brain3		366	153	76	58.2	25
brain4		389	148	61	62.0	25
brain5		370	142	66	61.6	25
brain6		284	119	53	58.1	25
Brain1		345	93	30	73.0	50
Brain2		446	153	67	65.7	50
Brain3		366	115	43	68.6	50
brain4		389	113	39	71.0	50
brain5		370	102	42	72.4	50
brain6		284	90	28	68.3	50
Brain1	Parenchymal	3961	1678	884	57.6	25
Brain2		2681	1709	1172	36.3	25
Brain3		2760	1380	813	50.0	25
brain4		3559	1737	953	51.2	25
brain5		3187	1679	1041	47.3	25
brain6		2743	1410	804	48.6	25
Brain1		3961	769	307	80.6	50
Brain2		2681	1112	588	58.5	50
Brain3		2760	845	377	69.4	50
brain4		3559	929	400	73.9	50
brain5		3187	1014	489	68.2	50
brain6		2743	894	416	67.4	50

At 25 µm, merging connected objects would reduce CAA counts from 284–446 objects to 113–204 components per brain (a reduction of 54.3–67.2%). Of these components, 41–112 are singletons, meaning that roughly one-third to just over one-half of merged lesions (36–55%) have no neighbour within 25 µm and would be unaffected by merging.

Applying the same threshold to parenchymal plaques would reduce counts from 2,681–3,961 to 1,380–1,737 components (a 36.3–57.6% reduction), with 804–1,172 singletons per brain. Increasing the threshold to 50 µm yields a further reduction: CAA to 90–153 components (65.7–73.0% reduction) and parenchymal to 769–1,112 components (58.5–80.6% reduction). The incremental gain over 25 µm is modest for CAA (about 6–11 percentage points per brain) and larger for parenchymal plaques (about 19–23 points), where it comes with a greater risk of merging genuinely distinct lesions, since 50 µm lies at the upper edge of the inter-modal valley.
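
The per-brain gain from raising the threshold can be read off the connected-components table as the difference between the two % reduction columns. The sketch below copies those values from the table and computes the difference; the data frame and column names (`red_25`, `red_50`, `gain`) are introduced here for illustration only.

```r
brains <- c("Brain1", "Brain2", "Brain3", "brain4", "brain5", "brain6")

# % reduction values copied from the connected-components table.
caa <- data.frame(
  brain  = brains,
  red_25 = c(67.2, 54.3, 58.2, 62.0, 61.6, 58.1),
  red_50 = c(73.0, 65.7, 68.6, 71.0, 72.4, 68.3)
)
parenchymal <- data.frame(
  brain  = brains,
  red_25 = c(57.6, 36.3, 50.0, 51.2, 47.3, 48.6),
  red_50 = c(80.6, 58.5, 69.4, 73.9, 68.2, 67.4)
)

# Additional reduction (percentage points) when moving from 25 to 50 µm.
caa$gain         <- caa$red_50 - caa$red_25
parenchymal$gain <- parenchymal$red_50 - parenchymal$red_25

caa_gain_range <- range(caa$gain)
par_gain_range <- range(parenchymal$gain)
```

With the tabulated values, the additional reduction spans 5.8–11.4 percentage points for CAA and 18.8–23.0 points for parenchymal plaques.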

Spatial maps

To determine whether clustering is spatially coherent, consistent with biological structure (vessel trajectories for CAA, plaque fields for parenchymal) rather than random annotation noise, we map each object’s position coloured by its NN distance. Red marks the low end of the colour scale (NN distance near 0 µm, which includes objects with a neighbour within 25 µm); grey marks the scale midpoint (60 µm in the plotting code); navy marks well-isolated objects (towards 200 µm, with larger distances squished to the top of the scale). Spatial coherence of red objects would support the interpretation that clusters represent real lesion structures rather than scattered annotation errors.

Code

make_map <- function(type_label) {
  plaques_nn |>
    filter(plaque_type == type_label) |>
    ggplot(aes(x_plaque, y_plaque, colour = nn_dist)) +
    geom_point(size = 0.8, alpha = 0.5) +
    scale_colour_gradient2(
      low = "red", mid = "grey80", high = "navy",
      midpoint = 60, name = "NN dist (µm)",
      limits = c(0, 200), oob = scales::squish
    ) +
    facet_wrap(~brain, scales = "free") +
    theme_minimal(base_size = 10) +
    labs(title = type_label, x = "x (µm)", y = "y (µm)")
}

wrap_plots(make_map("CAA"), make_map("Parenchymal"), ncol = 1)

CAA objects are almost uniformly red across all six brains and form the characteristic linear, curvilinear, and branching patterns expected of cortical vessel distributions. The rare navy points represent larger, isolated CAA deposits without nearby same-type objects.

For parenchymal plaques, the maps show dense red fields, with blue/navy objects occurring only at the periphery of annotated regions. The spatial coherence of the red clusters in both types supports the interpretation that over-segmentation reflects the underlying anatomy: HALO has systematically fragmented continuous structures (vessel segments, plaque fields) rather than produced random annotation artefacts.

Conclusions and recommended next step

Taken together, the distance distributions, threshold impact analysis, connected-component counts, spatial maps, and comparison of NN distances with object sizes provide consistent evidence that HALO substantially over-segments ThioS-positive objects in both CAA and parenchymal plaque categories.

The key observations are:

Median NN distances (9.8–11.8 µm for CAA; 11.1–20.1 µm for parenchymal) are comparable to the median maximum diameter of the objects themselves (12.5–15.2 µm for CAA; 10.7–15.3 µm for parenchymal) and, for CAA, below it in every brain, indicating that adjacent objects are in near-contact or physically overlapping.
The parenchymal NN distance histogram is bimodal on a log scale, with a first peak at 5–20 µm (fragments) and a second peak at 50–200 µm (genuine distinct plaques). The inter-modal valley falls around 25–50 µm, providing an empirical justification for the merge threshold that is independent of prior assumptions about plaque size. The CAA distribution is unimodal — almost all CAA objects are fragments with no second population of well-separated lesions.
75–88% of CAA objects and 56–78% of parenchymal objects fall within 25 µm of a same-type neighbour, confirming that the vast majority of annotations are candidates for consolidation.
A 25 µm merge threshold would reduce CAA counts by 54–67% and parenchymal counts by 36–58% per brain, leaving 113–204 and 1,380–1,737 components respectively, which are plausible numbers of distinct CAA vessel segments and parenchymal deposits.
Spatial clustering is anatomically coherent: CAA fragments align along vessel trajectories; parenchymal fragments form localised dense fields rather than scattered noise.

Over-segmentation affects both plaque types. The bimodal structure unique to parenchymal plaques further distinguishes the two types: CAA is almost uniformly over-segmented (single mode), while the parenchymal dataset contains a mixture of over-segmented fragments and genuine distinct deposits, with the two populations separable at ~25 µm.