HALO image analysis software segments ThioS-positive objects by intensity thresholding and connected-pixel labelling, and may subdivide one physical lesion into multiple adjacent objects. Such inflated object counts would bias cell-to-plaque distance calculations: the same cells or genes near a heavily fragmented vessel would be counted as “proximal” (< 50 µm) multiple times, once per fragment.
To establish the physical scale of annotated objects, we first characterise their size distribution. Two size metrics are reported: the maximum diameter (the longest axis of the object as reported by HALO) and the bounding-box diagonal (the diagonal of the smallest rectangle enclosing the object, computed from pixel coordinates converted to µm at 0.2125 µm/pixel).
Both metrics provide an upper bound on object size and are used later to determine whether inter-object distances are plausibly smaller than the objects themselves.
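A minimal sketch of the bounding-box diagonal, assuming the HALO export provides each object's pixel extent; the names `x_min`, `x_max`, `y_min` and `y_max` are hypothetical, and the 0.2125 µm/pixel scale is the one stated above.

```python
import math

UM_PER_PX = 0.2125  # pixel size stated in the text

def bbox_diagonal_um(x_min, x_max, y_min, y_max, um_per_px=UM_PER_PX):
    """Diagonal of the axis-aligned bounding box, converted from pixels to µm.

    x_min/x_max/y_min/y_max are assumed to be pixel coordinates of the
    object's extent (hypothetical column names for a HALO export).
    """
    width_px = x_max - x_min
    height_px = y_max - y_min
    return math.hypot(width_px, height_px) * um_per_px

# Usage: a 30 x 40 pixel box has a 50-pixel diagonal, i.e. 10.625 µm.
print(bbox_diagonal_um(0, 30, 0, 40))
```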
Across all six brains, the median maximum diameter is 12.5–15.2 µm for CAA objects and 10.7–15.3 µm for parenchymal objects, with bounding-box diagonals of 6.7–8.0 µm and 5.9–8.3 µm, respectively. Both metrics are narrowly distributed, with the vast majority of objects falling below 50 µm. These size values serve as a reference frame for interpreting inter-object distances: a nearest-neighbour distance smaller than the median maximum diameter means that the centroids of two objects are closer together than a typical object is wide, making physical overlap, and thus over-segmentation, highly probable.
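The comparison against the size reference can be expressed as a per-object check. The sketch below assumes two lists for a single brain and plaque type, holding maximum diameters and NN distances in µm (the function name is hypothetical), and reports the fraction of objects whose NN distance falls below the group's median maximum diameter.

```python
from statistics import median

def fraction_closer_than_median_diameter(max_diameters_um, nn_distances_um):
    """Fraction of objects whose nearest same-type neighbour lies closer than
    the median maximum diameter of the group (a proxy for likely overlap)."""
    if not nn_distances_um:
        return 0.0
    ref = median(max_diameters_um)
    n_close = sum(1 for d in nn_distances_um if d < ref)
    return n_close / len(nn_distances_um)

# Usage with toy numbers (not study data): the median diameter is 14 µm,
# so two of the three NN distances fall below it.
diam = [10.0, 14.0, 20.0]
nn = [5.0, 12.0, 30.0]
print(fraction_closer_than_median_diameter(diam, nn))
```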
Nearest-neighbour distances between plaques
For each annotated object, we computed the distance to the closest other annotated object of the same type within the same brain, using a k-d tree (RANN::nn2). This nearest-neighbour (NN) distance is the primary metric for detecting over-segmentation: if two objects are fragments of the same physical lesion, their centroids will be separated by less than the lesion’s diameter, producing an NN distance smaller than the object size. Distances are computed separately for CAA and parenchymal plaques so that proximity between the two types does not affect either distribution.
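The analysis uses a k-d tree (`RANN::nn2` in R). When every object is queried against its own group, the closest hit is the object itself at distance 0, so the NN distance is presumably taken from the second hit (k = 2). The Python sketch below computes the same quantity by brute force. This gives identical distances and is fast enough for a few thousand objects per group, though a k-d tree scales better. The record layout `(brain, plaque_type, x_um, y_um)` is an assumption.

```python
import math
from collections import defaultdict

def nn_distances(records):
    """Nearest same-type neighbour distance for each object, within its brain.

    records: list of (brain, plaque_type, x_um, y_um) tuples (assumed layout).
    Returns a list of distances aligned with records; an object that is alone
    in its (brain, type) group gets math.inf.
    """
    groups = defaultdict(list)
    for idx, (brain, ptype, x, y) in enumerate(records):
        groups[(brain, ptype)].append((idx, x, y))

    out = [math.inf] * len(records)
    for members in groups.values():
        for i, xi, yi in members:
            best = math.inf
            for j, xj, yj in members:
                if i != j:  # exclude the object itself
                    best = min(best, math.hypot(xi - xj, yi - yj))
            out[i] = best
    return out

recs = [
    ("B1", "CAA", 0.0, 0.0),
    ("B1", "CAA", 3.0, 4.0),           # 5 µm from the first CAA object
    ("B1", "CAA", 100.0, 0.0),
    ("B1", "parenchymal", 1.0, 0.0),   # different type: ignored for CAA
]
print(nn_distances(recs))
```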
The ECDF shows, for each distance value on the x-axis, the fraction of all annotated objects whose nearest neighbour lies at or below that distance. A curve that rises steeply near zero indicates that most objects have a very close same-type neighbour. Vertical dashed lines mark the two candidate merge thresholds (25 µm and 50 µm).
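The ECDF value at a given distance is the fraction of NN distances at or below it. Here is a minimal sketch of that calculation, evaluated at the two candidate thresholds on toy values (not study data):

```python
from bisect import bisect_right

def ecdf(values):
    """Return a function F(t) = fraction of values <= t (the empirical CDF)."""
    xs = sorted(values)
    n = len(xs)

    def F(t):
        return bisect_right(xs, t) / n

    return F

# Usage with toy NN distances (µm), evaluated at the two candidate thresholds.
F = ecdf([4.0, 8.0, 12.0, 30.0, 60.0, 150.0])
print(F(25.0), F(50.0))  # 0.5 and about 0.667
```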
The NN distance distributions differ in an informative way between the two plaque types. For CAA, the median NN distance ranges from 9.8 to 11.8 µm; for parenchymal plaques, from 11.1 to 20.1 µm. For CAA, the median NN distance is below the median maximum diameter (12.5–15.2 µm) in every brain. For parenchymal plaques, it is of the same order as the median maximum diameter (10.7–15.3 µm), although it exceeds it in at least one brain. Adjacent annotated objects are therefore typically separated by no more than about one object diameter. The 10th percentile falls between 5.0 and 5.9 µm for CAA and between 4.5 and 5.5 µm for parenchymal plaques. These distances are well below the median object diameter, and at such separations the two objects would very likely overlap.
The NN distance histogram uses a logarithmic x-axis to reveal structure at very small distances (1–10 µm) that would be compressed into a few bins on a linear scale. A mode well below the 25 µm threshold is consistent with over-segmentation. A bimodal distribution would indicate two co-existing populations: over-segmented fragments (first mode, short distances) and genuinely distinct adjacent lesions (second mode, longer distances). The valley between the modes then marks a natural threshold separating the two populations.
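To make the valley idea concrete, here is a sketch that bins NN distances on a log10 scale and returns the lowest bin between the two tallest local maxima. The bin range (1–1000 µm) and bin count are assumptions. The peak rule is deliberately naive: real distributions have many small local maxima, so smoothing (for example a kernel density estimate) would be needed before picking modes.

```python
import math

def log_histogram(values, lo=1.0, hi=1000.0, n_bins=30):
    """Count positive values in n_bins bins equally spaced in log10(distance)."""
    log_lo, log_hi = math.log10(lo), math.log10(hi)
    width = (log_hi - log_lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if v <= 0:
            continue  # a log axis cannot show zero distances
        b = int((math.log10(v) - log_lo) / width)
        if 0 <= b < n_bins:
            counts[b] += 1
    edges = [10 ** (log_lo + i * width) for i in range(n_bins + 1)]
    return counts, edges

def valley_between_modes(counts, edges):
    """Geometric centre (µm) of the lowest bin between the two tallest local
    maxima, or None if the histogram has fewer than two maxima."""
    n = len(counts)
    peaks = [i for i in range(n)
             if counts[i] > 0
             and (i == 0 or counts[i] > counts[i - 1])
             and (i == n - 1 or counts[i] >= counts[i + 1])]
    if len(peaks) < 2:
        return None  # unimodal: no valley to report
    p1, p2 = sorted(sorted(peaks, key=lambda i: counts[i], reverse=True)[:2])
    lowest = min(counts[p1:p2 + 1])
    ties = [i for i in range(p1, p2 + 1) if counts[i] == lowest]
    v = ties[len(ties) // 2]  # middle of a flat-bottomed valley
    return math.sqrt(edges[v] * edges[v + 1])

# Toy data (not study data): fragments near 10 µm, distinct objects near 100 µm.
toy = [8] * 5 + [9] * 5 + [11] * 10 + [12] * 10 + [80] * 8 + [120] * 15
print(valley_between_modes(*log_histogram(toy)))  # roughly 35 µm
```

A unimodal input, such as the CAA-like pattern of a single short-distance mode, returns `None`.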
The log-scale histogram reveals a qualitative difference between the two types. For CAA, the distribution is unimodal, with a single mode between approximately 5 and 20 µm and a monotonic decay beyond it. This is consistent with a population composed almost entirely of over-segmented fragments: there is no second peak at longer distances that would indicate a distinct population of well-separated CAA objects. For parenchymal plaques, the distribution is bimodal: a first peak at 5–20 µm (fragments of the same plaque deposit) and a second, broader peak spanning roughly 50–200 µm (genuinely distinct plaques in the tissue). The valley between the two modes falls around 25–50 µm and provides an empirical, data-driven justification for the merge threshold. A 25 µm cutoff sits at the lower end of this trough, capturing the fragments while leaving the second-mode objects intact.
Threshold impact
To quantify the practical consequence of over-segmentation, we report, for each brain and plaque type, the number and proportion of objects whose nearest same-type neighbour falls within each candidate merge threshold. An object “within threshold” has at least one neighbour with which it would be merged under the corresponding merging rule. A proportion above 50% therefore means that most HALO objects in that group are candidates for consolidation.
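A sketch of the per-group summary, assuming records of `(brain, plaque_type, nn_distance_um)` (hypothetical layout). "Within threshold" is taken here as NN distance ≤ threshold, which is an assumption about how ties at exactly 25 or 50 µm are handled.

```python
from collections import defaultdict

def threshold_impact(records, thresholds=(25.0, 50.0)):
    """Per (brain, plaque_type): object count, plus count and percentage of
    objects whose NN distance is <= each threshold.

    records: iterable of (brain, plaque_type, nn_distance_um) tuples.
    """
    groups = defaultdict(list)
    for brain, ptype, nn in records:
        groups[(brain, ptype)].append(nn)
    table = {}
    for key, dists in groups.items():
        n = len(dists)
        row = {"n": n}
        for t in thresholds:
            k = sum(1 for d in dists if d <= t)
            row[t] = (k, 100.0 * k / n)
        table[key] = row
    return table

# Toy data (not study data): one brain, four CAA objects.
recs = [("B1", "CAA", 5.0), ("B1", "CAA", 30.0), ("B1", "CAA", 80.0), ("B1", "CAA", 10.0)]
print(threshold_impact(recs))
```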
At the 25 µm threshold, 74.9–88.1% of CAA objects and 56.3–77.7% of parenchymal objects have at least one neighbour of the same type within that distance.
At the less stringent 50 µm threshold, these proportions rise to 85.0–91.3% and 67.4–92.2%, respectively. These values are not driven by a single brain or slide: the proportions are consistently high across all six brains and both slides, indicating pervasive rather than localised over-segmentation. The lower percentage for parenchymal plaques relative to CAA at 25 µm likely reflects their wider distribution across the tissue. Parenchymal plaques are still predominantly clustered, but they are less densely packed than CAA objects, which are confined to vessel trajectories.
Connected components
The threshold impact analysis counts objects with nearby neighbours but does not account for “chain-like clustering”: an object may be within 25 µm of one neighbour, which is itself within 25 µm of a further neighbour, and so on. To quantify the number of distinct lesions that would remain after merging all transitively connected objects, we construct a graph in which each HALO object is a node and edges connect pairs of objects within the threshold distance.
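The merged lesions are the connected components of this graph, which amounts to single-linkage clustering with a distance cutoff. The text describes an explicit graph. The sketch below swaps in a union-find (disjoint-set) structure instead, which yields the same components without storing the edge list. Input is assumed to be the `(x_um, y_um)` centroids of one brain and plaque type. The all-pairs loop is quadratic, so a spatial index would be preferable for the largest groups.

```python
import math

def connected_components(points, threshold_um):
    """Label single-linkage clusters: objects joined by any chain of pairs at
    or below threshold_um share a label.

    points: list of (x_um, y_um) for one brain and plaque type.
    Returns a list of component labels (root indices), aligned with points.
    """
    parent = list(range(len(points)))

    def find(i):
        # Path halving keeps the trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        xi, yi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj = points[j]
            if math.hypot(xi - xj, yi - yj) <= threshold_um:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri  # merge the components joined by edge i-j
    return [find(i) for i in range(len(points))]

# A chain A-B-C with 20 µm steps is one component at 25 µm even though
# A and C are 40 µm apart; D is a singleton.
pts = [(0.0, 0.0), (20.0, 0.0), (40.0, 0.0), (200.0, 0.0)]
labels = connected_components(pts, 25.0)
print(len(set(labels)))  # 2
```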
The number of connected components (disconnected subgraphs) equals the number of merged lesions. Singletons are components containing a single object (i.e. isolated objects with no neighbour within the threshold). The % reduction is the percentage decrease in object count relative to the original, (N − C)/N × 100 for N objects and C components. It equals the fraction of objects that disappear as separate entries after merging. This is smaller than the fraction of objects belonging to multi-object clusters, because each cluster still contributes one lesion.
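The summary columns can be derived from the component labels alone. The sketch below assumes one label per object, as produced by any connected-component routine. It reports the percentage reduction, (N − C)/N × 100, alongside the share of objects that belong to multi-object clusters. The toy example shows that the two quantities differ, since every multi-object cluster still counts as one merged lesion.

```python
from collections import Counter

def merge_summary(labels):
    """Summarise component labels from a threshold merge.

    Returns (n_objects, n_components, n_singletons, pct_reduction,
             pct_in_multi_object_clusters).
    """
    n = len(labels)
    sizes = Counter(labels)
    n_comp = len(sizes)
    n_single = sum(1 for s in sizes.values() if s == 1)
    pct_reduction = 100.0 * (n - n_comp) / n
    pct_in_clusters = 100.0 * (n - n_single) / n
    return n, n_comp, n_single, pct_reduction, pct_in_clusters

# Five objects: two pairs and one singleton.
print(merge_summary(["a", "a", "b", "b", "c"]))
# (5, 3, 1, 40.0, 80.0)
```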
At 25 µm, merging connected objects would reduce CAA counts from 284–446 objects to 90–204 components per brain (a reduction of 58.1–67.2%). Of the resulting components, 28–112 are singletons. Between one-quarter and one-half of the lesions remaining after merging therefore have no neighbour within 25 µm and are unaffected by merging.
Applying the same threshold to parenchymal plaques would reduce counts from 2,681–3,961 to 1,380–1,709 components (a 36.3–57.6% reduction), with 804–1,172 singletons per brain. Increasing the threshold to 50 µm yields a further reduction: CAA to 90–153 components (65.7–73.0% reduction) and parenchymal to 769–1,014 components (58.5–80.6% reduction). For CAA, the incremental gain over 25 µm is modest. For parenchymal plaques it is larger, but a 50 µm cutoff extends to the upper edge of the inter-modal valley (25–50 µm) and therefore carries a greater risk of merging genuinely distinct lesions.
Spatial maps
To determine whether clustering is spatially coherent, we map each object’s position coloured by its NN distance. Coherent clustering would be consistent with biological structure (vessel trajectories for CAA, plaque fields for parenchymal plaques) rather than random annotation noise. Colours follow a diverging scale: red objects have a neighbour well within 25 µm, grey marks the scale midpoint (NN distance of about 25 µm), and navy marks well-isolated objects (> 100 µm to the nearest neighbour). Spatial coherence of the red objects would confirm that clusters represent real lesion structures rather than scattered annotation errors.
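One way to realise such a scale is sketched below. It assumes a piecewise-linear RGB gradient with its midpoint at 25 µm and distances capped at 100 µm. The anchor colours are hypothetical, and the actual maps may use a different palette or interpolation (for example, ggplot2's `scale_colour_gradient2()` with `midpoint = 25` interpolates in a perceptual colour space rather than RGB).

```python
def lerp_rgb(c0, c1, f):
    """Linear interpolation between two RGB triples, with f in [0, 1]."""
    return tuple(round(a + (b - a) * f) for a, b in zip(c0, c1))

# Hypothetical anchor colours; the palette used for the maps may differ.
RED, GREY, NAVY = (200, 30, 30), (190, 190, 190), (20, 30, 110)

def nn_colour(d_um, midpoint=25.0, cap=100.0):
    """Diverging colour for an NN distance: red at 0 µm, grey at the midpoint,
    navy at or beyond the cap (distances above the cap are clamped)."""
    d = min(max(d_um, 0.0), cap)
    if d <= midpoint:
        return lerp_rgb(RED, GREY, d / midpoint)
    return lerp_rgb(GREY, NAVY, (d - midpoint) / (cap - midpoint))
```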
CAA objects are almost uniformly red across all six brains and form the characteristic linear, curvilinear, and branching patterns expected of cortical vessel distributions. The rare navy points represent larger, isolated CAA deposits without nearby same-type objects.
For parenchymal plaques, the maps show dense red clusters, with navy objects occurring only at the periphery of annotated regions. The spatial coherence of the red clusters in both plaque types indicates that over-segmentation follows the underlying anatomy. HALO has systematically fragmented continuous structures (vessel segments, plaque fields) rather than producing random annotation artefacts.
Conclusions and recommended next step
Taken together, the distance distributions, threshold impact analysis, connected-component counts, spatial maps, and size–distance correlations provide consistent and convergent evidence that HALO substantially over-segments ThioS-positive objects in both CAA and parenchymal plaque categories.
The key observations are:
Median NN distances (9.8–11.8 µm for CAA; 11.1–20.1 µm for parenchymal) are below (CAA) or of the same order as (parenchymal) the median maximum diameter of the objects themselves (roughly 11–15 µm), indicating that adjacent objects are typically in near-contact or overlapping.
The parenchymal NN distance histogram is bimodal on a log scale, with a first peak at 5–20 µm (fragments) and a second peak at 50–200 µm (genuine distinct plaques). The inter-modal valley falls around 25–50 µm, providing an empirical justification for the merge threshold that is independent of prior assumptions about plaque size. The CAA distribution is unimodal: almost all CAA objects are fragments, with no second population of well-separated lesions.
75–88% of CAA objects and 56–78% of parenchymal objects fall within 25 µm of a same-type neighbour, confirming that most annotations are candidates for consolidation.
A 25 µm merge threshold would reduce CAA counts by 58–67% and parenchymal counts by 36–58% per brain, leaving 90–204 and 1,380–1,709 components respectively — plausible numbers of distinct CAA vessel segments and parenchymal deposits.
Spatial clustering is anatomically coherent: CAA fragments align along vessel trajectories; parenchymal fragments form localised dense fields rather than scattered noise.
Over-segmentation affects both plaque types, but the bimodal structure seen only in parenchymal plaques distinguishes them. CAA is almost uniformly over-segmented (a single mode). The parenchymal dataset instead contains a mixture of over-segmented fragments and genuinely distinct deposits, with the two populations separable at ~25 µm.