Detection Expected Calibration Error (D-ECE)

uq_detr.dece(detections, ground_truths, *, tp_criterion, ...)

D-ECE measures the gap between a detector's confidence and its precision, using binned calibration.

Definition

Detections are grouped into \(J\) bins by confidence score. For each bin \(j\):

\[ \text{D-ECE} = \sum_{j=1}^{J} \frac{|\hat{D}_j|}{|\hat{D}|} \left| \bar{p}_j - \text{precision}(j) \right| \]

where \(\bar{p}_j\) is the average confidence in bin \(j\), and \(\text{precision}(j)\) is the fraction of true positives in that bin.

A detection is a true positive if it has IoU above a threshold \(\tau\) with a ground-truth object of the same class.

TP Criterion

D-ECE requires you to specify how TP/FP labels are assigned:

# Each detection independently checks any GT (non-exclusive)
uq_detr.dece(dets, gts, tp_criterion="independent")

# COCO-style: sorted by confidence, each GT matched at most once
uq_detr.dece(dets, gts, tp_criterion="greedy")

`tp_criterion`	Matching	Multiple dets can match same GT?
`"independent"`	Non-exclusive	Yes
`"greedy"`	COCO-style exclusive	No

Note

This parameter is required --- there is no default. This is intentional: the choice affects the metric value and users should be aware of which they are using.

Usage

import uq_detr

result = uq_detr.dece(
    detections, ground_truths,
    tp_criterion="greedy",
    iou_threshold=0.5,
    n_bins=25,
)
print(result.score)

Parameters

Parameter	Type	Default	Description
`detections`	`list[Detections]`	required	Predictions per image
`ground_truths`	`list[GroundTruth]`	required	Annotations per image
`tp_criterion`	`str`	required	`"independent"` or `"greedy"`
`iou_threshold`	`float`	`0.5`	IoU threshold for TP assignment
`n_bins`	`int`	`25`	Number of calibration bins

References

Kuppers et al., "Multivariate confidence calibration for object detection", CVPR Workshops 2020.
Kuzucu et al., "On calibration of object detectors: Pitfalls, evaluation and baselines", ECCV 2025.