Example: HuggingFace DETR Models
Evaluate calibration of any HuggingFace DETR-family model on a dataset.
Setup
pip install uq-detr transformers torch torchvision
End-to-End: DETR on COCO
import numpy as np
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForObjectDetection
import uq_detr
from uq_detr import Detections, GroundTruth, select
# ── Load model ──
model_name = "SenseTime/deformable-detr" # sigmoid + focal loss
processor = AutoImageProcessor.from_pretrained(model_name)
model = AutoModelForObjectDetection.from_pretrained(model_name)
model.eval()
# ── Run inference on your dataset ──
all_queries = []
ground_truths = []
for image, annotation in dataset: # your COCO-style dataset
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
outputs = model(**inputs)
# Raw outputs (all queries, before post-processing)
raw_logits = outputs.logits.squeeze(0).numpy() # (Q, C)
boxes = outputs.pred_boxes.squeeze(0).numpy() # (Q, 4) cxcywh normalized
H, W = image.height, image.width
# Apply activation (softmax for original DETR, sigmoid for later variants)
# probs = softmax(raw_logits, axis=-1)[:, :-1] # original DETR
probs = 1.0 / (1.0 + np.exp(-raw_logits)) # Deformable-DETR, DINO, RT-DETR
all_queries.append(
Detections.from_cxcywh(boxes, probs, image_size=(H, W))
)
ground_truths.append(
GroundTruth(boxes=annotation["boxes"], labels=annotation["labels"])
)
# ── Evaluate with different post-processing ──
for thr in [0.1, 0.3, 0.5, 0.7]:
filtered = [select(q, method="threshold", param=thr) for q in all_queries]
oce = uq_detr.oce(filtered, ground_truths).score
dece = uq_detr.dece(filtered, ground_truths, tp_criterion="greedy").score
print(f"thr={thr:.1f} OCE={oce:.4f} D-ECE={dece:.4f}")
Notes on HuggingFace DETR Outputs
model(**inputs) returns the raw, unfiltered output from all object queries:
outputs.logits: shape(batch, Q, C), raw logits (pre-activation). See below for which activation to apply.outputs.pred_boxes: shape(batch, Q, 4), normalized cxcywh coordinates in [0, 1]. UseDetections.from_cxcywh()withimage_sizeto convert to absolute xyxy.
No filtering or post-processing is applied --- you get all Q queries (DETR: 100, Deformable-DETR: 300, DINO: 900). This is exactly what you want for OCE evaluation: pass the full query set and let uq_detr.select() handle post-processing.
Activation: Softmax vs Sigmoid
| Model | Loss | Activation | Background class? |
|---|---|---|---|
DETR (facebook/detr-resnet-50) |
Cross-entropy | Softmax | Yes (last class is "no object") |
| Deformable-DETR, DINO, RT-DETR | Focal loss | Sigmoid | No |
# Original DETR: softmax, then drop the background (last) class
from scipy.special import softmax
probs = softmax(outputs.logits.squeeze(0).numpy(), axis=-1)[:, :-1]
# Deformable-DETR, DINO, RT-DETR: sigmoid
probs = outputs.logits.sigmoid().squeeze(0).numpy()
Using the HuggingFace Post-Processor
If you prefer to use HuggingFace's built-in post-processing:
# HuggingFace post-processes into xyxy absolute coords
results = processor.post_process_object_detection(
outputs, threshold=0.3, target_sizes=[(H, W)]
)[0]
det = Detections(
boxes=results["boxes"].numpy(), # already xyxy absolute
scores=results["scores"].numpy(), # (N,) max-confidence
labels=results["labels"].numpy(), # (N,) class indices
)
Warning
HuggingFace's post-processor returns max-confidence scores (N,), not full class distributions (N, C). OCE will use the binary approximation in this case. For exact OCE, use the raw outputs.logits as shown above.