Summary — new pipeline (LLM + frozen SAM) vs trained NN
Dataset | LLM pipeline (training-free) | LLM old→new | NN N=10 | NN N=25 | NN N=50 | NN N=100 | Crossover
ISIC 2018 | draw → numpy genome (pigment cue); SAM gated OFF | 0.882→0.892 (+0.01) | 0.749 | 0.830 | 0.828 | 0.858 | never (≤100)
Kvasir-SEG | draw → frozen SAM decoder (no GT-free cue) | 0.580→0.783 (+0.20) | 0.628 | 0.777 | 0.808 | 0.845 | N≈50
Chest X-ray | CLAHE → draw 2 lungs → SAM per-component → darkness-grow | 0.788→0.899 (+0.11) | 0.925 | 0.943 | 0.950 | 0.956 | N≈10
Spleen | soft-tissue window → relocate → SAM decode | 0.420→0.563 (+0.14) | 0.537 | 0.686 | 0.838 | 0.885 | N≈25
All pipelines are training-free: a frozen LLM draws, a frozen SAM decodes, and a numpy genome with ≤8 params is fit on a few labels; no target-task NN is trained. The router picks each domain's path GT-free. Cue-rich domains go to the numpy genome (ISIC); domains with no boundary cue go to the frozen SAM decoder (polyp, lung, spleen). SAM never touches a cue-rich genome boundary. Red cells mark budgets where the training-free LLM pipeline matches or beats the trained net. NN = U-Net/resnet34, ImageNet-pretrained.
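The Crossover column can be reproduced from the mean scores in the summary table. The sketch below assumes the rule "smallest tested N at which the trained net exceeds the LLM score"; that rule, and the names `RESULTS` and `crossover`, are illustrative assumptions rather than something the report states.

```python
# Mean scores copied from the summary table; budgets are the tested label counts.
BUDGETS = (10, 25, 50, 100)
RESULTS = {
    "ISIC 2018":   {"llm": 0.892, "nn": (0.749, 0.830, 0.828, 0.858)},
    "Kvasir-SEG":  {"llm": 0.783, "nn": (0.628, 0.777, 0.808, 0.845)},
    "Chest X-ray": {"llm": 0.899, "nn": (0.925, 0.943, 0.950, 0.956)},
    "Spleen":      {"llm": 0.563, "nn": (0.537, 0.686, 0.838, 0.885)},
}


def crossover(llm, nn_scores, budgets=BUDGETS):
    """Smallest tested N where the trained net exceeds the LLM score, else None.

    Assumed rule (not stated in the report): strict '>' on the mean scores.
    """
    for n, score in zip(budgets, nn_scores):
        if score > llm:
            return n
    return None


CROSSOVERS = {name: crossover(r["llm"], r["nn"]) for name, r in RESULTS.items()}
```

Under this rule the four datasets give None, 50, 10 and 25, in line with the "never (≤100)", N≈50, N≈10 and N≈25 entries of the table.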
ISIC 2018 (dermoscopy)
N | LLM | NN
10 | 0.892 | 0.749
25 | 0.892 | 0.830
50 | 0.892 | 0.828
100 | 0.892 | 0.858
Cue-rich pigment boundary → the numpy genome recovers it (0.892), beating the trained net at every N≤100. SAM is gated OFF here (it would regress an already-good boundary: 0.887→0.817). The standout positive case.
Legend: GT, LLM, NN @N. Panels per case: raw image, ground truth, LLM, NN at each N.
✓ Good cases
Case | LLM | NN N=10 | NN N=25 | NN N=50 | NN N=100
isic · ISIC_0036159 | 0.99 | 0.91 | 0.91 | 0.95 | 0.94
isic · ISIC_0023694 | 0.95 | 0.77 | 0.92 | 0.93 | 0.95
isic · ISIC_0023836 | 0.97 | 0.86 | 0.96 | 0.96 | 0.97
✗ Bad cases
Case | LLM | NN N=10 | NN N=25 | NN N=50 | NN N=100
isic · ISIC_0023306 | 0.17 | 0.54 | 0.30 | 0.15 | 0.04
isic · ISIC_0022313 | 0.06 | 0.36 | 0.35 | 0.71 | 0.52
isic · ISIC_0022007 | 0.06 | 0.04 | 0.13 | 0.10 | 0.15
Kvasir-SEG (polyp / endoscopy)
N | LLM | NN
10 | 0.783 | 0.628
25 | 0.783 | 0.777
50 | 0.783 | 0.808
100 | 0.783 | 0.845
No GT-free boundary cue (polyp ≈ mucosa) → a frozen SAM decoder, box-prompted by the LLM draw, recovers the boundary the polygon can't: 0.580 → 0.783, now beating the NN at N=25 (0.777) with zero target-task labels.
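A minimal sketch of how a polygon draw could be turned into a box prompt, assuming the draw is a list of (x, y) vertices in pixel coordinates. The function name `polygon_to_box` and the `pad` margin are hypothetical; the report does not say how the box is derived. The output is in (x0, y0, x1, y1) order, which is the XYXY layout SAM box prompts use.

```python
def polygon_to_box(polygon, width, height, pad=0.05):
    """Return an (x0, y0, x1, y1) box around the polygon, padded and clipped.

    polygon: list of (x, y) vertices in pixels.
    pad: fractional margin added on each side (assumed, not from the report).
    """
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    x0, x1 = min(xs), max(xs)
    y0, y1 = min(ys), max(ys)
    dx = pad * (x1 - x0)
    dy = pad * (y1 - y0)
    # Clip to the image so the prompt never leaves the frame.
    return (
        max(0.0, x0 - dx),
        max(0.0, y0 - dy),
        min(float(width - 1), x1 + dx),
        min(float(height - 1), y1 + dy),
    )


# Toy polygon standing in for an LLM draw on a 512x512 frame.
draw = [(120, 80), (200, 70), (230, 150), (160, 190), (110, 140)]
box = polygon_to_box(draw, width=512, height=512, pad=0.10)
```

A small margin is a common choice when the draw is expected to undershoot the true extent; the right value would have to be tuned without ground truth to stay GT-free.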
Legend: GT, LLM, NN @N. Panels per case: raw image, ground truth, LLM, NN at each N.
✓ Good cases
Case | LLM | NN N=10 | NN N=25 | NN N=50 | NN N=100
kvasir · cju7fq7mm2pw508176uk5ugtx | 0.90 | 0.79 | 0.82 | 0.94 | 0.92
kvasir · cju5o1vu9gz8a0818eyy92bns | 0.87 | 0.72 | 0.95 | 0.93 | 0.96
kvasir · cju7amjna1ly40871ugiokehb | 0.85 | 0.69 | 0.91 | 0.96 | 0.96
✗ Bad cases
Case | LLM | NN N=10 | NN N=25 | NN N=50 | NN N=100
kvasir · cju3xl264ingx0850rcf0rshj | 0.12 | 0.72 | 0.70 | 0.96 | 0.95
kvasir · cju5u8gz4kj5b07552e2wpkwp | 0.10 | 0.13 | 0.56 | 0.01 | 0.42
kvasir · cju884985nlmx0817vzpax3y4 | 0.06 | 0.26 | 0.55 | 0.88 | 0.85
🧪 Skill: gated polyp ROI contrast-enhance (new)
Bad Kvasir cases mislocate — the draw lands on the wrong region. A GT-free contrast enhancer (specular-inpaint + CLAHE-L + saturation) makes the polyp salient enough to find. It rescues mislocations but distorts already-good boundaries, so it is gated on K-draw self-consistency (applied only to low-confidence draws). Skill polyp_roi_enhance (b45cd77).
Case group | LLM raw | LLM enhanced | Δ
bad | 0.094 | 0.434 | +0.340
good | 0.874 | 0.729 | -0.145
gated (raw on good, enhance on bad) | 0.484 | 0.654 | +0.170
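The gate can be sketched as follows, assuming "K-draw self-consistency" means the mean pairwise IoU of K independent draws of the same image and that a draw counts as low-confidence below a fixed threshold. Both the IoU definition and the 0.5 threshold are assumptions for illustration; the skill's actual criterion is not given here. The last line reproduces the gated row of the table as the mean of the good-raw and bad-enhanced group scores, which holds when the two groups are the same size.

```python
from itertools import combinations


def iou(a, b):
    """IoU of two masks given as sets of (row, col) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def self_consistency(draws):
    """Mean pairwise IoU across K >= 2 draws of the same image."""
    pairs = list(combinations(draws, 2))
    if not pairs:
        raise ValueError("need at least two draws")
    return sum(iou(a, b) for a, b in pairs) / len(pairs)


def use_enhanced_view(draws, threshold=0.5):
    """Gate: enhance only when the draws disagree (threshold is hypothetical)."""
    return self_consistency(draws) < threshold


def rect(r0, c0, r1, c1):
    """Helper for the toy example: a filled rectangle as a pixel set."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}


# Three near-identical draws (confident) vs three disjoint draws (mislocated).
stable = [rect(10, 10, 30, 30), rect(11, 10, 31, 30), rect(10, 11, 30, 31)]
scattered = [rect(0, 0, 10, 10), rect(20, 20, 30, 30), rect(5, 25, 15, 35)]

# Gated mean from the table: raw score on good cases, enhanced score on bad cases.
gated_mean = (0.874 + 0.434) / 2
```

Here `use_enhanced_view(stable)` is False and `use_enhanced_view(scattered)` is True, so only the disagreeing draws would be routed to the enhanced view.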
Legend: GT, LLM raw, LLM on enhanced view. Panels per case: raw, enhanced view, GT, LLM raw, LLM enhanced.
✗ Bad cases: rescued by enhancement
Case | LLM raw | LLM enhanced
kvasir · cju3xl264ingx0850rcf0rshj | 0.12 | 0.63
kvasir · cju5u8gz4kj5b07552e2wpkwp | 0.10 | 0.61
kvasir · cju884985nlmx0817vzpax3y4 | 0.06 | 0.06
✓ Good cases: enhancement not needed (gated off)
Case | LLM raw | LLM enhanced
kvasir · cju7fq7mm2pw508176uk5ugtx | 0.90 | 0.65
kvasir · cju5o1vu9gz8a0818eyy92bns | 0.87 | 0.81
kvasir · cju7amjna1ly40871ugiokehb | 0.85 | 0.73
Chest X-ray (lung fields)
N | LLM | NN
10 | 0.899 | 0.925
25 | 0.899 | 0.943
50 | 0.899 | 0.950
100 | 0.899 | 0.956
Two lung fields → SAM per-component (one box per lung) recovers the precise boundary, then a darkness-grow step corrects the remaining under-segmentation: 0.788 → 0.899, GT-free and near the trained NN at N=10 (0.925). The per-component prompt was essential (a single box hurts: 0.72).
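The per-component prompt amounts to splitting the drawn mask into connected components and emitting one box per component. A minimal sketch, assuming the draw has been rasterized to a binary grid and that 4-connectivity is adequate; `component_boxes` is a hypothetical helper, not the pipeline's own code.

```python
from collections import deque


def component_boxes(mask):
    """Return one (x0, y0, x1, y1) box per 4-connected foreground component.

    mask: list of rows, each a list of 0/1 values.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill, tracking the component's extent.
            seen[r][c] = True
            queue = deque([(r, c)])
            x0 = x1 = c
            y0 = y1 = r
            while queue:
                y, x = queue.popleft()
                x0, x1 = min(x0, x), max(x1, x)
                y0, y1 = min(y0, y), max(y1, y)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((x0, y0, x1, y1))
    return sorted(boxes)


# Toy mask with two separate blobs standing in for the two lung fields.
toy = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
boxes = component_boxes(toy)
```

A single box spanning both blobs would also enclose the mediastinum between them, which is one plausible reason the single-box prompt scores lower.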
Legend: GT, LLM, NN @N. Panels per case: raw image, ground truth, LLM, NN at each N.
✓ Good cases
Case | LLM | NN N=10 | NN N=25 | NN N=50 | NN N=100
lungseg · MCUCXR_0150_1 | 0.86 | 0.89 | 0.91 | 0.91 | 0.93
lungseg · CHNCXR_0330_1 | 0.85 | 0.96 | 0.97 | 0.98 | 0.98
lungseg · CHNCXR_0421_1 | 0.85 | 0.94 | 0.95 | 0.96 | 0.95
✗ Bad cases
Case | LLM | NN N=10 | NN N=25 | NN N=50 | NN N=100
lungseg · CHNCXR_0229_0 | 0.71 | 0.91 | 0.94 | 0.94 | 0.94
lungseg · MCUCXR_0055_0 | 0.71 | 0.96 | 0.97 | 0.97 | 0.98
lungseg · CHNCXR_0027_0 | 0.70 | 0.94 | 0.95 | 0.95 | 0.96
🧪 Skill: X-ray CLAHE + non-convex lung trace (new)
Bad lung draws came out as rounded ovals — the polygon rasterizer already supports non-convex (PIL even-odd fill), so this was a drawing issue, not a skill limit. Fix = the literature-standard CXR CLAHE enhancer (xray_clahe) + a guide to trace the true shape (concave mediastinal border + costophrenic angle). Net-positive and non-destructive on good cases (lungs are large/consistent) → applied ungated. Part of the new modality→preprocess registry (8b535f1).
Case group | LLM raw | LLM enhanced | Δ
bad | 0.707 | 0.762 | +0.055
good | 0.855 | 0.846 | -0.009
all (ungated) | 0.781 | 0.804 | +0.023
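To illustrate why a non-convex trace is within reach of an even-odd rasterizer, here is a small pure-Python stand-in for the PIL fill named above. It is a sketch of the even-odd rule on pixel centers, not the project's rasterizer, and the notched polygon is a made-up shape with a concave medial border.

```python
def inside_even_odd(px, py, polygon):
    """Even-odd rule: inside if a ray toward +x crosses an odd number of edges."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line y = py can be crossed.
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def rasterize(polygon, width, height):
    """Fill a polygon into a height x width grid of 0/1, sampling pixel centers."""
    return [
        [1 if inside_even_odd(x + 0.5, y + 0.5, polygon) else 0 for x in range(width)]
        for y in range(height)
    ]


# Concave "C" shape: a 6x6 square with a notch cut into its right side.
notched = [(0, 0), (6, 0), (6, 2), (3, 2), (3, 4), (6, 4), (6, 6), (0, 6)]
grid = rasterize(notched, 6, 6)
```

The notch stays empty in `grid`, whereas a convex oval over the same extent would fill it; this is the difference the tracing guide is meant to exploit.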
Legend: GT, LLM raw, LLM on enhanced view. Panels per case: raw, enhanced view, GT, LLM raw, LLM enhanced.
✗ Bad cases: round ovals fixed
Case | LLM raw | LLM enhanced
lungseg · CHNCXR_0229_0 | 0.71 | 0.78
lungseg · MCUCXR_0055_0 | 0.71 | 0.74
lungseg · CHNCXR_0027_0 | 0.70 | 0.77
✓ Good cases: unaffected
Case | LLM raw | LLM enhanced
lungseg · MCUCXR_0150_1 | 0.86 | 0.85
lungseg · CHNCXR_0330_1 | 0.85 | 0.83
lungseg · CHNCXR_0421_1 | 0.85 | 0.86
Spleen (abdominal CT)
N | LLM | NN
10 | 0.563 | 0.537
25 | 0.563 | 0.686
50 | 0.563 | 0.838
100 | 0.563 | 0.885
Small organ with an invisible boundary → a soft-tissue window relocates the mislocated draw, then SAM decodes: 0.22 → 0.42 → 0.563 (+SAM). Still below the NN from N=25 on (0.686 at N=25). The residual bottleneck is localization (SAM is prompted by a weak 0.42 draw), not the boundary.
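A soft-tissue window is a clip-and-rescale of Hounsfield units. The sketch below assumes a window centered at 40 HU with a width of 400 HU, which is a common soft-tissue setting but not a value taken from this report; `apply_window` is a hypothetical name.

```python
def apply_window(hu_values, center=40.0, width=400.0):
    """Clip Hounsfield units to [center - width/2, center + width/2], rescale to [0, 1].

    center and width are assumed defaults, not the pipeline's actual settings.
    """
    lo = center - width / 2.0
    hi = center + width / 2.0
    return [(min(max(v, lo), hi) - lo) / (hi - lo) for v in hu_values]


# Air, fat, soft tissue at the window center, dense bone.
windowed = apply_window([-1000.0, -100.0, 40.0, 1000.0])
```

Everything outside the window saturates to 0 or 1, so the available contrast is spent on the soft-tissue range where the spleen sits, which is what makes the organ easier to locate before SAM is prompted.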