Do not trust LLMs with numeric clinical codes

Policy: an LLM agent must not return numeric clinical codes (SNOMED CT concept IDs, ICD-10 codes, RxNorm codes) as primary output. The agent returns only an English medical term in natural language; coding happens on the write side through a deterministic resolver. This applies to every pipeline in which an LLM takes part in extracting medical terms.

Context

While porting the Mastra write tools (V0.5: recordSymptom / recordMedication / recordAllergy / recordProcedure / recordFamilyHistory) in session 7ff79368 (March 2026), the question came up: should the LLM agent pick the SNOMED code itself, or should that be done separately?

Variant C from the RFC: the LLM generates the SNOMED code directly and flags unsure cases for needs_review.

Evidence: bare LLM coding does not work

Verified on gpt-4o-mini:

Garbage code for different terms: the same code 431855005 (CKD stage 1) was returned for "fatigue", "metformin", "diabetes", and "heartburn". The output was completely independent of the meaning of the input.
Unstable code: migraine gets 37796000 or 37796009 (a one-digit difference) on regenerate.
Confidence does not help: the model marks both of its answers as high confidence.

Lesson: LLMs cannot reliably reproduce numeric identifiers. These are low-frequency long-context tokens with minimal semantic constraint in the training data. Self-reported confidence scores are also unreliable for this task.

Decision

The LLM returns only an English medical term (for example "headache", "chronic kidney disease", "colonoscopy with biopsy"). Coding into SNOMED / ICD / RxNorm happens on the write side, through a deterministic resolver.

Multilingual implication: the LLM agent is instructed "Always provide English medical term" in the tool description. The patient writes in any language (Russian / German / Hebrew / etc.) → the LLM normalizes to an English term → the terminology server / resolver finds the code. Verified: "Kopfschmerzen" → LLM "headache" → correct SNOMED concept.

Division of responsibilities: LLM = translation + normalization, resolver = code resolution.
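The division of responsibilities can be sketched in TypeScript roughly as follows. This is a minimal illustration, not the actual V0.5 tool code: the names SymptomToolInput, Resolver, tableResolver, and handleRecordSymptom are hypothetical, and the in-memory table is only a stand-in for the terminology server. The single code it contains, 431855005 (CKD stage 1), is the one quoted in the evidence section; its display string is illustrative.

```typescript
// Hypothetical sketch: every type and function name here is illustrative.

// What the LLM is allowed to send: an English term, never a numeric code.
interface SymptomToolInput {
  englishTerm: string;   // e.g. "headache"
  originalText?: string; // what the patient actually wrote, e.g. "Kopfschmerzen"
}

interface ResolvedCode {
  system: string;
  code: string;
  display: string;
}

// A deterministic resolver: same term in, same answer out, null when unknown.
type Resolver = (englishTerm: string) => ResolvedCode | null;

// Toy lookup table standing in for a terminology server.
const TABLE = new Map<string, ResolvedCode>([
  [
    "chronic kidney disease stage 1",
    {
      system: "http://snomed.info/sct",
      code: "431855005",
      display: "Chronic kidney disease stage 1", // illustrative display text
    },
  ],
]);

const tableResolver: Resolver = (englishTerm) =>
  TABLE.get(englishTerm.trim().toLowerCase()) ?? null;

// Write side: the tool handler keeps the text and lets the resolver decide the code.
function handleRecordSymptom(input: SymptomToolInput, resolve: Resolver) {
  return {
    text: input.englishTerm,
    originalText: input.originalText ?? null,
    code: resolve(input.englishTerm), // null means "store text only"
  };
}
```

The point of the shape is that the tool input has no field where the model could place a code, so a hallucinated identifier has nowhere to go.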

Mitigated LLM coding (if it is used after all)

In narrative-to-fhir/snomed-coder.ts the LLM is used as a Tier 2 fallback (when the terminology server / hardcoded table does not find the term). This is not plain Variant C; the following mitigations apply:

Strict JSON schema: the response is rejected unless it is a digits-only string
"Never invent codes. Return null if not confident" instruction in the system prompt
A confidence score is returned; the caller can filter by threshold (TODO)
gpt-5.2 (more stable than gpt-4o-mini)
Null fallback on any schema mismatch / LLM error → the caller continues without coding (text only)
Pre-deploy audit of the static lookup table through the terminology server $lookup (Artur's case)

This is mitigated bare LLM coding, not the unrestricted LLM coding of Variant C. Without the mitigations it is not acceptable.
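Three of these mitigations (digits-only schema check, null fallback, confidence threshold) can be sketched as a small validation step. This is a sketch under assumptions, not the contents of snomed-coder.ts: the response shape { code, confidence }, the 0..1 confidence range, and the names parseTier2Response and acceptTier2 are all hypothetical, and the threshold filter is still a TODO in the real pipeline.

```typescript
// Hypothetical sketch of the Tier 2 guard; the response shape is an assumption.
interface Tier2Result {
  code: string;       // digits-only string
  confidence: number; // assumed to be in the range 0..1
}

// Returns null on any schema mismatch so the caller can continue text-only.
function parseTier2Response(raw: string): Tier2Result | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof parsed !== "object" || parsed === null) return null;

  const { code, confidence } = parsed as Record<string, unknown>;
  // Rejects null (the model declined), numbers, and anything with non-digits.
  if (typeof code !== "string" || !/^\d+$/.test(code)) return null;
  if (typeof confidence !== "number" || confidence < 0 || confidence > 1) return null;

  return { code, confidence };
}

// Optional threshold filter on top of the schema check.
function acceptTier2(raw: string, minConfidence: number): string | null {
  const result = parseTier2Response(raw);
  return result !== null && result.confidence >= minConfidence ? result.code : null;
}
```

Given the evidence that self-reported confidence is unreliable, the threshold is best read as a weak extra filter; the schema check and the null fallback carry most of the weight.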

Implementations (details)

See clinical-code-resolution for:

The two parallel pipelines (narrative-to-fhir hardcoded+LLM vs V0.5 Mastra $expand)
Deployment progression V0.5 → V1 → V1+
Audit practice via CSIRO Ontoserver $lookup
History of attempts (JSON cache, SQLite RF2, Snowstorm Lite, tx.fhir.org $expand)

Consequences

Tool descriptions of all LLM agents that work with clinical terms must instruct "Provide English medical term, not SNOMED code"
All resolvers must return null (not an invented code) for unknown terms; the caller continues without coding (text only)
Confidence threshold for the mitigated LLM Tier 2: TODO (confidence is currently returned but not filtered on)
Audit policy for hardcoded lookup tables: team/clinical-code-validation (TBD page) or a section inside clinical-code-resolution
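The "null, then text only" rule for resolvers could look like this on the caller side. This is a minimal sketch: the interfaces mirror only the text and coding fields of a FHIR CodeableConcept, and the function name toCodeableConcept is hypothetical.

```typescript
// Minimal subset of the FHIR Coding / CodeableConcept shapes used here.
interface Coding {
  system: string;
  code: string;
  display?: string;
}

interface CodeableConcept {
  text: string;
  coding?: Coding[];
}

// Hypothetical helper: a null resolver result yields a text-only concept,
// never a placeholder or guessed code.
function toCodeableConcept(term: string, resolved: Coding | null): CodeableConcept {
  if (resolved === null) {
    return { text: term };
  }
  return { text: term, coding: [resolved] };
}
```

For example, toCodeableConcept("heartburn", null) produces { text: "heartburn" } with no coding array, so downstream consumers can tell "not coded" apart from "coded".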

Open questions

Refresh cycle for the hardcoded SNOMED_QUICK_LOOKUP table: Artur did a manual audit in May 2026. Triggers / automation?
Inactive concepts handling: $lookup returns inactive=true + replacedBy. This is not handled at the moment, so the hardcoded table may contain a deprecated code.
RxNorm for medications: once medications emit starts working, US Core recommends RxNorm. For now medications go through the SNOMED Pharmaceutical / biologic product hierarchy. The pattern is the same: LLM → English term → resolver.
ICD-10-CM dual-coding: US Core recommends SNOMED primary + ICD-10-CM secondary in a single CodeableConcept.coding[]. We do not do this; it will be needed if we move into US billing.
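For the inactive-concepts question, one possible starting point is a helper that reads the inactive and replacedBy properties out of a $lookup response. This is a sketch under assumptions: the FHIR R4 Parameters layout (property parameters with code and value parts) follows the $lookup definition, but the exact property codes and value types a given server returns should be checked against real responses. The function names and both codes in the sample are placeholders, not real SNOMED identifiers.

```typescript
// Minimal subset of the FHIR R4 Parameters resource returned by $lookup.
interface Part {
  name: string;
  valueCode?: string;
  valueBoolean?: boolean;
  valueCoding?: { code?: string };
}
interface Param {
  name: string;
  part?: Part[];
}
interface Parameters {
  resourceType: "Parameters";
  parameter?: Param[];
}

// Finds the "value" part of the property whose "code" part matches.
function readProperty(params: Parameters, code: string): Part | undefined {
  for (const p of params.parameter ?? []) {
    if (p.name !== "property" || !p.part) continue;
    const codePart = p.part.find((x) => x.name === "code");
    if (codePart?.valueCode === code) {
      return p.part.find((x) => x.name === "value");
    }
  }
  return undefined;
}

// Hypothetical helper: summarises whether a table entry needs attention.
function lookupStatus(params: Parameters): { inactive: boolean; replacedBy: string | null } {
  const inactive = readProperty(params, "inactive")?.valueBoolean === true;
  const replacement = readProperty(params, "replacedBy");
  const replacedBy = replacement?.valueCode ?? replacement?.valueCoding?.code ?? null;
  return { inactive, replacedBy };
}

// Hand-written sample response; the replacement code is a placeholder.
const sampleLookup: Parameters = {
  resourceType: "Parameters",
  parameter: [
    { name: "property", part: [{ name: "code", valueCode: "inactive" }, { name: "value", valueBoolean: true }] },
    { name: "property", part: [{ name: "code", valueCode: "replacedBy" }, { name: "value", valueCode: "NEW_PLACEHOLDER" }] },
  ],
};
```

An audit script could run lookupStatus over every entry in the hardcoded table and report those where inactive is true, leaving the decision to swap in replacedBy to a human reviewer.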

Related

clinical-code-resolution: implementation details for both pipelines (narrative-to-fhir + V0.5 Mastra tools)
snomed: SNOMED CT standard (structure, hierarchies, what is in the codesystem)
fhir-meta-tagging: another parallel decision page for the V0.5 architecture (how to tag the write-event source)
clinical-record-reconciliation: what to do when the same term arrives again (dedup / reconciliation)
agent-vs-workflow: connected pattern, structured LLM with a deterministic resolver layer
medical-context-survey: where the V0.5 write tools are used

Sources

Sources: ¹ ² ³.

Footnotes

¹ Session ildar/7ff79368, 2026-03-27: original decision.
² FHIR R4 ValueSet $expand operation, accessed 2026-05-17, https://www.hl7.org/fhir/valueset-operation-expand.html.
³ FHIR R4 CodeSystem $lookup operation, accessed 2026-05-17, https://www.hl7.org/fhir/codesystem-operation-lookup.html.
