Validating code system values

Greetings,
I'm working on building a FHIR validator, and right now I'm focusing on validating code system values. I have several questions on this topic. I'm fairly new to the field, so some of the questions may seem obvious to FHIR experts; I'd appreciate your help and patience.

First, I'm trying to understand some basic concepts: where are CodeSystems / ValueSets supposed to be stored? Let's say I need to expose a list of hospitals in my country so that it is accessible to FHIR users all over the world. What is the right place to store it? Should I always go to https://terminology.hl7.org? Or are other servers, like https://simplifier.net, also acceptable?

I have a patient example with a reference to Identifier Type codes. Due to some current software limitations, I most likely can't validate this code using
http://terminology.hl7.org/CodeSystem/v2-0203/$validate-code?code=SS .
Instead, my validation will first fetch the full list of codes (i.e. Identifier Types) and then check whether the specific code (SS) exists in the downloaded list. In the patient example below, I can send a request to http://terminology.hl7.org/CodeSystem/v2-0203 and get the list of Identifier Types. Is it a correct assumption that the list of values in question is always available at that address (http://terminology.hl7.org/CodeSystem/v2-0203 in our example)?


<identifier>
  <type>
    <coding>
      <system value="http://terminology.hl7.org/CodeSystem/v2-0203"/>
      <code value="SS"/>
    </coding>
  </type>
  <system value="https://github.com/projectcypress/cypress/patient"/>
  <value value="577390"/>
</identifier>
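For reference, the membership check I have in mind looks roughly like this. It's only a sketch: it assumes the server returns the CodeSystem as JSON with a (possibly nested) `concept` array, and the sample resource below is a tiny invented excerpt, not the real v2-0203 content.

```python
def code_in_codesystem(codesystem: dict, code: str) -> bool:
    """Check whether `code` appears anywhere in the CodeSystem's
    (possibly nested) `concept` hierarchy."""
    def walk(concepts):
        for c in concepts:
            if c.get("code") == code:
                return True
            if walk(c.get("concept", [])):
                return True
        return False
    return walk(codesystem.get("concept", []))

# Tiny excerpt for illustration; the real resource has many more codes.
v2_0203 = {
    "resourceType": "CodeSystem",
    "url": "http://terminology.hl7.org/CodeSystem/v2-0203",
    "concept": [
        {"code": "SS", "display": "Social Security number"},
        {"code": "MR", "display": "Medical record number"},
    ],
}

print(code_in_codesystem(v2_0203, "SS"))   # True
print(code_in_codesystem(v2_0203, "XYZ"))  # False
```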

I realized that I can also get a list of codes via https://hapi.fhir.org/baseR4/CodeSystem/v2-0203. It returns it in a different format, but it's basically the same list that I can use for validation. Does it make any difference which server I use for validation, terminology.hl7.org or hapi.fhir.org? And, at a higher level: what is the role of each of them? Does terminology.hl7.org serve as the main source of system codes, with the HAPI server just providing access to that source?

The last question is about the approach I mentioned above: the validation will first fetch the full list of codes (i.e. Identifier Types) and then check whether the specific code (SS) exists in the downloaded list. I'm aware it's not the best approach, but do you see any showstopper disadvantages? For example, some code lists can be huge, and the approach may turn out to be unacceptable in terms of performance.

Thanks !
Ilia Kaplan

First: who's going to be using your validator, how often will it be hit, and what are the performance expectations? It's fine to grab code systems occasionally from terminology.hl7.org (say, every week or every month), but you can't hit it multiple times per second. It's not designed to handle that sort of load. If you're doing runtime validation, you'll want to host your own terminology server - for performance, for availability, and to ensure it has exactly the terminologies you need.

The HL7 validator uses tx.fhir.org as its terminology server. It's also used by the HL7 publisher. It won't mind if you make a few thousand calls a day, but it's going to get cranky if you try for more than that (so again, not something for validating instances at runtime).

terminology.hl7.org is the source of truth for HL7 code systems, but those code systems are then hosted on a wide variety of servers throughout the world. HAPI is one of those; tx.fhir.org is another. Those same servers will typically also host SNOMED, ICD-10, UCUM, ISO code systems, LOINC, and a wide variety of other code systems whose source of truth is maintained elsewhere.
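To make the runtime-validation option concrete, here is a rough sketch of preparing a `$validate-code` request and reading the `Parameters` response that terminology servers return. The function names are my own; the canned response below stands in for what a real server might send back.

```python
def build_validate_code_query(system: str, code: str) -> dict:
    """Query parameters for GET [base]/CodeSystem/$validate-code."""
    return {"url": system, "code": code}

def read_validate_code_result(parameters: dict) -> bool:
    """Extract the boolean `result` parameter from a Parameters response."""
    for p in parameters.get("parameter", []):
        if p.get("name") == "result":
            return bool(p.get("valueBoolean"))
    return False

# Example Parameters response a terminology server might return.
response = {
    "resourceType": "Parameters",
    "parameter": [
        {"name": "result", "valueBoolean": True},
        {"name": "display", "valueString": "Social Security number"},
    ],
}

query = build_validate_code_query(
    "http://terminology.hl7.org/CodeSystem/v2-0203", "SS")
print(query)
print(read_validate_code_result(response))  # True
```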

Be aware that writing your own validator is HARD - especially if you're planning to support validation against profiles that use slicing, FHIRPath, etc. Is there a reason you're writing your own rather than using one of the existing free ones?

Downloading all the codes and then checking for one can work if the code system only has a few hundred members. It will fall flat if you're using something like LOINC or SNOMED, which have 10k-100k+ codes.

Hi Lloyd,
Thanks for the answer.
I'm aware it's not going to be easy. Are there any other validation-related topics that you think are complicated, apart from slicing and FHIRPath?
How would you suggest validating a code from a large code system like SNOMED?

For terminology validation, you're going to need a terminology service. That's a complex exercise in its own right. For example, with SNOMED, you'll first need to collect the SNOMED source information from all of the relevant SNOMED sub-organizations. (There's no single source of truth - international, US, Canada, UK, Japan, etc., and even HL7, all publish independently.) Each of those sources of truth changes multiple times a year, and, on occasion, the syntax in which the updates are shared also changes. You'll also have to write code to handle parsing and validating post-coordinated concepts and efficiently evaluating subsumption relationships. Finally, you'll need to be able to expand complex value sets and determine whether, for a specified set of assumptions around release date and jurisdiction, a given code is valid.
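To illustrate just one of those pieces, here is a toy sketch of subsumption checking over an is-a hierarchy. The codes and hierarchy are invented for illustration; a real SNOMED implementation deals with hundreds of thousands of concepts, multiple parents, and post-coordination.

```python
# Direct parent relationships: child code -> set of parent codes.
# (Invented codes; real SNOMED concepts are numeric SCTIDs.)
IS_A = {
    "bacterial-pneumonia": {"pneumonia"},
    "viral-pneumonia": {"pneumonia"},
    "pneumonia": {"lung-disease"},
    "lung-disease": {"disease"},
    "disease": set(),
}

def subsumes(ancestor: str, descendant: str) -> bool:
    """True if `ancestor` subsumes `descendant` (or they are equal),
    walking the is-a graph upward from the descendant."""
    if ancestor == descendant:
        return True
    seen, stack = set(), [descendant]
    while stack:
        code = stack.pop()
        for parent in IS_A.get(code, set()):
            if parent == ancestor:
                return True
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return False

print(subsumes("disease", "bacterial-pneumonia"))  # True
print(subsumes("viral-pneumonia", "pneumonia"))    # False
```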

All of the current validators rely on external terminology services for this, though they also cache validation results for efficiency.
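A minimal way to get that caching behavior looks something like this (names are my own; real validators do something more sophisticated, with expiry and terminology-version awareness):

```python
from functools import lru_cache

calls = {"count": 0}  # counts how often the "remote" lookup runs

@lru_cache(maxsize=10_000)
def validate_code(system: str, code: str) -> bool:
    """Stand-in for a remote $validate-code call; real code would issue
    an HTTP request here. Cached so repeated (system, code) pairs don't
    hit the terminology server again."""
    calls["count"] += 1
    # Pretend the server only knows one code, purely for illustration.
    return (system, code) == (
        "http://terminology.hl7.org/CodeSystem/v2-0203", "SS")

validate_code("http://terminology.hl7.org/CodeSystem/v2-0203", "SS")
validate_code("http://terminology.hl7.org/CodeSystem/v2-0203", "SS")
print(calls["count"])  # 1 - the second lookup was served from the cache
```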

I'm not sure what all of the other gotchas are in the validator, but all told, I'd say you're probably looking at well over 1500 person-hours to get to the level of comprehensiveness that the existing Java and .NET validators have. There is a shared set of test cases in Git you could use to evaluate your own effort if you're bound and determined to proceed.