Validating code system values

Greetings,
I'm working on building a FHIR validator, and right now I'm focusing on validating code system values. I have several questions on this topic. I'm fairly new to the field, so some of the questions may seem obvious to FHIR experts; I'd appreciate your help and patience.

First, I'm trying to understand some basic concepts: where are CodeSystems / ValueSets supposed to be stored? Let's say I need to expose a list of hospitals in my country so that it is accessible to FHIR users all over the world. What is the right place to store it? Should I always go to https://terminology.hl7.org? Or are other servers, like https://simplifier.net, also acceptable?

I have a patient example with a reference to Identifier Type codes. Due to some current software limitations, I most likely can't validate this code using
http://terminology.hl7.org/CodeSystem/v2-0203/$validate-code?code=SS .
Instead, my validation will first fetch the full list of codes (i.e. Identifier Types) and then check whether the specific code (SS) exists in the downloaded list. In the patient example below, I can send a request to http://terminology.hl7.org/CodeSystem/v2-0203 and get the list of Identifier Types. Is it a correct assumption that the list of values in question is always available at that address (http://terminology.hl7.org/CodeSystem/v2-0203 in our example)?


<identifier>
  <type>
    <coding>
      <system value="http://terminology.hl7.org/CodeSystem/v2-0203"/>
      <code value="SS"/>
    </coding>
  </type>
  <system value="https://github.com/projectcypress/cypress/patient"/>
  <value value="577390"/>
</identifier>
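For reference, the membership check I have in mind looks roughly like this. It's only a sketch: it assumes the server returns the CodeSystem as JSON with a (possibly nested) `concept` array, and the sample resource below is a tiny invented excerpt, not the real v2-0203 content.

```python
def code_in_codesystem(codesystem: dict, code: str) -> bool:
    """Check whether `code` appears anywhere in the CodeSystem's
    (possibly nested) `concept` hierarchy."""
    def walk(concepts):
        for c in concepts:
            if c.get("code") == code:
                return True
            if walk(c.get("concept", [])):
                return True
        return False
    return walk(codesystem.get("concept", []))

# Tiny excerpt for illustration; the real resource has many more codes.
v2_0203 = {
    "resourceType": "CodeSystem",
    "url": "http://terminology.hl7.org/CodeSystem/v2-0203",
    "concept": [
        {"code": "SS", "display": "Social Security number"},
        {"code": "MR", "display": "Medical record number"},
    ],
}

print(code_in_codesystem(v2_0203, "SS"))   # True
print(code_in_codesystem(v2_0203, "XYZ"))  # False
```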

I realized that I can also get a list of codes via https://hapi.fhir.org/baseR4/CodeSystem/v2-0203. It returns it in a different format, but it's basically the same list that I can use for validation. Does it make any difference which server I use for validation, terminology.hl7.org or hapi.fhir.org? And, at a higher level: what is the role of each of them? Does terminology.hl7.org serve as the main source of system codes, with the HAPI server just providing access to that source?

The last question is about the approach I mentioned above: the validation will first fetch the full list of codes (i.e. Identifier Types) and then check whether the specific code (SS) exists in the downloaded list. I'm aware it's not the best approach, but do you see any showstopper disadvantages? For example, some code lists can be huge, and the approach may turn out to be unacceptable in terms of performance.

Thanks !
Ilia Kaplan

First: who's going to be using your validator, how often will it be hit, and what are the performance expectations? It's fine to grab code systems occasionally from terminology.hl7.org (say, every week or every month), but you can't hit it multiple times per second. It's not designed to handle that sort of load. If you're doing runtime validation, you'll want to host your own terminology server - for performance, for availability, and to ensure it has exactly the terminologies you need.

The HL7 validator uses tx.fhir.org as its terminology server. It's also used by the HL7 publisher. It won't mind if you make a few thousand calls a day, but it's going to get cranky if you try for more than that (so again, not something for validating instances at runtime).

terminology.hl7.org is the source of truth for HL7 code systems, but those code systems are then hosted on a wide variety of servers throughout the world. HAPI is one of those; tx.fhir.org is another. Those same servers will typically also host SNOMED, ICD-10, UCUM, ISO code systems, LOINC, and a wide variety of other code systems whose source of truth is maintained elsewhere.
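To make the runtime-validation option concrete, here is a rough sketch of preparing a `$validate-code` request and reading the `Parameters` response that terminology servers return. The function names are my own; the canned response below stands in for what a real server might send back.

```python
def build_validate_code_query(system: str, code: str) -> dict:
    """Query parameters for GET [base]/CodeSystem/$validate-code."""
    return {"url": system, "code": code}

def read_validate_code_result(parameters: dict) -> bool:
    """Extract the boolean `result` parameter from a Parameters response."""
    for p in parameters.get("parameter", []):
        if p.get("name") == "result":
            return bool(p.get("valueBoolean"))
    return False

# Example Parameters response a terminology server might return.
response = {
    "resourceType": "Parameters",
    "parameter": [
        {"name": "result", "valueBoolean": True},
        {"name": "display", "valueString": "Social Security number"},
    ],
}

query = build_validate_code_query(
    "http://terminology.hl7.org/CodeSystem/v2-0203", "SS")
print(query)
print(read_validate_code_result(response))  # True
```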

Be aware that writing your own validator is HARD - especially if you're planning to support validation against profiles that use slicing, FHIRPath, etc. Is there a reason you're writing your own rather than using one of the existing free ones?

Downloading all the codes and then checking for one can work if the code system only has a few hundred members. It will fall flat if you're using something like LOINC or SNOMED, which have 10k-100k+ codes.

Hi Lloyd,
Thanks for the answer.
I'm aware it's not going to be easy. Are there any other validation-related topics that you think are complicated, apart from slicing and FHIRPath?
How would you suggest validating a code from a large code system like SNOMED?

For terminology validation, you're going to need a terminology service. That's a complex exercise in its own right. For example, with SNOMED, you'll first need to collect the SNOMED source information from all of the relevant SNOMED sub-organizations. (There's no single source of truth - international, US, Canada, UK, Japan, etc., and even HL7, all publish independently.) Each of those sources of truth changes multiple times a year, and, on occasion, the syntax in which the updates are shared also changes. You'll also have to write code to handle parsing and validating post-coordinated concepts and efficiently evaluating subsumption relationships. Finally, you'll need to be able to expand complex value sets and determine whether, for a specified set of assumptions around release date and jurisdiction, a given code is valid.
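To illustrate just one of those pieces, here is a toy sketch of subsumption checking over an is-a hierarchy. The codes and hierarchy are invented for illustration; a real SNOMED implementation deals with hundreds of thousands of concepts, multiple parents, and post-coordination.

```python
# Direct parent relationships: child code -> set of parent codes.
# (Invented codes; real SNOMED concepts are numeric SCTIDs.)
IS_A = {
    "bacterial-pneumonia": {"pneumonia"},
    "viral-pneumonia": {"pneumonia"},
    "pneumonia": {"lung-disease"},
    "lung-disease": {"disease"},
    "disease": set(),
}

def subsumes(ancestor: str, descendant: str) -> bool:
    """True if `ancestor` subsumes `descendant` (or they are equal),
    walking the is-a graph upward from the descendant."""
    if ancestor == descendant:
        return True
    seen, stack = set(), [descendant]
    while stack:
        code = stack.pop()
        for parent in IS_A.get(code, set()):
            if parent == ancestor:
                return True
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return False

print(subsumes("disease", "bacterial-pneumonia"))  # True
print(subsumes("viral-pneumonia", "pneumonia"))    # False
```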

All of the current validators rely on external terminology services for this, though they also cache validation results for efficiency.
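A minimal way to get that caching behavior looks something like this (names are my own; real validators do something more sophisticated, with expiry and terminology-version awareness):

```python
from functools import lru_cache

calls = {"count": 0}  # counts how often the "remote" lookup runs

@lru_cache(maxsize=10_000)
def validate_code(system: str, code: str) -> bool:
    """Stand-in for a remote $validate-code call; real code would issue
    an HTTP request here. Cached so repeated (system, code) pairs don't
    hit the terminology server again."""
    calls["count"] += 1
    # Pretend the server only knows one code, purely for illustration.
    return (system, code) == (
        "http://terminology.hl7.org/CodeSystem/v2-0203", "SS")

validate_code("http://terminology.hl7.org/CodeSystem/v2-0203", "SS")
validate_code("http://terminology.hl7.org/CodeSystem/v2-0203", "SS")
print(calls["count"])  # 1 - the second lookup was served from the cache
```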

I'm not sure what all of the other gotchas are in the validator, but all told, I'd say you're probably looking at well over 1500 person-hours to get to the level of comprehensiveness that the existing Java and .NET validators have. There is a shared set of test cases in Git you could use to evaluate your own effort if you're bound and determined to proceed.