Schema validation in an API gateway such as DataPower

eyaldviri · March 9, 2023, 1:49pm

Hi all,

We have encountered an issue raised by our data security team: in order to scan data before it arrives in the health organization's domain, most tools (DataPower, Imperva Incapsula, etc.) need a predefined schema.
In our understanding, this is almost impossible to implement, since the schema is dynamic and sometimes recursive.
We would love to know whether anyone has overcome this issue and what solution they used.

We understand that we could use the FHIR validator outside the organization. But since the main issue is data scanning, as a means to verify that the data does not contain malware or injection attacks of any sort, it does not provide a valid solution.

Thanks,

Eyal Dviri

lloyd · March 9, 2023, 7:05pm

The schema isn’t dynamic for a given version of FHIR - there’s a single schema that applies to all FHIR instances that use a single version. If you’re supporting multiple versions, you’ll need a schema for each. The schema is certainly large though, and there is some recursion to it, so if your tools can’t handle that, they’ll have problems.
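To make the recursion point concrete: in FHIR an `extension` can itself contain nested `extension` elements, so a schema-driven tool has to follow a definition that refers back to itself. The sketch below is a hypothetical Python illustration (the function `max_nesting_depth`, the sample resource and the `MAX_DEPTH` value are all made up for this example). It measures how deeply objects and arrays are nested in a parsed FHIR JSON resource, which is the kind of bound a gateway might enforce if it cannot handle the recursive schema directly.

```python
import json
from typing import Any

MAX_DEPTH = 32  # hypothetical policy value, not taken from the FHIR spec


def max_nesting_depth(node: Any) -> int:
    """Depth of nested JSON objects/arrays; scalars count as 0.

    Uses an explicit stack instead of recursion, so deeply nested input
    cannot exhaust Python's call stack during the check itself.
    """
    deepest = 0
    stack = [(node, 0)]
    while stack:
        current, depth = stack.pop()
        if isinstance(current, dict):
            children = current.values()
        elif isinstance(current, list):
            children = current
        else:
            continue  # strings, numbers, booleans, null
        depth += 1
        deepest = max(deepest, depth)
        stack.extend((child, depth) for child in children)
    return deepest


# Extension inside extension inside extension (sample data only).
patient = json.loads("""
{
  "resourceType": "Patient",
  "extension": [{
    "url": "http://example.org/outer",
    "extension": [{
      "url": "inner",
      "extension": [{"url": "innermost", "valueString": "x"}]
    }]
  }]
}
""")

depth = max_nesting_depth(patient)
print(depth)  # 7
print("reject" if depth > MAX_DEPTH else "accept")  # accept
```

A depth bound like this does not replace schema validation; it only keeps a recursive structure from growing without limit before other checks run.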

That said, whether you’re receiving XML or JSON, you can’t really pass malware. You could possibly pass instance data that’s so large it could overrun data buffers, but that might be true even if the data was schema-valid. (Some of the FHIR data types do not have upper length limits.) I.e. schema validation may not accomplish what you’re hoping it will.
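Because a payload can be schema-valid and still be large enough to cause trouble, a byte-size cap applied before parsing is a separate control from schema validation. The following is a minimal, hypothetical Python sketch. The 10 MB limit, the `PayloadTooLarge` exception and the function `parse_bounded` are assumptions for illustration, not values from FHIR or any gateway product.

```python
import json
from typing import Any

MAX_BODY_BYTES = 10 * 1024 * 1024  # hypothetical limit; tune to your own data


class PayloadTooLarge(ValueError):
    """Raised when a request body exceeds the configured byte limit."""


def parse_bounded(body: bytes, max_bytes: int = MAX_BODY_BYTES) -> Any:
    # Check the size first, so oversized input is rejected before any
    # parsing work is done. Schema validity says nothing about this,
    # since some FHIR data types have no upper length limit.
    if len(body) > max_bytes:
        raise PayloadTooLarge(f"body is {len(body)} bytes; limit is {max_bytes}")
    return json.loads(body)


small = b'{"resourceType": "Patient", "id": "example"}'
print(parse_bounded(small)["id"])  # example

oversized = b'{"resourceType": "Binary", "data": "' + b"A" * 64 + b'"}'
try:
    parse_bounded(oversized, max_bytes=32)
except PayloadTooLarge as err:
    print("rejected:", err)
```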

eyaldviri · March 9, 2023, 7:30pm

Thanks for the detailed answer. I think I get it. However, could you share what is considered the common solution for scanning data in transit?
I guess this concerns state-level or enterprise health organizations more than small and medium-sized ones.

lloyd · March 10, 2023, 1:37am

I don’t know that there’s a specific “common” solution. Different organizations have different infrastructures, different threat models, different approaches to validation and different layering approaches to processing content. The data they expect to consume will also vary. Some might impose a sanity check on string length of 100 characters, while for others, hundreds of megabytes might be feasible or even common.
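Because each organization sets its own thresholds, a string-length sanity check fits naturally as a configurable policy rather than a fixed rule. The hypothetical Python sketch below walks every string value in a parsed resource and reports the paths that exceed the configured limit. The `ContentPolicy` class and both profiles are invented for illustration; the strict profile mirrors the 100-character example above.

```python
from dataclasses import dataclass
from typing import Any, Iterator, List, Tuple


@dataclass(frozen=True)
class ContentPolicy:
    """Hypothetical per-organization limits for inbound FHIR JSON."""
    max_string_length: int


STRICT = ContentPolicy(max_string_length=100)
PERMISSIVE = ContentPolicy(max_string_length=500 * 1024 * 1024)  # hundreds of MB


def iter_strings(node: Any, path: str = "$") -> Iterator[Tuple[str, str]]:
    """Yield (path, value) for every string value in parsed JSON."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_strings(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from iter_strings(item, f"{path}[{index}]")
    elif isinstance(node, str):
        yield path, node


def violations(resource: Any, policy: ContentPolicy) -> List[str]:
    """Paths of string values longer than the policy allows."""
    return [path for path, text in iter_strings(resource)
            if len(text) > policy.max_string_length]


note = {"resourceType": "Observation", "note": [{"text": "x" * 250}]}
print(violations(note, STRICT))      # ['$.note[0].text']
print(violations(note, PERMISSIVE))  # []
```

The same content passes or fails depending only on the policy object, which reflects the point that the right limit depends on the organization and the data it expects.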

grahamegrieve · March 10, 2023, 3:31am

We understand that we could use the FHIR validator outside the organization. But since the main issue is data scanning, as a means to verify that the data does not contain malware or injection attacks of any sort, it does not provide a valid solution.

Really? Why? I don’t follow.

eyaldviri · March 10, 2023, 7:23am

Well, I’m not so sure. It’s our cybersecurity requirement that all inbound data be scanned, and the FHIR validator is not a security product.
Since it’s a global security concern, I’m trying to figure out how other organizations approach it, or whether they think it isn’t needed.

If it isn’t needed, then why not?

If they do use a solution, which ones?

grahamegrieve · March 10, 2023, 8:38am

Well, typically they use some kind of API accelerator that handles that kind of thing as part of their business service. I don’t know what they actually do in detail. But the validator can be run in a security mode where it checks the incoming content for likely problems in the string content. Some people are using it for that. I don’t know what your technical requirements are, or your architecture, but it’s at least a possibility.
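The validator’s security mode is only described at a high level here, so what follows is not its implementation. It is a hypothetical Python sketch of the general idea of checking string content for likely problems: walk every string in a parsed resource and flag values that look like HTML or script injection. The pattern list and the names `walk_strings` and `scan_strings` are assumptions for illustration. A real deployment would rely on the validator’s or the gateway product’s own rules.

```python
import re
from typing import Any, Iterator, List, Tuple

# Illustrative patterns only (an assumption for this sketch); they are NOT
# the rules the FHIR validator applies in its security mode.
SUSPICIOUS = [
    ("script-tag", re.compile(r"<\s*script\b", re.IGNORECASE)),
    ("javascript-url", re.compile(r"javascript\s*:", re.IGNORECASE)),
    ("inline-event-handler", re.compile(r"<[^>]*\bon[a-z]+\s*=", re.IGNORECASE)),
]


def walk_strings(node: Any, path: str = "$") -> Iterator[Tuple[str, str]]:
    """Yield (path, value) for every string value in parsed JSON."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk_strings(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from walk_strings(item, f"{path}[{index}]")
    elif isinstance(node, str):
        yield path, node


def scan_strings(resource: Any) -> List[Tuple[str, str]]:
    """Return (path, rule name) for each string matching a suspicious pattern."""
    findings = []
    for path, text in walk_strings(resource):
        for name, pattern in SUSPICIOUS:
            if pattern.search(text):
                findings.append((path, name))
    return findings


patient = {
    "resourceType": "Patient",
    "name": [{"family": "Smith", "given": ["<script>alert(1)</script>"]}],
}
print(scan_strings(patient))  # [('$.name[0].given[0]', 'script-tag')]
```

Pattern matching like this produces false positives and misses encoded payloads, which is one reason it usually sits alongside, not instead of, a gateway’s own protections.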