in AI and Deep Learning by (16.3k points)

I work at a public health department that takes in and stores a lot of medical data every day. I've written a program that uses regular expressions to determine whether particular fields in the incoming data are valid or invalid. For example, DOBs come in as YYYYMMDD, so they should match the regex ^[0-9]{8}$

I want to analyze the "invalid" data to help identify problems in our system (we get far too much data to go through each 'bad' record row by row). Can anyone suggest AI or machine learning techniques that can 'monitor' the bad data and find patterns in what is wrong? I think that writing a set of regular expressions for the possible ways the data could be invalid (e.g. too few or too many characters) and then keeping track of how often each one matches might work. But instead of me thinking up all of the ways the data could be invalid, I'm curious about ways to 'learn' the patterns from the bad data using AI.
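
To make the regex-tracking idea above concrete, here is a minimal sketch in Python. Only the valid-DOB regex comes from the description above; the failure bucket names and their patterns are hypothetical examples, not a complete list of the ways a DOB can be wrong.

```python
import re
from collections import Counter

# Valid-DOB pattern as described above (YYYYMMDD, 8 digits).
VALID_DOB = re.compile(r"^[0-9]{8}$")

# Hypothetical failure buckets, checked in order; names and patterns
# are illustrative assumptions, not an exhaustive catalogue.
FAILURE_PATTERNS = [
    ("empty", re.compile(r"^\s*$")),
    ("too_short", re.compile(r"^[0-9]{1,7}$")),
    ("too_long", re.compile(r"^[0-9]{9,}$")),
    ("has_separators", re.compile(r"^[0-9]{1,4}[-/.][0-9]{1,2}[-/.][0-9]{1,4}$")),
    ("has_letters", re.compile(r"[A-Za-z]")),
]


def classify_dob(value):
    """Return 'valid', the first matching failure bucket, or 'other'."""
    # fullmatch avoids accepting a value with a trailing newline.
    if VALID_DOB.fullmatch(value):
        return "valid"
    for name, pattern in FAILURE_PATTERNS:
        if pattern.search(value):
            return name
    return "other"


def tally(values):
    """Count how many values fall into each bucket."""
    return Counter(classify_dob(v) for v in values)


if __name__ == "__main__":
    sample = ["19800101", "1980-01-01", "198001", "", "01JAN1980"]
    for bucket, count in tally(sample).most_common():
        print(bucket, count)
```

Values that land in "other" are the ones no hand-written pattern anticipated, which is the part a learned approach would have to cover.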

Are there any known techniques that do this?

1 Answer

by (36.6k points)

Bayesian filtering might be a good fit for your problem. It is a probabilistic technique for data fusion: it combines a concise mathematical model of a system with observations of that system. Probabilities represent the state of the system, and likelihood functions represent how the observations relate to that state. With that framing, Bayesian inference can be applied to update those probabilities as new observations arrive and to deduce further related probabilities.
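
As a rough illustration of the update step described above, here is a minimal Python sketch. It is a simplified case, a recursive Bayes update over a fixed set of candidate causes with no state-transition model, rather than a full Bayesian filter. The cause names, failure types, and likelihood values are all made-up assumptions; in practice they would have to be estimated from records whose cause is already known.

```python
def bayes_update(prior, likelihoods, observation):
    """One Bayes step: posterior is proportional to P(observation | cause) * prior."""
    unnormalized = {
        cause: prior[cause] * likelihoods[cause].get(observation, 0.0)
        for cause in prior
    }
    total = sum(unnormalized.values())
    if total == 0:
        # Observation is impossible under every hypothesis; keep the prior.
        return dict(prior)
    return {cause: p / total for cause, p in unnormalized.items()}


# Hypothetical causes of bad DOB values, with assumed probabilities of
# each failure type given that cause (illustrative numbers only).
LIKELIHOODS = {
    "export_format_change": {"has_separators": 0.8, "too_short": 0.1, "has_letters": 0.1},
    "truncation_bug": {"has_separators": 0.05, "too_short": 0.9, "has_letters": 0.05},
    "manual_entry": {"has_separators": 0.3, "too_short": 0.3, "has_letters": 0.4},
}


def infer_cause(observations):
    """Start from a uniform prior and fold in each observed failure type."""
    belief = {cause: 1.0 / len(LIKELIHOODS) for cause in LIKELIHOODS}
    for obs in observations:
        belief = bayes_update(belief, LIKELIHOODS, obs)
    return belief


if __name__ == "__main__":
    belief = infer_cause(["has_separators", "has_separators", "too_short"])
    for cause, p in sorted(belief.items(), key=lambda kv: -kv[1]):
        print(f"{cause}: {p:.3f}")
```

Each incoming bad record shifts the belief toward the causes that best explain it, so a sudden change in the leading cause can point to a new problem upstream.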