Missing data mechanisms

Missing data is a near-universal problem in data sets, but many aspects of it are still not fully explored.

The purpose of the project is to develop a method for identifying the missing data mechanism, both from the pattern of missingness in the data and from location information, where available.

Missing data affects virtually all data sets but, despite extensive study, there remain gaps in our understanding of it, particularly in the case of Missing Not at Random data. These gaps limit both the ability to use the data and the ability to understand the uncertainty associated with research results.

The approach is to start by identifying the specific missing data mechanism involved, using a novel analytical method supplemented by spatial information, where available. This will increase the likelihood of selecting an appropriate method for imputing the missing values and of obtaining better results when analysing data that contain them. These results will not only be more accurate but will also better describe their level of uncertainty.
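
As a toy illustration of why the pattern of missingness carries information about the mechanism, the sketch below simulates the three standard mechanisms: Missing Completely at Random (MCAR, missingness unrelated to any values), Missing at Random (MAR, missingness depends only on observed values) and Missing Not at Random (MNAR, missingness depends on the unobserved value itself). It is not the project's novel method, which is not described here. The variable names, sample size, coefficients and the choice of a standardized mean difference as the diagnostic are all assumptions made for the example.

```python
import numpy as np

def simulate_missingness(n=5000, seed=0):
    """Simulate a fully observed covariate x, an outcome y, and three
    hypothetical missingness masks for y (True means y is missing)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = 0.8 * x + rng.normal(scale=0.6, size=n)  # y is correlated with x

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    masks = {
        # MCAR: constant probability, unrelated to x or y
        "MCAR": rng.random(n) < 0.3,
        # MAR: probability depends only on the observed covariate x
        "MAR": rng.random(n) < sigmoid(2.0 * x - 1.0),
        # MNAR: probability depends on the value of y that goes missing
        "MNAR": rng.random(n) < sigmoid(2.0 * y - 1.0),
    }
    return x, y, masks

def standardized_mean_difference(x, missing):
    """Difference in the mean of x between rows where y is missing and
    rows where y is observed, scaled by the pooled standard deviation."""
    a, b = x[missing], x[~missing]
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2.0)
    return (a.mean() - b.mean()) / pooled_sd

x, y, masks = simulate_missingness()
smd = {name: standardized_mean_difference(x, m) for name, m in masks.items()}
for name, value in smd.items():
    print(f"{name}: {value:+.2f}")
```

In this simulation the difference is close to zero under MCAR and clearly non-zero under both MAR and MNAR. A check of this kind can therefore cast doubt on MCAR, but it cannot by itself separate MAR from MNAR, because the values that would distinguish them are the ones that are missing.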

The ability to correctly identify the missing data mechanism and select an appropriate method for dealing with the missingness is of great value to any user of data. Such users span virtually all use cases, including environmental, medical, business, official and technological (e.g., AI) uses of data. It should allow not only greater analytic and predictive accuracy but also a better appreciation of the uncertainties associated with the data and its products.