IATI Validator update
The IATI Technical Team is pleased to release activity level validation via the IATI Validator. This feature allows more of the activity data published to IATI to enter and be accessed via the IATI Datastore.
Previously, the Validator assessed activity files (which contain multiple activities) as a whole: if any data in a file was critically invalid, no data from that file could enter the Datastore. Now the Validator assesses each activity within a file individually and excludes only the activities with critical errors.
How it works
- The Validator will first assess the published activity file. If the file contains no critically invalid data, then all data will enter the IATI Datastore.
- If the Validator finds that an activity file contains critically invalid data, then after a delay of 24hrs it will check whether the file: a) is accessible, b) contains well-formed XML, and c) is published according to version 2 of the IATI Standard.
- Files that pass these checks will then be validated at activity level against the IATI schema, allowing all activities that are not critically invalid to enter the IATI Datastore.
- If a publisher creates or updates more than 100 critically invalid files within 24hrs, activity level validation will pause for that publisher. The Technical Team will then notify the publisher and ask them to correct their data before validation continues.
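The steps above can be sketched as a simple selection function. This is an illustrative sketch only, not the IATI Validator's actual implementation; all names (`Activity`, `ActivityFile`, `select_for_datastore`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Activity:
    iati_identifier: str
    critically_invalid: bool = False

@dataclass
class ActivityFile:
    activities: List[Activity]
    accessible: bool = True
    well_formed_xml: bool = True
    standard_version_2: bool = True

def select_for_datastore(file: ActivityFile,
                         publisher_invalid_files_24h: int) -> List[Activity]:
    """Return the activities from `file` that may enter the Datastore."""
    # Step 1: if the whole file is free of critical errors, everything enters.
    if not any(a.critically_invalid for a in file.activities):
        return file.activities
    # Step 4: safety valve - pause activity level validation for publishers
    # that produced more than 100 critically invalid files in 24hrs.
    if publisher_invalid_files_24h > 100:
        return []
    # Step 2: the file itself must pass the three basic checks.
    if not (file.accessible and file.well_formed_xml and file.standard_version_2):
        return []
    # Step 3: exclude only the critically invalid activities.
    return [a for a in file.activities if not a.critically_invalid]
```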
This change responds to the growing demand from the IATI Community to have as many activities as possible made accessible via the IATI Datastore. This is also the approach followed by many community-built tools and services. By implementing activity level validation, our aim is to provide users of the IATI Datastore with as much usable information as possible whilst still upholding standardisation.
Feel free to add any questions or comments related to this thread using the comment box below.
Shouldn't the criterion of 100 invalid files (meaning more than 100 files 'containing critically invalid data'?) mentioned in bullet point 4 be much lower? Most publishers only publish a couple of files, so this criterion would almost never stop invalid files from being processed, implicitly covering up a flawed publication process. Wouldn't a percentage of the files containing critically invalid data be a better metric in this case?
Looks like a great improvement! I have the same suggestion as Herman but also curious: is the 24 hour delay under bullet point 2 a technical necessity, or deliberately designed that way?
Thank you for the feedback. The "why" behind the 100-invalid-files-in-24hrs criterion (point 4) is mainly a technical "safety valve": it prevents additional cost in the system if a large publisher with many files makes a mistake in their publishing process that causes all of their files to become invalid. That concern is based on past experience.
This process is not meant to be a method for proactive notification to publishers to fix their critically invalid files. We are working on an enhancement in the Registry that will email a publisher when they publish/update a critically invalid file.
The 24hrs in point 2 is a logical necessity given point 4: we must wait 24hrs before processing a file so that we can see whether its publisher has published more than 100 critically invalid files in the past 24hrs.
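In other words, the threshold is evaluated over a rolling 24-hour window, which can only be complete once the delay has elapsed. A minimal sketch of that check, with hypothetical names (this is not the actual Validator code):

```python
from datetime import datetime, timedelta

def invalid_files_in_window(invalid_timestamps, now, window=timedelta(hours=24)):
    """Count the critically invalid files a publisher produced in the past 24hrs."""
    return sum(1 for t in invalid_timestamps if now - t <= window)

def should_pause(invalid_timestamps, now):
    # Pause activity level validation once the publisher exceeds 100
    # critically invalid files in the rolling 24-hour window.
    return invalid_files_in_window(invalid_timestamps, now) > 100
```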
Seems to be a good and sensible development - many thanks.