I noticed that an update of the IATI Sector code list is upcoming, as the source list - the DAC CRS Purpose Codes - has been updated.
This is where I run into a slight difficulty. I can access six versions (in English) of this list, via the OECD website, at different URLs:
Via a page called DAC and CRS code lists:
- A spreadsheet “DAC and CRS list of codes (xls, July 2016)” - available at http://www.oecd.org/dac/stats/documentupload/DAC-CRS-Code-List.xls
- An XML file “DAC and CRS code lists in XML format based on IATI standard code list CLv1” at http://www.oecd.org/dac/stats/documentupload/DAC_codeLists.xml
And then, at “Purpose Codes: sector classification”:
- 2015 Purpose codes (PDF) - available at: http://www.oecd.org/dac/stats/documentupload/2015%20CRS%20purpose%20codes%20EN_updated%20April%202016.pdf
- 2016 Purpose codes (PDF) - available at: http://www.oecd.org/dac/stats/documentupload/CRS%20purpose%20codes%20-%202016%20flows%20updated%20April%202016.pdf
- Budget identifier voluntary purpose codes (PDF): http://www.oecd.org/dac/stats/documentupload/Budget%20identifier%20purpose%20codes_EN_Apr%202016.pdf
- Budget identifier voluntary purpose codes (xls): http://www.oecd.org/dac/stats/documentupload/Budget%20identifier%20purpose%20codes_Apr%202016.xls
I think these might be the same list, available in different formats (PDF, xls, XML), which could be acceptable. But, the XML version looks to be very out of date (last updated 2015-06-23T16:42:28), whilst the budget identifier voluntary codes do not appear on the 2015 or 2016 purpose codes PDF…
Two things slow me down when trying to answer the questions below: a) all these documents are at different URLs, and b) there is no consistent or accessible changelog information. The questions are:
- which list is definitive?
- when was it last updated?
- what was added in the last update?
It seems vital that where IATI relies on external code lists, people can ask these questions, and get some answers. For lists published via IATI, that seems to be the case. Is it reasonable to ask the same of DAC and CRS code lists?
NB: there’s a chance that I’m looking in the wrong place, or have missed a clarification document (which is a further issue), so please correct me if so.
Super useful thread. Rory Scott , I think it is quite unlikely that the DAC would release the codelists in XML format by default.
I did some work on converting the CRS codelists from the source Excel to the IATI codelists XML format, for some other work on a simple IATI-compatible projects database I built:
GitHub: markbrough/IATI-Codelists-NonEmbedded - IATI codelists that are derived from third party lists.
My main aim was to generate bilingual EN/FR CSV files. You run the scripts in the following order:
It follows the pattern outlined in this pull request from Ben Webb - IATI Secretariat (which I really think should be merged)
The crs_lib.py file provides patterns to interpret the Excel spreadsheets.
NB: it does not handle historical codes (i.e. codes that previously existed, but no longer do). These are just removed at the moment, but should instead be marked as withdrawn.
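For illustration, marking withdrawn codes rather than dropping them could look something like the following. This is only a sketch, not what the scripts in the repository currently do: the function name `merge_with_withdrawn`, the `status` field and the example codes are all made up for the example, and it assumes each version of the list has already been read into a simple code-to-name mapping.

```python
def merge_with_withdrawn(previous, current):
    """Combine two {code: name} mappings, keeping codes that disappeared.

    Codes present in `current` are marked "active"; codes found only in
    `previous` are kept and marked "withdrawn" instead of being dropped.
    """
    merged = {}
    for code, name in current.items():
        merged[code] = {"name": name, "status": "active"}
    for code, name in previous.items():
        if code not in merged:
            merged[code] = {"name": name, "status": "withdrawn"}
    # Sort by code so the output order is stable between runs.
    return dict(sorted(merged.items()))


# Made-up codes and names, for illustration only (not real CRS codes).
previous = {"00001": "Example sector A", "00002": "Example sector B"}
current = {"00001": "Example sector A", "00003": "Example sector C"}
result = merge_with_withdrawn(previous, current)
```

Here `00002` only exists in the earlier list, so it stays in `result` with the status `withdrawn`, while `00001` and `00003` are `active`.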
Hi there
I won’t wade into the technical discussion here, but am wondering whether anything could be done on the governance side to make things easier.
For instance, Mark you say it’s unlikely that the DAC would release codelists in XML. Does anyone know if this was ever brought up with the DAC? Would it be technically difficult for them to release XML codelists? Are there other solutions that could be explored with the DAC? I wonder if this may be something to bring through the WP-STAT.
Hi Yohanna Loucheur ,
Thank you for this suggestion. I actually think the governance side is the only viable avenue for substantial progress on this topic. Some progress was made on releasing XML codelists, as evidenced by the existence of the (now very outdated) files mentioned by Steven Flower above (see the penultimate link here), but it seems that this went cold in 2016.
For me, the hierarchy of preference for solutions to this is as follows (high preference to low):
Or,
Or,
In all of these cases, three things are completely necessary:
I have tried to contact people whom I believe to be relevant, but with little success.
Mark Brough, thank you for directing me to your code. I’ve written some similar scripts in R and Pandas, but the issue is that no matter how sophisticated a script is, there’s no guarantee that the next spreadsheet will have the same structure as the one it was written for. For instance, having just cloned your script and updated the URLs (which have changed), I can see that there’s been an arbitrary change which will stop the script from working at the first hurdle:
That sheet is now called ‘Purpose codes’. Clearly this isn’t an insurmountable problem, but what if there is a more subtle difference which just means that the script runs incorrectly but doesn’t halt? Things start to become more complicated when we put our trust in non-deterministic procedures.
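One partial mitigation is for a script to check the layout it expects before reading anything, so that a structural change halts the run instead of silently producing wrong output. The sketch below is only an illustration of that idea: `check_structure`, `StructureError` and the header names are hypothetical, and it assumes the sheet names and the first row have already been read from the workbook by whatever library is in use.

```python
class StructureError(Exception):
    """Raised when a spreadsheet does not match the expected layout."""


def check_structure(sheet_names, header_row, expected_sheet, expected_headers):
    """Fail loudly if the sheet or its header row differs from expectations."""
    if expected_sheet not in sheet_names:
        raise StructureError(
            f"Sheet {expected_sheet!r} not found; available: {sorted(sheet_names)}"
        )
    # Compare headers case-insensitively and ignore surrounding whitespace.
    normalised = [str(cell).strip().lower() for cell in header_row]
    wanted = [header.lower() for header in expected_headers]
    if normalised[: len(wanted)] != wanted:
        raise StructureError(f"Unexpected header row: {header_row!r}")


# Hypothetical header names; the real spreadsheet columns may differ.
HYPOTHETICAL_HEADERS = ["code", "name", "description"]
sheets = ["Purpose codes"]
header_row = ["Code", "Name", "Description"]

# Passes: the sheet exists and the headers match.
check_structure(sheets, header_row, "Purpose codes", HYPOTHETICAL_HEADERS)

# A script still looking for an earlier (placeholder) sheet name stops here.
try:
    check_structure(sheets, header_row, "Some earlier sheet name", HYPOTHETICAL_HEADERS)
    renamed_sheet_error = None
except StructureError as err:
    renamed_sheet_error = str(err)
```

This only catches the changes someone thought to check for, so it narrows the problem rather than removing it.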
Now consider their XML version:
This would make the lives of myself and other IATI/CRS users significantly easier, but it would also allow a much more responsive and rapid effort in joining up CRS codes with others, helping to make data much more interoperable. Minimally, I could just use a diff-checker to make sure none of the element or attribute names have changed, and we could even introduce a schema to start to standardise.
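As a rough sketch of the kind of check I mean, the following compares the element and attribute names used in two versions of an XML file, using only Python's standard library. The function names and the two tiny XML snippets are invented for the example and are not taken from the DAC file.

```python
import xml.etree.ElementTree as ET


def structure_of(xml_text):
    """Return the set of (element, attribute) names used in an XML document.

    Every element contributes (tag, ""); each attribute adds (tag, attribute).
    """
    names = set()
    for element in ET.fromstring(xml_text).iter():
        names.add((element.tag, ""))
        for attribute in element.attrib:
            names.add((element.tag, attribute))
    return names


def structure_diff(old_xml, new_xml):
    """List element/attribute names that were removed or added between versions."""
    old, new = structure_of(old_xml), structure_of(new_xml)
    return {"removed": sorted(old - new), "added": sorted(new - old)}


# Invented snippets for illustration; not the real DAC/CRS XML.
OLD = (
    '<codelist name="Sector"><codelist-item>'
    "<code>1</code><name>A</name>"
    "</codelist-item></codelist>"
)
NEW = (
    '<codelist name="Sector"><item>'
    "<code>1</code><title>A</title>"
    "</item></codelist>"
)

diff = structure_diff(OLD, NEW)
```

An empty `removed` and `added` would mean the structure is unchanged and only the content needs reviewing. A schema would go further than this, since it could also constrain ordering and allowed values.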
I recognise that there may be serious counter-arguments to the points I’m making but I would be very interested to start a dialogue about them and I’d be interested to hear what could potentially be achieved from the governance angle.