Hi everyone! Detail question for you. I have noticed that the participants in our publishing course sometimes put spaces in the IDs of their activities. Is this ever a problem for data users out there?
Hi Thea. Do you have an example activity?
Yes! It can be a problem. We had this issue with Datastore Classic. A lot of the AidData data (which has not been updated for many years) has all kinds of problems with it. One of those problems is that there's a lot of whitespace in lots of elements, for example:
The identifier should be "US-501c3-522318905-SG-112009103505", but because of the whitespace around it, the actual link to this on Datastore Classic is as follows:
https://datastore.codeforiati.org/api/1/access/activity.xml?iati-identi…
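To illustrate why the link ends up looking different: any whitespace in an identifier has to be percent-encoded before it can go into a URL. This is only a rough sketch. The padding used here (a newline plus some spaces) is an assumption, as the actual whitespace in the AidData file may differ, and `encode_for_query` is a hypothetical helper name.

```python
from urllib.parse import quote

clean_id = "US-501c3-522318905-SG-112009103505"

# Assumption: hypothetical padding, as might be left over from indented XML.
# The real file may contain different whitespace characters.
padded_id = "\n    " + clean_id + "\n  "


def encode_for_query(identifier: str) -> str:
    """Percent-encode an identifier so it can be used as a query-string value."""
    # safe="" means that "/" is encoded as well, not only whitespace.
    return quote(identifier, safe="")


# The clean identifier passes through unchanged.
print(encode_for_query(clean_id))
# The padded one picks up %0A (newline) and %20 (space) sequences.
print(encode_for_query(padded_id))
```

Anyone who only knows the clean identifier will build the first URL, which will not match an activity stored with the padded value.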
Hi Thea. I believe that this would more likely be an issue for tools processing/making the data available. I know that there were some discussions around this with the Datastore (previous version) and d-portal. Perhaps Amy Silcock could chime in here to elaborate.
Hi Thea, I agree with Mark's comments that it causes issues for IATI tools and external ones: d-portal, IATI Datastore, iati.cloud. You often can't use the standard notation to access a specific activity via an API/URL. There are workarounds, but ideally it needs to be fixed by the publisher.
We've added a safeguard into the Registry which automatically strips whitespace from IATI Org IDs, but we don't have a way to do this for activity IDs.
Justin Senn, here's one: the publisher GB-CHC-328206 has a few spaces in their IDs, like "GB-CHC-328206-GB CHC 328206/6" and "GB-CHC-328206-UK Aid Match 2015-2018".
I do agree that it shouldn't happen. Ideally the publishing tool strips those spaces or gives an error. Fixing it at a later stage is always a workaround, in my opinion. We will mention it in the next round of webinars.
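As a rough idea of what such a check in a publishing tool could look like, here is a minimal sketch. It is not how any existing tool works, and `check_activity_id` is a hypothetical function. It strips whitespace around the identifier and reports whitespace inside it as an error, since that cannot be fixed automatically without guessing what the publisher intended.

```python
import re
from typing import List, Tuple

_WHITESPACE = re.compile(r"\s")


def check_activity_id(raw: str) -> Tuple[str, List[str]]:
    """Return the identifier with outer whitespace stripped, plus any problems found."""
    problems = []
    stripped = raw.strip()
    if stripped != raw:
        # Safe to repair automatically: the padding carries no meaning.
        problems.append("leading or trailing whitespace")
    if _WHITESPACE.search(stripped):
        # Not safe to repair automatically: report it to the publisher instead.
        problems.append("whitespace inside identifier")
    return stripped, problems


for candidate in [
    "  US-501c3-522318905-SG-112009103505\n",
    "GB-CHC-328206-UK Aid Match 2015-2018",
]:
    cleaned, issues = check_activity_id(candidate)
    print(repr(cleaned), issues)
```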
Sarah McDuff, I know of people who simply use the XML files and skip the tools altogether. Much easier too, in my experience, in some situations (small-scale analysis mostly).
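For those working straight from the XML, a quick scan for affected identifiers might look something like the sketch below. The sample document is made up for illustration (`XM-EXAMPLE-1` is a hypothetical identifier, and the padding around the first one is an assumption), and `find_ids_with_whitespace` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; only the element names follow the IATI activity schema.
sample = """<iati-activities>
  <iati-activity>
    <iati-identifier>
      US-501c3-522318905-SG-112009103505
    </iati-identifier>
  </iati-activity>
  <iati-activity>
    <iati-identifier>GB-CHC-328206-UK Aid Match 2015-2018</iati-identifier>
  </iati-activity>
  <iati-activity>
    <iati-identifier>XM-EXAMPLE-1</iati-identifier>
  </iati-activity>
</iati-activities>"""


def find_ids_with_whitespace(xml_text: str) -> list:
    """Return the raw text of every iati-identifier that contains any whitespace."""
    root = ET.fromstring(xml_text)
    flagged = []
    for element in root.iter("iati-identifier"):
        text = element.text or ""
        if any(ch.isspace() for ch in text):
            flagged.append(text)
    return flagged


for raw in find_ids_with_whitespace(sample):
    # repr() makes the otherwise invisible whitespace visible.
    print(repr(raw))
```

To run it on a real file, `ET.parse(path).getroot()` could replace `ET.fromstring(xml_text)`.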
Hi Thea Schepers just flagging a long thread from six years ago on this: https://iaticonnect.org/group/standard-management-consultations-0/discu… !
Whitespace causes all kinds of problems when using data. IMO it should be avoided for all NEW activities. For existing activities, though, changing the identifier is a problem: it will break traceability among publishers.
Steven Flower, oh my, I see some things come back again and again... That thread got picked up 2 years ago as well!
Herman van Loon, if you catch them early enough, and/or they don't have implementing partners who publish, it is probably relatively harmless. But we will ask our participants to avoid spaces, dashes, etc. in their IDs.
Just one thing I would add on this (re discouraging the use of other characters): I think in that thread we also talked about the importance of publishers using the project code that's actually in their systems. For example, for the European Commission's International Partnerships, the codes are often of the form 2022/123-456. I think it's important that they continue to use this code, as it also means that others can easily refer to the same code, to support traceability, as Herman says.
Agreed. Although I wonder whether those slashes ever lead to issues in their own systems, exports, interfaces and such, but that's a different story altogether.
Another example of an org ID with special characters is PACJA: KE-NCB-OP.218/051/2009/0496/G065
Guess they also participated in your trainings. We got a warning back from MFA, so I removed the '.' and '/' in our admin. No traceability as a result.
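To make the traceability point concrete, here is a small sketch of what happens once the '.' and '/' are removed on one side only. `strip_punctuation` is a hypothetical helper, and the lookup is a simplified stand-in for matching references between publishers.

```python
def strip_punctuation(identifier: str, chars: str = "./") -> str:
    """Remove the given characters from an identifier."""
    for ch in chars:
        identifier = identifier.replace(ch, "")
    return identifier


published = "KE-NCB-OP.218/051/2009/0496/G065"
recorded = strip_punctuation(published)
print(recorded)  # KE-NCB-OP21805120090496G065

# An exact-match lookup against what the partner actually published now fails.
published_ids = {published}
print(recorded in published_ids)  # False
```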
I've informed PACJA so they can improve their Org ID on the Registry.
PACJA is fairly new so that is still a good solution in my opinion. I don't see that activity in their data now so they may have found a very drastic solution... I'll contact them.