If you like it, you should put a DOI on it

Some time ago, during a workshop in Nairobi looking at the discoverability of scholarly communication in Africa, I made this silly meme:

I was certainly not the only one to have made that tenuous connection between Beyoncé and scholarly communication.

Just to support the thesis that all good points can be made with judicious application of musical references [1], this happened a few days ago:

Now, a “Citation” and a “DOI” are not the same thing, but they depend on each other:

you can’t cite something if you can’t uniquely identify it.

So, if you do something and you think it will be useful in the future, it’s good scientific practice to give it a persistent identifier, like a DOI [2]. This is so pervasive in scholarly literature that one hardly considers it worth mentioning. However, when that literature is redefined to include software and tools, it’s an entirely different story.
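To make the idea of a persistent identifier a little more concrete, here is a minimal Python sketch of how a DOI string maps to a resolvable URL. The helper name `doi_to_url` and the example DOI `10.1234/example.5678` are made up for illustration. The regular expression is only a loose shape check (a prefix starting with `10.`, a slash, a non-empty suffix), not a full validator, and the sketch assumes the public `https://doi.org/` resolver.

```python
import re
from urllib.parse import quote

# Loose shape check: "10.<digits>[.<digits>]/<suffix>".
# This is an assumption for illustration, not the full syntax the DOI standard allows.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}(\.\d+)*/\S+$")


def doi_to_url(doi: str) -> str:
    """Return the https://doi.org/ URL for a DOI string (hypothetical helper)."""
    doi = doi.strip()
    # Accept the common "doi:" prefixed form as well.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"does not look like a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keeping "/" intact.
    return "https://doi.org/" + quote(doi, safe="/")
```

With this sketch, `doi_to_url("doi:10.1234/example.5678")` returns `https://doi.org/10.1234/example.5678`. The point is that the identifier stays the same even if the object it points to moves; only the resolver needs to know the current location.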

What to cite

Software has become fundamental to the process of scientific discovery in many fields, and software packages are often cited in the literature that has used them. However, the citation usually refers to the write-up of the package; only in rare cases does the actual source code get cited, and almost never is a persistent identifier for the source code given as a citation. The Software Sustainability Institute has a very good green paper [3] entitled “How to Cite and Describe Software”, with several suggestions and tips on “how to cite the software which may have been used in your paper”.
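As a rough illustration of what citing the actual source code could look like, here is a small Python sketch of a citation record that carries the name, the exact version, and a locator for the code itself. The class `SoftwareCitation`, its field names, and the output layout are assumptions made for this example, not a format prescribed by the green paper. The example data is the Atom release listed in the footnotes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SoftwareCitation:
    """Hypothetical record for citing a specific release of a piece of software."""

    name: str
    version: str
    url: str
    doi: Optional[str] = None  # persistent identifier, if the release has one

    def format(self) -> str:
        # Prefer the persistent identifier; fall back to the plain URL.
        locator = f"https://doi.org/{self.doi}" if self.doi else self.url
        return f"{self.name}, version {self.version}. {locator}"


atom = SoftwareCitation(
    name="Atom",
    version="1.6.0-beta3",
    url="https://github.com/atom/atom/releases/tag/v1.6.0-beta3",
)
print(atom.format())
```

Because this release has no DOI in the record, the sketch falls back to the release URL, which is exactly the situation described above: the code can be pointed at, but not through a persistent identifier.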

The first of these tips is:

Describe any software that played a critical part in, or contributed something unique to, your research.

Now, it may be up to the author, or up to the license of the software the author used, to decide what software falls into that category. Some software licenses may explicitly mandate citation of the software if it was used at all in the generation of new knowledge. Other software packages, such as simulation packages, make the choice obvious: without them, the paper could not have been written, so clearly they should be cited.

At the other end of the scale, we could follow this argument ad absurdum and state that every piece of software used in the process should be cited. This is clearly unreasonable: are we going to cite our text editor [4][5]?

Are we going to cite the butterfly's brain?

Footnotes and References

  1. Actually, I’m so not hip to the pop scene that I erroneously attributed the desire of ring-on-finger to Rihanna. My bad.

  2. The DOI (Digital Object Identifier) is one of a few systems for uniquely resolving digital objects. It is based on the ISO 26324:2012 standard.

  3. “How to Cite and Describe Software”, Software Sustainability Institute (http://www.software.ac.uk/how-cite-and-describe-software?mpw)

  4. “Atom.io - the hackable text editor” https://github.com/atom/atom/releases/tag/v1.6.0-beta3 

  5. It turns out that the release of Atom that I was using to write this article was published by a bot. I don’t think it’s too far-fetched to have automated software generators being cited more often in the near future, or to cite datasets or discoveries which have been created by artificial intelligence.