Generic PDF File Names Considered Harmful

As we continue to expand our bibliographic datastore, we have noticed an appalling lack of thought going into the names of PDF files offered for download over the internet.

While an individual author may be working on only one book or paper at a time, countless such projects are under way around the world, and a great many of them end up with the same generic file name.

So please, for the love of your fellow academics who collect digital artifacts, if you plan to post your work on the internet, do not name your book book.pdf, your dissertation dissertation.pdf, your thesis thesis.pdf, your paper paper.pdf, or your program’s manual manual.pdf.

Choose a file name that incorporates several semantic elements like your last name, the date, the file’s version number, topic, key title excerpt, or (for a more opaque solution) a cryptographic hash of the file contents.
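As a rough illustration, here is a minimal Python sketch of one way to assemble such a name. The function names (slugify, descriptive_pdf_name), the order of the elements, the underscore separator, and the choice of an 8-character SHA-256 prefix are all our own assumptions for the sake of the example, not a standard.

```python
import hashlib
import re


def slugify(text, max_words=5):
    """Lowercase the text and join its first few alphanumeric words with hyphens."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words[:max_words])


def descriptive_pdf_name(last_name, date, title, version, contents=None):
    """Build a file name from last name, date, title excerpt, and version.

    If the file contents are supplied as bytes, a short SHA-256 prefix is
    appended as an opaque but unique-ish final element (an assumed convention).
    """
    parts = [slugify(last_name), date, slugify(title), f"v{version}"]
    if contents is not None:
        parts.append(hashlib.sha256(contents).hexdigest()[:8])
    return "_".join(parts) + ".pdf"


print(descriptive_pdf_name(
    "Doe", "2018-03", "Generic PDF File Names Considered Harmful", "1.2"))
# prints: doe_2018-03_generic-pdf-file-names-considered_v1.2.pdf
```

Any comparable scheme would do; the point is only that the name carries enough information to be told apart from every other paper.pdf.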

Modern operating systems have no difficulty with longer file names, and a sensible name will be deeply appreciated by your readers.

Likewise, if you have written a lot of papers, names like paper-17.pdf are just as problematic, since web browsers and operating systems often resolve conflicting file names automatically with just such a numbering scheme or, even worse, with a scheme that uses the word “copy” to mean “copy of the file name” rather than “copy of the file contents”.

Thus, your reader won’t know if something like paper-17.pdf in their download folder is your 17th paper, or a paper that was written in 2017, or their 17th copy of paper.pdf, or a renamed copy of the 17th unique file originally named paper.pdf, or their 15th copy of a file originally named paper.pdf that had previously been automatically renamed to paper-2.pdf because someone else’s paper.pdf had already been downloaded to the same location. (This is probably rather confusing, because, well, frankly it is! Which is our point.)

Even worse, browser-level renaming can be combined with OS-level renaming to produce horrors like paper-3 copy 2.pdf in the same directory as paper.3 copy.pdf, with two dimensions of ambiguity. Likewise, book.1.pdf and book-1.pdf might be identical files downloaded in different browsers to a directory that already contained a different book.pdf!
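To make the ambiguity concrete, here is a small Python sketch that peels automatic-looking suffixes off a file name and lists every name the file might originally have had. The suffix patterns are assumptions taken only from the examples in this article (a trailing "-3" or ".3", and a trailing " copy" or " copy 2"); real browsers and operating systems use other styles too, and the function name original_name_candidates is hypothetical.

```python
import os
import re

# Assumed suffix styles, based only on the examples in this article:
# numeric ("-3", ".3") and word based (" copy", " copy 2").
RENAME_SUFFIX = re.compile(r"(?: copy(?: \d+)?|[-.]\d+)$")


def original_name_candidates(file_name):
    """Return the name itself plus every name left after stripping one
    more automatic-looking suffix from the end of the stem."""
    stem, ext = os.path.splitext(file_name)
    candidates = [file_name]
    while True:
        match = RENAME_SUFFIX.search(stem)
        # Stop when nothing looks like a suffix, or when stripping it
        # would leave an empty stem.
        if match is None or match.start() == 0:
            break
        stem = stem[:match.start()]
        candidates.append(stem + ext)
    return candidates


print(original_name_candidates("paper-3 copy 2.pdf"))
# prints: ['paper-3 copy 2.pdf', 'paper-3.pdf', 'paper.pdf']
print(original_name_candidates("manual.2.1.pdf"))
# prints: ['manual.2.1.pdf', 'manual.2.pdf', 'manual.pdf']
```

Every entry in each list is a plausible original name, and nothing in the file name itself says which one is right.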

Similarly, programmers are often guilty of naming their manuals manual.pdf and then using a numerical extension to designate a version, leading to ambiguous names like manual.2.pdf, which might be a version 2.0 manual or a second copy of a version 1.0 manual generated by a web browser after a redundant download. Is manual.2.1.pdf a copy of a version 2 manual or an original version 2.1 manual?

Why should we be forced to open a file to read its internal title when an unambiguous program_name-manual(version_number).pdf naming convention would eliminate any doubt?
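For what it is worth, that convention is trivial to generate and to read back mechanically. The following Python sketch is one possible take on it; the helper names, the program name frobnicate, and the restriction of versions to dotted numbers such as 2.1 are our own assumptions.

```python
import re

# Assumption: versions are dotted numbers such as "2" or "2.1".
VERSION = re.compile(r"\d+(?:\.\d+)*")
MANUAL_NAME = re.compile(
    r"(?P<program>.+)-manual\((?P<version>\d+(?:\.\d+)*)\)\.pdf")


def manual_file_name(program_name, version):
    """Format a name following program_name-manual(version_number).pdf."""
    if not VERSION.fullmatch(version):
        raise ValueError(f"unexpected version string: {version!r}")
    return f"{program_name}-manual({version}).pdf"


def parse_manual_file_name(file_name):
    """Return (program_name, version), or None if the name does not
    follow the convention."""
    match = MANUAL_NAME.fullmatch(file_name)
    if match is None:
        return None
    return match.group("program"), match.group("version")


print(manual_file_name("frobnicate", "2.1"))
# prints: frobnicate-manual(2.1).pdf
print(parse_manual_file_name("manual.2.1.pdf"))
# prints: None
```

Because the version sits inside the parentheses, a numeric suffix added later by a browser cannot be mistaken for part of it.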

Furthermore, when devising a naming scheme, note that lots of books and papers are written in any given year, for any given conference, or on any given high-level topic, so names like 2016-book.pdf, ai-book.pdf, and chi-2018-paper.pdf are almost guaranteed to come into conflict with other downloads.

When a generic file name invites renaming to something like paper-3.pdf, the result is far more serious than a mere annoyance for the reader trying to remember what the paper is about.

Generic file names create a clear and present danger that your book or paper will look like a copy of something else, and so be accidentally deleted and lost forever!