Candy From Strangers

In september 2016 I gave a talk at ESC about some reasons why I think that using software and especially libraries from the packages of a community managed distribution is important and much better than alternatives such as pypi, nmp etc. This article is somewhat close to a translated transcript of that talk.

When I was young, my parents taught me not to accept candy from strangers, unless they were present and approved of it, because there was a small risk of very bad things happening. It was of course a simplistic rule, but it had to be easy enough to follow for somebody who wasn't proficient (yet) in the subtleties of social interactions.

One of the reasons why it worked well was that following it wasn't a big burden: at home candy was plenty and actual offers were rare: I only remember missing one piece of candy because of it, and while it may have been a great one, the ones I could have at home were also good.

Contrary to candy, offers of gratis software from random strangers are quite common: from suspicious looking websites to legit and professional looking ones, to platforms that are explicitly designed to allow developers to publish their own software with little or no checks.

Just like candy, there is also a source of trusted software in the Linux distributions, especially those lead by a community: I mention mostly Debian because it's the one I know best, but the same principles apply to Fedora and, to some measure, to most of the other distributions. Like good parents, distributions can be wrong, and they do leave room for older children (and proficient users) to make their own choices, but still provide a safe default.

Among the unsafe sources there are many different cases and while they do share some of the risks, they have different targets with different issues; for brevity the scope of this article is limited to the ones that mostly concern software developers: language specific package managers and software distribution platforms like PyPi, npm and rubygems etc.

These platforms are extremely convenient both for the writers of libraries, who are enabled to publish their work with minor hassles, and for the people who use such libraries, because they provide an easy way to install and use an huge amount of code. They are of course also an excellent place for distributions to find new libraries to package and distribute, and this I agree is a good thing.

What I however believe is that getting code from such sources and using it without carefully checking it is even more risky than accepting candy from a random stranger on the street in an unfamiliar neighbourhood.

The risk aren't trivial: while you probably won't be taken as an hostage for ransom, your data could be, or your devices and the ones who run your programs could be used in some criminal act causing at least some monetary damage both to yourself and to society at large.

If you're writing code that should be maintained in time there are also other risks even when no malice is involved, because each package on these platform has a different policy with regards to updates, their backwards compatibility and what can be expected in case an old version is found to have security issues.

The very fact that everybody can publish anything on such platforms is both their biggest strength and their main source of vulnerability: while most of the people who publish their libraries do so with good intentions, attacks have been described and publicly tested, such as the fun typo-squatting one (archived URL) that published harmless malicious code under common typos for famous libraries.

Contrast this with Debian, where everybody can contribute, but before they are allowed full unsupervised access to the archive they have to establish a relationship with the rest of the community, which includes meeting other developers in real life, at the very least to get their gpg keys signed.

This doesn't prevent malicious people from introducing software, but raises significantly the effort required to do so, and once caught people can usually be much more effectively prevented from repeating it than a simple ban on an online-only account can do.

It is true that not every Debian maintainer actually does a full code review of everything that they allow in the archive, and in some cases it would be unreasonable to expect it, but in most cases they are at least reasonably familiar with the code to do at least bug triage, and most importantly they are in an excellent position to establish a relationship of mutual trust with the upstream authors.

Additionally, package maintainers don't work in isolation: a growing number of packages are being maintained by a team of people, and most importantly there are aspects that involve potentially the whole community, from the fact that new packages that enter the distribution are publicity announced on a mailing list to the various distribution-wide QA efforts.

Going back to the language specific distribution platforms, sometimes even the people who manage the platform themselves can't be fully trusted to do the right thing: I believe everybody in the field remembers the npm fiasco where a lawyer letter requesting the removal of a package started a series of events that resulted in potentially breaking a huge amount of automated build systems.

Here some of the problems were caused by some technical policies that caused the whole ecosystem to be especially vulnerable, but one big issue was the fact that the managers of the npm platform are a private entity with no oversight from the user community.

Here not all distributions are equal, but contrast this with Debian, where the distribution is managed by a community that is based on a social contract and is governed via democratic procedures established in its constitution.

Additionally, the long history of the distribution model means that many issues have already been met, the errors have already been done, and there are established technical procedures to deal with them in a better way.

So, shouldn't we use language specific distribution platforms at all? No! As developers we aren't children, we are adults who have the skills to distinguish between safe and unsafe libraries just as well as the average distribution maintainer can do. What I believe we should do is stop treating them as a safe source that can be used blindly and reserve that status to actual trustful sources like Debian, falling back to the language specific platforms only when strictly needed, and in that case:

actually check carefully what we are using, both by reading the code and by analysing the development and community practices of the authors;
if possible, share that work by becoming ourselves maintainers of that library in our favourite distribution, to prevent duplication of effort and to give back to the community whose work we get advantage from.

Send a comment: unless requested otherwise I may add it, or some extract, to this page.

Return to Top