Privacy expectations in private repositories hosted on public forges

Chapril moderation of private repositories

In 2021, I started volunteering as a co-administrator of the publicly available Chapril Gitea instance, run by April, a non-profit dedicated to Free Software. The Terms of Service required that each software project be published under a Free Software license. That made perfect sense to me: it would be absurd for a non-profit dedicated to the advancement of Free Software to help publish proprietary software.

Moderators were expected to review the repositories from time to time and remind users of this requirement. Most of the time, a reminder was needed simply because the project was new and its license had not been specified yet. I was very familiar with this process: it was essentially the same back in 2001, when I was a co-administrator of Savannah. However, I was surprised to learn that even private repositories had to be scrutinized in the same way. I had a vague feeling that something was off, but did not think too much about it and simply suggested that the Terms of Service mention this explicitly, in case someone expected admins not to look at private repositories and stored credentials or other confidential information there. If I was surprised by that kind of moderation, other users presumably would be too.

Codeberg moderation of private repositories

In early 2022, the Terms of Use of Codeberg, another public forge based on Gitea and run by a non-profit with a strong commitment to Free Software, were updated. Since I'm a member of the non-profit, I followed the discussions about the moderation of private repositories. The result was this part of the Terms of Use:

Private repositories are only allowed for things required for FLOSS projects, like storing secrets, team-internal discussions or hiding projects from the public until they’re ready for usage and/or contribution.

This is essentially a variation of the policy applied at Chapril, and it also requires moderators to access private repositories in order to enforce it.

Privacy expectations in online services

I'm a daily user of online services such as a Discourse forum, a Nextcloud instance and many others that give me control over the visibility of the data I store in them. If the data is available to the general public, everybody can see it. If it is available to a selected group of users, I trust them not to share it publicly without my consent. I also expect system administrators to be very careful not to look at data that is not explicitly addressed to them, even if their extended permissions technically allow it.

During my years helping journalists and human rights defenders, I was a system administrator with root privileges. I had technical access to confidential material covered by attorney-client privilege, and I became very conscious of the difference between having the means to look at something and having permission to do so. It was particularly challenging when I had to assist lawyers or whistleblowers with problems that required me to go over phone conversations without reading them. It was also an opportunity to interact closely with users of online services and to realize that they implicitly trust the system administrator to respect their privacy… without reading the terms of service.

In a nutshell, moderating private repositories is an exception that goes against what system administrators do for every other online service. It also goes against users' expectations of privacy.

Private repositories are useful

One way to avoid moderating private repositories would be to forbid them altogether. However, they are commonly used to hold credentials for third-party services in order to run continuous integration, or to deploy infrastructure as code in the cloud, going directly from code to a production-ready online service. Forges are increasingly used to manage the entire development, distribution and deployment lifecycle, and the inability to create private repositories would be problematic.

Since the only alternative would be to host private repositories on a forge that allows them, software authors would have no incentive to use a forge that forbids them.

Free Software requirements cannot be enforced in private repositories

Since getting rid of private repositories is not an option, as it would cripple the forge for legitimate Free Software projects, it is tempting to impose requirements on private repositories and moderate them. The intention is to ensure that private repositories are not used behind the scenes to further software projects that are not Free Software.

Codeberg chose to restrict private repositories containing software to the following case:

Private repositories are only allowed for […] hiding projects from the public until they’re ready for usage and/or contribution.

This is problematic because it can always be argued that a piece of software is not yet Free Software but could become Free Software in the future. It amounts to moderating based on the author's intent instead of their actions. In the best-case scenario, the outcome is to ban a project that is obviously abusing the forge and has no intention of publishing Free Software any time soon. But the worst-case scenario can also happen: a project is gradually improving over a period of years in a large company, making very slow but steady progress towards software freedom, and the moderator, lacking the time to figure that out, bans it.

Chapril chose a slightly different approach and requires that private repositories display a Free Software license:

Les administratrices et administrateurs d’un dépôt sur forge.chapril.org s’engagent à ce que son contenu soit sous licence libre[…] Tout dépôt, public ou privé, peut à tout moment être analysés par les animateurs du service afin de vérifier leur conformité.
The owner of a repository on forge.chapril.org agrees that its content is under a Free Software license[…] Every repository, public or private, can be audited at any time by the administrators of the service in order to verify its compliance.

The moderation process requires going over private repositories and asking their authors to add a Free Software license. This approach is also problematic because a license only takes effect when the software is distributed, which, by definition, is not the case for software that resides in a private repository and is not accessible to the public. In other words, adding a license to a private repository has no effect.

Accessing confidential information is a liability

If a moderator accesses private repositories on a software forge, they are bound to run into confidential information. It can be argued that such data should be encrypted and that online services cannot be expected to respect your privacy. But there are other cases where such an intrusion is considered a violation of privacy, such as when Apple listened to private conversations for the purpose of improving the quality of its voice recognition software. The distinction between private information stored in a software forge and private information conveyed in a voice conversation is unclear from a legal standpoint.

There is at least one scenario where accessing private data for the purpose of moderation can lead to a liability for the organization running the forge:

  • A private repository is created with unencrypted credentials to a cloud provider
  • The moderator goes over the private repository
  • An intruder obtains a copy of the credentials and spends X€ in resources on the cloud provider
  • The owner of the private repository holds the organization running the forge responsible for this unexpected expense, claiming that the moderator's access is to blame

Imposing quotas on private repositories

Therefore, the only viable way to restrict private repositories is to impose quantitative quotas instead of engaging in qualitative moderation. Such quotas could measure usage of:

  • Bandwidth
  • Disk
  • CPU, approximated by the frequency of API requests

If a given owner exceeds the quotas for private repositories, they are kindly invited to reduce their usage. The organization running the forge could also set different quotas for private and public repositories, for instance to favor projects that consume more resources on publicly available Free Software repositories and fewer on private repositories.
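The quota approach can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Chapril, Codeberg or Gitea actually work: the `Quota` and `Usage` classes, the `exceeded` and `reminder` functions and every limit value are hypothetical, and the sketch assumes the forge already collects per-owner counters for bandwidth, disk space and API requests.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical limits: the numbers are placeholders, not values used by
# Chapril, Codeberg or Gitea.
@dataclass(frozen=True)
class Quota:
    bandwidth_gb: float          # data transferred per month
    disk_gb: float               # storage used by the owner's repositories
    api_requests_per_hour: int   # rough proxy for CPU usage


# Aggregate counters only: nothing here requires reading repository content.
@dataclass
class Usage:
    bandwidth_gb: float = 0.0
    disk_gb: float = 0.0
    api_requests_per_hour: int = 0


# Private repositories get tighter limits than public ones, to favor
# resources spent on publicly available Free Software.
QUOTAS = {
    "private": Quota(bandwidth_gb=5, disk_gb=1, api_requests_per_hour=500),
    "public": Quota(bandwidth_gb=100, disk_gb=20, api_requests_per_hour=5000),
}

METRICS = ("bandwidth_gb", "disk_gb", "api_requests_per_hour")


def exceeded(usage: Usage, visibility: str) -> list[str]:
    """Return the names of the metrics for which usage exceeds the quota."""
    quota = QUOTAS[visibility]
    return [m for m in METRICS if getattr(usage, m) > getattr(quota, m)]


def reminder(owner: str, usage: Usage) -> str | None:
    """Build a polite message if the owner's private usage is over quota."""
    over = exceeded(usage, "private")
    if not over:
        return None
    return (f"Hello {owner}, your private repositories exceed the quota for: "
            f"{', '.join(over)}. Could you please reduce your usage?")


# Usage example: only disk space (3 GB > 1 GB) is over the private quota.
print(reminder("alice", Usage(bandwidth_gb=2, disk_gb=3, api_requests_per_hour=100)))
# prints "Hello alice, your private repositories exceed the quota for: disk_gb. Could you please reduce your usage?"
```

The point of this design is that the check only compares aggregate numbers against limits, so enforcing it never requires a moderator to open a private repository.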

Matching privacy expectations with Free Software commitment

Experiments in moderating private content to achieve the same goal proved to be:

  • ineffective
  • contrary to users' expectations of privacy
  • a liability for the organization running the forge

By choosing to limit private repositories based on quantitative measures, a software forge can effectively be committed to Free Software while being respectful of the privacy of its users.


Great write-up!

For the moderation of private repositories, I believe moderators should also consider whether the repositories in question have any users. If a repository is private and has users, those users should be informed of the forge's policy regarding Free Software exclusiveness.

But if the project doesn't have any users and is only used by its owner, it is technically Free Software, since all four rights are available to its owner.


The legal implications are entirely different if:

  • all users are employed by the same company
  • some users are under contract but not employed

to mention just two. The number of users is not a quantitative measure from which anything can be deduced about the legal status of a private repository, or about the relevance of displaying a reminder regarding Free Software.
