A challenge for horizontal organizations: domain names

Hello,

I recently came across two examples of the worst that can happen to a Free Software project because of centralized resources.

July 2021

  • I (Syuilo) do not accept any issues or pull requests to Misskey (Git Repository: https://github.com/misskey-dev/misskey) for an indefinite period of time.
  • Due to the reason above, I removed all collaborators from Misskey, because they can still create issues and pull requests.

On 30 March 2021, Leah Rowe appointed herself once again as leader of the Libreboot project, removing then lead developers Sebastian ‘Swift Geek’ Grzywna and Andrew Robbins in what the latter described as a coup.

Centralized organizations try to prevent this unfortunate turn of events by creating a foundation or organizing democratic elections to delegate power to a group of people. It works to some extent because a takeover then requires a group of individuals and cannot be conveniently done overnight by a single person. But it eventually leads to the same problem: the project is controlled by an elite (see “the nature of elitism” explained in the tyranny of structurelessness). I won’t go into the details of my own reasoning but I’d be interested to have that discussion if someone is motivated.

Fedeproxy (and Enough) chose a different approach by establishing a structure for a horizontal community of individuals. The bottom line is that a horizontal community makes it possible for every member to take over or destroy all centralized resources. Should that happen, other community members would be forced to give up using the centralized resource that is no longer shared according to the manifesto. This is the primary reason why the fedeproxy information system is infrastructure as code: if someone were to lock everyone else out of the current instance the way the Misskey lead did, other members could conveniently rebuild it in a matter of hours using the documented procedures.

But there is one thing that would still make it very impractical: the domain name. It is a centralized resource that cannot be conveniently rebuilt. People have bookmarked it and links to the domain are scattered all over the internet. If someone seizes control of it, it will be very difficult for other community members to compensate for that loss. People and machines would still go to the original domain name, not the newly created one. They would have to rebuild an audience, starting from zero. I recently had a discussion with @misc on that topic but it only confirmed the problem is difficult to solve.

It would be great if there was a simple way to solve that. Or better yet: an example to follow.

What do you think?

Something similar happened to ForgeFed when they were still on Github and the owner of the repo suddenly went AWOL. To this day that repo’s README has not been updated with a pointer to the new location.

I haven’t checked this, but a simple solution - though with a certain extent of centralization to it - may be to have e.g. a Codeberg organization with a team of people holding the highest privileges, and then not use a custom domain name but the Codeberg Pages supplied domain, which could be e.g. fedeproxy.codeberg.eu

You can’t really solve the problem in all cases. Fundamentally, this is a problem of distributed consensus among N nodes (with N being either “the whole world”, or “the members of the project”).

And while there are solutions for consensus when a majority agrees (eg, a way to decide), there are none when there is no majority (it seems you need more than 2/3 of the nodes to agree on anything, if I read the theory behind the Byzantine Generals problem correctly).

So no matter what, there is always a risk of not reaching a consensus (at least in a free software project).

So in the end, what is needed is to make sure that no one can make a unilateral decision, which requires a bigger community and a lot of bureaucratic process. That’s a rather heavy trade-off, especially since UNIX (and so most servers, which are based on UNIX-like systems) has a default model of “root can do everything”.

Or indeed, if the goal is just to raise the bus factor, Google does regular tests by simulating failures (search for DiRT on the web). So for example, pretend @dachary goes rogue (or more likely goes missing, because I think you can’t really prevent an admin from going rogue except by having a bigger infrastructure and separate teams), see how the project survives, and then debrief and say “ok, we found that access to X for Y is missing”.

True and this is fine. A fork (even a friendly one) is the embodiment of a lack of consensus. It can be a short-lived fork, because the author intends to reach a consensus by proposing a merge request and the fork is motivated by the need to fix a trivial bug. It can last for months if the fork is about something more fundamental in the structure of the code. And while I’m just referring to the codebase of the project, I really mean the software project as a whole, including discussions, releases, issues etc.

You summarized my goal quite well :slight_smile: And I don’t see how one can create a horizontal community without working to solve this problem.

Another interesting use case around this same topic is voidlinux:

The issue of domains is one that was somewhat central to what voidlinux went through.

Why behind closed doors? Well we truly believed this was going to be a temporary problem, and we would be able to continue with business as usual when Xtraeme returned, and we believed he would return.

I find this to be a particularly illuminating example of how slippery it can be to give up on transparency, even for the most honorable reasons. I seem to remember reading somewhere (a manifesto of a Free Software project maybe?) something along the lines of “we won’t hide problems”.

There are boundaries to transparency, and revealing personal details about the person who was unavailable would have been wrong. But acknowledging the problem, explaining the impact and the attempts to resolve it would have been the right thing to do.

Source Control, or the headache of GitHub
Easily the worst part of gaining control of the project was…

This is a shining example of what hosting a software development project is about, namely the infrastructure (i.e. the forge but also the external tooling, Travis in the case of voidlinux). If there is no infrastructure as code (which is impossible with GitHub because it would require GitHub to be self-hostable Free Software), it cannot conveniently be moved from one place to another.

Fascinating read on this fine morning :+1:

Here is an unfinished idea, food for thought. Since there is no way to not centralize a domain name, let’s say a software project has no domain name and its home page is a wikipedia page, a section in a wikipedia page, or a wikidata entry. Although it is very difficult to create a wikipedia page for a project, adding a section to an existing page is comparatively easy. It could be a section in the forge wikipedia page that explains the concept of federated forges, with a table listing projects that are independent of any forge and whose sole purpose is to act as a federation proxy between them. One entry would be about fedeproxy.eu, another about a fork of fedeproxy.eu under the domain example.eu etc.

This wikipedia page is how people know about the project and it is governed by the wikipedia rules, independent of any individual involved in the project. And this wikipedia page will list all the domain names where the project can be found instead of a single domain name. This list will presumably be a table that also explains the differences between each domain name.

To ensure this list is not merely a list of forks where one domain name becomes the de-facto standard, each website would never link to itself but to the wikipedia page or the wikidata resource instead. For instance when publishing a blog article, the link to the project would point to the wikipedia page and not to the domain name. The same could be done for releases with wikidata.

The wikipedia page / wikidata entry would be about the project and would explain that deliberate strategy to work around domain name centralization. When a domain name is created that deviates from this strategy, it would no longer be listed in the wikipedia page because it would no longer match the description.

It is more involved than it should be. But maybe the core idea of using wikipedia/wikidata as an indirection to multiple domain names is something interesting.
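To make the indirection concrete, here is a minimal sketch of how a client (or a blog post generator) could resolve the list of domain names from wikidata instead of linking one of them directly. The Q0000000 item id is a placeholder; the P856 “official website” property and the query.wikidata.org SPARQL endpoint are real.

```python
# A sketch of the indirection idea: resolve the project's current domain
# name(s) from wikidata instead of hardcoding one. Q0000000 is a
# placeholder item id for the project.
import json
import urllib.parse
import urllib.request

QUERY = """
SELECT ?website WHERE {
  wd:Q0000000 wdt:P856 ?website .   # every official website of the project
}
"""

def project_domains():
    url = ("https://query.wikidata.org/sparql?format=json&query="
           + urllib.parse.quote(QUERY))
    request = urllib.request.Request(url, headers={"User-Agent": "demo/0.1"})
    with urllib.request.urlopen(request) as response:
        rows = json.load(response)["results"]["bindings"]
    return [row["website"]["value"] for row in rows]

# Would print e.g. ["https://fedeproxy.eu", "https://example.eu"]
print(project_domains())
```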

There are other boundaries. For example, the handling of CoC tickets tends to be private, for several reasons, but I think the biggest one is about protections: protection of the entity (especially when faced with people eager to claim libel, as I have seen), protection of the reporter (so that counts as personal details for sure) and protection of the community itself (as we have seen, several high profile cases (Pycon 2013 Donglegate, the Drupal one, etc) have become network-wide shitshows).

So they tend to default to private, which is IMHO a problem since the perception of dispute handling then becomes one of a shadowy cabal handing down edicts from above, and not a lengthy discussion process.

Here, the problem seems to have been less with Travis/Github as proprietary systems and more about Github not working for free to untangle the governance of VoidLinux. The project would likely have faced the exact same type of problem with a registrar or a bank when asking “can we take over that domain/account, we have no structure to prove it, but we are the good guys™”.

The obligation to have a structure to open a bank account is an interesting point. On one hand, this clearly adds friction. On the other, it solves so many problems down the road that maybe this is something to replicate. For example, configure a forge so that you can’t create an org unless there are more than 2 admins. And make the forge check on a regular basis that the admins are alive and interested.

The part of the issue about Github not being engineered for 10,000 repos is real, as is the question of load for Travis, but to be fair, that’s an engineering issue, and I think it is not really a Github-specific issue. If the community did not have enough sysadmins to set up their own CI/forge, Github being free software wouldn’t have changed that at all; the problem was there before the governance crisis and wasn’t created by it, just made visible. If Void Linux had decided to go to gitlab.com at the start, I feel their problem would have been the same. The same incentive to not spend too much time on non-paying users would apply, the same lack of sysadmins would apply, the same scaling issues would exist, and the only difference would have been 1 specific API to change the parent of a repo.

Granted, this part is specific to Github, but I see that as a missing feature more than a fundamental limitation due to Github’s nature and/or stack, since Gitlab has a hidden API call to fix that and Github doesn’t. And I am not sure if Gitea has the same API (since the doc of /repository/repoEdit does not explicitly list parent as a field that can be changed).

That’s a rather problematic issue. For a start, there are several different language versions of Wikipedia, each with slightly different rules. So you may need to update each one when there is a fork, which is a daunting task. Listing the differences in each language is also going to be a chore.

And while free software projects have historically managed not to be too affected by the rules around notability (eg, have 2 articles in the media, at least 2 years apart), I suspect that the status quo is rather fragile. So if your project is too small to be notable, or if the rules change, the whole page could be removed (especially if the free software project is also commercial).

Wikidata seems a much better choice since there is no issue with translation, and since there is already support for multiple entries. And the rules around what can go in Wikidata are much more relaxed: “if it exists (and can be proven), we can add it”.

The biggest problem is that Wikidata does not allow adding free-form text to explain why to choose one link over another (and doing so would bring back the translation problem). There is also the fact that it is not user friendly, or at least not like a browser + web page, so you would need another frontend, and that can’t be a webpage controlled by the project. Then you also get the question of “who controls the Q item”, but this is likely less of a problem.

If VoidLinux had had infrastructure as code and the only other dependencies were (i) a domain name, (ii) a cloud provider for virtual machines, the problem described in this part of the article would not exist; that was the point I was trying to make. Whatever strategy VoidLinux developed over the years to deal with 10,000 repos would have been developed in a context that was reproducible, since it would have been based on infrastructure as code.

Wikipedia and wikidata work together: for instance the software infobox of Thunderbird in French gets content from wikidata to display it in the page, possibly mixed with free-form text. Another example, which is closer to what I had in mind, is the table displaying the Session Parlementaires of the XIVe législature de la Cinquième République française, which also gets its facts from wikidata.

But the main issue is that they didn’t create the automation, not that they couldn’t. You can fully manage a Github repo with the API (for example using Ansible). I know that Microsoft did exactly that, and had an internal portal where people could connect their Github ID to the Microsoft internal AD, and it would add you to the right groups and give the right access on projects (it was a few years before they acquired the company, so now I can’t find it :frowning: ). And Red Hat does exactly that: you are removed from specific groups in the Openshift org as soon as your account is closed in our LDAP (where people indicate their github account). A sketch of what that kind of synchronization could look like is below.
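For illustration, a minimal sketch of directory-to-forge synchronization, assuming a hypothetical allowlist of members (say, exported from LDAP). The GET and DELETE /orgs/{org}/members endpoints are real GitHub REST API calls; the org name, the allowlist and the token handling are placeholders.

```python
# Hedged sketch: keep a GitHub organization's membership in sync with an
# allowlist. ORG and ALLOWED are placeholders, not a real deployment.
import json
import os
import urllib.request

ORG = "example-org"                  # placeholder organization
ALLOWED = {"jean", "rose", "edith"}  # e.g. exported from the project's LDAP

def github(method: str, path: str, token: str):
    request = urllib.request.Request(
        "https://api.github.com" + path,
        method=method,
        headers={"Authorization": "token " + token,
                 "Accept": "application/vnd.github.v3+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response) if method == "GET" else None

def sync_members(token: str) -> None:
    # Note: a real version would follow pagination to list all members.
    for member in github("GET", f"/orgs/{ORG}/members", token):
        if member["login"] not in ALLOWED:
            # Same effect as the LDAP-driven removal described above.
            github("DELETE", f"/orgs/{ORG}/members/" + member["login"], token)

sync_members(os.environ["GITHUB_TOKEN"])
```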

So while there is a argument for the lack of automation on a lot of SaaS, I think this doesn’t apply to Github in that case.

Now in practice, I know no one who does that (not even me, who in the past automated phpbb configuration in Puppet, and suggested to the Ansible folks that we automate matrix room creation using ansible).

I suspect that if the Void Linux folks were ok with not automating Github, they would also not have automated a self-hosted gitlab.

True, but that’s per wiki instance. The infoboxes are specific to each wiki, and that’s Lua code, placed in a special namespace and duplicated (or rather, rewritten) on each wiki. They also (TW: software coding horror) manage that on the wiki directly. There is no git/svn: the code is on a wiki page, edited using the online text editor. No PR/MR, nothing. Also, directly deployed in production…

And there was also a strong pushback against Wikidata on at least the French Wikipedia, for some reasons I can understand (it makes vandalism harder to watch for, a problem that can be solved) and some that sound just dumb. I suspect this has started to change (because of Wikipedia’s structural problems leading to burnout; people sooner or later leave when they are too involved in some discussions).

To illustrate where the table listing fedeproxy.eu could be located in wikipedia, I created a new section about interoperability in the forge wikipedia page.

It includes a chapter about federation and this is where a table of software implementing federation for forges could be placed.

To be honest I’m still unsure about the value of this idea but it was definitely worth spending most of the day thinking about it and discussing it with you @misc :slight_smile:

Yes, that’s what I meant although I phrased it incorrectly.

I’m not saying it is perfect, far from it. But it exists and it looks like it is not going to disappear any time soon.

To be fair to your proposal, this is how people manage to keep track of sci-hub, so it definitely works well enough (even if this kind of censorship is different from the use case we are discussing).

So I guess that also brings up another question: why would people keep that up to date for projects less important than Sci-hub?

Also, another issue when forking/moving is data. For example, git repositories need to be replicated (which is not the biggest problem, as long as you scripted it), but so do authentication and identity (which is a lot harder).

Another issue is that there are a lot more things that have the same property as domain names for a project (eg, being exclusive). For example, the naming on pypi/CPAN/rubygems/dockerhub/etc. While people can host them locally (Fedora has its own registry), having a prefix (eg example.org/module) is going to be less convenient than not having one (eg just module), and this causes problems ranging from security issues (name squatting on dockerhub) to confusing consumers (podman asks me to choose the registry when I do “podman run fedora”).

It seems like it would take a free software movement wide initiative to get that fixed.

It is indeed overwhelming. I’m being cautiously optimistic though: if we keep trying to fix it, we may end up finding a solution :slight_smile: The VoidLinux story is inspiring in that regard: their biggest issue turned out to be their inability to rebuild their infrastructure, which is a problem that is actually easy to solve, as long as you make the right choices from the beginning.

Since fedeproxy is in its infancy, maybe we can make good choices right now. It could be to not distribute anything via pypi, or to not have a forum based on discourse because it only works in a centralized way. Not having a forum would be painful for me right now, but maybe it’s worth it if it saves fedeproxy from the problems of a centralized resource in the future?

I think what is also missing is a list of the problems we want to solve, as there are surely other solutions than “decentralize everything”, especially since decentralization comes with its own set of problems.

For example, the problem of void linux with a key figure that disappeared is a SPOF, and there are several ways to deal with it, not all of which imply a fundamental shift on the infra side.

There is also the question of impact. If discourse disappeared right now, it would be bad for the project, but it would not have much long-term impact, nor impact the user community.

Here is another unfinished idea, fresh from this morning, to work around the problem of domain names in the context of horizontal communities.

It solves two problems:

  • Takeover by a single individual, by way of a secret distributed between community members
  • Centralized self-hosted services, by way of identical reverse proxies, each under the control of a different community member, and periodic backup resurrections

Shared control of a domain name

The credentials for the registrar account are protected by a secret shared between the community members who control the domain (with Shamir’s Secret Sharing, as pictured in the diagram below): the group must agree to a change, otherwise none of them can make it alone. This is not convenient, but domain name changes are very rare, so it is not an inconvenience that could impact the productivity of the software project. This assumes the project has a manifesto similar to fedeproxy’s.
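For illustration, here is a minimal sketch of Shamir’s Secret Sharing, assuming the registrar password can be encoded as an integer. A real deployment would rather rely on an audited tool such as ssss; this only shows the mechanics that prevent any single member from using the credentials alone.

```python
# Minimal sketch of Shamir's Secret Sharing over a prime field, assuming
# the registrar password is encoded as an integer smaller than PRIME.
import secrets

PRIME = 2**127 - 1  # a Mersenne prime, large enough for a short secret

def split(secret: int, shares: int, threshold: int):
    """Split `secret` into `shares` points on a random polynomial of
    degree threshold-1; any `threshold` points reconstruct it."""
    coefficients = [secret] + [secrets.randbelow(PRIME)
                               for _ in range(threshold - 1)]
    points = []
    for x in range(1, shares + 1):
        y = 0
        for c in reversed(coefficients):  # Horner evaluation at x
            y = (y * x + c) % PRIME
        points.append((x, y))
    return points

def reconstruct(points):
    """Lagrange interpolation at x=0 recovers the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        numerator, denominator = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                numerator = (numerator * -xj) % PRIME
                denominator = (denominator * (xi - xj)) % PRIME
        secret = (secret
                  + yi * numerator * pow(denominator, -1, PRIME)) % PRIME
    return secret

# Rose, Jean and Edith each hold one share; any two of them can act
# together, none of them can act alone.
shares = split(secret=123456789, shares=3, threshold=2)
assert reconstruct(shares[:2]) == 123456789
```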

Decentralization via reverse proxy

The domain points to a reverse proxy (first tier) that has two types of backends:

  • non-federated self-hosted services: each person sharing control of the domain runs an identical reverse proxy (second tier). Rose has no control over Jean’s reverse proxy or Edith’s reverse proxy. These second tier reverse proxies are all configured in the same way: their backends are the same self-hosted services. The first tier reverse proxy randomly chooses one of the second tier reverse proxies when forwarding a request (a toy sketch of that logic follows the diagram below).
  • federated services: the first tier reverse proxy has a backend that randomly chooses one of the federated services. They are synchronized in real time and it does not matter which one handles the request.
federated services                                   non-federated self-hosted services



                                                        +--------+
    +---------+                                         | Reverse|           +-------------+
    |  Forge  |<-------------+              +---------->| Proxy  +---------->|             |
    |  Jean   |              |              |           | Jean   |           |             |
    +---------+              |              |           +--------+           |             |
                             |              |                                |  Chat       |
                           +-+--------------+-+           +--------+         |             |
    +---------+            |                  |           | Reverse|         |             |
    |  Forge  |<-----------+      Reverse     +---------->| Proxy  +-------->|             |
    |  Rose   |            |      Proxy       |           | Rose   |         |             |
    +---------+            +-+--------------+-+           +--------+         |  Website    |
                             |      ^       |                                |             |
                             |      |       |                                |             |
                             |      |       |                                |             |
    +---------+              |      |       |       +--------+               |             |
    |  Forge  |<-------------+      |       |       | Reverse|               |  Forum      |
    |  Edith  |                     |       +------>| Proxy  +-------------->|             |
    +---------+              +------+------+        | Edith  |               |             |
                             | example.com |        +--------+               +-------------+
                             |             |
                             |  registrar  |
                             +-------------+
                                    ^
                        +-----------------------+
                        | Rose & Jean & Edith   |
                        |                       |
                        |    Shamir Secret      |
                        +-----------------------+
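To make the diagram concrete, here is a toy sketch of the first tier’s forwarding logic, assuming three hypothetical second tier proxy addresses. A real setup would of course use nginx or HAProxy upstreams; this only illustrates that the first tier picks one of the second tier reverse proxies at random, so no single member’s machine is special.

```python
# Toy sketch of the first tier reverse proxy in the diagram above.
# The backend addresses are placeholders; error handling is omitted.
import random
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

SECOND_TIER = [
    "http://proxy-jean.example.com",
    "http://proxy-rose.example.com",
    "http://proxy-edith.example.com",
]

class FirstTier(BaseHTTPRequestHandler):
    def do_GET(self):
        backend = random.choice(SECOND_TIER)  # identical by construction
        with urllib.request.urlopen(backend + self.path) as upstream:
            body = upstream.read()
        self.send_response(upstream.status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8080), FirstTier).serve_forever()
```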

Data portability via backup resurrection and infrastructure as code

Jean, Rose and Edith are trusted with a full backup of the services (i.e. the horizontal community agreed to that). At a given point in time one of them runs the backends used by the second tier of reverse proxies. On a regular basis the services are migrated under the control of someone else, to verify that resurrecting the services from backups actually works. The second tier reverse proxies are then re-configured to use the newly instantiated backends.
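A sketch of what that periodic drill could look like, assuming hypothetical restore-backup.sh and update-proxy-backend.sh helpers that would live in the project’s infrastructure as code repository:

```python
# Hypothetical "backup resurrection" drill: rebuild all services on a new
# host from the latest backup, then repoint the second tier proxies.
# restore-backup.sh and update-proxy-backend.sh are placeholders for the
# project's documented, version-controlled procedures.
import subprocess

MEMBERS = ["jean", "rose", "edith"]

def resurrect(new_host: str) -> None:
    # Recreate every service on new_host from the most recent backup.
    subprocess.run(["./restore-backup.sh", new_host], check=True)
    # Reconfigure each member's second tier reverse proxy to use it.
    for member in MEMBERS:
        subprocess.run(["./update-proxy-backend.sh", member, new_host],
                       check=True)

# e.g. this month the services move under Rose's control, which proves
# that restoring from backups actually works.
resurrect("rose.example.com")
```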