Discussions around concurrent Gitea processes

Hello,

There have been discussions in the chat room about running concurrent Gitea processes. The general idea is that two or more Gitea processes run against the same database and files, for high availability or performance purposes.

There are two problems that need fixing before that can happen:

  • Gitea processes must be stateless, which involves moving some state from memory to a store external to Gitea (Redis is a good example).
  • All actions carried out by a Gitea process on behalf of a user (via the web interface or the API) must be atomic. For instance, creating a pull request on a given repository must be guarded by a lock so that another pull request creation does not race against it.
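
To make the second point concrete, here is a minimal sketch of guarding pull request creation with a per-repository lock. It is not taken from the Gitea codebase: `repoLocks`, `repository`, `createPullRequest` and `createConcurrently` are hypothetical names, and the in-memory `lastIndex` counter merely stands in for whatever the database would store. A `sync.Mutex` only protects goroutines inside one process; protecting several processes would need an external lock instead.

```go
package main

import (
	"fmt"
	"sync"
)

// repoLocks hands out one mutex per repository ID (hypothetical helper).
type repoLocks struct {
	mu    sync.Mutex
	locks map[int64]*sync.Mutex
}

func newRepoLocks() *repoLocks {
	return &repoLocks{locks: make(map[int64]*sync.Mutex)}
}

// lockFor returns the mutex dedicated to repoID, creating it on first use.
func (r *repoLocks) lockFor(repoID int64) *sync.Mutex {
	r.mu.Lock()
	defer r.mu.Unlock()
	l, ok := r.locks[repoID]
	if !ok {
		l = &sync.Mutex{}
		r.locks[repoID] = l
	}
	return l
}

// repository is a stand-in for a database row (assumption for this sketch).
type repository struct {
	ID        int64
	lastIndex int // index of the most recently created pull request
}

// createPullRequest allocates the next pull request index.
// The read-increment-write sequence runs under the repository lock, so two
// concurrent callers can never be handed the same index.
func createPullRequest(locks *repoLocks, repo *repository) int {
	l := locks.lockFor(repo.ID)
	l.Lock()
	defer l.Unlock()
	repo.lastIndex++
	return repo.lastIndex
}

// createConcurrently starts n goroutines that each create one pull request
// on the same repository and returns the indexes they were given.
func createConcurrently(n int) []int {
	locks := newRepoLocks()
	repo := &repository{ID: 1}
	indexes := make([]int, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(slot int) {
			defer wg.Done()
			indexes[slot] = createPullRequest(locks, repo)
		}(i)
	}
	wg.Wait()
	return indexes
}

func main() {
	seen := make(map[int]bool)
	for _, idx := range createConcurrently(100) {
		seen[idx] = true
	}
	fmt.Println("distinct indexes:", len(seen)) // prints "distinct indexes: 100"
}
```

Locking per repository rather than globally keeps unrelated repositories from waiting on each other, which is why the sketch keys the mutex on the repository ID.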

The stateless aspect is discussed in the helm repository and is relatively easy. The atomic aspect is much more challenging, as it would require extensive modifications to the Gitea codebase.

That being said, since each user action is already handled by an independent goroutine in Gitea, a single process is already exposed to the same race conditions that would occur in a multi-process environment. The consequences are inconsistencies in the files, repositories or database that trigger unexplained problems, and tracing them back to their root cause is very difficult. These races are of no practical concern to Gitea instances that have only a few repositories and users. But since there is no protection, they are bound to happen, by chance or when the Gitea instance is very busy.

It follows that using multiple Gitea processes for high availability, while still exposed to race conditions, would not make things worse than they already are, provided Gitea is made stateless and all changes to external resources (Redis, database, repositories, etc.) are locked to prevent race conditions between processes.
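
As an illustration of what a lock between processes could look like, here is a minimal sketch based on a lock file created with `O_EXCL`, which only one process can succeed at. This is one possible technique among others (a Redis or database lock would be alternatives), not something Gitea does today. The names `acquire`, `release`, `demo` and the `repo-1.lock` file are hypothetical. The sketch also ignores real-world concerns such as stale locks left behind by a crashed process, and it assumes a filesystem where exclusive creation is reliable.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"time"
)

// acquire creates the lock file exclusively, retrying until the timeout expires.
// O_EXCL makes the creation fail if the file already exists, so only one
// process at a time can hold the lock.
func acquire(path string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		f, err := os.OpenFile(path, os.O_CREATE|os.O_EXCL|os.O_WRONLY, 0o600)
		if err == nil {
			return f.Close()
		}
		if !errors.Is(err, fs.ErrExist) {
			return err // unexpected failure, e.g. missing directory
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("lock %s still held after %s", path, timeout)
		}
		time.Sleep(10 * time.Millisecond)
	}
}

// release removes the lock file so that another process can acquire it.
func release(path string) error {
	return os.Remove(path)
}

// demo acquires the lock, tries again while it is held, then retries after
// releasing it. It returns the error of each of the three attempts.
func demo() (first, second, third error) {
	dir, err := os.MkdirTemp("", "lock-demo")
	if err != nil {
		return err, nil, nil
	}
	defer os.RemoveAll(dir)

	lock := filepath.Join(dir, "repo-1.lock") // hypothetical per-repository lock
	first = acquire(lock, time.Second)
	second = acquire(lock, 50*time.Millisecond) // expected to time out
	if err := release(lock); err != nil {
		return first, second, err
	}
	third = acquire(lock, time.Second)
	return first, second, third
}

func main() {
	first, second, third := demo()
	fmt.Println("first acquire succeeded:", first == nil)
	fmt.Println("second acquire refused while held:", second != nil)
	fmt.Println("acquire after release succeeded:", third == nil)
}
```

In a real deployment, every code path that modifies a repository would have to take such a lock before touching shared files, which is where most of the work would be.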

Using multiple Gitea processes for performance reasons is much more challenging because it would require solving all the race conditions that currently exist. This is not too much of a concern because a single Gitea process is fast enough to handle instances that match the needs of the current user base, including the largest known instance, which hosts on the order of 10,000 repositories.

My 2 cents
