#1612 discusses Federation in general but I wanted to open an issue for the ActivityPub + ForgeFed solution specifically and concretize this unit of work. Let's keep to discussing ActivityPub+ForgeFed design specifics here, and have the question "whether it should be ActivityPub+ForgeFed at all" discussed in the other issue (#1612).
# Background
I work on the go-fed suite of ActivityPub-related libraries, sites, and tools, so my knowledge is centered on the AP/ForgeFed angle. As I've only spent a small amount of time with the Gitea code, I'm not comfortable making the required changes myself, especially without serious design discussions. And I'm not here to shill go-fed as being *the* solution; it just provides *a* solution, and there are reasons to pick it entirely, partly, or not at all. I'll try to keep the evangelization to a minimum and self-contained, at the very end.
## (Optional, Time-Sensitive) Grant Opportunity
There is a limited-time opportunity for this work to be submitted as a grant to the NLNet folks via the [NGI Search & Discovery](https://nlnet.nl/discovery/guideforapplicants/) grant (5-50k euros) or [DAPSI](https://dapsi.ngi.eu/apply/) grants (50k+ euros?). There is never a guarantee a grant will be selected to receive money, but given the slate of other Fediverse projects that *have* gotten funding via the NGI S&D, I think this is a great exercise. There is the possibility that the EU will extend certain NGI funding periods for further cycles, but it is not guaranteed.
Concretely, this would need one or more Gitea community member volunteers interested in taking the lead on those applications. I personally am applying separately for other projects, so I don't have the time / energy to push this aspect forward, but I'm happy to provide guidance where possible.
## Community Standards
I believe the ForgeFed work is still ongoing. An outcome could be that this effort allows whoever wants to help to pioneer additional [ForgeFed behavior](https://forgefed.peers.community/behavior.html) and be a voice [there](https://talk.feneas.org/c/forgefed/10). Additionally, depending how Gitea embraces ActivityPub in general, it may also have opportunities to create [Fediverse Enhancement Proposals](https://socialhub.activitypub.rocks/t/fep-a4ed-the-fediverse-enhancement-proposal-process/1171), so no matter what the volunteers will definitely get open-source community leadership opportunities.
# Design
## Overview
ActivityPub is based on the concept of *actors* exchanging *activities*. These activities tell federated peers how to update their view of the "Federated Web", which is a linked-data graph composed of different RDF ontologies. That's a jargon-y way of saying "data types are flexible, and have pointers to other pieces of data". ForgeFed is just one ontology focusing on Forge behaviors and entities. Peers are not expected to know how to interpret every single ontology on this Federated Web, so Gitea can just focus on a narrow one -- ForgeFed -- in addition to the basic ActivityStreams vocabulary that acts as a common language.
This does not prevent Gitea from adopting different ontologies later, if the project decided to support viewing/interacting with the other kinds of *activities* going on in the wider web. It is just not a requirement right now, in the spirit of keeping scope limited.
The [ForgeFed](https://forgefed.peers.community/behavior.html#actors) spec outlines the *actors* and the *data* being exchanged. A subset of that data is *activities*, which are shareable between actors. One actor can `Create` a `Ticket` (issue) and give it to a peer, who knows: "this *data* -- which happens to be an activity -- is a `Create` so I'll invoke my *create* behavior/function with the payload".
*So if REST is*...
`POST` to `/issues/new` with the body containing `payload` and a `session_id` containing authenticated credentials. This results in invoking the server's `CreateIssue` function with `payload` based on the `user` calling it.
...where REST scales by just creating more endpoints and using more HTTP verbs: `POST` and `GET` and `PATCH` to `/repo`, `/merge-request`, `/repo`
*Then ActivityPub is*...
`POST` to `/actors/cj/inbox` with an `http_signature` header and Activity `payload`. This results in invoking the server's `WhatActivityIsThis` function which determines the Activity is a `Create`, so it calls the `CreateIssue` function with the rest of the `payload` information based on the `federated_peer_user` calling it.
... where ActivityPub scales by just having new types of Activities (`Create`, `Update`, `Offer`, etc) and new data types that are acted upon (`Person`, `Repository`, `Commit`, `Note`, etc).
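To make the contrast concrete, here is a minimal sketch of that "what activity is this?" step using only the Go standard library. Everything named here (`dispatch`, `createIssue`, the `handlers` map, the example IRIs) is hypothetical and not existing Gitea or go-fed code. It also assumes `actor` arrives as a plain IRI string and that the `Ticket` title lives in `summary`, which real ActivityStreams data does not guarantee.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// activity holds only the fields needed to route an incoming payload.
// Assumption: "actor" is a plain IRI string (it may be an object in practice).
type activity struct {
	Type   string          `json:"type"`
	Actor  string          `json:"actor"`
	Object json.RawMessage `json:"object"`
}

// handler is a hypothetical signature for Gitea-side behavior.
type handler func(actor string, object json.RawMessage) (string, error)

// createIssue stands in for the existing issue-creation logic.
func createIssue(actor string, object json.RawMessage) (string, error) {
	var ticket struct {
		Type    string `json:"type"`
		Summary string `json:"summary"`
	}
	if err := json.Unmarshal(object, &ticket); err != nil {
		return "", err
	}
	if ticket.Type != "Ticket" {
		return "", errors.New("unsupported object type: " + ticket.Type)
	}
	return fmt.Sprintf("issue %q opened by %s", ticket.Summary, actor), nil
}

// handlers maps Activity types to behaviors. Supporting a new Activity
// means adding an entry here, not adding a new endpoint.
var handlers = map[string]handler{"Create": createIssue}

// dispatch plays the role of the "WhatActivityIsThis" function.
func dispatch(payload []byte) (string, error) {
	var a activity
	if err := json.Unmarshal(payload, &a); err != nil {
		return "", err
	}
	h, ok := handlers[a.Type]
	if !ok {
		return "", errors.New("unsupported activity type: " + a.Type)
	}
	return h(a.Actor, a.Object)
}

func main() {
	payload := []byte(`{"type":"Create","actor":"https://bar.example/actors/cj",` +
		`"object":{"type":"Ticket","summary":"Crash on start"}}`)
	result, err := dispatch(payload)
	fmt.Println(result, err)
}
```

The single inbox endpoint stays fixed while the `handlers` map grows, which is the scaling difference from REST described above.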
This means Gitea adopting ActivityPub will require a somewhat different philosophical mindset than is perhaps common. Reconciling that mindset with the existing codebase, or isolating it from the existing codebase, is a core engineering challenge.
Therefore, at a high level, Gitea would need to support the following concepts:
- Actors
- Sending Activities
- Receiving Activities
- Serving ActivityStreams
- Fetching ActivityStreams
...only then can the "First Federated Behavior" be reasoned about, in the penultimate section.
Let's dive concretely into what is required for each of them. The "why"s might not come together until the "Fetching ActivityStreams" section. These sections cover what I can think of off the top of my head, so the list may be incomplete.
*Note: This only goes into S2S federation and not C2S federation*
## Actors
ForgeFed only has [suggested guidance](https://forgefed.peers.community/behavior.html#actors) for actors:
- `Person`
- `Project`
- `Repository`
- `Group`/`Organization`/`Team`
To support any of these, the following would need to be tackled:
- Conceptually, mapping Gitea's concepts ("People", "Teams") into "Actor" concepts in the ActivityPub world, and just having the community aligned with these thoughts is a big step before anything else is done.
- Mapping Gitea's actor concepts into the ontology. "What Gitea concepts are which ActivityStreams/ForgeFed types."
- The Actor IRIs (the URLs at which peers can fetch the actor document) are arbitrary. So this means mapping, say, a Gitea Repository actor to a `/repository/actor/{id}` IRI. Note: Existing IRIs can be re-used, so long as they respect the `Accept` / `Content-Type` headers for ActivityStreams content.
- "Translation layer": how to map Gitea's existing database columns into the actual fields in the ActivityStreams document to serve. I put "translation layer" in quotes, because the design could be something else.
- A new private key generated for users. Private keys are needed to sign federated requests with HTTP Signatures, *if* Gitea wants to use this community-adopted standard. I strongly recommend sticking to it for interoperability and don't-shave-the-yak reasons. More on this later.
- Managing `inbox` and `outbox` ordered collections. This one is the big doozy. Each actor basically has a "sent" and "received" queue, which means:
  - Backing storage in the database
  - User self-moderation capabilities (ex: block peers, delete a received message, etc)
  - Admin moderation capabilities (ex: block abusive peer Gitea instances, etc)
- (optional, but strongly recommended) managing `following` and `followers` collections for actors. In addition to the previous concerns with the `inbox` and `outbox`:
  - curating followers lists (there is an option to manually approve followers), and how to manage that for things like a "Team" or "Repository" or "Project"
  - how to follow others (involves fetching the *peer* actor and presenting a UI)
  - implementing the `Follow` and `Accept`/`Reject` flow
- (truly optional) managing the `liked` collection, if a "Team" or "Project" wants to star or favorite other items seen on the Fediverse. I'm unsure what this would look like in practice, but this could be an opportunity for someone wanting to play with UI/UX to do so.
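To give a feel for what "serving an actor" involves, here is a rough sketch of generating a key and building an actor document with the standard library. The `publicKey` block and the `https://w3id.org/security/v1` context follow a convention widely used on the Fediverse for HTTP Signatures rather than anything ActivityPub itself mandates. The IRI layout, the `#main-key` fragment, the 2048-bit key size, and all function names are assumptions made up for illustration.

```go
package main

import (
	"crypto/rand"
	"crypto/rsa"
	"crypto/x509"
	"encoding/json"
	"encoding/pem"
	"fmt"
)

// publicKey mirrors the community convention for publishing a signing key.
type publicKey struct {
	ID           string `json:"id"`
	Owner        string `json:"owner"`
	PublicKeyPem string `json:"publicKeyPem"`
}

// actorDoc is a hand-written subset of an actor document.
type actorDoc struct {
	Context   []string  `json:"@context"`
	ID        string    `json:"id"`
	Type      string    `json:"type"`
	Inbox     string    `json:"inbox"`
	Outbox    string    `json:"outbox"`
	Followers string    `json:"followers"`
	Following string    `json:"following"`
	PublicKey publicKey `json:"publicKey"`
}

// newActorKey generates the per-actor signing key (key size is an assumption).
func newActorKey() (*rsa.PrivateKey, error) {
	return rsa.GenerateKey(rand.Reader, 2048)
}

// newActorDoc derives the collection IRIs from the actor IRI. That layout is
// arbitrary; only the property names come from ActivityPub.
func newActorDoc(iri, asType string, key *rsa.PrivateKey) (actorDoc, error) {
	der, err := x509.MarshalPKIXPublicKey(&key.PublicKey)
	if err != nil {
		return actorDoc{}, err
	}
	pemBytes := pem.EncodeToMemory(&pem.Block{Type: "PUBLIC KEY", Bytes: der})
	return actorDoc{
		Context: []string{
			"https://www.w3.org/ns/activitystreams",
			"https://w3id.org/security/v1",
		},
		ID:        iri,
		Type:      asType,
		Inbox:     iri + "/inbox",
		Outbox:    iri + "/outbox",
		Followers: iri + "/followers",
		Following: iri + "/following",
		PublicKey: publicKey{ID: iri + "#main-key", Owner: iri, PublicKeyPem: string(pemBytes)},
	}, nil
}

func main() {
	key, err := newActorKey()
	if err != nil {
		panic(err)
	}
	doc, err := newActorDoc("https://foo.example/users/42", "Person", key)
	if err != nil {
		panic(err)
	}
	out, err := json.MarshalIndent(doc, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

In a real design the private key would be stored alongside the user record and the document would be filled in from Gitea's database by whatever "translation layer" is chosen.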
## Sending Activities
Sending activities is described in the [ActivityPub spec](https://www.w3.org/TR/activitypub/#delivery). Unfortunately, it also explicitly relies on the [C2S](https://www.w3.org/TR/activitypub/#client-addressing) addressing, which must be kept in mind while implementing.
This unblocks Gitea's ability to do *any* federated flow in the future, but alone it is insufficient to do any specific federated flow.
To keep it succinct:
- Mapping the actors' `outbox` to a concrete IRI, ex: `/users/{id}/outbox`, from which it can serve the outbox collection.
- Addressing ("who am I sending it to")
  - recursive unwrapping of collections (to a specified depth)
  - deduplication and self-removal
  - stripping of sensitive fields (C2S)
  - Addressing normalization ("Automatically creating a `Create` activity") (C2S)
  - (optional, controversial) `sharedInbox`, which alleviates load on the peer server if they're a massive instance
- Adding to the actor's Outbox (and backing datastore)
- Transport
  - HTTPS with HTTP Signatures (recall the actor's private key in the previous section), so the peer is able to tell "who" is delivering the activity
  - Sets headers appropriately (doubly so due to HTTP Signatures)
  - Dereferencing peer actors to get their `inbox` and `POST`ing there. See later on Dereferencing.
  - Bounded retrying of failed deliveries (and all those messy networking considerations)
  - Admin capabilities to manage this (prevent DoSing a peer, backlogged queues, etc)
- Inbox Forwarding
  - detecting when to do it
  - determining who to forward to
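As a small illustration of the "deduplication and self-removal" part of addressing, here is a sketch that merges addressing fields into a delivery list. It deliberately skips collection unwrapping and the stripping of `bto`/`bcc`, and the function name and example IRIs are made up. The public collection IRI is the one defined by ActivityPub, which is addressed but never delivered to.

```go
package main

import "fmt"

// publicCollection is the special ActivityPub "Public" address.
const publicCollection = "https://www.w3.org/ns/activitystreams#Public"

// recipients merges addressing fields (to, cc, ...) into a delivery list,
// dropping duplicates, the sending actor itself, and the public collection.
// Order of first appearance is preserved.
func recipients(self string, fields ...[]string) []string {
	seen := map[string]bool{self: true, publicCollection: true}
	var out []string
	for _, field := range fields {
		for _, iri := range field {
			if seen[iri] {
				continue
			}
			seen[iri] = true
			out = append(out, iri)
		}
	}
	return out
}

func main() {
	to := []string{"https://bar.example/users/cj", publicCollection}
	cc := []string{"https://bar.example/users/cj", "https://foo.example/users/1"}
	fmt.Println(recipients("https://foo.example/users/1", to, cc))
	// prints "[https://bar.example/users/cj]"
}
```

Each remaining IRI would then be dereferenced to find its `inbox`, which is where the transport and retry bullets come in.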
## Receiving Activities
Receiving activities is the second half:
- Mapping the actors' `inbox` to a concrete IRI, ex: `/users/{id}/inbox`, from which it can serve the inbox collection for a `GET` request (if desired), or receive federated peer `POST` requests.
- Verification of HTTP Signatures: fetching the peer actor, getting their key information, and verifying the signature.
- Ensure it is not blocked for some reason (admin or user level)
- **Specifying specific Gitea behavior to do as a consequence of receiving the peer's federated data**
  - This is the goal to get to in our "First Federated Behavior" section below
  - This may change something like a `federated_view` table of data that is just a cached version of the peer's data (but can always be authoritatively gotten from a peer, see "Fetching ActivityStreams" below)
  - Note that this would need to be manageable by an admin (ex: remove undesired content)
- Ensuring the new Gitea state is reflected in "Serving ActivityStreams" (see next section)
  - This lets federated peers "understand" Gitea without having to do something like scraping the HTML/Javascript UI to get the information, maybe get notifications and fetch this data, etc.
- (optional, controversial) `sharedInbox` support. This is a way to optimize your Gitea instance if you have thousands of users on one instance and tons of activities flying about constantly, and don't want to DoS that server instance. I personally dislike `sharedInbox` but won't go into those reasons here.
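The ordering of those checks can be sketched as a plain `http.Handler`. This is only a skeleton under stated assumptions: the `verifier` hook stands in for real HTTP Signature verification (which is not implemented here), `fixedPeer` is a stub for the demo, the blocklist is an in-memory map, and this particular sketch chooses not to serve `GET` on the inbox.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// verifier is a hypothetical hook: verify the request's HTTP Signature and
// return the host of the peer that signed it.
type verifier func(r *http.Request) (peerHost string, err error)

// inboxHandler applies the checks in order: method, signature, blocklist.
func inboxHandler(verify verifier, blocked map[string]bool) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "this sketch only accepts POST", http.StatusMethodNotAllowed)
			return
		}
		host, err := verify(r)
		if err != nil {
			http.Error(w, "signature verification failed", http.StatusUnauthorized)
			return
		}
		if blocked[host] {
			http.Error(w, "peer is blocked", http.StatusForbidden)
			return
		}
		// Placeholder: decode the Activity and run the Gitea-specific behavior.
		w.WriteHeader(http.StatusAccepted)
	})
}

// fixedPeer is a stub verifier for the demo; an empty host simulates a
// missing or invalid signature.
func fixedPeer(host string) verifier {
	return func(*http.Request) (string, error) {
		if host == "" {
			return "", errors.New("missing or invalid signature")
		}
		return host, nil
	}
}

// statusFor runs one request through the handler and reports the status code.
func statusFor(h http.Handler, method string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(method, "/users/1/inbox", nil))
	return rec.Code
}

func main() {
	blocked := map[string]bool{"spam.example": true}
	fmt.Println(statusFor(inboxHandler(fixedPeer("bar.example"), blocked), http.MethodPost))  // prints 202
	fmt.Println(statusFor(inboxHandler(fixedPeer("spam.example"), blocked), http.MethodPost)) // prints 403
	fmt.Println(statusFor(inboxHandler(fixedPeer(""), blocked), http.MethodPost))             // prints 401
	fmt.Println(statusFor(inboxHandler(fixedPeer("bar.example"), blocked), http.MethodGet))   // prints 405
}
```

User-level blocks would slot in next to the admin-level map, keyed by the receiving actor.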
## Serving ActivityStreams
As a consequence of the aforementioned section of "having Actors do things", they will generate data that needs to have an ActivityStreams representation so that peers, upon looking at what an actor has been up to, can natively understand what is going on. For Gitea specifically, this gets into the [ForgeFed examples](https://forgefed.peers.community/modeling.html#repository).
- Mapping Gitea concepts like "commits" and "repository" to the [ForgeFed definitions](https://forgefed.peers.community/modeling.html) (or ActivityStreams definitions, too).
- Mapping these types to IRIs, ex: `/repo/{id}`. Note: as before, existing IRIs can be re-used, so long as they respect the whole `Accept` / `Content-Type` headers for ActivityStreams content.
- Having a "translation layer" to transmute the Gitea database columns into ActivityStreams data served in an `http.Handler`. Again, "translation layer" is in quotes because there may be a better design choice.
- Ensuring that the endpoint is protected and requires appropriate credentials to view, if applicable.
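Re-using existing IRIs hinges on content negotiation, so here is a sketch of the check a handler could make before deciding between HTML and ActivityStreams. The two media types are the ones named by the ActivityPub spec; the function name is made up, and the sketch ignores `q` weights, which a fuller implementation would probably want to honor.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// asProfile is the profile parameter ActivityPub pairs with application/ld+json.
const asProfile = "https://www.w3.org/ns/activitystreams"

// wantsActivityStreams reports whether an Accept header asks for
// ActivityStreams data rather than, say, the regular HTML page.
func wantsActivityStreams(accept string) bool {
	for _, part := range strings.Split(accept, ",") {
		mediaType, params, err := mime.ParseMediaType(strings.TrimSpace(part))
		if err != nil {
			continue // skip entries that do not parse
		}
		switch mediaType {
		case "application/activity+json":
			return true
		case "application/ld+json":
			if params["profile"] == asProfile {
				return true
			}
		}
	}
	return false
}

func main() {
	fmt.Println(wantsActivityStreams(`application/ld+json; profile="https://www.w3.org/ns/activitystreams"`)) // prints true
	fmt.Println(wantsActivityStreams("text/html,application/xhtml+xml"))                                      // prints false
}
```

An existing `/repo/{id}` handler could branch on this result and fall through to its current HTML rendering otherwise.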
This unblocks the next bit...
## Fetching ActivityStreams
*If I ever use the term "dereferencing", this is what I mean*.
A Gitea instance will be able to fetch a peer Gitea instance's ActivityStreams, thanks to the work outlined in the previous section. This allows you on `foo.gitea.io` to fetch my `Person` actor on `bar.gitea.io` but, say, render it on a webpage to yourself natively on `foo.gitea.io`. Dereferencing is needed for other operations, in particular the Delivery and Addressing portions of "Sending an Activity".
Fetching ActivityStreams data also allows `foo.gitea.io` to potentially display all the information of a Repository shown on `bar.gitea.io`, without having to actually navigate to `bar.gitea.io`. Any actions done by users would spawn new Activities, resulting in invoking the "Sending Activities" section. Again, that's up to the concrete design.
- Transport (`http.Client`)
  - Optionally with HTTP signatures if a user is signed in
  - Sets the `Accept` header appropriately
- The UI work to render the desired peer data types (`Repository`, `Person`, `Team`) in Gitea's UI
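A bare-bones, unsigned version of that transport might look like the sketch below. The function names, the 1 MiB body limit, and the timeout are arbitrary choices for illustration, and `demo` spins up a local test server in place of a real peer so the example does not depend on the network. Signing the request with HTTP Signatures is left out entirely.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// asAccept is the media type ActivityPub asks clients to send when fetching.
const asAccept = `application/ld+json; profile="https://www.w3.org/ns/activitystreams"`

// newFetchRequest builds the GET request with the Accept header set.
func newFetchRequest(ctx context.Context, iri string) (*http.Request, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, iri, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Accept", asAccept)
	return req, nil
}

// dereference fetches an IRI and decodes the body into a generic JSON map.
func dereference(ctx context.Context, client *http.Client, iri string) (map[string]interface{}, error) {
	req, err := newFetchRequest(ctx, iri)
	if err != nil {
		return nil, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("fetching %s: unexpected status %s", iri, resp.Status)
	}
	// Cap the body size so a misbehaving peer cannot exhaust memory.
	body, err := io.ReadAll(io.LimitReader(resp.Body, 1<<20))
	if err != nil {
		return nil, err
	}
	var doc map[string]interface{}
	if err := json.Unmarshal(body, &doc); err != nil {
		return nil, err
	}
	return doc, nil
}

// demo serves a fake peer actor locally and returns the fetched "type".
func demo() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Accept") != asAccept {
			http.Error(w, "not acceptable", http.StatusNotAcceptable)
			return
		}
		w.Header().Set("Content-Type", asAccept)
		fmt.Fprint(w, `{"type":"Person","id":"https://bar.example/users/cj"}`)
	}))
	defer srv.Close()
	client := &http.Client{Timeout: 5 * time.Second}
	doc, err := dereference(context.Background(), client, srv.URL)
	if err != nil {
		return "", err
	}
	typ, _ := doc["type"].(string)
	return typ, nil
}

func main() {
	fmt.Println(demo())
}
```

What to do with the returned document (cache it, render it, use it for delivery) is exactly the design question this section leaves open.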
## First Federated Behavior
All of the above, and we haven't yet discussed the behaviors unlocked by ForgeFed. They [list several](https://forgefed.peers.community/behavior.html#server-to-server-interactions):
- Reporting Pushed Commits
- Opening a Ticket
  - Create
  - Offer
- Commenting
I would propose just aiming for *one* initially. Even aiming for *none* of these, but doing the other sections above, is a large enough feat worth celebrating: Getting to the point where the followers flow works is a celebratory moment, *if* Gitea wants to have the concepts of "followers"/"following" (and I think it does?).
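Since the followers flow is a plausible first milestone, here is a sketch of the two payloads involved: a `Follow` from one actor and the `Accept` that the followed actor sends back. The struct shapes and IRIs are illustrative assumptions (activity IDs in particular can live anywhere), and the sketch assumes plain IRI strings for `actor` and `object`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const asContext = "https://www.w3.org/ns/activitystreams"

// follow is a minimal Follow activity.
type follow struct {
	Context string `json:"@context,omitempty"`
	ID      string `json:"id"`
	Type    string `json:"type"`
	Actor   string `json:"actor"`
	Object  string `json:"object"`
}

// accept wraps the Follow it responds to.
type accept struct {
	Context string `json:"@context"`
	ID      string `json:"id"`
	Type    string `json:"type"`
	Actor   string `json:"actor"`
	Object  follow `json:"object"`
}

// newFollow builds the activity a follower sends to the target's inbox.
func newFollow(id, follower, target string) follow {
	return follow{Context: asContext, ID: id, Type: "Follow", Actor: follower, Object: target}
}

// acceptFollow builds the reply: the followed actor is the one accepting.
func acceptFollow(id string, f follow) accept {
	inner := f
	inner.Context = "" // the nested object is covered by the outer @context
	return accept{Context: asContext, ID: id, Type: "Accept", Actor: f.Object, Object: inner}
}

func main() {
	f := newFollow(
		"https://foo.example/activities/1",
		"https://foo.example/users/1",
		"https://bar.example/users/cj",
	)
	a := acceptFollow("https://bar.example/activities/9", f)
	out, err := json.MarshalIndent(a, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

On receiving the `Accept`, the original follower's server would add the target to its `following` collection, and the target's server would have added the follower to `followers`.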
This first federated behavior would involve:
- The UI/UX work to introduce that behavior/flow, if needed. Something like "Reporting Pushed Commits" might just be an additional backend effect.
- Adding the IRI endpoint that would trigger the new behavior/flow, if needed.
- Sending the appropriate Activity to the federated peer for that behavior, as defined by ForgeFed.
- Designing a hook when *receiving* that Activity from a federated peer, as defined by ForgeFed.
Whew, done. :)
## Go-Fed
I promised to keep this at the end and self-contained. :) The `go-fed/activity` library focuses on being middleware. You implement several interfaces like `Database` which it will use. You then use the resulting library calls in a `http.Handler` to deal with "Actors sending Activities" or "serve this ActivityStreams representation".
Since it is middleware, it only solves *some* of the problems I listed above. Big picture, the main problems remaining unsolved by go-fed are the integration ones:
- How to map IRIs/URLs to ActivityStreams data
- How to map ActivityStreams data to the database
More specifically, going section-by-section and listing the bullets that are addressed by `go-fed`:
- ActivityStreams
  - How to deserialize ActivityStreams JSON-LD data into a concrete Golang type. It turns out this isn't easy; struct field-tagging with `json` is woefully insufficient. I didn't mention this aspect at all before, and I won't dive into it unless someone wants to. I highly caution *against* deserializing ActivityStreams on one's own; `go-fed/activity/streams` is there to help go from `[]byte` to `ActivityStreamsCreate`. It scales across all sorts of RDF vocabularies, too.
  - This makes it easy to create and manipulate ActivityStreams data natively in Go
- Actors
  - Handy functions simplify serving Actors, as it's just like any other ActivityStreams data. Mapping the IRIs remains the largest challenge here.
- Sending Activities
  - The "Addressing", "Transport" and "Inbox Forwarding" bullets above are all addressed by `go-fed/activity/pub`
    - All sub-bullets, too! Except: `sharedInbox`
  - There is a separate `go-fed/httpsig` for HTTP Signatures signing and verifying
- Receiving Activities
  - Upon receiving federated data, provides a hook for doing the HTTP Signatures (or other) authentication
  - Provides a hook to map `payload` to your Activity-specific behavior (`Create => createFn()`, etc).
  - It provides common out-of-the-box default behavior for [ActivityPub specific](https://www.w3.org/TR/activitypub/#create-activity-inbox) Activity behavior (7.2-7.10), for example a peer's `Create{Note}` will attempt to create the federated `Note` via the local `Database` interface, to try to help make bootstrapping quicker.
- Serving ActivityStreams
  - Handy functions simplify serving ActivityStreams, just like the Actors section mentioned above. Mapping the IRIs remains the largest challenge here.
- Fetching ActivityStreams
  - `go-fed/activity/pub` provides an optional transport to dereference using HTTP signatures, but doesn't tell you how to use the resulting data nor does it have any idea what it is for.
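On the deserialization point above, here is a tiny standard-library illustration of why plain `json` struct tags fall short: the same property can legally arrive as an IRI string or as an embedded object. The types and helper names are hypothetical. This is meant to show the problem, not to recommend hand-rolling a solution, and it still ignores arrays and the rest of JSON-LD.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// ref keeps only the IRI of a value that may be a string or an object.
type ref struct{ IRI string }

// UnmarshalJSON accepts "https://..." or {"id": "https://...", ...}.
func (r *ref) UnmarshalJSON(data []byte) error {
	var s string
	if err := json.Unmarshal(data, &s); err == nil {
		r.IRI = s
		return nil
	}
	var obj struct {
		ID string `json:"id"`
	}
	if err := json.Unmarshal(data, &obj); err != nil || obj.ID == "" {
		return errors.New("expected an IRI string or an object with an id")
	}
	r.IRI = obj.ID
	return nil
}

// naive is what plain struct tagging suggests.
type naive struct {
	Actor string `json:"actor"`
}

// tolerant handles one more shape, at the cost of custom code per property.
type tolerant struct {
	Actor ref `json:"actor"`
}

// decodeNaive fails as soon as "actor" is an embedded object.
func decodeNaive(payload string) (string, error) {
	var n naive
	err := json.Unmarshal([]byte(payload), &n)
	return n.Actor, err
}

// decodeTolerant copes with both the string and the object form.
func decodeTolerant(payload string) (string, error) {
	var t tolerant
	err := json.Unmarshal([]byte(payload), &t)
	return t.Actor.IRI, err
}

func main() {
	embedded := `{"actor":{"id":"https://bar.example/users/cj","type":"Person"}}`
	_, err := decodeNaive(embedded)
	fmt.Println("naive error:", err != nil) // prints "naive error: true"
	iri, _ := decodeTolerant(embedded)
	fmt.Println(iri) // prints "https://bar.example/users/cj"
}
```

Multiply this by every property and every vocabulary, and the appeal of generated types becomes clearer.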
The `go-fed/activity` library does **not** solve:
- Managing your mapping of IRI endpoints to concepts/actors/types
- How to actually store things in a database
- Making sure your IRI endpoints match your database queries
- *For what reason* do you want to dereference to fetch a peer's latest data or send data to a peer (it only does just enough for its own purposes and no more)
- It can't magically "know" what ActivityStreams data you want to use -- the ability to craft the desired Activity and objects without constraint is indeed the whole point.
Finally, some downsides of `go-fed`:
- Because code generation produces hundreds of thousands of lines, the binary size is *large*. It cannot compile on a Raspberry Pi, due to that code size.
- There are parts of the API that are still rough. However, I am open to feedback here.
- The tutorials are not the most intuitive. I know the go-fed.org site looks like a backend engineer wrote it, but that's being addressed as we speak.
- Implementing the interfaces can be unclear at times. However, I need to hear the questions, to better change the interfaces and/or fix unhelpful/unclear comments.
- Gitea would be the first (other than me & my blogs/pet-projects) to adopt `go-fed/activity/pub`, which comes with its own risks. WriteFreely uses `go-fed/activity/streams`.
I hope this kicks off a productive discussion. Thanks for reading this far, if you made it. :)