Issue federation development plan (June & July 2021)

dachary · May 29, 2021, 9:48am

Hello,

This is the software architecture that will be the first implementation of fedeproxy. It will be presented to @arthurlogilab June 16th, 2021.

It focuses on allowing individual issues to be federated. There is evidence (see the User Research report)

Cheers

Use case

Federating issues

Issue A is created on Forge A by user A
Issue B is created on Forge B by user B
User B adds a link to issue A in issue B
All comments / edits on Issue A made by user A are copied over to issue B
All comments / edits on issue B made by user B are copied over to issue A

Adding a comment to an existing issue

An ActivityPub message of type forgefed comment is received
The comment is added to the corresponding issue

Notifications:

All comments / edits are published via ActivityPub as forgefed commits

Constraints:

User B never comments / edits on Issue A
User A never comments / edits on Issue B

Users

Users on a given forge are uniquely identified by their public URL (e.g. https://lab.fedeproxy.eu/dachary).

All actions carried out by the proxy on behalf of a given user are attributed to the user from which the action originates. If the user does not exist on a given forge, it is created by the proxy.

User A exists on forge A
User A/B is created on forge B to act on behalf of user A
Credentials for User A/B on forge B are stored in a private repository of the User A on forge A

A user is created by the proxy either (i) using administrative privileges on the forge, or (ii) by drawing from a pool of unused users that were created manually beforehand.
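A minimal sketch of how the proxy could pick the user acting on behalf of someone, under stated assumptions: the `Forge` class, its fields and the `fedeproxy-N` naming are hypothetical stand-ins, not a real forge client.

```python
from dataclasses import dataclass, field


@dataclass
class Forge:
    """Hypothetical in-memory stand-in for a forge (not a real API client)."""
    url: str
    users: dict = field(default_factory=dict)  # username -> public URL of the origin user
    pool: list = field(default_factory=list)   # unused usernames created manually
    is_admin: bool = False                     # does the proxy hold admin privileges?


def get_or_create_proxy_user(forge, origin_url):
    """Return the username acting on `forge` on behalf of `origin_url`."""
    # Reuse the user if one was already created for this identity.
    for username, origin in forge.users.items():
        if origin == origin_url:
            return username
    if forge.is_admin:
        # (i) create a fresh user thanks to administrative privileges
        username = f"fedeproxy-{len(forge.users) + 1}"
    elif forge.pool:
        # (ii) draw from the pool of unused users
        username = forge.pool.pop(0)
    else:
        raise RuntimeError("no admin privileges and the user pool is empty")
    forge.users[username] = origin_url
    return username
```

Calling it twice with the same public URL returns the same user, so the proxy does not create duplicates on the remote forge.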

State

The fedeproxy service is stateless.

Issue data is stored in the software project DVCS (e.g. in an orphan git branch)
User private information (e.g. users created by fedeproxy on other forges on behalf of the user) is stored in a private project

Issue data format

A JSON based format is created to be used as a pivot format internally.

The forgefed commit is used to represent the issue activity. It is not human readable and is used in server to server communications:

The commit activity is received
The forge and project URL of the DVCS are deduced from the context
The commit hash is checked out
The issue associated with the commit is deduced from the content of the commit
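To make the pivot format a little more concrete, here is a rough sketch of what serializing an issue could look like. The field names (`title`, `author`, `comments`) are assumptions for illustration only; the actual schema is not decided here.

```python
import json


def issue_to_pivot(title, author_url, comments):
    """Serialize an issue to a JSON pivot document (hypothetical schema)."""
    pivot = {
        "title": title,
        "author": author_url,  # users are identified by their public URL
        "comments": [{"author": a, "body": b} for a, b in comments],
    }
    # sort_keys makes the output deterministic, so an unchanged issue
    # serializes to the same text and produces no spurious diff in the DVCS.
    return json.dumps(pivot, indent=2, sort_keys=True)


def pivot_to_issue(text):
    """Parse a pivot document back into a plain dictionary."""
    return json.loads(text)
```

The resulting text is what would be committed to the DVCS, and the commit hash is what travels in the ActivityPub message.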

Software stack

The fedeproxy web service is based on:

Django for the administrative REST API
Allauth for authenticating with forges
Copy/pasting from BookWyrm for ActivityPub

Supported forges

GitLab (GitLab CE to GitLab CE as the first step)
GitHub (when GitLab CE support works) with Gitea for CI

Workflow

A fedeproxy instance is responsible for a single forge. For instance if there are two GitLab CE instances, there are two fedeproxy instances.

When fedeproxy receives a notification via ActivityPub about a new commit hash it:
- Gets the issue state from the originating DVCS using the commit hash
- Converts the pivot format to update the forge via the REST API
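
The two steps above could be sketched as follows. This is only an illustration: `fetch_pivot`, `forge_comments` and `post_comment` are hypothetical callables standing in for the DVCS checkout and the forge REST API, and the pivot is assumed to be a dictionary with a `comments` list.

```python
def sync_issue(commit_hash, fetch_pivot, forge_comments, post_comment):
    """Apply the issue state found at `commit_hash` to the local forge.

    Returns the list of (author, body) pairs that were actually posted.
    """
    pivot = fetch_pivot(commit_hash)        # step 1: read the state from the DVCS
    existing = set(forge_comments())        # (author, body) pairs already on the forge
    posted = []
    for comment in pivot["comments"]:       # step 2: update the forge
        key = (comment["author"], comment["body"])
        if key not in existing:             # skip what is already there
            post_comment(*key)
            existing.add(key)
            posted.append(key)
    return posted
```

Because only missing comments are posted, receiving the same notification twice does nothing the second time. The sketch treats two comments with the same author and body as one, which a real implementation would need to refine.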

User eXperience

fedeproxy will federate Issue B with Issue A if
- a comment contains @fedeproxy follow URL (for an existing issue or a new one)
- it receives a follow from ActivityPub
fedeproxy will stop federating Issue B with Issue A if
- a comment contains @fedeproxy unfollow URL
- it receives an “unfollow” from ActivityPub
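
A small sketch of how the comment convention could be detected. The exact syntax accepted (one command per comment, an http or https URL right after the verb) is an assumption made for the example.

```python
import re

# "@fedeproxy", a verb, then a URL, separated by whitespace.
COMMAND = re.compile(r"@fedeproxy\s+(follow|unfollow)\s+(https?://\S+)")


def parse_command(comment_body):
    """Return (verb, url) for the first command found, or None."""
    match = COMMAND.search(comment_body)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

For instance `parse_command("@fedeproxy follow https://forge-a.example/project/issues/1")` returns the verb `follow` and the URL, while a comment without the convention returns `None`. Punctuation glued to the end of the URL would be captured too, so a real implementation would want to be stricter.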

pilou · June 2, 2021, 5:20am

What is that?

Will Allauth be used to authenticate the proxy, user A and user B on both GitLab instances?

What happens when this notification is not sent by the forge or not received by the proxy?

dachary · June 2, 2021, 5:24am

The service will need to be configured with credentials, for instance an API token with administrative access to a GitLab instance. Instead of it being in a configuration file, I figured it would make more sense that it is set in a database and with a REST API to modify it. The database with the configuration of the web service is the only persistent state that the web service needs to keep.

dachary · June 2, 2021, 5:27am

The ultimate source of truth is the state saved in the DVCS. If the notification comes from a random source it will be spam and the worst that can happen is that it triggers a refresh of the state of the issue that is not needed.

I did not think about the proxy missing notifications and I don’t have an answer to this one just yet. Good point!

dachary · June 2, 2021, 5:28am

Yes. Maybe another library can be used, I just mentioned this one because I’ve used it in the past.

pilou · June 2, 2021, 8:54am

From my CI experience, missing notifications occur quite often.

pilou · June 2, 2021, 8:56am

Ok, I was wondering if you were referring to a specific project.

dachary · June 2, 2021, 9:20am

Not really. I’m inclined to go with whatever makes more sense in the django environment at this point in time. Ideally it would be OpenAPI compatible. Last time I checked it was not very trendy but it was about a year or two ago.

dachary · June 3, 2021, 8:06pm

I thought the fedeproxy web service should probably provide an API for developers to use. But then I remembered that the idea is that fedeproxy is transparent and therefore does not provide any way to interact with it. All interactions are with the forge the developer prefers and fedeproxy is merely a relay. It may be silly to write down such a misguided thought but … it may help in the future if it comes back to me.

aschrijver · June 6, 2021, 5:15am

Here’s an example of where domain analysis will improve your use case. Consider this: Do you only deal with comments in the context of an Issue? I argue this is not the case, and the use case will be flawed if it assumes so, which makes it unusable even for an MVP.

What is an Issue? It is a statement of something that needs to be addressed, something that needs attention at a future time. It probably relates to a Project and implicitly or explicitly refers to Project artifacts.

Addressing an Issue means that issue-related Activities need to take place. A Comment is just one such Activity.

Consider this scenario:

Issue A and Issue B mirror each other on different code forges via a link

- User A comments to Issue A: "PR #123 is ready for merge"
- The Comment is transferred to Issue B
- Dependabot updates dependencies in Repo A --> an Activity in Issue A informs about this
- The update breaks the build --> CI report in Issue A informs about this
- User A comments: "Oops, all is broken, lotsa trouble now"
- The Comment is transferred to Issue B

User B now has the wrong context and thinks that the PR is problematic. Might do a lengthy code review and comment: “All seems right. I see nothing wrong with it” to the confusion of User A.

So how might you avoid that in Ubiquitous Language:

An Issue is resolved by Activities
A Comment is an Activity
A Note is an Activity

Don’t know if Note is the best domain terminology, but it is just an example.
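
As a rough sketch, the three sentences above could map to types like these. The split used here (a Comment is written by a participant, a Note is recorded by the system) is only one possible reading, and all class and field names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Activity:
    """Anything that happens in relation to an Issue."""
    author: str
    body: str


@dataclass
class Comment(Activity):
    """Written by a participant in the conversation (assumed meaning)."""


@dataclass
class Note(Activity):
    """Recorded by the system, e.g. a CI report (assumed meaning)."""


@dataclass
class Issue:
    title: str
    activities: List[Activity] = field(default_factory=list)

    def comments(self):
        """Only the Comments, i.e. what a comment-only federation would copy."""
        return [a for a in self.activities if isinstance(a, Comment)]
```

In the scenario above, Issue A holds four activities but `comments()` returns only two of them, which is exactly the context User B is missing.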

Another thing to consider if you want to do manual linking like this. Why wouldn’t the code forge integrations make an API call and synchronise an issue?

User A creates Issue A in Repository A
User A federates Issue A to Repository B

So UI-wise you may have a dropdown of pre-configured repos. The underlying code takes care of issue synchronisation. This avoids User B being lazy and creating an incomplete derivative of Issue A. (Note that “federates” is not domain terminology, just an example here, but the federation takes place ‘under the covers’ and is no user concern.)

Also note that I use the ubiquitous language term Repository instead of Forge. The repository might well live on the same forge. As a dev I shouldn’t care about that fact, and the underlying functionality you’d build already facilitates the requirement “Repository can be Local or Remote”. So you get that almost for free.

Did you take a peek at the Outreach project? In the README you find the Domain Model with Ubiquitous Language below. Then each of these will be elaborated into Gherkin scenarios that will become automated BDD tests.

The project specifications are still mostly me playing with the best setup, but here is a BDD test for Launch Community. The Gherkin part can be copied as-is to an e.g. test/features folder in the codebase that’s configured for the BDD testsuite. You might even automate that. And this way codebase and documentation are always up-to-date (all documentation follows README-driven development for the same reason of keeping docs + code in sync).

aschrijver · June 6, 2021, 5:26am

I wouldn’t be so sure about that. Anyway it is not needed I agree. But having FedeProxy in the middle means that you can offer value-added services there, with features that will never become part of any forge. These would be surfaced through a FedeProxy user-facing platform UI.

But it is possible to avoid that using extensions (e.g. widgets) in each forge that are fed AP messages (they have an actor/event-based AP ‘API’).

A FedeProxy platform UI might be ideal for non-technical or external stakeholders (without privilege) that you want to take along in the software development process. Then talking (far) future roadmap, of course, but you should have the option to add this with ease once the need arises… you might also see these as additional related projects that work on the core functionality of FedeProxy and may be developed by entirely different dev teams.

dachary · June 6, 2021, 10:35pm

I meant comments in the sense of the GitLab API i.e. the way by which humans and bots alike add to an issue (or a merge request). Or the Gitea issue API.

Although I now understand what An Issue is resolved by Activities means (in good part thanks to our discussions), it would have confused me six months ago because I did not know what Activities meant in that context. I was however familiar, as I am now, with the notion of issue as well as the fact that it is made of two things:

metadata that describe the issue
a conversation, i.e. comments ordered in chronological order

For this reason I would argue that comments / edits on issues is probably adequate because people who are familiar with the sub-domain “Issues in the context of a development environment” immediately understand what it means without the need to explain further. And also because it fully represents (and that’s user research jargon, I think) how the mental model of the developer matches with the conceptual model of issue trackers as they currently exist.

Just to clarify: I don’t want that. User research shows that developers currently do that, for lack of a tool that automates this for them.

The benefit of a dropdown is clear but I’m not sure how to implement that: how would repos be pre-configured? The lack of UI is an important aspect of this first step, so that the focus can be on the UX, i.e. what follows, which is essentially a convention and copy/pasting a URL.

But then it would make more sense to me if that was implemented as a contribution to Gitea (because GitHub & GitLab are not amenable to that kind of contribution, AFAIK). I cannot foresee a UI fedeproxy would provide that would not be relevant to Gitea. Or maybe I don’t see something and you have an example in mind?

aschrijver · June 7, 2021, 6:46am

Ah yes. I typed my examples off the top of my head without doing the background work, because it’s more the concept of DDD I wanted to explain (and in haste probably inaccurately, as dependabot will probably report to the PR only, not the Issue).

But there is a danger in looking at APIs. While the APIs are very good input to analyse the domain that is implicitly/explicitly implemented within these apps, they are NOT the domain. They are convenient endpoints for information exchange that may be denormalized, and may carry user interface and/or implementation details.

Comparing Gitlab and Gitea for instance you already see that they use different (but synonymous?) domain terminology e.g. Notes versus Comments. (A notebook with a log of activities/notes may be most accurate).

Also - having taken only a quick peek - Gitlab uses a system boolean flag in their API to create different Note Types (“A Note has a Type”?). Probably having system=true and body="closed" renders something with a ‘Close’ icon, and maybe closes the issue? I am not familiar enough with the API to answer that. On the other hand the Gitea API has entirely different mechanics. But underlying both may be the same domain model. They’d have variations/extensions and synonymous terminology, though.
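
To illustrate the point, here is a sketch of how an adapter might translate that flag into domain terms. It assumes notes arrive as dictionaries carrying a boolean `system` key, as in the GitLab API; the sample notes are hand-written and trimmed, and the two labels returned are made up for the example.

```python
def classify_note(note):
    """Map a GitLab-style note dictionary to a (hypothetical) domain term."""
    # A missing or false "system" key is treated as a human-written comment.
    return "system activity" if note.get("system") else "comment"


# Hand-written samples, not actual API output.
notes = [
    {"body": "PR #123 is ready for merge", "system": False},
    {"body": "closed", "system": True},
]
kinds = [classify_note(n) for n in notes]
```

Here `kinds` ends up as one comment followed by one system activity. The adapter hides the flag, so the rest of the code only sees the domain terms.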

The domain describes from the perception of what the stakeholders / domain experts want, and not what particular software forces them to do due to their limitations or design choices. The domain is software-independent, a universal model.

Have a look at the following issue chosen randomly from the Gitea repository:

[image: fedeproxy-github-issue-activity]

Suppose you are interviewing Lunny and they say “An Issue has Comments. That’s it. And you can edit Comments.” then you hand them this image and respond “So here you made 4 comments? And you can edit e.g. the second one to show a different Label?”.

I guess here both of you would come to new domain insights. Lunny will answer something like “Well no, I actually made just 1 Comment - the issue description itself - and the other entries are the System acting on behalf of me to keep track of project activity that is related to this Issue. I cannot edit these as they are a log of things that already happened. A history of issue-related activity”.

Forget my mention of UI specifics. They are irrelevant when it comes to domain modeling. Domain is independent of UI (in technical terms this relates to ‘inversion of control’). UI decisions are made based on a stable domain model, and there can be many UI’s surfacing the same domain to stakeholders.

There are many ways this can be done, and it depends on the features you offer. From a UX standpoint it is indeed best to not bother the dev team with additional steps they must take into account for their process. But at some level you need to know which repo is allowed to sync with which other repos and configure privileges and what-have-you. Might as well be tasks for stakeholders acting in an admin or project manager role, idk.

The following is pure brainstorming, imaginary and only an example… only stakeholder needs can tell.

[ Gitlab repo ] <------> [ FedeProxy Platform ] <------> [ Gitea repo]
                          |                  |
                          |                  |
                    [ FP Svc A ] . . . [ FP Svc B ]

FedeProxy Platform sits in the middle of your ‘developer project federation’ as a spider in its web (domain concepts needed?). This is where value-added services can be modeled in addition to specific extensions to individual code forges (not drawn). What might these be?

An Admin Service that provides config, auditing, auth/authz, install/upgrade, etc. (might have a CLI, web UI, or even Android app).
A consolidated external API that abstracts away the various code forges that participate in the project federation, focuses just on the domain model of the development process (where the code forge is irrelevant, an ‘implementation detail’).
Dashboard UIs that are targeted to specific stakeholder audiences, filtering information. Think of reporting to keep non-technical stakeholders (e.g. the customer) in the loop, but also allowing them to still interact with the project (goes beyond mere reporting).
(…all kinds of integrations like this)

None of these are MVP material, but anticipating that someday you may want to offer them has implications for the architecture of FedeProxy.

(Note that, imho, a very naive implementation of FedeProxy would be to focus too much on the ‘proxy’ part and create a codebase that is merely a collection of adapters from forge type A to forge type B, and as forge support is added an explosion of adapters occurs. No domain model is needed here, but a very constrained system results that may grow into a Ball of Mud as you extend it with unforeseen features. This is probably not what you have in mind, but I wanted to mention it anyway.)

dachary · June 7, 2021, 12:59pm

Agreed.

The subdomain “Issues in the context of a development environment” is defined differently by Gitea, GitLab or any other issue tracker, really. The subdomain is not well defined nor is it standardized. The differences between implementations are sometimes significant enough that converting from a given implementation (Phabricator) to another (GitLab) is problematic. For instance an issue in GitLab is bound to a software project: it cannot exist otherwise. Phabricator does not have the same constraint and as a result some Phabricator issues cannot be easily mapped to GitLab issues (that was a takeaway from the user research interviews).

The same is true for the “DVCS” subdomain: Mercurial and Git developers would give different definitions (not just about details).

I’m sure you will agree that, by that definition, the “Issues in the context of a development environment” domain does not exist yet. The variations in the implementations that are found in Gitea, GitLab etc. are rooted in domain definitions that only partly overlap and are therefore not (yet) universal. And not just on details.

There indeed are parts of the issue or even the comments attached to it that are immutable (depending on your permissions or the implementation). Some labels cannot be removed, sometimes comments cannot be deleted, etc.

Understood. In the context of fedeproxy I have no intention to undertake the enormous task of defining a domain. I however acknowledge that all forges are based on a domain definition. It is nowhere to be found because it was never written down. But even when implicit, the domain definition that the developers have in mind when working on the forge does exist and it is worth investigating. To be less vague, fedeproxy is concerned about identifying commonalities in the “Issues in the context of a development environment” domain as implemented by different forges. All implementations provide comments ordered in chronological order and this is where fedeproxy can help with federation. A contrario fedeproxy cannot federate issues that are not bound to a software project (although they belong to the “Issues in the context of a development environment” domain) because some issue trackers do not implement them.

Or maybe fedeproxy could just federate whatever it can. Issues, repos, projects have permissions that already impose restrictions on what the user can do. Since fedeproxy uses the developer accounts on each forge to act on their behalf, I’m not sure why it would be useful to further control what they can do.

I understand what you are suggesting and maybe someone will go in this direction. This is a little too ambitious for me at this stage.

My personal inclination, long term, is to work so that fedeproxy disappears because all forges are interoperable and federated. If it does not make itself as invisible as possible right from the start, I feel (though I am not sure) that it would work against this goal.

This is exactly what I have in mind: very well put. In other words: I believe there is room for incrementally improving federation on specific subdomains, with immediate results. Domain modeling would require convincing forge authors to agree on it and follow up with an implementation before anything can be used by the developers, and it would take years. The price to pay for my approach is the explosion of adapters. But my hope is that, as more developers use these adapters daily, forge authors are pressured to achieve interoperability (for instance by agreeing on a domain model) which would make federation possible. Or that forge authors are drawn to federation which requires some kind of interoperability. I cannot see into the future and I’m unable to guess which is more likely to happen, which is why I pursue both.

aschrijver · June 7, 2021, 1:48pm

There is no enormous task. A domain model often starts with a single sentence. It is broken down into multiple sentences based on some analysis and insight gleaned from your user research. In the case of Campaign Management for Outreach, the domain model turned out to be 18 brief plain-English sentences and a diagram. Afterwards, with an MVP ready, more sentences will be added that reflect new features on the roadmap.

Not explicitly defining your domain model doesn’t mean it doesn’t exist. It does, but now it is implicitly present only in your code.

Okay, but then some considerations:

A Proxy design gives you: Gitlab2Gitea, Gitlab2Github, Gitea2Gitlab, etc.
Ports & Adapters + FedeProxy domain gives: Gitlab2FP, Gitea2FP, Github2FP, etc. (less in total than the proxy design)
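
To put numbers on this, a quick sketch, assuming one converter per ordered pair of forges in the proxy design and one adapter per forge (covering both directions) in the Ports & Adapters design. The function names are made up for the example.

```python
def proxy_converters(n):
    """Proxy design: one converter per ordered pair of distinct forges."""
    return n * (n - 1)


def hub_adapters(n):
    """Ports & Adapters: one adapter per forge, to and from the shared model."""
    return n


# Number of pieces to maintain for 2 to 5 supported forges.
table = [(n, proxy_converters(n), hub_adapters(n)) for n in range(2, 6)]
```

With 3 forges that is 6 converters against 3 adapters, and with 5 forges 20 against 5. With only 2 forges both designs need 2, so the difference only shows once a third forge is supported.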

Now what would the domain model look like? There are 3 options. Either representing:

The intersection of features (the common denominator)
The union of viable features
The aggregate of all features exposed by supported forges

Option 1) may be too limited and 3) too much. Option 2), the middle road, says that some features may be supported in most forges, but not necessarily all of them. You only do that if they add significant value, and you may do that later in your roadmap (just start with option 1).
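
The three options can be sketched with plain sets. The forge names and feature lists below are invented, and reading “viable” as “supported by at least a given number of forges” is an assumption made for the example.

```python
from collections import Counter


def common_denominator(features_by_forge):
    """Option 1: features that every forge supports."""
    return set.intersection(*features_by_forge.values())


def viable_union(features_by_forge, min_forges):
    """Option 2: features supported by at least `min_forges` forges."""
    counts = Counter(f for feats in features_by_forge.values() for f in feats)
    return {f for f, n in counts.items() if n >= min_forges}


def aggregate(features_by_forge):
    """Option 3: every feature exposed by any forge."""
    return set.union(*features_by_forge.values())


# Invented data: three imaginary forges and the issue features they offer.
forges = {
    "forge_a": {"comments", "labels", "confidential"},
    "forge_b": {"comments", "labels"},
    "forge_c": {"comments", "reactions"},
}
```

With this data option 1 keeps only `comments`, option 2 with `min_forges=2` adds `labels`, and option 3 contains all four features.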

In the proxy design it is harder to keep up with new releases of a forge, as it may require updating multiple adapters.

Note too that your federation model (message format, message exchange) is also representative of your domain, especially if you want it to become a kind of de-facto standard. Federation is part of Ports & Adapters.

Anyway, I am not trying to ‘sell you on DDD’ nor on specific features, requirements or architecture decisions; I just wanted to explain the concept in more detail. I am sorry if this derails this thread a bit. Feel free to move it to a different topic.

Why would that be? What would be their gain? The walled gardens / lock-in are very often intentional.

dachary · June 7, 2021, 4:17pm

The intention here is definitely to go for the second option, the first one would require a lot more time. The GitLab export format is going to be used as a pivot (what you wrote as FP). This is de facto borrowing the domain defined by GitLab, which is not ideal.

Yes and part of the work of this first development step is to write a converter mapping the GitLab issue format into the ForgeFed issue/ticket format.
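
A very rough sketch of what such a converter could look like. It assumes the input is shaped like an issue object from the GitLab REST API (`title`, `description`, `state`, `web_url`, `author.web_url`) and not the export format itself, which may differ. The output keys follow my reading of the ForgeFed Ticket vocabulary and should be checked against the specification; `@context` is left out on purpose.

```python
def gitlab_issue_to_ticket(issue, project_url):
    """Convert a GitLab-style issue dict to a ForgeFed-style Ticket dict.

    Illustrative only: the field mapping is an assumption, not a reference.
    """
    return {
        "type": "Ticket",
        "id": issue["web_url"],
        "context": project_url,                       # the project the ticket belongs to
        "attributedTo": issue["author"]["web_url"],   # users are identified by their public URL
        "summary": issue["title"],
        "content": issue.get("description") or "",    # description may be missing or null
        "mediaType": "text/markdown",                 # assuming the description is Markdown
        "isResolved": issue["state"] == "closed",
    }
```

Anything that has no obvious counterpart on the other side (labels, weight, confidentiality and so on) is simply dropped in this sketch, which is where the real work is.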

On the contrary! I very much need that kind of discussions to make sure I’m not pushing fedeproxy in a direction that would be problematic. Re-formulating / re-thinking the development plan using different concepts / vocabulary helps a lot in that regard.

Yes but there is a fundamental difference between say Facebook users and GitHub users. The latter are Free Software developers, they are the people with the skills to create the change, they do not have to wait for it to happen. I’m hoping this is what will ultimately create the conditions to either:

motivate developers to take an active part in the making of forges that are working towards federation or at least interoperability
motivate the most reluctant forge developers to work towards federation or interoperability on a subset of their features

dachary · June 10, 2021, 9:44am

@pilou I agree with you that https://pypi.org/project/federation/ is a better choice because it is used by https://socialhome.network/ . And there is no need to extract the code from bookwyrm.

dachary · June 11, 2021, 6:01pm

I thought it would be possible to add an arbitrary URL in the newly added “link” field of an issue in GitLab but it turns out it is not possible.

dachary · June 16, 2021, 5:11am

Since we’re experimenting, maybe there should be something related to A/B testing. That came up during last week’s discussions on User Research and seems relevant. Maybe it’s too early to think about that but it would not make sense to recruit users and not be able to observe how they use the service.

dachary · June 16, 2021, 6:56am

The user experience regarding the management of the identities is still unclear. However it should not significantly change the development plan. The user credentials are stored in a private repository of the user, on the forge they are using. Since there is no SSO between forges (or there is but it is limited in the sense that, for instance, application tokens are bound to the forge, or that it is merely a way to centralize account management for user and password but no other properties associated with the user), there does not seem to be a way to avoid storing user information.

There needs to be a user-facing web interaction when a user already has an account on a remote forge. The OAuth2 URL will obtain the token that will allow fedeproxy to impersonate the user. This will be true for all users who already have a GitHub account. They do not want fedeproxy to create a user for them: they want fedeproxy to impersonate them so their actions are attributed to their existing user.
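
For illustration, a sketch of building the URL the user would be sent to, following the standard OAuth2 authorization code flow: the user approves the request there, the forge redirects back with a code, and that code is exchanged for the token in a later step not shown here. The endpoint, client id, redirect URI and scope used below are placeholders, and in practice a library such as Allauth would handle this.

```python
import secrets
from urllib.parse import urlencode


def authorization_url(authorize_endpoint, client_id, redirect_uri, scope):
    """Return (url, state) for the first step of the authorization code flow."""
    # The random state is checked when the forge redirects back, to make
    # sure the response belongs to the request that was sent.
    state = secrets.token_urlsafe(16)
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    })
    return f"{authorize_endpoint}?{query}", state


# Placeholder values, for illustration only.
url, state = authorization_url(
    "https://forge.example/oauth/authorize",
    "fedeproxy-client-id",
    "https://fedeproxy.example/callback",
    "api",
)
```

The token obtained at the end of the flow would be stored like the other credentials, in the private repository of the user.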

It would be convenient that fedeproxy always and immediately creates a user on the remote forge if it does not exist. And at a later time such users could be merged into existing ones to avoid duplicates.