Friendly Forge Format generic grant application

Bonjour,

This is a generic grant application similar to the one created for Gitea in July 2021 that contains material to be used on specific grant applications such as:


Title

  • Title: Friendly Forge Format
  • Tag line: an Open Standard to store software projects

Executive summary

There is no standard file format to share the content of a software forge (e.g. git repository, issues, etc.), and only some of them provide an undocumented internal format. It is possible to use the forgefed vocabulary to send a message about a particular detail (i.e. an individual commit or an issue) via the ActivityPub protocol. But the receiving forge may not know the context in which this information can be interpreted because it cannot conveniently obtain it. As a whole, a software project is a large dataset that is made of numerous interconnected elements stored in a strongly consistent state. Without a standardized file format, interoperability between forges is very difficult.

The full description of the Friendly Forge Format (abbreviated F3) is here.

Why is it “Friendly” and “F3”?

Forges are isolated and people working together cannot conveniently communicate with each other. Using F3 creates a more friendly environment where they can better exchange information and collaborate.

The acronym for “Friendly Forge Format” is “FFF”. It is awkward to say and write, so F3 is preferred, similar to how WWWC is abbreviated W3C.

The F3 file extension is used when the content is an archive conformant with the F3 file format.

Comparison with existing or historical efforts

The State of the Forge Federation: 2021 to 2023 published in June 2022 contains a detailed description of the projects related to F3. F3 is designed to be a building block that can be reused by all of them to facilitate the implementation of forge federation features.

F3 is different from ForgeFed. ForgeFed is an ActivityPub extension with its own vocabulary and models represented in JSON-LD. F3 is a JSON-based Open File Format providing a strongly consistent representation of a software project at a given point in time. They both need to define a glossary of terms and explain concepts that are common between forges. This already led to contributions to ForgeFed and more are expected in the future.

F3 emerged in the context of the forgefriends project where the Gitea internal file format was improved to make room for federated features. The duplicated effort of maintaining internal file formats in all existing forges, and its consequences, inspired the authors to create F3 and publish it as an Open File Format.

F3 is at the crossroads of a number of forge related projects funded by NGI in the recent past: ForgeFed, Storing Efficiently Our Software Heritage, Federated software forges with Gitea, Contribute to all Free Software from the comfort of your forge. It adds an essential piece to the puzzle that will eventually be completed and allow forges to continuously exchange data using open standards.

The forgefriends and ForgeFlux forge federation proxies will include F3 in their upcoming releases. This integration will be a showcase demonstrating how the Go and Python APIs can be used for integration in other software forges.

Deploying F3 in production in its first version is challenging because it does not yet have a reassuring reputation of stability and robustness. When problems are discovered, they require a level of understanding and an investment in time from system administrators that most service providers would consider too costly. The Hostea service provider is committed to advancing forge federation and will deploy F3 as soon as it is available. It will likely be the first production instance supporting F3.

Significant technical challenges

Reliable release process

A release of F3 is a specification document composed of the description of many forge artifacts that evolve independently. It is bound to the reference implementation that shows how each aspect of the specifications works. Dataset generators and fixtures can then be used by the reference implementation to verify the conformance with the specifications.

These three parts (specifications, reference implementation and datasets) must be used to establish a Quality Assurance process that verifies an F3 release candidate is consistent before it is published.

Representing a very large number of forge artifacts

A forge is an unbounded set of tools (e.g. issues, pull requests, comments, releases, VCS, etc.) that are used by developers when they work on a particular software project. Each of these tools also has an unbounded set of features. It is common for a forge to provide tools and features for which there is no equivalent in another forge (e.g. there are mailing lists on SourceHut but not in Gitea).

Even when there is an equivalence (e.g. Woodpecker CI provides the same kind of functionality as GitHub Actions), representing both in F3 is challenging because their capabilities often have subtle variations.

Robust test infrastructure

The reference implementation must track in real time the evolution of forges for which it provides conversion to and from F3. A continuous integration pipeline must be run against all supported versions as soon as a new version is published. It involves bootstrapping forges from scratch and feeding them with sample data created from fixtures, which is a resource intensive process.
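
As a rough illustration of running the pipeline "against all supported versions", the job list can be derived from a table of forges and versions. This is only a sketch: the `SUPPORTED` table, its version numbers and the `build_matrix` helper are assumptions made for the example, not part of F3.

```python
# Illustrative table only: the actual set of supported forges and versions
# is not specified in the text.
SUPPORTED = {
    "gitea": ["1.16", "1.17"],
    "gitlab": ["15.0"],
}


def build_matrix(supported):
    """Return one CI job description per (forge, version) pair, in a stable order."""
    return [
        {"forge": forge, "version": version}
        for forge in sorted(supported)
        for version in supported[forge]
    ]


def jobs_for_new_version(supported, forge, version):
    """Return the jobs to run when a new version of a forge is published.

    The input table is copied so the caller's data is left untouched.
    """
    updated = {name: list(versions) for name, versions in supported.items()}
    versions = updated.setdefault(forge, [])
    if version not in versions:
        versions.append(version)
    return [
        job
        for job in build_matrix(updated)
        if job["forge"] == forge and job["version"] == version
    ]


matrix = build_matrix(SUPPORTED)
new_jobs = jobs_for_new_version(SUPPORTED, "gitea", "1.18")
```

Each job would then bootstrap the corresponding forge from scratch and load the fixtures, which is where most of the resources are spent.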

Ensuring the stability of the CI pipeline is, in itself, a challenge. A flaky pipeline that fails randomly or is too sensitive to environmental problems will report spurious failures. When there are too many, the developers and contributors maintaining the reference implementations may spend more time debugging the CI than anything else.
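
One common way to tell a flaky job from a genuine failure is to run it several times and compare the outcomes. The sketch below assumes a job can be modelled as a callable returning True or False; `classify_job` and `scripted` are hypothetical names used only for this example.

```python
def classify_job(run_job, attempts=3):
    """Run a job several times and classify the outcome.

    run_job is a callable returning True on success and False on failure.
    A job that both passes and fails across attempts is reported as flaky.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    results = [bool(run_job()) for _ in range(attempts)]
    if all(results):
        return "pass"
    if not any(results):
        return "fail"
    return "flaky"


def scripted(outcomes):
    """Build a stand-in job that replays a fixed list of outcomes."""
    remaining = iter(outcomes)
    return lambda: next(remaining)


stable = classify_job(scripted([True, True, True]))
broken = classify_job(scripted([False, False, False]))
unstable = classify_job(scripted([False, True, False]))
```

Re-running costs CI time, so this kind of check is usually reserved for jobs that have already failed once.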

Seamless contributor onboarding

Setting up a local development environment that allows a contributor to modify:

  • the JSON Schema of the specifications
  • the reference implementation

must be made as easy as possible so that they can debug problems. It is critically important because the very large number of details that F3 needs to address requires crowdsourcing development. And without a seamless contributor onboarding process there is too much friction for that to happen effectively.

Ecosystem of the project

F3 was first announced publicly when the State of the Forge Federation: 2021 to 2023 was published in June 2022. Although the idea first emerged in early 2022, it is still largely unknown and has never been described formally.

Monthly reports and videoconferences

A monthly report will be published and disseminated to provide a high level overview of the evolution of the F3 specification and reference implementations. Each report will also be discussed during a monthly videoconference where active participants can better explain what they are doing and why.

In combination with regular developer releases, this will achieve two goals:

  • allow future users to keep in touch and plan for F3 integration in their development workflow
  • encourage contributors to participate by clarifying what is being done and where more workforce is useful

Forgefriends

F3 will be released as an integral part of forgefriends from the start.

Hostea

The Hostea hosting provider currently proposes vanilla Gitea instances. It will also propose forgefriends instances as soon as they are released. This will enable anyone to try the import / export features provided by F3 as soon as they are available.

Gitea

Discussions began with Gitea first since F3 is derived from its internal format, and a pull request to merge F3 as an integral part of upcoming releases will be worked on until it is accepted. The strongest incentives for Gitea to use F3 are robustness and active development, which are lacking for the legacy export/import codebase.

ForgeFlux & Pagure

The author of ForgeFlux is committed to using the Python reference implementation of F3 as part of its first release scheduled next year. The author of the ForgeFed plugin for Pagure will be contacted and assistance will be provided to use F3 via the Python reference implementation to get context when receiving an activity via ActivityPub.

GitLab

The Go reference implementation will support importing from GitLab CE and, to an extent limited by the API, exporting to GitLab. Merge requests will be opened in the GitLab CE tracker to advocate for features required for federation (e.g. mapping of Issue ids, etc.).

ForgeFed

Improvements to the ForgeFed specifications will be proposed so that data contained in an F3 archive can be translated and sent over ActivityPub. This makes it easier for forges to implement federation: they would otherwise have to figure out how to map their internal data structures or format into ForgeFed models and vocabulary.

Diversity

Improving diversity requires work, and paid staff will permanently dedicate 5% of their time to it.

How might this work be sustained long-term?

A software forge is a moving target with many details that can only be addressed by crowdsourcing the evolution of the corresponding Open File Standard. However, even when this is successfully implemented, there will be a need for a small core team of people dedicated to care for the infrastructure and the process to allow contributors to seamlessly work together. Their salary and the infrastructure costs can be paid for by donations from companies, governments and individuals via a non-profit organization.

Workplan

Go package reference implementation

A reference implementation of F3 in Go provides:

  • An API for integration in a forge written in Go
  • Validation of an F3 archive (JSON Schema validation, VCS sanity checks)
  • Import and Export support for Gitea and GitLab
  • Dataset generators and fixtures to verify the conformance with the specifications
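
To give an idea of what the validation step could look like, here is a minimal sketch in Python using only the standard library. The field names (`index`, `title`, `state`) and the allowed states are assumptions made up for this example and are not the F3 schema; an actual implementation would rely on a JSON Schema validator and the published schemas instead of hand-written checks.

```python
import json

# Hypothetical, simplified description of an issue; NOT the actual F3 schema.
ISSUE_FIELDS = {"index": int, "title": str, "state": str}
ALLOWED_STATES = {"open", "closed"}


def validate_issue(document):
    """Return a list of human readable errors, empty when the document is valid."""
    errors = []
    for field, expected in ISSUE_FIELDS.items():
        if field not in document:
            errors.append(f"missing field: {field}")
        elif type(document[field]) is not expected:
            errors.append(f"{field}: expected {expected.__name__}")
    state = document.get("state")
    if isinstance(state, str) and state not in ALLOWED_STATES:
        errors.append(f"state: unknown value {state!r}")
    return errors


good = json.loads('{"index": 1, "title": "Crash on startup", "state": "open"}')
bad = json.loads('{"index": "1", "state": "merged"}')

good_errors = validate_issue(good)
bad_errors = validate_issue(bad)
```

Returning a list of errors rather than raising on the first one makes it easier to report everything that is wrong with an archive in a single pass.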

Milestone: The Go package is published at https://pkg.go.dev/

Python package reference implementation

A reference implementation of F3 in Python provides:

  • An API for integration in a forge written in Python
  • Continuous deployment of the F3 documentation
  • The same features as the Go package reference implementation

Milestone: The Python package is published at https://pypi.org/

Specification and documentation

The F3 Specification includes:

  • An introduction
  • JSON Schema with embedded documentation
  • Release notes
  • A normative file hierarchy
  • A glossary of terms and their definition

Milestones:

First release

The first F3 release is a bundle that includes:

  • The specifications and documentation
  • The Go reference implementation
  • The Python reference implementation

They are verified to be consistent and tagged with the same version number.
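
The "same version number" requirement can be checked mechanically before publication. The following sketch assumes each part of the bundle exposes a version string; the `Component` type and the component names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One part of a release bundle together with its version tag."""
    name: str
    version: str


def release_is_consistent(components):
    """Return True when all components carry the same non-empty version tag."""
    versions = {component.version for component in components}
    return len(versions) == 1 and "" not in versions


bundle = [
    Component("specification", "1.0.0"),
    Component("go-reference-implementation", "1.0.0"),
    Component("python-reference-implementation", "1.0.0"),
]
mismatched = bundle[:2] + [Component("python-reference-implementation", "1.0.1")]

bundle_ok = release_is_consistent(bundle)
mismatched_ok = release_is_consistent(mismatched)
```

Matching tags are only a precondition: the conformance of the implementations with the specifications still has to be verified with the datasets and fixtures.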

Milestone: simultaneous publication of F3 version 1.0.0 at:

Integration in the Gitea codebase

The F3 Go reference implementation is used as a replacement of the internal format used for the repository dump and restore features in the Gitea codebase.

Milestones: pull request merged in https://forgefriends.org or https://gitea.io

Deployed by a forge hosting provider

A forge implementation using F3 to provide import / export is deployed by a hosting provider and available to the general public. The users can export their software project from the forge in the F3 format and import a new one by uploading an archive in the F3 format.

Milestone: A dedicated forge provisioned at a given hosting provider is capable of importing and exporting software projects in the F3 format.

2 Likes

Excellent write-up as usual! It is worth emphasising how easy it would be for a popular project like Gitea, with a fairly long history and logs, to migrate between forge instances. Gitea-hosted Gitea (the GitHub to gitea.com migration of the Gitea project) has been WIP since February 2017. It probably wouldn’t have taken so long if a portable format like FFF existed.

Additionally, the simple file format allows for integration with third-party tools like Git and provides functionality beyond what could be predicted by the FFF library developers.

For instance, if a comment is edited thrice, it is difficult to represent that in a REST API response without coming up with a method to represent history. If a forge implemented history in its REST API, it would only serve to complicate the API and leak complexity into API-dependent projects. But with files, it is possible to offload history to a VCS like Git.

Another use case that the file format could enable is running analytics on the repository activity: the entire dataset is available for download in a simple and convenient format.
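
A minimal sketch of what such analytics could look like, assuming issues are stored as one JSON file each under an `issues/` directory with a `state` field. That layout and the field name are assumptions for the example, not something FFF defines here.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def count_issue_states(archive_dir):
    """Count issues per state by reading every JSON file under issues/."""
    counts = Counter()
    for path in sorted(Path(archive_dir).glob("issues/*.json")):
        with path.open(encoding="utf-8") as handle:
            counts[json.load(handle)["state"]] += 1
    return counts


def write_sample_archive(root):
    """Create a tiny made-up dataset so the example can run on its own."""
    issues = Path(root) / "issues"
    issues.mkdir(parents=True)
    for index, state in enumerate(["open", "closed", "open"], start=1):
        document = {"index": index, "state": state}
        (issues / f"{index}.json").write_text(json.dumps(document), encoding="utf-8")


with tempfile.TemporaryDirectory() as root:
    write_sample_archive(root)
    sample_counts = count_issue_states(root)
```

No API token, pagination or rate limiting is involved: the whole computation is a loop over local files.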

2 Likes

This is a good point and I used that very example a number of times to illustrate the difficulty of breaking free from GitHub. I avoided using it in this context because I’m concerned that it would send a mixed message: good & bad simultaneously. FFF is based on the Gitea format, which is a good start. Gitea is an example of a forge that is stuck on GitHub, which is bad.

I’m also hesitant to provide a precise example of a project that would be migrated thanks to FFF, because there is a good chance the first release of FFF will miss a few things and will only be a good fit for partial migration or mirroring. The focus is on issues so that will be first. But migrating a project as a whole might take longer.

Everything written in this grant application must hold true when the grant is over, which is why I’m very timid when speculating about the details of deliverables or giving specific examples.

Does that make sense?

2 Likes

This is a good point and it should be mentioned :+1: Software Heritage conducted a large gender study based on the content of repositories that was published earlier this year, IIRC. If the metadata had been available in a DVCS, they could have used it as well instead of just the commits in the code.

2 Likes

First of all: Very well written, and that helps provide a lot of clarity! Here’s additional feedback on text formulation and content…

Textual review

  • As mentioned on chat, reorganize the feature bullet list to match decreasing order of importance. My suggestion: Portability, Trust, Mirror, Archival, Notification

    • Notification might be renamed to Reporting, in whatever form or manner (such as notifications).
    • Analytics as mentioned by @realaravinth is a core feature that will be unlocked.
    • Versatility can be added, indicating all the unexpected use cases.
  • You might split this list into “FFF key requirements” and “FFF use cases”. I would make such a distinction.

    • Then it is less tied to a forge, has a better split between FFF specific and stuff that can be done with it once you have it.
    • Key requirements: Portability, Versatility, Trust
    • Use cases: Analytics, Reporting, Archival, Mirror (of a forge)
  • “Pivot format”… I do not know what that is, or means. Either choose another word or explain.

  • “Reference implementations in Go …”, I’d move that down a couple lines to just before “The reference implementation is modular…”

  • “Reference implementation is modular”, doesn’t that imply that FFF itself is modular, and this is another Key Requirement?

  • “Evolve into a standardized format”, does indicate to me that Open Standard is another Key Requirement.

    • Consistency might be mentioned with this Key Requirement, i.e. by providing a common language to use when talking about the domains that are covered by FFF. The consistency is guaranteed by the standardization.
  • With Versatility, onboarding of contributors, gradual crowdsourcing, etc. I think Extensibility is another Key Requirement.

  • “that encompasses a growing number of forges over time”, might reformulate “with broad adoption by forges and related development tools”. The “over time” isn’t needed as with “evolve” you already refer to a vision of the future.

Balance technical and social → Sociotechnical

Why is it “Friendly”?

Here you have an opportunity to highlight more of the social aspects than you currently do. You mention forges being isolated. And having FFF data exchange enabling collaboration. But it goes deeper than that. As phrased here it is rather technical: “I provide you this serialization format, so you can be more social”.

But the creation of FFF itself is a social process. You mention “crowdsourcing”, that is one aspect. But really FFF should represent the “baby that is birthed” and loved by the collective of the forge ecosystem. The “onboarding” starts with people developing a passion for the idea, the concepts and vision that make FFF a worthwhile effort to contribute to.

Then in “Seamless contributor onboarding” you get to explain a mechanism to lower the barriers and friction to actually participate in the crowdsourcing. But here social aspects should be addressed as well. The term “sociotechnical” can be used in the text.

Positioning: forg.es

Given the social aspects and the ecosystem-level scope, I feel strongly that the FFF project should be part of forg.es. It can help make this umbrella community meaningful, and it gives independence to the project, same as ForgeFed gives independence to forge federation.

This has additional benefits. I see forg.es as the home of an Ecosystem Alliance where all the sociotechnical ‘cross-cutting’ concerns of forging software (the craft and arts) can be addressed. If there is a crowdsourcing process, then it has to be defined only once. If there are multiple standardization processes, they can be based on the same reference material and improved over time.

Social coding

“Ecosystem Alliance” is a Social Coding best-practice defined by me that is part of the Social Coding FSDL. It is very interesting to elaborate the benefits such an alliance can bring, and how it can be established. Note that FSDL is an integral part of the Social Coding Movement, and hence that it can be part of the Alliance around forges. It would be a great fit. forg.es can be a member of the co-shared community (CC @realaravinth). I would love to discuss the ins and outs in more detail.

Process

Given that FFF is to evolve into a standardized format on the basis of crowdsourcing, a process that allows this must be defined. Creating the process, and organizing to set it in motion, should imho be part of the application, with time reserved for it.

Maybe you say: “I don’t intend to be involved with that, and anyone can pick up initiative, write a grant application, and organize themselves”. Well, that in itself would be a process to clearly highlight in this application. As evaluator of the application I would frown a bit reading that, and matching it to other statements made in the text. So it should be very clear how that is intended to work.

You might split into “Technical challenges” and “Organizational challenges”.

1 Like

… reorganize the feature bullet list to match decreasing order of importance. My suggestion: Portability, Trust, Mirror, Archival, Notification

Done!

Notification might be renamed to Reporting, in whatever form or manner (such as notifications).

I’m not sure I understand the rationale for this change, although I can imagine a few. Would you be so kind as to briefly clarify what you have in mind?

Analytics as mentioned by @realaravinth is a core feature that will be unlocked.

Added:

Analytics: data mining the contents of files is more practical than issuing a large number of queries to an API

Versatility can be added, indicating all the unexpected use cases.

Versatility: when published and updated as an FFF archive, a software project effectively is Open Data on which an unlimited range of applications can rely, even outside of the forge domain

You might split this list into “FFF key requirements” and “FFF use cases”. I would make such distinction.

Done.

“Pivot format”… I do not know what that is, or means. Either choose another word or explain.

Good catch! A “pivot format” is a format for which there is a large number of conversions from and to other formats. So instead of implementing on the order of N² direct conversions, it is enough to implement a conversion to and from the pivot for each format. To convert format A to format B you can always use format P as an intermediate format, to “pivot” between the two (A => P => B => P => A).
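
For readers who want to see the idea concretely, here is a small sketch. The formats "a" and "b", their field names and the pivot layout are all invented for the example; only the principle (each format only needs a converter to the pivot and one from it) comes from the explanation above.

```python
# Converters between two made-up formats and a made-up pivot representation.
def a_to_pivot(doc):
    return {"title": doc["name"], "open": doc["status"] == "open"}


def pivot_to_a(doc):
    return {"name": doc["title"], "status": "open" if doc["open"] else "closed"}


def b_to_pivot(doc):
    return {"title": doc["subject"], "open": not doc["closed"]}


def pivot_to_b(doc):
    return {"subject": doc["title"], "closed": not doc["open"]}


# Adding a new format means adding one entry to each table,
# not one converter per existing format.
TO_PIVOT = {"a": a_to_pivot, "b": b_to_pivot}
FROM_PIVOT = {"a": pivot_to_a, "b": pivot_to_b}


def convert(doc, source, target):
    """Convert between any two registered formats by going through the pivot."""
    return FROM_PIVOT[target](TO_PIVOT[source](doc))


issue_a = {"name": "Crash on startup", "status": "open"}
issue_b = convert(issue_a, "a", "b")
round_trip = convert(issue_b, "b", "a")
```

The round trip (A => P => B => P => A) returning the original document is a convenient property to assert in tests.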

But there is no need to clarify this notion. It is important to reduce the complexity of maintaining and implementing FFF but it is a detail. I just removed the two instances of “pivot format”.

“Reference implementations in Go …”, I’d move that down a couple lines to just before “The reference implementation is modular…”

Done.

“Reference implementation is modular”, doesn’t that imply that FFF itself is modular, and this is another Key Requirement?

I’m not sure what would be best for the FFF specifications at this time and it is probably difficult to foresee. However, it is crystal clear that the reference implementations must be modular to be maintainable: there are so many tiny details that crowdsourcing the implementation is critical to its success. And crowdsourcing requires modularity so that implementors can focus on a specific area.

I chose to hint this with just “modular” in this context but maybe it is best to remove the word.

“Evolve into a standardized format”, does indicate to me that Open Standard is another Key Requirement.

That should be in the tagline since it is what FFF is about. “Friendly Forge Format - an Open Standard to represent software projects”

Consistency might be mentioned with this Key Requirement, i.e. by providing a common language to use when talking about the domains that are covered by FFF. The consistency is guaranteed by the standardization.

Yes, added.

With Versatility, onboarding of contributors, gradual crowdsourcing, etc. I think Extensibility is another Key Requirement.

I’m not sure exactly what you mean by Extensibility. Is it something similar to FEP for ActivityPub?

“that encompasses a growing number of forges over time”, might reformulate “with broad adoption by forges and related development tools”. The “over time” isn’t needed as with “evolve” you already refer to a vision of the future.

Excellent, done.

2 Likes

The generic grant application must contain all these and they are to be developed. I did not get to them first because they are not required for the NLnet applications and the application form has a very limited number of words so there won’t be an opportunity to dive into this.

At a minimum it should be something similar to what can be found in the Gitea generic grant application or the DAPSI fedeproxy grant and maybe more depending on the grant application requirements.

1 Like

Reporting is a broader term for the use case than notification. A more general classification. Notification is a rather specific feature. Reporting might be ‘notification’ in a log file, audit log, or generating project summaries that are tooted about, a management report, etc.

Maybe these two go together. If FFF has a large scope (which it has) then separate parts may evolve separately. They may have different versions, or at least some means where someone who crowdsources in the “issue management” area doesn’t have to wait for months for a major release, because people are stuck in defining some “revision control” aspects.

With Extensibility I meant the ease of people being able to extend, and not break backwards compatibility.

I think both are strongly related to Process, which should facilitate all this.

1 Like

What I look for in an Open Standard is that it is useful, easy to understand and use. JSON is the perfect example. But it is much less complex than FFF. Let’s take one element of FFF: the VCS repository itself.

The top down approach would be to define a VCS in FFF as any kind of known VCS and provide a list with normative names so there is no ambiguity: RCS, CVS, SVN, GIT, HG etc. And the reference implementation would only cover some of them. The others would be left to the interpretation of the reader which is another way to say: we did not think about it, deal with it. Exactly what I do not want in an Open Format.

The bottom up approach would be to define a VCS in FFF as whatever is supported by the reference implementation. Which is critically important, for instance to represent the notion of pull request. With the reference implementation, the specifications and the sample data, it is possible to understand, down to the last detail how a pull request described in FFF relates to the content of the VCS. Of course it is possible to add RCS as well and the reference implementation would just verify that the notion of pull request is deactivated / incompatible with this particular VCS.
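
To illustrate the kind of check I have in mind, here is a sketch. The capability table is an assumption made for the example (only the fact that pull requests would be deactivated for RCS comes from the paragraph above), and `check_project` is a hypothetical helper, not part of any reference implementation.

```python
# Assumed capability table for illustration only.
VCS_CAPABILITIES = {
    "git": {"pull_request"},
    "rcs": set(),
}


def check_project(project):
    """Return errors for elements that the project's VCS cannot support."""
    vcs = project["vcs"]
    if vcs not in VCS_CAPABILITIES:
        return [f"unsupported VCS: {vcs}"]
    errors = []
    if project.get("pull_requests") and "pull_request" not in VCS_CAPABILITIES[vcs]:
        errors.append(f"pull requests are not compatible with {vcs}")
    return errors


git_project = {"vcs": "git", "pull_requests": [{"index": 1}]}
rcs_project = {"vcs": "rcs", "pull_requests": [{"index": 1}]}
plain_rcs_project = {"vcs": "rcs", "pull_requests": []}

git_errors = check_project(git_project)
rcs_errors = check_project(rcs_project)
plain_rcs_errors = check_project(plain_rcs_project)
```

A VCS that is not in the table is rejected outright, which matches the bottom up approach: nothing is left to the interpretation of the reader.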

So FFF is, by nature, an ever growing set of elements: rather than being extensible, I’m under the impression that it is a consistent set of elements (VCS is one of them, discussions/mailing lists another, issues yet another, etc.) that evolve independently. Its main purpose is to provide versioning, consistency, backward compatibility and a reference implementation that bind these elements into a consistent set.

Does that make sense to you?

Understood, thanks for explaining. The suggested change was made.

Maybe Modular and Extensible should be guarantees to those involved in adopting FFF and/or actively helping FFF to evolve. They are both process-related: one party might propose to e.g. define a ‘Licensing’ component in the specs, consent is sought, and they can start. Another party can request that an existing definition of a concept be extended with their property, which would go through an approval process.

Mentioning this explicitly helps parties feel more at ease, knowing that they are not subjected to a specification whose evolution they cannot participate in.

Yes, that’s what I have in mind, agreed.

A section about the release process was added to the Significant technical challenges:

Reliable release process

A release of F1F is a specification document composed of the description of many forge artifacts that evolve independently. It is bound to the reference implementation that shows how each aspect of the specifications work. Dataset generators and fixtures can then be used by the reference implementation to verify the conformance with the specifications.

These three parts (spefications, reference implementation and datasets) must be used to establish a Quality and Assurance process that verifies a F1F release candidate is consistent before it is published.

1 Like

typo :0

1 Like

Fixed, and a few more :+1:

1 Like

The following was added because the Plaintext Group grant application needs it.

How might this work be sustained long-term?

A software forge is a moving target with many details that can only be addressed by crowdsourcing the evolution of the corresponding Open File Standard. However, even when this is successfully implemented, there will be a need for a small core team of people dedicated to care for the infrastructure and the process to allow contributors to seamlessly work together. Their salary and the infrastructure costs can be paid for by donations from companies, governments and individuals via a non-profit organization.