Hello,
This is a generic grant application similar to the one created for Gitea in July 2021 that contains material to be used on specific grant applications such as:
- NGI Assure / Entrust deadline August 1st ~60K€
- OTF rolling concept note
Title
- Title: Friendly Forge Format
- Tag line: an Open Standard to store software projects
Executive summary
There is no standard file format to share the content of a software forge (e.g. git repository, issues, etc.), and only some of them provide an undocumented internal format. It is possible to use the ForgeFed vocabulary to send a message about a particular detail (i.e. an individual commit or an issue) via the ActivityPub protocol. But the receiving forge may not know the context in which this information can be interpreted because it cannot conveniently obtain it. As a whole, a software project is a large dataset that is made of numerous interconnected elements stored in a strongly consistent state. Without a standardized file format, interoperability between forges is very difficult.
The full description of the Friendly Forge Format (abbreviated F3) is here.
Why is it “Friendly” and “F3”?
Forges are isolated and people working together cannot conveniently communicate with each other. Using F3 creates a more friendly environment where they can better exchange information and collaborate.
The acronym for “Friendly Forge Format” is “FFF”. It is awkward to say and write, so F3 is preferred. It is similar to how WWWC is abbreviated W3C.
The F3 file extension is used when the content is an archive conformant with the F3 file format.
Comparison with existing or historical efforts
The State of the Forge Federation: 2021 to 2023 published in June 2022 contains a detailed description of the projects related to F3. F3 is designed to be a building block that all of them can reuse to facilitate the implementation of forge federation features.
F3 is different from ForgeFed. ForgeFed is an ActivityPub extension with its own vocabulary and models represented in JSON-LD. F3 is a JSON-based Open File Format providing a strongly consistent representation of a software project at a given point in time. They both need to define a glossary of terms and explain concepts that are common between forges. This already led to contributions to ForgeFed and more are expected in the future.
F3 emerged in the context of the forgefriends project where the Gitea internal file format was improved to make room for federated features. The effort duplication of maintaining internal file formats in all existing forges and its consequences inspired the authors to create F3 and publish it as an Open File Format.
F3 is at the crossroads of a number of forge-related projects funded by NGI in the recent past: ForgeFed, Storing Efficiently Our Software Heritage, Federated software forges with Gitea, Contribute to all Free Software from the comfort of your forge. It adds an essential piece to the puzzle that will eventually be completed and allow forges to continuously exchange data using open standards.
The forgefriends and ForgeFlux forge federation proxies will include F3 in their upcoming releases. This integration will be a showcase demonstrating how the Go and Python APIs can be used for integration in other software forges.
Deploying F3 in production in its first version is challenging because it does not yet have a reassuring reputation for stability and robustness. When problems are discovered, they require a level of understanding and an investment in time from system administrators that most service providers would consider too costly. The Hostea service provider is committed to advancing forge federation and will deploy F3 as soon as it is available. It will likely be the first production instance supporting F3.
Significant technical challenges
Reliable release process
A release of F3 is a specification document composed of the description of many forge artifacts that evolve independently. It is bound to the reference implementation that shows how each aspect of the specifications works. Dataset generators and fixtures can then be used by the reference implementation to verify the conformance with the specifications.
These three parts (specifications, reference implementation and datasets) must be used to establish a Quality Assurance process that verifies an F3 release candidate is consistent before it is published.
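As an illustration of how the three parts could fit together, here is a minimal sketch of a round-trip consistency check. It assumes a hypothetical issue fixture and a toy in-memory forge standing in for a real forge driver. None of the names or fields below come from the F3 specification.

```python
import json

# Hypothetical fixture: the field names are illustrative,
# they are not taken from the F3 JSON Schema.
FIXTURE_ISSUE = {"index": 1, "title": "Crash on startup", "state": "open"}


class InMemoryForge:
    """Toy stand-in for a real forge driver (an assumption for this sketch)."""

    def __init__(self):
        self.issues = {}

    def import_issue(self, issue):
        # Store a copy so later changes to the input do not leak in.
        self.issues[issue["index"]] = dict(issue)

    def export_issue(self, index):
        return dict(self.issues[index])


def round_trip(forge, issue):
    """Import a fixture, export it again and pass it through JSON."""
    forge.import_issue(issue)
    exported = forge.export_issue(issue["index"])
    return json.loads(json.dumps(exported, sort_keys=True))


def is_consistent(forge, issue):
    """The fixture survives the round trip if nothing is lost or altered."""
    return round_trip(forge, issue) == issue
```

A real Quality Assurance process would presumably run such a check for every fixture against every supported forge, and only publish the release candidate when all of them pass.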
Representing a very large number of forge artifacts
A forge is an unbounded set of tools (e.g. issues, pull requests, comments, releases, VCS, etc.) that are used by developers when they work on a particular software project. Each of these tools also has an unbounded set of features. It is common for a forge to provide tools and features for which there is no equivalent in another forge (e.g. there are mailing lists on SourceHut but not in Gitea).
Even when there is an equivalence (e.g. Woodpecker CI provides the same kind of functionality as GitHub Actions), representing both in F3 is challenging because their capabilities often have subtle variations.
Robust test infrastructure
The reference implementation must track in real time the evolution of forges for which it provides conversion to and from F3. A continuous integration pipeline must be run against all supported versions as soon as a new version is published. It involves bootstrapping forges from scratch and feeding them with sample data created from fixtures, which is a resource-intensive process.
Ensuring the stability of the CI pipeline is, in itself, a challenge. A flaky pipeline that fails randomly or is too sensitive to environmental problems will create false negatives. When there are too many, the developers and contributors maintaining the reference implementations may spend more time debugging the CI than anything else.
Seamless contributor onboarding
Setting up a local development environment that allows a contributor to modify:
- the JSON Schema of the specifications
- the reference implementation
must be made as easy as possible so that they can debug problems. It is critically important because the very large number of details that F3 needs to address requires crowdsourcing development. And without a seamless contributor onboarding process there is too much friction for that to happen effectively.
Ecosystem of the project
F3 was first announced publicly when the State of the Forge Federation: 2021 to 2023 was published in June 2022. Although the idea first emerged in early 2022, it is still largely unknown and has never been described formally.
Monthly reports and videoconferences
A monthly report will be published and disseminated to provide a high-level overview of the evolution of the F3 specification and reference implementations. Each report will also be discussed during a monthly videoconference where active participants can better explain what they are doing and why.
In combination with regular developer releases, this will achieve two goals:
- allow future users to keep in touch and plan for F3 integration in their development workflow
- encourage contributors to participate by clarifying what is being done and where more help is needed
Forgefriends
F3 will be released as an integral part of forgefriends from the start.
Hostea
The Hostea hosting provider currently offers vanilla Gitea instances. It will also offer forgefriends instances as soon as they are released. This will enable anyone to try the import / export feature provided by F3 as soon as it is available.
Gitea
Discussions began with Gitea first since F3 is derived from the Gitea internal format. A pull request to merge F3 as an integral part of upcoming releases will be worked on until it is accepted. The strongest incentives for Gitea to use F3 are robustness and active development, both of which are lacking for the legacy export/import codebase.
ForgeFlux & Pagure
The author of ForgeFlux is committed to using the Python reference implementation of F3 as part of its first release scheduled next year. The author of the ForgeFed plugin for Pagure will be contacted and assistance will be provided to use F3 via the Python reference implementation to get context when receiving an activity via ActivityPub.
GitLab
The Go reference implementation will support importing from GitLab CE and, to an extent limited by the API, exporting to GitLab. Merge requests will be opened in the GitLab CE tracker to advocate for features required for federation (e.g. mapping of Issue ids, etc.).
ForgeFed
Improvements to the ForgeFed specifications will be proposed so that data contained in an F3 archive can be translated and sent over ActivityPub. This makes it easier for forges to implement federation: they would otherwise have to figure out how to map their internal data structures or format into ForgeFed models and vocabulary.
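To give an idea of what such a translation could look like, here is a minimal sketch that maps an issue record to a ForgeFed Ticket object. The input field names (index, title, content, state), the function name and the URLs in the usage example are assumptions made for illustration. The Ticket property names reflect a reading of the ForgeFed vocabulary and should be checked against the published specification.

```python
# JSON-LD contexts for ActivityStreams and ForgeFed.
AS_CONTEXT = "https://www.w3.org/ns/activitystreams"
FORGEFED_CONTEXT = "https://forgefed.org/ns"


def issue_to_ticket(issue, author_url, project_url):
    """Translate a hypothetical F3-style issue dict into a Ticket object.

    The URL layout used for the ticket id is an assumption of this sketch.
    """
    return {
        "@context": [AS_CONTEXT, FORGEFED_CONTEXT],
        "type": "Ticket",
        "id": f"{project_url}/issues/{issue['index']}",
        "context": project_url,
        "attributedTo": author_url,
        "summary": issue["title"],
        "content": issue.get("content", ""),
        # Assumption: the record marks closed issues with state == "closed".
        "isResolved": issue["state"] == "closed",
    }
```

For example, issue_to_ticket({"index": 7, "title": "Typo", "state": "open"}, "https://forge.example.org/alice", "https://forge.example.org/alice/demo") would produce a Ticket whose id ends with /issues/7 and whose isResolved property is false. The resulting object would still need to be wrapped in an activity before being sent over ActivityPub.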
Diversity
Improving diversity requires work, and paid staff will permanently dedicate 5% of their time to it.
How might this work be sustained long-term?
A software forge is a moving target with many details that can only be addressed by crowdsourcing the evolution of the corresponding Open File Standard. However, even when this is successfully implemented, there will be a need for a small core team of people dedicated to caring for the infrastructure and the process that allow contributors to seamlessly work together. Their salary and the infrastructure costs can be paid for by donations from companies, governments and individuals via a non-profit organization.
Workplan
Go package reference implementation
A reference implementation of F3 in Go provides:
- An API for integration in a forge written in Go
- Validation of an F3 archive (JSON Schema validation, VCS sanity checks)
- Import and Export support for Gitea and GitLab
- Dataset generators and fixtures to verify the conformance with the specifications
Milestone: The Go package is published https://pkg.go.dev/
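The validation step listed above can be pictured with a minimal sketch. It is written in Python for brevity and uses only the standard library to check a single issue record against a hand-written list of required fields. The field names and types are hypothetical, and an actual implementation would rely on the published JSON Schema and a proper schema validator instead.

```python
import json

# Hypothetical required fields and their expected types for an issue record.
REQUIRED_ISSUE_FIELDS = {"index": int, "title": str, "state": str}


def validate_issue(raw):
    """Return a list of error strings; an empty list means the record passed."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["top-level value must be an object"]
    errors = []
    for name, expected in REQUIRED_ISSUE_FIELDS.items():
        if name not in record:
            errors.append(f"missing field: {name}")
            continue
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"wrong type for field: {name}")
    return errors
```

Returning every error instead of stopping at the first one is a design choice of this sketch: it lets a contributor fix a faulty archive in one pass.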
Python package reference implementation
A reference implementation of F3 in Python provides:
- An API for integration in a forge written in Python
- Continuous deployment of the F3 documentation
- The same features as the Go package reference implementation
Milestone: The Python package is published https://pypi.org/
Specification and documentation
The F3 Specification includes:
- An introduction
- JSON Schema with embedded documentation
- Release notes
- A normative file hierarchy
- A glossary of terms and their definition
Milestones:
- The JSON Schema for F3 is published in a dedicated repository
- The documentation is published at https://readthedocs.org/
First release
The first F3 release is a bundle that includes:
- The specifications and documentation
- The Go reference implementation
- The Python reference implementation
They are verified to be consistent and tagged with the same version number.
Milestone: simultaneous publication of F3 version 1.0.0 at:
- https://readthedocs.org/ for the specifications
- https://pypi.org/ for the Python reference implementation
- https://pkg.go.dev/ for the Go reference implementation
Integration in the Gitea codebase
The F3 Go reference implementation is used as a replacement for the internal format used by the repository dump and restore features in the Gitea codebase.
Milestones: pull request merged in https://forgefriends.org or https://gitea.io
Deployed by a forge hosting provider
A forge implementation using F3 to provide import / export is deployed by a hosting provider and available to the general public. The users can export their software project from the forge in the F3 format and import a new one by uploading an archive in the F3 format.
Milestone: A dedicated forge provisioned at a given hosting provider is capable of importing and exporting software projects in the F3 format.