Common JSON Schema for repository information

Bonjour @realaravinth,

I’d like to propose that we work together on a JSON Schema to document and validate the data structures used by Gitea, so that ForgeFlux, forgefriends and Gitea can use the same format to convert data from one forge to another.

To illustrate what I mean, I created a branch in forgefriends with a commit that adds JSON Schema for an issue, a label and a reaction. There exist libraries / commands to use these schemas to:

The immediate benefits of this work would be:

  • To make Gitea a little more robust by validating the content of the JSON files, potentially detecting problems that would otherwise only show up when corrupted files are read and interpreted.
  • For ForgeFlux & forgefriends to agree on a common format instead of working on different formats.
  • For ForgeFlux & forgefriends to immediately use the Gitea code that exists to export a project originating from Gitea, GitHub, GitLab and Gogs, and import a project into Gitea and maybe soon into GitHub as well.
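As a rough illustration of the first point, here is a minimal sketch of what validating a file against a schema could look like. The `LABEL_SCHEMA` fields are assumptions made up for this example, not the schema from the forgefriends branch, and the hand-written `validate` helper only understands a small subset of JSON Schema ("type", "required", "properties", "items"). A real setup would more likely rely on an existing JSON Schema library.

```python
import json

# Hypothetical schema for a migration "label" object. The field names are
# assumptions for illustration, not the schema from the forgefriends branch.
LABEL_SCHEMA = {
    "type": "object",
    "required": ["name", "color"],
    "properties": {
        "name": {"type": "string"},
        "color": {"type": "string"},
        "description": {"type": "string"},
    },
}

# Maps JSON Schema type names to the Python types json.loads produces.
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
}


def validate(instance, schema, path="$"):
    """Return a list of error strings; an empty list means the data is valid.

    Only "type", "required", "properties" and "items" are supported.
    """
    errors = []
    expected = schema.get("type")
    if expected is not None:
        ok = isinstance(instance, _TYPES[expected])
        # bool is a subclass of int in Python, so reject it for numeric types.
        if expected in ("integer", "number") and isinstance(instance, bool):
            ok = False
        if not ok:
            return [f"{path}: expected {expected}"]
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required property '{key}'")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate(instance[key], subschema, f"{path}.{key}"))
    if isinstance(instance, list) and "items" in schema:
        for index, item in enumerate(instance):
            errors.extend(validate(item, schema["items"], f"{path}[{index}]"))
    return errors


good = json.loads('{"name": "bug", "color": "d73a4a"}')
corrupted = json.loads('{"name": 42}')
print(validate(good, LABEL_SCHEMA))       # no errors expected
print(validate(corrupted, LABEL_SCHEMA))  # reports the missing and mistyped fields
```

The point of the sketch is that a corrupted file is rejected with an explicit message before any code tries to interpret it.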

Regarding the last point, although ForgeFlux is implemented in Python and cannot use Gitea as a library, it can conveniently use the gitea dump-repo/restore-repo command line to do the same.
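A minimal sketch of how a Python program might drive those two commands is shown below. The flag names (`--git_service`, `--clone_addr`, `--repo_dir`, `--auth_token`, `--owner_name`, `--repo_name`) are my assumption of the CLI options and should be checked against `gitea dump-repo --help` and `gitea restore-repo --help` for the installed version. The helper names and the example URL are hypothetical.

```python
import subprocess


def dump_repo_command(clone_addr, repo_dir, git_service="gitea", auth_token=None):
    """Build the argument list for exporting a project to repo_dir.

    Flag names are assumptions; verify them with `gitea dump-repo --help`.
    """
    cmd = [
        "gitea", "dump-repo",
        "--git_service", git_service,
        "--clone_addr", clone_addr,
        "--repo_dir", repo_dir,
    ]
    if auth_token:
        cmd += ["--auth_token", auth_token]
    return cmd


def restore_repo_command(repo_dir, owner_name, repo_name):
    """Build the argument list for importing a dumped project into Gitea.

    Flag names are assumptions; verify them with `gitea restore-repo --help`.
    """
    return [
        "gitea", "restore-repo",
        "--repo_dir", repo_dir,
        "--owner_name", owner_name,
        "--repo_name", repo_name,
    ]


def run(cmd):
    """Run a command built above and raise if it exits with an error."""
    subprocess.run(cmd, check=True)


# Hypothetical example values; nothing is executed until run() is called.
dump = dump_repo_command("https://gitea.example.org/owner/project", "/tmp/project-dump")
restore = restore_repo_command("/tmp/project-dump", "owner", "project")
print(" ".join(dump))
print(" ".join(restore))
```

Building the argument list separately from running it keeps the sketch easy to test without a Gitea installation.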

What do you think?


I’m going through Gitea’s implementation at the moment but I agree with you, documenting this module could prove useful later.

There exist libraries / commands to use these schemas to:

Are you referring to OpenAPI-based tools? If so, we could also use this opportunity to verify the semantic correctness of the API. For instance, when you try to fork a repository twice, the fork creation endpoint returns HTTP 500 with the following message, although a 4xx error would be more appropriate:

{
  "message": "repository is already forked by user [uname: realaravinth, repo path: bot/tmp2, fork path: realaravinth/tmp22]",
  "url": "https://git.batsense.net/api/swagger"
}

I did not have that in mind. But maybe Golang OpenAPI tools can make use of JSON Schemas as well. However… these would have to be about the JSON data structures used by the API, which are different. I’m going to write a summary of which data structures are used for what in Gitea to clarify that.


make generate-swagger uses the JSON API comments to generate the API documentation, so I think documentation of the JSON data structures that you mentioned will also be reflected in the API docs.

But you are right, verifying semantic correctness is a task on its own :smiley:


Will hopefully clarify where and how the various data structures are used within Gitea.


It will not, because the API data structures and the migration data structures (the ones that are of interest to ForgeFlux & forgefriends to agree on a common pivot format) are distinct.

Oh, now I understand. Thank you for your patience! :smiley:


It took me a while to sort all this out; I’m glad it was a little faster for you :slight_smile:


It also vastly simplifies the work at hand. Documenting the many JSON data structures used by all API endpoints is a large undertaking. There are only a dozen data structures for migrations, which makes documenting them a much easier task.


I’m a little confused about how this will be used though. Forgefriends’ project base implementation stores issue and pull request data in a separate git repository, for which YAML would be sufficient. But documenting it as JSON makes me wonder whether you are going to make it available via HTTP as well.

Excellent question. I prefer JSON over YAML because I think (but I may be wrong) that it is more likely to be accepted as a standard format by existing / future forges for their native implementation of federation, primarily because it is more widely used to store data. I did not find an equivalent of https://json-schema.org/ for YAML: a quick look led to this article and this example that use JSON Schema to validate a YAML file. And even if it does exist, I very much doubt there are readily available libraries implementing YAML Schema in various languages. Last but not least, JSON is a better fit for ActivityPub/forgefed, which is based on JSON-LD. If forge information on file is in JSON, documented and agreed on by ForgeFlux, forgefriends & Gitea, it will provide a sound foundation for forgefed to re-use for the next iteration of their vocabulary.

Just created a merge request to keep track of the work done. I’ll add YAML validation, schemas for all data structures and minimal tests before starting the discussion in the Gitea forum/chat.


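My biggest gripe with JSON is that it’s limited by its JavaScript origins: the JSON grammar itself sets no limit on numbers, but parsers that store them as IEEE 754 doubles, as JavaScript does, only represent integers exactly up to 2^53 - 1, so larger values such as 64-bit IDs can silently lose precision.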

I was not aware, can I read more about this somewhere?
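To make the integer concern concrete, here is a small sketch. As far as I know, the JSON specification (RFC 8259) does not bound the size of numbers, but it notes that interoperability is best when values fit IEEE 754 doubles, which is what JavaScript uses. Python's `json` module keeps integers exact by default, so the example passes `parse_int=float` to imitate a double-based parser. The `survives_double` helper and the `id` field are made up for this illustration.

```python
import json

# Every integer with an absolute value up to 2**53 - 1 is exactly
# representable as an IEEE 754 double; beyond that, gaps appear.
MAX_SAFE_INTEGER = 2**53 - 1


def survives_double(n):
    """Return True if the integer n round-trips through a double unchanged."""
    return int(float(n)) == n


text = json.dumps({"id": 2**53 + 1})

# Python's default parser keeps arbitrary-precision integers.
exact = json.loads(text)["id"]

# Imitate a parser that stores every number as a double.
as_double = json.loads(text, parse_int=float)["id"]

print(exact)
print(int(as_double))
print(survives_double(MAX_SAFE_INTEGER), survives_double(2**53 + 1))
```

A schema for forge data could sidestep the problem by keeping identifiers within that range or by encoding them as strings.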

The following thread was started in the Gitea forum. All the bits and pieces are in place and it turns out to be relatively simple. But there is code in many places, and if there is pushback it will be difficult to keep it up to date.

@realaravinth Lunny suggested sending a pull request to keep the conversation going. I was hoping to get feedback on the forum; it was worth a shot. But the preferred place for discussion in Gitea is issues/pull requests. If you have time, it would be useful to create a PR in draft state with this merge request. If not, I’ll ask someone else: you were already burdened with numerous manual proxy requests lately :blush:
