Managing references

Objects in F3 may have references to other objects:

  • Issues reference the User that created them
  • Comments contain text fields where references to Issues and Pull Requests can be found
  • etc.

For all forges (Gitea, GitHub, etc.) these references are exclusively interpreted as relative to the forge. While an issue comment can contain a URL to a pull request that resides on another forge, it will not be parsed or recognized.

These references in F3 can be converted into full URIs that can be included in ForgeFed and used as ids. When parsing F3 files they can also be relative to where the parser is within the hierarchy. Here are a few examples:

  • URI: https://example.com/user/1234/project/458/issue/1/comment/435
  • Issue 1 interpreted while parsing https://example.com/user/1234/project/458 is the equivalent of the URI https://example.com/user/1234/project/458/issue/1
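
The second example above can be sketched in Go as follows. ResolveReference is a hypothetical helper written for this illustration and is not part of F3; it only assumes that a relative reference is a kind ("issue") plus an identifier, appended to the URI of the node being parsed.

```go
package main

import (
	"fmt"
	"strings"
)

// ResolveReference is a hypothetical helper (not an F3 function): it turns a
// reference relative to the node being parsed into a full URI.
func ResolveReference(base, kind string, id int64) string {
	return fmt.Sprintf("%s/%s/%d", strings.TrimRight(base, "/"), kind, id)
}

func main() {
	// Position of the parser within the hierarchy.
	base := "https://example.com/user/1234/project/458"
	// "Issue 1" interpreted relative to that position.
	fmt.Println(ResolveReference(base, "issue", 1))
}
```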

The current implementation needs a few new code paths to manage references for mirroring purposes, because references must be transformed from being relative to the originating forge into being relative to the destination forge.

  • GetReferences is added to ContainerObjectInterface returning all the references found in the object as a list of tuples:

    • fieldname of the reference as a string matching a JSON field in the object format
    • the reference as a ContainerObjectInterface
    • the list of parents (each of them ContainerObjectInterface)

    For instance, GetReferences on an Issue would return (“poster_id”, *ForgeUser{ID: 43}, *Forge).

  • SetReferences is added to ContainerObjectInterface and takes a list of tuples (see GetReferences). The existing references are replaced with those given as argument.

  • RemapReferences is added to ContainerInterface with the following arguments:

    • A list of references (tuples returned by GetReferences) of the same type (i.e. User, Comment etc.)
    • the list of parents (each of them ContainerObjectInterface) to which each of these references belongs (they must be at the same level of the hierarchy)

    and returning the modified list of references with identifiers replaced as instructed by the map of identifiers obtained from the parent common to each object. If there is no mapping, the reference is set to the zero value.
    For instance, if an issue comment contains a text that references pull request number 23 and the identifier map has a record that pull request number 23 is mirrored into pull request number 435, it will be modified accordingly.
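
A minimal Go sketch of the three methods described above, under stated assumptions. Object is a reduced stand-in for ContainerObjectInterface, the Reference struct is one possible encoding of the tuple, and RemapReferences receives the identifier map directly instead of obtaining it from the common parent. None of these signatures are taken from the actual F3 code.

```go
package main

import "fmt"

// Object is a reduced stand-in for ContainerObjectInterface (assumption).
type Object interface {
	GetID() int64
	SetID(id int64)
}

// Reference is one possible encoding of the tuple returned by GetReferences.
type Reference struct {
	FieldName string   // JSON field name in the object format
	Target    Object   // the referenced object
	Parents   []Object // parents of the referenced object
}

type Forge struct{}

func (f *Forge) GetID() int64 { return 0 }
func (f *Forge) SetID(int64)  {}

type ForgeUser struct{ ID int64 }

func (u *ForgeUser) GetID() int64   { return u.ID }
func (u *ForgeUser) SetID(id int64) { u.ID = id }

type Issue struct {
	Index    int64
	PosterID int64
}

// GetReferences returns every reference found in the issue.
func (i *Issue) GetReferences() []Reference {
	return []Reference{{
		FieldName: "poster_id",
		Target:    &ForgeUser{ID: i.PosterID},
		Parents:   []Object{&Forge{}},
	}}
}

// SetReferences replaces the existing references with the given ones.
func (i *Issue) SetReferences(refs []Reference) {
	for _, ref := range refs {
		if ref.FieldName == "poster_id" {
			i.PosterID = ref.Target.GetID()
		}
	}
}

// RemapReferences rewrites identifiers using idMap (origin -> destination).
// A missing entry yields 0, the zero value a Go map returns for an absent key.
func RemapReferences(refs []Reference, idMap map[int64]int64) []Reference {
	for _, ref := range refs {
		ref.Target.SetID(idMap[ref.Target.GetID()])
	}
	return refs
}

func main() {
	issue := &Issue{Index: 1, PosterID: 34}
	refs := RemapReferences(issue.GetReferences(), map[int64]int64{34: 584})
	issue.SetReferences(refs)
	fmt.Println(issue.PosterID)
}
```

Returning a freshly built target from GetReferences keeps the remapping away from the issue itself until SetReferences is called, which matches the idea that the substitution is an explicit step.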

These methods are called from the Mirror method. For each object that is about to be upserted (except when the objects are identical and no action is taken), Mirror calls GetReferences and RemapReferences. The remapped references are upserted before the object that contains them, to guarantee they exist (assuming there are no cyclic references). SetReferences is then called on the object with the remapped references to perform the substitution. The object can now be upserted: its references have been updated.

For instance, if the “poster_id” of an Issue is user 34 and is mapped into user 584, the “poster_id” field will be updated from 34 to 584 before the issue is upserted.
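
The ordering described above (references first, then the object that contains them) can be sketched as a small recursive function. Everything here is hypothetical and simplified for illustration: Ref, Object, Key and Destination are toy types, the check for identical objects is omitted, and the sketch assumes there are no cyclic references, since a cycle would make it recurse forever.

```go
package main

import "fmt"

// Ref is a simplified reference: field name, kind and identifier of the target.
type Ref struct {
	Field string
	Kind  string
	ID    int64
}

// Object is a toy stand-in for the F3 in-memory representation.
type Object struct {
	Kind string
	ID   int64
	Refs []Ref
}

// Key identifies an object within a forge.
type Key struct {
	Kind string
	ID   int64
}

// Destination is a toy destination forge that allocates identifiers.
type Destination struct {
	nextID  int64
	Objects []Object
}

// Upsert stores the object and returns the identifier assigned to it.
func (d *Destination) Upsert(o Object) int64 {
	d.nextID++
	o.ID = d.nextID
	d.Objects = append(d.Objects, o)
	return o.ID
}

// Mirror upserts the references of an object before the object itself and
// records origin -> destination identifiers in ids.
func Mirror(origin map[Key]Object, dst *Destination, ids map[Key]int64, key Key) int64 {
	if id, ok := ids[key]; ok {
		return id // already mirrored
	}
	obj := origin[key]
	remapped := make([]Ref, len(obj.Refs))
	for i, ref := range obj.Refs {
		ref.ID = Mirror(origin, dst, ids, Key{ref.Kind, ref.ID})
		remapped[i] = ref
	}
	obj.Refs = remapped // plays the role of SetReferences
	id := dst.Upsert(obj)
	ids[key] = id
	return id
}

func main() {
	origin := map[Key]Object{
		{"user", 34}: {Kind: "user", ID: 34},
		{"issue", 1}: {Kind: "issue", ID: 1, Refs: []Ref{{"poster_id", "user", 34}}},
	}
	dst := &Destination{nextID: 583}
	ids := map[Key]int64{}
	Mirror(origin, dst, ids, Key{"issue", 1})
	for _, o := range dst.Objects {
		fmt.Printf("%s %d %v\n", o.Kind, o.ID, o.Refs)
	}
}
```

In this run the user is upserted first and receives identifier 584, so the issue is upserted with a “poster_id” that already points at an existing object in the destination.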

As SetReferences/GetReferences will ultimately be implemented by the drivers (API Gitea, internal Gitea), they will be methods implemented by providers.

There are a number of providers, each of them designed to simplify the implementation of the driver by supporting a variable number of strongly typed arguments. At the moment each of those provider base types also implements all the methods that share the same signature, such as ToFormat, GetImplementation etc. That is a lot of redundancy and it will keep growing.

To keep the provider base classes DRY, the ProviderBase class is introduced. It holds all the methods whose signature does not vary from one provider to another.
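
In Go, this kind of sharing is usually expressed with struct embedding. The sketch below is only an illustration of the idea: the signature given to GetImplementation and the Path methods are assumptions made for the example and are not the actual F3 provider API.

```go
package main

import "fmt"

// ProviderBase groups the methods whose signature is the same for every
// provider. The signature used here is an assumption.
type ProviderBase struct {
	implementation string
}

func (b *ProviderBase) GetImplementation() string { return b.implementation }

// UserProvider embeds ProviderBase and only adds the methods whose
// arguments are specific to users. Path is a hypothetical method.
type UserProvider struct{ ProviderBase }

func (p *UserProvider) Path(user int64) string {
	return fmt.Sprintf("user/%d", user)
}

// IssueProvider takes a different number of strongly typed arguments but
// inherits GetImplementation without redefining it.
type IssueProvider struct{ ProviderBase }

func (p *IssueProvider) Path(user, project, issue int64) string {
	return fmt.Sprintf("user/%d/project/%d/issue/%d", user, project, issue)
}

func main() {
	users := &UserProvider{ProviderBase{implementation: "gitea"}}
	issues := &IssueProvider{ProviderBase{implementation: "gitea"}}
	fmt.Println(users.GetImplementation(), users.Path(1234))
	fmt.Println(issues.GetImplementation(), issues.Path(1234, 458, 1))
}
```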

I thought more about how the proposed methods would be used and here is what makes sense to me.

When an object is read from a forge (Get), it contains references to other objects. The GetReferences method returns all these references so that they can be stored in memory and used later, for instance when mirroring. A reference may be relative to the object that contains it (for instance when it is about a user) or absolute (for instance when it is about an issue that exists in another project). SetReferences walks the forge hierarchy to store each reference where it belongs.

When mirroring, an object is read with Get and its references are stored with SetReferences(GetReferences()). But the identifiers it contains (an issue number, for instance) are only meaningful in the context of the originating forge: they need to be translated into the corresponding identifiers in the destination forge.

The first time around, these identifiers do not exist and should be set to 0 because the object will be created. The identifiers assigned when the object is created in the destination forge are collected and stored in a map in the originating forge. When the originating forge is F3, they are stored in an _identifiers.json file.

The second time around, the identifier map returned by GetIDMap (which reads from _identifiers.json when the originating forge is F3) is used to substitute the identifiers from the originating forge before the object is upserted in the destination forge. This is required for the object identifier itself but also for all the references it contains. For mirroring, the object from the originating forge is transformed into the F3 in-memory representation, which contains the identifiers relative to the originating forge. The RemapReferences function is responsible for modifying the F3 in-memory representation (and not the originating forge objects), using the relevant identifier map for each reference. The modified F3 in-memory representation is then given as argument to Upsert: all its references are now relative to the destination forge.
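
The two passes can be illustrated with a small Go sketch. The layout of _identifiers.json is not specified here, so the flat JSON object mapping an originating identifier to a destination identifier is purely an assumption, and LoadIDMap is a hypothetical helper rather than the actual GetIDMap.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LoadIDMap parses a hypothetical _identifiers.json content of the form
// {"<origin id>": <destination id>}. Empty content means nothing has been
// mirrored yet and yields an empty map.
func LoadIDMap(data []byte) (map[int64]int64, error) {
	ids := map[int64]int64{}
	if len(data) == 0 {
		return ids, nil
	}
	if err := json.Unmarshal(data, &ids); err != nil {
		return nil, err
	}
	return ids, nil
}

func main() {
	// First pass: no mapping exists, the identifier is 0 and the object
	// will be created in the destination forge.
	first, err := LoadIDMap(nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(first[23])

	// Second pass: pull request 23 was mirrored into pull request 435.
	second, err := LoadIDMap([]byte(`{"23": 435}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(second[23])
}
```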

The order in which objects are inserted matters. If an object is inserted after all the objects it references, RemapReferences will be able to remap all identifiers effectively. To guarantee that order, mirroring must first mirror all the references of an object before mirroring the object itself. Note that circular references are possible, but they only happen when references are extracted from text (for instance an issue comment can reference another issue comment that has a back reference to it). Avoiding them can be left for later.