[Starchart] Murmurations evaluation

@circlebuilder mentioned (@aschrijver on this forum) that Murmurations could help with implementing an ecosystem-wide crawler like ForgeFlux/Starchart, so I performed a feasibility evaluation of Murmurations for Starchart within ForgeFlux and the broader ActivityPub ecosystem.

Comparison Notes

Timely updates

ForgeFlux implements federation externally to the forge. As a result, we cannot source events in real time unless the forge publishes them on its notification endpoints. Murmurations requires the data owner/subject to notify aggregators (Starchart in this case) of changes, so that each aggregator can update the data it stores about the subject.
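To make the push requirement concrete, here is a minimal Python sketch of what the aggregator side might do when a subject notifies it of a change. It assumes the notification only carries the profile URL and that the aggregator re-fetches the profile itself; `handle_change_notification`, the in-memory store and the injected `fetch` callable are hypothetical names, not Murmurations or Starchart APIs.

```python
import hashlib
import json
from typing import Callable, Dict, Tuple

# Hypothetical in-memory store: profile URL -> (SHA-256 of raw profile, parsed profile).
Store = Dict[str, Tuple[str, dict]]


def handle_change_notification(
    store: Store, profile_url: str, fetch: Callable[[str], str]
) -> bool:
    """Re-fetch a subject's profile after it notifies us of a change.

    `fetch` is injected to keep the sketch transport-agnostic; in practice it
    would perform an HTTP GET. Returns True if the stored copy was updated.
    """
    raw = fetch(profile_url)
    digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    previous = store.get(profile_url)
    if previous is not None and previous[0] == digest:
        return False  # notified, but the profile content is unchanged
    store[profile_url] = (digest, json.loads(raw))
    return True
```

Without that notification the aggregator never runs this step, which is exactly the gap for events a forge does not publish.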

Events that are published on the forge’s notification endpoints can be discovered simply by subscribing to the actor via ForgeFlux/Interface.

Index of Nodes: WebFinger overlap

ForgeFlux’s ActivityPub implementation includes WebFinger, which is essentially an index of all related resources owned by an actor. Murmurations seems to provide similar functionality. What is missing, however, is a map that charts all the various actors in the network: in the software forge federation context, a map of all Free repositories, users and forges in the federated ecosystem.
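For comparison, a WebFinger lookup (RFC 7033) answers questions about one account at a time: the client asks the account’s host and gets back a JSON Resource Descriptor (JRD) listing links. The Python sketch below builds the query URL and indexes a JRD’s links by `rel`; the account, host and sample JRD are made up for illustration. Nothing in it tells you which other actors exist, which is the missing map described above.

```python
import json
from urllib.parse import urlencode

# Made-up JRD for illustration; real responses may carry more fields.
SAMPLE_JRD = json.dumps({
    "subject": "acct:alice@forge.example",
    "links": [
        {"rel": "self", "type": "application/activity+json",
         "href": "https://forge.example/users/alice"},
        {"rel": "http://webfinger.net/rel/profile-page", "type": "text/html",
         "href": "https://forge.example/alice"},
    ],
})


def webfinger_url(account: str) -> str:
    """Build an RFC 7033 WebFinger query URL for an account like user@host."""
    host = account.rsplit("@", 1)[1]
    query = urlencode({"resource": f"acct:{account}"})
    return f"https://{host}/.well-known/webfinger?{query}"


def links_by_rel(jrd_text: str) -> dict:
    """Index a JRD's links by rel (a later link with the same rel overwrites)."""
    jrd = json.loads(jrd_text)
    return {link["rel"]: link.get("href") for link in jrd.get("links", [])}
```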

Wishlist

It would be interesting if we could come up with tools that offer access and privacy control options on a per-actor (repository and user) basis.
This would be similar to a robots.txt file per repository/user that instructs spiders on how its data should be used/served.
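One cheap way to prototype this is to reuse robots.txt syntax and Python’s standard `urllib.robotparser` against a file committed to each repository. The file name (`.crawl-policy`), the repository-relative paths such as `/contributors`, and the `starchart` user-agent token are all hypothetical choices for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical per-repository policy file, e.g. committed as `.crawl-policy`.
# Paths are interpreted relative to the repository, not the forge host.
POLICY = """\
User-agent: starchart
Disallow: /contributors
Allow: /

User-agent: *
Disallow: /
"""


def may_index(policy_text: str, agent: str, resource: str) -> bool:
    """Check whether `agent` may index `resource` under a robots.txt-style policy."""
    parser = RobotFileParser()
    parser.parse(policy_text.splitlines())
    return parser.can_fetch(agent, resource)
```

`robotparser` applies the first matching rule within a group, so the narrower `Disallow` has to come before the catch-all `Allow`.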

Resources:


Indeed, there are some very interesting aspects to this. I would like to add some notes, though I lack much of the domain knowledge on code forges and on the particular ideas being considered for ongoing code federation projects such as ForgeFlux.

  • Talking about a ‘crawler’ is inaccurate terminology in the context of Murmurations. Instead of being pulled, information is pushed to unknown subscribers (which makes the protocol very interesting for any party that maintains directories of information).

  • In any offering of murmurations-like features, even if from the end-user perspective it seems bundled/integrated with e.g. ForgeFlux, the murmurations code can be one or more wholly independent libs that evolve separately as a project.

  • In Federated Murmurations the centralized Index will not exist. Any publisher will push information to any Index / Aggregator, from where it federates across the network to any other interested Index / Aggregator. “Interest” is determined by the type of information or by particular metadata defined in the information schema. This works similarly to Topics in traditional message-queueing systems, but with topics that are much richer in information you can filter on.

  • The Index / Aggregator can sit in multiple locations. If a forge has federation support (as Gitea will have), the murmurations libs might be installed natively on it, or as an optional extension.

  • If a forge has no federation support (e.g. GitHub), then pushing a murmurations profile to an Index / Aggregator may be a very lightweight action. No ActivityPub is required: a REST-based POST to submit the JSON would be enough, and for GitHub this might be encapsulated in a GitHub Action or something similar.

  • Interesting idea: there may be many different profile schemas that define stuff interesting to aggregate and subscribe to. While the Murmurations project offers an online UI to easily create conformant JSON, the most likely scenario in a code forge context is that the JSON is a regular version-controlled file in the repo codebase. That being so, existing JSON files are already candidate profiles: think of a package.json in the case of a Node.js project.

  • Re: Wishlist. I don’t know what data you want to collect, but with JSON files under source control, access would be per repo and under the maintainers’ control. Privacy-sensitive data is a different matter: you’d have to look at where forges allow such data to be stored securely.
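The federated model described in the bullets above resembles topic-based pub/sub with richer filters. Here is a toy, in-memory Python sketch, assuming each Aggregator forwards a profile only to peers whose filter matches it, and deduplicates by profile URL so mutually peered nodes don’t loop. The `Aggregator` class and the `profile_url`/`tags` fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Profile = Dict[str, object]


@dataclass
class Aggregator:
    """Toy Index / Aggregator node in a hypothetical federated network."""

    name: str
    # Filter over schema type and metadata, playing the role of a rich "topic".
    interest: Callable[[Profile], bool]
    peers: List["Aggregator"] = field(default_factory=list)
    profiles: Dict[str, Profile] = field(default_factory=dict)

    def receive(self, profile: Profile) -> None:
        """Store a pushed profile and forward it to interested peers.

        The node a publisher pushes to stores the profile unconditionally;
        peers only receive it if their interest filter matches.
        """
        key = str(profile["profile_url"])
        if key in self.profiles:
            return  # already seen: stops loops between mutually peered nodes
        self.profiles[key] = profile
        for peer in self.peers:
            if peer.interest(profile):
                peer.receive(profile)
```

A real implementation would also need to propagate updates (this sketch ignores a profile once its URL has been seen) and to forward over the network rather than by method call.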
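For forges without federation support, the REST-based POST mentioned in the bullets above could be as small as the following Python sketch, run for example as a CI step after each push. The endpoint is a placeholder, not a real Murmurations or Starchart URL, and the profile fields are hypothetical. `build_profile_submission` only builds the request so it can be inspected; `submit` actually sends it.

```python
import json
import urllib.request

# Placeholder endpoint: NOT a real Murmurations or Starchart URL.
INDEX_ENDPOINT = "https://index.example/profiles"


def build_profile_submission(
    profile: dict, endpoint: str = INDEX_ENDPOINT
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST submitting a profile to an index."""
    body = json.dumps(profile).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```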
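To illustrate the idea that existing JSON files are already candidate profiles, here is a Python sketch that derives a profile from a package.json. The input fields (`name`, `description`, `keywords`, `repository`) are standard npm package.json fields; the output field names are hypothetical.

```python
import json


def profile_from_package_json(text: str) -> dict:
    """Derive a minimal, hypothetical index profile from an npm package.json."""
    pkg = json.loads(text)
    repo = pkg.get("repository")
    # npm allows `repository` to be a plain string or an object with a `url`.
    if isinstance(repo, dict):
        repo = repo.get("url")
    return {
        "name": pkg.get("name"),
        "description": pkg.get("description", ""),
        "tags": list(pkg.get("keywords", [])),
        "repository": repo,
    }
```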


Thank you for taking the time to provide feedback :slight_smile:

Talking about a ‘crawler’ is inaccurate terminology in the context of Murmurations.

I chose to use a general term instead of Murmurations-speak, but the equivalent in Murmurations is an Index.

In Federated Murmurations the centralized Index will not exist.

I have been thinking about federation for both ForgeFlux/Starchart and ForgeFlux/Northstar, which are both indexes, and I’m trying to understand whether federation makes sense in the context of indexing software. Both Starchart and Northstar will publish indexed items periodically, and both will be able to follow and merge items from other instances (configured by instance admins). I’m not sure whether federation would require additional features beyond that.
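For reference, the follow-and-merge behaviour described above can be sketched in a few lines of Python, assuming items are keyed by repository URL and that the copy with the newest `updated` timestamp wins. The `merge_indexes` helper and the item fields are hypothetical, not Starchart or Northstar code; the design choices that matter are the merge key and the conflict rule, since that’s where instances can disagree.

```python
from typing import Dict, Iterable, List

# Hypothetical item shape: {"url": ..., "name": ..., "updated": "YYYY-MM-DDTHH:MM:SSZ"}
Item = Dict[str, str]


def merge_indexes(local: Iterable[Item], *followed: Iterable[Item]) -> List[Item]:
    """Merge indexed items from followed instances into the local index.

    Items are keyed by repository URL; on conflict, the newest `updated` wins.
    Comparing timestamps as strings is only valid because all of them are
    assumed to be ISO-8601 in UTC with the same format.
    """
    merged: Dict[str, Item] = {}
    for source in (local, *followed):
        for item in source:
            current = merged.get(item["url"])
            if current is None or item["updated"] > current["updated"]:
                merged[item["url"]] = item
    return sorted(merged.values(), key=lambda i: i["url"])
```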

ForgeFlux is trying to implement federation externally, so most of the forges that we will deal with won’t have native federation capabilities :slight_smile:

The UX that you are suggesting (a POST request from CI) would require repository maintainers to configure and maintain an integration, which could cause friction. Since Starchart will only store and serve minimal information that is already publicly available (repository link, owner, name, description and tags, if any), it might only make sense for more sophisticated projects.

Good idea, but there should be easier ways to do it. I’m torn between asking all interested devs to add yet another configuration file to their repositories and the simplicity of storing configuration files in the repository to make up for a forge’s lack of features.

P.S. Apologies for the delayed response; this week was pretty hectic for me.


Oh, wanted to mention (because I bumped into it again) that there’s also this W3C recommendation: WebSub
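For context, subscribing in WebSub is a form-encoded POST to the hub with `hub.mode`, `hub.topic` and `hub.callback` (plus optional `hub.lease_seconds` and `hub.secret`). The hub then verifies intent by calling the callback with a `hub.challenge` that the subscriber echoes back, and, when a secret was given, signs content deliveries with an `X-Hub-Signature` header. Below is a Python sketch of the two subscriber-side pieces; all URLs are hypothetical.

```python
import hmac
from urllib.parse import urlencode

# Hypothetical URLs for illustration only.
HUB = "https://hub.example/"  # where the subscription body would be POSTed
TOPIC = "https://starchart.example/feed"
CALLBACK = "https://subscriber.example/websub/callback"


def subscription_body(
    topic: str, callback: str, secret: str, lease_seconds: int = 86400
) -> str:
    """Form-encoded body for a WebSub subscription request to the hub."""
    return urlencode({
        "hub.mode": "subscribe",
        "hub.topic": topic,
        "hub.callback": callback,
        "hub.lease_seconds": str(lease_seconds),
        "hub.secret": secret,
    })


def verify_signature(secret: str, body: bytes, header: str) -> bool:
    """Check an X-Hub-Signature header of the form '<method>=<hex digest>'."""
    method, _, received = header.partition("=")
    if method not in {"sha1", "sha256", "sha384", "sha512"}:
        return False
    expected = hmac.new(secret.encode("utf-8"), body, method).hexdigest()
    return hmac.compare_digest(expected, received)
```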


And another candidate that I actually kinda forgot about: https://skohub.io

This already does PubSub and is based on a Git workflow. This video is really interesting to watch:


For the record, @6543 on GitHub suggested the publiccodeyml format for storing repository data, for use in the Git-based synchronization mechanism that shares indexed data. Please see here for the discussion with them.

For folks that don’t have a GitHub account, I’m willing to ferry comments over to GitHub, and I will also ask them whether they’d be willing to continue the discussion here.


Many elements of this format are similar or identical to how Wikidata stores information about FLOSS. Wikidata is not a file format, but it is (to my knowledge) the most widely used and maintained database of Free Software: a SPARQL query lists ~15,000 entries.
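For anyone wanting to poke at that data, here is a sketch of querying the Wikidata Query Service from Python. The query is a hypothetical example (it assumes `wd:Q341` is the “free software” item and follows subclass chains); it is not the query behind the ~15,000 figure, so its count may differ. When actually sending the request, Wikimedia asks automated clients to set a descriptive User-Agent.

```python
import json
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Hypothetical query: count items that are instances of (P31), possibly via
# subclass-of (P279) chains, the item assumed to be "free software" (Q341).
COUNT_FREE_SOFTWARE = """
SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {
  ?item wdt:P31/wdt:P279* wd:Q341 .
}
"""


def query_url(sparql: str) -> str:
    """Build a GET URL asking the Wikidata Query Service for JSON results."""
    return f"{WDQS_ENDPOINT}?{urlencode({'query': sparql, 'format': 'json'})}"


def count_from_results(results_json: str) -> int:
    """Extract the ?count binding from a SPARQL 1.1 JSON results document."""
    bindings = json.loads(results_json)["results"]["bindings"]
    return int(bindings[0]["count"]["value"])
```

Broad property paths like `wdt:P279*` can be slow on the public endpoint, so narrower queries may be needed in practice.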


Manually maintained directories (SkoHub, publiccodeyml and Wikidata) can record attributes that can’t be auto-generated from the repository data provided by forges. It might be worth adopting indexing-friendly features in forges to better describe the projects they host, but IMO it should be a post-federation goal.

With Starchart, I’m aiming for maximum interoperability (with existing standards like publiccodeyml and Wikidata) using the data available right now. :slight_smile:


Updates from the discussion on GitHub:

@ruphy from publiccodeyml reached out to discuss implementing the standard in Starchart and suggested I open a thread on their forums, so I started a GitHub discussion (their forum of choice) about it.
