Indeed there are some very interesting aspects to this. I would like to add some notes, though I lack much of the domain knowledge on code forges and the particular ideas being considered for ongoing code-federation projects such as ForgeFlux.
-
Talking about a ‘crawler’ is inaccurate terminology in the context of Murmurations. Instead of information being pulled, it is pushed to unknown subscribers (which makes the protocol very interesting for any party that maintains directories of information).
-
In any offering of Murmurations-like features, even though from the end-user perspective it seems bundled/integrated with e.g. ForgeFlux, the Murmurations code can live in one or more wholly independent libs, and can also evolve separately as a project.
-
In Federated Murmurations the centralized Index will not exist. Any publisher pushes information to any Index / Aggregator, from where it federates across the network to any other interested Index / Aggregator. Here “interest” is determined by the type of information, or by particular metadata defined in the information schema (this works similar to topics in traditional message-queueing systems, but with topics that are much richer in the information you can filter on).
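To make the topic-like filtering concrete, here is a minimal sketch of how an Index / Aggregator might decide whether an incoming profile is “interesting”. Everything here is an assumption for illustration: the `matches_interest` function, the `linked_schemas` convention, and the `code_repository-v1` schema name are not part of any published spec.

```python
# Hedged sketch: subscription matching in a hypothetical Index / Aggregator.
# A profile is a plain JSON-like dict; a subscription is a dict of
# constraints. All names here are illustrative assumptions.

def matches_interest(profile: dict, subscription: dict) -> bool:
    """Return True if the profile satisfies every constraint in the
    subscription. A constraint on "linked_schemas" means "at least one
    of these schemas is present"; any other key must match exactly."""
    for key, wanted in subscription.items():
        if key == "linked_schemas":
            if not set(wanted) & set(profile.get("linked_schemas", [])):
                return False
        elif profile.get(key) != wanted:
            return False
    return True

# A forge-oriented aggregator might subscribe like this:
subscription = {
    "linked_schemas": ["code_repository-v1"],  # hypothetical schema name
    "license": "AGPL-3.0",
}

profile = {
    "linked_schemas": ["code_repository-v1"],
    "name": "forgeflux",
    "license": "AGPL-3.0",
}

print(matches_interest(profile, subscription))  # True for this example
```

The point being: the “topic” is not a single routing key, it is the whole schema-conformant document, so an aggregator can filter on any field it cares about.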
-
The Index / Aggregator can sit in multiple locations. If a forge has federation support (as Gitea will have), the Murmurations libs might be installed natively on it, or as an optional extension.
-
If a forge has no federation support (e.g. GitHub), then pushing a Murmurations profile to an Index / Aggregator can be a very lightweight action. No ActivityPub is required. A REST-based POST submitting the JSON would be enough, and - for GitHub - this might be encapsulated in a GitHub Action or what-have-they.
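A rough sketch of how lightweight that submit could be, using only the Python standard library (so it could run as a single step in a CI job). The endpoint URL, the profile fields, and the schema name are all made up for illustration; request construction is kept separate from sending so the sketch works without network access.

```python
# Hedged sketch of the "lightweight submit" idea: POST a profile JSON
# to an Index / Aggregator. The URL and profile fields are assumptions.
import json
import urllib.request

def build_submit_request(index_url: str, profile: dict) -> urllib.request.Request:
    """Prepare the POST request; sending it is a one-liner left to the caller."""
    body = json.dumps(profile).encode("utf-8")
    return urllib.request.Request(
        index_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

profile = {
    "linked_schemas": ["code_repository-v1"],  # hypothetical schema name
    "name": "my-project",
    "url": "https://github.com/example/my-project",
}

req = build_submit_request("https://index.example.org/v1/profiles", profile)
print(req.method, req.get_header("Content-type"))
# Actually sending it would just be: urllib.request.urlopen(req)
```

That is the whole integration surface for a non-federating forge: serialize, POST, done.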
-
Interesting idea: There may be many different profile schemas that define stuff interesting to aggregate and subscribe to. While the Murmurations project offers an online UI to easily create conformant JSON, the most likely scenario in a code-forge context is that the JSON is a regular version-controlled file in the repo codebase. That being so, existing JSON files are already candidate profiles… think of a `package.json` in the case of a NodeJS project.
-
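To illustrate the “existing JSON files are candidate profiles” idea: a few well-known `package.json` fields already map cleanly onto a profile. The target field names and the `code_repository-v1` schema below are assumptions, not a published schema.

```python
# Hedged sketch: derive a minimal Murmurations-style profile from a
# NodeJS package.json. The target schema fields are assumptions.
import json

def profile_from_package_json(pkg: dict) -> dict:
    """Map well-known package.json fields onto a hypothetical profile."""
    return {
        "linked_schemas": ["code_repository-v1"],  # hypothetical schema name
        "name": pkg.get("name"),
        "description": pkg.get("description"),
        "license": pkg.get("license"),
        "tags": pkg.get("keywords", []),
    }

package_json = json.loads("""
{
  "name": "example-app",
  "description": "Demo package",
  "license": "MIT",
  "keywords": ["federation", "forge"]
}
""")

print(profile_from_package_json(package_json)["name"])  # example-app
```

An aggregator (or a thin adapter pushing on the repo’s behalf) could apply such a mapping per ecosystem: `package.json`, `Cargo.toml`-as-JSON, and so on.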
Re: Wishlist. I don’t know what data you want to collect, but with JSON files under source control, access would be per repo and under the control of the maintainers. Privacy-sensitive stuff is a different matter; for that you’d have to look at where forges allow such data to be stashed securely.