I agree.
As a developer working on a project I would want to be able to have a local copy (via federation) of all the activity of all forks. But of course I would not be interested in getting that amount of information on a project that is only a dependency of my project and for which I only care about a single issue.
I can see how some forge instances with lots of resources could get very ambitious and keep all the data they can collect regarding each and every project they get in contact with.
But I think the reasonable default is not to collect such information. To be more concrete, you could create a project on your own forge that “follows” a project on another forge and get a local copy of everything it contains. That would be the default. But you could also tick a box saying that you want to recursively follow all followers of this project, effectively getting activity updates from every forge where this project exists.
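To make the difference between the default and the "tick the box" case concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `projects_to_follow` function, the `followers` mapping, and the `forge-*.example` names are made up for illustration and are not part of fedeproxy or any forge API. It only shows that the recursive option amounts to a graph traversal that has to guard against cycles, since two forges can follow each other.

```python
from collections import deque


def projects_to_follow(root, followers, recursive=False):
    """Return the projects a local forge would subscribe to, in discovery order.

    `followers` is a hypothetical mapping from a project to the projects
    (on other forges) that follow it.
    """
    if not recursive:
        # Default: only mirror the one project that was explicitly followed.
        return [root]

    seen = {root}
    order = [root]
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for follower in followers.get(current, []):
            # Skip projects already visited so mutual follows do not loop forever.
            if follower not in seen:
                seen.add(follower)
                order.append(follower)
                queue.append(follower)
    return order


# Hypothetical follower graph; forge-b also follows forge-a back (a cycle).
followers = {
    "forge-a.example/kernel": ["forge-b.example/kernel", "forge-c.example/kernel"],
    "forge-b.example/kernel": ["forge-d.example/kernel", "forge-a.example/kernel"],
}

default_set = projects_to_follow("forge-a.example/kernel", followers)
recursive_set = projects_to_follow("forge-a.example/kernel", followers, recursive=True)
```

With the box unticked you only track the project you named; with it ticked you end up tracking every copy reachable through the follower graph, which is where the storage and CPU cost comes from.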
Even the largest projects, such as the Linux kernel, can be federated in this way with resources that are affordable to an individual. And for the vast majority of software projects the space/CPU requirements are much, much lower.
GitHub is huge because it hosts many software projects, but when you think about each of them individually, everything is manageable with very few resources.
It would be a problem if getting a local copy of all the data for processing purposes were impossible, or if it required expensive resources.
My other job (fedeproxy is only 50% of my time) is currently to work on the storage of https://www.softwareheritage.org/ which crawls forges to extract the code they contain. It is used by researchers to analyze the Free Software code that was harvested (around 750 TB currently). It is currently highly specialized, in the sense that it has nothing in common with any forge, although it contains and publishes code repositories and you can browse the code using an interface that is very much like what you find on GitLab. It would make sense to me if, over the years, it evolved into just another forge, federated with the others. If the people maintaining the project are not interested in managing users, it could be read-only. It would just be configured to follow every project on every forge and store whatever they contain.