Decentralized website

MartijnWeterings · 12 November 2019 12:58

Hello,

I have come here from Stats.StackExchange and, like others here, am dissatisfied with the organization.

Over there I wondered whether a more decentralized website/format could be possible.

What I imagine is the organisation of the Q&A site in terms of a set of protocols that allows posting and retrieving Q&A and the edits that are made to them (as well as potentially governing the comments and votes, although this might possibly be done separately).

I imagine that it should be possible to host the content on a peer-to-peer network or on a network of servers. The content would then be handled in a format that is very flexible in how it can be distributed to the world.

This type of organisation of content would disentangle the host of the content/backend from the host of the frontend and reduce any dependency on the owner of the frontend website (which is currently StackExchange/StackOverflow). It might also make hosting the site cheaper (by that I mean the physical part: the hardware and electricity).

Is this viable in this Codidact project, or is it either a step too far or a silly idea?

manassehkatz · 12 November 2019 15:18

There has already been some discussion in the Discord system (gradually being moved to this forum) about decentralized systems. Bottom line: My opinion, and some others though not 100% consensus, is that it just doesn’t work well in the real world. There are a bunch of technical/logistical reasons why it is just not likely to work well.

What this project does allow (no matter how the details of each instance are eventually done) is for another person or organization to copy the entire project and then do whatever they want (within license privileges which are still to be determined, but which will be on the relatively open side of things). So if Alice & Bob decide they want to have their own Codidact server with their own data, therefore separate (decentralized) from the main project, they can do that without a problem. And if they want to fork the code to include distributed decentralized storage across multiple servers, they will be free to do that too - but I don’t think it will make sense for the main project.

cellio · 12 November 2019 16:25

I’d still like us to keep the door open to decentralization. Even if, for the foreseeable future, we’re working toward a single-hosted Codidact system with multiple communities, please build it in a way that a community (or several) hosted elsewhere could still join the network without moving data.

This means APIs (for joining, sharing content, network-wide stuff of whatever sort) and also some sort of governance policy – what criteria apply if a site wants to join, or do we say anybody can? There might be a community that needs to be separate for legal or national-censorship reasons, on the one hand, and we’d want to support that. And on the other hand, we might be unwilling to put the Codidact stamp of approval on a site dedicated to child porn. In between lies much murkiness.

manassehkatz · 12 November 2019 16:39

The key question I think then is “the network” vs. “the same software”. Just as many different chat/forum/bulletin board/etc. systems for many years (going back to dial-up modem days) have allowed you to clone the software and make your own separate system, I think the same would apply here. Allowing others to “associate” as part of a network really complicates things in terms of:

- Governance
- Legality of particular content, including country-specific issues
- “Be nice” (if a site doesn’t want to “be nice”, we can’t stop them from using the software, but we should not be part of a network with them - and defining “nice” gets a bit tough…)
- API - needs to do a lot more to handle the back & forth of various bits of data to keep everything synchronized, particularly since we can’t control/maintain other people’s servers
- and probably other things I haven’t thought of yet

So I am really inclined to a model where a Codidact instance can have multiple topic sites but is governed by one group of people and resides on one server (or a group of servers, but only in the technical sense of load-sharing, etc., not as in “really different”), and anyone else can make their own Codidact server for any reason they want. We could, relatively easily, provide a single sign-on/authentication system so that other Codidact instances could very loosely associate, but the profile details (beyond username/authentication), Q&A content, etc. would be totally separate, which should, I hope, avoid a lot of technical issues and also legal issues (i.e., if we aren’t moderating a particular topic site then we are not responsible for what someone posts there).

ArtOfCode · 12 November 2019 17:17

I tend to hold a similar opinion on this. Decentralization is a nice thing to have, but there are significant technical challenges to implementing it that we certainly don’t want to address in the MVP, because it’ll slow us down considerably.

Once we’re closer to being set up and running, we do need to turn consideration to things like - as Monica said - the criteria that apply to sites that want to join, and we’ll need to balance that against our available hosting resources. Ultimately, the more sites we host, the more it’ll cost in terms of both technical resources and human resources to staff and support. That’ll be offset some by the added publicity providing some degree of boost to our funding, but there’s likely to be a disparity and we need to make sure we balance that.

That said, being open-source software gives us a degree of decentralization for free: since anyone can download the software and set up their own instance of it, sites that we don’t feel able to host for whatever reason on the “official” instance can set up for themselves. To that end, I do feel that we should make APIs, data dumps and the like available to make it easy for communities to migrate, both into a Codidact instance (such as an SE community moving to our “official” instance) and out of one (such as a community that doesn’t like our governance wanting to set up for themselves).

cellio · 12 November 2019 18:44

I agree with most of this. In case I wasn’t clear, I’m only asking that we develop in such a way that we could enable a distributed network – i.e. different Codidact instances talking to each other to link profiles, propagate network-wide announcements, maintain a global listing of network sites, etc. I’m not saying we need to do any of that early on (and we might not do it at all), but I’d like to keep the possibility open.

Anybody can take Codidact and set it up; we’re not locking down the software at all. So, as we go, we should track what’s needed to actually set up an instance, to enable others to actually do that.

gilles · 12 November 2019 21:11

Decentralization is hard to build. A decentralized information repository is harder to use: at any point, not everything is easily accessible, if at all (DenverCoder9, what do you see that I can’t?). A decentralized information repository is also harder to share: different people have a different view of the available information (“Look at the three answers to this question.” “Huh? It only has two.”).

The problem we’re trying to solve is creating a repository of knowledge, not aggregating existing knowledge. How is it useful to spread the information among sites?

I strongly oppose peer-to-peer storage. I want information to be available to everyone, not lost because nobody happens to be sharing a particular block anymore.

There is a dependency on the owner of the frontend anyway. If it isn’t the content server, it’s the content server directory, i.e. whatever serves as the entry point to the platform. We can solve ownership of the entry point through legal means, with a suitably governed nonprofit organization. We don’t need any fancy technical solution for that.

This feels like a solution in search of a problem to solve. It definitely creates many more problems.

cellio · 12 November 2019 21:36

I meant decentralization at the level of the community, not at the level of the individual post. The latter would be completely unworkable.

On SE, Ask Ubuntu and Math Overflow are sites with external affiliations where someone else might have preferred to host the community. (I’m not deeply familiar with the history of either site.) SE has several single-product sites that might fit better into sponsoring organizations’ structures, or not. Also on SE, Christianity has long struggled with some of SE’s expectations (at least based on reading their meta) and perhaps would struggle with ours but be happy to stand up their own server.

Maybe it’s not a strong use case; I haven’t thought deeply about it. I just worry a little that if we say you can only do this through our network, then we could have SE’s problems several years from now. Maybe it’s premature to think about that now; I thought it wouldn’t be hard to keep the door open on decentralized communities, and so suggested it. If that makes the project way more complex, then I agree that those wanting it need to make a stronger case.

MartijnWeterings · 13 November 2019 14:31

Maybe I am getting too old and I am nostalgic about the old (but much simpler) internet, when content seemed to be more spread out. Several companies found that not so useful and gathered everything into one single place. Now all searching is done on Google, all online sales on Amazon, all social stuff on Facebook (or whatever they bought up), all videos on YouTube. The internet is gravitating towards a commercialized monoculture (there are some alternatives but they remain limited).

Creating a fork is nice, but the Spaghetti that you eat with it is the nice stuff. Maybe I am too naive about the technological part of a project like this. I imagine that the actual basis/content of the sites, as long as it follows some standard, could be swapped or shared from place to place, and that the content can be kept separate from the sites themselves. Then the communities can grow much more independently, while possibly sharing repositories of questions and answers on a shared server. Those repositories can be simple and only need to contain some sort of version control and a way to credit the originators, and the split architecture would make them easier to move around and rebuild elsewhere.

It is a bit like how I dislike that my Facebook profile is stuck to Facebook. I cannot easily take off the Facebook coat and put on a different coat. There is all sorts of integration that must be cut out. I can take all the posted photos and videos, but basically all the history, the links with friends, and the interactions in messages and posts are lost.

So what I am thinking about for a decentralized Q&A is something analogous to what https://diasporafoundation.org/ is to Facebook. At least I am personally reluctant (and I imagine others might be as well) to start again creating questions and answers on just another copy of StackExchange/StackOverflow. How different is this new Q&A site going to be from SE/SO if it ends up as a nearly identical web 2.0 concept (contrary to, say, this definition of web 3.0)? What prevents the same problems from coming back?

MasonWheeler · 13 November 2019 16:17

Which is problematic, if our value system is supposed to value creating a strong community. Having a system that makes it easy to break up the community is then antithetical to our values.

manassehkatz · 13 November 2019 16:26

@MasonWheeler Which is problematic, if our value system is supposed to value creating a strong community. Having a system that makes it easy to break up the community is then antithetical to our values.

Not necessarily. Keep in mind three things:

1. There are really two products here: Codidact == Software for a Q&A site and TBD == An instance of Codidact with content imported from SE (exactly which/what TBD) and new content, with a core initial group of largely former SE users who would like to start a new Q&A community. We have to build Codidact before we can start an instance of it to support our community goals, and doing so open-source is supported by most of the people involved at this time (certainly myself included).
2. Just because something is open-source does not mean it will actually get used “elsewhere” to any significant degree. Sometimes that is the case (look at the number of Linux distros), sometimes it is not.
3. There may be additional communities that, for a variety of reasons, want their own instance:
   - a group wants to have a private Q&A system like SE Teams but have full control over the details
   - a group wants to have a public Q&A but is not compatible with the “be nice” or other policies enforced by the Codidact development group on its own instance, and this would allow them to use the software without (for better or worse) any of the policies
   - a group in another country is unable (due to government restrictions limiting outside access to sites that have relatively “free as in speech” content) to use our primary instance of Codidact but would still like to benefit from the environment it provides by setting up their own instance inside their country for similar uses (well, without as much “free as in speech” on certain topics, but with “be nice” and multiple topic sites, etc.)

gilles · 14 November 2019 07:35

The software is open source. The content is open source. So there’s no technical or legal obstacle to forking. All you have to do is to convince people to join your community. What problem does “decentralization” (and I still don’t really understand what you mean by that) solve?

To put it another way, in what way is Wikipedia overly centralized? What would a decentralized Wikipedia look like?

sklivvz · 14 November 2019 08:08

I think I have a good idea on how to build a decentralized network without impacting the development of codidact significantly:

- codidact remains as a public server with its own backend datastore
- stack exchange remains as a public server with its own datastore
- other sites (on codidact or se or other software) can also be added

all that is needed is a communication mechanism (an API) that connects rep/questions/answers/comments/users across different sites.

Such an API can be built on top of SE’s API (albeit in one direction only), but I see no major problem in implementing it as an add-on to codidact or other similar software.

How would this work? On a cron schedule (e.g. daily or hourly) new content can be exchanged via the API. Each server participating in the network will have a list of other servers to sync with. Each server is free to do whatever it wants with the content (respecting CC-WIKI of course). Typically the API implementation would have some server reputation mechanism to prune the incoming firehose.
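To make the exchange described above a bit more concrete, here is a minimal in-memory sketch of one scheduled sync run. Everything in it is an assumption made for illustration: the `Server` and `Post` classes, the `export_since` call standing in for the API, the `"<origin>/<n>"` id scheme, and the numeric trust score with a 0.5 threshold. It is not an existing Codidact or SE API, and a real version would go over HTTP rather than call peers directly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    post_id: str   # globally unique; "<origin>/<n>" is an assumed scheme
    origin: str    # server where the post was first written
    author: str    # kept so imported content stays attributed
    body: str


class Server:
    def __init__(self, name, min_trust=0.5):
        self.name = name
        self.peers = []             # list of other servers to sync with
        self.trust = {}             # origin name -> reputation score (0..1)
        self.min_trust = min_trust  # hypothetical pruning threshold
        self.log = []               # append-only: local and imported posts
        self.seen = set()           # post ids already stored
        self.cursors = {}           # peer name -> position reached in its log

    def publish(self, author, body):
        post = Post(f"{self.name}/{len(self.log) + 1}", self.name, author, body)
        self._store(post)
        return post

    def _store(self, post):
        if post.post_id in self.seen:
            return False            # already have it, e.g. via another peer
        self.seen.add(post.post_id)
        self.log.append(post)
        return True

    def export_since(self, cursor):
        # Stand-in for the API call: everything added after `cursor`.
        return self.log[cursor:], len(self.log)

    def sync(self):
        """One scheduled run: pull new posts from each peer, prune by trust."""
        imported = 0
        for peer in self.peers:
            batch, cursor = peer.export_since(self.cursors.get(peer.name, 0))
            for post in batch:
                if self.trust.get(post.origin, 0.0) < self.min_trust:
                    continue        # origin server has too low a reputation
                if self._store(post):
                    imported += 1
            self.cursors[peer.name] = cursor
        return imported


alpha, beta, spamhub = Server("alpha"), Server("beta"), Server("spamhub")
alpha.peers = [beta, spamhub]
alpha.trust = {"beta": 0.9, "spamhub": 0.1}
beta.publish("bob", "How do I sync two servers?")
spamhub.publish("mallory", "Buy now")
first_run = alpha.sync()    # imports only the post that originated on beta
second_run = alpha.sync()   # nothing new since the last run
```

After the first run, alpha holds its own copy of beta's post with the author still attached, so the content would survive beta going offline. Note that this sketch never revisits posts it pruned, so raising a server's trust later would not bring its old content back.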

Having this network has significant advantages:

- if a server goes off the grid or gets paywalled or becomes evil, their content will already be shared across the network
- the community can be local to a server (which is good) but it can also migrate easily to any other server without losing content (which is also good)
- different servers can serve different purposes (e.g. one server could be read-only and focused on being a googleable reference, another focused on curating content, another on community building)

Regarding the feasibility of the system, it has been implemented in the past, e.g. with FidoNet. Newsgroups are also distributed systems, but without many of the essential features we’d have.

gilles · 14 November 2019 08:17

@sklivvz Reading your post as a user of the system rather than a developer: “blah blah blah tech stuff blah blah blah more tech stuff”

As a user, what is this all about? I know that “servers” are involved in storage and communication, but why should I care how they work? I’m a member of my community. I go to this site that I’ve bookmarked (or I can type a URL, or I can search for the name in a web search engine), and I see the same content and the same people every day (modulo whatever’s changed since yesterday).

sklivvz · 14 November 2019 08:24

The first major difference for a user is that if their server becomes evil or goes down, they can (almost) seamlessly go elsewhere and find their content already attributed.

The other major difference is that you’d see extra content (and in some cases significant amounts of it) appearing with a slight delay on your “daily” server.

You’d also see some notice of other servers, depending on how the server implements it, but I think this could be fairly similar in appearance to how stack exchange implements its own network, except the servers would be different nodes.

MartijnWeterings · 14 November 2019 08:38

Maybe I mean that the site should be modular. Yes, in principle you can take the entire code and database and put it somewhere else, but can you in practice? What if you want to change feature X: how is it going to impact Y and Z? What if you want to just use the content but it has been organised in a way that worked for the particular site but becomes awkward to reuse somewhere else?

I guess that what I am driving at is ‘to have the site set up in a more independent way’, independent from presentation. Keep the content in a format that is as general as possible and does not involve any particular style of the website. The goal is to have a database with questions and answers. All the frivolous additions should stay away from the main data.
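As a rough illustration of what such a presentation-independent format could look like, here is a small sketch that stores a question as plain text plus a revision history and round-trips it through JSON. The field names (`entry_id`, `kind`, `parent_id`, `license`) and the `Entry`/`Revision` classes are my own assumptions, not anything the project has decided; the license value is left as a placeholder.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class Revision:
    author: str      # who to credit for this revision
    timestamp: str   # ISO 8601 string
    body: str        # plain text/Markdown source, no site-specific HTML


@dataclass
class Entry:
    entry_id: str
    kind: str                   # "question" or "answer"
    parent_id: Optional[str]    # the question an answer belongs to, else None
    license: str                # placeholder; the actual license is undecided
    revisions: list = field(default_factory=list)

    def edit(self, author, timestamp, body):
        # Every edit is appended, so the full history travels with the data.
        self.revisions.append(Revision(author, timestamp, body))

    def current_body(self):
        return self.revisions[-1].body

    def contributors(self):
        # Unique authors in order of first contribution.
        return list(dict.fromkeys(r.author for r in self.revisions))


def dump(entries):
    return json.dumps([asdict(e) for e in entries], indent=2)


def load(text):
    entries = []
    for raw in json.loads(text):
        revisions = [Revision(**r) for r in raw.pop("revisions")]
        entries.append(Entry(**raw, revisions=revisions))
    return entries


question = Entry("q1", "question", None, "TBD")
question.edit("alice", "2019-11-14T08:00:00Z", "How do I host a Q&A site?")
question.edit("bob", "2019-11-14T09:30:00Z", "How do I self-host a Q&A site?")
restored = load(dump([question]))
```

Votes, reputation and styling are deliberately left out here; they would live in whatever site presents the data, which is the separation the paragraph above argues for.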

How you organise the site does influence the shape of the content a lot. On SE/SO questions and answers you have a lot of large contributions by single users. This makes keeping track of ‘ownership’ and ratings like ‘reputation’ important, so that metadata matters, but it is not easy to move to another site because it is private data.

This contrasts with the wiki-type posts and articles which are much more like many little contributions from a lot of different people. The ‘ownership’ is much less important.

(To explain better how I feel about this: on Wikipedia I have much less trouble editing a piece of text directly. However, on SE I either place a comment or post an answer/question myself. On SE I somehow feel inhibited about editing someone else’s post and feel it is much more ‘their’ work.)

A wiki can be copied one-to-one, but SE-style Q&A cannot be copied 100%, because the way it is organised ties the content to its contributors (e.g. see how we discuss the transfer of reputation and votes, which would not be an issue for wiki-style data).