If that is the case, and there is only one entity controlling the “main” repository, then the problem is not solved but ignored. To be resilient against an entity going bad, there are three main issues to address:
Data ownership. This has been solved by SE already by having all data be creative commons licensed.
Software ownership. This will be solved by open-sourcing Codidact.
Community “ownership”. This can be solved by a distributed system but not a centralized system.
Having a centralized community invariably leads to a lot of platform lock-in. Communities do not simply “leave” as you seem to assume. They split, instead. And this makes some of the parts (or all) fail. Even if a community leaves a server en masse, that does not automatically mean its members all go to the same next place. What you want here is a system, such as a distributed one, that does not require communities to be bound to a single server. This minimizes the impact on the community since it’s already “split” across servers, yet unified by APIs.
Distributed systems of this kind worked extremely well for a long time, e.g. FidoNet or NNTP for text or IRC for chat. I don’t see a practical reason why a properly designed API would not work equally well or better. What am I missing?
I find these arguments for separate databases persuasive. An idea I was previously missing is that separate databases can still run in the same database server, so we don’t have to have a copy of the full stack for every site.
I don’t want us to make decisions now that will make it hard for us to grow later. I also think that, six months out, we’re going to have 5-10 communities here, so if we can start down a path that makes sense for the “small” case while keeping the door open for other options later, we should do that.
Until today I thought the “small” answer was one DB hosting multiple communities. But I think @sklivvz is right that we can start with separate DBs and add network support within a single DB later when/if we need it.
I don’t have enough tech cred to participate in this decision, but since I’ve previously argued for one DB I want to speak up now to retract that.
This is an interesting idea. I’m just not sure how practical it is. My vision (which is not exactly the same as yours, and I don’t know what various other people think of it) was that the Codidact primary instance would function kind of like a “good” SE Network: central login/authentication and unified “management” at the top level (both policy and technical), with most moderation and day-to-day tasks handled at the community level, but all through servers (both web & database) provided & maintained by the central group.
There would be a number of things - e.g., the centralized authentication/login the most important, but also subdomains (writing.codidact.com, worldbuilding.codidact.com etc. rather than needing to set up a separate domain for each community), easy navigation between sites, shared information at certain levels (user profiles showing info. about all member sites; displayed information on the main pages (not necessarily HNQ but something analogous), etc.) that all gain from central administration. All these things would indeed also provide a certain amount of “lock in”, for better or worse.
The “Community ownership” then becomes a key unsolved issue. We don’t want a situation where the “management” and a specific community end up seriously antagonistic to each other, but if it were to get to that point (perhaps due to different decisions about a Code of Conduct or a desire by a community to add advertising or features that a community wants to add but which the primary instance is not ready to do, whatever) then there should be a clear and easy way for a community to leave with all their data without having to resort to trying to make some huge simplified data dump (e.g., SEDE) or pulling a question at a time via a rate-limited API (each of these having some serious limitations, as we already know).
At the same time, from a very practical standpoint I don’t think that we (Codidact primary instance management) could/would simply “hand over the keys to a specific server” (either at the beginning when a community starts or at the end when a community wants to leave). There are, I think, enough reasons in the nature of data integrity (both technical and “real world”), security, etc. why I think that every database in a given instance, especially the “primary instance”, should be controlled exclusively by instance management and not by individual communities. Once that is clear, IMHO the question of multiple databases vs. single database, etc. becomes a technical issue (as we have been discussing) and not a policy issue.
I want those things too. I think our “good” Codidact instance could, behind the scenes, actually run on multiple instances while presenting a unified public face. This is what I was getting at when I wrote about integrating multiple instances through a common API.
Absolutely. And if we do one DB per community that does become easier; we can presumably bundle up a whole DB for others to install in another Codidact instance (that they’ve set up themselves).
I think emulating Stack Exchange is the wrong approach. Codidact’s central system will only be “good” in its own eyes and it will be, ultimately, no better than Stack Exchange is today - very much detached from the community.
I know it will be unpopular because everyone here is biased by the good old days at Stack Exchange, but I urge you to resist the temptation of trying to recreate that because it’s easy. Do your own thing, solve the problems SE did not solve. They were not trying to build a durable community when they created Stack Exchange. They wanted to create a centrally controlled set of communities in order to monetize it. You don’t need this.
We do, however, want to ease cross-site participation. I think one thing that means is instance-level account services, so adding one of “our” communities is easy. But account management was already going to be separate, I believe.
There’s nothing besides identity management, domains and a few logos that the Codidact primary instance management really has. This is an open source project which plans to have the hosted content in a license that allows exports and imports.
“which allows exports and imports” - which is one aspect of “open” - e.g., don’t make arbitrary extra layers (like SEDE) or impose unnecessary (beyond keeping system running smoothly for regular use) limits
and
“hand over the keys to the server” - Not sure you actually meant that would be OK, but I am clarifying my thoughts: in order for the network of communities to run well, there needs to be a certain level of control. That should have limits. If a community decides “Codidact isn’t letting us do what we want because we disagree on the Code of Conduct”, then the community should be able to leave with their data, but until they leave they might get put on hold by Codidact management. The disagreement could go many ways: e.g., a community that sanctions within the Codidact system what the central group broadly agrees to be “hate”; but also the other way, where a community may want to eject users or (no!) moderators due to problems where the central group broadly agrees the steps taken are a poor reflection on the system as a whole.
If a community controls its own community database server directly while still being affiliated with the primary instance, then we are in for potentially serious trouble in the future. I recommend we have a policy (i.e., this is administrative, not technical) as part of the primary Codidact instance which states clearly that for each community hosted by us there are limits, based largely upon the Code of Conduct, and that should a community violate those rules (clearly stated, with room for discussion, etc.) then the primary instance could (a) put that community on hold pending resolution and (b) ask the community to leave. We would then be obligated (by license rules for the basics, but I would argue we go beyond the bare minimum) to provide them their entire data set: no matter how we store it locally, they would get what they need to start their own complete instance, everything except for user authentication. That is where an extra API level may be key, so that a community which is no longer affiliated/hosted could authenticate “returning” users via our authentication server, and then those users would get added to their authentication server.
I still think that communities shouldn’t be separated at all.
Choosing a precise category for a question is the wrong approach. It may belong to multiple categories. A programming question might be related to configuration of the user’s OS etc.
I can think of countless examples, especially in Worldbuilding, where questions tend to cross borders.
Therefore, I still recommend: One giant pool of posts, which are somewhat linked to categories (using weighted links). And for this, a Graph DB is likely more suited.
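To make the “weighted links” idea a bit more concrete, here is a minimal in-memory sketch in Python. It is not a Graph DB and not anything Codidact has decided on; the class name `PostPool`, the weight range of 0 to 1, and the example categories are all assumptions made up for illustration.

```python
from collections import defaultdict


class PostPool:
    """One pool of posts; each post links to categories with a weight (assumed 0..1)."""

    def __init__(self):
        # category -> {post_id: weight}
        self._by_category = defaultdict(dict)

    def link(self, post_id, category, weight):
        """Attach a post to a category with the given link weight."""
        if not 0.0 <= weight <= 1.0:
            raise ValueError("weight must be between 0 and 1")
        self._by_category[category][post_id] = weight

    def posts_in(self, category, min_weight=0.0):
        """Return post ids linked to a category, strongest link first."""
        links = self._by_category.get(category, {})
        ranked = sorted(links.items(), key=lambda kv: (-kv[1], kv[0]))
        return [post_id for post_id, weight in ranked if weight >= min_weight]


# Hypothetical data: q1 is mostly a programming question but also touches OS setup.
pool = PostPool()
pool.link("q1", "programming", 0.8)
pool.link("q1", "os-configuration", 0.5)
pool.link("q2", "programming", 0.3)
```

The same post shows up under both categories without being copied, and a category view can hide weakly related posts by raising `min_weight`.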
That’s a good point. I do think that new users to a particular site should somehow be made to read the site policies.
When I said that it didn’t make sense to me to explicitly “join” a site when I was already logged in to the SE network, I didn’t mean that I should be a “member” of all sites automatically. There are some I might explicitly NOT want to be shown as a user of, just to point out one issue.
What I was thinking more is that joining would be somewhat automatic. I read HNQ, find something interesting where I can provide a good answer that others haven’t, and try to answer. Currently on SE, you can’t do that because you’re not a member. I was thinking that the act of answering, or upvoting, or commenting would make you a member. Perhaps there should be a warning Continuing will make you a member of this community, then you can continue or cancel. If you continue, this might be where you are shown the site rules and the like.
This is getting more and more complicated, so maybe the SE way of doing it isn’t so bad after all. Making sure new users read the site rules is more important than being able to post something quickly from “outside”. In fact, there should be some mechanism to ensure that the site rules are actually read or understood. It shouldn’t be as easy as clicking to dismiss them, or even just clicking a single button I have read and understood the site rules, because then it will get clicked a lot without the rules actually having been read. This is something that needs to be kicked around, but I think it’s off topic for this thread, so I’ll quit here.
Even on far more serious sites - e.g., Amazon where you are going to spend money, store credit card information for future use, etc. how many people actually READ the fine print, as opposed to “scroll down and click ‘I agree’”? I’d guess < 10%. Probably < 1%. Seriously.
So on a site where the PII is basically limited to “email address for account recovery” (pretty much anything else like name can be anything you want), no $ involved, no real consequences to your actions (i.e., the worst in almost any situation is being kicked off of the site), why would most people actually read much of the “fine print”?
SE tried at least with the Tour. Whether or not the Tour was actually very helpful (I think we can and should do better), the reality is that most people didn’t really learn how to use the site from the Tour but rather from diving in and asking & answering. Hopefully not getting too messed up (“bad” question…downvote…leave vs. actually improve the question) along the way.
I think the solution, which has been discussed elsewhere, is to provide more guidance when the new user is actually asking or answering a question. Automatic help screens (but not too annoying), a bit of logic (to try and prevent super-short answers or link-only answers), etc.
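As a rough illustration of the “bit of logic” meant here, the following Python sketch flags very short and link-only answers so the UI could show a hint while the user is still writing. The function name `answer_hints`, the hint labels, and the 15-word threshold are assumptions for the example, not anything that has been agreed on.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")
MIN_WORDS = 15  # hypothetical threshold; each community could tune this


def answer_hints(body):
    """Return a list of hint labels for a draft answer (empty list means no hint)."""
    hints = []
    # Count only the words the user wrote themselves, not the URLs.
    words = URL_PATTERN.sub("", body).split()
    has_link = URL_PATTERN.search(body) is not None
    if has_link and len(words) < MIN_WORDS:
        hints.append("link-only")
    elif len(words) < MIN_WORDS:
        hints.append("too-short")
    return hints
```

A draft such as `"See https://example.com/page"` would get the `link-only` hint, while a longer answer that merely includes a link would pass without any hint. The point is to guide, not to block, so the result is a list of hints rather than an error.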
The leads and assorted other people met today to discuss this and other architecture issues in detail. Here are some results:
Each community will have its own database. No PII.
Account management is its own service. Has PII.
There is an instance database and service. This facilitates global admin, network profiles, the list of sites, and so on. The database here contains db connection info for the individual community databases. Has PII, e.g. I think any IP-tracking would need to be logged here (right @Marc.2377 / @Helmar?). Might contain other PII, for example an alternate contact method required by China.
Still TBD: one application service per community versus one application instance being able to serve multiple communities. @Marc.2377 is working on a proof of concept for the latter.
If an application instance can serve multiple communities, then here’s how that will work:
Each application knows which communities it can serve. (There was a discussion about nginx assigning sessions to app instances based on the URL but it sounds like we’re not doing that. I missed part of that discussion, sorry.)
When needed, the app asks the network service for database connection info and connects.
We can have more than one app service. Splitting some communities off onto another app is a matter of changing the list of communities served by each instance of the application. (Presumably the sequence is: read-only on source, set up on dest, enable on dest. We didn’t discuss it at that level.)
Or we might end up with one server app per community. Either way, databases will be per-community.
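For the multi-community variant described above, a minimal Python sketch might look like this. It only models the bookkeeping: the instance service hands out connection info, each app instance knows which communities it serves, and moving a community is a change to those lists. All names (`InstanceRegistry`, `AppInstance`, `move_community`) and the connection strings are hypothetical, and no real database connection is opened.

```python
class InstanceRegistry:
    """Stand-in for the instance service: maps community -> DB connection info."""

    def __init__(self):
        self._connections = {}

    def register(self, community, dsn):
        self._connections[community] = dsn

    def connection_info(self, community):
        return self._connections[community]


class AppInstance:
    """One application instance that may serve several communities."""

    def __init__(self, name, registry, communities):
        self.name = name
        self._registry = registry
        self.communities = set(communities)
        self._open = {}  # community -> connection info, fetched on first use

    def connect(self, community):
        if community not in self.communities:
            raise PermissionError(f"{self.name} does not serve {community}")
        if community not in self._open:
            # "When needed, the app asks the network service for connection info."
            self._open[community] = self._registry.connection_info(community)
        return self._open[community]


def move_community(community, source, dest):
    """Reassign a community from one app instance to another."""
    source.communities.discard(community)
    source._open.pop(community, None)  # drop the cached connection info
    dest.communities.add(community)


# Hypothetical setup: two communities, two app instances.
registry = InstanceRegistry()
registry.register("writing", "db-host/writing")
registry.register("worldbuilding", "db-host/worldbuilding")
app_a = AppInstance("app-a", registry, ["writing", "worldbuilding"])
app_b = AppInstance("app-b", registry, [])
```

Calling `move_community("worldbuilding", app_a, app_b)` then lets `app_b` serve that community while `app_a` refuses it. The read-only phase on the source that was guessed at above is left out here; a real migration would need it.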
Reddit works that way. You visit a subreddit you never posted to, and you can simply participate without any extra step. From that moment onwards, this subreddit will appear listed under “My Communities” in the drop-down menu (top left). I’m just not sure if there’s an easy way to “leave” a particular subreddit later.
(I was gonna say this discussion is meant for another topic, but - considering the title of this thread, maybe it is not - so I leave it to Admins to make that judgment.)
Monica, I did some copy-editing here to clarify a possible confusion between “server application” (which is in itself often understood) and “application server”. Not sure how good of a job I did, please review. You can see the diff by clicking the orange Pencil icon to the right of your username on the post.
(This post was supposed to be a ‘whisper’. Delete when you see it, at your discretion.)
Because I want to ask a question without its being rejected by the community.
So before I post a question on a new-to-me site I look for “what’s on-topic?” in the Help and/or the Meta.
Software-wise a solution may be simple i.e. simply let the community customise what the text is on whatever the landing page or funnel is for new users.
We’ve already agreed (or at least I’ve suggested and I’m pretty sure gotten agreement) that landing page language, question guidance, help pages, etc. will all be customizable for each community.
I’m just saying that I think the typical new user won’t read much. They’ll Google “writing advice” or something that gets them to writing.codidact.com and see some questions that sound sort of like what they’re looking for and dive in with their own question. Without reading a detailed intro. page. Without reading how to look for duplicate questions first. Without reading “how to write a good question”. etc. If SE didn’t have lots of users like this, a lot of the other discussions we have had about new users, initial trust level, question rate limits, etc. would not be happening. It is a real problem - the new users who actually take the time to read through instructions are few & far between.
Yes and I guess there’s only one page which you know they’ll see: i.e. the page on which they enter or confirm their user name for the site.
Because they may see+read nothing else before diving in and writing something, IMO the community should be able to put their Welcome message on that very page (as well as possibly on some other pages – like Tour, Help, FAQ, Sandbox, Meta, etc.)
You are mixing two different things here: The topic of the thread (whether there should be separate databases per community — now decided) and the fundamental question whether it should be a distributed system (which I had advocated for early on on Discord, but the majority was against).
Note that Usenet was clearly a distributed system, but NNTP servers usually had a unique “database” that contained posts from all newsgroups. Indeed, one and the same post could appear in several newsgroups (crossposting), something that is decidedly not possible with separate databases. You could have copies, of course, but the whole point of crossposting was that you had only one copy of the message; multiposting (sending the same content independently to several newsgroups) was unwelcome.
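A toy Python model of that difference, purely for illustration: the message body is stored once and each newsgroup only holds a reference to it. This is a simplified sketch, not how any particular NNTP server actually stores articles; `NewsStore` and the message ID are made up.

```python
class NewsStore:
    """Single store shared by all newsgroups; crossposts are references, not copies."""

    def __init__(self):
        self.messages = {}  # message_id -> body, stored exactly once
        self.groups = {}    # newsgroup -> list of message_ids

    def post(self, message_id, body, newsgroups):
        """Store one message and list it in every newsgroup it is crossposted to."""
        self.messages[message_id] = body
        for group in newsgroups:
            self.groups.setdefault(group, []).append(message_id)

    def read(self, group):
        """Return the bodies of all messages listed in a newsgroup."""
        return [self.messages[mid] for mid in self.groups.get(group, [])]


store = NewsStore()
store.post("<1@example>", "hello", ["comp.lang.c", "comp.unix"])
```

With one database per community, the equivalent would need a copy of the post in each database, which is the multiposting case rather than crossposting.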
Not in principle; I’m thinking this service should be concerned with managing sites from a Codidact instance (deployment, upgrading of the core software, db migrations, load balancing etc) but the way I see it all sorts of PII should really be isolated from both this and the core service. It might fit best along with the authentication service. This is yet another detail to be ironed out with other contributors. We’ll get this sorted out in the days ahead.
@celtschk, @sklivvz - I haven’t read it through, but it seems the other topic listed in post #2 on this thread, “Decentralized website”, concerns that specific matter. (edit: ah, it seems sklivvz has already weighed in on that topic - cool.)