Marco (@sklivvz) gave me the greenlight to share the contents of our conversation here. I’ll paste what I found most relevant, as food for thought.
Note, much of what is shared here is already available in the form of public talks, which can be found on YouTube. I recommend for those who want to delve deeper and can find the time to watch them to do so.
You are correct, Stack Exchange uses one db for each community (in fact two, one for meta and one for main). The reason for this is mostly historical – the site was not built to be a network. This approach has a lot of advantages but also some disadvantages. It’s super easy and safe to use: once the correct connection string is loaded for the request, everything that happens will never interfere with the other sites. Also this is great for canary deployments: you can run db migration and deploy new code only on some test sites and not all the network very easily. (…)
On the other hand some other stuff becomes pretty hard and in fact characterizes how the network works.
For example, there’s one copy of your user on every site and you have to “join the community” to create a user. Almost all you see on the site applies only to the site: reputation, moderation, powers, reputation levels, help center, etc. Which is great if you want this, and basically you have a lot of data that is different acriss sites (imagine having a “siteid” column in basically every database table!), but not so good if you want it to be the same.User syncing is a complex matter on stack exchange, well in fact any information passing between sites is challenging and goes through API calls or background jobs, Do you want to ban a user on all sites? It’s a batch that calls every site to ban the user there. Do you want to migrate a question? It’s a background job.
On the other hand, if in your design you want to share information across sites (e.g. do you want to have a single user with a single rep everywhere?) then perhaps a multisite db can be better.In other words: the best solution depends on what you want to get out of it.
(…) there are in fact two special dbs which are also used: one is the stackexchange.com website (…) and the other is the “sites” database which is the database which contains cross-sites data (like for example the ~300 database connection strings).
Scalability wise, of course, one db per site is much much better, as it has scalability built in. At the moment SO runs on a db cluster and SE on another cluster. This was achieved effortlessly, just by changing the connection strings.
Finally for Teams I think they used different schemas to achieve something similar to different dbs, whilst keeping open the possibility of making cross site queries (which is super hard with sharded sites, and SE essentially achieves via elastic search).