DB Schema Proposal Round 6

cbrumbaugh · 5 February 2020 04:04

Something else, we are going to have audit tables that record all of the changes, so I see no need for specific history tables.

manassehkatz · 5 February 2020 04:15

There are two ways to go on this:

The id is set in the software permanently.
In this case, no need for internal_name. The id gets assigned as a manifest constant and done. However, in that case the database table has to be always set up the same as far as id - can’t just “add” the records when setting up a new instance (or a test database) because there is no inherent guarantee that autoincrement ids will do what you want.
The id is arbitrary.
In this case, yes you need to add some sort of internal_name field. But to speed up processing I wouldn’t actually be checking that field all the time. Instead at system startup I would read the table into memory and then work with the ids - essentially creating constants (for duration of execution) from the database.

I have used both methods at times myself. Hard to say which is “better”. Done right, there is a little overhead either one place or another but the bulk of the code (everything except database creation and system initialization) will be IDENTICAL. So no real gain for one or the other in terms of readability, if done right.

Actually, putting these fields (to determine what types can have answers, what can be 3rd-party edited, etc.) actually helps the software. This way the software doesn’t have to know about “Meta” specifically, rather it knows:

Writing has 2 categories, Main & Meta
Main has a Post Type that allows Q & infinite A & comments & edits (aka regular Q&A)
Main has a Post Type with Q but no A, comments, limited 3rd-party edits (aka Blog)
Meta has a Post Type that allows Q with single A & comments & edits (aka Meta with single answers)

The software then just looks to each feature setting to determine what to do.

No specific code for “this is a blog” or “this is Meta” or “this is an Article” or whatever.

luap42 · 5 February 2020 08:55

No, but IMO, more storage-efficient. If we have 1000 users with each 15 privileges, your approach will lead to 15000 (15k) entries, whereas mine would not lead to more than 15 (assuming no privileges have been revoked). If privileges are revoked, your approach would not change, mine would increase the row count, but we can assume that priv revokation should be more seldom than priv gain.

I don’t think we need a secondary table for that. Columns on post are fine IMO.

SE handles it that way: A post can be closed (applies close banner), locked or noticed or locked and noticed. This is what my proposal is aiming for.

That doesn’t make sense to me. Tag sets don’t have a parent tag. There may be one tag set for Meta, one for Q&A and Wiki and one for Blog. These don’t neccessarily have a parent tag.

Finte to me, as long as they are queryable by the core software (post history) and also contain for example post notice changes.

manassehkatz · 5 February 2020 15:30

Two tables is more storage-efficient, but it:

Limits privileges strictly to Trust Level. If we add more privileges based on direct assignment - e.g., moderator-like privileges for someone who is not a full moderator (or even privileges beyond normal moderator privileges, which would be on a limited basis for overall management), etc. then a single table with additional tools to manage the privileges would work much better. For the ordinary Trust Level privileges, it is no big deal to add the 15 privilege records when TL3 (or whatever) is reached by a user.
Requires more queries for ordinary tasks:
- Finding out “what privileges a member has” changes from one simple query to two (one to get what they “should” have and another to get what they don’t have any more)
- Finding out a single member privilege becomes two queries instead of one or requires extra code to handle it (e.g., “if trust-level2 and (not query-member-privilege-X-revoked)” instead of “if member-privilege-X-active”)

The storage is “nothing”. 1,000,000 users and 100 privileges - yes, that’s a lot. But most communities will be far smaller (remember, for better or worse each community is a separate DB, this is where that helps!) and most users will have relatively few privileges (show up, ask a ~~homework~~ well-formed question, get answers, leave), so the typical table might be more like 100,000 x 10 privileges (the low-level users) + 1,000 x 100 privileges + 100 x 200 privileges = 1.12 million very small records. No big deal.

avonwyss · 5 February 2020 16:56

The post seems to be missing date information (when created, modified, deleted), that seems odd to me.

Also, it may be a good idea to have a member_post_views table (which should also have a lastvisited timestamp) as to enable the user to see new posts, and to allow future analytics (e.g. what kind of topics is a user interested in? etc.).

I have seen that there should be a generic audit mechanism, however I nevertheless think it may be a good idea to separate the actual full post content out into its own table, e.g. post_content with a version/revision and date? This way most of the DB operations could be done on the post (there could also be a summary character field there with limited size to allow for displaying post lists quickly), and one could still easily retrieve content, diff, view changes etc. per revision of a post.

Just a note on the post-only approach (versus having questions/answers/wiki…), I think this is a great idea since it also allows for forums-like functionality (e.g. replies to replies). The post-type should control what (whether) nested posts are allowed and how deep, all questions/threads etc. would be posts without a parent.

However, following up on that approach, comments could also be seen as yet another nested post, since it shares lots of the same columns anyways.

That being said, I think that the is_meta should be removed since it is redundant if the meta functionality is using its own post-type.

cellio · 5 February 2020 17:38

There’s the flip side as well. While not in MVP, one of our goals is that moderators be able to apply restrictions in a way less coarse than suspension. Somebody who participates well in general but just can’t seem to participate in comments without getting out of hand could be suspended from commenting for a time. Someone with a long sad trail of bad edits could be suspended from editing for a time. (SE has the latter.) We haven’t worked all this out, but we do want to keep that door open.

(I’n not commenting on which approach to tables is better, just presenting a functional need.)