DB Schema Round 7

cbrumbaugh · 5 February 2020 03:41

Change log

Added setting,category,category_post_type,member,post_duplicate_link
social_media → social_media_type

setting

display_name (text) UNIQUE NOT NULL
current_value (text)
can_be_changed_by_mods (boolean default to FALSE)

category

display_name (text)
url_part (varchar 20)
is_primary (boolean default to FALSE)
short_explanation (text)
long_explanation (text)
contributes_to_tl (boolean default to TRUE) – whether this category is included for trust level calculations
minimum_tl_to_participate (bigserial FK to trust_level.id)

category_post_type

category_id
post_type_id
is_active (boolean default to TRUE)
UNIQUE CONSTRAINT (category_id,post_type_id)

post_duplicate_link

original_id
duplicate_id
UNIQUE CONSTRAINT (original_id,duplicate_id)

member

display_name (varchar 150)
bio (text)
profile_picture_link (text)
is_temporarily_suspended (boolean default to FALSE)
temporary_suspension_end_date (date)
temporary_suspension_reason (varchar 35)
trust_level_id (ref to trust_level.id)
network_account_id (bigserial, nullable)
is_moderator (boolean default to FALSE)
is_administrator (boolean default to FALSE)
is_synced_with_network_account (boolean default to TRUE)

trust_level

display_name (text 100) UNQIUE NOT NULL
explanation (text DEFAULT NULL) UNQIUE

social_media_type
The types of social media that the member can display in his profile

display_name (text 100)

member_social_media_type
The social media that the member would like to display in his profile per community

social_media_id (bigserial)
member_id (bigserial)
content (text 100)
UNIQUE CONSTRAINT (member_id,social_media_id)

privilege
Table for privileges

display_name (text 100)
description (text)

member_privilege
For recording which members have which privilege in a community. If a member has a privilege suspended, then that is also recorded here, and a nightly task will auto undo the suspension once the privelege_suspension_end_date has passed.

member_id (bigserial)
privilege_id (bigserial)
is_suspended (boolean default FALSE)
privilege_suspension_start_date (date)
privelege_suspension_end_date (date)
UNIQUE CONSTRAINT (member_id, privilege_id)

post_type
Records the type of post, question/answer/blog etc

display_name (Text 100)
description (Text)

status
For setting the status of a post locked/featured etc

display_name (Text 100) UNIQUE
description (Text)

post
I thought about splitting into a answers table and and questions_table but doing it in the same table lets comments have a post_id instead of a questions_id and a answers_id. Meta posts are denoted by the is_meta column. Type of post is determined by the post_type_id

member_id (bigserial)
title (Text 255 default NULL)
body (text)
upvotes (bigint)
downvotes (bigint)
net_votes (bigint)
score (decimal)
views (bigint)
is_accepted (boolean default FALSE)
post_type_id (bigserial)
is_closed (boolean default FALSE)
is_protected (boolean default FALSE)
parent_id (bigserial FK to post default NULL)
category_id
is_deleted (boolean default FALSE)

post_status

post_id (bigserial)
status_id (bigserial)
UNIQUE CONSTRAINT (post_id, status_id)

comment
Table for the comments on posts, both questions and answers.

member_id (bigserial)
post_id (bigserial)
parent_comment_id (bigserial FK to comment default NULL)
body (text)
upvotes (bigint default 0)
downvotes (bigint) default 0)
net_votes (bigint) default 0
score (decimal default 0)
is_deleted (boolean Default FALSE)
deleted_at (Date)

comment_votes

comment_id (bigserial)
member_id (bigserial)
vote_types_id (bigserial)
UNIQUE CONSTRAINT (comment_id, member_id)

tag
Table for all of the tags

body (text 100 default NULL UNIQUE)
description (text)
tag_wiki (text default NULL)
is_active (boolean default TRUE)
tag_synonym_id (bigserial FK to tag default NULL)
parent_id (bigserial FK to tag default NULL)
usages (bigint)

post_tag

post_id (bigserial)
tag_id (bigserial)
UNIQUE CONSTRAINT (post_id , tag_id )

vote_type
Table for the vote types, upvote/downvote.

display_name (text) UNIQUE

post_vote
The reason for this table is so that votes by spammers/serial voters can be undone.

post_id (bigserial)
vote_type_id (bigserial)
member_id (bigserial)
UNIQUE CONSTRAINT (post_id, member_id)

cellio · 5 February 2020 13:48

This looks good! I have a couple small nits and one other comment.

Post should probably have deleted_at, like Comment does.

On Post, is member_id the author? Maybe call it author_id for clarity?

On social media, it’s an internal name so it doesn’t matter much, but I don’t think of personal blogs and GitHub as social media and both could be stored there. Really, these are “external associations” or “external links”, right? That is, this is where we’ll put whatever stuff we give users fields for on the profile, right? Blog, Twitter, Facebook, GitHub, LinkedIn, next new thing that hasn’t been invented yet…

I see one omission: we need the idea of a tag set. A tag set has a name and contains tags; a category has a (single) tag set. (The tag set’s name is just so those customizing a community have a way to say which tag set to use for a category; I don’t see this name being exposed to end users.)

Tensibai · 5 February 2020 13:55

Is the table description to be adapted or is the column is_meta missing ?

cellio · 5 February 2020 14:06

Description should be edited; the column was deprecated by category_id. (Meta is a category.)

Dustytrash · 5 February 2020 14:38

is_deleted (boolean Default FALSE)
deleted_at (Date)

Why do we need both of these columns? is_deleted can be derived from the deleted_at date

Edit: Related discussion: https://stackoverflow.com/questions/3916847/database-design-replace-a-boolean-column-with-a-timestamp-column

I’m fine with either way.

manassehkatz · 5 February 2020 14:56

“description” == “title”

Tensibai · 5 February 2020 15:01

I’d say the title of the table is post and the italic part is the description of the table, according to the schema, the description has a problem mentioning is_meta where this column doesn’t exists anymore.

Late edit: just understood where you’re coming from, I was talking about the description of the table and not about the column “title” or meaning this a column named “description” was missing

manassehkatz · 5 February 2020 15:06

To clarify in DB terms:

tag_set is a table with a name/description/etc.
tag gets a new field tag_set_id FK to tag_set not NULL
category gets a new field tag_set_id FK to tag_set not NULL

I do not want to end up in the extra-flexible concept of “each tag can be in more than one tag set”. That would just lead to a mess when you have a tag (lame example, but could happen) “post” in DIY that means “fence post” but in Meta DIY refers to “that thing which can be a Q or A or Blog or whatever”. Yes, that does mean in some cases there will be the same tags created twice - e.g., Worldbuilding might use a much smaller tag set for Blog than for Main, but have “time travel” in both, meaning the same thing - but that will be easier than dealing with a messed up tangled web of tags.

cellio · 5 February 2020 15:16

Thanks. Looks good to me.

Definitely not. That way lies chaos. I’m expecting blog categories to share the Q&A (main) tag set, but a community would be free to do a custom one if they wanted to. But they can’t mingle tags from different tag sets.

If two different tag sets need tags with the same name, like in your “post” example, then that just means there are two tags with two different tag IDs in different tag sets that happen to have the same display name. There is a uniqueness constraint on tag name + category, but not on tag name globally. Tag ID is always unique and auto-assigned, like all other IDs, and the ID is what should be used internally to wire things together.

misha130 · 6 February 2020 03:55

I think remove the deleted_at columns as they are more or less implied like the other 5 columns. Or add them to everything that isnt an enum table.

Change the suffix _date to _at everywhere (member suspensions etc)

Why is explaination on trust_level unique?

mcalex · 8 February 2020 03:54

A couple queries on tags:

can a tag_set only have tags or tag_sets as well [composite]?
is there any discernment between nature/type vs program/function tags? eg ‘J2EE’, ‘WPF’, ‘SQL’ would all be ‘functional’ and only affect users working in that function area, but tags such as ‘refactoring’, ‘switch’ (control flow), ‘runtime error’ (nature/type) could potentially cut across each of those functions. Does categorising this difference make sense?

tuggyne · 8 February 2020 06:07

This is probably too challenging to define at the moment, since it’s definitely not an MVP feature to have this sort of tag categorization, but the intended working of the feature would need to be designed fairly carefully before we could do anything to the schema to reflect it that wouldn’t need to be ripped out and rewritten later when we actually implement the feature.

On the other hand, it should be reasonably practical to leave this out of the schema for now. Sure, if/when implemented, there would be a lot of backfilling of data needed, but that’s the case no matter what.

celtschk · 8 February 2020 11:34

With your examples, I don’t see what (on the technical side) those categories would mean (what does it mean for a tag to affect users? Tags are applied to posts).

However there are certain cases where categorization of tags makes sense. For example, in SE metas, there are some tags that are treated specially (like the feature-request, bug-report and discussion tag (in the following referred to as the"red tags" because of how they are displayed), the featured tag, or the status-* tag set). I don’t know the exact rules enforced by the SE software (as opposed to enforcement by moderation), but of the red tags, there will always be at most one, and the status-* tags are only to be set by the moderators.

Something similar could make sense also on main sites. For example, one might consider replacing accepting an answer with special tags “resolved”, “partially-resolved”, “superseded” that can only be set by the original author or moderators.

Or maybe on an SO replacement, one might want to indicate the type of answers wanted with a special class of tags containing e.g. language-lawyer (“I’m interested in what the standard says, no matter what actual implementations do”), best-practices (“I’m interested in what is recommended, not just in what is allowed or what works”), and practical-solutions (“I’m interested in what actually works, even if it formally violates the standard or goes against general advice”), where at most one of them can be applied.

Or there might be a separate category of language tags that is not treated specially other than that it gets a different colour. Or maybe it has the restriction that at least one of them has to be used (for questions that are explicitly language agnostic, a special tag language-agnostic could then be added to the language tag category).

Then again, it may not be worth the effort to implement. In any case, it’s definitely not MVP.

luap42 · 8 February 2020 12:11

What you’re proposing is a “meta-tag”. They are bad, because tags are used to filter by content (for example I want to answer a python question) and don’t include any other value. See https://stackoverflow.blog/2010/08/07/the-death-of-meta-tags/ for more information on this.

Tag sets are associated with categories. We might want to have one tag set for q&a and blog and a different one for meta. Different tag sets mean, that a post in a category can only have tags from within that tag set and it will be created there, even if the same tag name exists in another tag set.

cwellsx · 8 February 2020 13:14

Actually they are still used, e.g. https://graphicdesign.stackexchange.com/questions/tagged/critique and https://buddhism.stackexchange.com/questions/tagged/reference-request – and a community might define special rules for them, like “if it’s a critique question then you need to do this and that”.

The tag isn’t only used to filter or search but to associate a handling rule – what kind of question should supply what kind of information and/or expects what kind of answer.

Though it may be true, as you say, that SO doesn’t use them any more. And from an implementation point of view, they’re simply tags like any other. Except that there may be a special usage/guidance note associated with them (e.g. as illustrated here).

celtschk · 8 February 2020 17:33

From the page you linked:

Meta-tags are actually a subset of a larger problem that I usually call dependent tags. These are tags that don’t say anything by themselves – you can’t tell what the question is about unless they’re paired with some other tag (or several of them). These tags are a problem because people don’t realize this and will often use that as the question’s only tag.

So if that is the problem, this can be easily solved by the software through a tag category “dependent tag”. There would simply be a rule that each post needs at least one non-dependent tag.

cwellsx · 8 February 2020 18:31

Yes. I once went through (reading and editing) all the topics on Meta, to ensure that every topic there had at least one tag in addition to “discussion” – that was to study and be able to summarise them all.

That would be hard to do on the main site (too many topics and tags), not impossible on a Meta site (not too many types of thing being discussed, i.e. not many different meta-tags).

celtschk · 8 February 2020 19:50

That’s the nice thing about software: It never gets tired. Also, it would catch it right when you try to post it, so the improperly tagged post would never get posted to begin with.

pinnymz · 9 February 2020 02:52

3 questions:

Is there some creation/lastUpdated timestamp on all these tables that just isn’t being shown? Having that data would be invaluable.
Is there a histories table (or multiple) to hold previous versions of posts, comments, etc? Timestamps and author of the change would be important.
Why is temporary suspension on the member table rather than its own construct? Couldn’t a single member have numerous suspensions over time?

Are we assuming audit tables would be sufficient for handling the above concerns? I’m asking because, in my experience, auditing data is typically kept separate from data that drives app functionality.

manassehkatz · 9 February 2020 03:03

Timestamps are part of all records/tables.

History is an automatic audit trail/history table paralleling each regular table.

IMHO, audit tables done right should be sufficient for most purposes. I have done this in other (PostgreSQL but not C#) projects.

As far as temporary suspension, I am not 100% certain of the details but basically there is a certain amount of information, particularly status related, that is effectively in 2 places - the “latest” in some master record (e.g., Post or Member) and details elsewhere. That allows retrieving key information from the main records that are being retrieved “anyway”, without having to go to the more detailed tables all the time.