DB Schema Round 5

cbrumbaugh · 8 January 2020 05:00

Change log

Renamed columns called “name” to “content”
Added status and post_status tables
Added upvotes_cast, downvotes_cast, profile_views, and reputation to member_community_table
Added column called 'views" for counting the number of views on a question.
Removed the unique constraint on the (community,title) combination.

member
This table will hold the global user records for an Codidact Instance. A user should only have one email to login with, that would be stored here. I am leaving the password storage and hashing to the experts.

display_content (text 100)
bio (text)
email (text 255) UNQIUE
gravatar_hash (text)
location (text 100)
is_from_stack_exchange (boolean default FALSE)
stack_exchange_id (bigint)
stack_exchange_validated (date)
stack_exchange_last_imported (date)
is_email_verified (boolean default FALSE)
is_suspended (boolean default FALSE)
suspension_end_date (date)

community
Table for each of the individual sites inside of a Codidact installation

content (text 40) UNQIUE
tagline (text 100) UNQIUE
url (text 255) UNQIUE
status (enum)

trust_level
This needs to be specific to a site for when we get different language sites and it needs to be localized.

content (text 100) UNQIUE NOT NULL
explanation (text) UNQIUE

trust_level_community

community_id (bigserial)
is_same_as_instance (boolean default TRUE)
content (text 100) UNQIUE
explanation (text) UNQIUE

member_community

A user may want to join one site and not another, and having a join table between users and sites allows a user to be suspended or gain/lose privileges on a single site. I listed the privilege booleans in the order of lowest to highest, the privileges after is_moderator aren’t listed because we probably are going to take any of moderator’s lower privileges away without demodding them.

member_id (bigserial)
community_id (bigserial)
display_content (text 100)
bio (text)
upvotes_cast (bigint)
downvotes_cast (bigint)
profile_views (bigint)
reputation (bigint)
email (text 255) UNQIUE
gravatar_hash (text)
location (text 100)
is_from_stack_exchange (boolean default FALSE)
stack_exchange_id (bigint)
is_same_as_instance (boolean default TRUE)
stack_exchange_validated (date)
stack_exchange_last_imported (date)
is_suspended (boolean default FALSE)
suspension_start_date (date)
suspension_end_date (date)
trust_level_community_id (bigserial)
is_moderator (boolean default FALSE)
is_deleted (boolean default FALSE)
UNIQUE CONSTRAINT (member_id, community_id)

social_media
The types of social media that the member can display in his profile

content (text 100)

member_social_media
The social media that the member would like to display in his profile per community

social_media_id (bigserial)
member_id (bigserial)
community_id (bigserial NULL)
content (text 100)
UNIQUE CONSTRAINT (community_id,member_id,social_media_id)

privilege
Table for privileges

content (text 100)
description (text)

member_community_privilege
For recording which members have which privilege in a community. If a member has a privilege suspended, then that is also recorded here, and a nightly task will auto undo the suspension once the privelege_suspension_end_date has passed.

community_id (bigserial)
member_id (bigserial)
privilege_id (bigserial)
is_suspended (boolean default FALSE)
privilege_suspension_start_date (date)
privelege_suspension_end_date (date)
UNIQUE CONSTRAINT (community_id,member_id, privilege_id)

privilege
Table for privileges

content (text 100) UNIQUE
description (text)

post_type
Records the type of post, question/answer/blog etc

content (Text 100)
description (Text)

status
For setting the status of a post locked/featured etc

content (Text 100) UNIQUE
description (Text)

post
I thought about splitting into a answers table and and questions_table but doing it in the same table lets comments have a post_id instead of a questions_id and a answers_id. Meta posts are denoted by the is_meta column. Type of post is determined by the post_type_id

member_id (bigserial)
community_id (bigserial)
title (Text 255 default NULL)
body (text)
upvotes (bigint)
downvotes (bigint)
net_votes (bigint)
score (decimal)
views (bigint)
is_accepted (boolean default FALSE)
post_type_id (bigserial)
is_closed (boolean default FALSE)
is_protected (boolean default FALSE)
parent_id (bigserial FK to post default NULL)
is_meta (boolean default FALSE)
is_deleted (boolean default FALSE)

post_status

post_id (bigserial)
status_id (bigserial)
UNIQUE CONSTRAINT (post_id, status_id)

comment
Table for the comments on posts, both questions and answers.

member_id (bigserial)
post_id (bigserial)
parent_comment_id (bigserial FK to comment default NULL)
body (text)
upvotes (bigint default 0)
downvotes (bigint) default 0)
net_votes (bigint) default 0
score (decimal default 0)
is_deleted (boolean Default FALSE)
deleted_at (Date)

comment_votes

comment_id (bigserial)
member_id (bigserial)
vote_types_id (bigserial)
UNIQUE CONSTRAINT (comment_id, member_id)

tag
Table for all of the tags

community_id (bigserial)
content (text 100)
description (text)
tag_wiki (text default NULL)
is_active (boolean default TRUE)
tag_synonym_id (bigserial FK to tag default NULL)
usages (bigint)
UNIQUE CONSTRAINT (community_id, content)

post_tag

post_id (bigserial)
tag_id (bigserial)
UNIQUE CONSTRAINT (post_id , tag_id )

comment_votes

comment_id (bigserial)
member_id (bigserial)
vote_types_id (bigserial)
UNIQUE CONSTRAINT (comment_id, member_id)

vote_type
Table for the vote types, upvote/downvote.

content (text) UNIQUE

post_vote
The reason for this table is so that votes by spammers/serial voters can be undone.

post_id (bigserial)
vote_type_id (bigserial)
member_id (bigserial)
UNIQUE CONSTRAINT (post_id, member_id)

manassehkatz · 8 January 2020 06:41

I am still confused about votes. Specifically, what in the database determines that “vote_type ‘x’” is an Upvote and “vote_type ‘y’” is a Downvote?

My suggestion was to add a numeric field - +1 or -1 - to vote_type. There are other ways as well. But there has to be something that determines it.

raphaelschmitz · 8 January 2020 10:26

Yeah, vote type doesn’t make sense to me either.
What is the supposed benefit of this? Voting -0.9, or {heart_emoticon}?

Plus a numeric field, restricted in code to be either +1 or -1, gives you a nice fast SUM operation for determining score.

luap42 · 8 January 2020 11:27

There can be other votes:

accepts (if we have them)
close votes
reopen votes
deletion votes
undeletion votes
…

luap42 · 8 January 2020 11:28

Your proposal doesn’t account for that (^). I think there should be instead a field called “internal_id”, which is what the software queries for.

celtschk · 8 January 2020 18:16

Do we also need vote types unupvote/undownvote, or is that handled by deleting the corresponding vote?

manassehkatz · 8 January 2020 18:19

That is handled by deleting the corresponding vote (which would of course be a soft-delete and archived so history can be determined for moderation issues).

codymullins · 8 January 2020 18:22

I would suggest we do not need a table for vote types. They will likely be used mostly in code with rules around them and won’t be something that would be added dynamically. Keep it simple and use an enum, I’d say:

public enum VoteType 
{
   NotSet,
   Open,
   Close,
   Delete,
}

tuggyne · 9 January 2020 01:53

The way this works out looks kind of awful, unfortunately. “display_name” would work fine (it’s not a reserved word) and make a lot more sense; “display_content” is incredibly vague and even misleading, since you’d expect that field to contain what’s actually in “bio”. For most of the other tables, “content” is likewise vague and confusing, especially when there are other free-form text fields for descriptions or explanations or taglines.

It seems to me that it would be a lot better to just use “name” prefixed (or perhaps suffixed) with some disambiguator. For example, “display_name”, “site_name”, “trust_level_name” or “tl_name”, “social_media_name”, “privilege_name”, etc.

I didn’t see where this change was suggested, so maybe there were further considerations already brought up.

cbrumbaugh · 9 January 2020 02:18

display_content is a goof, I did a search and replace on name. I do not like using the table name as the prefix, it sounds redundant to me when table names get prefixed to columns.

At the same time, I like to to keep my column names consistent across tables, “content” is the best one I have found that describes something without being a reserved word.

That’s how I build my tables, if there are better suggestions we should go with that, but “name” definitely had to go since it was a reserved word.

tuggyne · 9 January 2020 02:45

Well, if you don’t want to use (part of) the table name, you could always just use “display_name” across all of them, or “full_name”, or something (perhaps a synonym of name, such as “cognomen”, “moniker”, or what have you). “content” is just … not the same thing as a name, at all. Content is much more fitting for the body or longform than the title or name.

cellio · 9 January 2020 03:18

Don’t we need both platform and value (content) here? How do I record that my Twitter handle is @monicacellio? I guess it involves the member_social_media table, but I don’t see a place there to record the actual value (handle, in this case), either.

Minor question, but does having the first provide some performance boost? If I need to answer the yes/no question, I could check whether the second (the SE ID) is null or populated, right?

If a post has more than one special status, like protected and comment-locked, that just means two rows here for that post ID, right? (Hey, I think I’m starting to get the hang of this. )

I don’t think we need to track net votes on things, since net votes != score and we’re not going to display net votes with posts, so we don’t need it as an optimization.

And deleted_by.

That reminds me; we’ll need to track flag history for both posts and comments. I guess that’s separate tables, comment_flag and post_flag, with columns for the comment/post IDs and then the other flag info (what flag, who flagged, timestamp)?

I too find “content” as a replacement for “name” in most tables to be confusing. You said “name” is a reserved word. In Postgres? I’m surprised. I haven’t used Postgres, but I’ve definitely had tables with “name” columns in Vertica databases, and Vertica is a Postgres derivative. Or do you mean it’s fine in Postgres but some library or API we’ll depend on has a problem with it?

manassehkatz · 9 January 2020 03:56

name is definitely not a Postgre problem - I just checked that on one of my own databases. And a quick check for keywords in C# finds namespace and nameof but not name.

tuggyne · 9 January 2020 10:04

Sure enough, it’s listed as “non-restricted” (meaning it’s recognized by the parser but is still a legal identifier in most or all contexts).

raphaelschmitz · 9 January 2020 11:03

I’m confused that this is part of MVP (Minimum Viable Product) in the first place. We’re making the release be later, just to work out & implement something which would be optional anyway?

Maybe we need a split between the ideas of first release and MVP.
MVP being the version where we say “Hey everybody, look at this cool website for Q&A”.
First release being a website that we run and start using among us, probably a meta taking over voting.

Because we don’t need social media accounts, in any way, shape, or form, in order to ask questions and get answers.

cellio · 9 January 2020 13:24

Yes, good point. We’re not going to write code around that anyway, so maybe we can ignore this for now. I assume it was added to the DB schema because that’s easy and, eh, sense of completeness maybe? But if it’s not 100% straightforward and we’re not using it anyway, we could just drop it.

manassehkatz · 9 January 2020 16:00

I disagree. Within reason, it makes sense to have extra fields & tables for future use. This has the big advantage of acting as a reminder - could even have stub code to match up with it as well (e.g., in the User Profile entry page processing). There are 3 ways this can work out as far as the database in the long term:

“Just right” - i.e., the extra fields/tables get used in the end “as is”. In which case this avoided a future migration
“Almost right” - typically this will result in a future database migration that changes some fields but at least has the tables right, which is good
“Not used” - In this case either the extra fields/tables stick around “forever” waiting for a feature that doesn’t happen (so extra dead weight in the database, but empty tables don’t matter much and unused fields are not a big deal on small tables - i.e., avoid having any on the Posts & Comments tables but pretty much everywhere else, “whatever”) or need a migration to remove the unused stuff.

In the given example of social media info., I would include it even if not implemented at this time.

celtschk · 9 January 2020 18:58

On the gravatar field: I seem to dimly remember some discussion a while ago that we don’t want to use gravatar (I’m not sure if there was a consensus, though).

cellio · 9 January 2020 19:37

I guess using gravatar means that people can be tracked via gravatar. We probably want to offer an image-upload option for people’s graphical representations. I don’t know if this is required for MVP, but if we’re uploading images for posts, it’s probably not too much work to also support it for profile pictures.

manassehkatz · 9 January 2020 20:18

I didn’t know anything about gravatar until today. Now I understand how it works (basically). In a perfect world it would be fantastic. But we don’t live in a perfect world, and I can see a number of different privacy & security problems. I suggest that we:

Allow gravatar as an option but require verification before linking. I suspect some places do it pretty much automatically, and I would not want to do that automatically.
Allow upload of an image, automatically resized as needed.

Since anyone who has/uses a gravatar could just upload that image, there ends up being no functional need for “use gravatar”. Therefore, IMHO, just allow upload.