DB Schema Proposal Round 6

cbrumbaugh · 3 February 2020 03:11

Change log

Changes done now that we are doing one DB per comunity.
Renamed to “content” to “display_name” for the most part. Still need to know what to call the column (username?/handle?) in the member_social_media table.

trust_level

display_name (text 100) UNQIUE NOT NULL
explanation (text DEFAULT NULL) UNQIUE

social_media
The types of social media that the member can display in his profile

display_name (text 100)

member_social_media
The social media that the member would like to display in his profile per community

social_media_id (bigserial)
member_id (bigserial)
content (text 100)
UNIQUE CONSTRAINT (member_id,social_media_id)

privilege
Table for privileges

display_name (text 100)
description (text)

member_privilege
For recording which members have which privilege in a community. If a member has a privilege suspended, then that is also recorded here, and a nightly task will auto undo the suspension once the privelege_suspension_end_date has passed.

member_id (bigserial)
privilege_id (bigserial)
is_suspended (boolean default FALSE)
privilege_suspension_start_date (date)
privelege_suspension_end_date (date)
UNIQUE CONSTRAINT (member_id, privilege_id)

privilege
Table for privileges

display_name (text 100) UNIQUE
description (text)

post_type
Records the type of post, question/answer/blog etc

display_name (Text 100)
description (Text)

status
For setting the status of a post locked/featured etc

display_name (Text 100) UNIQUE
description (Text)

post
I thought about splitting into a answers table and and questions_table but doing it in the same table lets comments have a post_id instead of a questions_id and a answers_id. Meta posts are denoted by the is_meta column. Type of post is determined by the post_type_id

member_id (bigserial)
title (Text 255 default NULL)
body (text)
upvotes (bigint)
downvotes (bigint)
net_votes (bigint)
score (decimal)
views (bigint)
is_accepted (boolean default FALSE)
post_type_id (bigserial)
is_closed (boolean default FALSE)
is_protected (boolean default FALSE)
parent_id (bigserial FK to post default NULL)
is_meta (boolean default FALSE)
is_deleted (boolean default FALSE)

post_status

post_id (bigserial)
status_id (bigserial)
UNIQUE CONSTRAINT (post_id, status_id)

comment
Table for the comments on posts, both questions and answers.

member_id (bigserial)
post_id (bigserial)
parent_comment_id (bigserial FK to comment default NULL)
body (text)
upvotes (bigint default 0)
downvotes (bigint) default 0)
net_votes (bigint) default 0
score (decimal default 0)
is_deleted (boolean Default FALSE)
deleted_at (Date)

comment_votes

comment_id (bigserial)
member_id (bigserial)
vote_types_id (bigserial)
UNIQUE CONSTRAINT (comment_id, member_id)

tag
Table for all of the tags

body (text 100 default NULL UNIQUE)
description (text)
tag_wiki (text default NULL)
is_active (boolean default TRUE)
tag_synonym_id (bigserial FK to tag default NULL)
usages (bigint)

post_tag

post_id (bigserial)
tag_id (bigserial)
UNIQUE CONSTRAINT (post_id , tag_id )

comment_votes

comment_id (bigserial)
member_id (bigserial)
vote_types_id (bigserial)
UNIQUE CONSTRAINT (comment_id, member_id)

vote_type
Table for the vote types, upvote/downvote.

display_name (text) UNIQUE

post_vote
The reason for this table is so that votes by spammers/serial voters can be undone.

post_id (bigserial)
vote_type_id (bigserial)
member_id (bigserial)
UNIQUE CONSTRAINT (post_id, member_id)

Mithrandir24601 · 3 February 2020 14:09

I’m probably missing something here but what’s a ‘meta post’ in this context? If it’s where the post is placed on the site, I thought we needed more than ‘meta’ or ‘not meta’? (Where ‘not meta’ can only be ‘main’)

Dustytrash · 3 February 2020 15:07

Sorry I missed the first 5 rounds, is this the whole schema? Or which part?
Where’s the ‘community_name’ column?

manassehkatz · 3 February 2020 15:18

Correct. We have evolved a bit (despite not having any code!) from the original. (@Cellio or someone else following this closely can correct my mistakes - but this is as I currently understand it):

Each Community has Main & Meta

to

Each Community has a group of Categories
The minimum Categories for a normal community will be 2 - Main & Meta, but some communities may have more.
Each Category will have its own set of tags (previously this was a “post type” could share tags with Main or have its own set - but now each Category can have multiple Post Types so the tags go with the Categories. I think.)
Each Category has one or more Post Types, including (but subject to change):
- Q & A - Each question can have answers. Each question and each answer can have comments/discussion.
- Meta - Might be Q&A or might be just Q + discussion/comments.
- Sandbox (“potential” questions posed for comment/discussion before being turned into regular Q & A). Has comments/discussion but no Answers
- Blog (story, ~~rant~~ passionate statement on a topic, useful information but not typically “just the facts”)
- Canonical (this is an IT term, might come up with something else, though I like it) - A post on a specific topic designed to cover what might otherwise be repeated Q&A (“duplicate questions”) of very similar things - e.g., a thorough explanation of the OSI model, or a full explanation of regex or a list of recommended books for learning photography beginners or a basic how-to on replacing switches & receptacles. No answers - in fact here the post itself really is the answer. Minimal discussion.
- Wiki - a linked group of articles on related topics. (Not to replace Wikipedia, but same structural concept.) No answers (effectively each Wiki is a linked group of answers).

etc.

So (and I can’t take the time to do it now) we need to add a category table and more detail in the post_type - by including whether a post type has Answers, Comments, perhaps other settings, the code can handle each Community’s needs without having to have kludgy solutions as SE did (e.g., Sandbox as questions=Answers to a single Meta question - ugh!).

cbrumbaugh · 3 February 2020 15:40

This is as far as has been developed, we probably still need a community table with the configuration values etc.

cellio · 3 February 2020 16:56

Correct.

Not quite. There is a concept of a tag set, and each category uses a tag set. That doesn’t necessarily mean 1:1; main and meta will use different tag sets but a blog category might use the same tag set as main and a sandbox almost certainly would.

(N.B. We need DB representation for tag sets.)

As with tag sets, post types and categories are orthogonal concepts that are then configured together. When defining a category you decide what post types are available in it. For example, the meta category might allow Q&A, discussion, and wiki posts; the main category might allow Q&A and wiki; a sandbox category might allow only discussion posts (see below), etc.

For MVP the only post type we require is Q&A. As soon as possible thereafter, we should add some others:

Q&A: a question post with any number of attached answers. Each question and each answer can have comments. Questions and answers have authors/owners.
Article (or blog post, but I’m choosing a more general term in case there are other applications): a post that can have comments but does not take answers. This post type has an author/owner.
Wiki: like article, but does not have a single author/owner. Role of comments TBD. Use wiki for canonical posts and other group knowledge-building projects.
Discussion: a question with comments that are more visible than on Q&A (not auto-collapsed) that allows a single answer by the question author or a moderator. Intended for use on meta for the types of meta questions that people have said isn’t well-supported by the Q&A format. (We should examine that use case more before finalizing this post type.)

There are no post types for meta, sandbox, or canonical; these are implemented using categories and post types as above.

Suggestion: store the number of answers allowed, so article can have 0, discussion can have 1, and question can have unlimited (however you represent that).

manassehkatz · 3 February 2020 17:25

This is where I see a slightly different concept. To me, Wiki implies a large hypertext-linked document. Anything, of course, can have links, but the idea is that for a Wiki links between the pages within that Wiki are a key. That is different from a Blog or Article or Canonical post where each one largely stands alone.

Using my DIY example, we could build a Wiki for “Electrical” which would have a home page with an overview and then pages on Receptacles, Switches (which would link to WiFi Switches and Timers and Occupancy Sensors etc.) and Lighting (which could eventually split into different lighting technologies, etc.). Essentially writing a book, piece-by-piece collaboratively. A Wiki would be a place for someone to go if they wanted to really research a topic.

A Canonical page might be something focused on a topic but “medium size”. So a How-To guide for replacing Receptacles (including the most-needed information about GFCI and grounding but not a full “everything you ever wanted to know about GFCI and grounding”, as that would be more of a Wiki with multiple pages). A Canonical page would be the place to go to find answers to what would otherwise end up as (near)-duplicate questions.

An Article would be much more specific - e.g., “Changes in NEC 2020”. An article would be of interest to many “regulars” but probably not matter much to one-question-newbies.

And a Blog - well, DIY probably wouldn’t have one. But if it did, that would be the place where people could put their Home Renovation Horror Stories. Blogs would be of interest to regulars, depending on the subject matter. Obviously anything like Worldbuilding or Writing could use Blog as a place for people to showcase samples of their work. (Subject to how we handle licensing…)

cellio · 3 February 2020 17:48

Ah. To me, if you want a whole wiki you’d create a wiki category, which has individual posts of the article type. All posts can link to other posts.

This does reveal a need I hadn’t previously seen for categories: we need some top-level text that explains what the category is for, which would presumably be displayed by default, dismissable, and findable via navigation (help?). If you’re building an organized wiki as opposed to a collection of wiki pages, that’s where you’d put the top-level index. We might need to think more about this. Wikis aren’t MVP so we can take the time we need.

The difference you’re describing between Canonical and Article seems to be one of how people use it, not fundamentally different post types. Am I missing something?

manassehkatz · 3 February 2020 17:53

1 - Wiki - think (in some ways, hopefully just the good parts) like the ill-fated Documentation.

2 - Canonical vs. Article

I see Canonical as more of a “fast answers to very common questions”, and probably created by one person but likely very collaborative in nature, depending on the specifics.

I see Article as more of a “very specific topic, possibly less “important forever” than Canonical”. Going to Sci Fi as another example:

Wiki - Everything you ever wanted to know about Star Trek
Blog - Nonfiction - a visit (with pictures) to a Star Trek convention, or fiction - a short Star Trek story made up by a user.
Article - An analysis of some real-world physics developments and how they relate to the fictional Star Trek warp drive
Canonical - List of every Star Trek episode.

cellio · 3 February 2020 18:33

But how do Article and Canonical differ in how people use them? that’s the part I’m not getting. They’re both “here’s information that’s useful enough to bother compiling in one place here”. It’s not the case that all canonical posts are more important than all articles; as a Java programmer I don’t care at all about that canonical article about PHP but I might care a lot about that article covering diamond interface inheritance in Java 11.

You’re describing some ways that a community would want to organize content. That’s good! I’m challenging the assumption that each use case requires its own post type. I’m suggesting that we have some core post types that can be used in different ways. To me, “article” or “wiki” or “blog post” are all the same thing (except for authorship on the wiki) and we should think of use cases as customizations of these building blocks rather than whole new things.

(I don’t know if the wiki type I identified is actually needed; perhaps “community authorship” should be handled separately, like it is on SE.)

manassehkatz · 3 February 2020 19:18

The way I see it (and of course what will really matter is how communities use these things, to be determined), Article, Canonical and Blog are almost the same in the development functionality - “blobs of text started by one person, collaboratively edited by others, no answers, some discussion”. The differences are more in usage.

Blog - “interested in some users’ personal stories, views, commentary on the overall community topic, come over here and read for a while”. Not to answer an unasked question at all. But just “of interest to the community”.
Article - “specific new (or old, eventually) topics of interest in-depth for people who want to learn more”. Not primarily to answer an unasked question, though in some cases it will, but of interest to those who are really “into” a particular community/subject area.
Canonical - “specifically designed to answer common questions on a portion of the community’s subject matter”.

All get tags. (Perhaps Article and Canonical share the Main tags and Blogs get there own. May vary.)
But Canonical specifically would show up in the main Q&A page - i.e., easily discoverable. Very much like “Related” questions that show in SE when you are in a question. If you are looking for how to do “X” and start looking at questions that have “X” then the Canonical articles that also have “X” (or that have the same Tags as the question you just displayed, or something like that) will show on the side because they are likely to help answer the unasked (so far) question that you searching through existing Q&A to find an answer.

As far as PHP vs. Java - that’s where Tags come in.

I also think Articles would tend more towards “single author with relatively minor edits from others” where Canonical I would expect (encourage even) other “experts” to help make it the best possible Canonical answer to a group of related frequently asked questions. Except can’t call it FAQ as that is assumed by many to be “how do I use the web site”.

Wiki is different. It starts as one page (have to start somewhere) and becomes lots of linked pages. That is different. It is more of a “reference work”. Using DIY as an example, there might be a handful of Articles added every year (depending on who wants to write what, but constantly growing), a dozen or 2 Canonical posts about particular things (updated over time as building code and technology changes) and perhaps 2 - 6 Wikis (Electrical, Plumbing - very broad categories). In programming, there might be a Wiki for each popular language functioning as a user-developed manual/documentation, Canonical for lots of things.

cellio · 3 February 2020 19:28

I think we are talking at cross purposes here and I don’t know how to fix it. I’m talking about tools that communities can use in various ways. I’m trying to decouple the tools from the specific uses, because we’ll never know all of them. Nobody imagined sandboxes on SE years ago. There’s something we’re not going to imagine now that our communities will need. I want to build a flexible set of tools.

manassehkatz · 3 February 2020 19:41

From a “tool” standpoint, maybe we can look at it like this:

Q&A (with no limit on answers) (aka “Main”)
Q with 1 answer (possibly for Meta or other things)
Q with no answer (Sandbox, possibly for Meta)
Not-a-Q, just a “thing”, no answer: Article, Canonical, Blog
Wiki - designed to be hyperlinked (i.e., only the home page of each Wiki is in the list of Wikis, not the individual pages - in a sense the pages are Answers to the Question of the Wiki home page, but they are not presented in that way - they are presented as links within the pages - but structurally (database) they would be connected like Answers to a Question).

Plus multiple Categories per community and multiple Post Types per community/category (mix 'n match). Multiple Tag sets per community (with each Post Type within a Category being matched to a particular Tag set).

Simplifying it more (i.e., what actually gets coded differently by Post Type):

0, 1 or infinite answers
Discussion emphasized (show all right away by default) or less (some/popular/recent like SE) or minimized (totally collapsed by default) or none-at-all.
Wiki structure - this one becomes the most “different” from a programming standpoint.
Discoverability - e.g., if you are not “inside” a particular Category/Post Type, how do you find out about them? (That’s where I see specifically Canonical more “next to” Q&A, Blog more separate, etc.)
Possibly Trust Levels relating to starting a post (anyone Q&A, limit some of the others).

cellio · 3 February 2020 19:56

This is closer. Thanks for working with me here.

The wiki that you’re describing is very different from everything else on the site. Categories show lists of posts. The wiki you’re describing is more like an actual wiki – one entry point, link-based navigation, no list of pages or table of contents. That could be useful for a community to develop, but I don’t see how it fits into the category model. I mean, it could as easily be a pinned post (for the main page) and then you have links from there, with all those posts being post-only (no answers, probably no comments) and, presumably, shared authorship. Maybe instead of talking about a new post type to support this, we should be talking about post visibility outside of the category-based “post list”.

Can we defer this style of wiki for now? I think it’s different enough that (a) we need to better understand how communities actually want to use it (if they do) and (b) we won’t gain any benefits of commonality by doing it now.

For the rest, I think we’re largely on the same page. I think choice of tag set should be tied to the category not the post type (else users will be confused when they see tags being legal on some posts in the category and not others, plus the tag set helps define the category). Any post type within a category uses the same tag set.

I think some “canonical” posts will be articles and some will be questions with one answer. On SO some “canonicals” are tag wikis and some are answered questions. Consider that a question might start life as a normal question and later be discovered to be canonical, so it gets fixed up and promoted.

Agreed on trust levels for creating some of these post types.

manassehkatz · 3 February 2020 20:00

Absolutely. I think it has a lot of potential, but it isn’t even close to being MVP.

Agreed. I was thinking that originally and had changed because I thought I saw some discussion of it not being that way.

But I think that is because of the limitations of SO. With a proper place, just like Sandbox, this will work much better.

Which could be similar to a migration, but simpler because it is all in one database

luap42 · 3 February 2020 20:44

Great work so far, but I have some comments with things I’d change/add/remove/do differently:

trust_level

Ok.

social_media

I’d rename it to social_media_types, to clarify that it’s a list of which can be chosen for another table. Or member_social_media_type to make it clear that it contains types for the member_social_media table.

member_social_media

Ok.

privilege

Wouldn’t it be better to add a column called internal_name (or _id) that will be queried by the software. I think something like WHERE internal_name = 'may_suspend_all_the_bad_people' is better to read and understand than WHERE id = 42. Same applies to the trust_level table too.

member_privilege

I’d go away with this table and rather split it into two tables, which are:

trust_level_privilege

trust_level_id → trust_level.id
privilege_id → privilege.id

member_privilege_revokation

member_id → member.id
privilege_id → privilege.id
is_suspended (boolean default FALSE)
suspension_ends_at (date)

privilege (duplicate table; needs to be removed)

post_type

I’d add columns, such as:

allows_3rd_party_edits (boolean) – (for example to distinguish blog posts and wiki posts)
allows_replies (boolean) – (for example to distinguish Q&A from blog posts)
can_be_parent (boolean) – to distinguish answers from all other posts
internal_id (see above)

status

What is this table for?

post

We don’t need an is_meta as said before, but instead a category.id reference. I’d also add the following columns:

is_locked (boolean default FALSE)
toplevel_close_reason (reference to a close_reason.id)
(?) post_notice (reference to a post_notice_type.id or free form text)

Also date entries for closure, protection, deletion and locks. And “actor” entries for protection and lock. Both of these for creation, edit and last_activity (includes answers)

post_status

Same as above.

Also I’d recommend splitting it up into:

closure_reason (id, display_name, parent_id, description, is_active + date fields)
post_notice (id, display_name, description, is_active + date fields)

comment, comment_votes

Ok.

tag

We need to have tag sets (as said below). I’d recommend adding a tag_set.id ref and a tag_set table with id and display_name

post_tag

Ok.

comment_votes (duplicate table, invalid plural)

vote_type

Ok. Maybe add an internal_id as above.

post_vote

Add date fields.

Furthermore I’d propose to add these tables:

settings
Table for all community-specific settings

display_name (text)
internal_id (varchar 50)
current_value (text)
can_be_changed_by_mods (boolean default to FALSE)

category
Holds all the category stuff

display_name (text)
url_part (varchar 20)
is_primary (boolean default to FALSE)
short_explanation (text)
long_explanation (text)
contributes_to_tl (boolean default to TRUE) – whether this category is included for trust level calculations
minimum_tl_to_participate (bigserial FK to trust_level.id)

category_post_type

category_id → category.id
post_type_id → post_type.id
is_active (boolean default to TRUE)

post_history

post_id (bigserial FK to post.id)
member_id (bigserial FK to member.id)
post_history_type_id (bigserial FK to post_history_type.id)
title (text 255 nullable)
body (text nullable)
tag_changes (text nullable)
post_notice_type_id (bigserial FK to post_notice_type.id)
close_reason_id (bigserial FK to close_reason.id)
is_hidden (boolean default to FALSE)

+date fields

post_history_type

display_name

post_duplicate_link

original_id (bigserial FK to post.id)
duplicate_id (bigserial FK to post.id)

post_close_subreason

post_id (bigserial FK to post.id)
close_subreason_id (bigserial FK to close_reason.id)

Also all user/member tables are missing AFAIK. I’d propose the following:

member

display_name (varchar 150)
bio (text)
profile_picture_link (text)
is_temporarily_suspended (boolean default to FALSE)
temporary_suspension_end_date (date)
temporary_suspension_reason (varchar 35)
trust_level_id (ref to trust_level.id)
network_account_id (bigserial, nullable)
is_moderator (boolean default to FALSE)
is_administrator (boolean default to FALSE)
is_synced_with_network_account (boolean default to TRUE)

+date fields

Anything I’m missing for member?

member_history

member_id (bigserial FK to member.id)
event_by_member_id (bigserial FK to member.id)
member_history_type (bigserial FK to member_history_type.id)
body

+time_fields

member_history_type

display_name

Also all tables proposed by me “in the text” before are:

trust_level_privilege

trust_level_id (bigserial FK to trust_level.id)
privilege_id (bigserial FK to privilege.id)

member_privilege_revokation

member_id (bigserial FK to member.id)
privilege_id (bigserial FK to privilege.id)
is_suspended (boolean default FALSE)
suspension_ends_at (date)

closure_reason

display_name (text 50)
description (text)
parent_id (bigserial FK to closure_reason, nullable) – If not NULL, then this is a sub reason, otherwise this is toplevel
is_active (boolean default to TRUE)

+date fields

post_notice_type

display_name (text)
description (text)
is_active (boolean default to TRUE)

+date fields

Olin · 3 February 2020 21:56

Additional attibutes:

Who "owns" a post, or whether it is community-owned. This matters in who is shown as the author, how votes effect someone's trust level, how votes effect someone's "rep" or whatever you want to call it.
Whether anyone but the author can edit. This is important for "paper" type posts where the author owns the main post and editing by others should not be allowed or very restricted.
What trust level someone can edit without it requiring review. Wikis may be more open than regular answers, for example.

cellio · 4 February 2020 00:16

Dev-philosophy question:

The schema refers to some things we haven’t decided on yet, like locks and protection. Should the schema for MVP reflect only what’s in MVP, and we’ll add to it later, or is it better to get more of this worked out in the DB now even though we’re not going to use it until later? If, later, we need to change it, we have to update anyway, so we’re trading off definitely having to update vs maybe having to update. Is updating the schema – in an additive, backward-compatible way – bad?

manassehkatz · 4 February 2020 01:00

We are going to have database changes. That’s pretty much guaranteed. But as much we can reasonably plan for, the better. Some of these things may start as Booleans and turn into FKs into other tables or Integer values. Some may get dropped. A lot of this will actually get figured out in the initial development process before MVP. I am, in general, in favor of placeholder fields, even if they end up getting changed.

It would be different, for example, if developing software for an embedded application where 100,000 units would be out the door before you come up with v1.1. Then I would want to have it as stripped down and clean as possible to leave nothing to chance. But that’s not the case here, and keeping the “extra” fields is not a heavy burden (we don’t have any 10,000,000 record tables at the start).

(I once worked with some embedded systems provided by another vendor (my job was to design/run a server to talk to the machines in the field) and they had Windows with “everything” on them - absolutely absurd. Compared to that, a few extra database fields that might get thrown away after one instance is live is not a big deal.)

cbrumbaugh · 5 February 2020 03:49

I would rather not, I think the ids are just fine. Maybe others have a different opinion, but I don’t see the need.

Why? Is it faster?

I would rather have these be enforced by the software instead of the DB

For things like protected/locked/closed

I would rather have foreign keys than booleans. Also, can a post have multiple notices? Like both closed and locked?

I would rather do tag groups by a parent_id column that links back to the top level tag.

All that’s needed is the create date and the modified date, those columns are on every table.

Your suggestions that I agreed with I incorporated here DB Schema Round 7