Database schema (Round 8) - wip

Following discussion with @misha130 last weekend, I have made changes to the pgModel to generate the initial database.

From the Round 7 discussion:
Added category_tag_set table
Changed social_media_type to external_association_type
Added ‘soft delete’ fields to all tables - I expect some (enum type) tables don’t want this, but I figured it was easier to add to everything and delete what isn’t wanted later
Added duplicate_post_id and close_reason_id to post_vote

Not from the Round 7 discussion:
Add a column “internal_id (text 50)” to enum tables (setting, privilege, post_type, vote_type)
Why can’t the dsiplay_name be used for querying (or why will internal_id remain consistent when display_name won’t). Also, with an entity framework solution there will be very little sql readable, or otherwise.

Delete the post_status table
This will be useful if we want to maintain post status history (outside of audit). If we don’t want then then it’s easy to delete and make the changes following

Following are the tables from the latest Database schema proposals.
Table images below show the regular and the audit (history) table. I’ve pasted images as a) retyping all field names didn’t interest me, and b) you get to see the constraints as well and check consistency between data and audit this way.

Tag:

Category_TagSet:

Category:

Privilege:

Post_Tag:

TrustLevel:

Member_Privilege:

Post:

Member:

Post_Vote:

ExternalAssociationType:

Post_Status:

Member_ExternalAssociationType:

Comment_Vote:

Vote_Type:

Post_Type:

Setting:

Comment:

Post_StatusType:

Category_PostType:

Post_DuplicatePost:

Outsatanding issues:
Cascade rules for ON_DELETE
Member table has a network_account_id column. We need a network_account table.
Post_Vote table has a close_reason_id column. We need a close_reason table
There are other unanswered questions in the Round 7 thread. These can be added as issues to github.

SQL file generated by pgModeler and modified version for generated (calculated) fields added to:

1 Like

The problem is, that display_name is often configurable by the communities (e.g. rename of badges, translations), whilst we still need to be able to access them consistently for functionality reasons (e.g. badge queries). For readibility in these cases, there’s the internal_id.

Can you explain to me, why we can’t store everything post status related (deleted, closed, locked, …) on post? They would be stored in history, too, via post_history, right?

I guess that the network id will be from an other table and it should therefore be no fk, but should be interpreted by the software.

I suggested this structures in No. 7:

  • Add close_reason table with at least these columns: display_name (text 50), description (text), parent_id (bigserial FK to closure_reason, nullable), is_active (boolean default to TRUE)
  • Add post_close_subreason table, which allows for the selected sub reasons of the primary close reason to be saved.

Then you forgot this suggestion:

Then we need a member_history table, which is not the audit log for the member table, but a list of events in the user history, which are useful for moderators (e.g. moderator message sent, suspension added, suspension removed, PII accessed, custom annotation). This should include a member_history_type table.

Which should now be renamed to member_annotations or so.

1 Like

Assuming the issue is “network = list of communities, but if each community is a separate database, can’t share this table”, I would recommend having that table in each community’s database. It will be a small table, rarely change, and so it can be kept synchronized (manually if need be) as communities are added/dropped. That will make things much easier for anything within a page (e.g., migration options, user profile details, etc.) that needs to know the names of other communities.

Nah. I mean the network account id. That’d be a potentially big table (all accounts on any Codidact community of our instance).

1 Like

You mean the list of users?

I mean a table containing the contents for something similar to SE’s network profile (mine for example: https://stackexchange.com/users/11267045/mee-the-setup-wizard), which contains connections to accounts on all communities is connected to authentification.

Yeah, that would be trivial in the “one database” style. With multiple databases, probably the way to do it is to have the authentication/“master” server have just the barebones info - email/password/etc. + master profile + list of communities. Then each community has a profile for each user which links back to the authentication server (but not a true FK because it is a different server) and has a setting for “same as master” (which I think is already in the schema but I’m not looking right now). If a user makes an update “for all communities” then that process would push updates to all the other communities as needed. Which is I think basically the way SE does it (at least from an outsider’s perspective).

1 Like

Yeah. IIRC it’s in the schema and that was what I wanted to suggest.

1 Like