Database schema (Round 8) - wip

Yeah. IIRC it’s in the schema and that was what I wanted to suggest.

1 Like

Sorry for update slowness - trying times all 'round at the moment.

Updates based on discussion follow

OK, I thought the new field was going to be configurable (to match ‘display_name’) too. One thing: to better identify its purpose, can it be called something like ‘universal_code’? Or if you don’t like universal, ‘internal_code’. Just not ‘_id’ (by our naming convention). I’ve called it universal_code in the interim*

Well we can, but basic normalization rules suggest we shouldn’t as the status can change over time without changing anything to do with the actual post. Also, do we want to store the status change timeline (‘Post closed on 2020-03-25 16:10:071’, 're-opened on 2020-03-26 12:15:178, ‘closed-03-26 12:22:376’ etc)?

They would be stored in post_history, but I understood the audit tables weren’t for general functionality, just audit. Isn’t that how CQRS works? Not having separate post_status means every time there’s a status change, a twenty plus field table (post_history) gets an extra row added instead of a table that, apart from the audit data, only has fields for the post and the status (though it should probably have start end date/times too). Given that post will likely be the most populated/updated table it makes sense to make it as lightweight as possible

Additionally, separating out the status means the ‘is_closed’, ;is_accepted’, ‘is_protected’ fields can all come out of the post table and be used as post_status_types. This then allows for further status types to be added to the data without changing the structure of the post_table further (is_promoted’, ‘is_historically_significant_but_not_a_good_fit’, ‘is_the_subject_of_a_twitter_storm’ - is_locked as per later instruction) and then storing a whole bunch of essentially null value columns (for all the is_whatever falses).

Yes. I think I skimmed that para after assuming those changes depended on the changes i’m discussing. Now added, and the notice table, and also the member_annotations and member_annotations_type tables.

What is being stored in ‘post_close_subreason’? Is it just a description (text) and link to ‘close_reason’ (with lookup values)? Or a user written field with different text each time? I have assumed the former.

I think that is all the remaining issues from the Round 7 discussion.

From the Round 8 discussion, with regard to the network_account_id issue, if I understand it’s a master_profile_id. Can I suggest that as a name?

Tables changed and added have been included in OP. Script to create database has been uploaded to: https://github.com/mcalexster/mcalex_codidact/blob/mcalex/skeleton/Database/Scripts/base/SchemaProposalCurrent_withGenCols.sql

Comments:
a) With the new category_tag_set table, we have: Tag - M:1 - TagSet - 1:M - Category. What happens if one category instance wants the same tag_set as another, except for a couple tags in the set that only one or the other category allows? Should perhaps a Category be allowed multiple TagSets which would result in a ‘general_tags’ tag_set instance being used by both categories and then ‘category_a_specific’ and ‘category_b_specific’ tag_sets which each individual category would use?

b) should the ‘enum’ tables be converted into actual enums (types) in postgresql and would this negate the ‘universal_code’/‘internal_id’ issue?

Multiples potentially introduce a lot of UI confusion. Remember that tags are messy, emergent, and created by users who might not have thought deeply about a whole context involving multiple categories and general-vs-specific tags. I’d like to stick with: a category uses one tag set; a tag set may be used by more than one category. If we find that’s not expressive enough, we can refine it later.

Perhaps if someone adds/edits a Tag that is in a Tag Set used by more than one Category, they should get a reminder like:

This Tag is (“will be” for an Add) used by your current Category, Foo, and also by the Category Bar. If these changes (this new tag) is not appropriate for Bar, please [some appropriate moderation notification action to discuss splitting the Tag Set].

1 Like

Following discord discussion on Apr 6 in #backend-development channel, tables post_type, vote_type and privilege will be turned into postgresql enums (types). The data in these tables aren’t likely to change once finalised, and any changes will also involve significant codebase changes (to deal with whatever the post_type, vote_type or privilege entails). Fire away if you disagree/have any objections, ‘like’ if you agree (i guess, is that how the forum works?).

1 Like

I agree that these 3 tables (post_type, vote_type, privilege) will have only very rare changes after MVP, and even then they will almost certainly be additions and not deletions due to the need to maintain compatibility with any prior posts, votes and privileges.

However, the names of these items will vary over time. That can be due to internationalization, tweaking of names (particularly privileges) to make them clearer to new users or other factors determined by various instances and/or communities.

Therefore, I recommend that the textual portions - name, description or similar fields - be stored in a text file, retrieved at system startup, rather than compiled into code. If that is handled properly then moving them out of the database is quite reasonable.

2 Likes

I expect we will occasionally add to these tables/enums after MVP. For example, we know we want some post types beyond question and answer, and if we add something like endorsements or reactions or public votes, that would presumably affect vote_type. That said, I agree that such changes will be additive and need to be accompanied by code changes.

2 Likes

So another thing:

Is the last modified by member id not nullable? Should be nullable since its only on update
The created by member Id is also not nullable? Should be nullable at least on the members table because the only thing that creates a user is the registration or an admin of sorts?

1 Like

I would make them both NOT NULL, last_modified_by is set to the exact same as the created_by when the row is created and so it would never actually be null.

As far as creating members, I would suggest we setup some admin users to handle that and that id go into the created_by column there.

1 Like

I agree with @cbrumbaugh. Creation == most recent modification and it’s pretty common to fill both fields with the same value in an entity constructor.

For admin/system creation, I was thinking of a ‘system’ member. I’m not sure if cbrumbaugh is referring to real members who are designated as admin users, but it’s essentially the same concept. There don’t look to be too many entities other than ‘member’ that will involve system creation in any event.

Alternatives: there could be a placeholder in the field until the creation process is complete and then the new member becomes the ‘created_by_member_id’ - this matches the real world process.

Or, and thinking aloud now, if a new member has created an account in order to respond to content - adding an answer to a question, a comment on another post, even casting an up/down vote - the writer of the original content is designated as the ‘created_by_member_id’. Some reputation in whatever format we end up with could be assigned to those members; a straight up count, or better, some formula that factors in the quality of the response content …

When you asked in the forum, i thought you might have been referring to entities whose creating member has since been deleted - but I understand these will remain, just some indicator (is_deleted) will be used to designate them as former members. SO makes any member links unclickable effectively removing the member from general existence, but that’s not the only solution, eg, grey out the regular profile page, or a curated version just listing their questions / answers / other posts, a note showing why the member wanted account removal (if the member wants to publish that) etc.

tl/dr I think they both should be not nullable. We have options for exceptional cases.

Well to create this system admin member I need a member_id. Trying to implement this is why I am here.

EF doesnt support this and the orm will crash all the time basically because of its expectations.

Maybe on this first system insert:

  • Ill remove constraint
  • Add member with null
  • Set created/modified to newly created member id
  • Put constraint back

Implementation suggestions of the two options:

I suggest following the unix way of segregating the first 999 uids for system type members. There could be a ‘Default User Creator’ a 'Created by SO Migrator; etc or maybe just one Member Creator. Maybe just pick a number eg 100 (leaving the first 99 for global system types) and create members with ids from 100. This should be flexible enough to change later eg flipping from a single user creator to multiple types or even to having one god-like system user (or just ‘nobody’). Postgresql allows for identity key values to start at a certain number - I’ll change that to 1000 if you prefer this way.

It looks like EF Core provides support in a different manner
https://docs.microsoft.com/en-us/ef/core/modeling/generated-properties?tabs=data-annotations#value-generated-on-add

or there’s a couple SO suggestions: https://stackoverflow.com/questions/27077461/how-to-get-next-value-of-sql-server-sequence-in-entity-framework

I don’t know which would be best: if there’s likely to be a need for other ‘system’ user types then the first, if not, or it makes sense to setup the new member as the creating member, then the second, I guess.

Doesn’t EF add navigation entitles with ID = 0 (or CLR default value which for int is 0) when it doesn’t know primary key values in any event? That might be a sql server feature.

It does exactly that which violates the constraint since there are no members with id 0.

OK, but isn’t that only at SaveChanges?

eg doesn’t this work?

using (CodidactContext context = new CodidactContext())
{
  Member member = new Member {
    DisplayName = "new member",
    Bio = "new member bio",
    ...
  };

  member.CreatedByMember = member;
  member.LastModifiedByMember = member;

  context.SaveChanges();
}

Is there any reason, but consistency with other tables, to have a created_by_member column on a member record? Will this be used anywhere?

All tables have those 4 columns as part of the spec (DB Naming convention - Round 2 - #8 by manassehkatz). Their purpose is audit related. My personal preference would be to make them not nullable for the member table rather than delete them entirely.

@misha130 One more suggestion: a dummy member could be created and the member foreign keys can initially be set to the dummy user, and then changed to the new user. If both SaveChanges() calls happen inside a transaction it should guarantee the correct member_id is set, but you’d need that dummy user created beforehand. I don’t think that would trigger the circular reference error.

If you don’t like that idea, I think you’ve identified a valid justification for allowing the created_by, and last_modified_by keys to be nullable, so I’d suggest to assume their NOT NULL status has been removed on the member table. I’ll set that in pgModeler and announce it in the next round and see if there’s any disagreement. The keys could still be set in a second SaveChanges call within a transaction so any error on setting the keys would roll back the entire creation attempt.

Ok I honestly have a circularity problem with the TrustLevel right now.

Member:
Needs on created by/modified by - no problem, that is set to itself.
Also needs to have a trust level because every member has a trust level when its created.

Trust level:
Needs to created by/modified by members.

So thats the circularity I am now facing. I posted the code for all of this on discord if you want to take a look.

So maybe make TrustLevel on member nullabe? Or maybe some of these tables shouldn’t have these columns?

Yup, I was actually going to say something about that, but figured the TL foreign key would just cause the same result as the member foreign key. So either Member ↔ TrustLevel foreign key cannot be NOT NULL, or maybe TrustLevel could be set up as an enum/type which would remove the created_by, last_updated_by references. Fortunately this is the only (non-member) foreign key in the member table.

I’m still a bit curious if adding the member as the created_by_member works, but I wouldn’t waste time on it.

It seems a strategy will need to be created to perform a full install. A system member will need to create default data in TrustLevel and possibly any other lookup type tables as part of the install - also enums. (I’m getting more and more convinced we need at least one non-human member)

So assume trust_level_id is a nullable FK for the member table and I’ll make it like that for the next schema proposal and ask if that alternative is preferred over making trust_level an enum.

Well sadly the following doesn’t work too:

    communityMember = new Member
     {
      DisplayName = "Codidact Community",
      Bio = "The codidact community manifested",
     };
     communityMember.CreatedByMember = communityMember;
     communityMember.LastModifiedByMember = communityMember;

     context.Members.Add(communityMember);
     context.SaveChanges();

It wouldn’t work if you wrote pure SQL too and EF isn’t some voodoo magic.
So I am going to let the createdMemberById/lastmodifiedbyMemberId be null

1 Like

Agreed; it’s smoke and mirrors. I guessed/hoped there was some temp creation going on in the background to satisfy constraints

Yup, works for me. As I said, I’ll set that in pgModeler, announce it in the next round and see if there’s any disagreement. The keys could still be set in a second SaveChanges call within a transaction so any error on setting the keys would roll back the entire creation attempt.