Proposal - DB Naming Conventions

cbrumbaugh · 1 January 2020 05:47

This is what I would use for the DB convention, and its the convention I used in my proposed schema if there is a better way or I missed something please let me know.

Table names will be PascalCase
Table and column names should not be reserved words or include special characters.
The primary key will be named “Id” and will be the first column in the table. The datatype will be bigserial.
Foreign keys will follow the format ForeignTableId and the data type bigserial.
Boolean columns will start with either “Is” or “Can” such as IsSuspended or CanUpvote.
Join tables will be the table names squished together and PascalCased such as PostsVotes unless it would be more clear not to.
All tables will have a CreatedAt field and a LastModifiedAt field that will auto set to the time when created and updated respectively. These columns will be the last two columns in the table. Tables with soft deletion will have a DeletedAt field
Date fields should either have the word “Date” in them like SuspensionEndDate or “At” like CreatedAt.
IsDeleted denotes that a row has been soft deleted.
Still not sure about the pluralization to table names because it sounds weird in some cases, going with plural for now.

ArtOfCode · 1 January 2020 11:26

Generally agree. Minor consideration for these two:

In the spirit of making things flow nicely (as with IsX fields), we could use “At” instead: CreatedAt, LastModifiedAt, DeletedAt.

celtschk · 1 January 2020 13:54

Here I see a contradiction: You say that Date fields should have the word “Date” in them, but your CreateDate and LastModifiedDate from their description are actually Time fields (they describe a point in time, not merely a day).

codymullins · 2 January 2020 17:58

I would highly recommend using a singular naming convention to avoid the weird sounding names and complexities. It’s much simpler, for example, with words like Activity where the singular version is Activities instead of just adding an s, like Post vs Posts.

I would additionally recommend being consistent in naming for your timestamps: ModifiedDate and CreatedDate. Keep them in the same tense and the same style.

JackDouglas · 2 January 2020 20:31

Table names aren’t case sensitive unless quoted — are you proposing quoting table/column names or just writing them PascalCase in the knowledge that the database will ignore case differences? Occasionally this might be important (eg PostEdit = PostedIt)

Lamak · 3 January 2020 03:23

Does every Id column really be bigserial?, it doesn’t really makes much sense for small tables to do that

cbrumbaugh · 3 January 2020 03:27

Foreign keys data types need to match primary key datatypes and rather that worry about the storage space, setting them all to bigserial means that you never have to worry that they are going to match.

Lamak · 3 January 2020 03:33

I know that datatypes in foreign keys need to match, but I don’t think that checking the datatype when creating foreign keys between tables is really much work. And big datatypes not only affect storage space, also performance

cbrumbaugh · 3 January 2020 03:38

How much of a performance difference?

manassehkatz · 3 January 2020 03:53

I could see sticking with serial here - that, or its equivalent) is what I typically use - 4 bytes. But 8 bytes is really no big deal.

Consistency across all the tables counts for a lot.
Ther is more extra stuff all over the place anyway (e.g., create & modify timestamps).
For the small tables it won’t matter (4 bytes per record x small number of records).
For the big tables it may someday be critical.

So go with it - good choice @cbrumbaugh Today’s CPUs have 64-bit registers. Today’s servers have multi-terabyte drives. The extra storage space and CPU time is practically nothing.

Lamak · 3 January 2020 04:38

I disagree that CPU use is practically nothing, since it will affect reads, joins, seeks, scans, etc. But, anyway, I’m not really invested enough in this to keep going

manassehkatz · 3 January 2020 05:28

My gut said “use bigserial, it makes sense” but I did a quick search anyway and found:

https://stackoverflow.com/questions/25531959/should-i-use-postgres-bigserial-for-records-in-a-new-application/25532623#25532623

so apparently at least at the CPU level it really is “nothing”, leaving the storage aspect which I do think is no big deal now - this answer was 5 years ago with the only minor concern being SSD cost, and those costs have gone way down.

404 · 3 January 2020 13:11

From what I gather, the database intended to be used is Postgres. In that case I would strongly recommend not using PascalCase for any identifier. The convention in postgres is to use snake_case. PascalCase names would mean every table name, and presumably every column name (since you use the same casing in the examples) would have to be quoted in any query. Apart from being a massive pain to write and very ugly, it’s also a great source of bugs when someone forgets to quote the identifier.

Tensibai · 3 January 2020 13:54

If that’s a requirement from start and to be auto populated by the DB system on insert/update (which I assume is what you have in mind), I’d rather set them at start, just after the Id. This avoid the need to reorder (or recreate) the whole table if there’s a need to add a new column.

(Not repeating the others points I agree with here, I’ve ‘liked’ them as mark of agreeing)

manassehkatz · 3 January 2020 14:37

Does adding columns in the middle really force a major rewrite in a DB like POSTGRESQL?

Tensibai · 3 January 2020 14:44

As far as I know yes, that’s kind of why postgres doesn’t have a way in ALTER TABLE to specify where to insert the column, the only way is dropping and recreating the table which comes with its own caveats in terms of FK etc.

Unless I missed something this wiki page is still valid: https://wiki.postgresql.org/wiki/Alter_column_position

manassehkatz · 3 January 2020 15:07

Interesting. In any case, I have learned over the years (with MySQL and PostgreSQL and with a variety of languages) to ignore the ordering as listed in the DB itself. Let an ORM or an internal (i.e., in your application code) define a conceptual structure and don’t worry about where the fields actually lie. It doesn’t affect performance or application programming. The only places where DB field sequence matter are:

Direct GUI-based manipulation (e.g., pgAdmin) - after initial development that should be relatively minimal, and over time should decrease as we development admin tools of our own for common functions
Insert of “all fields of a table in natural sequence”. That is, at times, very handy. But not a big deal - and not likely to happen much, if at all, after initial development.

So the end result is: Define the tables in an ORM model or whatever convenient & consistent application-friendly design that you choose (there are many ways to do this) and don’t worry if over time the sequence of fields ends in the actual database doesn’t match the sequence of fields in your code.

Tensibai · 3 January 2020 15:18

I’d mostly agree with that, the proposal did however specify a place for those two fields, which mean altering the column position if there’s a change to make later.

Hence my proposition to set them at start of the table to a be able to keep a consistent and “lintable for required fields” model without the burden or reordering the whole DB if there’s a need to add a column