On the anatomy of a post: Do two sizes fit all?

I just thought a bit about the differences between Stack Exchange, TopAnswers, and what we are striving to build, and then it occurred to me that there is a very general theme hiding in it.

First, consider Stack Exchange. There are three post types: questions, answers and comments. For reasons that I hope will become apparent later, I’ll treat the question title not as an integral part of the question, but as a separate entity.

So what is the difference between a question and an answer? Well, the answers belong to the question. So we have a single-level tree with the question as parent node and the answers as child nodes. So what is the parent of the question? Since I treat the title as a separate entity, it makes sense to treat the title as the parent of the question. Note that there’s no fundamental difference between a question and an answer, other than their position in the hierarchy.

You may argue that a question also has tags, which the answer doesn’t have. But then, one might argue that the tags could also logically be seen as attached to the title.

Now to the comments. The comments are attached to a question or an answer. Apart from how they are displayed, how do comments differ from questions and answers? Well, they have a lower maximum length, and only a subset of Markdown is interpreted for them. But other than that, it’s again just a post with a single parent, but no children.

So for Stack Exchange, we have a simple hierarchy: The title is at the top. A title has a single, mandatory question as child. A question has a title as parent, and any number of comments and answers as children. An answer has a question as parent, and any number of comments as children. A comment has a question or an answer as parent, and no children.

Now in Codidact, we want to have threaded comments. So what changes? Well, not much; just that comments may have children, which are other comments.
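The hierarchy just described can be written down as data rather than code. A minimal sketch in Python, assuming hypothetical kind names and a hypothetical `can_attach` helper; it only checks which kind of post may hang below which, not how many children are allowed (e.g. exactly one question per title). Codidact's threaded comments then amount to changing a single entry.

```python
# Allowed parent kind -> child kinds, as plain data (hypothetical names).
SE_RULES: dict[str, set[str]] = {
    "title": {"question"},               # cardinality (exactly one) is not modelled here
    "question": {"answer", "comment"},
    "answer": {"comment"},
    "comment": set(),                    # SE comments cannot have children
}

# Threaded comments: the only change is that comments may have comment children.
CODIDACT_RULES: dict[str, set[str]] = {
    **SE_RULES,
    "comment": {"comment"},
}


def can_attach(rules: dict[str, set[str]], parent: str, child: str) -> bool:
    """Return True if a post of kind `child` may be attached to a post of kind `parent`."""
    return child in rules.get(parent, set())


if __name__ == "__main__":
    print(can_attach(SE_RULES, "comment", "comment"))        # False
    print(can_attach(CODIDACT_RULES, "comment", "comment"))  # True
```

Keeping the rules as data means a new site type would, in this sketch, only need a new dictionary, not new validation code.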

OK, let’s look at TopAnswers. Note that this is not about how things are actually implemented there (which I don’t know), but just abstractly thinking about what is presented to the user.

TopAnswers doesn’t have comments, but a chat. Apart from presentation, how does the chat differ from comments? Well, as far as I can tell, they are not attached to another post, but they are related to a specific question. So I’d say a chat message is like a comment, but attached directly to the title.

There are also global chats, which are not attached to a question. Those are in chat rooms which have their own names. But what is a name, if not a (very short) title? So those chat rooms are just titles which have only chat messages as children.

Also, on TopAnswers there are blog posts. Blog posts differ from questions only in that you cannot answer them.

Let’s look beyond those sites, to non-Q&A sites, for example Discourse, the forum I’m writing this on. Again, we have topics, that is, titles, and long posts that are attached to that title. Except that now we have several posts attached to the title, and posts can also have arbitrary other posts as parent.

So in summary, at least from the view of the back end, all posts fall into just two fundamental types: long posts, which have all Markdown features available, and short posts, which have a much lower maximum length and only a subset of Markdown features available.
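A minimal sketch of the two size classes in Python. The length limits and the Markdown feature names are placeholders (hypothetical, not taken from any real site), and `features_used` stands in for whatever a real Markdown parser would report. Only the size class is checked here; where a post may sit in the tree is a separate set of rules.

```python
from dataclasses import dataclass
from enum import Enum


class Size(Enum):
    SHORT = "short"
    LONG = "long"


@dataclass(frozen=True)
class SizeClass:
    max_length: int
    markdown_features: frozenset[str]


# Placeholder limits and feature names (hypothetical).
SIZE_CLASSES = {
    Size.SHORT: SizeClass(
        max_length=600,
        markdown_features=frozenset({"bold", "italic", "code", "link"}),
    ),
    Size.LONG: SizeClass(
        max_length=30_000,
        markdown_features=frozenset(
            {"bold", "italic", "code", "link", "heading", "list", "image", "code_block", "table"}
        ),
    ),
}


def validate(size: Size, text: str, features_used: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the text fits its size class.

    `features_used` would come from a Markdown parser; here it is passed in directly.
    """
    cls = SIZE_CLASSES[size]
    problems = []
    if len(text) > cls.max_length:
        problems.append(f"too long: {len(text)} > {cls.max_length}")
    disallowed = features_used - cls.markdown_features
    if disallowed:
        problems.append("disallowed Markdown: " + ", ".join(sorted(disallowed)))
    return problems
```

The same `validate` call serves comments, chat messages, questions, answers and blog posts; only the `Size` argument differs.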

Every page type (question, blog post, forum, chat room) has its own rules for which type of post can be attached to which other type of post. Under each title, there’s only one type of hierarchy, so it makes sense that the title comes with a type that fixes the rules for posts under this title.

Long messages attached to titles are questions, blog posts or forum messages, depending on the title type. Long messages attached to other long messages are answers or forum replies, depending on the title type.

Short messages attached to titles are chat messages. Short messages attached to long messages or short messages are comments.
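The naming scheme above can be sketched as a lookup: a post's role follows from its size, whether its parent is the title or a long or short post, and the title's type. A minimal sketch in Python, assuming hypothetical title-type labels (`"qa"`, `"blog"`, `"forum"`, `"chatroom"`) and a hypothetical `role_of` helper. It does not check whether a given title type allows chat or comments at all; that would be one more rule per title type.

```python
from typing import Optional

# (title type, post size, parent size or None for "attached to the title") -> role.
# All names are hypothetical labels for the cases described in the text.
ROLES = {
    ("qa", "long", None): "question",
    ("qa", "long", "long"): "answer",
    ("blog", "long", None): "blog post",
    ("forum", "long", None): "forum post",
    ("forum", "long", "long"): "forum reply",
}


def role_of(title_type: str, size: str, parent_size: Optional[str]) -> str:
    """Name the role of a post from its position in the tree."""
    if size == "short":
        # Short posts: chat messages on the title, comments on any other post.
        return "chat message" if parent_size is None else "comment"
    try:
        return ROLES[(title_type, size, parent_size)]
    except KeyError:
        raise ValueError(
            f"a {size} post cannot be attached here under a {title_type!r} title"
        ) from None


if __name__ == "__main__":
    print(role_of("qa", "long", "long"))         # answer
    print(role_of("chatroom", "short", None))    # chat message
```

A blog post that tries to get a long child, or a chat room that tries to get a long post, raises `ValueError`, which mirrors "you cannot answer blog posts" and "chat rooms have only chat messages as children".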

Altogether, we get a uniform model that, at least in the back end, gives us everything from a chat room to a full Q&A site.

Of course the front end needs to do different things for each of them, so we don’t get the full range of page types for free. But I think adding a new one should become much easier if the only things that have to be written are the front-end code and new database entries laying out the hierarchy rules for the new type of page.

Thank you for writing this up so clearly.

There are other post types, though they are secondary to the ones you’ve described here. SE has, and we’ve talked about, tag wikis. I guess those would be sort of like blog posts in your taxonomy, with the title being the tag name. Another might be user profile text – SE doesn’t implement the “about me” text as a post, I don’t think, but it shares properties (particularly markdown), and we might choose to implement it that way. Profile posts would be kind of like blog posts with restricted editing – everything else can be edited by anybody (with the privilege), but profile text can be edited only by the owner (or mods/admins, but that’s a privilege override).

Good point on the tag wiki.

Actually, tag descriptions would be similar to blog posts, but a type of their own, because they have exactly two posts attached: a short one (the excerpt) and a long one (the actual tag wiki). The title would indeed be the tag name.

Indeed, that gives me the idea for another unification: related posts. If tags are, under the hood, a post type, then tagging a question would just be adding a related post of type “tag”, while related posts of type “question” would form the “related questions” list. Tag synonyms could then be implemented as a “related post” of type “synonym”. In that scheme, tag hierarchies could also be implemented, by just having a “related post” of type “parent tag”, without adding anything to the database structure, even if implemented only years after the initial release.
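A minimal sketch of “related posts” as one table of typed links, in Python. The `Relations` class, the post IDs and the relation names are hypothetical. Tagging, synonyms and a tag hierarchy differ only in the relation type, and walking the hierarchy is a plain loop over links.

```python
class Relations:
    """A single table of typed links between posts: (source, relation type, target)."""

    def __init__(self) -> None:
        self.links: list[tuple[int, str, int]] = []

    def add(self, src: int, kind: str, dst: int) -> None:
        self.links.append((src, kind, dst))

    def targets(self, src: int, kind: str) -> list[int]:
        """Posts that `src` links to with relation `kind`."""
        return [d for s, k, d in self.links if s == src and k == kind]

    def sources(self, dst: int, kind: str) -> list[int]:
        """Posts that link to `dst` with relation `kind`."""
        return [s for s, k, d in self.links if d == dst and k == kind]

    def ancestor_tags(self, tag: int) -> list[int]:
        """Follow "parent tag" links upwards (assumes at most one parent and no cycles)."""
        chain = []
        parents = self.targets(tag, "parent tag")
        while parents:
            chain.append(parents[0])
            parents = self.targets(parents[0], "parent tag")
        return chain


if __name__ == "__main__":
    # Hypothetical IDs: 1 is a question, 10 "python", 11 "programming", 12 "py".
    rel = Relations()
    rel.add(1, "tag", 10)          # tagging a question
    rel.add(12, "synonym", 10)     # "py" is a synonym of "python"
    rel.add(10, "parent tag", 11)  # "python" sits under "programming"
    print(rel.targets(1, "tag"))       # [10]
    print(rel.sources(10, "synonym"))  # [12]
    print(rel.ancestor_tags(10))       # [11]
```

Adding tag hierarchies later would, in this sketch, only mean starting to store a new relation type; the link table itself stays unchanged.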

Profile posts indeed would be essentially blog posts with restricted editing rights.

It is not so clear why you treat the title as a separate post (i.e., a separate branching point).

In your idea, can a ‘title’ post have a number of ‘question’ children other than one (i.e., zero or several)? If not (or if that happens only very rarely), then you might as well combine the two.

I would say that fundamentally a question and an answer are very different. But in practice (from the perspective of the posts database) they are much the same (mostly the same database fields, the same styling, licensing and other rules), and among other differences, one of them (the answer) has a non-empty parent field.

Because, as I described, while for an SE question the question itself is the only child of the title, in other cases the title has more children. In chat, all chat messages are direct children of the title (in a pure chat room, there’s not even a question to attach them to). In a forum thread, all posts that are not replies are children of the title.

I see, so your point is to mix all sorts of posts (Q&A, chat, blog) together, as all have some sort of tree structure, and then the ‘title’ is the root node of every tree.


This makes me wonder what the use of this view is. While typing this answer, I am extending the analogy of the ‘tree’ database structure further, and imagining the Q&A site as a forest. In this forest you have trees, but there are other organisms as well (other plants, animals, moss, bacteria), and it would be bad to force the (exact) same branched database structure onto those. It might be much clearer to have different structures for different types of posts.

Or at least (if they all turn out to be of some hierarchical branched type) not to place them in the same hierarchical structure, but to have a different structure, with different relationships, for different posts. (I am not an expert in this, but I imagine that this growth in width could also yield a faster database, compared to having one single table of posts.)


I am not sure whether you can change your post but maybe you should highlight

I only realize now that this might have been your main point (I got a bit lost while reading the post). You were relating to the posts as nodes in a tree structure, which are all very much the same except that some can be big and others small. So I now guess that you were referring to ‘the similarity of the nodes’, and not, as I was commenting on, to ‘the similarity of the trees’.


I would still not consider the title as something separate (it would actually make three sizes: title, short post, long post). Only when one somehow is able to create a variable format that mixes different post type like questions, blogs, chat and forum (which would be great if one achieves that) then this could be useful. Otherwise the title can simply be placed in the first (parent) post that started the page that bears the title.

I believe that the value of the two sizes might lie not so much in the database (this is what I first imagined from your post; but I now guess that you would also like to keep different types of posts separate), but more in the reuse of code and the simplicity of the interface. All the front-end stuff (like editors) and back-end stuff handles posts of the two sizes in the same way, and only the connections and places in the database tables are different.

It’s worth noting that SEDE stores questions and answers in the same table, and uses one table for both kinds of comment.

I stumbled across some more info on SEDE. It is based on data dumped from the real SE database into another database to allow the public queries. The real database may be structured quite differently.

True, but we know from the API that questions and answers share the same ID space. There are also some more obscure post types.

Mainly simplification of the backend, and more flexibility in adding future extensions.

And my point is exactly that on closer inspection, they are not as different as they might seem at first glance, and therefore could be treated in a pretty uniform manner.

Note that handling them all in their own way means extra code for each of them. Handling them in a uniform way means the same code can be used for all of them.

I’m not a database expert either, but even if we put different types of posts in different databases, IMHO it would still pay off if we could use the same backend code for all of them.


Done.

Yes, I thought the title made that clear. Indeed, I believe the differences between post types, other than the large/small divide, can be completely determined by their position in the tree. Different pages have different tree structures, but those different tree structures are all composed of the same two post types in different arrangements.


Did you ever have a look at TopAnswers? There we have exactly the combination of Q&A with blog. And while there are no comments on that site, there is no reason why the same software should not be able to support both a TopAnswers-like question chat and SE-like comments (probably as alternatives, so that one topic community could choose a question chat and another traditional comments, but in principle both could even be supported at the same time).

Note that, as far as I understand it, chat messages on TopAnswers are not attached to either the question text or one of the answers, but belong to the complete page (that is, the question and answers are discussed in the same chat). Therefore it would not make much sense to make the chat messages children of the question text; it makes more sense to make them children of the page title, which is a title not only for the question, but also for the answers.

And attaching chat messages directly to the title, but comments to the post, also makes the distinction between both types of messages trivial: a short post attached to the title is a chat message, while a short post attached to a long post is a comment.

As for the title being a third type of post, one might consider the title to be a short post, too; being at the top of the tree makes it a title. On the other hand, if the title is considered part of the question, you also need three types of posts, namely long posts with title (questions), long posts without title (answers), and short posts.

Oh, and on “Only when one somehow is able to create a variable format that mixes different post type like questions, blogs, chat and forum (which would be great if one achieves that) then this could be useful.”: Yes, that’s what I had in mind (most post types would not be in MVP, of course). Note that TopAnswers puts blog posts and Q&A posts next to each other, a feature that several people on this forum liked. In particular, on meta sites, the Q&A style isn’t too useful for general discussions; if after MVP we could add forum-like topics with minimal effort, that would be quite useful.

I thought mostly about the backend. Some synergies would also be there on the front end (e.g. the editor, as you noticed), but I think the front end would have more need to distinguish different page types. I could imagine a synergy also on the database side, but I guess the database experts here could better tell whether that is a good idea.

I’ll call myself a “database expert” for the moment. (In some ways I am, in some ways I’m definitely not, but I do have a lot of real-world experience).

Real tree structures are hard.

Much harder than you might think. Been there, done that. Tons of complications.

However, what you have described as a “tree” is only very loosely a tree. In reality, most things are just simple foreign key links:

  • Question → Answer - That is a simple link. You may think of it as “trunk → branches”, but I think of it as a “One to Many Relationship”
  • Question or Answer or Blog → Comments or Chat - Again, a simple link from the master item to the children “One to Many”
  • Comments or Chat internal structure - This is the closest, possibly, to a Tree, if we decide to support Threaded Conversations. But to keep the internal structure and the retrieval process relatively simple, I would have two backwards links in each Comment or Chat item - one is the master item (Question or Answer or Blog) and the other is the “parent” Comment or Chat item, which would be null or one level up in the thread. That makes retrieval of “everything for a Question or Answer or Blog” trivial (the most common case, by far) and also provides the necessary links to build a Threaded Conversation display.
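A minimal sketch of the two-backlink layout in Python, using plain objects instead of a database. The field names `master_id` and `parent_id` and both helper functions are hypothetical. Fetching everything for one master item is a single filter with no tree walking; the threaded display is rebuilt from `parent_id` only when needed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Comment:
    id: int
    master_id: int            # the Question/Answer/Blog this comment belongs to
    parent_id: Optional[int]  # the comment it replies to; None for a top-level comment
    text: str


def comments_for(all_comments: list[Comment], master_id: int) -> list[Comment]:
    """The common case: every comment for one master item, no tree walking needed."""
    return [c for c in all_comments if c.master_id == master_id]


def build_thread(comments: list[Comment]) -> list[tuple[int, Comment]]:
    """Return (depth, comment) pairs in display order, replies under their parents."""
    children: dict[Optional[int], list[Comment]] = {}
    for c in sorted(comments, key=lambda c: c.id):
        children.setdefault(c.parent_id, []).append(c)

    ordered: list[tuple[int, Comment]] = []

    def walk(parent: Optional[int], depth: int) -> None:
        for c in children.get(parent, []):
            ordered.append((depth, c))
            walk(c.id, depth + 1)

    walk(None, 0)
    return ordered


if __name__ == "__main__":
    data = [
        Comment(1, master_id=100, parent_id=None, text="Is this a duplicate?"),
        Comment(2, master_id=100, parent_id=1, text="No, see the edit."),
        Comment(3, master_id=200, parent_id=None, text="Comment on another post."),
        Comment(4, master_id=100, parent_id=None, text="Nice question."),
    ]
    for depth, c in build_thread(comments_for(data, 100)):
        print("  " * depth + c.text)
```

A flat (non-threaded) display would simply skip `build_thread` and show `comments_for` in ID order.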

One Main Table is OK

No reason we can’t have one Main Table for Questions/Answers/Blogs/Comments/Chats.

  • They all have a Type (Question/Answer/Blog/Comment/Chat).
  • They all have Author, linked History, Created/Updated times, Status (Active/Hold/Closed/Deleted - not all types will have all Status codes), Flags, etc.
  • Questions and Blogs have a Title - the other types do not. (Title is NOT a separate record.)
  • Some have Upvotes/Downvotes - the others can ignore this.
  • Some have import details (and only if they were actually imported, of course).
  • Some have other stuff we haven’t even thought of yet.
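A minimal sketch of such a Main Table, using Python's built-in `sqlite3` module and an in-memory database. The column names, the `CHECK` constraints and the `page_for_question` query are hypothetical illustrations, not an agreed schema; History, Flags and import details are left out. Columns that do not apply to a type simply stay NULL, and the title rule (questions and blogs only) is enforced by a constraint instead of a separate record.

```python
import sqlite3

# Hypothetical schema: one table for every post type.
SCHEMA = """
CREATE TABLE posts (
    id          INTEGER PRIMARY KEY,           -- one ID space for all types
    type        TEXT NOT NULL
                CHECK (type IN ('question', 'answer', 'blog', 'comment', 'chat')),
    master_id   INTEGER REFERENCES posts(id),  -- question for an answer; Q/A/blog for comment/chat
    parent_id   INTEGER REFERENCES posts(id),  -- replied-to comment/chat item, if threaded
    author_id   INTEGER NOT NULL,
    title       TEXT,                          -- questions and blogs only
    body        TEXT NOT NULL,
    status      TEXT NOT NULL DEFAULT 'active',
    upvotes     INTEGER,                       -- NULL where voting does not apply
    downvotes   INTEGER,
    created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at  TEXT,
    CHECK ((title IS NOT NULL) = (type IN ('question', 'blog')))
);
"""

INSERT = (
    "INSERT INTO posts (id, type, master_id, parent_id, author_id, title, body) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)"
)


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def page_for_question(conn: sqlite3.Connection, question_id: int) -> list[tuple[int, str]]:
    """The question, its answers, and the comments on any of them, all from one table."""
    return conn.execute(
        """
        SELECT id, type FROM posts
        WHERE id = ?
           OR master_id = ?
           OR master_id IN (SELECT id FROM posts WHERE type = 'answer' AND master_id = ?)
        ORDER BY id
        """,
        (question_id, question_id, question_id),
    ).fetchall()


if __name__ == "__main__":
    conn = open_db()
    conn.executemany(INSERT, [
        (1, "question", None, None, 7, "Do two sizes fit all?", "..."),
        (2, "answer", 1, None, 8, None, "..."),
        (3, "comment", 2, None, 7, None, "..."),
        (4, "comment", 2, 3, 8, None, "..."),  # threaded reply to comment 3
    ])
    print(page_for_question(conn, 1))
```

The unused columns cost little, which is the trade-off described below: one table to maintain, at the price of some NULLs.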

User Interface, Display, Retrieval Will Vary A LOT

The User Interface and Display will vary dramatically.

  • Main Page - displays only Questions
  • Search - displays Questions, Answers and Blogs
  • Question - displays Question, Answers, links to possibly related items, Comments/Chat (possibly hidden initially)
  • Blog - displays Blog, links to possibly related items, Comments/Chats (possibly hidden initially)
  • Comments and Chat will each (if they are separate) have their own display format, which will be quite different from that of Questions/Answers/Blogs.
  • History will be available for Questions, Answers, Blogs - likely not available on Comments/Chat, except to Moderators.
  • Editing will vary between types - probably Question, Answer and Blog largely the same (full Markdown available, images allowed, etc.) and Comments/Chat more limited.

Using one Main Table for “everything” will result in a bunch of fields that are not always used. That’s OK. Storage is cheap. Maintaining lots of separate tables is expensive. The lion’s share of storage will be for text anyway.
