Handling duplicate questions

Problem

There has been some discussion around question closure in general. The discussions always included “marking as duplicate” as a default close reason, which simply links to a list of user-selected posts.

Recently there has been some debate about how duplicate content should be handled. From these discussions I can draw two main requirements, which are shared by a lot of people:

  1. Posts which duplicate another one need to be closed (preventing answers), and there needs to be a way to say where the answer/original question is.
  2. Closing posts as a duplicate may seem unfriendly/RTFM-y to some users, especially to those who don’t have much experience with the topic of the community or the site.

Solution

I am suggesting the following solution to this problem:

  • It’s possible to mark a post as “duplicate”. Duplicates cannot be answered and redirect to the original post for anonymous users.
  • Marking as duplicate is a vote-based process, which is integrated into the normal closure process. From the voter’s perspective, it’s like a normal close vote.
  • Every “mark as duplicate” vote will need to link to an answer on another question. Exceptions apply for linking to a blog post/canonical from another category.
  • Once someone has suggested an answer as a duplicate, a shadow answer is added, which loads its content from the suggested original. This answer contains a notice that it comes from another question and was suggested as a duplicate origin for this question.
  • This shadow answer can be voted upon. The asker sees a button “This solved my question.” and one with “This answer is not helpful” on the answer.
  • Every “mark as duplicate” vote for an already suggested answer is counted as an upvote on that answer.
  • If the answer has a score of X (can be changed for each community, defaults to 5), it will be automatically marked as duplicate origin. The question will be marked as duplicate. This also happens when the asker confirms the answer.
  • If the answer has a score of Y (can be changed for each community, defaults to -3), it will be deleted as not helpful.
  • If the asker rejects the answer, they are encouraged to edit their question to clarify it.
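
The scoring rules above can be sketched roughly as follows. This is only an illustration, not an implementation of any existing site: the class and method names are made up, "has a score of X" is read as "reaches X or more" (and likewise "Y or less"), and since the proposal does not say what happens to the shadow answer when the asker rejects it, the sketch leaves it untouched.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    SUGGESTED = "suggested"
    DUPLICATE_ORIGIN = "duplicate origin"
    DELETED = "deleted as not helpful"


@dataclass
class ShadowAnswer:
    """Hypothetical model of a shadow answer on a possible duplicate."""
    original_answer_id: int
    accept_at: int = 5    # "X" in the proposal, configurable per community
    delete_at: int = -3   # "Y" in the proposal, configurable per community
    score: int = 0
    state: State = State.SUGGESTED

    def vote(self, delta: int) -> State:
        """Apply an up (+1) or down (-1) vote and re-check both thresholds."""
        if self.state is not State.SUGGESTED:
            return self.state  # already decided, further votes change nothing
        self.score += delta
        if self.score >= self.accept_at:
            self.state = State.DUPLICATE_ORIGIN
        elif self.score <= self.delete_at:
            self.state = State.DELETED
        return self.state

    def duplicate_vote(self) -> State:
        """A "mark as duplicate" vote for this answer counts as an upvote."""
        return self.vote(+1)

    def asker_confirms(self) -> State:
        """ "This solved my question." closes the question immediately."""
        if self.state is State.SUGGESTED:
            self.state = State.DUPLICATE_ORIGIN
        return self.state

    def asker_rejects(self) -> str:
        """ "This answer is not helpful": only prompts the asker to edit."""
        return "Please edit your question to clarify how it differs."
```

With the defaults, five duplicate votes (or one confirmation by the asker) mark the question as a duplicate, and three downvotes remove the suggestion.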

What do you think of this suggested process?

4 Likes

I originally posted this in a different topic, but it is specifically for “Handling duplicate questions”. (I will edit slightly but don’t have time right now…spent too much time earlier writing the original!)

I agree with most (maybe all, but need to think about the details) of @luap42 's suggestion. My post below is more of a problem description + a way of guiding users before they post a duplicate + my feelings about making “duplicate” not seem “bad” like other types of close votes.


I think a lot of the problem with downvote/close-for-duplicate is the perception by new users, who are likely (but I don’t have the data to prove it, just a hunch) the source of the vast majority of duplicate questions. Some experienced users are part of the “problem” too, because they answer what has already been answered elsewhere (either to be honestly helpful or for Reputation points or whatever) and they wouldn’t if the new questions didn’t exist.

The Problem

Essentially the typical new user has the following problems when asking:

  • They don’t know how to search the site well, or even that they should try to search to look for a duplicate. Solution: More/better automated keyword-based guidance when they start asking a question.
  • They don’t realize the site is supposed to be a repository of knowledge - i.e., they think it is Radio Shack™: you’ve got questions, we’ve got answers. Solution: short (because if it is more than a few lines most people will not read it at all), but long enough to be CLEAR, explanation of what the site is about. This should be presented at registration and at “start of first question”.
  • If, despite all that, they ask a duplicate (e.g., there are multiple terms to search to describe the same problem and they pick the wrong ones and just don’t find the duplicate, or they find a duplicate but think their question is unique), they get extremely negative responses - down votes and close votes. Solution: Provide a way to duplicate flag/close without it sounding just as bad as closing for spam, trolling, hate speech, etc.

Of course, the end result is hard to say for sure, but I would bet that a huge percentage of new users who come in, ask one little (but important to them at the time) question, get a barrage of down votes and very nasty (based on perception of the SO system messages) flags/close votes, NEVER EVER COME BACK. I believe that many of those same people might become regular, and productive, members of the community if they were treated well at the time. That does not mean “allow duplicate questions”. The rationale for removing duplicate questions is, IMHO, 100% valid. The problem is the way we deal with them, and especially the way things are explained to the (likely very new) user.

The Solution

  • A lot more automated prompting and messages and easy to find help pages. But not so overwhelming that they become “more walls of text to ignore like license agreements and privacy policies.”
  • Clear information about potential duplicates and the benefits of not posting a duplicate question (both to the user - gets a faster answer by finding the question that already has answers - and to the community). (I just started the Ask Question process in SO and noticed that it shows “Similar” but doesn’t say anything, at least not without diving in to help pages, about duplicates.)
  • A kinder, gentler way of handling duplicates when they occur. Language does matter. Even though I agree that the end result will be effectively the same - question closed - from a user (especially new user) perspective, the current Close process seems pretty ominous. Maybe a totally different category of “action”, perhaps three different sections (right now SO has everything together plus a mess of pointers - e.g., you can Flag->Duplicate or Flag->Close->Duplicate):
    • Close - This means extremely unlikely to be salvageable, let’s just get rid of it:
      • Spam
      • Trolling/Hate/etc.
      • Opinion Based
      • Too Broad
    • Needs Improvement - This means “there is some hope, with effort from OP and/or assistance from experienced users”:
      • Needs More Details
      • Major Formatting Issues (e.g., didn’t put code in a code block, more than a few grammatical/spelling errors, etc.)
      • Needs More Focus (e.g., 3 questions in one - all might be good, but needs to just pick one and ask the others as separate questions)
    • Duplicate
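
As a rough illustration of the grouping above, the reasons could be kept in a simple lookup so that the UI can word each group differently. The names `Group`, `REASONS` and `reasons_in` are invented for this sketch; only the reason labels come from the list.

```python
from enum import Enum


class Group(Enum):
    CLOSE = "Close"
    NEEDS_IMPROVEMENT = "Needs Improvement"
    DUPLICATE = "Duplicate"


# Hypothetical mapping of each close reason to one of the three groups.
REASONS = {
    "Spam": Group.CLOSE,
    "Trolling/Hate/etc.": Group.CLOSE,
    "Opinion Based": Group.CLOSE,
    "Too Broad": Group.CLOSE,
    "Needs More Details": Group.NEEDS_IMPROVEMENT,
    "Major Formatting Issues": Group.NEEDS_IMPROVEMENT,
    "Needs More Focus": Group.NEEDS_IMPROVEMENT,
    "Duplicate": Group.DUPLICATE,
}


def reasons_in(group: Group) -> list[str]:
    """Return the reasons shown under one section of the dialog."""
    return [reason for reason, g in REASONS.items() if g is group]
```

Keeping the group separate from the reason means the message shown to the asker can depend on the group alone, so a duplicate never has to share wording with spam.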

The end result of all of these will be closure, so that nobody sees them except very high privilege users, if OP (or others on their behalf) does not make significant changes. But they are really three very different groups of problems - “bad” (by any objective sense), “needs work” (which is not inherently bad) and duplicate (which is truly a category of its own - and the OP needs to understand what is going on in order to either revise the question so that it is not a duplicate, or agree (which includes “by default if they do nothing within a certain amount of time”) in a way that they understand what is being done and why it is being done and not lumped together with bad questions).

5 Likes

Would Google complain about your auto-redirecting from a search result in that way?

Would a Google user find it odd, e.g. if I’m searching for “X” and land on a page about “Y” which doesn’t have the word “X” in it (because the question about “X” is redirecting to Q&A about “Y”)?

It’s clever.

It’s a bit more work for the closer (to find an answer not just a question) – and (worryingly) the “Closing > Duplicate” UI is more complicated i.e. to list and/or select answers instead of simply topic titles.

It does solve two of the existing problems, i.e. …

  • Allegedly-duplicate topic doesn’t have an answer to this question
  • Question is closed without feedback/confirmation from the user

… so well done i.e. it’s an interesting theory.

  • Is the UI workable?
  • Can a moderator or someone ever cast a binding “yes this is a duplicate” vote?
  • What if the OP abandons the topic (which they might especially if they’re a casual or new user)?

No way, not being able to close questions as duplicates of questions without answers is one of the most annoying limitations of SE.

It also probably contributes to people feeling like there are invalid duplicates. It is the text of the questions that determines whether they are duplicates or not. We should be strict about closing questions as duplicates only when they truly do ask the same thing, or when one question is a strict subset of another. Answers should not come into consideration at all.

Perhaps something that could help is to be explicit about why you think it’s a duplicate: when you vote to close as duplicate you could have to choose whether the question is exactly the same as another, or whether it is entirely contained in another question.

This would discourage people from making near-similar closures. But if you can’t find something that’s really exactly the same then it could motivate you to ask a canonical question so that the others can be closed as entirely contained duplicates.

3 Likes

Linking directly to an answer is a bad idea. First there might be complementary answers to the suggested original question, none of which has a greater claim to be the answer to the duplicate than another. Second, the original need only differ in detail, notation, or phrasing from the duplicate for answers to the former to be nonsensical as answers to the latter: typically you need to read both original & answer to see how your duplicate’s been answered. So the link should be to the original question, & shadow answers won’t work. Third, as @curiousdannii says, some original questions don’t have answers yet; the duplicate’s no less a duplicate for that.

On the other hand, encouraging feedback when someone’s happy with the suggested original is a very good idea: it’s gratifying to know when you’ve helped.

3 Likes

All the reasons for redirecting attention to a previous (duplicate) question are gone when that question is unanswered.

I think SE’s reasoning is that where the previous question failed (to get an answer), might as well give a new user a chance (with a newer version of the question and perhaps worded slightly differently).

That would be another option.

I’m not sure you can make a general rule about how to recognise whether/when two different questions “truly” ask the same thing – i.e. that would be an “AI-hard” problem, and/or it may seem obvious in certain cases but not others – I think the OP was working around that difficulty, by declaring that, “two questions are sufficiently identical if they’re both satisfactorily answered by the same answer”.

It might be useful at some future point to make it more prompt and reliable to close all unanswered questions in a mutually-linked set as duplicates of the first one to be answered, if all are marked while still unanswered.

We don’t need strong AI, as we already have natural intelligence available to close-voters. (Or at least that’s the expectation of using humans for the job.)

Defining duplicate questions as having a satisfactory answer in common is usually close enough, but not always, so at the very least we should keep that definition’s limitations in mind.

2 Likes

My take on the point about AI was that if we could actually determine duplicates automatically then this would be a whole lot easier. Not just because it would be “automatic” but because the typical new user will react better to the computer immediately saying “Sorry, this is a duplicate of ‘xyz’, you can get answers over there” than to people replying an hour later saying “Hey you idiot, why don’t you search better? If you did, you would have found duplicate ‘xyz’ that has some answers.” I know the “duplicate close” message doesn’t use the terms “you idiot” and we (obviously) discourage users from using such terms in comments, but that is the feeling a user can get from the existing SE-style process.

At this point we need to have humans in the process, so coming up with a way to let the users understand what is going on without feeling “like idiots” and without leaving forever (we want those new users to come back when they have the harder/more interesting/not duplicate questions) is the goal.

3 Likes

“AI-hard”: I meant there are some cases where even as a practised human I’m uncertain whether something is “truly” a duplicate of something else – not on SO perhaps, where the subject matter is more boolean than most natural topics are, but on another site.

So different users could differ on whether something is a duplicate. And it’s difficult to formulate a general rule (even to explain to humans) to define whether something is duplicate.

In that case instead of voting to “close-as-duplicate” (perforce using an autocratic moderator’s “binding vote”), I ask the user via comment, like, “There’s topic X which is similar to your question, maybe the same. Do those answers there already answer your question? Or can you explain how your question is different from that one?”

Because I want the new user to have their question answered one way or the other, and closing the question isn’t the top priority (but may or may not be the best way to serve up satisfactory answers to a new question).

And I’ll check whether the reference has some “good” answers already because otherwise what’s the point.

So the OP’s proposal above is kind of automating or providing software support for what I do already.

The site I’m referring to is (per its community’s request – who were possibly mostly never SO users) geared toward being welcoming, and to not closing any on-topic question if at all possible. I don’t want tragedies like, “I tried to ask and they closed my question and I can’t get an answer” – I want to believe that if I “close-as-duplicate” then that’s because that’s better for the new user, i.e. the old question has instant satisfactory answers to their specific “new” question.

I’m not saying there aren’t duplicates – there is such a thing as a “frequently asked question” – in practice there are sometimes near-duplicates, as it were a 95% overlap, or an 80% confidence that they do overlap. Some communities or people sometimes err on the side of closing a question too hastily and leaving the new user without a satisfactory answer.

1 Like

Apart from whether the UI design is feasible (or too complicated), I have one other reservation or quibble.

That is that sometimes a question is asked (and is frequently asked) exactly because it has no single easy answer – and/or has several answers, which might seem mutually contradictory.

For example, “What is an elephant?” leads to The parable of the elephant.

One of the benefits of posting online is that you may get several answers in reply, each (user) with a different “view” or explanations of different “aspects” of the topic.

Given several answers, you might see that there are several answers, and synthesise a comprehensive or more 3-D understanding of the topic than you’d get from a single answer.

That’s one use-case where selecting a single old answer as answering the new question isn’t completely satisfactory – i.e. because the old question might have been answered by several good answers.

(And the fact that the old question is well-answered by several answers makes it difficult to answer the new question equally well, therefore even more worthwhile than usual to close the new one as a duplicate – if possible).

2 Likes

This isn’t something that needs to be dealt with, unless a community wants to completely depart from the proven successful principles of SE. Such a question will simply be closed as needing focus/primarily opinion based.

That was a link to a Physics topic (“wave/particle duality”) – just me trying to pick a famous hypothetical example.

It’s true that the Buddhism.SE guidelines for what questions are permitted explicitly contradict several of the guidelines you chose for Christianity.SE – which may make your range of permitted topics more seemingly-objective (and possibly “doctrinaire”, I don’t know the site).

SE does allow (many) rather “subjective” sites i.e. where answers may depend on or vary according to the experience and training of the users who are answering – see Good Subjective, Bad Subjective.

Even the simplest questions like “how do I travel from X to Y?” might have several answers.

Buddhism tries to answer questions like, what’s a good description of the world, what’s a good way for different people to think and behave – written using general rules (or “as universal truths”) which people are meant to understand for themselves and apply – and written using antique languages which don’t have one-to-one semantic correspondence with English vocabulary. I think it’s a worthwhile subject-matter and that SE Q&A format is a beneficial form or site, resource, for studying it.

As it happens the chief example which I had in mind here, is also the most-upvoted question on the site – i.e. Is rebirth a delusional belief? – I think that’s very much a FAQ for people new to the doctrine. And as it happened I found several of those answers useful and I found it non-trivial to “accept” a specific one. I chose the one which seemed to be most orthodox, the safest.

The extent to which opinion is allowed might vary from site to site. I asked a question about graphic design recently on that site – they allow Questions tagged [critique] which are (are they not?) “opinion based” to some extent. A ‘critique’ question might never be an exact duplicate of another, but again it’s an example of a topic which will benefit from several answers.

One of their comments under my design-critique question was …

This is a very subjective question, and I suspect there is no right/wrong answer.

… but they answered it anyway, and very usefully.

I’m not talking about the case where “there is no right/wrong answer” – I’m talking about the case where there are several independent right answers (sometimes even seemingly contradictory because they differ) – and maybe no single answer is entirely satisfactory.

1 Like

“Do those answers there already answer your question? Or can you explain how your question is different from that one?” – isn’t this the automatically generated message the OP gets in any case if you close their question as a duplicate of another? The advantage of closing right away is that it prevents answers prior to clarification of the difference, if there is one.

Yes – or if that isn’t (the message) then it could be (if the message were configurable).

Yes, if you consider that an advantage. It’s one of the effects, anyway.

People who ask possibly-a-duplicate-or-FAQ are often first-time users – I’m not confident they understand why and how to edit.

That (i.e. “it prevents answers”) is the same reason for closing a question as “unclear” – i.e. it requires an edit before the question can be answered. But likewise the site’s community didn’t want that rule either (i.e. that close reason) – so (instead of closing) you can still do all the usual things, e.g. …

  • Comment to ask for clarification
  • Edit to improve clarity
  • Decide for yourself to refrain (or not) from posting an answer until it is clearer
  • Wait to see whether another user can cue you in to what the question is about

… but as I said, the community didn’t want the usual “in case of doubt then close it” standard SE practice or close reason.

Sometimes I do close a question as duplicate, sometimes I just comment. If I’m not sure whether it’s a duplicate (i.e. whether the new OP will find the old answers satisfactory), I don’t see a benefit in preventing other community users from answering the new question if they want to – they too can see my comment (that there is potential duplicate) and judge for themselves whether to answer the new one.

1 Like

Duplicate questions - a retrospective

In days of Usenet of old, every group of sufficient size had its FAQs. These FAQs were posted on a regular basis on the group so that new users would have a chance to see them and not ask the same questions again and again. Some examples of these FAQs include comp.lang.java FAQ, alt.frequent.questions.ask.ask.ask, C++ FAQ Lite, the annotated American pie, soc.culture.jewish, rec.games.roguelike.nethack and countless others.

The culture of Usenet was one where until about September 1993, one would lurk on a group a bit - reading things but not contributing until you’ve got a feel for the group that is there. In this time, one would often see the FAQ posted and be able to read it (and not have to ask the same question again).

On forums, the approach was for sticky posts that sit at the very top of the page. If you visit https://www.reddit.com/r/learnprogramming/ you will see “New? READ ME FIRST!” there. Often these would contain the commonly asked questions.


The design of StackOverflow didn’t originally have duplicates and users were to edit in the duplicate text to the question.

You can read more about this at Handling Duplicate Questions from '09.

The problem with duplicates can be seen in the 9,512 linked questions to What is a NullPointer and how do I fix it? (in 2014, this was 410 questions). Sort by newest to see the rate that these come in at and how well they’re received.

These questions take time, curation, and people answer them again and again and again.

While this may encourage some people trying to get reputation on Stack Overflow, it also has the side effect of discouraging the people who are core contributors to the site.


The biggest problem that I see in this area is the difficulty in discoverability of the canonical answers combined with the expectation that a personalized answer will be provided. This was attempted to be done in the Tag Wikis (servlets and java for example), but the discoverability was challenged there. This was also attempted to be resolved with Documentation… and that attempt didn’t work out as planned.


Realize also that different sites and the communities that form around those sites have different expectations. It is probably a mistake to expect that the way that one community handles duplicates will also work best for the others. There are constraints that are in place from the server software that prevent some things from working and there are differences in scale. It is one thing to have a site where there are seven new users a day that one can help guide through the use of the site and the norms of the community… it is quite another when there are seven hundred or seven thousand new users a day.


The challenge that exists is to make it so that duplicate questions are resolved and curated well without taxing the core community significantly.

Part of that is to make sure that there is a sufficient barrier to entry. Another part is to make sure that there are low friction tools to curate the duplicate material (gold badge dup closing was one example of this) as well as an easy way for new users to discover existing material… and that last point is where I believe that Stack Overflow failed the most.

Not closing posts for answers increases the amount of work that people trying to curate and maintain the site have to do eventually… or they end up with the 9k posts that they’d rather not look at.

6 Likes

Stack Exchange provides a list of “similar” questions as you type the title of a new question.

It’s a great first attempt at interactive discoverability, but it doesn’t work particularly well. And if they’ve been working on it, I never noticed any improvements.

That is a difficult problem in natural language processing. Perhaps using the proposed tags as well as the title text would help some, but new users may be as poorly prepared for good tagging as for knowing what to search.

Hmmm … while we may want to vary the behavior shown to the user when we find un-answered duplicates we still want the system to know that they are related. Because duplicate linking serves to relate the many ways people can phrase the same question.

Perhaps they should both (or all) remain answerable until one of them gets an answer. Adding UI support for composing and self-answering a canonical version would be a nice bonus.
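
A minimal sketch of that idea, assuming a hypothetical `Question` record and an `add_answer` helper (neither exists on any real site): every question in a linked set stays open until the first answer arrives, and at that point the still-unanswered siblings are closed as duplicates of the answered one.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Question:
    title: str
    answers: list[str] = field(default_factory=list)
    closed_as_duplicate_of: Optional["Question"] = None


def add_answer(linked: list[Question], target: Question, text: str) -> None:
    """Answer `target`, then close its unanswered linked duplicates."""
    if target.closed_as_duplicate_of is not None:
        raise ValueError("closed as a duplicate; answer the original instead")
    target.answers.append(text)
    for q in linked:
        # Siblings that already have answers are left for humans to sort out.
        if q is not target and not q.answers and q.closed_as_duplicate_of is None:
            q.closed_as_duplicate_of = target
```

The links between the questions are kept either way, so the system still knows the different phrasings are related even while all of them are open.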

3 Likes

That’s a very good idea. It would eliminate any discontent someone might feel at having their question closed as a duplicate of an unanswered one. (I don’t think anyone who notices a duplicate of an unanswered question can reasonably be expected to keep checking both until one has an answer before closing the other.)

What about (as well/instead?) bumping a question (with/without answers?) to the front page when another’s marked as its duplicate?

1 Like

Answerers can be first-time users too; & it’s not all that uncommon that in the time between the first ‘possible duplicate’ comment & closure, someone wastes their time writing the same answer someone else already has (or a scantier version). Or if they do have anything useful to add, it’s now on a separate post. To make things worse, if the OP then edits their question to differentiate it, the answer can become ‘not an answer’ & even get down-votes.

If I can’t tell the difference between what two questions are asking, when I understand the topic well, & am reading carefully; I don’t need to worry about having missed any nuanced distinctions before closing one as a duplicate of the other: it’s likely enough that answerers will miss those too, & better that the OP should make them clear before anyone answers.

But of course, things work differently on different sites, & I don’t mean to presume that what’s good for one is good for all.

2 Likes

Tbh, I don’t consider it a big problem. There is actually a big difference between feeling unwelcome and being unwelcome. If you’re new to these kinds of forums, then learn. Imo, one of the biggest problems with SE is that people are not taught hard enough that they are supposed to do background research.

That being said, I like your proposal in how dup closing should work.

1 Like