On the value of scores and votes

gilles · 30 November 2019 23:47

In “What are we trying to build? - #88 by cellio” we’ve been discussing how votes and scores reflect the quality of an answer. Since that thread is huge, let’s have the discussion here.

The starting point:

What are we trying to build?

Are they though? Take the +25/-15 example. If the +25 is from seasoned professionals and the -15 are from “hobby” level users, what does that say?

Flip that on its head. If the +25 votes are by “hobby” users and the -15 are from seasoned professionals, what is the takeaway? And the positive relative “score” of the answer makes it easier for those who aren’t sure to give it additional upvotes (well, if others like it, it must be good, right?).

Or it could simply be that “user X” pissed off 30 people in chat and half of those went out and voted down a few of X’s answers. So what do the votes really tell anyone about the answer?

Unless a post gets a statistically significant number of votes, the votes don’t really provide valuable information. Sorting by weighted votes would be a better indicator of good answers. I agree that sorting isn’t enough by itself (after all, if every answer is bad, the top one will only be the least bad answer).
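One well-known way to act on the “statistically significant number of votes” point (offered here as a swapped-in technique, not what the poster proposed) is to sort by the lower bound of the Wilson score confidence interval on the share of upvotes. A post with few votes gets a wide interval and therefore a cautious score. A minimal Python sketch, with made-up vote counts and the conventional z = 1.96 (roughly 95% confidence) as an assumption:

```python
import math

def wilson_lower_bound(up: int, down: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the share of upvotes.

    Returns 0.0 for a post with no votes at all.
    """
    n = up + down
    if n == 0:
        return 0.0
    p = up / n  # observed share of upvotes
    denominator = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt((p * (1 - p) + z * z / (4 * n)) / n)
    return (centre - margin) / denominator

# Hypothetical answers: (upvotes, downvotes)
answers = {"controversial": (60, 40), "unanimous": (15, 0), "untested": (1, 0)}

by_net = sorted(answers, key=lambda a: answers[a][0] - answers[a][1], reverse=True)
by_wilson = sorted(answers, key=lambda a: wilson_lower_bound(*answers[a]), reverse=True)

print(by_net)     # ['controversial', 'unanimous', 'untested']
print(by_wilson)  # ['unanimous', 'controversial', 'untested']
```

By net score the 60/40 answer ranks first; under the Wilson bound the unanimous 15/0 answer overtakes it, and a single upvote counts for little. This does not cover weighting votes by voter expertise, which would need extra data about the voters.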

gilles · 1 December 2019 00:13

@Marc.2377 Uh? What makes you think this doesn’t happen?

It’s impossible to tell for sure because votes are anonymous. But there are answers which are, to a topic expert, clearly wrong, yet have a positive score (with many downvotes, but not enough to offset the upvotes). This is fairly common when an answer got a lot of off-site attention, for example via HNQ. And more rarely there are answers which are well-explained and, to a topic expert, clearly correct, and yet have a negative score (with many upvotes, but even more downvotes).

Security.SE has a famous example of an answer which was clearly upvoted by non-experts (or by people who didn’t read the answer): Jeff Atwood’s answer to what is now (and IIRC has been for a long time) the highest-scoring question on the site. Thomas Pornin’s current answer is clearly (to an expert) the best answer for an expert audience, but his original answer wasn’t nearly as informative: he added the good stuff more than a year later. AviD’s answer is a good “TL;DR” answer and was the best early answer. Jeff’s answer is plain wrong (you don’t even need to be an expert to figure it out, but if you aren’t, you need to read the answer very closely): it starts out all right, but then it goes completely off the rails (“Point 3 is almost unanswerable and I think personally highly unlikely in practice. I expect …”; no, point 3 is definitely answerable, but then Jeff would have had to forgo the conclusion he wanted to reach…) and comes to a wrong conclusion. Yet, for years, Jeff’s answer scored above Thomas’s. It took a concerted effort by the Security.SE community to publicize this question on the site before Thomas’s answer overtook Jeff’s for second place. It’s clear that Jeff got upvotes for writing well and for writing a long answer, not for writing an answer that made sense.

Of course that’s just one single anecdote. But even one anecdote is enough to invalidate “this simply does not happen”.

user1306322 · 3 December 2019 05:54

I’m assuming the discussion here is about how to make better sense of post votes. Search for “reaction”: there are already a couple of posts about reactions as a better way of expressing opinions on posts, in addition to the good old arrows.

Here’s my take on it: Rating and moderating questions and askers - #12 by user1306322

Basically, I don’t think upvotes and downvotes are sufficient or useful enough on their own, and could be extended with specific reactions.

MartijnWeterings · 3 December 2019 11:07

I believe that a voting system like Stack Exchange’s is not sustainable and does not provide good (useful) information. It is nice fertilizer for gamification, and downvotes may work to get rid of weeds, but otherwise it does not make a pretty garden.

See also this post of mine on Meta Stack Exchange, https://meta.stackexchange.com/questions/338680/is-the-voting-and-reputation-system-sustainable-how-can-we-improve-it-or-maybe, where I provide a bunch of arguments and give a broader picture of this view.

Sidenote:

In the SE post I mentioned, there is a link to a database query that tracks the score history of the different answers to the same question. Here’s the view for the Security.SE question mentioned by Gilles. It is obvious how the ‘better’ posts struggle to get above the ‘popular’ post. In addition, the posts obtained a lot of their votes in just the first few days. The newer posts are never going to obtain a similar number of votes.

This gamification system, which is about the quantity of votes and not their quality, is going to lose its value over time. It sort of locks the game on any old post, so that most people lose the appetite to make improvements to the old (popularized) topics. Any scoring system that relies on quantity is (quickly) going to become highly asymmetric over time, which reduces the quality of the game and with it the quality of the content.

https://data.stackexchange.com/security/query/1157262/track-voting-for-multiple-answers-to-specific-question?ParentQuestion=6095#graph
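The SEDE query linked above computes this from the real vote data. As a rough illustration of the same idea, here is a Python sketch that turns a list of vote events into per-answer cumulative score curves and reports which answer leads on each day. The event list and the `score_history` helper are hypothetical; real vote dates would come from the data dump.

```python
from collections import defaultdict
from itertools import accumulate

def score_history(events, last_day):
    """Cumulative score of each answer at the end of days 0..last_day.

    events: iterable of (day, answer_id, delta), delta being +1 or -1.
    """
    daily = defaultdict(lambda: [0] * (last_day + 1))
    for day, answer, delta in events:
        daily[answer][day] += delta
    return {answer: list(accumulate(deltas)) for answer, deltas in daily.items()}

# Hypothetical votes: an early "popular" answer and a later "better" one.
events = [
    (0, "popular", +1), (0, "popular", +1), (0, "popular", +1),
    (1, "popular", +1), (1, "better", +1),
    (2, "popular", -1), (2, "better", +1),
    (4, "better", +1), (4, "better", +1),
    (5, "better", +1),
]
history = score_history(events, last_day=5)
leaders = [max(history, key=lambda a: history[a][day]) for day in range(6)]

print(history)  # {'popular': [3, 4, 3, 3, 3, 3], 'better': [0, 1, 2, 2, 4, 5]}
print(leaders)  # ['popular', 'popular', 'popular', 'popular', 'better', 'better']
```

The head start from the early burst is what the ‘better’ answer has to overcome; on a real question with hundreds of early votes, the gap can take years to close.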

gilles · 3 December 2019 23:02

There’s a discussion on using a different scoring method to give downvotes more weight: How should we compute post scores? - #2 by tuggyne

And also on sorting based on criteria other than votes, in particular age: Handling "misvoted" content - #6 by gilles
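The first of those ideas, giving downvotes more weight, can be sketched as a net score in which each downvote counts more than once. The weight of 3 below is an arbitrary assumption for illustration, not the formula from the linked discussion, and the vote counts are made up:

```python
def weighted_score(up: int, down: int, downvote_weight: float = 1.0) -> float:
    """Net score where each downvote counts `downvote_weight` times."""
    return up - downvote_weight * down

# Hypothetical answers: (upvotes, downvotes)
posts = {"X": (25, 15), "Y": (8, 1)}

for w in (1.0, 3.0):
    ranking = sorted(posts, key=lambda p: weighted_score(*posts[p], w), reverse=True)
    print(w, ranking)
# 1.0 ['X', 'Y']
# 3.0 ['Y', 'X']
```

With equal weights, the controversial 25/15 answer outranks the modest 8/1 one; tripling the downvote weight reverses the order. Where to set the weight is the judgement call the linked thread is about.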

Thanks for the graph by the way.

MartijnWeterings · 5 December 2019 14:08

Below is another image that puts the scores and votes into perspective (https://data.stackexchange.com/stackoverflow/query/1162693/distribution-of-vintages-among-top-vote-receivers?FreqPerYear=4&TopNumber=1000&MinYear=2006#graph).

For every quarter we look at the top 1000 posts (questions) in terms of how many votes they receive in that quarter, and at how these posts are distributed by the year in which they were made (their vintage).
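The grouping just described can be sketched in Python on made-up data. The `vintage_distribution` helper and the quarter labels are hypothetical; `top_n` plays the role that the `TopNumber=1000` parameter appears to play in the linked query, and a tiny value is used here so the result is easy to read:

```python
from collections import Counter, defaultdict

def vintage_distribution(votes, created_year, top_n):
    """For each period, count the creation years of the top_n vote receivers.

    votes: {(period, post_id): votes the post received during that period}
    created_year: {post_id: year the post was created}
    """
    by_period = defaultdict(list)
    for (period, post), n in votes.items():
        by_period[period].append((n, post))
    result = {}
    for period, rows in by_period.items():
        top = sorted(rows, reverse=True)[:top_n]  # most votes first
        result[period] = Counter(created_year[post] for _, post in top)
    return result

# Hypothetical data: two old posts, one mid-age post, one new post.
created_year = {"a": 2010, "b": 2010, "c": 2015, "d": 2019}
votes = {
    ("2019Q1", "a"): 50, ("2019Q1", "b"): 40, ("2019Q1", "c"): 10, ("2019Q1", "d"): 30,
    ("2019Q2", "a"): 45, ("2019Q2", "c"): 20, ("2019Q2", "d"): 25,
}
dist = vintage_distribution(votes, created_year, top_n=2)
print(dist["2019Q1"])  # Counter({2010: 2})
print(dist["2019Q2"])  # Counter({2010: 1, 2019: 1})
```

Plotting these counts per quarter, stacked by vintage, gives the kind of chart linked above.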

You can see a clear decline in the popularity of new posts. Each vintage appears in the top less often than the one before. Only in their own year might the posts beat the old top posts, but even that is declining, and the 2019 vintage does not even beat the other vintages in its own year (so it cannot make use of the advantage of being new).

I imagine there might be two cases (or a mixture of both):

All the good posts have been posted already. Activity is much less about making those kinds of posts now.
Due to the initially low score it is much more difficult for a new post to get attention to obtain score/votes and climb up the ladder. As a result the gap between old and new posts keeps increasing.

In any case, the game is over. Scores and votes have lost much of their meaning. (And anybody who wishes to keep playing the game, to increase rep and status, might be scraping off the last pieces by posting in high volume but potentially low quality.)

[Figure: distribution of post vintages among the top 1000 vote-receiving posts, per quarter]

Olin · 5 December 2019 15:49

Very interesting. This seems to mean two things:

The building of a repository of knowledge worked. Now what?
High volume above some threshold is actually bad. Basically, too many posts bury each other too quickly.

So how should a new site be managed to operate sustainably in the long term? Is that even possible? I can see some points this is arguing towards:

A site must be split up when the average time a question spends among the 25 most recently active questions gets too low. The trick will be to split it up well, so that you don't end up with one site nearly as large as the original and a smaller one with traffic below critical mass. That will be tricky, but I think doable. Site managers must be willing to split/merge occasionally based on question-volume feedback.
Obsessing over not losing every last question is not only bad for overall site quality; you don't actually want that many questions anyway.
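The split criterion in the first point can be measured from activity data. A minimal Python sketch, assuming that any activity on a question moves it to the top of an “active” list with a fixed number of slots (25 in the proposal) and that a question drops off when a newly active question pushes it past the last slot; the event list and the `average_residence` helper are hypothetical:

```python
from collections import OrderedDict

def average_residence(events, slots=25):
    """Mean time a question stays on an 'active' list of `slots` entries.

    events: (timestamp, question_id) pairs sorted by timestamp. Activity on a
    question moves it to the top; when a new question arrives and the list is
    full, the least recently active question is pushed off. Returns None if
    nothing was ever pushed off.
    """
    active = OrderedDict()  # question_id -> time it entered the list
    stints = []
    for t, q in events:
        if q in active:
            active.move_to_end(q)  # bumped: keeps its original entry time
        else:
            active[q] = t
            if len(active) > slots:
                _, entered = active.popitem(last=False)  # least recently active
                stints.append(t - entered)
    return sum(stints) / len(stints) if stints else None

# Tiny hypothetical example with a 2-slot list and timestamps in hours.
events = [(0, "a"), (1, "b"), (2, "c"), (3, "b"), (4, "d"), (5, "e")]
print(average_residence(events, slots=2))  # about 2.67: stints of 2, 2 and 4 hours
```

Whether a bump should keep the original entry time or restart the clock is a modelling choice; questions still on the list when the data ends are not counted.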

MartijnWeterings · 5 December 2019 16:32

You could also see it as a kind of pyramid scheme.
The community was built up by rep and votes, but now the questions and answers that earn score/votes/rep are used up. I am not sure whether splitting up is necessarily so good. It might make things easier to oversee, but it may not change the content a lot. (And how are you going to do the split?)

Olin · 5 December 2019 17:01

OK, what do you suggest?

It seemed to me that size is the thing that ultimately works against a successful site. If that’s true, then two smaller sites are better than one overly-large one.

Any site broad enough to have over-the-tipping-point volume will have multiple categories of information. It shouldn’t be too hard to split a large general site into two or more specific sites.

For example, if you decided that Electrical Engineering SE needs to be split, you could break it into utility-scale power, RF, microcontrollers, analog circuitry, etc. If any of those get too big, you find subcategories to split them further. Repeat as needed.

If you mess up, two sites can be merged.

MartijnWeterings · 5 December 2019 19:42

I do like the idea of splitting in some way, but I wonder whether splitting the sites entirely isn’t just too cosmetic. Possibly we need to consider it in a much grander scheme than what can be done with the current SE/SO website.

To compare it with living organisms: whether a split is going to ‘work’ depends on what creature you are talking about. If it is a human, then a split in two will kill the human. For single-celled organisms, however, splitting is daily (hourly) business. But even single-celled organisms cannot split indefinitely; eventually they die when they do not get new food. Something like a tree (maybe Wikipedia is like that) will also eventually die.

So I would say that we should not try to make the organism (the site) survive at all costs by doing these kinds of splits. What we need instead is a mechanism for the DNA to survive.

While thinking about this split of Codidact over the last weeks, I have lost some of my hope for the concept of Q&A. Creating split-offs seems too much like going down the same dead-end road (no clear evolutionary adaptations except a change of governance).

What would excite me is when the ‘content’ would be more portable, lightweight, and less dependent on the platform/substrate that it is living off. It is not about splitting, but about being able to switch.

If we wish to beat the wild yeast that SE/SO are, then we need to find a way past their mistake of being a model that only lives while it is growing and dies otherwise. What I believe now is that some form of mutualism with a tree like Wikipedia is necessary (use all of the four answerers in one ecosystem).

Marc.2377 · 12 December 2019 07:24

Gilles, sorry for taking so long to reply here. By calling that scenario unrealistic, I was arguing that the +25/-15 example is a (probably intentional) exaggeration (first illustrated by Monica). I don’t mean to say that answers containing wrong information aren’t upvoted (they are!); my point was that whenever a post gathers a significant number of votes in one direction, the difference between upvotes and downvotes increases. 25/15 is a 67% delta, and since 25 is already a fair number of upvotes, examples of this specific score distribution should be quite hard to come by. They probably exist, but in my experience they should be very rare. Then again, I should point out that due to reputation restrictions I was, until very recently, only able to see score distributions on Stack Overflow and not on any other site, so my experience is definitely biased, even more so in comparison to yours.
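For reference, the arithmetic behind the “67% delta”: the gap between upvotes and downvotes taken relative to the 15 downvotes gives 67%, while taken relative to the 25 upvotes it would be 40%.

$$
\frac{25 - 15}{15} = \frac{10}{15} \approx 0.67, \qquad \frac{25 - 15}{25} = \frac{10}{25} = 0.40
$$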

Anyway, sorry for not making my previous post more clear!

P.S. I can’t see the upvote/downvote count for the example you provided (thanks, btw) - how is Jeff’s answer scoring these days? Just out of curiosity, if you don’t mind ^^.

(Please check this as well, @YLearn, since you addressed me on the other thread about this.)

cellio · 12 December 2019 15:02

Uh, you should visit some of the more subjective sites. That sort of vote controversy is not as rare as you think it is. It wasn’t an exaggeration.

Marc.2377 · 12 December 2019 15:18

It’s a matter of reputation: I visit many, but just can’t see it. Usually, the presence of highly upvoted comments pointing out that something is wrong with a given answer indicates to me that the upvote/downvote ratio is high enough (in either direction). BUT, if you’re all telling me this, I believe it!

MartijnWeterings · 12 December 2019 17:11

On a site like Security.SE the distribution of downvotes is very much independent of the number of upvotes (except for the difference between the cases upvotes = 0 and upvotes > 0).

There are very few downvotes (80% of the answers have no downvotes); only 42 answers (0.04%) have more than 10 downvotes (Jeff’s answer, at 296/52 up/down, is one of them).

So these controversial cases do not occur that often, at least not on Security.SE, and downvoting is roughly independent of the number of upvotes. It doesn’t matter much how many upvotes a post has; that does not strongly influence how many downvotes it gets. More upvotes mean only a slightly higher probability of downvotes, and the causal relationship is probably that posts with more upvotes are also older or more active posts that had more time to acquire downvotes.
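Summary figures like the ones above can be computed from per-answer (upvotes, downvotes) pairs. A minimal Python sketch; the sample below is made up except for the 296/52 pair quoted for Jeff’s answer, and the upvote buckets are an arbitrary choice:

```python
from bisect import bisect_right

def downvote_summary(answers, heavy=10):
    """Share of answers with no downvotes, and count/share with more than `heavy`."""
    total = len(answers)
    no_down = sum(1 for _, down in answers if down == 0)
    heavy_down = sum(1 for _, down in answers if down > heavy)
    return no_down / total, heavy_down, heavy_down / total

def downvote_rate_by_upvotes(answers, edges=(0, 1, 10, 100)):
    """Share of answers with at least one downvote, per upvote bucket.

    Buckets are [edges[i], edges[i + 1]) plus an open-ended last bucket.
    """
    labels = [str(lo) if hi - lo == 1 else f"{lo}-{hi - 1}" for lo, hi in zip(edges, edges[1:])]
    labels.append(f"{edges[-1]}+")
    counts = [[0, 0] for _ in labels]  # [answers in bucket, answers with downvotes]
    for up, down in answers:
        bucket = bisect_right(edges, up) - 1
        counts[bucket][0] += 1
        if down > 0:
            counts[bucket][1] += 1
    return {label: with_down / n for label, (n, with_down) in zip(labels, counts) if n}

answers = [(0, 0), (0, 0), (0, 1), (3, 0), (5, 1), (12, 0), (40, 2), (296, 52)]
print(downvote_summary(answers))          # (0.5, 1, 0.125)
print(downvote_rate_by_upvotes(answers))  # {'0': 0.3333333333333333, '1-9': 0.5, '10-99': 0.5, '100+': 1.0}
```

On real data, a roughly flat rate across buckets would support the “roughly independent” reading, while a rising rate would fit the explanation that more-upvoted posts are older or busier and had more time to collect downvotes.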

[Figure: distribution of up- and downvotes on Security.SE]
https://data.stackexchange.com/security/query/1166843/distribution-of-up-and-down-votes#graph

In comparison:

a site like stats.SE has relatively fewer downvotes.
a site like Stack Overflow has relatively fewer downvotes for questions with 0 upvotes, but more downvotes for questions with some or many upvotes.

The dynamics are quite different (but on all sites downvotes are relatively rare and only differentiate a small percentage of the posts, in the tails of these distributions).

[Figures: the corresponding up/downvote distributions for stats.SE and Stack Overflow]

Marc.2377 · 13 December 2019 07:28

I’m loving these sorts of data visualizations. Please keep bringing them up.