I’m not sure I understand your point. If the voting is unanimous, what scoring method could be better than the Wilson lower bound or the Wilson center? The objection is irrelevant, since Wilson can’t do worse here than SE-style up - down scoring.
In addition, most posts acquire only a handful of votes and there is too much randomness, which the Wilson score is not gonna make up for.
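Statistically, Wilson is specifically designed to distinguish noise from signal by accepting a certain specified chance of being randomly wrong. For z=2 that chance is nominally about 5% for the two-sided interval (closer to 4.6%), or about 2.3% if you only care about the lower bound.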
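As a quick check on the z=2 figure, the normal tail probabilities can be computed directly. This is only a sketch: the function names are mine, and these are the nominal rates under the normal approximation, not the actual coverage you would see at very small vote counts.

```python
import math

def two_sided_tail(z: float) -> float:
    """Probability that a standard normal variable falls outside [-z, z]."""
    return math.erfc(z / math.sqrt(2))

def one_sided_tail(z: float) -> float:
    """Probability that a standard normal variable exceeds z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Nominal error rates for the z discussed here and for the textbook 95% value.
for z in (1.96, 2.0):
    print(f"z={z}: two-sided={two_sided_tail(z):.4f}, one-sided={one_sided_tail(z):.4f}")
```

So z=2 is slightly stricter than the textbook 95% interval, which uses z=1.96.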
The SE setup of up - down is not designed for any particular statistical standard and I don’t know its expected error rate, but it’s guaranteed to be higher, probably much higher. (It’s not actually possible, as far as I know, to get a lower error rate than Wilson simply by using a different algorithm.)
This Wilson score assumes a large number of votes, both up votes and down votes. (It uses a z-score, which is a continuous normal-distribution approximation of a discrete binomial distribution.) So it might only make the ordering worse: a post with +4/-1 will tie with +2; but what if that -1 is a mis-click or just an unhappy OP or bystander?
Also, what if it isn’t? What are the actual rates of occurrence? (The Wilson score is not more reliant on large numbers of votes than any other scoring system, by the way. SE-style scoring is highly unreliable at low vote counts too.)
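To put numbers on the +4/-1 versus +2 case, here is a minimal sketch of the Wilson lower bound next to the SE-style net score. The function names are hypothetical, z=2 is assumed from the discussion above, and returning 0 for a post with no votes is my own choice rather than anything that has been decided.

```python
import math

def wilson_lower_bound(up: int, down: int, z: float = 2.0) -> float:
    """Lower end of the Wilson score interval for the upvote fraction."""
    n = up + down
    if n == 0:
        return 0.0  # assumption: unvoted posts score 0
    p = up / n
    z2 = z * z
    center = p + z2 / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n))
    return (center - margin) / (1 + z2 / n)

def net_score(up: int, down: int) -> int:
    """SE-style score: upvotes minus downvotes."""
    return up - down

for up, down in [(4, 1), (2, 0)]:
    print(f"+{up}/-{down}: net={net_score(up, down)}, "
          f"wilson={wilson_lower_bound(up, down):.3f}")
```

With z=2 the two posts come out close but not exactly equal (roughly 0.37 versus 0.33), while up - down gives 3 versus 2. Either way, a single vote moves both scores a lot at these counts.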
If the distribution is biased (which it might be), then we’d need to figure out how to compensate for that, or just accept that once in a while even the best algorithm is going to do slightly worse than a substantially inferior algorithm. But I don’t really see why a disgruntled OP should be factored in. If, as seems likely, we gate downvoting behind a privilege of some kind, most OPs won’t be able to downvote. And in any case, how do you distinguish “this correct answer to my question made me irrationally angry and I will therefore downvote for no good reason” from “this answer to my question did not help my actual case”?
Obviously, some topics (politics, parenting, tabs vs spaces) are likely to get votes more on opinions and tribal affiliation than reason and evidence. But whatever scoring method we choose can’t fix that. That has to be handled some other way.