As seen on this question, SE has a lot of limits on how often users can perform various actions. I suspect that most of these limits were developed over an extended period of time to combat both human & machine abuse of the site.
I have a feeling we will, over time, have to deal with similar issues. I am cautiously optimistic that we will not need to deal with these problems immediately, so I am not tagging this as an MVP item, but I think we should plan ahead for these types of issues, even if we don’t do anything about any of them in the initial production system.
A few key things to consider:
- Rate limits == rate tracking. A short-term database of “every action + IP + user” may be needed to catch problems.
For example, we will not need to separately track every page view. But if we see too many page views from one IP/user in a very short period of time, it is likely a robot either trying to hack into the site or simply scraping all the data. The data will be under some form of open license, but it is far less stress on the system to provide that same data via a well-designed API than as a series of page views.
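One common way to do that short-term tracking is a sliding-window counter keyed by something like (IP or user, action). A minimal sketch, assuming in-memory storage (the class name, keying scheme, and thresholds are all illustrative, not a settled design; a real deployment would likely use a shared store such as Redis):

```python
import time
from collections import defaultdict, deque


class RateTracker:
    """Sliding-window event tracker keyed by (actor, action)."""

    def __init__(self, max_events: int, window_seconds: float):
        self.max_events = max_events
        self.window = window_seconds
        self.events = defaultdict(deque)  # key -> recent event timestamps

    def allow(self, key, now=None) -> bool:
        """Record one event for `key`; return False if over the limit."""
        now = time.monotonic() if now is None else now
        q = self.events[key]
        # Discard timestamps that have aged out of the window.
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.max_events:
            return False  # over the limit; event is not recorded
        q.append(now)
        return True


# Hypothetical usage: at most 60 page views per 10 seconds per IP+action pair.
page_views = RateTracker(max_events=60, window_seconds=10.0)
page_views.allow(("203.0.113.7", "page_view"))
```

The deque-per-key approach keeps memory bounded to the window, which matters if we really are logging “every action + IP + user,” even briefly.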
- CAPTCHA - I can’t stand CAPTCHAs (and similar systems), but they may be a necessary evil for “intermediate-level” rate limits. If someone views 10 pages/second or posts 10 comments/second, that is clearly not human and can simply be blocked. But if someone comments once every 3 seconds for 2 minutes? That is extremely improbable for a human, yet not nearly as easy to block automatically, since “2 comments 3 seconds apart” is easy for a person to do.
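The tiering described above could be sketched as a simple decision function: hard-block the clearly-inhuman rates, challenge the improbable-but-possible ones, and allow everything else. All thresholds here are placeholders, not tuned values:

```python
def throttle_decision(events_last_second: int, events_last_minute: int) -> str:
    """Tiered throttling sketch: 'block', 'captcha', or 'allow'.

    Thresholds are illustrative only.
    """
    if events_last_second >= 10:
        # e.g. 10 posts in one second: no human can do this, block outright.
        return "block"
    if events_last_minute >= 20:
        # e.g. a comment every ~3 seconds for the whole minute: very unlikely
        # for a human, but possible, so challenge rather than block.
        return "captcha"
    return "allow"
```

The point of the middle tier is exactly the case described above: a CAPTCHA lets a real (if hyperactive) person continue, while stopping a bot, without us having to make a hard block/no-block call on ambiguous behavior.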
- New User Limits
This has been discussed to some degree elsewhere. The basic concept is that if a new user votes a whole bunch of times, they are likely either a troll or someone who doesn’t understand how to use the system properly (e.g., voting all over the place without actually reading the questions & answers), while the same actions by an experienced user might be perfectly legitimate.
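One simple shape for this is a daily cap that grows with account age (or reputation, once we have it). A minimal sketch, with placeholder numbers and a hypothetical `User` record:

```python
from dataclasses import dataclass


@dataclass
class User:
    days_since_signup: int
    votes_today: int


def daily_vote_cap(user: User) -> int:
    """Hypothetical tiered cap: brand-new accounts get a small allowance
    that grows as the account ages. Numbers are placeholders."""
    if user.days_since_signup < 1:
        return 10
    if user.days_since_signup < 30:
        return 40
    return 200  # effectively unrestricted for established users


def may_vote(user: User) -> bool:
    return user.votes_today < daily_vote_cap(user)
```

A cap like this limits the damage a troll account can do on day one, while barely being noticeable to an experienced user voting normally.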