As seen on this question, SE has a lot of limits on how often users can perform various actions. I suspect that most of these limits were developed over an extended period of time to combat both human & machine abuse of the site.
I have a feeling we will, over time, have to deal with similar issues. I am cautiously optimistic that we will not need to deal with these problems immediately, so I am not tagging this as an MVP item, but I think we should plan ahead for these types of issues, even if we don’t do anything about any of them in the initial production system.
A few key things to consider:
- Rate limits == rate tracking. A short-term database of “every action + IP + user” may be needed to catch problems.
For example, we will not need to separately track every page view. But if we see too many page views from one IP/user in a very short period of time, it is likely a robot either trying to hack into the site or simply scraping all the data. The data will be under some form of open license, but it is far less stress on the system to provide that same data via a well-designed API than as a series of page views.
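One common way to do that short-term tracking is a sliding-window counter keyed by something like (IP or user, action). A minimal sketch, assuming in-memory storage (the class name, keying scheme, and thresholds are all illustrative, not a settled design; a real deployment would likely use a shared store such as Redis):

```python
import time
from collections import defaultdict, deque


class RateTracker:
    """Sliding-window event tracker keyed by (actor, action)."""

    def __init__(self, max_events: int, window_seconds: float):
        self.max_events = max_events
        self.window = window_seconds
        self.events = defaultdict(deque)  # key -> recent event timestamps

    def allow(self, key, now=None) -> bool:
        """Record one event for `key`; return False if over the limit."""
        now = time.monotonic() if now is None else now
        q = self.events[key]
        # Discard timestamps that have aged out of the window.
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.max_events:
            return False  # over the limit; event is not recorded
        q.append(now)
        return True


# Hypothetical usage: at most 60 page views per 10 seconds per IP+action pair.
page_views = RateTracker(max_events=60, window_seconds=10.0)
page_views.allow(("203.0.113.7", "page_view"))
```

The deque-per-key approach keeps memory bounded to the window, which matters if we really are logging “every action + IP + user,” even briefly.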
- CAPTCHA - I can’t stand CAPTCHAs (and similar systems), but they may be a necessary evil for “intermediate-level” rate limits. If someone views 10 pages/second or posts 10 comments/second, that is clearly not human and can simply be blocked. But if someone comments once every 3 seconds for 2 minutes? That is extremely improbable for a human, yet not nearly as easy to block automatically, since “2 comments 3 seconds apart” is easy for a person to do.
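The tiering described above could be sketched as a simple decision function: hard-block the clearly-inhuman rates, challenge the improbable-but-possible ones, and allow everything else. All thresholds here are placeholders, not tuned values:

```python
def throttle_decision(events_last_second: int, events_last_minute: int) -> str:
    """Tiered throttling sketch: 'block', 'captcha', or 'allow'.

    Thresholds are illustrative only.
    """
    if events_last_second >= 10:
        # e.g. 10 posts in one second: no human can do this, block outright.
        return "block"
    if events_last_minute >= 20:
        # e.g. a comment every ~3 seconds for the whole minute: very unlikely
        # for a human, but possible, so challenge rather than block.
        return "captcha"
    return "allow"
```

The point of the middle tier is exactly the case described above: a CAPTCHA lets a real (if hyperactive) person continue, while stopping a bot, without us having to make a hard block/no-block call on ambiguous behavior.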
- New User Limits
This has been discussed to some degree elsewhere. The basic concept is that if a new user votes a whole bunch of times, they are likely either a troll or someone who doesn’t understand how to use the system properly (e.g., voting all over the place without actually reading the questions & answers), while the same actions by an experienced user might be perfectly legitimate.
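One simple shape for this is a daily cap that grows with account age (or reputation, once we have it). A minimal sketch, with placeholder numbers and a hypothetical `User` record:

```python
from dataclasses import dataclass


@dataclass
class User:
    days_since_signup: int
    votes_today: int


def daily_vote_cap(user: User) -> int:
    """Hypothetical tiered cap: brand-new accounts get a small allowance
    that grows as the account ages. Numbers are placeholders."""
    if user.days_since_signup < 1:
        return 10
    if user.days_since_signup < 30:
        return 40
    return 200  # effectively unrestricted for established users


def may_vote(user: User) -> bool:
    return user.votes_today < daily_vote_cap(user)
```

A cap like this limits the damage a troll account can do on day one, while barely being noticeable to an experienced user voting normally.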