I have an SE clone. How do we feel about building on that?

ArtOfCode · 9 November 2019 21:52

Here’s one that I haven’t yet mentioned (mostly because I completely forgot about it): a few years back, I built an SE clone. It was based on the idea of being a like-for-like clone of SE, so it’s pretty much the same, albeit very much simpler.

The source code is available here, and there is a temporary live instance of it running too, for anyone who wants to test/poke it.

If we were to build on this software rather than building something new, these are the pros and cons as I see them:

Cons

It’s not the tech stack we voted for; it’s Ruby/Rails. That said, Ruby isn’t too difficult to pick up, if we wanted to.
It’s old. The version of Rails it was built on is at least four years out of date; there have been two major version updates since. I had to upgrade it one major version just to get it to work at all. We’d need to do significant work here to bring it up to date.
It might be too similar to SE. If we wanted to use it, we’d most likely want to change how various things work to suit what we want out of it (I’m thinking particularly around comments, but other things likely apply too - the other that comes to mind is a meta/discussion/community area, which it doesn’t currently have).

Pros

It’s already started. This should not be underestimated; the cost of starting a new software project is always high compared to building on top of something existing.
It solves a number of architectural questions for us already.
It already has the basic functionality: questions, answers, comments (all with markdown), reputation, privileges, notifications, some moderation tools (including flagging).
We can still do what we want with it, and take it in the direction we want to; I personally have no particular attachment to the software, and I’m happy for it to be reworked into what is effectively a different product. This likewise applies to the design - I’m well aware it’s outdated and in need of mobile support, so we can create our own new design for it.
It has decent test coverage already, so we won’t need to spend as much time writing tests for it; we can mostly test new functionality and bring existing up to date later.

rodolphito · 9 November 2019 22:11

First of all, it’s awesome that you made this. Good job on it, it’s a lot of work.

I feel we should vote among contributors, because I’m not sure how representative this vote was of the people who will actually be working on the project, but as it stands, nobody was okay with Ruby/Rails.
That said, I have no aversion to learning new things, although it must be taken into account that progress will be slow if everyone is learning on the way. I haven’t looked at the code much, but I am already biased against the language because of dynamic typing.

Can Ruby scale to meet the loads we will get if we are successful? Are there examples of large-scale, performant software on Ruby? Compare Ruby + MySQL with ASP.NET Core + Postgres in these benchmarks.

source: https://www.techempower.com/benchmarks/#section=data-r18

ArtOfCode · 9 November 2019 22:19

GitHub and GitLab are both built with Rails; they have huge workloads. I also maintain another application using Rails that handles a huge dataset with no issues. Benchmarks aren’t always the best way to judge things…

I’m more than happy to have a discussion about which database we use; MySQL is what I’m familiar with, but I know it’s not the highest-performing system out there.

As for dynamic typing… it ain’t so bad. There’s the occasional gotcha, aye, but there is with static typing too; the cleaner the code you write, the easier it is to avoid the gotchas that creep in.

rodolphito · 9 November 2019 22:54

Sounds good, and yeah I know benchmarks should be taken with a grain of salt, but it’s just surprising that it’s over an order of magnitude of difference. Knowing that GitHub and GitLab use it is a good sign. I still maintain that we should have a vote among contributors to see who is comfortable with using/learning Rails.

ArtOfCode · 9 November 2019 22:57

Aye, probably. I tend to be a little wary of putting too much stock in voting on every decision, but for some decisions it makes sense. I wouldn’t be averse to voting anew on the stack, in any case, given that it was decided among a very small group (I hadn’t joined yet, and I came fairly early on, so it won’t have been a very broad vote). Folks may also feel differently about it if it’s a question of “choose your stack freely” or “choose your stack, but we’ll have to do less work in X or Y stacks”.

Marc.2377 · 9 November 2019 23:02

Another web application that seems to be doing mostly well is https://dev.to/ (source code at https://github.com/thepracticaldev/dev.to).

That said, RoR applications do appear to require more hardware (from my experience hosting Discourse and OpenProject) due to being heavier on resources. We were initially looking for a stack that scales as well as possible, while maintaining an acceptable degree of productivity, without having to throw more hardware at it to (in the words of Mason/Stormhunter) ‘emulate scalability on top of non-scalability’, thus keeping hosting costs low. This was said in the context of Node.js (and PHP before that), but in my experience, while less true of Rails, it’s still true to a certain level.

Btw I hear GitLab is very resource-hungry for what it offers. We considered hosting it in-house for a company I worked with, but they decided to keep using TFVC.
(p.s. I’m familiar with GitLab’s source code.)

Maybe with great caching (static content and otherwise) it’s possible to minimize this, and it appears that’s what Dev.to is doing.

Anyway. Of those 10 votes against Ruby on Rails, 7 are from the contributors team, and one is from a programming advisor. I was one of the members who voted against RoR on that poll. While I do find the language itself nice and even pleasurable to develop with, I strongly feel it would not be a nice fit for this project.

Will comment more on @ArtOfCode’s QPixel application after I finish evaluating it.

ArtOfCode · 9 November 2019 23:09

It’s important to keep thinking about things in proportion, and relative to reality.

Is Rails the most efficient web framework there is? Hell no.

Is it sufficient for what we’re likely to need? Yes, absolutely. With the right choice of database and server software, a single hardware server (such as an EC2 instance) can run a Rails application up to hundreds of requests per second. We’re not likely to see that level of traffic possibly ever, or at least not for a number of years to come; if that day comes, we’ll be an organisation with full-time paid staff and money to spend on the resources we need.

(Heck, I have a Node.js/Express application - often cited as a very slow combination - using a custom web framework I wrote in not-very-long and not-very-efficiently, which has stood up to around 100 req/s peak… I think we’d be okay.)

I don’t object to optimization, but let’s not over-optimize over-early and limit our options without good reason.

Marc.2377 · 9 November 2019 23:30

Well, SO alone serves in excess of 10 million pages a day. I’m looking to build a platform that can even surpass that one day, without requiring a full rewrite if possible.

Meaning, yes, if we do end up choosing an existing Q&A platform to launch now, no problem. But if the decision is to actually build one, I recommend putting up no less than the absolute best efforts we can.

Cross posting from https://discordapp.com/channels/634104110131445811/635636489447014411/635638943747932160, for insight into current SE HW topology:

A list of resources about SE infrastructure (old and new):

1. https://stackoverflow.blog/2008/12/10/server-hosting-rent-vs-buy/
2. https://stackoverflow.blog/2009/01/12/new-stack-overflow-server-glamour-shots/
3. https://blog.serverfault.com/2010/09/10/1097492931/
4. https://www.dev-metal.com/architecture-stackoverflow/ (2014)
5. http://highscalability.com/blog/2014/7/21/stackoverflow-update-560m-pageviews-a-month-25-servers-and-i.html
6. https://nickcraver.com/blog/2016/02/17/stack-overflow-the-architecture-2016-edition/
7. https://meta.stackexchange.com/questions/10369/which-tools-and-technologies-are-used-to-build-the-stack-exchange-network/10370#10370 (a bit outdated and not clear enough, but references at the end may be useful)

Additional insights:
8. https://stackoverflow.blog/tags/server/ (a couple of blog posts, most not very relevant)
9. https://stackoverflow.blog/2019/07/22/how-stack-overflow-upgraded-from-windows-server-2012/
10. https://meta.stackexchange.com/questions/333095/planned-maintenance-scheduled-for-wednesday-september-11-2019-at-100-utc-9-pm (some comments relevant)
11. https://stackexchange.com/performance

Links 6 and 11 are the most relevant.

ArtOfCode · 9 November 2019 23:32

10 million pageviews per day is 115 req/s sustained…
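
For anyone who wants to check that figure, the arithmetic can be sketched as below. This assumes one request per pageview and traffic spread evenly over 24 hours, neither of which holds exactly in practice (real pages trigger several requests and traffic has peaks); `sustained_rate` is just an illustrative name.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def sustained_rate(pageviews_per_day: float) -> float:
    """Average requests per second if traffic were spread evenly over a day."""
    return pageviews_per_day / SECONDS_PER_DAY


rate = sustained_rate(10_000_000)
print(f"{rate:.1f} req/s")  # prints "115.7 req/s"
```

So 10 million pageviews a day works out to a little under 116 requests per second on average; peak load would be some multiple of that.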

Marc.2377 · 9 November 2019 23:38

True, but I’m a bit skeptical of this:

I’ll have to look at some data. Including costs…

ArtOfCode · 9 November 2019 23:40

That’s not theoretical, @Marc.2377 - I’ve run an application that served into the hundreds of requests per second on a single server.

If you’re looking at costs - I don’t know if you’re familiar with EC2. The cheapest way to run EC2 servers is to buy a 3-year reserved instance; that’s where you’ll get the lowest hourly-equivalent cost.

Marc.2377 · 9 November 2019 23:53

It’s just not the experience I have (re Rails). When it is, it’s usually the case that a similar application can be served with lower costs if written in ASP.NET (Core). Of course, assuming good, clean implementations for both frameworks. The difference in cost becomes significant as traffic increases, and so do the perceived delays for various computational tasks. This is, again, in my experience, but I don’t doubt yours to be different and I’ll be looking more into this.

MasonWheeler · 10 November 2019 00:42

This project might be helpful for a frontend implementation (I haven’t looked at the code yet, so I don’t know what’s there), but we rejected using a dynamic language for the backend for a bunch of good reasons. Maybe @ArtOfCode knows some performance tricks the rest of us don’t, but the “Rails doesn’t scale” meme has been around for… what? At least a decade that I’m aware of. And completely aside from the performance issues that may or may not apply here are the correctness issues that definitely apply. Trying to build a non-trivial application without the benefits of static typing is just asking for trouble.

ArtOfCode · 10 November 2019 01:18

Let’s not decide by attacking the framework with its stereotypes, hmm? Rails does scale. It scales just fine. It’s been given a bad reputation mostly by earlier versions. GitHub manages at enterprise scale; so does GitLab. I’ve built scalable applications using out-of-the-box Rails, with no special tricks. Likewise on the dynamic vs static typing issue; it’s not “correct” to do it one way or the other. There are benefits and drawbacks of both. I’m more than happy to hear debate about the pros and cons of various approaches, but let’s at least try to make it informed debate, or recognise and acknowledge where we don’t know.

manassehkatz · 10 November 2019 02:17

Warning, this started out short and got a bit long - feel free to reply, ignore, split it up/move it elsewhere or whatever…

In my personal experience, which is limited in scale (my customers’ systems just don’t approach the number of users that we are talking about here, though one in particular does have some pretty decent-size databases, currently PostgreSQL) but extensive and spanning more than 30 years, yes, there are platforms/frameworks/etc. that work better in general than others. But an awful lot of speed & scalability depends on many other factors, including (but not limited to):

Database engine (the current decision is PostgreSQL, which I still believe is a good choice for this project)
Database server - lots of users and lots of data means multiple cores, lots of RAM and SSD storage. All pretty standard these days, but need to provision appropriately.
Database design - index/key design, relationships between tables, field types, etc. (I have learned the hard way on some of these things…and sometimes gone back and found that I recommended the right way and was overruled, until years later when it had to be changed to the right way because of performance problems.) A lot of this gets into real specific details far below the big picture stuff we’re (mostly) currently discussing.
Caching - which can take many forms - RAM cache managed by database engine, deliberate saving of frequent queries in additional special tables, saving already formatted/ready-to-display data blobs (e.g., main content of home page of each topic site with default settings), etc. A lot of possibilities depending on how much management of this is desired in the server application (more coding, and very dependent on the platform/framework used).
Web server (not nearly an expert on this, usually fall back to Apache) - Apache vs. nginx etc.
And finally, the actual platform/framework being used.
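
To make the caching item above concrete, here is a minimal sketch of the "saving already formatted/ready-to-display data blobs" idea, written in Python purely for illustration. Everything in it is an assumption and not part of any existing codebase: the `TTLCache` class, the `render_home_page` stand-in, the `"home:cooking"` key and the 30-second lifetime are all hypothetical. A real deployment would more likely use a shared cache server than a per-process dictionary.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Tiny in-process cache: keeps a rendered blob until it is older than ttl seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get_or_render(self, key: str, render: Callable[[], str]) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: skip the expensive work
        value = render()  # e.g. run the queries and format the page
        self._entries[key] = (now, value)
        return value


render_calls = 0


def render_home_page() -> str:
    """Hypothetical stand-in for the database queries plus templating."""
    global render_calls
    render_calls += 1
    return "<ul><li>Newest questions...</li></ul>"


cache = TTLCache(ttl_seconds=30.0)
first = cache.get_or_render("home:cooking", render_home_page)
second = cache.get_or_render("home:cooking", render_home_page)
print(render_calls)  # prints 1: the second call is served from the cache
```

The trade-off is the one described in the list: more code to manage in the server application, in exchange for fewer trips to the database for content that rarely changes.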

My main point is that the platform/framework can be great to start but end up with awful performance due to database design issues or other problems. On the other hand, almost any platform/framework can work out OK if you have a really good database design, fast database server, fast web server, etc. I have no experience with C# and my opinions of ASP.NET are, unfortunately, colored by my awful experiences with Classic ASP many years ago (that code replaced Cold Fusion and was replaced by PHP which is still in use for that customer), but all indications are that C# and ASP.NET will do just fine, provided the database is structured well and the other issues are handled well.

In addition, the type of functionality that we are talking about (i.e., Q&A vs. chat vs. comments etc.) can all be supported quite well in any modern platform. And the front-end design (desktop/mobile/responsive, moderate design differences between topic sites, etc.) can all be the output from any modern platform.

As far as reusing any existing code (beyond small sections like using some existing library functions for a markdown editor or whatever rather than rolling our own for every little detail of the site), my take on the existing “SE clones” is that while many of them look quite good and have the basic functionality, none have been fleshed out to the level of detail, scalability, consideration of functional changes, etc. that we are currently discussing. None of them, except SE itself, is anywhere near a complete system (all the maintenance pages, moderation tools, search & tag capabilities, etc.), and obviously we don’t have SE’s code as a starting point; we have its output, but that is quite different. Since we are also discussing building a system to last for (hopefully) many years, I don’t see the platform/framework of any existing package as a real factor. We can - and should - take any pieces from existing (open source, compatible license, of course) packages to help build Codidact where that will save some effort, but overall we are talking about new code for a new system.

Marc.2377 · 10 November 2019 04:36

I personally believe that one good reason for Ruby on Rails becoming so popular for web applications was Microsoft taking too damn long before making .NET an open (and cross-platform) technology.

.NET Core has been production ready for like, what? 2 years? A bit more or a bit less, maybe. Both GitHub and GitLab are older than that. EF Core is not even at feature parity with EF to this day (good thing EF 6.3 supports Core now, but you know, older design, not really great as it doesn’t take full advantage of Core).

I think many of the RoR projects have gone that route because, in the absence of a fully open Microsoft .NET stack, RoR was the next best thing.

Now it isn’t.

MasonWheeler · 13 November 2019 16:22

I was talking about correctness in the technical sense (i.e. the absence of bugs). Correctness is a benefit of static typing and a drawback of dynamic typing, as the type system makes entire classes of bugs impossible that are unfortunately common in dynamic systems. This is useful in writing code, but absolutely vital in maintaining it, because you can utilize the TCIYF principle to make sure you haven’t missed something when making changes.

ArtOfCode · 13 November 2019 16:25

In that case, it becomes a question of what you value more: removing that class of potential bugs (which is the upside of static typing), or flexibility and conciseness (which is the upside of dynamic typing).

There isn’t a right answer to that question, and whichever you choose can be supported in various ways.

MasonWheeler · 13 November 2019 16:44

Personally, I value correctness uber alles, because if your code doesn’t actually work, nothing else matters. If you have a system that’s supposed to be really awesome and do all sorts of amazing things, but it doesn’t actually do them because there are bugs preventing it from functioning properly, then you don’t have an awesome system that does amazing things.

As for “flexibility and conciseness”, you’re technically right on conciseness being an upside of dynamic typing, but that’s not always an advantage. Remember the old maxim, “programs should be written for people to read, and only incidentally for machines to execute.” If I have a statically-typed system that declares the type of a function’s arguments, I can trivially tell what it’s supposed to be doing. If all I have is a name of an argument without a type, in all too many cases I have to dig through the code to see where it’s called and how it’s used in order to figure out what’s going on. This is based on direct experience working in Python and Ruby codebases. It becomes a maintenance nightmare very quickly exactly because of too much “conciseness.”
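
The readability point can be illustrated with a small sketch in Python, which allows both styles side by side. The `User` class and the `award` functions are made-up names for illustration only. Note also that Python's annotations are not checked by the interpreter itself; catching the mistake before running the code assumes an external checker such as mypy.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    reputation: int


# Untyped: is `user` an id, a name, or an object? Is `amount` an int?
# The reader has to go and find the call sites to know.
def award(user, amount):
    user.reputation += amount
    return user


# Annotated: the signature alone says what goes in and what comes out.
def award_typed(user: User, amount: int) -> User:
    user.reputation += amount
    return user


alice = award_typed(User(name="alice", reputation=1), 10)
print(alice.reputation)  # prints 11

# A static checker would flag the call below before the code ever runs.
# Without one, it only fails at runtime with an AttributeError, because
# a str has no `reputation` attribute.
# award_typed("alice", 10)
```

Both functions behave identically at runtime; the difference is only in how much the signature tells the next maintainer.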

DoctorJones · 14 November 2019 13:47

I think maintainability should be at the top of the list. Dynamic typing is not very good for maintainability, especially as the codebase scales.