Production Site Hosting Considerations

The production system (i.e., the primary instance of Codidact) will have a number of hosting requirements which place it beyond the minimal servers currently provided at no charge by various people involved in development of Codidact. The needs primarily fall into three categories:

Web Server
The web server actually runs the code that powers the web site. Many shared hosting systems (not really a consideration here, but for comparison) like ICDSoft (currently hosting the barebones codidact.org page), GoDaddy and others are limited in what software you can actually run on the server. In our case, we need to run the primary stack (currently planned as C#/ASP.NET) and we need flexibility to control many aspects of the server which are not generally practical on a typical shared server. In addition to full (or “nearly full” - with almost any remote server there are always some things you can’t/shouldn’t do) control of the server, we will need regular data backups, high speed connections and a high-reliability data center.

Database Server
We are currently planning on PostgreSQL for the database engine. Typical shared hosting includes MySQL and limited storage & speed - we’re already planning for much more than that. We can set up our own PostgreSQL server but a managed solution provides a number of benefits with respect to backup, bandwidth, availability and other factors.

File Server
Any web server can also function as a file server. However, in addition to the typical small (relatively) batch of JS, CSS, typical site image files (logos, avatars, etc.) we will also need quite a bit of storage for image files uploaded by users.

Ideally, all 3 of these systems should be scalable to handle more users and storage as we need them without locking us into any long-term contract or significant up-front costs.

There are a number of possible solutions:

Typical Shared Hosting
Simply not an option here. For anyone who does not already know, any hosting company that offers “unlimited bandwidth” and/or “unlimited storage”, etc. for $5 or $10 (or even $100) per month is flat-out lying. TANSTAAFL. Shared hosting has its place for sites that are relatively small and expected to stay small. We need to plan BIG. Note however that a really good shared hosting company (ICDSoft is the only one I put in that category - my list of “not so good” is a long one…) does a great job of taking care of web server backup, firewalls, database backup, hardware repairs (move you to a new server when there is a problem), etc. But shared hosting just doesn’t do the job for our grand plans.

Your Own Server
By this, I mean a service such as Rackspace where you buy a specific server (or group of servers to expand, etc.) and run whatever you want on the server. That can work great if your system is sized just right. In my experience, you either end up with a server that is too small - and have to rush to expand when you hit a limit - or too big - and are then paying for more than you need. In addition, no matter what they (the data center/management company) say, I have found that when you have your “own” server, you do not get the same level of support when there are problems that you get with one of the newer cloud services (below) or good shared hosting (above). Again, I do not feel this is a viable option for us.

AWS/Azure/Google Cloud
AWS (Amazon), Azure (Microsoft) and Google Cloud (Google) are 3 services that offer various combinations of servers with some great advantages:

  • Pay for only what you need
  • Expand when you want very easily
  • Multiple data centers with automatic live backup of databases (AWS calls this “multi zone”)
  • Easy replacement of failed servers
  • Easy backup of web servers on a scheduled basis
  • Super high bandwidth
    and many other advantages.

I only have personal experience with AWS (a few years now with my largest customer and also a few smaller projects). Specific features that AWS has that I believe make it a good fit for us:

  • Web Server = EC2
    The primary web server platform in AWS is EC2. You can run pretty much anything on it: there are terms of service limitations as with almost any service, but the servers themselves can run either an Amazon customized Linux distro or many other versions of Linux or even Windows (though I don’t recommend that). You can associate as much storage (normally SSD) as you want with an EC2 instance. You can spin up multiple EC2 instances to handle increased usage (though you have to decide how to split the usage among servers) or keep increasing the size of the server (essentially spin up a new server, turn off the old server and move the storage over to the new server).

There are also alternatives. For example, AWS Lambda lets you (essentially, don’t complain too much if I have the terminology wrong…) have small processes spin up in response to requests, so that you (a) only pay for the CPU time you use and not 24/7 for an EC2 server and (b) have huge capacity because AWS will spin up as many Lambda processes as needed to meet demand. I have no idea if this is compatible with our chosen tech stack or not, but just mentioning it as a possible option.

  • Database Server = RDS
    This is one of the places where I think AWS really shines. RDS can be configured with a number of different database servers, including PostgreSQL. You pay based on storage and server size in CPU cores, RAM, etc. (i.e., much like the pricing of EC2). However, RDS is specifically optimized for databases. It includes backup, live mirrored data (multi zone - though that doubles the cost), automatic database system updates, etc. Effectively a plug 'n play database appliance but with no upfront cost. Scaling is great - start small and increase as you need to.

Note that both EC2 and RDS can be paid for by the hour (which really means monthly) or pay 0, partial or all upfront on a 1 to 3 year contract to save a considerable amount once you have an idea how much long-term capacity you really need.

  • File Server = S3
    You can use an EC2 instance as a “normal” web file server. However, Amazon offers S3 as a high-speed file storage system. Everything is stored in “buckets”. The number of buckets is actually relatively limited, so typically you might have one for Dev. and one for Production with everything else stored hierarchically inside the bucket. Technically the buckets have everything at one level, but they use / as a separator to essentially mimic a typical file system. Security options are quite flexible - you can have files that are totally open (great for CSS, JS, system images - the client browser can get the files directly) or secured (check user privileges first and then have your EC2 process read the file and serve it to the users).
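To make the "everything at one level, with / as a separator" point concrete, here is a minimal sketch in plain Python. It does not call AWS at all; it only mimics, as I understand it, the way an S3 listing with a prefix and a delimiter groups a flat set of keys into "files" and "folders". The function name `list_level` and the sample keys are hypothetical, made up for illustration.

```python
def list_level(keys, prefix="", delimiter="/"):
    """Split a flat list of keys into the 'files' and 'folders' directly under prefix."""
    files, folders = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue  # key lives outside the requested "directory"
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # A delimiter remains after the prefix, so this key sits in a deeper "folder".
            folders.add(prefix + head + delimiter)
        else:
            files.append(key)
    return sorted(files), sorted(folders)


# Hypothetical keys in a single production bucket; the bucket itself is flat.
keys = [
    "static/css/site.css",
    "static/js/app.js",
    "uploads/2020/01/cat.png",
    "uploads/2020/02/dog.png",
    "robots.txt",
]

if __name__ == "__main__":
    print(list_level(keys))                   # top level of the bucket
    print(list_level(keys, "uploads/2020/"))  # one "directory" down
```

The point of the design is that "folders" are never stored anywhere; they are computed from the key names at listing time.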

Overall, I think EC2, RDS and S3 are a great system and could work very well for a Codidact production system. However, I am open to consideration of other platforms, so if anyone has any experience with others please speak up. In addition, we need to consider how EC2 or other options will relate to the chosen tech stack (I am quite confident about the suitability of RDS and S3).

I would happily host things on AWS, with a few caveats:

  • It’s what I’m familiar with. I don’t know Azure or GCS beyond the very basics, so I can’t do a fair comparison; this is simply “I’ve used it for my stuff and I think it’s good”.
  • We should start small. And by “small”, I mean one smallish EC2 instance - we won’t need more than that, and we certainly won’t need RDS or load balancing or most of the other things AWS offers until we’re looking at an order of magnitude more scale.
  • RDS is probably not the way to go. RDS is ridiculously expensive - the only time I’ve been able to afford dedicated RDS is when Amazon was paying for it. We’d be better off getting a second EC2 instance and setting up a PgSQL server on it, and handling backup etc ourselves.
  • We should definitely reserve capacity. Reserved instances are a significant cost saving over PAYG instances; especially at this stage where individuals are going to be covering costs rather than the organisation, we should keep costs as low as possible. A full-upfront 1-year reservation would seem the ideal option to start with.

I agree on everything except RDS. It really is a good system and a small instance is not very expensive - but has the same advantages as a large instance in terms of updates, tuned for database usage, etc. Unless we have a real PostgreSQL high-level DBA on call, I think it makes a lot of sense. I certainly understand that cost is an issue, but knowing that the system is set up correctly, backed up, etc. - and not having to spend many hours getting there - is worth something.

Meanwhile, let’s see if anyone else has any opinions on all of this.

I agree on AWS. It’s flexible, scalable, and at our size probably inexpensive. S3 probably makes sense for data storage, but they charge for each API call so be careful there. I don’t have an opinion on RDS; it sounds good but I can’t evaluate Art’s objections.

To be fair, we don’t really need to consider whether we use RDS yet or not - as I said, for now we can start with everything running on one single server (except maybe using S3 for file storage), so by the point we need to consider running RDS, we’ll also probably be in a more stable place organisationally and financially.

The major application I currently use S3 for has relatively few & large files and a low usage rate, so the number of reads & size of reads is basically irrelevant. But a quick look at S3 pricing (if I understand the numbers correctly) boils down to:

  • $0.023/GB per month for storage. If we have 100,000 images at 1 Meg. each, that’s 100 GB = $2.30 per month. Even 100x is basically “nothing”.
  • $0.0004/1,000 requests. That is 2,500,000 requests for $1.
  • $0.09/GB bandwidth (after first GB). So 2,500,000 requests of 1 Meg. each = 2,500 GB = $225.

So if we should be so fortunate as to have 100,000 1 Meg. images served up every day for a month (about 3,000,000 requests, a bit more than the 2,500,000 above) and store 100,000 unique images x 1 Meg., then we are looking at charges < $275/month.
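For anyone who wants to play with the numbers, here is a rough calculator for the arithmetic above. It uses only the three rates quoted in this post (which may well be out of date) and, like the post, treats 1,000 Meg. as 1 GB. The function name `s3_monthly_cost` is just something made up for this sketch, and it ignores everything else S3 can charge for.

```python
# Rates as quoted in the post above (USD); they may not match current S3 pricing.
STORAGE_PER_GB_MONTH = 0.023
REQUEST_PER_1000 = 0.0004
BANDWIDTH_PER_GB = 0.09
FREE_BANDWIDTH_GB = 1  # "after first GB"


def s3_monthly_cost(stored_images, requests, image_mb=1.0):
    """Return (storage, request_fees, bandwidth, total) in USD for one month."""
    # The post treats 1,000 MB as 1 GB, so the same convention is used here.
    stored_gb = stored_images * image_mb / 1000
    served_gb = requests * image_mb / 1000
    storage = stored_gb * STORAGE_PER_GB_MONTH
    request_fees = requests / 1000 * REQUEST_PER_1000
    bandwidth = max(served_gb - FREE_BANDWIDTH_GB, 0) * BANDWIDTH_PER_GB
    return storage, request_fees, bandwidth, storage + request_fees + bandwidth


if __name__ == "__main__":
    scenarios = [
        ("2,500,000 requests", 2_500_000),
        ("100,000 requests/day for 30 days", 100_000 * 30),
    ]
    for label, requests in scenarios:
        storage, req, bw, total = s3_monthly_cost(100_000, requests)
        print(f"{label}: storage ${storage:.2f}, requests ${req:.2f}, "
              f"bandwidth ${bw:.2f}, total ${total:.2f}")
```

With these rates, 2,500,000 requests in a month comes to about $228 in total, and 100,000 requests per day over a 30-day month comes to about $273. Either way, bandwidth is nearly all of the bill; storage and request fees are a rounding error.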

YMMV.

Migrating large databases is a pain.

Migrating large databases in an active system when you have the luxury of “shut down over long weekend because it is a system primarily serving a government agency and the government employees are off anyway so nobody is actually using the system so it doesn’t matter” is relatively easy.

Migrating large databases when you have a system you are trying to promote as the greatest thing since sliced bread and is already live and gradually building up a user base and oops, now we’d better upgrade that server because we’re running out of steam is not a good idea if it can be avoided.

So my professional recommendation is that when we go live on a production system, the database should be in a system that can be reasonably scaled without major downtime. That could definitely be RDS. It could be something else.

Agreed on all counts there - though that doesn’t necessarily preclude us from using a second EC2 instance for the database, since instances can be scaled up with no downtime (or very little? it’s a while since I did it).

Let me just make a small case for Azure.
If you define your project as a startup you can get BizSpark, which is $150 a month for 3 years.
It is considerably cheaper to use SQL on Azure, but only if it’s MSSQL (managed Azure SQL can be as low as $25/month because they own the licenses), unlike AWS Aurora, which is about $200/month I think? Having a managed database (like RDS) gives the advantage of not needing a DBA to do all the backups, scaling configuration, etc.
EC2 is basically a VM and doesn’t offer the same scaling abilities as EBS on AWS, so there is some sysadmin work required on that. Azure offers Web Apps for relatively cheap.

I used both AWS & Azure and honestly found AWS around 15% more expensive in general.
The biggest and obvious con on Azure is that it forces the client to go for the Microsoft stack. This is also the reason why they offer startups free money, so after 3 years they would be completely rooted in Azure. Using PostgreSQL on Azure would be more expensive too.

I know quite a few people who would avoid helping altogether if we tie ourselves completely with a Microsoft stack.

IMO we should make sure whatever solution we go with both (a) can scale, as @manassehkatz said, but also (b) be flexible enough that we could migrate (as awful as that would be) later. Let’s not be so married to any one platform or hosting provider.

P.S. I read elsewhere that Codidact will be both a self-hosted option/instance/code-base, and the website/public interface. People wanting a self-hosting option won’t be interested in something that’s only available on a Microsoft stack.

@ArtOfCode Last time I checked, EC2 instances can be scaled up in the same class (that is, a t* instance can be scaled to a larger t* instance, or an m* to a larger m*) with zero downtime. They cannot be scaled down immediately (in most cases), and you can’t change instance class.

It gets a little more complicated, but basically “scale up EC2 with very minimal (not 0 in my experience) downtime” and “scale down a little more complicated but not that hard either”. Where it gets tricky is the storage - easy to expand, hard to contract. So if you use (in the case of AWS) RDS, you want to guesstimate a size that will hold you for a while so you don’t have to increase it too often, but you don’t want to way oversize because making it smaller is not trivial.

I don’t know much about Azure. I do know that in AWS there is a lot of flexibility in terms of RDS instances, just as there is with EC2 instances. A lot of the variability in pricing of AWS has to do with “sizing things right” and “reserving instances”. If you get both of those right, the costs are, IMHO, quite reasonable. If you get them wrong, you can easily pay 2x or more than you need to.
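To illustrate how reserving and sizing interact, here is a small sketch. The hourly rates in it are placeholders I made up, not real AWS prices, and the function names are hypothetical. The only idea it encodes is that an on-demand instance is billed for the hours it runs, while a reserved instance is paid for every hour of the month whether it is used or not.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def on_demand_cost(hours_used, hourly_rate):
    """On-demand: pay only for the hours the instance actually runs."""
    return hours_used * hourly_rate


def reserved_cost(effective_hourly_rate):
    """Reserved: pay for every hour of the month, used or not."""
    return HOURS_PER_MONTH * effective_hourly_rate


def break_even_hours(on_demand_rate, reserved_rate):
    """Hours per month above which the reservation is the cheaper choice."""
    return reserved_cost(reserved_rate) / on_demand_rate


if __name__ == "__main__":
    # Placeholder rates (USD/hour), NOT real AWS prices.
    on_demand_rate = 0.10
    reserved_rate = 0.06
    hours = break_even_hours(on_demand_rate, reserved_rate)
    print(f"Reservation pays off above {hours:.0f} of {HOURS_PER_MONTH} hours/month")
    # Oversizing: an always-on instance at double the hourly rate costs double.
    print(on_demand_cost(HOURS_PER_MONTH, 2 * on_demand_rate)
          / on_demand_cost(HOURS_PER_MONTH, on_demand_rate))
```

For a server that runs 24/7, as a production web or database server would, the reservation wins whenever its effective hourly rate is lower. The risk is reserving the wrong size, which is where the "2x or more" comes from.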

In terms of hosting, I see a lot of chatter about getting VMs, but are you planning on running the application directly on the host OS or through a container system (docker, kube)?

Also, I’ve used Azure professionally for 2 years now; it’s very comparable to Amazon in terms of services offered.

There are 3 other important features that Azure brings:

  1. Monitoring. You have Application Insights (telemetry) and Azure Log Analytics (logging) which give you enormous power to see how your application is behaving once it is deployed.

  2. Azure Active Directory. A central identity platform so that every developer and admin can have their own codidact-branded login to the Azure Portal, and administrators can fine-tune access controls.

  3. Azure DevOps Pipelines. CI/CD pipelines that integrate well with all the other Azure services.

Most/all of these have free/reduced price options. I’m happy to give more information if needed or try to calculate a price estimate of resources.

@gunr2171 Thank you for providing more details. While I am partial to AWS (simply because I am familiar with it), I do think we should give reasonable consideration to other options such as Azure.

AWS does have monitoring - free at low levels and low (IMHO) priced at higher levels (more frequent and/or more data), but I don’t know how it compares to Azure’s offerings. I’m also not sure how much monitoring we really need - in my experience you end up with everything fine until you hit a really slow spot and then the monitoring comes in handy to determine how long the problem has existed so that you can track down (or try to, anyway) the source, though inevitably the solution is typically “restart something”.

AWS also has multiple users with a seemingly infinite variety of privilege levels available.

I don’t know if AWS has something comparable to Pipelines or not.

As far as Docker, etc., I know there has been some discussion of that but I don’t consider myself anything close to an expert on Docker. I have used it a few times, mostly with someone else doing the initial setup, leaving me to troubleshoot the problems when it stops working :(

AWS has CodeBuild as far as pipelines go.

I do believe that Azure offers Postgres, MySQL, Linux, and even FreeBSD nowadays.

Anyway, Amazon looks a reasonable choice to me. That is, until the day comes when it makes sense to own dedicated hardware (if it comes to that).

A few years ago I would have agreed (though it would not have been AWS as the initial service). However, with the way that AWS (and Azure etc.) have evolved, I see no turning back except for really, really, really big companies. Given the breadth of offerings from AWS, and the ability to spin up replacement hardware trivially and at no extra cost, dedicated hardware has lost its allure, at least for me.
