At Filestage we've always relied heavily on third parties to build our product. We don't want to waste our time on, for example, managing our database; we want to focus on our core product and on fixing our customers' problems. I'm very thankful to be able to build on the shoulders of giants, but I painfully came to learn that not all vendors are at the same level.
From the beginning our stack relied heavily on AWS, and after years of use we had never experienced any downtime or problem whatsoever. Like every other web app, we had our servers and database in EC2, and we used a load balancer and S3. We got used to those levels of reliability and performance and assumed every vendor would match them. Oh boy, was I wrong.
The first time I felt the pain was when we moved our video transcoding from AWS Elastic Transcoder to Transloadit. Our customers uploaded videos in many codecs and formats that AWS didn't support, and we saw a quick win in changing our provider. Indeed it was easy to change: Transloadit has a great API that is very easy to use. It even handles file uploads, so we started using that too. We immediately brought value to our customers, and the price was very competitive with AWS, so we were happy with the change.
But then we started to see problems. Transloadit was a small bootstrapped startup at the time; it was gaining traction and struggling to scale. Suddenly our jobs took a long time to process, and because our application's core is uploading and sharing files, our customers started complaining. Worst of all, we couldn't do anything about it: we could only pass the complaints on to Transloadit and wait for them to fix it. We were at their mercy, we had no control over the situation, and we were losing customers because of it. Unfortunately this situation repeated itself many times and in many different ways: sometimes videos had the wrong colors, sometimes they froze in the middle, and so on. It was a nightmare.
We worked a lot on the things we could change, and tried to decouple our UX as much as possible from the transcoding process. If a non-critical task like thumbnail generation failed, we failed silently. While transcoding was taking place, we let the user keep using the app. If the file was web-compatible, we loaded the original file. If transcoding failed, we let the user retry. Eventually we also migrated some transcoding processes to our own implementation using open-source tools like ImageMagick and FFmpeg.
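To make the decoupling rules above concrete, here is a minimal TypeScript sketch of how they could be expressed as one pure function. It is not our actual code: the type names, field names, and `resolveViewerState` are all hypothetical and only illustrate the idea.

```typescript
// Hypothetical job states; the names are illustrative, not Transloadit's.
type TranscodeStatus = "pending" | "done" | "failed";

interface UploadedFile {
  originalUrl: string;
  transcodedUrl?: string; // set once transcoding succeeds
  thumbnailUrl?: string; // non-critical: may simply be missing
  webCompatible: boolean; // can the browser display the original as-is?
  status: TranscodeStatus;
}

interface ViewerState {
  sourceUrl: string | null; // what the viewer should load, if anything
  thumbnailUrl: string | null; // null means "show a generic placeholder"
  showProcessingHint: boolean; // transcoding is still running
  canRetry: boolean; // offer the user a retry button
}

function resolveViewerState(file: UploadedFile): ViewerState {
  // Prefer the transcoded output when it exists.
  const transcoded =
    file.status === "done" ? (file.transcodedUrl ?? null) : null;
  // Otherwise fall back to the original, but only if a browser can show it.
  const fallback = file.webCompatible ? file.originalUrl : null;

  return {
    sourceUrl: transcoded ?? fallback,
    // A missing thumbnail is not an error: fail silently.
    thumbnailUrl: file.thumbnailUrl ?? null,
    showProcessingHint: file.status === "pending",
    canRetry: file.status === "failed",
  };
}
```

The point of keeping this as a pure function is that the UI never blocks on the vendor: whatever state the transcoding job is in, the user always gets something usable or a way to retry.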
Although this happened years ago, it still makes me angry: I got hit by an unexpected 10x price increase that forced us to change our roadmap. Before we started using Auth0 we had our own authentication system, but we wanted to onboard enterprise customers, and although we had our own SSO implementation, it didn't support as many protocols as Auth0 did. We started using Auth0 for our SSO customers and were very happy with it. The pricing was fair, around $50 a month, so it was a no-brainer.
Because Auth0 was working great for us, I decided to move all our authentication to their platform. This was my first big mistake. Although it removed unnecessary complexity from our codebase and let us focus on our core product, it also meant we were now fully dependent on Auth0. Another downside is that we had to add complexity to achieve a custom UX with Auth0, so in the end I didn't reduce the complexity of our codebase as much as I expected.
Over the years our usage grew and we started paying around $200 a month, which was still fair. But then Auth0 was acquired by Okta, a public company that had to show its shareholders the growth it had promised. They introduced the most unfair pricing model I've seen to date: if you use Auth0 for a B2B product, you pay more per active user than if you use it for a B2C product. Imagine you go to buy a shovel, and they ask whether you are going to dig your garden or dig for gold, and charge you more or less depending on the answer. This is what Auth0 did, and it is still on their pricing page today.
This made our costs skyrocket: we were now paying around $2,000 a month. The worst part is that the increases didn't stop. Their new pricing plan no longer included unlimited SSO connections, so we had to pay for each SSO provider we integrated, which gave them a healthy cut of every enterprise customer too. We were now paying around $4,000 a month and were locked into yearly contracts. We had no choice but to accept it, but I was determined to leave Auth0 as soon as possible.
Migrating all our users to our own authentication system wasn't easy, and we wanted to do it without any friction for our customers, so it became a long-running migration. Thankfully, over the years enterprise organizations have settled on a few SSO protocols, so we only had to implement those. Thanks to open-source libraries, which by the way are exactly the ones Auth0 uses under the hood, we were able to implement the same features easily. Our SSO service is currently 494 lines of code.
Before using MongoDB Atlas we hosted our MongoDB database on our own EC2 instance, which meant that from time to time we had to update the operating system and MongoDB versions ourselves. This is challenging to do with no downtime, and being slow to apply the latest security patches adds risk. So we decided to migrate to this managed solution.
It has served us really well over the years. We have been able to easily configure auto-scaling to optimize costs, and to use features like Atlas Search to offer search to our customers without having to spin up a separate Elasticsearch service and take care of syncing the data. Lately we have also been splitting customer data by region to reduce latency and comply with data residency regulations, and their global cluster and sharding features have made this easy to achieve.
Unfortunately, because we started depending on their cloud solution, we stopped using a local MongoDB database in our local environments, as the feature set isn't the same. This made it trickier to implement integration tests and increased our vendor lock-in. Fortunately they still have fair pricing and good reliability, but we are always at their mercy.
Another downside of PaaS is that you rely on the vendor's support to solve any critical production issue. You can't easily restart the instance, and you don't have full admin access to the database. That is when you learn the real downside: their support pricing model, like that of many other PaaS services, is completely unfair. The free support is basically useless. MongoDB Atlas is mainly used by software developers, so of course support questions are going to be somewhat technical, and they use that as a reason to redirect you to their paid support. Once I was even trying to make them aware of a bug in their UI, and they still insisted on paid support. A very frustrating experience.
It somewhat makes sense that you pay more for support the more you use the platform, because your use case is probably more complicated. But in my experience you rarely use support, so you are basically paying thousands of dollars for nothing. They know this too, so if you disable support and later re-enable it, they force you to pay for the last 3 months to re-enroll. Ludicrous.
Don't expect this expensive support to be staffed by MongoDB experts. In my experience, we've paid thousands of dollars in support for very poor-quality answers. Once we had a back-and-forth for weeks trying to understand a production issue. The problem turned out to be that their Node.js driver doesn't support running multiple queries in parallel (as in await Promise.all) when using sessions to achieve consistency. Their documentation talked about concurrency, and since Node.js runs JavaScript on a single thread, we misunderstood what that meant. Unfortunately the support person we got didn't know Node.js well enough to help us, and it took weeks until we finally figured it out ourselves.
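For readers who hit the same issue, here is a rough TypeScript sketch of the workaround pattern: run the operations that share a session one after another instead of firing them together with Promise.all. The `Task` type and the `runInOrder` helper are hypothetical names made up for this example, and the context is left generic so the snippet runs without a database; treat it as a sketch under those assumptions, not as the driver's API.

```typescript
// A unit of work that receives the shared context (for example a session).
type Task<C, R> = (ctx: C) => Promise<R>;

// Runs the tasks strictly one at a time, in order. Unlike Promise.all,
// the next task only starts after the previous one has settled, so the
// shared context never sees two operations in flight at once.
async function runInOrder<C, R>(ctx: C, tasks: Task<C, R>[]): Promise<R[]> {
  const results: R[] = [];
  for (const task of tasks) {
    results.push(await task(ctx));
  }
  return results;
}

// Assumed usage with the MongoDB Node.js driver (collection names are
// made up for illustration):
//
//   const session = client.startSession();
//   const [project, files] = await runInOrder(session, [
//     (s) => projects.findOne({ _id: projectId }, { session: s }),
//     (s) => files.find({ projectId }, { session: s }).toArray(),
//   ]);
//   await session.endSession();
```

The trade-off is latency: the queries no longer overlap, so the total time is the sum of the individual round trips rather than the slowest one.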
Don't assume all third-party providers will have great reliability. Do your research: check customer complaints, their status pages, and their Twitter accounts.
Beware of unfair pricing models. Technology commonly gets cheaper over time, but software providers tend to get more expensive and to change their pricing arbitrarily.
Understand how much vendor lock-in you are committing to. Think ahead about how much effort it would take to migrate out: Will you have to move data? Are there alternatives? Will it be easy to migrate without downtime?