Bye Bye MySQL & MongoDB. Guten Tag PostgreSQL
We switched a major part of our software stack lately, the following article shall outline why we did so, how we did it, and the challenges we faced on the way.
Why do we want to switch?
So, first off, let's explain our reason for that move. One of the reasons why we've been running on MySQL until now has been replication, which was quite the hassle in the past with PostgreSQL. But times have changed. Now, not only does it offer built-in streaming replication, it can even be configured as master <--> master. Add to that PostgreSQL's excellent data consistency behaviour (more on MySQL's shortcoming later), and it's already getting interesting.
What about MongoDB then? We have used it exclusively for chat transcripts, which are actually one thing mongo handles really well. Our only grief with it was that it tended to consume absurd amounts of memory over time. Since PostgreSQL added JSON support a while ago, we decided to switch over, the biggest upside being one service less to maintain.
At the same time, we decided to upgrade our ejabberd version, since the old version didn't play nice with PostgreSQL. By doing so, we could get rid of yet another nasty service, xmpp-bosh, since recent ejabberd versions have native support for bosh over websockets.
What was affected
So, after explaining why we set out on this journey, here is our target setup:
- One PostgreSQL master with three slaves as hot standby (Userlike is dead serious about not losing data)
- Upgrade ejabberd to current version
- Drop xmpp-bosh
- Drop MongoDB
- Drop that MySQL
To move our data to PostgreSQL, we needed some kind of conversion process, and there are basically two different ways to go about it:
- Create an sql dump and do some fixing and converting on that to make it importable into PostgreSQL
- Use the django orm to dump everything as JSON, then reimport the result
We decided to go with option two since there are a few differences in how the django ORM creates and handles fields between MySQL and PostgreSQL. We wanted to dodge the problems which could potentially arise from that.
As we had to learn, straightforward dumping of large datasets using the default dumpdata and loaddata commands isn't viable because the utility tries to do it without chunking in one go. So, naturally, we ran out of memory. Fortunately, we were not the first to run into this problem, so this helped us lots: https://github.com/fastinetserver/django-dumpdata-chunk.
I didn't mention it before, but we're using cachemachine, which caches django querysets using redis. It turns out dumping your whole database with such a thing in place is a terrible idea. The redis instance caught fire and exploded at a later point from the load we put on it, so we disabled cachemachine between the migration steps.
Another thing worth mentioning is that loaddata triggers django signals on your models, which could potentially lead to things like signup mails going out to already registered customers. This can be circumvented by checking if you get passed raw=True into your signal handler. We solved this by adding a decorator to our signal handlers, which skips the execution in this case.
But being done at this point would've been too easy, the next issue we had were MySQL fields which violated constraints or contained things that PostgreSQL is more strict about. Just like django's IPv4 address field actually validating the contents.
Those issues were fixed by processing the dumpdata json files with a few custom cleanup functions before loading them again.
That leaves us with the ejabberd chat roster database, which was previously handled by MySQL, so we had to make this play along with PostgreSQL, too. This by itself wasn't too difficult, we're using mod_global_roster and apart from a few database permission restrictions that had to be ported over, we could just switch the connector settings and be done with it.
This part went a lot more smooth, mostly because our mongo use case was quite simple. We dumped all mongo data into a JSON file and imported the chat transcripts into the JSON field for each chat, the only thing that needed handling here was datetime data, which the mongo connector implicitly converted for us in both directions. Datetime data is now stored as an ISO-8601 date string, which is converted back and forth by our own code.
One nice feature mongo has is updating a JSON document in-place. We used that to append each line in a chat transcript to an array. We lost this capability - PostgreSQL as of yet does not support appending to an array using JSON queries - we would have to add a stored procedure to pull that off. Not being very fond of that idea, instead we simply buffer entries for the chat transcript and only update the row every X entries or after some amount of time has passed.
Enter (and exit) PgBouncer
After all that praise for PostgreSQL, there is one potential downside, which is that connection setup is a more expensive process compared to MySQL. This can be mitigated by using PgBouncer, which can hold already set-up connections and hand them out to the needy. During development, it turned out that after all, for our setup, this was some kind of premature optimization. Just giving every django worker process a persistent connection is enough for us and since less is more, we dropped PgBouncer in the end.
Time doesn't stand still for long in our development process, so after development made the first switch to PostgreSQL, we piled on new database migrations. This left us with the problem of the production MySQL server, dumping its data and loading it into the current development branch doesn't work in this scenario. So we made a migration branch and forked off right at the point where MySQL turned PostgreSQL for the first time. Having such a branch also enabled us to put all the code from the previous steps (i.e. disabling signals, cache machine etc.) into a convenient location separated from the regular code base. So, the plan emerged to switch to the postgres-migration branch, do all the conversion work, switch back to master, apply every regular django migration that happened since then and be done.
Will it blend?
One concern that came up during development was that the new stack was significantly slower than the old one. This isn't even about "Is MySQL faster than PostgreSQL?", it can simply happen due to seemingly minor difference in each database’s behavior. After all, what we had before was reasonably optimized for MySQL, since we have been running on it for quite some time. Luckily for us, our friends over at StormForger were interested in more people testing out their product, so we've decided to give it a spin to see where we're at. StormForger is a load testing service, you can configure your websites' API endpoints and setup up different scenarios to stress your application.
Outlining in detail how we use StormForger now is enough material for another blog post, which we will write in the future, but here is a short rundown:
The bottom line is, we actually don't know in hard numbers where we stand compared to before since we have never load-tested the old system this extensively. Still, we do now know our current limits very well. In response to this, several database queries were optimized or restructured - up to the point, where people not involved in the process started to notice it and gave positive feedback.
Preparing for failure
Battle hardened developers who are reading this are probably thinking "That are a lot of critical things changing at once" - and they’re absolutely right. Working with an established product that is in use by customers is rewarding and a lot of fun, but you cannot break it in any major way.
The important questions here are "What can go wrong?" and "What can we do about it, when it actually does go wrong?".
Since we did a lot of testing before, there was not much doubt about the migration process itself. What was worrying us the most was the major ejabberd upgrade, since we had a very fine tuned setup with the old version and couldn't load-test the new version under production conditions except by actually putting it in production.
The decision was made to have a downgrade to the previous ejabberd version as a fallback option, which proved yet to be another challenge. Our previous ejabberd version bundled an ODBC connector, which was incompatible with the PostgreSQL 9 series. Tracing back the source from git through a point in time before where svn was still in use, we could locate the necessary version of the connector with just the patch we needed. Doing it this way instead of simply using the newest version minimised the risk of incompatibilities, which is an important thing for a failsafe fallback plan. After fiddling around with different erlang compiler versions, since we used an official binary release, we got it running and eventually had our old ejabberd talking to our new PostgreSQL database.
With everything in place, we did a smooth rollout of the release. Of course it was a long night simply due to the amount of time the migration process necessary, but we took it in stride without any issues.