MySQL vs PostgreSQL shootout
At the risk of fueling the already-merrily-burning holy debate around which of MySQL or PostgreSQL to use, I’m publishing my own findings here.
I was offered a contract to design and implement a inventory and billing system for a cellphone hiring company. This meant two things to me:
- Data volume would be something significant. CDRs, or call data records, are batched up and delivered regularly in a flat file, that needs to be parsed and fed to a database. I’m told to easily expect a million records within a month. This obviously has some serious performance implications, since an integral part would be real-time delivery of reports based on these CDRs.
- Integrity, obviously, since the CDRs will be directly used for billing, and a thorough audit trail must be left of all operations.
So, to start the testing, I’ve created a script that will generate sample data that is based on typical CDRs — all the variables are ‘sanely’ randomised. I thought that 1 million records would be a good starting point. Also, I’m inserting the same data into both databases. The test machine is an AMD64 running Ubuntu Linux (Breezy, the development version) on kernel 2.6.11-1-amd64-k8 with 1GB RAM . MySQL is version 4.1.11-3, PostgreSQL is 8.0.3-4, both the official Debian packages available at the time, and with /var on a Seagate ST340016A PATA drive, running at UDMA-100. I have made no changes to the database daemons’ config files in terms of performance tuning.
In terms of raw insert speed, MySQL already leads the pack: Inserting a million records runs at 4678 INSERTs per second, while PostgreSQL manages 2720 per second. I had to wrap the whole lot of inserts in a BEGIN/COMMIT for PostgreSQL, otherwise it only ran at 311 records per second. Disk usage is 114 and 173 MB for the two, respectively.
Next the indexes: Indexes on the PK was 11.75 and 21.45 MB, respectively. I tried adding an index on the MSISDN (the subscriber number), which was 8.33 and 30.1 MB, respectively.
Ultimately, MySQL beat PostgreSQL in all the areas that counted: raw insert speed and search speed. For this application, PostgreSQL’s strengths (relational integrity, ACIDity, etc.) weren’t that important — getting data in and out of the database in the least amount of time was.
So, we went with MySQL after all. We did a few preliminary tests using MySQL’s InnoDB tables (which allows for transactions and relational integrity to be forced down), and then MySQL’s performance suffered similarly to PostgreSQL.
Moral of the story: use the tool that fits the job. MySQL has always been the “jock” of the database world: fast, strong, but stupid. It has a number of, well, unique idiosyncrasies (timestamp handling, weird JOIN results, default ordering, etc), but its really, really good at shuttling data to-and-from an application. PostgreSQL is the bright, academic, nerdy type: smart but unpopular. It is probably the best-spec’ed free RDBMS available, but those exact pros bring its biggest cons: extra overhead makes it slower. Depending on your application, you’ll have to choose which of these two features is the most important to you.

Can you post the scripts you used for inserting the data? I have a similar situation looming, but instead of 2 million, I have 140 million records.
For postgresql did you use INSERT statements or the COPY statement? If you used INSERT, was the statement prepared each time or was it prepared once and reused each time. How often did you COMMIT?
I’ve heard reports of postgresql getting 4000+ records per second, but it has to be done just right.
This thread in particular is worth a read:
http://archives.postgresql.org/pgsql-general/2006-12/msg01554.php
Thanks!
Sorry, but I’ve long since moved on from that company. It wasn’t that difficult to create; I just wrote a quick script that wrote out a few million insert lines with random (but valid!) sample data.
I’m back with some results which I will share for posterity.
We finely tuned our importing script (not the database) and we were able to squeeze anywhere between 15,000 and 25,000 records per second using a combination of the COPY statement and BEGIN/COMMIT every 10,000 records. Our data started out as CSV, so we did minimal processing on it to convert it to the form needed for input.
I imagine that the most recent version of PgSQL will do better since it can accept CSV data directly into COPY.
Faster disks (we’re currently using RAID10) and a more idle system would produce even greater results, but loading 140M records in 2-3 hours is quite acceptable.
Hey Joel
Thanks for coming back with that — seems like you are getting some decent performance out of it in any case!