Perl's DBI: Bulk inserts and the impact of latency

Intro: why database latency matters

Database transactions are normally fast when the client is connected to a local or low-latency SQL server. This analysis focuses on the performance of bulk inserts under varying levels of round-trip time (RTT) latency.

How fast is the DBI itself?

To get a baseline for how fast the DBI itself can perform, we'll start with a very basic script that creates a SQLite in-memory database and inserts 500 rows.

Code: SQLite 500 inserts
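
The original listing isn't reproduced here, but a minimal sketch of the idea, assuming a simple one-table schema (the table name "test" and its "id"/"val" columns are placeholders), might look like this:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    # Connect to an in-memory SQLite database (it vanishes on disconnect).
    my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                           { RaiseError => 1, AutoCommit => 1 });

    $dbh->do('CREATE TABLE test (id INTEGER, val TEXT)');

    # Insert 500 rows, one execute() call per row.
    my $sth = $dbh->prepare('INSERT INTO test (id, val) VALUES (?, ?)');
    $sth->execute($_, "value $_") for 1 .. 500;

    $dbh->disconnect;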

This code runs in 0.05 s of wall-time for me, so time is hardly a factor. The DBI clearly performs well, and the total time to insert the rows is entirely reasonable.

How about no-latency client/server?

Let's try it on a remote database server on a lab network without adding any additional latency.

Code: PostgreSQL 500 inserts, network database
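
Again, a sketch rather than the original: the same per-row loop pointed at a networked PostgreSQL server through DBD::Pg (the host name, database, and credentials below are placeholders):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    # Connect to a PostgreSQL server on the lab network.
    my $dbh = DBI->connect('dbi:Pg:dbname=testdb;host=dbserver.lab',
                           'testuser', 'testpass',
                           { RaiseError => 1, AutoCommit => 1 });

    $dbh->do('CREATE TABLE IF NOT EXISTS test (id INTEGER, val TEXT)');

    # Still one INSERT (and one network round-trip) per row.
    my $sth = $dbh->prepare('INSERT INTO test (id, val) VALUES (?, ?)');
    $sth->execute($_, "value $_") for 1 .. 500;

    $dbh->disconnect;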

This runs in about 0.40 s of wall-time, which is still fairly negligible.

So what's wrong with a little latency?

Let's see what happens if the lab router between the client and server adds 40 ms of RTT latency (20 ms in each direction), as might be typical of a decent WAN connection to a remote facility.

We'll use the same code from above: PostgreSQL 500 inserts, network database

With the added latency, the same run now takes 21.55 s of wall-time, roughly 54 times as long. So what's going on to make a seemingly acceptable 40 ms of latency cost that much?

Defining the problem

The reason latency is such a big deal here is that the driver sends each row's INSERT individually and waits for the server's confirmation before sending the next row. With 500 rows and a 40 ms RTT, that's 500 × 40 ms = 20 s spent doing nothing but waiting on the network, which accounts for nearly all of the 21.55 s observed.

So latency is a problem. Can the DBI do better?

Surely there's a way to make it faster, right? Good news: there is. Bad news: the DBI won't do it for you. The DBI does allow drivers to insert records in bulk via a specific method call, but neither the MySQL nor the PostgreSQL driver takes advantage of this.

The DBI documentation indicates that execute_for_fetch() (which is used by execute_array()) can be implemented by drivers to collect statements and send them in bulk for more efficient execution. This is exactly what's needed here, but it's evidently not implemented by the two drivers I'm testing with.

Consider this code, which recreates the 500-row in-memory database, runs a SELECT over those rows, and feeds them to the remote server via execute_for_fetch(). We'll use the Time::HiRes module to time just the INSERT portion of the process:

Code: PostgreSQL execute_for_fetch
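
A sketch of that approach, reusing the placeholder schema and connection details from the earlier examples (the fetch subroutine handed to execute_for_fetch() follows the pattern the DBI documentation itself suggests):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;
    use Time::HiRes qw(gettimeofday tv_interval);

    # Source: rebuild the 500-row SQLite in-memory database.
    my $src = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                           { RaiseError => 1 });
    $src->do('CREATE TABLE test (id INTEGER, val TEXT)');
    my $ins = $src->prepare('INSERT INTO test (id, val) VALUES (?, ?)');
    $ins->execute($_, "value $_") for 1 .. 500;

    # Destination: the remote PostgreSQL server.
    my $dst = DBI->connect('dbi:Pg:dbname=testdb;host=dbserver.lab',
                           'testuser', 'testpass', { RaiseError => 1 });

    my $sel = $src->prepare('SELECT id, val FROM test');
    $sel->execute;

    my $sth = $dst->prepare('INSERT INTO test (id, val) VALUES (?, ?)');

    # Time just the INSERT portion: each fetched row becomes one execute().
    my $t0 = [gettimeofday];
    $sth->execute_for_fetch(sub { $sel->fetchrow_arrayref });
    printf "INSERTs took %.2f s\n", tv_interval($t0);

    $dst->disconnect;
    $src->disconnect;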

This takes 21.41 s to execute, no better than the direct INSERT method, so clearly the PostgreSQL driver is not aggregating the statements together as the DBI suggests drivers have the "option" of doing. The MySQL driver does no better here.

SQL SELECT is cheap, INSERT is expensive

With the Perl DBI, at least with the MySQL and PostgreSQL drivers, SELECT statements do not incur such a large performance hit, because all of the records are returned when the statement is executed.

Consider the following code that takes each of the 500 rows inserted earlier and dumps them to a local file:

Code: PostgreSQL select 500 rows, network database
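
A sketch of such a script, with the same placeholder connection details and an assumed output file name:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=testdb;host=dbserver.lab',
                           'testuser', 'testpass', { RaiseError => 1 });

    my $sth = $dbh->prepare('SELECT id, val FROM test');
    $sth->execute;

    # Dump each row, tab-separated, to a local file.
    open my $fh, '>', 'rows.txt' or die "rows.txt: $!";
    while (my $row = $sth->fetchrow_arrayref) {
        print {$fh} join("\t", @$row), "\n";
    }
    close $fh;

    $dbh->disconnect;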

With the 40 ms latency, the run time is only about a 600% increase over the no-latency run, mostly due to the extra round trips needed to connect, prepare, and execute the SELECT statement.

Solving the problem: real bulk inserts

Performing INSERTs in bulk

Since the DBI (or more specifically the DBD modules in use here) can't perform efficiently enough, we'll cook up an INSERT statement that does the job properly without waiting on 500 sequential RTT delays.

This requires preparing a statement where all 500 rows can be inserted at once. This should reduce the latency hit on the execute() call to a single round-trip.

Code: PostgreSQL execute bulk insert
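
A sketch of the technique, again with placeholder connection details; the key is a single prepared statement with one (?, ?) placeholder group per row, so execute() makes one round-trip:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;
    use Time::HiRes qw(gettimeofday tv_interval);

    my $dbh = DBI->connect('dbi:Pg:dbname=testdb;host=dbserver.lab',
                           'testuser', 'testpass', { RaiseError => 1 });

    # Flatten all 500 rows into one long list of bind values.
    my @bulk_values;
    push @bulk_values, $_, "value $_" for 1 .. 500;

    # Build: INSERT INTO test (id, val) VALUES (?, ?), (?, ?), ...
    my $placeholders = join ', ', ('(?, ?)') x 500;
    my $sth = $dbh->prepare("INSERT INTO test (id, val) VALUES $placeholders");

    # A single execute(), and so a single round-trip, for all 500 rows.
    my $t0 = [gettimeofday];
    $sth->execute(@bulk_values);
    printf "INSERTs took %.2f s\n", tv_interval($t0);

    $dbh->disconnect;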

Now the same 500 rows that took 21.41 s to insert earlier take only 0.20 s over the same 40 ms latency link. The full wall-time is 0.40 s, which includes connecting to the database, preparing the statement, and disconnecting.

Conclusion

There is a bit of a trade-off to the method used here, and this trade-off is likely why DBD driver modules don't implement this feature by default.

In order to build the @bulk_values array used in the final method, Perl must allocate memory for the entire set of values to be inserted. That isn't so bad here, because we only have 500 rows containing short strings. It would be a much bigger deal if the records held many megabytes of data, such as blobs of documents, media files, or similar "large data" objects.

The documentation for execute_for_fetch() specifically notes that the fetch subroutine is allowed to return the same array reference each time, and typically does. This means the values for each row must be cached somewhere before they can be sent. In the example above, that caching happens in memory, which would not be appropriate where large amounts of memory would be needed to store it all.

The method here works well for specific use-cases, but clearly isn't suited as a default for DBD drivers without careful understanding of where the delays occur. It would be foolish to assume network latency is always going to be worse than Perl's memory management for all workloads.

Future work

The method used here could be extended and made more generic. That would let users make informed choices about how to perform bulk inserts, taking into account the trade-offs listed above.

Some possibilities for such a generic solution might allow the user to control this balance by, for example, capping the number of rows per bulk statement or limiting the total amount of memory buffered before a statement is issued.
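
As a purely hypothetical illustration (bulk_insert() below is not part of the DBI; every name in it is made up), such a knob might cap the number of rows sent per statement:

    # Hypothetical helper: batches rows into multi-row INSERTs,
    # capped at $max_rows per statement to bound memory use.
    sub bulk_insert {
        my ($dbh, $table, $cols, $rows, $max_rows) = @_;
        my $collist = join ', ', @$cols;
        my $group   = '(' . join(', ', ('?') x @$cols) . ')';
        while (@$rows) {
            my @batch = splice @$rows, 0, $max_rows;
            my $sth = $dbh->prepare(
                "INSERT INTO $table ($collist) VALUES "
                . join ', ', ($group) x @batch);
            $sth->execute(map { @$_ } @batch);  # flatten row arrayrefs
        }
    }

    # e.g. bulk_insert($dbh, 'test', ['id', 'val'], \@rows, 100);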

Licensing of this text

This document is available under a Creative Commons Attribution 4.0 International license (CC BY 4.0). Licensing details

Code examples are available under a 3-clause BSD license. Licensing details