SQLite Transactions (Rust)

·

4 min read

A transaction is a sequence of actions on data items

Transactions help prevent problems that could arise, such as data durability when a program crushes or in the event of an unexpected power failure or even during complex concurrent programming procedures etc. (These restrictions/guarantees is known as ACID, more info below)

Programs can start a transaction and execute operations as part of the transaction. But for the transactions changes to occur the transaction must be committed. To commit simply means to instruct the database to permanently update its state according to the operations contained within the transaction.

Transactions can be considered as logical units of work for a database system. If a transaction fails the database must remove it's effects from the database and revert back to the state the database was in before the transaction occurred.

No only are transactions units of work that move the database state forward, transactions are also a database abstraction with the following guarantees (aka ACID):

  • Atomic: All operations within a transaction should succeed, if even a single operation fails all other operation will not be considered and the transaction fails to be committed.

  • Consistency: A transaction mutates a consistent database state to a consistent state and a transaction must be deterministic.

  • Isolation: All operations of each transaction happen 'together' instantaneously.

  • Durability: Effects of successful transactions must become a part of the database.

To get a greater view of Transactions let us see them at work, using Rust and the Rusqlite crate:

use rusqlite::{params, Connection, Result};

/// A helper function for connecting the database
fn connect_db() -> Result<Connection> {
    let conn = Connection::open("/tmp/TEST_DB.db")?;

    conn.execute(
        "CREATE TABLE IF NOT EXISTS vals(
            v  INTEGER NOT NULL
        )",
        [],
    )?;

    Ok(conn)
}

/// A slow way to insert rows
fn slow_insert(conn: &Connection) -> Result<()> {
    for count in 1..=1000 {
        conn.execute("INSERT INTO vals (v) VALUES (?1)", params![count])?;
    }

    Ok(())
}

/// A fast way to insert rows
fn fast_insert(conn: &mut Connection) -> Result<()> {
    let tx = conn.transaction()?;

    for count in 0..1000 {
        tx.execute("INSERT INTO vals (v) VALUES (?1)", params![count])?;
    }
    tx.commit()?;
    Ok(())
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_slow_insert() {
        let conn = connect_db().unwrap();
        slow_insert(&conn).unwrap();
    }

    // #[test]
    // fn test_fast_insert() {
    //     let mut conn = connect_db().unwrap();

    //     fast_insert(&mut conn).unwrap();
    // }
}

In the above code we try out two ways to insert 1000 rows in an SQLite database. The code has three function:

  • fn connect_db() -> Result<Connection>: A helper function for connecting the database
  • fn slow_insert(conn: &Connection) -> Result<()>" A slow way to insert rows.
  • fn fast_insert(conn: &mut Connection) -> Result<()>: A fast way to insert rows

The code also has two test function test_slow_insert and test_fast_insert the later is commented out because we only want to test the slow one first by running:

$ cargo test

We see that it is quite slow, the test on my machine output:

running 1 test
test tests::test_slow_insert has been running for over 60 seconds
test tests::test_slow_insert ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 165.22s

   Doc-tests st

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

Let us try out the fast version by commenting out the test_slow_insert unit test function and uncommenting the test_fast_insert unit test function. Then after we run cargo test we get

running 1 test
test tests::test_fast_insert ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.18s

   Doc-tests st

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

This time the test completes instantly. The slow version took 165.22s to complete while the fast version took 0.18s to complete!

So why is the first one slow, specifically why is this slow:

/// A slow way to insert rows
fn slow_insert(conn: &Connection) -> Result<()> {
    for count in 1..=1000 {
        conn.execute("INSERT INTO vals (v) VALUES (?1)", params![count])?;
    }

    Ok(())
}

It is slow because it uses the connection's execute method which results in a new transaction being created to insert each and every row. This might be acceptable if the database was held in memory but in this case the database is being held on the filesystem on the spinning disk drive.

Interactions with the filesystem are slow, usually several syscalls have to be called. For example for durability reasons (a key requirement for ACID) databases often make use of the fsync system call to make sure OS buffered data is actually flushed to disk and is not lost in RAM in case of a unexpected power failure. All of this means creating 1000 transactions and committing them is very slow, it is much better to batch the database operations on a single transaction and only committing them once like this:

/// A fast way to insert rows
fn fast_insert(conn: &mut Connection) -> Result<()> {
    let tx = conn.transaction()?;

    for count in 0..1000 {
        tx.execute("INSERT INTO vals (v) VALUES (?1)", params![count])?;
    }
    tx.commit()?;
    Ok(())
}