modernc.org/sqlite with Go

I had long been aware of cznic's effort to automatically translate the C source of SQLite to Go, avoiding the complexity CGo would introduce. SQLite was always meant to be a database you embed into your application, and I was curious what tradeoffs that meant, if any, and how it generally differed from the usual suspects (PostgreSQL, etc.). To explore these trade-offs and differences, I decided to dive deeper into the usage of this library.

SQLite is a single-writer, multiple-reader database. A single instance of database/sql.DB does not map well onto this model: sql.DB is a connection pool, so multiple connections can attempt writes at the same time, and concurrent writers produce SQLITE_BUSY errors. So we separate readers from writers by creating two instances of database/sql.DB, limiting the writer pool to a single connection with SetMaxOpenConns(1) and letting the reader pool scale to as many connections as you require.

So far, SQLite is behaving remarkably like other databases apart from the single-writer gotcha; we still have connections and per-connection settings. The usual way to pass these settings is in the DSN you pass to sql.Open, and this is definitely possible, but I find it more straightforward to just run my custom SQL whenever a new connection is made[1]. We can do that with sqlite.RegisterConnectionHook.
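For completeness, here is a sketch of the DSN route: modernc.org/sqlite accepts repeated `_pragma=name(value)` query parameters in the DSN. `buildDSN` is a hypothetical helper (not part of the library), and the pragma values here are illustrative:

```go
package main

import (
	"fmt"
	"strings"
)

// buildDSN encodes per-connection PRAGMAs into a modernc.org/sqlite DSN
// using its _pragma=name(value) query-parameter convention.
func buildDSN(path string, pragmas []string) string {
	params := make([]string, 0, len(pragmas))
	for _, p := range pragmas {
		params = append(params, "_pragma="+p)
	}
	return fmt.Sprintf("file:%s?%s", path, strings.Join(params, "&"))
}

func main() {
	dsn := buildDSN("app.db", []string{
		"journal_mode(WAL)",
		"busy_timeout(5000)",
	})
	fmt.Println(dsn)
	// file:app.db?_pragma=journal_mode(WAL)&_pragma=busy_timeout(5000)
}
```

The downside, and the reason I prefer the connection hook, is that a multi-line init script is much easier to read than a long query string.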

Now the question is what settings to use, because there are quite a few of them. These are the ones I am using personally:

  • PRAGMA journal_mode = WAL; Use write-ahead logging. Better concurrency, but with an edge case: under sustained heavy writes, the WAL file can grow indefinitely. I don't expect this to happen in my use-case. For a good description of the various journal modes, see this blogpost.
  • PRAGMA synchronous = NORMAL; In WAL mode this is safe (a power failure can lose the most recent transactions, but cannot corrupt the database), so there is no reason to use FULL.
  • PRAGMA temp_store = MEMORY; Let's store temporary indices, tables, etc. in memory; my use-case is not so resource-constrained that it requires using the filesystem (also, this is a fun read if we are talking about the cost of additional syscalls to the kernel).
  • PRAGMA mmap_size = 30000000000; -- 30GB Making sure access to my database is always done with memory-mapped I/O; the limit just has to exceed the database size.
  • PRAGMA busy_timeout = 5000; How many milliseconds to wait for a locked database before giving up and returning SQLITE_BUSY.
  • PRAGMA automatic_index = true; Make sure automatic indexing is enabled; it can only help, and the warnings it logs are a great way to find out about indexes that should be created.
  • PRAGMA foreign_keys = ON; Make sure foreign key enforcement is enabled.
  • PRAGMA analysis_limit = 1000; Make sure ANALYZE returns quickly in case the DB gets large, by limiting how many rows it examines per index.
  • PRAGMA trusted_schema = OFF; This makes sure that only functions and virtual tables marked as safe for untrusted schemas can be invoked from within the schema itself. The manual page tells me to turn it off anyway, so let's do that.

Not all of these settings are per-connection settings, but it doesn't hurt to always run all of these on every connection, and having a single "Init SQL" makes things simpler.

Let's deal with explicit transactions next. The default transaction mode is DEFERRED, meaning the first statement determines what type of transaction is started (if a SELECT statement comes first, a read transaction is started, with the possibility of upgrading it to a write transaction depending on the statements that follow). With our setup, there should never be multiple concurrent writers, so making all transactions write transactions from the get-go (with BEGIN IMMEDIATE) costs nothing and immediately surfaces any violation of this invariant.

Putting this all together, this is the code we arrive at:

package main

import (
	"bytes"
	"context"
	"crypto/sha1"
	"database/sql"
	"database/sql/driver"
	"encoding/hex"
	"fmt"
	"time"

	"modernc.org/sqlite"
)

type DB struct {
	dbPath string

	Read  *sql.DB
	Write *sql.DB
}

const initSQL = `
PRAGMA journal_mode = WAL;
PRAGMA synchronous = NORMAL;
PRAGMA temp_store = MEMORY;
PRAGMA mmap_size = 30000000000; -- 30GB
PRAGMA busy_timeout = 5000;
PRAGMA automatic_index = true;
PRAGMA foreign_keys = ON;
PRAGMA analysis_limit = 1000;
PRAGMA trusted_schema = OFF;
`

func New(ctx context.Context, dbPath string) (db *DB, err error) {
	db = &DB{
		dbPath: dbPath,
	}

	// make sure every opened connection has the settings we expect
	sqlite.RegisterConnectionHook(func(conn sqlite.ExecQuerierContext, _ string) error {
		_, err := conn.ExecContext(ctx, initSQL, nil)
		return err
	})

	write, err := sql.Open("sqlite", "file:"+db.dbPath)
	if err != nil {
		return
	}

	// only a single writer ever, no concurrency
	write.SetMaxOpenConns(1)
	write.SetConnMaxIdleTime(time.Minute)

	read, err := sql.Open("sqlite", "file:"+db.dbPath)
	if err != nil {
		return
	}

	// readers can be concurrent
	read.SetMaxOpenConns(100)
	read.SetConnMaxIdleTime(time.Minute)

	db.Read = read
	db.Write = write

	return
}

func (db *DB) InTransaction(ctx context.Context, tx func(context.Context, *sql.Conn) error) (err error) {
	// cancel anything that takes longer than 5 seconds
	ctx, cancel := context.WithTimeout(ctx, 5*time.Second)
	defer cancel()

	// separate conn that's just for us,
	// so that the transaction's queries are executed together
	conn, err := db.Write.Conn(ctx)
	if err != nil {
		return
	}
	defer conn.Close()

	_, err = conn.ExecContext(ctx, "BEGIN IMMEDIATE TRANSACTION")
	if err != nil {
		return
	}

	defer func() {
		r := recover()
		if r != nil || err != nil {
			// roll back on panic or error, without masking the original error
			if _, rbErr := conn.ExecContext(ctx, "ROLLBACK"); rbErr != nil && err == nil {
				err = rbErr
			}
		}
		if r != nil {
			panic(r)
		}
	}()

	err = tx(ctx, conn)
	if err != nil {
		return
	}

	_, err = conn.ExecContext(ctx, "COMMIT")
	return
}

Just as a small added bonus, it's also very easy to register our own Go functions that we can then call from SQL. In this example, we define a sha1 function, taking a single argument and always returning the same output for the same input (hence deterministic). For more information, see the docs.

func registerFuncs() {
	// let's add our own sha1 hash function usable from SQL
	sqlite.MustRegisterDeterministicScalarFunction(
		"sha1",
		1,
		func(ctx *sqlite.FunctionContext, args []driver.Value) (driver.Value, error) {
			var arg *bytes.Buffer
			switch argTyped := args[0].(type) {
			case string:
				arg = bytes.NewBuffer([]byte(argTyped))
			case []byte:
				arg = bytes.NewBuffer(argTyped)
			default:
				return nil, fmt.Errorf("expected argument to be a string, got: %T", argTyped)
			}
			// sha1.Sum returns a [20]byte, so slice it before hex-encoding
			h := sha1.Sum(arg.Bytes())
			return hex.EncodeToString(h[:]), nil
		},
	)
}

In conclusion, this solution works perfectly well for most usage scenarios out there where immense scale is neither expected nor needed. There are faster ways to use SQLite with Go (see the excellent cvilsmeier benchmarks), but this one is still plenty fast, and avoiding CGo makes it very convenient as well.

[1]: This is another thing that database/sql.DB being a connection pool makes harder.
