Discussion:
[sqlite] dqlite - SQLite replication and failover library
Free Ekanayaka
2017-08-19 08:46:59 UTC
Permalink
Hi,

first of all let me thank Richard Hipp and the rest of the SQLite team
and community for such a great piece of software. I've been recently
working with SQLite's code base and found it an absolute pleasure to
read (and have learned from it too).

In this mail I'd like to:

1) Present dqlite, a library replicating your application's SQLite
database across N nodes and safely surviving any minority of them
dying or disconnecting (only for Go applications for now, see below).

2) Submit a patch to SQLite that introduces a minimal replication API,
and get feedback about its possible inclusion upstream.

= dqlite =

It's a Go package that uses the Raft algorithm to replicate SQLite WAL
frames across a cluster of nodes. This roughly means that you can open a
SQLite connection using the "database/sql" standard lib API and have
anything you transactionally replicated. No external process needed.

Ideally this library should have been written in C or Rust, to support
binding to any language. However, due to the use case and timeline of
the first project that will use it (LXD [0]), and due to the lack of
mature Raft implementations in C/Rust, Go was chosen instead. It should
hopefully at least serve as reference to anyone needing a C/Rust
version.

The work has been funded by Canonical, the company behind Ubuntu. Please
see the dqlite's home page [1] for more details.

= SQLite replication API patch =

This is the SQLite patch that dqlite depends on. It essentially adds a
few key hooks in the pager and write-ahead log to let external libraries
like dqlite implement WAL-based database replication.

As you'll quickly see, it's by no means ready for upstream inclusion, in
particular it lacks unit tests and more comprehensive documentation
comments (note however that virtually every code path introduced by the
patch is already exercised indirectly by dqlite's own unit tests).

If the SQLite team thinks there is room for upstream inclusion, I'll be
more than glad to do the necessary work to make the patch adhere to
SQLite's standards and go through a review process.

The patch has currently 703 additions and 22 deletions, and is published
on GitHub [2].

Cheers,

Free

[0] https://linuxcontainers.org/
[1] https://github.com/CanonicalLtd/dqlite
[2] https://github.com/CanonicalLtd/sqlite/commit/2a9aa8b056f37ae05f38835182a2856ffc95aee4
Wout Mertens
2017-08-20 17:50:24 UTC
Permalink
Very interesting!

So how does it behave during conflict situations? Raft selects a winning
WAL write and any others in flight are aborted?

And when not enough nodes are available, writes are hung until consensus?

I won't be able to use it due to Go but it's great to know that this is on
the horizon of possibilities… Very nice!
Post by Free Ekanayaka
Hi,
first of all let me thank Richard Hipp and the rest of the SQLite team
and community for such a great piece of software. I've been recently
working with SQLite's code base and found it an absolute pleasure to
read (and have learned from it too).
1) Present dqlite, a library replicating your application's SQLite
database across N nodes and safely surviving any minority of them
dying or disconnecting (only for Go applications for now, see below).
2) Submit a patch to SQLite that introduces a minimal replication API,
and get feedback about its possible inclusion upstream.
= dqlite =
It's a Go package that uses the Raft algorithm to replicate SQLite WAL
frames across a cluster of nodes. This roughly means that you can open a
SQLite connection using the "database/sql" standard lib API and have
anything you transactionally replicated. No external process needed.
Ideally this library should have been written in C or Rust, to support
binding to any language. However, due to the use case and timeline of
the first project that will use it (LXD [0]), and due to the lack of
mature Raft implementations in C/Rust, Go was chosen instead. It should
hopefully at least serve as reference to anyone needing a C/Rust
version.
The work has been funded by Canonical, the company behind Ubuntu. Please
see the dqlite's home page [1] for more details.
= SQLite replication API patch =
This is the SQLite patch that dqlite depends on. It essentially adds a
few key hooks in the pager and write-ahead log to let external libraries
like dqlite implement WAL-based database replication.
As you'll quickly see, it's by no means ready for upstream inclusion, in
particular it lacks unit tests and more comprehensive documentation
comments (note however that virtually every code path introduced by the
patch is already exercised indirectly by dqlite's own unit tests).
If the SQLite team thinks there is room for upstream inclusion, I'll be
more than glad to do the necessary work to make the patch adhere to
SQLite's standards and go through a review process.
The patch has currently 703 additions and 22 deletions, and is published
on GitHub [2].
Cheers,
Free
[0] https://linuxcontainers.org/
[1] https://github.com/CanonicalLtd/dqlite
[2]
https://github.com/CanonicalLtd/sqlite/commit/2a9aa8b056f37ae05f38835182a2856ffc95aee4
_______________________________________________
sqlite-users mailing list
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Free Ekanayaka
2017-08-20 21:44:44 UTC
Permalink
Post by Wout Mertens
Very interesting!
So how does it behave during conflict situations? Raft selects a winning
WAL write and any others in flight are aborted?
Ah yeah this is probably something that was not clear from the docs or
from my presentation.

There can't be a conflict situation. Raft's model is that only the
leader can append new log entries, which translated to dqlite means that
only the leader can write new WAL frames. So this means that any attempt
to perform a write transaction on a non-leader node will fail with a
SQLITE_NOT_LEADER error (and in this case clients are supposed to retry
against whoever is the new leader).

I'm going to add this to the FAQ.
Post by Wout Mertens
And when not enough nodes are available, writes are hung until
consensus?
Yes, but there's a (configurable timeout). It's not possible to *not*
have timeout (although you can set it really really high of course :)
Post by Wout Mertens
I won't be able to use it due to Go but it's great to know that this is on
the horizon of possibilities… Very nice!
Yeah I think Go is somehow limiting, but hopefully once Raft libraries
mature in C/Raft, dqlite can act as reference/prototype.
Free Ekanayaka
2017-08-20 22:07:27 UTC
Permalink
Post by Free Ekanayaka
Post by Wout Mertens
And when not enough nodes are available, writes are hung until consensus?
Yes, but there's a (configurable timeout).
BTW, this is a consequence of Raft sitting in the CP spectrum of the CAP
theorem: in case of a network partition it chooses consistency and
sacrifices availability.
Wout Mertens
2017-08-20 22:11:34 UTC
Permalink
Oh I see, of course. So I assume the client library automatically sends
write commands to the current leader?

I wonder if there is value in setting a preferred leader, but probably
that's messing too much with the Raft protocol.
Post by Free Ekanayaka
Post by Wout Mertens
Very interesting!
So how does it behave during conflict situations? Raft selects a winning
WAL write and any others in flight are aborted?
Ah yeah this is probably something that was not clear from the docs or
from my presentation.
There can't be a conflict situation. Raft's model is that only the
leader can append new log entries, which translated to dqlite means that
only the leader can write new WAL frames. So this means that any attempt
to perform a write transaction on a non-leader node will fail with a
SQLITE_NOT_LEADER error (and in this case clients are supposed to retry
against whoever is the new leader).
I'm going to add this to the FAQ.
Post by Wout Mertens
And when not enough nodes are available, writes are hung until consensus?
Yes, but there's a (configurable timeout). It's not possible to *not*
have timeout (although you can set it really really high of course :)
Post by Wout Mertens
I won't be able to use it due to Go but it's great to know that this is
on
Post by Wout Mertens
the horizon of possibilities… Very nice!
Yeah I think Go is somehow limiting, but hopefully once Raft libraries
mature in C/Raft, dqlite can act as reference/prototype.
Free Ekanayaka
2017-08-20 23:28:35 UTC
Permalink
Post by Wout Mertens
Oh I see, of course. So I assume the client library automatically sends
write commands to the current leader?
No, that's up the application for now, the library just returns you an
error if you attempt a write on a non-leader node.
Post by Wout Mertens
I wonder if there is value in setting a preferred leader, but probably
that's messing too much with the Raft protocol.
I'm not entirely sure to understand, but if you mean "if possible, I
generally would like the leader to be this node, please", no that's
currently not supported. I don't see a reason why it couldn't be added,
but it seems a kind of exotic requirement in today's "cats vs pets" way
of thinking to nodes.

Loading...