Discussion:
[sqlite] Fastest way to backup/copy database?
Rob Willett
2016-05-04 10:44:17 UTC
Permalink
Hi,

We think we know the answer to this, but we’ll ask the question
anyway.

We’re trying to backup a 10GB live running database
“as-fast-as-we-possibly-can” without stopping updates coming in. The
updates come every 2-3 mins, and write a chunk of data in. We can’t
really stop the database updates; well, we can, but we don’t want to.

1. We had a quick look to see if we could copy the sqlite file over in
the short interval between updates but sadly cp simply wasn’t fast
enough. We get around 3GB copied before an update happens, which
basically renders the cp useless.

2. If we use the command line sqlite <filename> .dump > <backupfilename>
it works, but it's very slow.

3. Using the Sqlite C API works but is also very slow.

4. We don’t have the option of an LVM snapshot as the file system is
in a Container <sigh>.

So is there any other method of doing a quick snapshot? Failing that,
our solution will be to stop any updates for the duration of the cp
command, and then restart the process afterwards. It's not the end of the
world, but it would have to be done out of normal working hours.

This is going to become a bigger problem for us as the database will
only get bigger so any advice welcomed.

Thanks

Rob
Dominique Devienne
2016-05-04 11:08:04 UTC
Permalink
On Wed, May 4, 2016 at 12:44 PM, Rob Willett
Post by Rob Willett
We’re trying to backup a 10GB live running database
[...]
Post by Rob Willett
2. If we use the command line sqlite <filename> .dump > <backupfilename> it
works, but its very slow.
That's going to SQL text, while .backup is page-based and binary. But
it's not incremental in the shell, I believe.
Post by Rob Willett
3. Using the Sqlite C API works but is also very slow.
Are you talking about https://www.sqlite.org/backup.html ?
Because Example 2 is exactly your use case.
Post by Rob Willett
So is there any other method of doing a quick snapshot?
Well, https://www.sqlite.org/backup.html is the only supported way IMHO. --DD
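
For reference, Example 2 on that page boils down to an incremental loop along
these lines (a trimmed C sketch of the documented example, error handling cut
to the minimum; note, as comes up later in the thread, a write through a
*different* connection makes sqlite3_backup_step() start over):

#include <sqlite3.h>

/* Incremental online backup, modelled on Example 2 of
** https://www.sqlite.org/backup.html: copy a few pages at a time and
** sleep between steps so other connections can keep using the source. */
int backup_db(sqlite3 *pSrc, const char *zDestFile){
  sqlite3 *pDest;
  int rc = sqlite3_open(zDestFile, &pDest);
  if( rc==SQLITE_OK ){
    sqlite3_backup *pBackup = sqlite3_backup_init(pDest, "main", pSrc, "main");
    if( pBackup ){
      do{
        rc = sqlite3_backup_step(pBackup, 5);           /* copy 5 pages */
        if( rc==SQLITE_OK || rc==SQLITE_BUSY || rc==SQLITE_LOCKED ){
          sqlite3_sleep(250);                           /* let writers in */
        }
      }while( rc==SQLITE_OK || rc==SQLITE_BUSY || rc==SQLITE_LOCKED );
      sqlite3_backup_finish(pBackup);
    }
    rc = sqlite3_errcode(pDest);
  }
  sqlite3_close(pDest);
  return rc;
}
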
Rob Willett
2016-05-04 11:13:09 UTC
Permalink
Thanks for the reply,

Yes Example 2 in https://www.sqlite.org/backup.html is what we are
talking about. It was very slow to run for us.

Rob
Dominique Devienne
2016-05-04 11:22:22 UTC
Permalink
On Wed, May 4, 2016 at 1:13 PM, Rob Willett
Post by Rob Willett
Thanks for the reply,
Yes Example 2 in https://www.sqlite.org/backup.html is what we are talking
about. It was very slow to run for us.
Then maybe https://www.sqlite.org/rbu.html is your last chance.
Although I don't see how it could be faster than the Backup API.

I think you should share more details of how you use the Backup API,
so experts can provide advice on what to try to make it scale better. --DD
Dominique Devienne
2016-05-04 11:26:27 UTC
Permalink
Post by Dominique Devienne
On Wed, May 4, 2016 at 1:13 PM, Rob Willett
Post by Rob Willett
Thanks for the reply,
Yes Example 2 in https://www.sqlite.org/backup.html is what we are talking
about. It was very slow to run for us.
Then maybe https://www.sqlite.org/rbu.html is your last chance.
Unless you invest in writing your own VFS, which would then be fully
aware of page writes and could "tee" those to the actual file while also
queuing the written pages to be written (or sent over the network) to the
backup destination. But an SQLite VFS is tricky business... --DD
Dominique Devienne
2016-05-04 11:31:26 UTC
Permalink
Post by Dominique Devienne
Unless you invest in writing your own VFS [...]
Yours would be similar to
http://www.sqlite.org/src/doc/trunk/src/test_multiplex.c
from http://www.sqlite.org/vfs.html, but instead of splitting a file into
pieces it would duplicate the file somewhere.

Although I don't think a VFS is aware of transaction boundaries,
so you wouldn't know when the backup copy is in a "consistent" state,
unless you can interpret xUnlock and/or xSync as transactions,
but that depends on pragmas too. Not easy, in all cases IMHO. --DD
Rob Willett
2016-05-04 12:35:33 UTC
Permalink
Dominique,

We put together a quick C program to try out the C API a few weeks ago,
it worked but it was very slow, from memory not much different to the
sqlite command line backup system. We put it on the back burner as it
wasn’t anywhere near quick enough.

We hadn’t seen, found, or even looked for RBU; when I read this I
thought Realtime Backup Unit. I wasn’t even close :) Rats

I think that unless we can get the C API to back up in a time close to
that of a cp, the easiest solution is to suspend updates for 10-15 mins
out-of-hours and do a simple cp from there. Sometimes a change in
workflow might be the easiest and simplest solution.

I know you have mentioned VFS shims further down the email trail but
that's certainly a step too far for us.

Thanks

Rob
Simon Slavin
2016-05-04 13:24:00 UTC
Permalink
Post by Rob Willett
I think that unless we can get the C API to back up in a time close to that of a cp, the easiest solution is to suspend updates for 10-15 mins out-of-hours and do a simple cp from there. Sometimes a change in workflow might be the easiest and simplest solution.
This solution may be what you eventually come up with. But I would like to comment that I used a Unix platform to copy 15 Gigabytes through FireWire from an external hard disk yesterday and that took less than 10 minutes. Simply duplicating a 10 Gigabyte file onto your boot drive should take considerably less time.

You may be able to use built-in SQLite mechanisms to suspend updates. It might be worth using the SQLite shell tool to execute "BEGIN IMMEDIATE" and see whether that does suspend operations.

On the other hand there's a lot to be said for running a script at 3am which quits the update program, takes the snapshot, then starts them up again. Not only will this perform the task needed but it would also serve to 'reset' those updating programs in case they have a slow resource leak or some other long-term bug.

Simon.
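
A minimal C sketch of the BEGIN IMMEDIATE idea (the shell-tool version is just
the same statements typed interactively); it assumes rollback-journal mode,
and the cp command string is a stand-in for the real paths:

#include <sqlite3.h>
#include <stdlib.h>

/* Hold writers off with BEGIN IMMEDIATE, copy the file, then release
** the lock with ROLLBACK (nothing was changed). Readers carry on;
** writers queue up or get SQLITE_BUSY for the duration of the copy. */
int copy_while_write_locked(sqlite3 *db){
  int rc = sqlite3_exec(db, "BEGIN IMMEDIATE", 0, 0, 0);
  if( rc==SQLITE_OK ){
    /* placeholder copy command; substitute the real file names */
    if( system("cp live.db live.db.bak")!=0 ) rc = SQLITE_ERROR;
    sqlite3_exec(db, "ROLLBACK", 0, 0, 0);
  }
  return rc;
}
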
Rob Willett
2016-05-04 13:52:38 UTC
Permalink
Simon,

Thanks for the reply, we were a little surprised at the UNIX cp speed as
well. We investigated it with the provider in the past over other file
system speed issues and it turned out that they rate limit the IO ops,
so you can’t consume them all. Our provider runs many servers out of
their data centre and they want to make sure that one company (us!)
can’t monopolise all the resources. I can see their point of view and
since we are not a critical company we choose to pay an appropriate rate
for this. All systems have limitations one way or another. This is one
of theirs and is acceptable given the price bracket. Now if we were
paying £1,000 ($1,500)/week we would have had a very different
conversation :)

We’ll investigate the BEGIN IMMEDIATE and see what happens.

As you say though, a simple script that suspends a message queue whilst
a cp takes place and then simply turns the tap back on is very simple
to code and to see working. I like easy, simple solutions because I’m
simple at heart.

Best wishes,

Rob
R Smith
2016-05-04 17:22:38 UTC
Permalink
Post by Rob Willett
Dominque,
We put together a quick C program to try out the C API a few weeks
ago, it worked but it was very slow, from memory not much different to
the sqlite command line backup system. We put it on the back burner as
it wasn’t anywhere near quick enough.
You do realize that the backup API restarts the backup once the database
content changes, right? I'm sure that at the data rates and update
frequency you describe, the backup would never finish. The backup API is
quite fast if your destination file is on a not-too-slow drive, but you
will have to stop the incoming data to allow it to finish.

As an aside - you need a better provider, but that said, and if it was
me, I would get two sites up from two different providers, one live, one
stand-by, both the cheap sort so costs stay minimal (usually two cheap
ones are much cheaper than the next level beefy one). Feed all
updates/inserts to both sites - one then is the backup of the other, not
only data-wise, but also can easily be switched to by simple DNS
redirect should the first site/provider go down for any reason. The
second site can easily be interfered with / copied from / backed up /
whatever without affecting the service to the public.

I only do this with somewhat critical sites, but your use-case sounds
like it might benefit from it. My second choice would be to simply stop
operations at a best-case time-slot while the backup / copy completes.

Cheers,
Ryan
Rob Willett
2016-05-04 17:45:49 UTC
Permalink
Ryan,

Ahhhhh! The penny drops, we didn’t realise that with the backup API.
That explains a great deal. We must have missed that in the docs. Blast.

We’ve looked around for other providers in Europe and the cost
differences are very high. We need to be in the EU for various data
protection reasons. Until now we haven’t had any issues as we don’t
move a significant amount of data around in a very short period of time,
so the rate limited IO has not been a problem.

One of our options is to do what you suggest with a second database
server and run them hot/warm. We had already thought of that but not got
around to it as the setting up time is quite high (we need a bank of
servers, feeding things from one server to another), but our immediate
issue is simply copying the 10GB database. The downside of the second
server is moving 10GB data files around the internet afterwards back to
the failed server. Rebuilding from scratch is a pain as it takes around
2-3 weeks to rebuild the database from scratch as we have to process
every file again (circa 200,000) in order and each file takes around 4-8
secs to run.

I think the backup solution is the tried and tested Keep-It-Simple shell
script. We pause the queue upstream which stops the update process, do a
cp and then restart the queue again. All of this is doable in shell
script.

Rob
Dan Kennedy
2016-05-04 17:53:58 UTC
Permalink
Post by Rob Willett
Ryan,
Ahhhhh! The penny drops, we didn’t realise that with the backup API.
That explains a great deal. We must have missed that in the docs. Blast.
There is one exception to this:

If the database is written to via the same database handle that is being
used as the source db by the backup API, then the backup is not
restarted. In this case, if any pages that have already been transferred
to the backup db are modified, the new versions are written into the
backup db at the same time as the source is updated.

Dan.
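
A small C sketch of the exception Dan describes; the key assumption is that
the application can route its normal writes through the same handle the
backup reads from:

#include <sqlite3.h>

/* db is the SAME handle the application writes through; that is what
** stops sqlite3_backup_step() restarting when updates arrive, because
** changed pages are simply written into the backup as well.
** (Busy/locked retries omitted for brevity.) */
int backup_via_writer_handle(sqlite3 *db, sqlite3 *pDest){
  sqlite3_backup *p = sqlite3_backup_init(pDest, "main", db, "main");
  int rc = SQLITE_ERROR;
  if( p ){
    while( (rc = sqlite3_backup_step(p, 100))==SQLITE_OK ){
      /* the application keeps issuing its INSERTs/UPDATEs on db here */
    }
    sqlite3_backup_finish(p);
  }
  return rc==SQLITE_DONE ? SQLITE_OK : rc;
}
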
Rob Willett
2016-05-04 17:57:01 UTC
Permalink
Dan,

That's NOT the case for us, so that explains why things are slow. Mmm…
as I recall we never did get a backup to finish…. Now we know why :)

Rob
Clemens Ladisch
2016-05-04 11:58:59 UTC
Permalink
Post by Rob Willett
We’re trying to backup a 10GB live running database
“as-fast-as-we-possibly-can” without stopping updates coming in.
How much memory do you have? I guess you can't simply read the entire
database file to force it into the file cache?

In WAL mode, a writer does not block readers. You have to decide
whether you can live with its restrictions:
http://www.sqlite.org/wal.html


Regards,
Clemens
Rob Willett
2016-05-04 12:39:27 UTC
Permalink
Clemens,

We have 8GB of memory which is the most our VPS provider allows. We’d
like 32GB but it's not an option for us. Our desktops have more than
that, but the VPS provider is reasonably priced :)

We hadn’t considered WAL mode; my conclusion is that a simple
change to our workflow is actually simpler: we stop the database updates
for 15 mins out of hours, cp and then restart. It's not ideal but we’re
not running a nuclear power station or a 24/7 medical facility. Users
*may* not get traffic updates for 15 mins at 03:00 in the morning. The
world will keep spinning.

Rob
Rowan Worth
2016-05-06 03:17:15 UTC
Permalink
Post by Rob Willett
Clemens,
We have 8GB of memory which is the most our VPS provider allows. We’d like
32GB but its not an option for us. Our desktops have more than that, but
the VPS provider is reasonably priced :)
We hadn’t considered the WAL mode, my conclusion is that a simple change
to our workflow is actually simpler, we stop the database updates for 15
mins out of hours, cp and then restart. Its not ideal but we’re not running
a nuclear power station or a 24/7 medical facility. Users *may* not get
traffic updates for 15 mins at 03:00 in the morning. The world will keep
spinning.
Are users really forced to wait for a database update to receive traffic?
Assuming you run a BEGIN IMMEDIATE to prevent updates while the copy is in
progress -- which you must no matter how fast you can copy the database,
otherwise a transaction committing during the copy will likely result in a
corrupted backup -- connections that are reading the database can still
proceed.

Note that you can prevent the sqlite3_backup process from restarting when
the database changes by passing -1 as the number of pages to
sqlite3_backup_step ("copy all pages at once"). If you take this approach
you don't need BEGIN IMMEDIATE because sqlite will take care of the
locking, but I'm not sure how the i/o performance compares to cp or rsync.

-Rowan
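
And a sketch of the single-shot variant Rowan mentions; with -1 as the page
count the whole copy happens inside one sqlite3_backup_step() call (this
mirrors Example 1 of the backup documentation):

#include <sqlite3.h>

/* One-shot backup: with nPage == -1 the whole database is copied in a
** single sqlite3_backup_step() call, SQLite takes the locks it needs,
** and a concurrent writer cannot force the copy to restart. */
int backup_all_at_once(sqlite3 *pSrc, const char *zDestFile){
  sqlite3 *pDest;
  int rc = sqlite3_open(zDestFile, &pDest);
  if( rc==SQLITE_OK ){
    sqlite3_backup *p = sqlite3_backup_init(pDest, "main", pSrc, "main");
    if( p ){
      sqlite3_backup_step(p, -1);     /* "copy all pages at once" */
      sqlite3_backup_finish(p);
    }
    rc = sqlite3_errcode(pDest);
  }
  sqlite3_close(pDest);
  return rc;
}
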
Stadin, Benjamin
2016-05-07 11:32:50 UTC
Permalink
Hi Rob,

I think Clemens's suggestion may be worth investigating, in case you do not
want to stop the updates (which probably means a change in your workflow
and some effort in other places anyway).

I think this may work:
- Use WAL, and turn off automatic checkpointing
(https://www.sqlite.org/wal.html). The default behavior is to run a
checkpoint once the WAL holds about 1000 pages (1000 * 4096-byte pages,
which is roughly 4MB). Instead of using the default auto checkpoint,
create a checkpoint every now and then on your own in your code (e.g.
simply after every n-th commit, every 10 minutes, or whatever fits).
- Do *not* checkpoint at the time you copy your db, in order to avoid
changing the db while copying the file. Changes are written exclusively
to the WAL file during that window. It should take only reasonable effort
to trigger these events from the outside so the app knows when a backup
starts and stops - or it could be as simple as implementing a rule in the
checkpoint code like "don't make a checkpoint between 2:30am and 4:00am"
(a sketch follows below).

Regards,
Ben
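
A minimal C sketch of Ben's scheme; the backup_in_progress flag is an assumed
signal set by whatever runs the file copy, and per Adam's note further down it
makes sense to checkpoint once right before the copy starts:

#include <sqlite3.h>

/* Put the connection in WAL mode and disable automatic checkpoints, so
** the main database file only changes when the application says so. */
int configure_wal(sqlite3 *db){
  int rc = sqlite3_exec(db, "PRAGMA journal_mode=WAL", 0, 0, 0);
  if( rc==SQLITE_OK ){
    rc = sqlite3_exec(db, "PRAGMA wal_autocheckpoint=0", 0, 0, 0);
  }
  return rc;
}

/* Call on the application's own schedule (every n-th commit, every 10
** minutes, ...). backup_in_progress is an assumed flag set while the
** file copy runs; skipping the checkpoint then leaves the main file
** untouched, with new commits accumulating in the -wal file. */
int maybe_checkpoint(sqlite3 *db, int backup_in_progress){
  if( backup_in_progress ) return SQLITE_OK;
  return sqlite3_wal_checkpoint_v2(db, "main",
                                   SQLITE_CHECKPOINT_TRUNCATE, 0, 0);
}
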


Adam Devita
2016-05-09 14:08:25 UTC
Permalink
Re WAL mode trick.

I think you would want to complete a checkpoint and then do the backup,
ensuring that no checkpoints are done during your backup window. This way,
you know that the transactions committed prior to the backup are in the
file being backed up.

regards,
Adam

On Sat, May 7, 2016 at 7:32 AM, Stadin, Benjamin <
Post by Stadin, Benjamin
Hi Rob,
I think Clemens suggestion may be worth investigating, in case you do not
want to stop the updates (which probably means a change in your workflow
and some effort at other places anyways).
- Use WAL, and turn off automatic checkpointing
(https://www.sqlite.org/wal.html). The default behavior is to do a commit
after 1000*4096(pagesize) which is round about 4MB. Instead of using the
default auto checkpoint, create a checkpoint every now and then on your
own in your code (e.g. simply after every n-th commit, every 10 minutes,
or whatever fits).
- Do *not* do checkpointing at the time you copy your db, in order to
avoid changing the db while copying the file. Changes are written to WAL
files exclusively at this time. I think it needs just reasonable effort to
trigger these event from the outside to have the app know when a backup
starts and stops - or it could be done as simple as implement within the
checkpoint code a rule like „don’t make a checkpoint between 2:30am and
4:00am“.
Regards,
Ben
on behalf of Rob Willett" unter
Post by Rob Willett
Clemens,
We have 8GB of memory which is the most our VPS provider allows. We’d
like 32GB but its not an option for us. Our desktops have more than
that, but the VPS provider is reasonably priced :)
We hadn’t considered the WAL mode, my conclusion is that a simple
change to our workflow is actually simpler, we stop the database updates
for 15 mins out of hours, cp and then restart. Its not ideal but we’re
not running a nuclear power station or a 24/7 medical facility. Users
*may* not get traffic updates for 15 mins at 03:00 in the morning. The
world will keep spinning.
Rob
Post by Clemens Ladisch
Post by Rob Willett
We’re trying to backup a 10GB live running database
“as-fast-as-we-
possibly-can” without stopping updates coming in.
How much memory do you have? I guess you can't simply read the entire
database file to force it into the file cache?
In WAL mode, a writer does not block readers. You have to decide
http://www.sqlite.org/wal.html
Regards,
Clemens
_______________________________________________
sqlite-users mailing list
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
--
--------------
VerifEye Technologies Inc.
151 Whitehall Dr. Unit 2
Markham, ON
L3R 9T1
Eduardo Morras
2016-05-04 14:25:17 UTC
Permalink
If you only want the data, you can attach/open a new db file, create the schema without indexes, select all data from the tables and insert it into the new db's tables. Since you don't write the indexes, it should be faster. If you need the indexes, you can create them later.
Eduardo Morras
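
A sketch of Eduardo's idea in C; traffic_events is a made-up table name
(repeat per real table inside the one transaction), and note that
CREATE TABLE ... AS SELECT deliberately carries over no indexes or constraints:

#include <sqlite3.h>

/* Copy the data, but not the indexes, into a freshly attached database.
** Indexes can be re-created on the copy later, off the critical path. */
int copy_data_only(sqlite3 *db, const char *zDestFile){
  char *zSql = sqlite3_mprintf(
      "ATTACH DATABASE %Q AS bak;"
      "BEGIN;"
      "CREATE TABLE bak.traffic_events AS SELECT * FROM main.traffic_events;"
      /* ...one CREATE TABLE ... AS SELECT per table to copy... */
      "COMMIT;"
      "DETACH DATABASE bak;",
      zDestFile);
  int rc = sqlite3_exec(db, zSql, 0, 0, 0);
  sqlite3_free(zSql);
  return rc;
}
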
Scott Robison
2016-05-04 15:51:13 UTC
Permalink
Post by Rob Willett
This is going to become a bigger problem for us as the database will
only get bigger so any advice welcomed.
Perhaps, rather than backing up the live data, you create an append-only
log of each and every query you send to the database. Should you need to
restore, you replay the log of statements. Or, at the appointed backup time,
you replay the day's log of statements into another database. No need to
ever take the live database offline, at the cost of slightly longer-running
commands during the day to handle the append operation.
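
A tiny sketch of such a shim, in C for concreteness (Rob's app is Perl, so the
real version would live in their Perl DB layer); it assumes the caller passes
fully-expanded SQL text, since bound parameter values would otherwise not end
up in the log:

#include <sqlite3.h>
#include <stdio.h>

/* Append the statement to a plain-text replay log, then execute it.
** Replaying the log into an empty copy of the schema rebuilds the
** current state without ever touching the live file. */
int exec_and_log(sqlite3 *db, FILE *log, const char *zSql){
  fprintf(log, "%s;\n", zSql);   /* log first, so the statement can't be lost */
  fflush(log);
  return sqlite3_exec(db, zSql, 0, 0, 0);
}
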
Rob Willett
2016-05-04 17:47:26 UTC
Permalink
Scott,

That's an interesting idea. Is there an option in SQLite to do this for
us, or do we have to write a small shim in our app?

I like the idea of this as it's simple and elegant.

Rob
Scott Robison
2016-05-04 17:52:34 UTC
Permalink
Post by Rob Willett
Scott,
Thats an interesting idea. Is there an option in SQLite to do this for us,
or do we have to write a small shim in our app?
I like the idea of this as its simple and elegant.
It would require a little extra work on your part. Nothing built into the
system that would accomplish this directly. However, I've done similar
things and they don't involve a ton of overhead. You could use another
SQLite database as the append only log, or a simple text file.

I'm not aware of a free lunch solution, sadly.
--
Scott Robison
Rob Willett
2016-05-04 17:55:56 UTC
Permalink
Scott,

OK, we can see how to do this (I think). Our app is written in Perl and
we’d just need to capture the commands we write down. The only issue I
can think of is the prepare statement and making sure we capture the
right SQL command.

We’ll dig into it and have a look,

Thanks for taking the time to reply.

Rob
J Decker
2016-05-05 15:42:18 UTC
Permalink
Instead of cp, rsync might help; it is able to send delta changes.

Rob Willett
2016-05-05 15:50:27 UTC
Permalink
Hi,

We did look at this before, and discarded the idea but I can’t
remember why.

I’ve just looked again and seen the --inplace option, which I
wasn’t aware of. That *might* help and be an interesting solution. We
know we can make cp work, though with a little downtime. We’ll
investigate rsync --inplace on a closed (and definitely not working)
database, see what happens and report back. It should be easy to test.

Thanks for the information

Rob
Adam Devita
2016-05-05 16:26:41 UTC
Permalink
I use rsync to back up a 400MB sqlite db to a remote server. This is not
fast (which is a don't-care problem in my context). You may want to test
the 'log of SQL commands at the database' approach to get a replay-backup
remotely, as rsync handles text better than binary files (at least the
version I use does). Rsyncing that will save IO. I also have a process
where changed / new records are flagged and exported for backup. Rsync
works well for the little delta files, especially at identifying ones that
have already been transmitted. You will have to ensure that your automated
process warns you when the remote host decides to change its address or
PC and the sync fails because it is asking whether THIS host can be trusted.

regards,
Adam DeVita
Rob Willett
2016-05-05 16:38:29 UTC
Permalink
Mmmm…. Initial tests are not conclusive; it does look as if using
rsync --inplace does speed things up, but nowhere near as much as we
anticipated.

Testing consisted of a 6GB test database which is a backup copy from a
few months ago.

We timed copying the database over using cp

# time cp tfl.sqlite.backup t1

real 2m30.528s
user 0m0.052s
sys 0m10.403s

We then edited the database and deleted the contents of a table that
would have changed over the lifetime of the database. We will freely
admit we have no idea where this table is in the database file and have
no intention of finding out. That's SQLite's problem :) The table had
65,000 rows or so and would have been updated regularly at the start of
the database; over the last few months it would have had small daily
updates, but they would be getting fewer and fewer.

We then did

# time rsync -av --inplace --no-whole-file tfl.sqlite.backup test_db
sending incremental file list
tfl.sqlite.backup



sent 564,081,269 bytes received 623,255 bytes 4,805,995.95 bytes/sec
total size is 6,067,933,184 speedup is 10.75

real 1m57.689s
user 2m4.947s
sys 0m8.952s

The --no-whole-file option is apparently needed as well as --inplace.

We checked the md5sums at the end

# md5sum tfl.sqlite.backup test_db t1
b5bd91cc9b49ee1f54a8a2d013005586 tfl.sqlite.backup
b5bd91cc9b49ee1f54a8a2d013005586 test_db
b5bd91cc9b49ee1f54a8a2d013005586 t1

So we can see that the file integrity has been preserved, which is what
we expect, but it's always good to check. Yes, we know that md5sum is not
perfect, but for this purpose it's fine. However, the rsync vs cp time is
not as big a difference as we expected. It's still taking 80% of the time
of the cp.

Our first thoughts are that 80% is still better than 100%, so that's good;
our second thoughts are that we need to test this on a more
representative set of changes to the database. Deleting the contents of
a table that has been changed over the lifetime of the database may be
the worst-case scenario as it might well touch many, many pages in the
database. We are certainly not expert enough to comment on this
assumption of how rows are distributed in the database and if anybody
would like to chip in, please do so.

Our intention now is to take a more recent and representative database,
run a day's and a week's worth of database transactions through it, which
is easy enough though takes time, and see how that compares.

Thanks for the suggestion,

Rob,
J Decker
2016-05-05 16:42:51 UTC
Permalink
On Thu, May 5, 2016 at 9:38 AM, Rob Willett
[...]
We then did
# time rsync -av --inplace --no-whole-file tfl.sqlite.backup test_db
sending incremental file list
tfl.sqlite.backup
Was the destination file already in place, so rsync only sent changes?
Or was it a full copy of the whole thing every time? If the latter...
well, cp is really going to be as good as it gets....
Rob Willett
2016-05-05 16:45:12 UTC
Permalink
Yes, I realised after I sent the e-mail that I hadn’t said we had
created the destination file on which to run the command. You cannot
actually run rsync --inplace if the destination file doesn’t exist; we
found that out earlier ;) So it was a reasonable test, though only one
test. We need to do more to see what's going on.

Top marks for spotting our deliberate omission :)

Rob
Post by J Decker
On Thu, May 5, 2016 at 9:38 AM, Rob Willett
Mmmm… initial tests are not conclusive. It does look as if using rsync
--inplace does speed things up, but nowhere near as much as we anticipated.
Testing consisted of a 6GB test database which is a backup copy from a few
months ago.
We timed copying the database over using cp
# time cp tfl.sqlite.backup t1
real 2m30.528s
user 0m0.052s
sys 0m10.403s
We then edited the database and deleted the contents of a table that would
have changed over the lifetime of the database. We will freely admit we have
no idea where this table is in the database file and have no intention of
finding out. That’s SQLite’s problem :) The table had 65,000 rows or so; it
would have been updated regularly early in the database’s life, with small
daily updates over the last few months that were getting fewer and fewer.
We then did
# time rsync -av --inplace --no-whole-file tfl.sqlite.backup test_db
sending incremental file list
tfl.sqlite.backup
Was the file already in place, with rsync sending only the changes? Or is it a
full copy of the whole thing every time? If the latter... well, cp is
gonna really be as good as it gets....
sent 564,081,269 bytes  received 623,255 bytes  4,805,995.95 bytes/sec
total size is 6,067,933,184  speedup is 10.75
real 1m57.689s
user 2m4.947s
sys 0m8.952s
The --no-whole-file option is apparently needed as well as --inplace.
We checked the md5sums at the end
# md5sum tfl.sqlite.backup test_db t1
b5bd91cc9b49ee1f54a8a2d013005586 tfl.sqlite.backup
b5bd91cc9b49ee1f54a8a2d013005586 test_db
b5bd91cc9b49ee1f54a8a2d013005586 t1
So we can see that the file integrity has been preserved, which is what we
expect, but it’s always good to check. Yes, we know that md5sum is not perfect,
but for this purpose it’s fine. However, the difference between rsync and cp is
not as big as we expected: rsync still takes around 80% of the time of the cp.
Our first thought is that 80% is still better than 100%, so that’s good; our
second thought is that we need to test this on a more representative set
of changes to the database. Deleting the contents of a table that has been
changed over the lifetime of the database may be the worst-case scenario, as
it might well touch many, many pages in the database. We are certainly not
expert enough to comment on this assumption of how rows are distributed in
the database and if anybody would like to chip in, please do so.
Our intention now is to take a more recent and representative database, run
a day’s and a week’s worth of database transactions through it, which is easy
enough though it takes time, and see how that compares.
Thanks for the suggestion,
Rob,
Dominique Devienne
2016-05-04 18:38:28 UTC
Permalink
Post by Scott Robison
Post by Rob Willett
This is going to become a bigger problem for us as the database will
only get bigger so any advice welcomed.
Perhaps, rather than backing up the live data, you create an append only
log of each and every query you send to the database. Should you need to
restore, you replay the log of statements.
That sounds like WAL to me, except done manually.

In fact, if there were a hook during checkpointing that allowed you to grab
parts of the WAL file and apply those chunks (aligned with transactions or
savepoints) to another DB, you'd basically have online replication with
minimal overhead. --DD
R Smith
2016-05-04 18:52:20 UTC
Permalink
Post by Dominique Devienne
Post by Scott Robison
Post by Rob Willett
This is going to become a bigger problem for us as the database will
only get bigger so any advice welcomed.
Perhaps, rather than backing up the live data, you create an append only
log of each and every query you send to the database. Should you need to
restore, you replay the log of statements.
That sounds like WAL to me, except done manually.
In fact, if there were a hook during checkpointing that allowed you to grab
parts of the WAL file and apply those chunks (aligned with transactions or
savepoints) to another DB, you'd basically have online replication with
minimal overhead. --DD
Quite correct, but isn't that what the RBU extension basically achieves,
only in a well-managed and safe way?
I haven't used it yet, but am planning to...
Scott Robison
2016-05-04 19:24:31 UTC
Permalink
Post by Dominique Devienne
Post by Scott Robison
Post by Rob Willett
This is going to become a bigger problem for us as the database will
only get bigger so any advice welcomed.
Perhaps, rather than backing up the live data, you create an append only
log of each and every query you send to the database. Should you need to
restore, you replay the log of statements.
That sounds like WAL to me, except done manually.
I'm talking about logging the source statements that eventually update the
database. WAL is a way of keeping track of pages after the execution of
commands. The suggestion is to log text instead of database pages.
--
Scott Robison
Drago, William @ CSG - NARDA-MITEQ
2016-05-04 16:32:54 UTC
Permalink
Rob,

I have a continuously running application that simply writes the same data to two different databases; one in the primary location and the other in the backup location. This is obviously not a perfect solution (a man with two watches never really knows what time it is) but it's good enough for us and is simple.
--
Bill Drago
Staff Engineer
L3 Narda-MITEQ
435 Moreland Road
Hauppauge, NY 11788
631-272-5947 / ***@L-3COM.com
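A minimal Python sketch of this dual-write idea, purely for illustration; the database file names, the table and the helper functions are invented rather than taken from Bill's application:

    import sqlite3

    # Apply every statement to both the primary and the backup database,
    # then commit both.  All names here are placeholders.
    primary = sqlite3.connect("primary.db")
    backup = sqlite3.connect("backup-copy.db")

    def write_both(sql, params=()):
        primary.execute(sql, params)
        backup.execute(sql, params)

    def commit_both():
        # Not a true two-phase commit: if the process dies between these two
        # calls, the copies can briefly diverge (the "two watches" caveat).
        primary.commit()
        backup.commit()

    write_both("CREATE TABLE IF NOT EXISTS readings (payload TEXT)")
    write_both("INSERT INTO readings (payload) VALUES (?)", ("example row",))
    commit_both()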