[sqlite] Round-tripping SQLite back and forth between text representation.

Discussion:

Randall Smith

2018-07-10 00:52:34 UTC

I'm curious if there is some standard or normal way to convert a SQLite DB to a text representation, and then recreate the DB content from the text. Naively, this seems hard or impossible as a general problem, but perhaps I am missing something.

Thanks in advance for any advice or suggestions.

Randall.

J Decker

2018-07-10 00:53:51 UTC

Permalink

does it have to be text? There was serialization added to sqlite....
https://www.sqlite.org/c3ref/serialize.html

Post by Randall Smith
I'm curious if there is some standard or normal way to convert a SQLite DB
to a text representation, and then recreate the DB content from the text.
Naively, this seems hard or impossible as a general problem, but perhaps I
am missing something.
Thanks in advance for any advice or suggestions.
Randall.
_______________________________________________
sqlite-users mailing list
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Keith Medcalf

2018-07-10 01:01:10 UTC

Permalink

.dump in the command line shell?

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

-----Original Message-----
From: sqlite-users [mailto:sqlite-users-
Sent: Monday, 9 July, 2018 18:53
Subject: [sqlite] Round-tripping SQLite back and forth between text
representation.
I'm curious if there is some standard or normal way to convert a
SQLite DB to a text representation, and then recreate the DB content
from the text. Naively, this seems hard or impossible as a general
problem, but perhaps I am missing something.
Thanks in advance for any advice or suggestions.
Randall.
_______________________________________________
sqlite-users mailing list
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Simon Slavin

2018-07-10 01:05:49 UTC

Permalink

Randall Smith

2018-07-10 18:27:21 UTC

Permalink

Yep. It's done a lot, to convert a database to a different SQL engine, or rescue data from a corrupt database. See section 10 of

<https://sqlite.org/cli.html#converting_an_entire_database_to_an_ascii_text_file>

For more information read the documentation about the '.dump' and '.read' commands on the same page. Or ask here.

Simon.

Thanks, Simon, for the info.

One follow-up: Do you know if the dump output is "deterministic" over time? That is, if I diff two dumps taken at different times, will the unchanged material be in the same order and so on? Or is the ordering effectively random?

My underlying question is "can text-comparing two DB dumps be used to determine what has changed?"

Thanks again for any insights.

Randall Smith

R Smith

2018-07-10 19:10:24 UTC

Permalink

One follow-up: Do you know if the dump output is "deterministic" over
time? That is, if I diff two dumps taken at different times, will the
unchanged material be in the same order and so on? Or is the ordering
effectively random?
My underlying question is "can text-comparing two DB dumps be used to determine what has changed?"

I am not sure if it is 100% deterministic - it probably is, however, I
would like to point out that while parsing a dump (supposing it IS
deterministic) is possible, writing some code to check congruence
between two DBs at the business end of the SQLite API is significantly
better, much easier and always 100% deterministic.

Not only that, but the sqldiff command-line utility (download page) does
it already (though you may require a more specific result, but at a
minimum its a good start).

Is there perhaps a specific difficulty which makes you think that
parsing the dump would provide a better/easier insight into which data
changed?

Will Parsons

2018-07-11 00:12:50 UTC

Permalink

On Tuesday, 10 Jul 2018 2:27 PM -0400, Randall Smith wrote:
<snip>

Post by Randall Smith
My underlying question is "can text-comparing two DB dumps be used
to determine what has changed?"

I don't know if it will meet your needs, but I've written a script for
my own purposes to compare DB changes. Since it's fairly short, I
include it here.

--8<---------------cut here---------------start------------->8---
#!/bin/sh
# Compare two SQLite3 databases.
# If invoked under the name "tksql3diff", the diff is displayed graphically
# using tkdiff to display the differences.
#
# Last modified: 10-Jul-2018 Wm. Parsons

if [ $# -ne 2 ]
then
echo "Usage: `basename $0` <filename1> <filename2>"
exit
fi

if [ `basename $0` = "tksql3diff" ]
then
diff=tkdiff
else
diff=diff
fi

file1=$1
file2=$2

dump()
{
file=$1

# check that the file is readable
if [ ! -r $file ]
then
echo "cannot read $file" >&2
exit 1
fi

sql="select 'Application ID:'; pragma application_id;
select 'User version:'; pragma user_version;"
tables=`sqlite3 $file <<EOF
select name from sqlite_master where type='table' order by name;
EOF
`
for t in $tables
do
sql="$sql select '$t:'; select * from $t;"
done

echo "$sql" | sqlite3 $file > $file.dump
}

dump $file1
dump $file2

$diff $file1.dump $file2.dump

rm $file1.dump $file2.dump
--8<---------------cut here---------------end--------------->8---

Note: This *probably* doesn't play well if you have BLOB fields in
your database.

--
Will

Randall Smith

2018-07-11 17:01:24 UTC

Permalink

On 2018/07/11 Simon wrote:

I am not sure if it is 100% deterministic - it probably is, however, I would like to point out that while parsing a dump (supposing it IS deterministic) is possible, writing some code to check congruence between two DBs at the business end of the SQLite API is significantly better, much easier and always 100% deterministic.

Not only that, but the sqldiff command-line utility (download page) does it already (though you may require a more specific result, but at a minimum its a good start).

Is there perhaps a specific difficulty which makes you think that parsing the dump would provide a better/easier insight into which data changed?

Fundamental problems with SQLite or other binary representations are (a) the information represented is opaque unless one spends time and money creating bespoke tools to allow viewing and technical reviews of the content, and (b) there is no simple way to allow concurrent development of info by several people and to reconcile independent changes into a coherent whole ("merging"). These are both mission critical for a team effort of any size (even size=2!). The software industry has historically avoided these problems by storing everything in the form of text files, and has developed elaborate tools and procedures for viewing, reviewing, storing, and merging information in this form and as a result large teams can collaborate on a rapidly evolving body of digital information easily and well.

Binary file formats like SQLite, while having many compelling advantages, have a hard time penetrating into areas where multiple people need to collaborate on an evolving body of information because of the limitations described above. IMO this is an urgent problem and one that has not been solved very well for SQLite. I don't have the wherewithal to solve it generally, but I am trying to see if there are ways to bridge the gap between SQLite DBs and existing team-capable workflows built around text files.

My wishlist is:

(o) Allow humans to view the contents of a DB without custom tools.
(o) Have a way to see what has changed between V1 and V2 of a database, e.g., for a "change review."
(o) Have a way to merge two independent sets of database changes into a single result in an understandable way.
(o) Have a way to make changes (update, insert, delete) to the DB data in a pinch without specialized tools.

I'm thinking the dump approach you described previously has promise provided certain criteria are met. Interestingly, the text representation produced by dump is about the same size as the "normal" binary form, and it will compress to about 1/8 the size of the binary form. So it's not a bad archival format.

Randall.

Simon Slavin

2018-07-11 17:26:37 UTC

Permalink

Post by Randall Smith
(o) Allow humans to view the contents of a DB without custom tools.
(o) Have a way to see what has changed between V1 and V2 of a database, e.g., for a "change review."

SQL is based around Ted Codd's view of relational databases. One of the fundamentals of this view is that a table of rows has no inherent order. Rows of data in one table are like pebbles in a bag, not dots on a line. Or, if you prefer computing terms, they're a set, not an array.

However a text file does have an inherent order. So any tool that coverts a SQL database to a text file must do something arbitrary: pick an order. The conclusion is that if you're comparing two databases you need to compare them while they're databases, not when you're looking at them as text files.

So if your first point above comes down to "view the database as text" then the above two points conflict with one-another. You need to do that first point and the second point as two separate procedures, not do step 1 then use the output of that for step 2.

As a more direct answer to an earlier question, the ".dump" command of the CLI does dump the data in a predictable and consistent order. But it is not documented to do so. So a future version can change that.

Simon.

Richard Hipp

2018-07-11 17:41:22 UTC

Permalink

Post by Randall Smith
(o) Allow humans to view the contents of a DB without custom tools.

SQLite database file are binary. That is a necessity in any format
that needs to store binary data. On the other hand, the SQLite
database file format is carefully and fully documented
(https://www.sqlite.org/fileformat2.html) and there have been
multiple, independent implementations of readers and writers for that
file format. SQLite databases are one of only three formats (the
others being JSON and CSV) recommended by the US Library of Congress
for archival storage of datasets.
(https://www.sqlite.org/locrsf.html). The SQLite database library is
the second mostly widely deployed bit of software in the world -
second only to zlib - so the tools needed to read SQLite are probably
already available on your system. SQLite is baked into every Mac and
Windows machine. SQLite is not a thoroughly baked into Linux
machines, but it is still pretty common.

Text files are also opaque binaries in the sense that they are stored
using a binary encoding on a disk drive or SSD. They seem less opaque
because you have tools easily at hand (a filesystem and "cat") to
access them. The point is this: Tools to access SQLite are also
widely available. Perhaps not quite as widely as "cat", but nearly
so.

"Opaque" vs. "non-opaque" is not a binary property of data files. It
is a question of degree. A text file might seem less opaque than a
database, but that depends to some extent on the text that it
contains. Try reading the HTML for a typical website. Or trying
reading the XML that is at the core of a Word document or Power-Point
presentation. Those files are all text, but they seem pretty opaque
to me.

Post by Randall Smith
(o) Have a way to see what has changed between V1 and V2 of a database,
e.g., for a "change review."

The "sqldiff" utility program will do this for you. Just as with the
unix "diff" command, the "sqldiff" shows you (in human-readable form)
the difference between two SQLite database files. The output takes
the form of SQL statements that will transform the first file into the
second.

Post by Randall Smith
(o) Have a way to merge two independent sets of database changes into a
single result in an understandable way.

The sqldiff command will do this. If you have a baseline database B,
and two separate derivative databases D1 and D2, you can merge those
changes together by computing the differences in B->D1 and applying
those changes to D2. Or compute the differences from B->D2 and apply
those changes to D1. As with "patch" or "diff3", there is the
possibility of merge conflicts, but you a clean merge surprisingly
often.

Post by Randall Smith
(o) Have a way to make changes (update, insert, delete) to the DB data in a
pinch without specialized tools.

I guess it all comes down to how you define "specialized". At some
point, tools become sufficiently ubiquitous and common-place that they
cease to be specialized. The SQLite command-line shell may have
reached that threshold. If not, it is certainly close. SQLite is
certainly not obscure or esoteric. It comes installed by default on
just about every computer you can purchase today.

--
D. Richard Hipp
***@sqlite.org

dmp

2018-07-12 16:42:49 UTC

Permalink

Post by Randall Smith
(o) Allow humans to view the contents of a DB without custom tools.

If what is meant here is a generic tool that opens/views any particular
file format, db context here, then there are tools including
the generic db gui that I have been working on for years.

Post by Randall Smith
(o) Have a way to see what has changed between V1 and V2 of a database,
e.g., for a "change review."
(o) Have a way to merge two independent sets of database changes into
a single result in an understandable way.

This has already been answered, .dump diff and sqldiff. I use a dump
in my interface which I used with diff to compare changes in my
personal expense database. This was to insure changes introduced in work
on the interface were not screwing things up. Very helpful to insure
your not introducing bugs.

Post by Randall Smith
(o) Have a way to make changes (update, insert, delete) to the DB data

in a pinch without specialized tools.

My generic db gui will do all these. Once more it really is a plugin
framework so it is very easy to introduce your own code to extend its
behavior.

Seems parts of the wishlist could be provided more effectly by client/server
db rather than SQLite. My personal general expenses database is not
SQLite, but a client/server db. Just an old box back in the corner. Why,
because I use the same db server with my dad's, in his 90s, expenses which
we both can add, edit, search, review, and aggregate accounts at the
end of the year.

danap.

Richard Hipp

2018-07-12 17:18:42 UTC

Permalink

Post by dmp
I use a dump
in my interface which I used with diff to compare changes in my
personal expense database. This was to insure changes introduced in work
on the interface were not screwing things up. Very helpful to insure
your not introducing bugs.

I am glad that has been working for you. But there is a caveat: The
".dump" format can (and does) change slightly from one release of
SQLite to the next. So you are welcomed to continue using ".dump"
this way, but just be careful that you do not compare the .dump output
from two different versions of SQLite.

--
D. Richard Hipp
***@sqlite.org