Discussion:
[sqlite] json question
rene
2018-03-27 06:46:27 UTC
Permalink
Hello,

I have a big JSON string in an SQLite in-memory database.
The JSON string also contains arrays, like:

"arrayname" : [
{
"name" : "unique_name",
"id" : 1,
...
},
..
]

Currently I create my own index on "name" and "id" and get the values with:

json_extract(db.json,'$.arrayname[index].name')
json_extract(db.json,'$.arrayname[index].id')
json_extract(db.json,'$.arrayname[index]....')

Question 1: Is it possible to create an index on "name" and "id" inside
SQLite?

Question 2: Can I speed up the reading of the values?


Thank you
René




Jens Alfke
2018-03-27 16:06:20 UTC
Permalink
Post by rene
Question 1: Is it possible to create an index on "name" and "id" inside
sqlite
Not really, because SQLite indexes only support values with a 1:1 relationship to table rows; in other words, the index can have only one entry per table row. But each of your rows can have any number of name/id pairs.
Post by rene
Perhaps use FTS4 or FTS5 with a custom tokenizer that understands JSON?
This has promise, since the FTS module does support indexing multiple values (words) per table row. I haven’t tried it yet, though.

The approach I’ve used in the past is to create a derived table containing every name/value pair from the main table, and indexing that. The main problem with that is you have to keep that table up to date as changes are made to the main table. If I were going to re-implement this today, I’d probably use triggers for that.
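A minimal sketch of that trigger-based approach (table, column and trigger names are illustrative, and the main table is assumed to hold one JSON document per row):

-- Main table: one JSON document per row (assumed layout).
CREATE TABLE docs (
  doc_id INTEGER PRIMARY KEY,
  body   TEXT NOT NULL
);

-- Derived lookup table: one row per name/id pair in each document.
CREATE TABLE doc_index (
  doc_id INTEGER NOT NULL,
  name   TEXT,
  id     INTEGER
);
CREATE INDEX doc_index_name ON doc_index(name);
CREATE INDEX doc_index_id   ON doc_index(id);

-- Keep the derived table in sync automatically.
CREATE TRIGGER docs_ai AFTER INSERT ON docs BEGIN
  INSERT INTO doc_index(doc_id, name, id)
    SELECT new.doc_id,
           json_extract(value, '$.name'),
           json_extract(value, '$.id')
      FROM json_each(new.body, '$.arrayname');
END;

CREATE TRIGGER docs_ad AFTER DELETE ON docs BEGIN
  DELETE FROM doc_index WHERE doc_id = old.doc_id;
END;

-- An UPDATE trigger would do the DELETE followed by the same INSERT ... SELECT.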

—Jens
Warren Young
2018-03-28 00:37:35 UTC
Permalink
Post by rene
"arrayname" : [
{
"name" : "unique_name",
"id" : 1,
...
},
..
]
That looks like

CREATE TABLE arrayname (
  Id   INTEGER NOT NULL PRIMARY KEY,
  name TEXT NOT NULL,
  etc  INTEGER...
);

to me.

The point of JSON support in SQLite, IMHO, is that it allows for hierarchical data structures, something that is not otherwise easy to do in a relational, table-based DBMS like SQLite.

But, you’ve just got tables of flat records, exactly what SQLite normally stores, so why not just leverage the power it already gives you?

In other words, do the JSON parsing in the process that populates the SQLite DB; don’t make SQLite parse the JSON repeatedly for you.
Post by rene
Question 1: Is it possible to create an index on "name" and "id" inside
sqlite
It is if you normalize the data on ingest as I suggest above.
Post by rene
Question 2: Can I speed up the reading of the values.
Normalized, parsed, and indexed data should query far faster than repeatedly-parsed JSON strings.
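For instance, once the data is normalized as above, ordinary indexes and point lookups apply (the UNIQUE constraint assumes the names really are unique):

CREATE UNIQUE INDEX arrayname_name ON arrayname(name);

-- Indexed lookups instead of re-parsing JSON on every query:
SELECT Id   FROM arrayname WHERE name = 'unique_name';
SELECT name FROM arrayname WHERE Id = 1;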
rene
2018-03-28 09:28:12 UTC
Permalink
Hello,

Thank you for the suggestions so far.
My main goal is currently to reduce the startup time.

I have done some tests and compared the results to a pure C++ parser
(nlohmann::json).

1. parsing time is nearly the same between "nlohmann::json::parse(..)" and
"insert into data values(json(..))"

2. Creating the indexes takes longer than the parsing, in both C++ and SQLite.
Maybe this can be improved on the SQLite side. For each array I have created
an SQLite table (arrayname) and filled it with:

insert into arrayname
select key, json_extract(value,'$.name'), json_extract(value,'$.id')
from data, json_each(data.json,'$.arrayname')

The necessary indexes are created after the insert statement.
The database is ":memory:" and all commands run inside a transaction.
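Roughly, the whole sequence looks like this (the arrayname columns and index names are simplified placeholders):

BEGIN;
CREATE TABLE data (json TEXT);
INSERT INTO data VALUES (json(:document));  -- whole JSON document, bound as a parameter

CREATE TABLE arrayname (key INTEGER, name TEXT, id INTEGER);
INSERT INTO arrayname
  SELECT key, json_extract(value, '$.name'), json_extract(value, '$.id')
    FROM data, json_each(data.json, '$.arrayname');

-- Indexes only after the bulk insert:
CREATE INDEX arrayname_name ON arrayname(name);
CREATE INDEX arrayname_id   ON arrayname(id);
COMMIT;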
Are there other options or improvements possible?


Thank you
Rene




Warren Young
2018-03-28 10:09:20 UTC
Permalink
Post by rene
My main goal is currently to reduce the startup time.
That’s not what I understood from your first post. Are you moving the goalposts, or is indexed read of stored data not actually the primary problem?
Post by rene
1. parsing time is nearly the same between "nlohmann::json::parse(..)" and
"insert into data values(json(..))"
That’s good to know, but not too surprising. It just means one particular C parser and one particular C++ parser happen to be about the same speed.
Post by rene
insert into arrayname select
key,json_extract(value,'$.name'),json_extract(value,'$.id') from data,
json_each(data.json,'$.arrayname')
I’m suggesting that you don’t use SQLite’s JSON features at all. Use this other C++ JSON parser you have, then construct INSERT queries for each row from the parsed JSON data.

Also, be sure to use prepared queries and bound parameters for something like this. Don’t rebuild the SQL query each time:

https://www3.sqlite.org/c3ref/bind_blob.html

Not only will it be faster, it’s also safer.
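For instance, a per-row insert is prepared once with placeholders and then re-executed with fresh bindings for each element (the column names here are just an assumption about your schema):

-- Prepared once; the application binds :key, :name and :id
-- (e.g. via the sqlite3_bind_*() family) and re-steps the statement per array element.
INSERT INTO arrayname(key, name, id) VALUES (:key, :name, :id);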
Post by rene
Necessary indexes are created after the insert statement.
Good.
Post by rene
The database is ":memory:" and all commands are inside an transaction.
Also good.
rene
2018-03-28 11:51:11 UTC
Permalink
Hello,
author="Warren Young"
Post by rene
My main goal is currently to reduce the startup time.
That’s not what I understood from your first post. Are you moving the
goalposts, or is indexed read of >stored data not actually the primary
problem?

Both are related. I need a fast startup time, and the access should be
"reasonably" fast.
Post by Warren Young
Post by rene
1. parsing time is nearly the same between "nlohmann::json::parse(..)" and
"insert into data values(json(..))"
That’s good to know, but not too surprising. It just means one particular
C parser and one particular C++ parser happen to be about the same speed.

Yup.
Post by Warren Young
Post by rene
insert into arrayname
select key, json_extract(value,'$.name'), json_extract(value,'$.id')
from data, json_each(data.json,'$.arrayname')
I’m suggesting that you don’t use SQLite’s JSON features at all. Use this
other C++ JSON parser you have, then construct INSERT queries for each row
from the parsed JSON data.

Oh no. Either I use the C++ parser or I use SQLite. The C++ parser has no
knobs to tune, so it is a fixed first option. Now I'm trying to be faster
with SQLite :) With this option I also have more flexibility in selecting data.
Post by Warren Young
Also, be sure to use prepared queries and bound parameters for something
like this. Don’t rebuild the SQL query each time:
https://www3.sqlite.org/c3ref/bind_blob.html
Not only will it be faster, it’s also safer.
I have done this.
Post by Warren Young
Post by rene
Necessary indexes are created after the insert statement.
Good.
Post by rene
The database is ":memory:" and all commands are inside an transaction.
Also good.
I have done some more experiments with json_tree(), but they are not
faster.
It would be really cool if the json_extract() function could support a
wildcard as the array index and return all values, e.g.

json_extract(data.json,'$.arrayname[*].name')

Oh, regarding json_extract(): if I do an "explain query .." on it, it always
does a table scan.
Is there a way to get values directly from the JSON? At least json_tree()
returns some ids; maybe a function could take such an id as an argument.
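For comparison, the closest I can get today is to go through json_each() again, as in the insert above (a sketch against my data table):

-- Rough equivalent of the wished-for $.arrayname[*].name:
SELECT json_extract(value, '$.name') AS name,
       json_extract(value, '$.id')   AS id
  FROM data, json_each(data.json, '$.arrayname');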


Thank you
Rene




Warren Young
2018-03-28 12:14:27 UTC
Permalink
Post by rene
Post by Warren Young
I’m suggesting that you don’t use SQLite’s JSON features at all.
Oh no. Either I use the C++ parser or I use sqlite.
I’m suggesting that you do both: Parse the JSON with C++, then insert it in normalized SQL form into SQLite, then query the data with pure ANSI SQL-92.

If you absolutely need JSON querying at the SQLite level, you can split the difference:

CREATE TABLE arrayname (
  id   INTEGER NOT NULL PRIMARY KEY,
  name TEXT NOT NULL,
  json TEXT NOT NULL DEFAULT ''
);

Parse the id and name column values from the JSON text, then insert the JSON text and the parsed values into the table. Now you can key off the id and/or name, so you needn’t do a full table scan even when querying fields of a specific JSON object, as long as you can address it by id and/or name.
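For example (a sketch; the JSON path and the extra index are illustrative):

-- Optional: an index on name turns the lookup into a point query.
CREATE INDEX arrayname_name ON arrayname(name);

-- Find the row by its parsed name, then extract from just that one JSON value:
SELECT json_extract(json, '$.some.field')
  FROM arrayname
 WHERE name = 'unique_name';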
Post by rene
The C++ parser has no knobs to tune.
Why do you need knobs? What specific benefit do you get from these knobs?
Post by rene
With this option I have more options in selecting data.
That sounds like a distinction without a difference. The SQL query language is very powerful as-is, without adding SQLite’s JSON querying features.

I’m not against SQLite’s JSON feature, for what it’s worth; I’m just not seeing what it actually buys you.

If you think I’m wrong, show some example code giving a query you can do in JSON that you cannot do efficiently in plain SQL. I suspect if you do this, someone will quickly come along and show you how to get the same speed or faster in plain old ANSI SQL.

Also show your SQL schema and a more representative sample of your JSON input data. It may simply be that you have an understanding of both that I do not because I’m left to guess. Getting concrete might help a lot.
Post by rene
It would be really cool if the json_extract() function could support a
wildcard as array index and return all values p.e.
json_extract(data.json,'$.arrayname[*].name')
If that really is the way you need to query your data, then I think you want something made from the start to be a JSON DBMS, not this feature of SQLite’s.
Post by rene
Oh, regarding json_extract(). If I do a "explain query .." on it, it always
do a table scan.
I’d have been amazed if it had behaved any differently.

One would design SQLite differently if arbitrary JSON were a first-class data type. Since the binary data format didn’t change when this feature was added, we can infer that arbitrary JSON is not a first-class data type; the JSON is just stored as strings that have to be re-parsed over and over to get anything useful done.
rene
2018-03-28 12:34:48 UTC
Permalink
Hello Warren,
author="Warren Young"
Post by rene
Post by Warren Young
I’m suggesting that you don’t use SQLite’s JSON features at all.
Oh no. Either I use the C++ parser or I use sqlite.
I’m suggesting that you do both: Parse the JSON with C++, then insert it in
normalized SQL form into >SQLite, then query the data with pure ANSI SQL-92.
CREATE TABLE arrayname (
id INTEGER NOT NULL PRIMARY KEY,
name TEXT NOT NULL,
json TEXT NOT NULL DEFAULT ‘'
);
Parse the id and name column values from the JSON text, then insert the
JSON text and the parsed >values into the table. Now you can key off off
the id and/or name, so you needn’t do a full table scan >even when querying
fields of a specific JSON object, if you can address it by ID and/or name.

Yes, you are right. I have an offline-translated SQLite database for
testing, and it's fast. I would love to use it; I'm not inclined to use JSON.
But people here have decided to use JSON, and I'm stuck with it.

I was only trying to get faster than the C++ parser we already use. One way
was to use SQLite's JSON parser.
If I can find a way to convert all the JSON into SQL tables in time, fine.
Post by rene
The C++ parser has no knobs to tune.
Why do you need knobs? What specific benefit do you get from these knobs?
To make it faster in "my" use case :)


Thank you for your answer.
Rene



R Smith
2018-03-28 17:42:23 UTC
Permalink
Post by rene
And I would love to use it. I'm not inclined to use json.
But people here have decided to use json. And I'm stuck with it.
Ah, a shotgun-marriage to a wrongful ideal.  Are you working at
Micro$oft by any chance? :)
Post by rene
I was only trying to get faster then the already used C++ parser. One way
was to use sqlite's json parser.
I can vouch that all of the SQLite code is made with great
attention to efficiency, but even so, there is no real evidence or
reason why it should be any faster than whatever other native-code JSON
parser you are using.  Even with the knobs, I would be surprised if you
find a knob setting that significantly increases the SQLite parser speed
over what you have. (And if you do, we'd like to know please!)

I'm sorry to hear you are stuck with it. Perhaps your JSON data has a
definite, specific format that is always the same? Perhaps the same
fields are always present? If so, you can write a better parser than anyone
else, so long as those rules are adhered to.

Also, you may not be able to NOT use the JSON data, but are you allowed
to add other columns (perhaps indexed ones) that could at least make
some queries faster? Short of that, you have no real hope of more speed.
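One possibility along those lines is an index on an expression (a sketch, assuming a hypothetical docs table with one JSON object per row, and SQLite 3.9+ for expression indexes):

-- Hypothetical layout: one JSON object per row.
CREATE TABLE docs (json TEXT NOT NULL);

-- Index the extracted field itself; no extra column needed.
CREATE INDEX docs_name ON docs(json_extract(json, '$.name'));

-- The planner can use the index when the query expression matches the indexed one:
SELECT json FROM docs WHERE json_extract(json, '$.name') = 'unique_name';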

Good luck!
Ryan
Jens Alfke
2018-03-28 19:57:40 UTC
Permalink
Post by rene
But people here have decided to use json. And I'm stuck with it.
I was only trying to get faster than the C++ parser we already use. One way
was to use SQLite's JSON parser.
That's not going to help. Inserting the JSON into a SQL database, and then using SQL to query the database and parse the JSON it finds there, is never going to be as fast as simply parsing the JSON directly.

If you want your operation to run as fast as possible, use a C/C++ JSON parser and avoid allocating memory. I've done a lot of JSON work over the years, and I find that what takes most of the time isn't the actual parsing, it's building an in-memory object tree of the parsed data. So try to avoid that; in XML terminology, use a SAX (streaming) parser instead of a DOM parser.

—Jens
Keith Medcalf
2018-03-28 01:26:48 UTC
Permalink
Post by Warren Young
The point of JSON support in SQLite, IMHO, is that it allows for
hierarchical data structures, not something that is easy to do in a
relational table-based DBMS like SQLite otherwise.
Actually, it is very simple and straightforward to implement hierarchical structures using Relational Database Models. What is difficult, however, is implementing the simple and straightforward navigation of a hierarchical database structure -- well, actually it is not really difficult, just very inefficient. Particularly in the case of some network-extended hierarchical database models.

It is a case of choosing the right tool for the job -- the choice of using a screwdriver or a hammer depends on whether the job is to drive a screw or drive a nail.

It is interesting to note that it is fairly straightforward and efficient to implement a Relational model over top of a hierarchical database (in fact, there used to be many of these implementations back when RDBMS was a newfangled idea), just as one can use a hammer to drive in a screw. Using the screwdriver to drive the nail is somewhat more inefficient (though you can do it if you try hard enough).

NB: There are "Relational" database systems where supported datatypes can be arrays, or even arrays of structures or arrays of arrays of ... . Not many though. NOMAD comes to mind, but that was a long time ago ...

https://en.wikipedia.org/wiki/Nomad_software

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.
Warren Young
2018-03-28 02:32:25 UTC
Permalink
Post by Keith Medcalf
Post by Warren Young
The point of JSON support in SQLite, IMHO, is that it allows for
hierarchical data structures, not something that is easy to do in a
relational table-based DBMS like SQLite otherwise.
Actually, it is very simple and straightforward to implement hierarchical structures using Relational Database Models.
I knew someone was going to pick that nit.

I was not thinking of regular hierarchical data structures with that comment, where you can use standard data normalization techniques to encode the hierarchy.[1] E.g. A Person may have multiple Addresses, and each Address may have multiple Phone Numbers, etc. That’s a hierarchical data structure, but not the sort I mean here.

I was thinking instead about arbitrary, irregular hierarchies, such as the sort of thing that XML and its dialects are good at expressing, and which can then be queried much more readily with XPath than with SQL.

If there are straightforward rules for transforming a given hierarchical data structure into flat tables with the hierarchy expressed as table relationships, then from my prior post, you can correctly guess that I recommend that you do that when using SQLite rather than use its JSON feature for this purpose.


[1]: https://en.wikipedia.org/wiki/Database_normalization
Ian Zimmerman
2018-03-28 03:22:00 UTC
Permalink
Post by Keith Medcalf
Actually, it is very simple and straightforward to implement
hierarchical structures using Relational Database Models.
But not recursive structures (or to be more precise, recursive queries)
which are the next very natural step.

You can have a "parent" database and you can use it to answer queries
like "all X, Y where X is a parent of Y", "all X, Y where X is a
grandparent of Y", "all X, Y where X is a great-grandparent of Y" etc.
All that with a single table. But no amount of SQL mastery will allow
you to answer "all X, Y where X is an ancestor of Y".
J Decker
2018-03-28 23:31:19 UTC
Permalink
Post by Ian Zimmerman
Post by Keith Medcalf
Actually, it is very simple and straightforward to implement
hierarchical structures using Relational Database Models.
But not recursive structures (or to be more precise, recursive queries)
which are the next very natural step.
You can have a "parent" database and you can use it to answer queries
like "all X, Y where X is a parent of Y", "all X, Y where X is a
grandparent of Y", "all X, Y where X is a great-grandparent of Y" etc.
All that with a single table. But no amount of SQL mastery will allow
you to answer "all X, Y where X is an ancestor of Y".
That's where recursive CTE queries come in... something like:

create table jsonMap(node_id, parent_id, fieldName, value);

WITH RECURSIVE temp AS (
  -- anchor: start from the children of the chosen root
  -- (:root is a placeholder for the starting node_id)
  SELECT * FROM jsonMap
  WHERE parent_id = :root

  UNION ALL

  -- recursive step: add the children of everything found so far
  SELECT om.* FROM jsonMap om
  INNER JOIN temp x ON om.parent_id = x.node_id
)
SELECT * FROM temp;
Keith Medcalf
2018-03-28 04:04:20 UTC
Permalink
Recursive common table expressions became part of the SQL standard in 1999.
They have been available in SQLite3 since 2014-02-03 (release 3.8.3).

https://www.sqlite.org/lang_with.html

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.