Discussion:
[sqlite] Bulk Insert in Sqlite3
Niti Agarwal
2017-02-07 05:36:17 UTC
Hi,
We are using Sqlite3 with Golang to do bulk insert.
Need to insert close to 10 Million records to sqlite3 in around 30 mins.

Currently I am saving 100 records per transaction with the settings below:

PRAGMA synchronous = NORMAL;
PRAGMA journal_mode = WAL;
PRAGMA auto_vacuum = FULL;

I read about SQLITE_MAX_SQL_LENGTH; is it better to tune this value?

Also, I read about the SQL bind feature, but I am not sure how to use it in Golang.
Is there any better way to do bulk insert?

Thanks,
Niti
Simon Slavin
2017-02-07 06:26:02 UTC
Post by Niti Agarwal
Need to insert close to 10 Million records to sqlite3 in around 30 mins.
This number of records requires so much space the temporary data will not fit inside a cache. Consider using a counter so that the transaction is ended and a new one begun every 1000 records. Or perhaps every 10000 records. Try both ways and see which is faster.
Post by Niti Agarwal
PRAGMA synchronous = NORMAL;
PRAGMA journal_mode = WAL;
PRAGMA auto_vacuum = FULL;
I read about SQLITE_MAX_SQL_LENGTH, Is it better to tune this value?
The value SQLITE_MAX_SQL_LENGTH is about the length in text of one SQL command. I doubt it has any impact on your problem. You should probably leave it as it is.

Simon.
Niti Agarwal
2017-02-07 06:56:26 UTC
Thanks for your reply. The length matters because I am appending 100 rows at a
time to one SQL statement, which is much faster than single-row INSERTs in a
for loop. I have copied the code below for reference; here the list size is 100.
Is there a better way to do this? I read about *bind* but am not sure how to do
it in Golang.

As shown below:

func StoreFileList(db *sql.DB, fileList []File) {
	sql_PathInfo := `
INSERT OR IGNORE INTO path_info(Id,FilePath) VALUES `
	sql_FileInfo := `
INSERT OR REPLACE INTO file_info(
	PathId,
	FileName,
	FileSize,
	IsDir,
	IsExported,
	Level,
	ModTime
) VALUES `

	valsPath := []interface{}{}
	valsFile := []interface{}{}
	for _, file := range fileList {
		sql_PathInfo += "(?,?),"
		sql_FileInfo += "((SELECT Id FROM path_info WHERE FilePath = ?),?,?,?,?,?,?),"
		valsPath = append(valsPath, nil, file.FilePath)
		valsFile = append(valsFile, file.FilePath, file.FileName, file.FileSize,
			file.IsDir, file.IsExported, file.Level, file.ModTime)
	}

	// trim the trailing comma from each statement
	sql_PathInfo = sql_PathInfo[:len(sql_PathInfo)-1]
	sql_FileInfo = sql_FileInfo[:len(sql_FileInfo)-1]

	PathInfoStmt, err := db.Prepare(sql_PathInfo)
	if err != nil {
		panic(err)
	}
	fileInfoStmt, err := db.Prepare(sql_FileInfo)
	if err != nil {
		panic(err)
	}
	defer PathInfoStmt.Close()
	defer fileInfoStmt.Close()

	tx, err := db.Begin()
	if err != nil {
		panic(err)
	}
	_, err = tx.Stmt(PathInfoStmt).Exec(valsPath...)
	_, err1 := tx.Stmt(fileInfoStmt).Exec(valsFile...)
	if err != nil || err1 != nil {
		fmt.Println("doing rollback")
		tx.Rollback()
		if err != nil {
			panic(err)
		}
		panic(err1)
	}
	tx.Commit()
}
Post by Simon Slavin
Post by Niti Agarwal
Need to insert close to 10 Million records to sqlite3 in around 30 mins.
This number of records requires so much space the temporary data will not
fit inside a cache. Consider using a counter so that the transaction is
ended and a new one begun every 1000 records. Or perhaps every 10000
records. Try both ways and see which is faster.
Post by Niti Agarwal
Currently I am saving 100 Records under one transaction with below
PRAGMA synchronous = NORMAL;
PRAGMA journal_mode = WAL;
PRAGMA auto_vacuum = FULL;
I read about SQLITE_MAX_SQL_LENGTH, Is it better to tune this value?
The value SQLITE_MAX_SQL_LENGTH is about the length in text of one SQL
command. I doubt it has any impact on your problem. You should probably
leave it as it is.
Simon.
_______________________________________________
sqlite-users mailing list
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Simon Slavin
2017-02-07 07:11:36 UTC
Post by Niti Agarwal
Thanks for your reply. The length matters as I am appending 100 rows at a
time in a sql statement. It is making very fast as compared to single sql
insert in For loop.
Copied the code below for reference. Here the list size is 100
Any better way to do this? Like I read about *bind*...not sure how I can do
it in Golang.
Okay. By using an INSERT command with lots of value sets you are doing things more efficiently than I thought. Each INSERT statement is its own transaction, and with 100 value sets per statement you are effectively inserting 100 rows per transaction.

I am not familiar with GoLang. Can someone say if it’s appropriate to use the two functions

PathInfoStmt, err := db.Prepare(sql_PathInfo)
err = tx.Stmt(PathInfoStmt).Exec(valsPath…)

like that? I would expect Prepare to go with Step instead, but I could understand if the library being used makes it okay.

Simon.
Rowan Worth
2017-02-07 07:35:36 UTC
Post by Simon Slavin
Post by Niti Agarwal
Thanks for your reply. The length matters as I am appending 100 rows at a
time in a sql statement. It is making very fast as compared to single sql
insert in For loop.
Copied the code below for reference. Here the list size is 100
Any better way to do this? Like I read about *bind*...not sure how I can
do
Post by Niti Agarwal
it in Golang.
Okay. By using an INSERT command with lots of value sets you are doing
things more efficiently than I thought. Each INSERT is its own transaction
so you are doing 100 INSERTs per transaction.
It would be a lot simpler, though, to move the db.Begin() outside the for
loop and execute multiple INSERT statements within the loop.
Post by Simon Slavin
I am not familiar with GoLang. Can someone say if it’s appropriate to use
the two functions
PathInfoStmt, err := db.Prepare(sql_PathInfo)
err = tx.Stmt(PathInfoStmt).Exec(valsPath…)
like that ? I would expect Prepare to go with Step instead but I could
understand if the library being used makes it okay.
Yes, that's OK. These are not SQLite-specific bindings; Go takes an ODBC-like
approach where a standard interface [1] is used to connect to various
database engines.

[1] https://golang.org/pkg/database/sql/

Exec() is designed for INSERT/UPDATEs where you're not asking the DB for
information. The read equivalent is Query() which returns a sql.Rows
structure that you iterate over using Rows.Next().

-Rowan
Rowan Worth
2017-02-07 06:42:13 UTC
Hi Niti,

There's no need to build a giant SQL string; a transaction can span
multiple statements. To bind in Golang, place a ? within your SQL query and
provide the values as additional arguments to the Exec/Query function. E.g.,
after using db.Begin() to create a transaction:

	tx, err := db.Begin()
	if err != nil {
		return err
	}
	_, err = tx.Exec("INSERT INTO table1 VALUES (?, ?, ?)", column1, column2, column3)
	if err != nil {
		tx.Rollback()
		return err
	}
	err = tx.Commit()
	if err != nil {
		return err
	}

Note that golang's sql transaction abstraction doesn't map perfectly to
sqlite. Golang does not allow any further operations on the Tx following a
call to Tx.Commit() or Tx.Rollback(). But in sqlite a transaction remains
open if COMMIT fails because the database is locked. If you want to be able
to retry the COMMIT in this situation you must manually manage transactions
via db.Exec("BEGIN")/db.Exec("COMMIT") instead of db.Begin()/tx.Commit().

-Rowan
Post by Niti Agarwal
Hi,
We are using Sqlite3 with Golang to do bulk insert.
Need to insert close to 10 Million records to sqlite3 in around 30 mins.
PRAGMA synchronous = NORMAL;
PRAGMA journal_mode = WAL;
PRAGMA auto_vacuum = FULL;
I read about SQLITE_MAX_SQL_LENGTH, Is it better to tune this value?
Also, read about sql bind feature, but not very sure how to do in Golang?
Is there any better way to do bulk insert?
Thanks,
Niti
_______________________________________________
sqlite-users mailing list
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
James K. Lowden
2017-02-07 16:56:18 UTC
On Tue, 7 Feb 2017 14:42:13 +0800
Post by Rowan Worth
Note that golang's sql transaction abstraction doesn't map perfectly
to sqlite. Golang does not allow any further operations on the Tx
following a call to Tx.Commit() or Tx.Rollback(). But in sqlite a
transaction remains open if COMMIT fails because the database is
locked.
That suggests a failure of imagination in the Go SQLite driver.

Most DBMSs return only fatal errors for COMMIT (or success, of
course). There is no such thing as "try again".

SQLite does not behave that way by default, but can be made to do so.
If a busy handler is installed that never returns zero, the only time
SQLite returns SQLITE_BUSY is when it determines the transaction cannot
be completed i.e., that there's a deadlock.

ISTM the Go driver should supply a default busy handler that reduces
COMMIT errors to fatal ones. It might expose to the application knobs
to change the retry interval. Of course, if the application can
replace the busy handler, it also has the option of removing it, and
restoring the default behavior.

--jkl
Warren Young
2017-02-08 16:35:08 UTC
Post by Niti Agarwal
I read about SQLITE_MAX_SQL_LENGTH,
If this is why you’re making many transactions, there’s no requirement that all of the SQL that’s part of a single transaction be in a single SQL string given to the DB. You can execute a bare “BEGIN TRANSACTION” SQL statement, do your 10 million separate insert statements, and then execute a COMMIT statement, at which time all of the rows will be visible to other users of that DB.

This is simpler, still fast, and doesn’t require that you do all that manual batching-up, or worry about internal SQLite buffer sizes.
Post by Niti Agarwal
Also, read about sql bind feature, but not very sure how to do in Golang?
There are a bunch of SQLite wrappers for Go:

http://go-lang.cat-v.org/library-bindings

Which one are you using?

The first one on that list appears to be MIA, but the second one includes this module, which includes a Bind() function:

https://github.com/kuroneko/gosqlite3/blob/master/query_parameter.go

Seems straightforward to me.
