Discussion:
Windows I/O (was: Initial read speed greater than subsequent)
Keith Medcalf
2012-07-14 03:35:55 UTC
Windows is really atrociously bad at I/O. Windows has the same basic model of how to perform I/O as a six-year-old. Scratch that, the six-year-old could probably understand I/O better than whoever wrote/designed the crap in Windows that passes for I/O routines.

Anyway, make sure that you have all the fanciful magical features turned OFF (they make things slower, not faster). That is all the various "SpeedBooster" crap and so forth that Microsoft crappifies their OS with to give that "gee whiz" wonderful warm and fuzzy feeling to the mass consumer market.

Second, make sure you have turned off "Large System Cache". Force Windows to forgo the magic, forgo the dreadful cache design, and do I/O properly.

Disable "Magical Virtual Machine Resizing"-- set a fixed pagefile size -- or better yet get sufficient RAM and disable swapping altogether -- it is pure bull droppings that you need a pagefile that is a percentage of RAM size. If it works with 4GB of RAM and a 4GB swapfile, then it will work better with 8 GB of RAM and no pagefile.

Then increase the IOPageLockLimit to something reasonable.

And if your DASD driver supports it, enable block-level I/O optimization and/or caching.
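
For reference, on older (NT/2000/XP-era) Windows releases both of those settings live as DWORD values under the Memory Management registry key. A rough Python sketch to inspect them -- the key path and value names are assumptions from that era, and newer Windows versions may ignore IoPageLockLimit entirely:

# Sketch only: read the LargeSystemCache and IoPageLockLimit values.
# Key path and value names are assumptions based on older Windows releases.
import winreg

KEY = r"SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management"

with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY) as k:
    for name in ("LargeSystemCache", "IoPageLockLimit"):
        try:
            value, _ = winreg.QueryValueEx(k, name)
            print(f"{name} = {value}")
        except FileNotFoundError:
            print(f"{name} is not set")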

---
() ascii ribbon campaign against html e-mail
/\ www.asciiribbon.org
Udi Karni
2012-07-14 04:05:33 UTC
Thanks. More RAM would clearly be helpful - but first I need a bigger
machine that can take it. For some reason the "home" line of PCs is
typically capped at 16 GB or so. I'll need more of a workstation to go
higher and experiment with the settings you suggested.
Keith Medcalf
2012-07-14 04:16:02 UTC
I know the newer versions of Windows are fantastically bloated (and slower with every version), but what are you running that uses more than 16 GB of committed memory?
Udi Karni
2012-07-14 04:40:19 UTC
Experimenting with a data warehouse - which should really be run on a more
"mainstream" DB. SQLite was supposed to be just for piloting and testing -
but it's such an incredible little database engine - it's hard to let it go
- so I try big things on it just for kicks - delaying the inevitable.

It very easily handles billion row/100GB tables - multi-table joins, etc. -
it just chugs for a while because it's single threaded, and will gobble up
memory and swap - but it gets the job done.

It's quite amazing for a DB engine that's not even 1 MB.

While it's happiest when it can do all the work in memory rather than on disk
- it reads fairly quickly from disk the first time around - but not
subsequently - which is why I posed the question.

For now - my workaround is to attach and detach for every SQL statement -
but a better solution is probably to get a server with big RAM and tune
some of those OS settings - and ultimately, maybe MySQL?
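
A minimal sketch of that attach/detach-per-statement workaround, using
Python's sqlite3 module (the file, table, and column names here are invented
for illustration):

import sqlite3

con = sqlite3.connect(":memory:")  # scratch connection to attach against

def run_once(db_file, sql, params=()):
    """Attach the big database, run one statement, detach again."""
    con.execute("ATTACH DATABASE ? AS big", (db_file,))
    try:
        return con.execute(sql, params).fetchall()
    finally:
        con.execute("DETACH DATABASE big")

print(run_once("warehouse.db",
               "SELECT COUNT(*) FROM big.claims WHERE age = ?", (65,)))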
Kees Nuyt
2012-07-14 11:28:34 UTC
Post by Udi Karni
It very easily handles billion row/100GB tables - multi-table joins, etc. -
it just chugs for a while because it's single threaded, and will gobble up
memory and swap - but it gets the job done.
If SQLite memory usage causes your system to swap, your SQLite cache
might be too large.

If you think you need a large cache, a good value to start with would be
30% to 40% of physical memory, divided by the database page_size.
That leaves the operating system some real memory for file system
buffers. Benchmark using that value, then benchmark with half and double
values for cache_size.

http://www.sqlite.org/pragma.html#pragma_cache_size
http://www.sqlite.org/pragma.html#pragma_default_cache_size
http://www.sqlite.org/pragma.html#pragma_page_size


Note: PRAGMA [default_]cache_size=N is expressed as a number of
database pages, not bytes.
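
As a rough sketch of that sizing rule with Python's sqlite3 module (the
8 GB figure and the database name are assumptions for illustration;
benchmark and adjust as described above):

import sqlite3

physical_ram = 8 * 1024**3               # assume an 8 GB machine
target_bytes = int(physical_ram * 0.33)  # roughly 30-40% of physical memory

con = sqlite3.connect("warehouse.db")
page_size = con.execute("PRAGMA page_size").fetchone()[0]

cache_pages = target_bytes // page_size  # cache_size is in pages, not bytes
con.execute(f"PRAGMA cache_size = {cache_pages}")
print(f"page_size={page_size} bytes, cache_size={cache_pages} pages")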
--
Regards,

Kees Nuyt
Simon Slavin
2012-07-14 12:58:33 UTC
Post by Udi Karni
It very easily handles billion row/100GB tables - multi-table joins, etc. -
it just chugs for a while because it's single threaded, and will gobble up
memory and swap - but it gets the job done.
... though you can, of course, do your SQLite call in a second thread and proceed as normal on your main thread. Still single-threaded but it won't block. I've noticed a lot of web pages which build the page, then fill in the contents of a table later.
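
A small sketch of that pattern with Python's sqlite3 module (the names are
hypothetical; the connection is opened inside the worker because the module
does not share connections across threads by default):

import sqlite3
import threading

def count_rows(db_file, results):
    con = sqlite3.connect(db_file)   # open inside the worker thread
    results["count"] = con.execute(
        "SELECT COUNT(*) FROM claims WHERE age = 65").fetchone()[0]
    con.close()

results = {}
worker = threading.Thread(target=count_rows, args=("warehouse.db", results))
worker.start()
# ... the main thread carries on building the page here ...
worker.join()
print(results["count"])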
Post by Udi Karni
It's quite amazing for a DB engine that's not even 1 MB.
It's kept small partly because it's used in tiny single-core embedded systems: phone handsets, handheld control devices, and machine controllers. I even found a SQLite header in the firmware from my TV recorder. It has to run under tiny amounts of memory on simple hardware with little or no caching. What's interesting is that somehow a SQL engine designed for hand-sized devices is so good on standard desktop computers that many people use it.
Post by Udi Karni
While it's happiest when it can do all the work in memory versus disk - it
reads fairly quickly from disk the first time around - but not subsequently
- which is why I posed the question.
This characteristic is something to do with your hardware or OS, not something inherent in SQLite. I bet if you tried the same thing on a Mac or Linux you wouldn't get the same behaviour.

I seem to remember that Windows caches files with certain extensions specially, but I can't find any elucidation on the web.

Simon.
Udi Karni
2012-07-14 13:31:51 UTC
Thank you all for your advice. I use the standard shell compiled for 64-bit
Windows - so it's single-threaded - and the only way to multi-thread is to
run 2 concurrent processes on 2 concurrent SQLite DBs - assuming there
would be no conflict with the master tables - since they are read-only -
I'll need to test that.

Otherwise - seems like the options are -

(1) Windows compile optimizations
(2) OS optimizations
(3) Try Linux which is typically zippier
(4) Get enough RAM to contain all the data and swap

I completely understand why SQLite needs to be "lite" - because it's meant
for small devices running in RAM. It's just so clever - nothing to install
- no threads, etc. - it's tempting to use on the desktop for conventional
DBs - where you really need a parallel engine if you want to cut through
large amounts of data quickly.

In light of that - is it completely out of line to consider "light
parallelization" for SQLite? Even just for the simplest SELECTs (SELECT
COUNT(*) WHERE AGE = NNN)? Have SQLite spawn 2 processes - 1 counting
the 1st half of the blocks of a table - the second counting the 2nd half?
Limited only to reads where there are no locking issues? Even capped at 2
threads? Only for tables greater than x GB so as not to bother with the tiny
ones? Would that introduce bloat in the code and detract from SQLite's
original mission?

Thanks.
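
Something close to this can already be tried outside SQLite itself. A rough
sketch using Python's sqlite3 and multiprocessing modules, where two
processes each count half of the table's rowid range (file, table, and
column names are invented, and the split assumes rowids are reasonably
evenly distributed):

import sqlite3
from multiprocessing import Pool

DB = "warehouse.db"

def count_range(bounds):
    lo, hi = bounds
    con = sqlite3.connect(DB)
    sql = ("SELECT COUNT(*) FROM claims "
           "WHERE age = 65 AND rowid BETWEEN ? AND ?")
    return con.execute(sql, (lo, hi)).fetchone()[0]

if __name__ == "__main__":
    con = sqlite3.connect(DB)
    max_rowid = con.execute("SELECT MAX(rowid) FROM claims").fetchone()[0]
    mid = max_rowid // 2
    with Pool(2) as pool:
        print(sum(pool.map(count_range, [(1, mid), (mid + 1, max_rowid)])))

Whether the two readers actually finish sooner depends on the disk being able
to serve both of them at once, which is the concern raised later in the
thread.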
Simon Slavin
2012-07-14 13:51:25 UTC
Post by Udi Karni
(4) Get enough RAM to contain all the data and swap
This may be the best solution to the problem but it's disappointing. You really shouldn't need to do this. Computers and caching algorithms should be doing better to help you.
Post by Udi Karni
In light of that - is it completely out of line to consider "light
parallelization" for Sqlite? Even just for the simplest SELECTs (SELECT
COUNT (*) WHERE AGE = NNN) ? Have Sqlite spawn 2 processes - 1 counting
the 1st half of the blocks of a table - the second counting the 2nd half?
Limited only to reads where there are no locking issues? Even capped at 2
threads? Only for tables greater than xGB so as not to bother with the tiny
ones? Would that introduce bloat in the code and detract from Sqlite's
original mission?
Parallelization is of great advantage when the problem is processing: maths, pattern-searching, and processing of lots of data already in memory. But SQLite is very efficient at doing this, with a very simple, very searchable file format. I think your bottleneck here isn't processing, it's I/O, just as you put in the 'Subject' header. I bet your limitation is in your bus width, throughput, or low-level file handling. If you implement parallelization the way you describe, the result will just be the two commands constantly fighting over access to your datastore -- back to bandwidth and throughput again. Just to give you an idea, in normal setups the normal bottleneck for SQLite speed is the rotational speed of a hard disk.

Simon.
Udi Karni
2012-07-14 14:12:04 UTC
You are right. Disk I/O - even with SSD - is typically the bottleneck.
Running multiple threads would only make sense if all the data is in RAM -
a :memory: DB or a giant RAM drive (which remains to be tested, of course,
and is pricey).

However - I noticed that when fully copying a master table into a :memory:
DB (and on my small machine this means no greater than 4-5 GB - so not a
very conclusive test) - simple SELECT COUNT WHEREs go through the data at
about 250 MB/sec. I/O is not a factor anymore, and I doubt I am hitting
bus speed, so I suspect that this is truly a case of the CPU maxing out -
reading the pages from RAM and applying the WHERE criteria to the rows.
It's very linear. A 100 GB table takes about 400 seconds - nearly 7 minutes.
That's where I am hoping a second concurrent thread could cut it down 50%.
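
For reference, that experiment is easy to reproduce. A minimal sketch with
Python's sqlite3 module (names are hypothetical and the table has to fit in
RAM):

import sqlite3
import time

mem = sqlite3.connect(":memory:")
mem.execute("ATTACH DATABASE 'warehouse.db' AS disk")
mem.execute("CREATE TABLE claims AS SELECT * FROM disk.claims")
mem.execute("DETACH DATABASE disk")

start = time.perf_counter()
n = mem.execute("SELECT COUNT(*) FROM claims WHERE age = 65").fetchone()[0]
print(n, "rows counted in", round(time.perf_counter() - start, 2), "seconds")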

But I understand this is completely not the core mission of SQLite. I know
nothing about writing DB engines - so I don't know whether adding a 2nd
parallel process adds 10K or 10M to the code base. Just hoping that as
technology progresses and small embedded devices carry more and more RAM -
it would be OK to slightly expand the footprint of SQLite and add some more
"desktop" features.

This would be so incredible. As it is - SQLite is virtually identical to
Microsoft Access without the Microsoft price tag and footprint.
Multi-threaded capability would actually surpass it...
Simon Slavin
2012-07-14 14:17:07 UTC
Post by Udi Karni
I know
nothing about writing DB engines - so I don't know whether adding a 2nd
parallel process adds 10K or 10M to the code base.
You've reached the limit of what I know about parallelization. I hope someone else can chime in.

Simon.
Christian Smith
2012-08-01 14:57:59 UTC
Post by Simon Slavin
Post by Udi Karni
I know
nothing about writing DB engines - so I don't know whether adding a 2nd
parallel process adds 10K or 10M to the code base.
You've reached the limit of what I know about parallelization. I hope someone else can chime in.
Using SQLite's VM architecture, I would guess that adding this sort of parallelization would be non-trivial. You need a parallel VM, significantly different from the current sequential VM, and at least a way of managing asynchronous I/O, with perhaps a callback mechanism into the VM to handle I/O completion. <shudder>

While not certain, I guess other databases handle this by using tree-based execution plans, where any single execution node can easily be split into branches to another thread/process/machine, then merged in the parent tree node, with each branch handling a certain key range.

This might make sense, for example, with a partitioned table, where each partition is on its own spindle, so a full table scan can be executed in parallel on each spindle and merged as a final step. So, for a table scan between k0 and k3, find intermediate keys to split the query between spindles:

            (k0-k3)
           /   |   \
          /    |    \
         /     |     \
   (k0-k1]  (k1-k2]  (k2-k3)
      |        |        |
    disk1    disk2    disk3

I sat through an Oracle internals course once, and the instructor gave us an example of a setup such as this where data was partitioned across 24 disks, and the resulting full table scans were in fact quicker than index-based scans for the data set they were using.

Of course, the above would be useless for SQLite anyway, being a single file database. And even with the likes of Oracle, Stripe And Mirror Everything (SAME) might also largely defeat parallel scans.

All in all, the added bloat would be measured in MB, rather than KB.

Christian

disclaimer: Not a practical DB implementation expert.
Udi Karni
2012-08-01 19:25:55 UTC
You are right. True Parallel Query can get very complicated. I was hoping
for something very limited for starters - for example -

- only 2 processes
- only for simple full scans where the block range can be divided in two
- only when there is no ORDER/GROUP BY where sub results from the 2 threads
have to be combined

Basically only for queries such as SELECT COUNT/MIN/MAX FROM TABLE WHERE....

Sounds very limited / what's-the-point kind of thing - but it would
actually be very useful when working with large data where you find
yourself doing a lot of QA and study of the data - "how many rows have this
range of codes / are null", etc.

Having 2 processes working simultaneously might cut run times in half - and
save many minutes.

Going higher than 2 might hit disk read limitations anyway - so 2 might be
plenty for version 1.

In other words - nothing grand - just a small optimization that will kick
in on simple stuff. Pick some low hanging fruit.

A "would be nice" if not too complicated.


Simon Slavin
2012-08-01 20:21:02 UTC
Post by Udi Karni
- only 2 processes
- only for simple full scans where the block range can be divided in two
- only when there is no ORDER/GROUP BY where sub results from the 2 threads
have to be combined
Premature optimization? SQLite, just by itself with default compilation and a set of PRAGMAs which suit your requirements, is extremely fast. Write your application first, and only if it turns out to be too slow worry about clever tricks like that.

And if it really is too slow, and you want tricks like the above, it's probably better to switch to a different DBMS which will itself speed things like the above up because it does caching and other such tricks.

Simon.
Black, Michael (IS)
2012-08-01 19:43:01 UTC
You may be interested in this article:
http://www.drdobbs.com/parallel/multithreaded-file-io/220300055?pgno=2

Multi-threaded reading of multiple files (which is basically what you're talking about by splitting a file in half) is only faster if you have multiple disks (in this article that's a RAID-5 system).

Random I/O gains a bit by threading due to the probability of intersecting common disk blocks.


Michael D. Black
Senior Scientist
Advanced Analytics Directorate
Advanced GEOINT Solutions Operating Unit
Northrop Grumman Information Systems

Nico Williams
2012-08-01 20:42:19 UTC
Post by Black, Michael (IS)
http://www.drdobbs.com/parallel/multithreaded-file-io/220300055?pgno=2
Mutli-threaded reading of multiple files (which is basically what you're talking about by splitting a file in half) is only faster if you have multiple disks (in this article that's a RAID-5 system).
Or if you have a network in the way. Think of ql.io (http://ql.io).
Think of GDA, a GNOME database API that uses SQLite3 as its
client-side engine and virtual tables to access remote databases:
since SQLite3 treats each table/index operation as synchronous, GDA may
not be able to issue remote operations as fast as possible.

What would it take to have the core engine do asynchronous operations?
First, the backend/virtual table interfaces would have to change to
be async-oriented, with functions to start operations and completion
notification, and some way to optionally share an event loop with
other components. Second, the VDBE code (and the supporting VM) would
have to have a concept of co-routines, with each co-routine having a
scratch memory register space (perhaps for a stack), and a yield-type
operation that puts the current co-routine to "sleep" until some
completion notification is received and which wakes one co-routine
that's ready to run. Worthwhile? Not for disk/memory DBs, no. But
between virtual tables (see GDA) and the potential for remote
backends, I think the answer is yes. Also, if you look at ql.io, I
think you'll really see the value in SQLite being able to do async!
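
To make the shape of that concrete, a toy sketch in Python's asyncio -- this
is not SQLite's actual VDBE or virtual table API, just an illustration of
the start-operation / completion-notification / yield pattern being
described, with invented names:

import asyncio

class RemoteTable:
    """Stand-in for a virtual table backed by a slow remote source."""
    async def fetch_row(self, rowid):
        await asyncio.sleep(0.01)  # pretend this is a network round-trip
        return {"rowid": rowid, "age": 60 + rowid % 10}

async def scan_count(table, n_rows, predicate):
    # The coroutine yields at every fetch, so other "queries" can run
    # while this one waits for its I/O to complete.
    count = 0
    for rowid in range(1, n_rows + 1):
        row = await table.fetch_row(rowid)
        if predicate(row):
            count += 1
    return count

async def main():
    table = RemoteTable()
    # Two scans interleave on one thread instead of blocking each other.
    a, b = await asyncio.gather(
        scan_count(table, 50, lambda r: r["age"] == 65),
        scan_count(table, 50, lambda r: r["age"] > 62),
    )
    print(a, b)

asyncio.run(main())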

Nico
--
