README
---
About TPC-H
===
TPC-H is a read-only workload of "analytics" queries on large datasets.
TPC-H datasets are sized by a "scale factor" (SF), a multiplier on the number
of rows in each table (the small nation and region tables are fixed-size).
Tables, and rows per table (lineitem's count is exact at SF = 1 and
approximate at higher scale factors):

    nation           25
    region            5
    part         200000 * SF
    supplier      10000 * SF
    partsupp     800000 * SF
    customer     150000 * SF
    orders      1500000 * SF
    lineitem    6001215 * SF
At scale factor 1, the raw text data is about 1 GB, which becomes roughly 5 GB
of on-disk data per node in a 3-node cluster. For perspective, scale factor 300
is a common test, and scale factor 100000 (about 100 TB of raw data) is the
largest official TPC-H scale.
Loading TPC-H data
===
In order to run the TPC-H loader, you will need to obtain a set of .tbl files
containing the raw data for the 8 tables, and put them in the folder tpch/data.
.tbl files can be generated by compiling and running the "dbgen" program
distributed by the TPC. If you do not want to run dbgen, a tarball of a set of
.tbl files (at scale factor 1) is available on Google Drive at
https://drive.google.com/open?id=0B2yAkR0eFsMEYlJtOWJ0SWZrcTg
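As a rough sketch of that process (directory names are placeholders, and
dbgen's makefile typically needs platform-specific edits before it builds):

    # Build dbgen from the TPC-H tools and generate scale factor 1 data.
    cd tpch-dbgen            # wherever you unpacked the TPC-H tools
    make                     # may require editing makefile.suite first
    ./dbgen -s 1             # -s sets the scale factor; writes *.tbl files here
    mv *.tbl /path/to/this/repo/tpch/data/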
Then, running
./tpch -load
will load the data into Cockroach. This currently takes a long time (about 5
hours on a gceworker). To avoid that, it is advisable to use the enterprise
backup/restore feature; a backup dump is available at
https://drive.google.com/open?id=0B2yAkR0eFsMEQlNHekhlaE5VTXM
Restoring takes considerably less time, on the order of minutes. To restore,
start a cockroach cluster with the following environment variables set:
COCKROACH_PROPOSER_EVALUATED_KV=true COCKROACH_ENTERPRISE_ENABLED=true ./cockroach start
Then run:
./tpch -restore=/path/to/backup
Once this finishes (minutes), you can shut down the cockroach cluster and
restart it without those environment variables to actually run queries.
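Putting the restore steps together, a minimal sketch (paths are placeholders;
./cockroach quit is one way to shut the node down):

    # Start with the restore-specific environment variables set.
    COCKROACH_PROPOSER_EVALUATED_KV=true COCKROACH_ENTERPRISE_ENABLED=true ./cockroach start &
    ./tpch -restore=/path/to/backup
    # Once the restore completes, restart without those variables.
    ./cockroach quit
    ./cockroach start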
Running TPC-H queries
===
Not all queries can currently be run, because some use features that Cockroach
does not yet support, such as correlated subqueries. The queries that do run
are enabled by default.
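Assuming a cluster is already running with the data loaded, a plain invocation
with no flags should run the default query set (an assumption based on the
flags documented above; check ./tpch -help for the authoritative list):

    ./tpch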