horoscope
horoscope is an optimizer inspector for DBMS.
Get Started
-
Run TiDB
Recommend TiUP.
-
Initialize TPCH Database
Recommend go-tpc.
git clone https://github.com/pingcap/go-tpc.git
cd go-tpc
make
./bin/go-tpc tpch --sf=1 prepare
-
Build Horoscope
git clone https://github.com/chaos-mesh/horoscope.git
cd horoscope
make
-
Start Benching
bin/horo bench -p -w benchmark/tpch
Usage
USAGE:
horo [global options] command [command options] [arguments...]
COMMANDS:
bench Bench the optimizer
gen, g Generate a dynamic bench scheme
query, q Execute a query
hint, H Explain hint of a query
explain, e Explain analyze a query
info, i Show database information
index Add indexes for tables
card test the cardinality estimations
help, h Shows a list of commands or help for one command
GLOBAL OPTIONS:
--dsn DSN, -d DSN set DSN of target db (default: "root:@tcp(localhost:4000)/test?charset=utf8")
--round ROUND, -r ROUND execution ROUND of each query (default: 1)
--json, -j format log with json formatter (default: false)
--file FILE, -f FILE set FILE to store log
--verbose LEVEL, -v LEVEL set LEVEL of log: trace|debug|info|warn|error|fatal|panic (default: "info")
--help, -h show help (default: false)
Bench effectiveness
bin/horo -r 4 bench -p -c -w benchmark/tpch
Bench cardinality estimation
For example, measures the EMQ(exact match queries) row cnt error on customer.C_NAME
for total 100 seconds.
bin/horo card -columns 'customer.C_NAME' -type emq -timeout 100s
Summary report
There will generate a summary report after bench
sub-command is finished.
+-----+-------------+------------------------+--------------------------+---------------+---------------------------------+--------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ID | #PLAN SPACE | DEFAULT EXECUTION TIME | BEST PLAN EXECUTION TIME | EFFECTIVENESS | BETTER OPTIMAL PLANS | ESTROW Q-ERROR | QUERY |
+-----+-------------+------------------------+--------------------------+---------------+---------------------------------+--------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| q3 | 11 | 12061.0ms ±11% | 5401.8ms ±19% | 72.7% | #6(44.8%),#10(66.4%),#11(47.3%) | count:3, median:1.0, 90th:7173270.0, 95th:7173270.0, max:7173270.0 | SELECT l_orderkey,sum(l_extendedprice*(1-l_discount)) AS revenue,o_orderdate,o_shippriority FROM ((customer) JOIN orders) JOIN lineitem WHERE c_mktsegment="AUTOMOBILE" AND c_custkey=o_custkey AND l_orderkey=o_orderkey AND o_orderdate<"1995-03-13" AND l_shipdate>"1995-03-13" GROUP BY l_orderkey,o_orderdate,o_shippriority ORDER BY revenue DESC,o_orderdate LIMIT 10 |
+-----+-------------+------------------------+--------------------------+---------------+---------------------------------+--------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
ID
: query id
#PLAN SPACE
: the plan space size of a query
DEFAULT EXECUTION TIME
: the execution time of default plan, giving in the format of "Mean ±Diff", "Mean" is the mean value of round
rounds, and "Diff" is the lower/upper bound of the mean value
BEST PLAN EXECUTION TIME
: the execution time of the best plan
EFFECTIVENESS
: the percent of the execution time of the default plan better than others on plan space
- We use Pd to represent the default plan generated for the query, Pi as one of plan on plan space
- If execution time(Pi) < 0.9 * execution time(Pd), Pi is a better plan
BETTER OPTIMAL PLANS
: gives the better plan, each item is giving in the format of "nth_plan id(execution time / default execution time)"
ESTROW Q-ERROR
: Base table row cnt estimation q-error for each query
QUERY
: the query
Dataset
We integrate the SQL queries of TPCH, TPCDS, SSB, and JOB benchmarks on the repo, you can use go-tpc and tidb-bench to import the dataset.
For the JOB benchmark, join-order-benchmark is helpful.
Index selection fuzz
Refer to index selection fuzz