Open terminal and navigate to the location of godbbench.go $ cd ~/path/to/godbbench/cmd
Interact with go run godbbench.go to see flags
Possilbe CLI Commands
# run synthetic INSERT and SELECT statements against MySQL, each 100x$ go run godbbench.go mysql --host127.0.0.1 --port3306 --user"root" \
--pass"password" --iter100 --run"inserts selects"
# run statemets of custom script against Postgres, save results in file$ go run godbbench.go postgres --host127.0.0.1 --port5432 --user"postgres" \
--pass"password" --iter100 --script"./path/to/postgres.sql" \
--writecsv"./path/to/results/postgres.csv"
# merge serveral result files$ go run godbbench.go mergecsv \
--rootDir"~/path/with/csv-files/to-be-merged"
--targetFile"~/anypath/allresults.csv"
# visualize the benchmarking results$ go run godbbench.go createcharts \
--dataFile"~/anypath/allresults.csv" --charttype"line"
Statement Substitutions
Allows for dynamic creation of queries without specifying thousands of structurally identical DBMS statements.
Counter that starts with 1 and ends with the specified iteration count of the given benchmark.
{{call .RandInt64}}
Returns a random non-negative value of type Int64.
{{call .RandFloat64}}
Returns a random value within the interval [0.0,1.0) as Float64.
{{call .RandIntBetween 1 42}}
Returns a random integer between 1 and 42 (Int32).
{{call .RandFloatBetween 0.8 9.9}}
Returns a random float between 0.8 and 9.9 (Float64).
{{call .RandString 1 9}}
Returns a random string with a length between 1 and 9 characters.
{{call .RandDate}}
Returns a random date as string (yyyy-MM-dd) between 1970-01-01 and 2023-01-01.
Custom Script (merchant)
-- INIT (illustration purposes)
\benchmark once \name initialize
DROP SCHEMA IF EXISTS godbbench CASCADE; CREATE SCHEMA godbbench;
CREATETABLE godbbench.order (OrderId INTPRIMARY KEY, CustomerId INTNOTNULL, ... );
-- INSERTS (illustration purposes)
\benchmark loop 1.0 \name inserts
INSERTINTO godbbench.Order (OrderId, CustomerId, CreationDate, Comment)
VALUES( {{.Iter}}, (SELECT CustomerId FROM godbbench.Customer ORDERBY RANDOM() LIMIT 1),
'{{call .RandDate }}', '{{call .RandString 0 50 }}');
-- SELECTS
\benchmark loop 1.0 \name select_simple
SELECT*FROM godbbench.Customer WHERE CustomerId = {{.Iter}}
\benchmark loop 1.0 \name select_medium
SELECT*FROM godbbench.Product p JOIN godbbench.Supplier s ON ...
\benchmark loop 1.0 \name select_complex
SELECT c.CustomerId, c.Name, SUM(li.Quantity * p.UnitSize * p.PricePerUnit) as ...
-- CLEAN (illustration purposes)
\benchmark once \name clean
DROP SCHEMA IF EXISTS godbbench CASCADE;
Attention:
Relational data schemas should not directly be mapped into a graph-world.
Relationships in graph-based DBs are first-class citizen that can hold information by itself.
Custom Script (employees)
Show all subordinates of an employee (tree queries)
-- use WITH RECURISON notation in PostgresSQL (similar in MySQL)WITHRECURSIVE hierarchy AS (
SELECT employee_id, first_name, boss_id, 0AS level
FROM employee
WHERE employeeId = {{.Iter}}
UNIONALLSELECT e.employee_id, e.first_name, e.boss_id, hierarchy.level +1AS level
FROM employee e JOIN hierarchy ON e.boss_id = hierarchy.employee_id
) SELECT*FROM hierarchy;
-- simpler query using CypherMATCH (boss)-[:BOSS_OF*1..]->(sub) WHERE boss.employeeId={{.Iter}} RETURN sub;
see example graph on next slide ...
Further Automation
$ bash bashscript.sh
see demo on next slide...
Result Analysis
Generating a chart.html file visualazing the following metric per benchmark:
arithMean (μs)
Average execution time microseconds calculated using the aithmetic mean.
geoMean (μs)
Average execution time microseconds calculated using the geometric mean.
ops/s
Operations per second which equals executions divided by total (μs).
μs/op
Microseconds per operation which equals total (μs) divided by executions.
Showcase employees
Part
Benchmark
Tasks
0
initialize
Drop all possibly existing data and recreate the root node called "BigBoss"
1
insert_employee
Inserts further nodes that are connected to randomly chosen existing nodes. The number of iterations equals 100% of the specified iteration count.
2
select_before_index
Subsequent query all existing nodes and return the node itself together with all its connected nodes (i.e. its subordinate employees). No index exists at this stage. The number of iterations equals 100% of the specified iteration count.
3
create_index
Creating a so-called BTREE index on the entity's relationship indicator (i.e. foreign key in relational DBMS, resp. relationship itself in graph-based DBMS).
4
clear_cache
All cached data is discarded.
5
select_after_index
The identical querying tasks as in Part 2 is repeated.
6
clean
Complete removal of existing data and index information.
Showcase Results (1/3)
Showcase Results (2/3)
Showcase Results (3/3)
Conclusion
godbbench is a broadly employable tool for automatized DB benchmarks
It is difficult to fairly compare fundamentally different DBMS paradigms
Showcase revealed the superiority of MySQL & PostgreSQL over Neo4j
in the given scenario
Future Work
Implementation of further database adaptors
Facility of multi-connection simulation
Creation of more sophisticated custom scripts
References
Bechberger, D., & Perryman, J. (2020). Graph databases in Action: Examples in Gremlin. Manning.
Bush, J. (2020). Learn SQL Database Programming: Query and manipulate databases from popular relational database servers using SQL.
Chauhan, C., & Kumar, D. (2017). PostgreSQL High Performance Cookbook: Mastering query optimization, database monitoring, and performance-tuning for PostgreSQL. Packt Publishing.
Codd, E. F. (2002). A Relational Model of Data for Large Shared Data Banks. In M. Broy & E. Denert (Eds.), Software Pioneers (pp. 263–294). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-59412-0_16
Elmasri, R., & Navathe, S. (2011). Fundamentals of Database Systems (6th ed). Addison-Wesley.
Fleming, P. J., & Wallace, J. J. (1986). How not to lie with statistics: The correct way to summarize benchmark results. Communications of the ACM, 29(3), 218–221. https://doi.org/10.1145/5666.5673
Gray, J. (Ed.). (1994). The Benchmark Handbook for Database and Transaction Processing Systems (2. ed., 2. [print.]). Morgan Kaufmann.
Gregg, B. (2020). Systems Performance: Enterprise and the Cloud (Second). Addison-Wesley.
Meier, A., & Kaufmann, M. (2019). SQL & NoSQL Databases: Models, Languages, Consistency Options and Architectures for Big Data Management. Springer Vieweg.
Needham, M., & Hodler, A. E. (2019). Graph Algorithms: Practical Examples in Apache Spark and Neo4j (First edition). O’Reilly Media.
Robinson, I., Webber, J., & Eifrem, E. (2015). Graph Databases: New Opportunities for Connected Data.
Scalzo, B. (2018). Database Benchmarking and Stress Testing: An Evidence-Based Approach to Decisions on Architecture and Technology. Springer Science+Business Media, LLC.