Benchmarks

TongGraph keeps the v0.2.0 benchmark runner under tests/benchmark, so benchmark smoke coverage runs with the normal test suite while the runner still produces JSON artifacts for local comparison. Build the Python extension first, then run the benchmark module:

uv run python scripts/build_python_extension.py
uv run python -m tests.benchmark.gbench \
  --nodes 100 \
  --degree 3 \
  --repeat 3 \
  --output /tmp/gbench.json

The JSON artifact includes environment metadata, commit hash, workload configuration, per-workload timings, and result checksums. Initial workloads cover traversal, structured/Cypher query execution, hybrid GraphRAG retrieval, persistence/reopen, and belief propagation.
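
To see what a given artifact actually contains, a small helper can list its top-level keys. This is a minimal sketch that assumes only that the artifact is a JSON object at the top level; the function name `summarize_artifact` is illustrative and not part of TongGraph.

```python
import json
from pathlib import Path


def summarize_artifact(path):
    """Return the sorted top-level keys of a benchmark JSON artifact."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # No schema is assumed: only report what the file actually contains.
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return sorted(data)
```

For example, `summarize_artifact("/tmp/gbench.json")` lists the sections written by the run above.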

Exact Vector Search

Run the exact vector benchmark for embedded Graph.search_vector() and TongGraph Server HTTP vector search:

uv run python -m tests.benchmark.gbench.vector \
  --vectors 10000 \
  --dimensions 128 \
  --queries 20 \
  --batch-size 8 \
  --repeat 3 \
  --output tests/benchmark/.gbench/results/vector-exact-10k.json

uv run python -m tests.benchmark.gbench.vector \
  --vectors 100000 \
  --dimensions 128 \
  --queries 20 \
  --batch-size 8 \
  --repeat 3 \
  --output tests/benchmark/.gbench/results/vector-exact-100k.json

The benchmark builds one deterministic SQLite graph, measures embedded exact search, then starts a local TongGraph Server over the same graph and measures HTTP search. Results below are local reference numbers for commit 400558ee980140105053938088727ab5b1da9bfe on Linux 6.8 / Python 3.13.12 with cosine search, 128 dimensions, limit=10, 20 query vectors, and 3 repeats.

Vectors	Workload	Mean	P50	P95	Throughput
10k	embedded `search_vector`	56.890 ms	56.588 ms	59.938 ms	17.58 ops/s
10k	server `search_vector`	61.245 ms	60.119 ms	65.039 ms	16.33 ops/s
10k	embedded `search_vectors` batch	379.354 ms	455.020 ms	455.704 ms	2.64 batch ops/s
10k	server `search_vectors` batch	386.750 ms	461.768 ms	468.994 ms	2.59 batch ops/s
100k	embedded `search_vector`	596.288 ms	595.818 ms	601.011 ms	1.68 ops/s
100k	server `search_vector`	615.215 ms	613.285 ms	630.238 ms	1.63 ops/s
100k	embedded `search_vectors` batch	4028.329 ms	4835.346 ms	4841.639 ms	0.25 batch ops/s
100k	server `search_vectors` batch	4088.130 ms	4893.004 ms	4918.222 ms	0.24 batch ops/s

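The batch rows measure one search_vectors() call as one operation. With batch_size=8 and 20 query vectors, the queries split into batches of 8, 8, and 4 vectors, so mean and percentiles reflect mixed batch sizes. The shorter final batch pulls the mean down, which is why the batch mean sits below P50 in every batch row.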
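
The batch split can be reproduced with a few lines of arithmetic. The sketch below also estimates the 10k embedded batch mean under the assumption that batch cost is the per-vector cost times the batch size; that linear model is an assumption for illustration, not a measured property of search_vectors().

```python
def batch_sizes(num_queries, batch_size):
    """Sizes of consecutive batches when queries are chunked in order."""
    return [min(batch_size, num_queries - start)
            for start in range(0, num_queries, batch_size)]


sizes = batch_sizes(20, 8)  # three batches: 8, 8, and 4 vectors

# Mean latency of embedded search_vector at 10k vectors, from the table.
per_vector_ms = 56.890

# Assumed model: a batch costs per_vector_ms for each vector it contains.
estimated = [per_vector_ms * n for n in sizes]
estimated_mean = sum(estimated) / len(estimated)

print(sizes, round(estimated_mean, 1))  # [8, 8, 4] 379.3
```

The estimate of about 379.3 ms is close to the reported 379.354 ms mean for the 10k embedded batch row.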

These numbers match the expected exact-scan profile: latency grows roughly linearly with vector count, with mean embedded latency rising about 10.5x (56.9 ms to 596.3 ms) for a 10x increase in vectors. The server wrapper adds only small overhead compared with embedded search, about 4.4 ms (roughly 8%) at 10k vectors and about 18.9 ms (roughly 3%) at 100k for single queries; the scan dominates at 100k vectors. For interactive local or internal services, 10k vectors per index is comfortable, while 100k vectors is best treated as a batch/offline or low-QPS tier unless higher latency is acceptable. Larger or latency-sensitive deployments should wait for a future ANN backend.
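
To illustrate why an exact scan scales this way, here is a minimal brute-force cosine search in pure Python. It is a generic sketch of the technique, not TongGraph's implementation; `exact_search` and its dictionary input are hypothetical names chosen for the example.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_search(vectors, query, limit=10):
    """Score every stored vector against the query and keep the top `limit`.

    `vectors` maps an id to a list of floats. Every vector is visited once,
    so the work is proportional to len(vectors) * dimensions.
    """
    scored = ((cosine_similarity(vec, query), key)
              for key, vec in vectors.items())
    return heapq.nlargest(limit, scored)


vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
top = exact_search(vectors, [1.0, 0.0], limit=2)
print([key for _, key in top])  # ['a', 'c']
```

Because there is no index to prune candidates, ten times as many vectors means roughly ten times as many similarity computations per query, which is the behavior an ANN backend is meant to avoid.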

Use these numbers as local reproducibility evidence. Public speed claims should only compare runs with the same commit, hardware, Python version, and benchmark configuration.
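
One way to enforce that rule is to check two runs for matching metadata before comparing their timings. The sketch below is illustrative only: the field names `commit`, `hardware`, `python_version`, and `config` are hypothetical and may not match the keys in the actual JSON artifact.

```python
# Hypothetical field names; the real artifact schema may differ.
COMPARABLE_FIELDS = ("commit", "hardware", "python_version", "config")

_MISSING = object()


def mismatched_fields(run_a, run_b, fields=COMPARABLE_FIELDS):
    """Return the fields on which two benchmark runs cannot be shown equal.

    A field that is absent from either run counts as a mismatch, because
    the runs cannot be verified as comparable without it.
    """
    mismatches = []
    for name in fields:
        a = run_a.get(name, _MISSING)
        b = run_b.get(name, _MISSING)
        if a is _MISSING or b is _MISSING or a != b:
            mismatches.append(name)
    return mismatches


# Placeholder metadata for two runs that differ only in Python version.
base = {"commit": "abc123", "hardware": "host-1", "config": {"vectors": 10000}}
run_a = {**base, "python_version": "3.13.12"}
run_b = {**base, "python_version": "3.12.0"}
print(mismatched_fields(run_a, run_b))  # ['python_version']
```

An empty result means the two runs agree on every listed field and their timings can reasonably be compared.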