Benchmark Results
How does ProTest compare to other combinatorial testing tools? We benchmarked ProTest's SIPO engine against the standard scenarios from pairwise.org — the definitive collection of pairwise testing tool efficiency comparisons.
All results below use pairwise (t=2) coverage. Each scenario was run with SIPO using the FullHorizontal enhancement strategy. Download any model file below — they include the saved results so you can inspect the covering arrays directly and try to beat them.
Results Summary: pairwise.org Scenarios
These are the six standard benchmark scenarios from pairwise.org:
| Scenario | Exhaustive | Best Known | SIPO | PICT | SIPO vs PICT | Download |
|---|---|---|---|---|---|---|
| 3^4 | 81 | 9 | 9 | 9 | Same | 3_4.cahtt |
| 3^13 | 1,594,323 | 15 | 15 | 18 | -17% | 3_13.cahtt |
| 2^100 | 1.3 x 10^30 | 10 | 10 | 15 | -33% | 2_100.cahtt |
| 10^20 | 10^20 | 180 | 192 | 210 | -9% | 10_20.cahtt |
| 4^15 3^17 2^29 | ~7.4 x 10^25 | 29 | 28 | 37 | -24% | 4_15__3_17__2_29.cahtt |
| 4^1 3^39 2^35 | ~5.6 x 10^29 | 21 | 22 | 27 | -19% | 4_1__3_39__2_35.cahtt |
Use SIPO when minimizing test case count matters: in these six scenarios it matched PICT on 3^4 and produced 9% to 33% fewer rows on the other five, with the largest reductions on models with many parameters. PICT may be faster for quick iterations. See the detailed results below for per-scenario comparisons.
How to Read This Table
- Exhaustive: Total possible combinations (what you'd need without combinatorial testing)
- Best Known: The smallest covering array found by any tool, per pairwise.org
- SIPO: ProTest's SIPO engine result
- PICT: Microsoft PICT's result, from pairwise.org
- SIPO vs PICT: Change in row count relative to PICT (negative means SIPO needs fewer rows)
- Download: Click to download the .cahtt model file with saved results
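The "SIPO vs PICT" column can be recomputed from the two row counts. The sketch below assumes the column is the relative change in rows with PICT as the baseline, rounded to a whole percent; the function name `relative_change` is illustrative and not part of ProTest.

```python
def relative_change(sipo_rows: int, pict_rows: int) -> str:
    """Format the SIPO row count relative to the PICT row count."""
    if sipo_rows == pict_rows:
        return "Same"
    change = (sipo_rows - pict_rows) / pict_rows * 100
    return f"{change:+.0f}%"


# (SIPO, PICT) row counts copied from the summary table
results = {
    "3^4": (9, 9),
    "3^13": (15, 18),
    "2^100": (10, 15),
    "10^20": (192, 210),
    "4^15 3^17 2^29": (28, 37),
    "4^1 3^39 2^35": (22, 27),
}

for scenario, (sipo, pict) in results.items():
    print(f"{scenario}: {relative_change(sipo, pict)}")
```

Under that assumption the printed values agree with the table: Same, -17%, -33%, -9%, -24%, -19%.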
Comparison with Other Tools
Full comparison from pairwise.org, with ProTest SIPO added:
| Tool | 3^4 | 3^13 | 4^15 3^17 2^29 | 4^1 3^39 2^35 | 2^100 | 10^20 | Available? |
|---|---|---|---|---|---|---|---|
| ProTest SIPO | 9 | 15 | 28 | 22 | 10 | 192 | Yes |
| AETG | 9 | 15 | 41 | 28 | 10 | 180 | No (proprietary) |
| TestCover | 9 | 15 | 29 | 21 | 10 | 181 | No (website defunct) |
| EXACT | 9 | 15 | ? | 21 | 10 | ? | No (research only) |
| IPO-s | 9 | 17 | 32 | 23 | 10 | 220 | Via NIST ACTS |
| CoverTable | 9 | 17 | 34 | 26 | 12 | 195 | Unknown |
| PICT | 9 | 18 | 37 | 27 | 15 | 210 | Yes (open source) |
| CTS | 9 | 15 | 39 | 29 | 10 | 210 | Unknown |
| IPO | 9 | 17 | 34 | 26 | 15 | 212 | Via NIST ACTS |
| AllPairs | 9 | 17 | 34 | 26 | 14 | 197 | Unknown |
| Jenny | 11 | 18 | 38 | 28 | 16 | 193 | Yes (open source) |
| DDA | ? | 18 | 35 | 27 | 15 | 201 | Unknown |
| ecFeed | 10 | 19 | 37 | 28 | 16 | 203 | Yes (open source) |
| TConfig | 9 | 15 | 40 | 30 | 14 | 231 | Unknown |
| JCUnit | 10 | 23 | 49 | 33 | 18 | 245 | Yes (open source) |
| LazyParams | 10 | 20 | 45 | 33 | 16 | 288 | Yes (open source) |
ProTest SIPO achieved 28 rows for 4^15 3^17 2^29, improving on the previous best of 29. The "Available?" column notes which tools can still be downloaded or used.
Detailed Results by Scenario
3^4: Four 3-Level Parameters
| Property | Value |
|---|---|
| Parameters | 4 parameters, 3 values each |
| Parameter pairs | C(4,2) = 6 |
| Exhaustive | 81 combinations |
| Best known | 9 rows |
| SIPO | 9 rows (optimal) |
| PICT | 9 rows |
This is a small model — both PICT and SIPO reach the theoretical optimum.
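The figures in these per-scenario tables follow directly from the model shape. As a rough sketch (the helper `model_stats` is hypothetical, not a ProTest API), the snippet below computes the exhaustive count, the number of parameter pairs C(k,2), the number of value pairs a pairwise array must cover, and the simple lower bound given by the product of the two largest level counts. That lower bound is why 9 rows cannot be beaten for 3^4.

```python
from itertools import combinations
from math import prod


def model_stats(levels: list[int]) -> dict[str, int]:
    """levels[i] is the number of values of parameter i (needs >= 2 parameters)."""
    largest, second = sorted(levels, reverse=True)[:2]
    return {
        "exhaustive": prod(levels),
        "parameter_pairs": len(levels) * (len(levels) - 1) // 2,
        # every pair of parameters contributes levels[i] * levels[j] value pairs
        "value_pairs": sum(a * b for a, b in combinations(levels, 2)),
        # any two columns must show all their value combinations at least once
        "lower_bound": largest * second,
    }


print(model_stats([3] * 4))
# {'exhaustive': 81, 'parameter_pairs': 6, 'value_pairs': 54, 'lower_bound': 9}
```

The same function accepts mixed-level models, for example `model_stats([4] * 15 + [3] * 17 + [2] * 29)`.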
3^13: Thirteen 3-Level Parameters
| Property | Value |
|---|---|
| Parameters | 13 parameters, 3 values each |
| Parameter pairs | C(13,2) = 78 |
| Exhaustive | 1,594,323 combinations |
| Best known | 15 rows |
| SIPO | 15 rows (optimal) |
| PICT | 18 rows |
SIPO matches the best known result of 15 rows. PICT produces 18 rows for this scenario.
2^100: One Hundred Binary Parameters
| Property | Value |
|---|---|
| Parameters | 100 parameters, 2 values each |
| Parameter pairs | C(100,2) = 4,950 |
| Best known | 10 rows |
| SIPO | 10 rows (optimal) |
| PICT | 15 rows |
SIPO matches the best known result. PICT produces 15 rows for this scenario.
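For all-binary models the optimal pairwise size is known in closed form, which is one way to check that 10 rows cannot be improved here. The sketch below relies on the classical result (usually attributed to Katona and to Kleitman and Spencer) that the minimum is the smallest N with C(N-1, ceil(N/2)) >= k for k binary parameters; the function name is illustrative.

```python
from math import comb


def min_rows_binary_pairwise(k: int) -> int:
    """Smallest N with C(N-1, ceil(N/2)) >= k, for k >= 2 binary parameters."""
    n = 2
    while comb(n - 1, (n + 1) // 2) < k:  # (n + 1) // 2 == ceil(n / 2)
        n += 1
    return n


print(min_rows_binary_pairwise(100))  # 10
```

For k = 100, nine rows give C(8,5) = 56 < 100 while ten rows give C(9,5) = 126 >= 100, so 10 is the minimum.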
10^20: Twenty 10-Level Parameters
| Property | Value |
|---|---|
| Parameters | 20 parameters, 10 values each |
| Parameter pairs | C(20,2) = 190 |
| Exhaustive | 10^20 combinations |
| Best known | 180 rows (AETG) |
| SIPO | 192 rows |
| PICT | 210 rows |
SIPO produces 192 rows; PICT produces 210. The best known result of 180 was achieved by AETG.
4^15 3^17 2^29: Mixed Levels (61 Parameters)
| Property | Value |
|---|---|
| Parameters | 61 total: 15 with 4 values, 17 with 3 values, 29 with 2 values |
| Parameter pairs | C(61,2) = 1,830 |
| Previous best known | 29 rows (TestCover) |
| SIPO | 28 rows (new best known) |
| PICT | 37 rows |
SIPO achieves 28 rows — one fewer than the previous pairwise.org best of 29 (TestCover).
Download 4_15__3_17__2_29.cahtt
4^1 3^39 2^35: Mixed Levels (75 Parameters)
| Property | Value |
|---|---|
| Parameters | 75 total: 1 with 4 values, 39 with 3 values, 35 with 2 values |
| Parameter pairs | C(75,2) = 2,775 |
| Best known | 21 rows (EXACT) |
| SIPO | 22 rows |
| PICT | 27 rows |
SIPO produces 22 rows, within 1 of the best known result of 21 (EXACT). PICT produces 27 rows.
Download 4_1__3_39__2_35.cahtt
Reproducibility
All download files include saved results with full trial details. You can also regenerate the results by opening any .cahtt file in the ProTest UI, selecting the SIPO engine, and clicking Generate. Alternatively, run the CLI:
protest generate -i <model>.cahtt -o results.csv --engine sipo
Results will vary by random seed and number of parallel trials. More trials increase the chance of finding a smaller covering array.
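To illustrate why seed and trial count matter, here is a toy randomized greedy generator run with several seeds, keeping the smallest array. This is explicitly not SIPO and says nothing about ProTest's internals or CLI flags; `greedy_pairwise`, `best_of`, and the `candidates` parameter are made up for the sketch.

```python
import random
from itertools import combinations


def greedy_pairwise(levels, seed, candidates=20):
    """Toy randomized greedy generator (NOT SIPO); returns rows covering all pairs."""
    rng = random.Random(seed)
    k = len(levels)
    uncovered = {
        (i, a, j, b)
        for i, j in combinations(range(k), 2)
        for a in range(levels[i])
        for b in range(levels[j])
    }
    rows = []
    while uncovered:
        # Pin one uncovered pair so every new row is guaranteed to make progress.
        i, a, j, b = rng.choice(sorted(uncovered))
        best_row, best_gain = None, -1
        for _ in range(candidates):
            row = [rng.randrange(v) for v in levels]
            row[i], row[j] = a, b
            gain = sum(
                (p, row[p], q, row[q]) in uncovered
                for p, q in combinations(range(k), 2)
            )
            if gain > best_gain:
                best_row, best_gain = row, gain
        rows.append(best_row)
        for p, q in combinations(range(k), 2):
            uncovered.discard((p, best_row[p], q, best_row[q]))
    return rows


def best_of(levels, seeds):
    """Run one trial per seed and keep the smallest resulting array."""
    return min((greedy_pairwise(levels, s) for s in seeds), key=len)


sizes = [len(greedy_pairwise([3] * 13, s)) for s in range(8)]
print(sizes, "-> best:", min(sizes))
```

The per-seed sizes typically differ by a row or two, so taking the minimum over more trials can only match or improve the result, at the cost of more run time.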
Methodology
- Algorithm: SIPO (Wagner, Kampel & Simos, IWOCA 2021)
- Enhancement: FullHorizontal
- Strength: 2 (pairwise)
- SIPO Base (N): 10,000
- Reference data: pairwise.org efficiency comparison
- Verification: All covering arrays verified for complete pairwise coverage
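The verification step can be reproduced independently of any tool. The sketch below is one minimal way to check pairwise coverage, assuming the array has been loaded as rows of zero-based value indices; mapping an exported results file to that form is left out, and `missing_pairs` is a hypothetical helper, not ProTest's verifier.

```python
from itertools import combinations, product


def missing_pairs(rows, levels):
    """Return value pairs (i, a, j, b) that no row covers; empty means full coverage."""
    missing = []
    for i, j in combinations(range(len(levels)), 2):
        seen = {(row[i], row[j]) for row in rows}
        for a, b in product(range(levels[i]), range(levels[j])):
            if (a, b) not in seen:
                missing.append((i, a, j, b))
    return missing


# A 9-row array for 3^4 from the standard orthogonal-array construction
rows = [(a, b, (a + b) % 3, (a + 2 * b) % 3) for a in range(3) for b in range(3)]

print(len(rows), missing_pairs(rows, [3, 3, 3, 3]))  # 9 []
print(len(missing_pairs(rows[:-1], [3, 3, 3, 3])))   # 6
```

Dropping any single row from this 9-row array leaves six value pairs uncovered, one per pair of columns, which shows how the check flags an incomplete array.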