Math Details¶
This page gives the exact formulas Quantum XL uses to build a box plot. Each equation lists what it computes and where it appears on the chart.
Notation¶
| Term | Description |
|---|---|
| \(x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}\) | the data values sorted in ascending order |
| \(x_i,\ f_i\) | an unsorted value and its frequency weight, when a frequency column is supplied |
| \(n\) | the number of data values (unweighted case) |
| \(N\) | the effective sample size: \(N = n\) without frequencies, or \(N = \operatorname{round}\!\left(\sum_i f_i\right)\) with a frequency column |
| \(p\) | a probability in \([0,1]\) (for example \(p = 0.25\) for the first quartile) |
| \(Q(p)\) | the sample quantile at probability \(p\) (defined below) |
| \(\lfloor\,\cdot\,\rfloor,\ \lceil\,\cdot\,\rceil\) | the floor and ceiling functions |
A box plot is produced only when \(N \ge 4\); otherwise the box is skipped as having insufficient data.
Quantile \(Q(p)\)¶
Every positional value on the chart — quartiles, median, custom-percentile box edges, and the min/max endpoints — comes from one quantile estimator: the Hyndman–Fan Type 8 definition (approximately median-unbiased).
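For unweighted data, let

\[
h = \left(n + \tfrac{1}{3}\right)p + \tfrac{1}{3}, \qquad
Q(p) = x_{(\lfloor h \rfloor)} + \left(h - \lfloor h \rfloor\right)\left(x_{(\lceil h \rceil)} - x_{(\lfloor h \rfloor)}\right)
\]

The indices \(\lfloor h \rfloor\) and \(\lceil h \rceil\) are clamped to the range \([1, n]\) so the endpoints stay in range. The boundary probabilities therefore return the extreme values exactly: \(Q(0) = x_{(1)} = \min\) and \(Q(1) = x_{(n)} = \max\).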
| Term | Description |
|---|---|
| \(h\) | the fractional rank of the requested quantile within the sorted values (a position, effectively \(1 \le h \le n\) after clamping) |
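The unweighted estimator can be sketched in Python as follows. This is a minimal illustration of the Hyndman-Fan Type 8 rank \(h = (n + \tfrac{1}{3})p + \tfrac{1}{3}\) with clamped indices, not Quantum XL's actual code; the function name `quantile_type8` is hypothetical.

```python
import math


def quantile_type8(values, p):
    """Hyndman-Fan Type 8 sample quantile with indices clamped to [1, n].

    Illustrative sketch only; the name and signature are assumptions.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("values must not be empty")

    # Fractional 1-based rank of the requested quantile.
    h = (n + 1.0 / 3.0) * p + 1.0 / 3.0

    # Clamp both neighbouring indices so p = 0 and p = 1 hit the extremes.
    lower = min(max(math.floor(h), 1), n)
    upper = min(max(math.ceil(h), 1), n)
    frac = h - math.floor(h)

    # Linear interpolation between the two neighbouring order statistics.
    return xs[lower - 1] + frac * (xs[upper - 1] - xs[lower - 1])
```

For instance, `quantile_type8(data, 0.25)` and `quantile_type8(data, 0.75)` would give the default box edges, and `quantile_type8(data, 0.5)` the median line.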
Frequency-weighted data. When a frequency column is supplied, the same Type 8 rank is applied to a virtual sample of size \(N = \operatorname{round}\!\left(\sum_i f_i\right)\), without physically expanding the data. Shifting the rank to a zero-based position gives

\[
h_0 = \left(N + \tfrac{1}{3}\right)p + \tfrac{1}{3} - 1
\]

Let \(F_k = \sum_{i \le k} f_i\) be the cumulative frequency over the sorted values, and let \(V(t)\) be the value at zero-based expanded position \(t\), that is, the first sorted value \(x_{(k)}\) for which \(t < F_k\). With \(t_L = \lfloor h_0 \rfloor\) and \(t_U = \lceil h_0 \rceil\) (each clamped to \([0, N-1]\)),

\[
Q(p) = V(t_L) + \left(h_0 - \lfloor h_0 \rfloor\right)\left(V(t_U) - V(t_L)\right)
\]
This returns exactly what expanding each value \(f_i\) times and applying the unweighted formula would produce.
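One way to look up expanded positions without expanding the data is a binary search over the cumulative frequencies. The sketch below assumes positive integer frequencies and a zero-based rank equal to the Type 8 rank minus one; the function name `weighted_quantile_type8` and its helper are hypothetical, not part of Quantum XL.

```python
import math
from bisect import bisect_right
from itertools import accumulate


def weighted_quantile_type8(values, freqs, p):
    """Type 8 quantile of a frequency-weighted sample, without expansion.

    Assumes positive integer frequencies (an assumption of this sketch).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    pairs = sorted((x, f) for x, f in zip(values, freqs) if f > 0)
    if not pairs:
        raise ValueError("no data with positive frequency")
    xs = [x for x, _ in pairs]
    cum = list(accumulate(f for _, f in pairs))  # cumulative frequencies F_k
    total = cum[-1]                              # effective sample size N

    def value_at(t):
        # First sorted value whose cumulative frequency exceeds position t.
        return xs[bisect_right(cum, t)]

    # Zero-based fractional rank: the Type 8 rank minus one.
    h0 = (total + 1.0 / 3.0) * p + 1.0 / 3.0 - 1.0
    t_lo = min(max(math.floor(h0), 0), total - 1)
    t_hi = min(max(math.ceil(h0), 0), total - 1)
    frac = h0 - math.floor(h0)
    return value_at(t_lo) + frac * (value_at(t_hi) - value_at(t_lo))
```

With values `[10, 20, 30]` and frequencies `[1, 2, 1]`, the virtual sample is `10, 20, 20, 30`, and the function interpolates between those positions just as the unweighted formula would.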
Median, quartiles, and range¶
| Quantity | Formula | Used by |
|---|---|---|
| First quartile | \(Q_1 = Q(0.25)\) | lower box edge (default), fences |
| Median | \(\text{med} = Q(0.5)\) | center line drawn across the box |
| Third quartile | \(Q_3 = Q(0.75)\) | upper box edge (default), fences |
| Minimum | \(Q(0) = x_{(1)}\) | red X marker (when Display Dataset Minimum is on) |
| Maximum | \(Q(1) = x_{(n)}\) | red X marker (when Display Dataset Maximum is on) |
Box edges¶
The box edges depend on the Box Height option.
| Box Height option | Lower edge | Upper edge |
|---|---|---|
| Interquartile Range (default) | \(Q(0.25)\) | \(Q(0.75)\) |
| Custom Percentile \(P\) | \(Q\!\left(\dfrac{P}{100}\right)\) | \(Q\!\left(1 - \dfrac{P}{100}\right)\) |
\(P\) is the integer percentile entered in the dialog (5 to 45). For example \(P = 10\) gives a box from the 10th to the 90th percentile.
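The mapping from the Box Height option to the two quantile probabilities can be sketched as below. The option strings `"iqr"` and `"custom"` and the function name are illustrative assumptions, not identifiers from Quantum XL.

```python
def box_edge_probabilities(box_height="iqr", percentile=None):
    """Return (lower_p, upper_p) for the box edges.

    "iqr" and "custom" are hypothetical option names for this sketch.
    """
    if box_height == "iqr":
        return 0.25, 0.75
    if box_height == "custom":
        # The dialog accepts an integer percentile from 5 to 45.
        if not isinstance(percentile, int) or not 5 <= percentile <= 45:
            raise ValueError("percentile must be an integer from 5 to 45")
        return percentile / 100, 1 - percentile / 100
    raise ValueError("unknown box height option")
```

Each returned probability would then be passed to the quantile estimator to obtain the edge value.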
Interquartile range¶
The IQR is always computed from the true quartiles, even when Box Height is set to Custom Percentile, so the fences below do not move with the box height.

\[
\text{IQR} = Q_3 - Q_1 = Q(0.75) - Q(0.25)
\]
Fences¶
Fences are the thresholds used to place whiskers and to classify outliers.
The fences are not drawn directly; they feed the whisker and outlier rules.
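For orientation, the conventional Tukey fences (Tukey, 1977; Frigge, Hoaglin, & Iglewicz, 1989) place the inner fences 1.5 IQR and the outer fences 3 IQR beyond the quartiles. The multipliers below are that textbook convention and are an assumption here, not a confirmed Quantum XL setting; \(\text{IQR} = Q_3 - Q_1\).

\[
\begin{aligned}
\text{lower inner fence} &= Q_1 - 1.5\,\text{IQR}, & \text{upper inner fence} &= Q_3 + 1.5\,\text{IQR} \\
\text{lower outer fence} &= Q_1 - 3\,\text{IQR}, & \text{upper outer fence} &= Q_3 + 3\,\text{IQR}
\end{aligned}
\]

As a worked example under that assumption, \(Q_1 = 10\) and \(Q_3 = 20\) give \(\text{IQR} = 10\), inner fences at \(-5\) and \(35\), and outer fences at \(-20\) and \(50\).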
Whiskers¶
The whiskers depend on the Whiskers option.
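Lowest/Highest Datum Within Inner Fence (default). The whiskers reach the most extreme actual values that lie within the inner fences:

\[
\text{lower whisker} = \min\{x_i : x_i \ge \text{lower inner fence}\}, \qquad
\text{upper whisker} = \max\{x_i : x_i \le \text{upper inner fence}\}
\]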
The whiskers are then clamped so they never fall inside the box: if the lower whisker exceeds the lower box edge it is set to the lower box edge, and if the upper whisker is below the upper box edge it is set to the upper box edge. If no value satisfies a condition, that whisker falls back to the corresponding box edge.
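The default whisker rule, including the clamping and fallback behavior, can be sketched as follows. The function name and parameters are hypothetical, and treating a value that sits exactly on a fence as "within" it is an assumption of this sketch.

```python
def default_whiskers(values, lower_fence, upper_fence, box_lower, box_upper):
    """Whisker ends for the 'within inner fence' option (illustrative only).

    Values exactly on a fence are treated as inside it (an assumption).
    """
    within_low = [x for x in values if x >= lower_fence]
    within_high = [x for x in values if x <= upper_fence]

    # Fall back to the box edge when no value satisfies the condition.
    lower = min(within_low) if within_low else box_lower
    upper = max(within_high) if within_high else box_upper

    # Clamp so that a whisker never ends inside the box.
    lower = min(lower, box_lower)
    upper = max(upper, box_upper)
    return lower, upper
```

The clamp matters mainly with a Custom Percentile box, where a box edge can lie beyond an inner fence computed from the true quartiles.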
Minimum/Maximum Data Point. The whiskers reach the full data range, and no outliers are identified:

\[
\text{lower whisker} = Q(0) = x_{(1)}, \qquad \text{upper whisker} = Q(1) = x_{(n)}
\]
Outlier classification¶
Outliers are identified only under the default Lowest/Highest Datum Within Inner Fence whisker option. A value \(x\) is classified by which fences it falls beyond:
Suspected outliers (between the inner and outer fences) are drawn as open red circles; outliers (beyond the outer fences) are drawn as filled red circles. Each distinct value is marked once, regardless of how many times it occurs.
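The classification can be sketched as a small function that takes the fences as inputs and reports each distinct value once. The name `classify_outliers` is hypothetical, and the strict inequalities at the fence boundaries are an assumption.

```python
def classify_outliers(values, inner_fences, outer_fences):
    """Split values into suspected outliers and outliers (illustrative only).

    inner_fences and outer_fences are (lower, upper) pairs. Strict
    comparisons at the fences are an assumption of this sketch.
    """
    inner_lo, inner_hi = inner_fences
    outer_lo, outer_hi = outer_fences
    suspected, outliers = set(), set()
    for x in values:
        if x < outer_lo or x > outer_hi:
            outliers.add(x)      # beyond an outer fence
        elif x < inner_lo or x > inner_hi:
            suspected.add(x)     # between an inner and an outer fence
    # Sets keep each distinct value once, however often it occurs.
    return sorted(suspected), sorted(outliers)
```

A repeated value such as two occurrences of 12 between the fences appears once in the suspected list, matching the one-marker-per-distinct-value rule.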
Mean marker¶
When Display Mean is on, an orange diamond is drawn at the frequency-weighted arithmetic mean:

\[
\bar{x} = \frac{\sum_i f_i\, x_i}{\sum_i f_i}
\]
Without a frequency column, every \(f_i = 1\) and this reduces to \(\bar{x} = \frac{1}{n}\sum_i x_i\).
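A minimal sketch of the mean marker's value, with the frequencies defaulting to 1 when no frequency column is given. The function name `weighted_mean` is hypothetical.

```python
def weighted_mean(values, freqs=None):
    """Frequency-weighted arithmetic mean (illustrative sketch only)."""
    if freqs is None:
        freqs = [1] * len(values)  # no frequency column: every f_i = 1
    total = sum(freqs)
    if total == 0:
        raise ValueError("frequencies must not sum to zero")
    return sum(x * f for x, f in zip(values, freqs)) / total
```

Calling it with values `[10, 20, 30]` and frequencies `[1, 2, 1]` is equivalent to averaging the expanded sample `10, 20, 20, 30`.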
References¶
- Tukey, J. W. (1977). Exploratory Data Analysis. Reading, MA: Addison-Wesley.
- McGill, R., Tukey, J. W., & Larsen, W. A. (1978). Variations of box plots. The American Statistician, 32(1), 12–16.
- Hoaglin, D. C., Mosteller, F., & Tukey, J. W. (Eds.). (1983). Understanding Robust and Exploratory Data Analysis. New York: John Wiley & Sons.
- Frigge, M., Hoaglin, D. C., & Iglewicz, B. (1989). Some implementations of the boxplot. The American Statistician, 43(1), 50–54.
- Hyndman, R. J., & Fan, Y. (1996). Sample quantiles in statistical packages. The American Statistician, 50(4), 361–365.
- Montgomery, D. C. (2013). Introduction to Statistical Quality Control (7th ed.). Hoboken, NJ: John Wiley & Sons.