Math Details

This page gives the exact formulas Quantum XL uses to build a box plot. Each equation lists what it computes and where it appears on the chart.

Notation

| Term | Description |
| --- | --- |
| \(x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}\) | the data values sorted in ascending order |
| \(x_i,\ f_i\) | an unsorted value and its frequency weight, when a frequency column is supplied |
| \(n\) | the number of data values (unweighted case) |
| \(N\) | the effective sample size: \(N = n\) without frequencies, or \(N = \operatorname{round}\!\left(\sum_i f_i\right)\) with a frequency column |
| \(p\) | a probability in \([0,1]\) (for example \(p = 0.25\) for the first quartile) |
| \(Q(p)\) | the sample quantile at probability \(p\) (defined below) |
| \(\lfloor\,\cdot\,\rfloor,\ \lceil\,\cdot\,\rceil\) | the floor and ceiling functions |

A box plot is produced only when \(N \ge 4\); otherwise the box is skipped as having insufficient data.
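The sample-size rule can be sketched as follows. This is an illustration under stated assumptions, not Quantum XL's code: the function names are hypothetical, and Python's built-in `round` (which rounds ties to even) stands in for a rounding rule that this page does not specify.

```python
def effective_sample_size(values, freqs=None):
    """Return N: the count of values, or the rounded frequency total."""
    if freqs is None:
        return len(values)
    # Assumption: ties round to even (Python's default); the page only says "round".
    return round(sum(freqs))


def has_enough_data(values, freqs=None, minimum=4):
    """True when a box would be drawn under the N >= 4 rule."""
    return effective_sample_size(values, freqs) >= minimum
```

For instance, two distinct values with frequencies 2 and 2 give \(N = 4\) and pass the check, while three unweighted values do not.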

Quantile \(Q(p)\)

Every positional value on the chart (quartiles, median, custom-percentile box edges, and the min/max endpoints) comes from one quantile estimator: the Hyndman–Fan Type 8 definition, which is approximately median-unbiased (Hyndman & Fan, 1996).

For unweighted data, let

\[ h = \left(n + \tfrac{1}{3}\right)p + \tfrac{1}{3} \]
\[ Q(p) = x_{(\lfloor h \rfloor)} + \left(h - \lfloor h \rfloor\right)\left(x_{(\lfloor h \rfloor + 1)} - x_{(\lfloor h \rfloor)}\right) \]

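The rank \(h\) is clamped to the range \([1, n]\), so the index \(\lfloor h \rfloor\) always points at a real data value; when \(h = n\) the interpolation weight is zero and \(x_{(n+1)}\) is never needed. The boundary probabilities therefore return the extreme values exactly: \(Q(0) = x_{(1)} = \min\) and \(Q(1) = x_{(n)} = \max\).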

| Term | Description |
| --- | --- |
| \(h\) | the fractional rank (a position, \(1 \le h \le n\)) of the requested quantile within the sorted values |
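The unweighted formula above can be sketched in a few lines of Python. The function name `type8_quantile` is hypothetical, and the clamping is implemented by limiting \(h\) to \([1, n]\), which is one reading of the rule stated above rather than a confirmed detail of Quantum XL.

```python
import math


def type8_quantile(values, p):
    """Hyndman-Fan Type 8 quantile of unweighted data (illustrative sketch)."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one value")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    # Fractional 1-based rank, clamped so that Q(0) = min and Q(1) = max.
    h = (n + 1.0 / 3.0) * p + 1.0 / 3.0
    h = min(max(h, 1.0), float(n))
    lo = math.floor(h)      # 1-based index of the lower neighbour
    hi = min(lo + 1, n)     # upper neighbour, kept inside the data
    frac = h - lo           # interpolation weight in [0, 1)
    return xs[lo - 1] + frac * (xs[hi - 1] - xs[lo - 1])
```

For the eight values 1 through 8 and \(p = 0.25\), the rank is \(h = 8.333 \times 0.25 + 0.333 \approx 2.417\), so the sketch returns about 2.417, between the second and third sorted values.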

Frequency-weighted data. When a frequency column is supplied, the same Type 8 rank is applied to a virtual sample of size \(N = \operatorname{round}\!\left(\sum_i f_i\right)\), without physically expanding the data:

\[ h = \left(N + \tfrac{1}{3}\right)p + \tfrac{1}{3}, \qquad h_0 = h - 1 \]

Let \(F_k = \sum_{i \le k} f_{(i)}\) be the cumulative frequency over the sorted values, where \(f_{(i)}\) is the frequency attached to \(x_{(i)}\), and let \(V(t)\) be the value at zero-based expanded position \(t\), that is, the first sorted value \(x_{(k)}\) for which \(t < F_k\). With \(t_L = \lfloor h_0 \rfloor\) and \(t_U = \lceil h_0 \rceil\) (each clamped to \([0, N-1]\)),

\[ Q(p) = V(t_L) + \left(h_0 - t_L\right)\left(V(t_U) - V(t_L)\right) \]

For integer frequencies, this returns exactly what expanding each value \(f_i\) times and applying the unweighted formula would produce.
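A minimal sketch of the weighted lookup, assuming non-negative frequencies with a positive total. The name `weighted_type8_quantile` is hypothetical; the cumulative-frequency search uses `bisect_right` to find the first \(k\) with \(t < F_k\), which is one way to evaluate \(V(t)\) without expanding the data.

```python
import math
from bisect import bisect_right
from itertools import accumulate


def weighted_type8_quantile(values, freqs, p):
    """Type 8 quantile over a virtual expanded sample (illustrative sketch)."""
    pairs = sorted(zip(values, freqs))
    xs = [v for v, _ in pairs]
    cum = list(accumulate(f for _, f in pairs))   # F_k over the sorted values
    big_n = round(cum[-1])                        # effective sample size N
    if big_n < 1:
        raise ValueError("frequencies must sum to at least 1")

    def value_at(t):
        # V(t): first sorted value whose cumulative frequency exceeds t.
        return xs[bisect_right(cum, t)]

    h0 = (big_n + 1.0 / 3.0) * p + 1.0 / 3.0 - 1.0   # zero-based rank
    t_lo = min(max(math.floor(h0), 0), big_n - 1)
    t_hi = min(max(math.ceil(h0), 0), big_n - 1)
    v_lo, v_hi = value_at(t_lo), value_at(t_hi)
    # At the clamped ends t_lo == t_hi, so the difference term vanishes.
    return v_lo + (h0 - t_lo) * (v_hi - v_lo)
```

With values (3, 1, 2) and frequencies (2, 3, 1), the virtual sample is 1, 1, 1, 2, 3, 3 and the median comes out as 1.5, the same result the unweighted formula gives on the expanded list.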

Median, quartiles, and range

| Quantity | Formula | Used by |
| --- | --- | --- |
| First quartile | \(Q_1 = Q(0.25)\) | lower box edge (default), fences |
| Median | \(\text{med} = Q(0.5)\) | center line drawn across the box |
| Third quartile | \(Q_3 = Q(0.75)\) | upper box edge (default), fences |
| Minimum | \(Q(0) = x_{(1)}\) | red X marker (when Display Dataset Minimum is on) |
| Maximum | \(Q(1) = x_{(n)}\) | red X marker (when Display Dataset Maximum is on) |

Box edges

The box edges depend on the Box Height option.

| Box Height option | Lower edge | Upper edge |
| --- | --- | --- |
| Interquartile Range (default) | \(Q(0.25)\) | \(Q(0.75)\) |
| Custom Percentile \(P\) | \(Q\!\left(\dfrac{P}{100}\right)\) | \(Q\!\left(1 - \dfrac{P}{100}\right)\) |

\(P\) is the integer percentile entered in the dialog (5 to 45). For example, \(P = 10\) gives a box from the 10th to the 90th percentile.
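The option logic can be sketched as a small selector. The function name, the option strings `"iqr"` and `"custom"`, and the idea of passing the quantile estimator in as a callable are all assumptions made for illustration; they are not Quantum XL identifiers.

```python
def box_edges(quantile, box_height="iqr", percentile=25):
    """Return (lower_edge, upper_edge) for the chosen Box Height option.

    `quantile` is any callable mapping a probability p in [0, 1] to Q(p).
    """
    if box_height == "iqr":
        return quantile(0.25), quantile(0.75)
    if box_height == "custom":
        # The dialog accepts integer percentiles from 5 to 45.
        if not (isinstance(percentile, int) and 5 <= percentile <= 45):
            raise ValueError("custom percentile must be an integer from 5 to 45")
        return quantile(percentile / 100), quantile(1 - percentile / 100)
    raise ValueError("unknown box height option: " + repr(box_height))
```

Passing the estimator as an argument keeps the sketch independent of whether the data are weighted or unweighted.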

Interquartile range

\[ \text{IQR} = Q_3 - Q_1 = Q(0.75) - Q(0.25) \]

The IQR is always computed from the true quartiles, even when Box Height is set to Custom Percentile, so the fences below do not move with the box height.

Fences

Fences are the thresholds used to place whiskers and to classify outliers.

\[ \text{inner fences} = \left[\, Q_1 - 1.5\,\text{IQR},\ \ Q_3 + 1.5\,\text{IQR} \,\right] \]
\[ \text{outer fences} = \left[\, Q_1 - 3\,\text{IQR},\ \ Q_3 + 3\,\text{IQR} \,\right] \]

The fences are not drawn directly; they feed the whisker and outlier rules.
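As a quick illustration of the IQR and fence formulas, the sketch below takes the true quartiles as inputs and returns both fence pairs. The function name and the dictionary keys are hypothetical.

```python
def fences(q1, q3):
    """Return the inner and outer fences computed from the true quartiles."""
    iqr = q3 - q1
    return {
        "inner": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
        "outer": (q1 - 3.0 * iqr, q3 + 3.0 * iqr),
    }
```

With \(Q_1 = 2\) and \(Q_3 = 6\), the IQR is 4, so the inner fences are \([-4, 12]\) and the outer fences are \([-10, 18]\).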

Whiskers

The whiskers depend on the Whiskers option.

Lowest/Highest Datum Within Inner Fence (default). The whiskers reach the most extreme actual values that lie within the inner fences:

\[ \text{lower whisker} = \min\{\, x_i : x_i \ge Q_1 - 1.5\,\text{IQR} \,\} \]
\[ \text{upper whisker} = \max\{\, x_i : x_i \le Q_3 + 1.5\,\text{IQR} \,\} \]

The whiskers are then clamped so they never fall inside the box: if the lower whisker exceeds the lower box edge it is set to the lower box edge, and if the upper whisker is below the upper box edge it is set to the upper box edge. If no value satisfies a condition, that whisker falls back to the corresponding box edge.

Minimum/Maximum Data Point. The whiskers reach the full data range, and no outliers are identified:

\[ \text{lower whisker} = x_{(1)} = \min, \qquad \text{upper whisker} = x_{(n)} = \max \]
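Both whisker options can be sketched together. The function name and the option strings `"inner_fence"` and `"min_max"` are assumptions; the quartiles and box edges are passed in separately because the box edges may come from a custom percentile while the fences always use the true quartiles.

```python
def whiskers(values, q1, q3, box_lo, box_hi, option="inner_fence"):
    """Return (lower_whisker, upper_whisker) for the chosen Whiskers option."""
    if option == "min_max":
        return min(values), max(values)
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Most extreme actual data values inside the inner fences;
    # fall back to the box edge if no value qualifies.
    lower = min((x for x in values if x >= lo_fence), default=box_lo)
    upper = max((x for x in values if x <= hi_fence), default=box_hi)
    # Clamp so that a whisker never ends inside the box.
    return min(lower, box_lo), max(upper, box_hi)
```

For the data 1, 2, 3, 4, 5, 6, 7, 100 with quartiles and box edges of 2.5 and 6.5, the inner fences are \(-3.5\) and \(12.5\), so the whiskers end at 1 and 7 and the value 100 is left for the outlier rules.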

Outlier classification

Outliers are identified only under the default Lowest/Highest Datum Within Inner Fence whisker option. A value \(x\) is classified by which fences it falls beyond:

\[ \text{suspected outlier:}\quad Q_1 - 3\,\text{IQR} \le x < Q_1 - 1.5\,\text{IQR} \quad\text{or}\quad Q_3 + 1.5\,\text{IQR} < x \le Q_3 + 3\,\text{IQR} \]
\[ \text{outlier:}\quad x < Q_1 - 3\,\text{IQR} \quad\text{or}\quad x > Q_3 + 3\,\text{IQR} \]

Suspected outliers (between the inner and outer fences) are drawn as open red circles; outliers (beyond the outer fences) are drawn as filled red circles. Each distinct value is marked once, regardless of how many times it occurs.
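The two inequalities can be sketched as a classifier plus a helper that marks each distinct value once. The function names and label strings are hypothetical; the boundary handling follows the strict and non-strict inequalities given above.

```python
def classify(x, q1, q3):
    """Label one value relative to the inner and outer fences."""
    iqr = q3 - q1
    if x < q1 - 3.0 * iqr or x > q3 + 3.0 * iqr:
        return "outlier"
    if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr:
        return "suspected outlier"
    return "within fences"


def outlier_markers(values, q1, q3):
    """Map each distinct flagged value to its label (one marker per value)."""
    markers = {}
    for x in sorted(set(values)):
        label = classify(x, q1, q3)
        if label != "within fences":
            markers[x] = label
    return markers
```

A value lying exactly on an inner fence stays within the fences, and a value lying exactly on an outer fence is still only a suspected outlier.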

Mean marker

When Display Mean is on, an orange diamond is drawn at the frequency-weighted arithmetic mean:

\[ \bar{x} = \frac{\sum_i f_i\, x_i}{\sum_i f_i} \]

Without a frequency column, every \(f_i = 1\) and this reduces to \(\bar{x} = \frac{1}{n}\sum_i x_i\).
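A short sketch of the mean marker's value, with a hypothetical function name. Omitting the frequencies treats every \(f_i\) as 1, matching the unweighted case.

```python
def weighted_mean(values, freqs=None):
    """Frequency-weighted arithmetic mean; unweighted when freqs is None."""
    if freqs is None:
        freqs = [1] * len(values)
    total = sum(freqs)
    if total == 0:
        raise ValueError("frequencies must not sum to zero")
    return sum(f * x for x, f in zip(values, freqs)) / total
```

For values (1, 2, 3) with frequencies (2, 3, 1), the result is \(11/6 \approx 1.833\); without frequencies it is 2.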

References

  • Tukey, J. W. (1977). Exploratory Data Analysis. Reading, MA: Addison-Wesley.
  • McGill, R., Tukey, J. W., & Larsen, W. A. (1978). Variations of box plots. The American Statistician, 32(1), 12–16.
  • Hoaglin, D. C., Mosteller, F., & Tukey, J. W. (Eds.). (1983). Understanding Robust and Exploratory Data Analysis. New York: John Wiley & Sons.
  • Frigge, M., Hoaglin, D. C., & Iglewicz, B. (1989). Some implementations of the boxplot. The American Statistician, 43(1), 50–54.
  • Hyndman, R. J., & Fan, Y. (1996). Sample quantiles in statistical packages. The American Statistician, 50(4), 361–365.
  • Montgomery, D. C. (2013). Introduction to Statistical Quality Control (7th ed.). Hoboken, NJ: John Wiley & Sons.