Math Details

This page gives the exact formulas Quantum XL uses to fit the Johnson transformation. Each equation lists what it computes and where it appears in the output.

Notation

Term Description
\(x\) an original data value; \(z\) its transformed value (target: standard normal)
\(\gamma, \delta\) shape parameters (\(\delta > 0\))
\(\xi\) location parameter
\(\lambda\) scale parameter (\(\lambda > 0\))
\(x_1 \le x_2 \le x_3 \le x_4\) four sample quantiles read at the standard-normal scores \(-3z, -z, +z, +3z\); here \(z > 0\) is the chosen quantile spacing, not a transformed value
\(x_L, x_M, x_U\) the quantile gaps \(x_L = x_2 - x_1\), \(x_M = x_3 - x_2\), \(x_U = x_4 - x_3\)
\(n\) sample size
\(\Phi\) standard normal cumulative distribution function
\(Q(p)\) sample quantile of the data at cumulative probability \(p\)

The transformation

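Quantum XL fits one of three Johnson families. Each maps a data value \(x\) to a value \(z\) that is approximately standard normal, through a family-specific function \(g\) (\(g(y) = \ln y\) for \(S_L\), \(\ln\frac{y}{1 - y}\) for \(S_B\), \(\sinh^{-1} y\) for \(S_U\)):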

\[ z = \gamma + \delta\, g\!\left(\frac{x - \xi}{\lambda}\right) \]
Family Transformation Valid range of \(x\)
\(S_L\) (log-normal) \(z = \gamma + \delta \ln(x - \xi)\) \(x > \xi\)
\(S_B\) (bounded) \(z = \gamma + \delta \ln\!\dfrac{x - \xi}{\xi + \lambda - x}\) \(\xi < x < \xi + \lambda\)
\(S_U\) (unbounded) \(z = \gamma + \delta \sinh^{-1}\!\dfrac{x - \xi}{\lambda}\) all real \(x\)

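For \(S_L\) the parameter \(\lambda\) is not used: because \(\delta \ln\frac{x - \xi}{\lambda} = \delta \ln(x - \xi) - \delta \ln \lambda\), a scale factor only shifts \(\gamma\). Used by: each original value is replaced by its transformed value \(z\), and the transformed data are then treated as normal.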
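As a quick check of the table, the three mappings can be sketched in Python. This is a minimal illustration, not Quantum XL code: the function name `johnson_transform` and the family labels `"SL"`, `"SB"`, `"SU"` are hypothetical, and values outside a family's valid range simply raise an error.

```python
import math


def johnson_transform(x, family, gamma, delta, xi, lam=None):
    """Return z = gamma + delta * g((x - xi) / lam) for one data value x.

    family is a hypothetical label: "SL", "SB" or "SU".
    For "SL" the scale lam is ignored (it only shifts gamma).
    """
    if family == "SL":
        if x <= xi:
            raise ValueError("S_L needs x > xi")
        return gamma + delta * math.log(x - xi)
    if family == "SB":
        if not xi < x < xi + lam:
            raise ValueError("S_B needs xi < x < xi + lam")
        return gamma + delta * math.log((x - xi) / (xi + lam - x))
    if family == "SU":
        return gamma + delta * math.asinh((x - xi) / lam)
    raise ValueError(f"unknown family {family!r}")


# Usage: transform a small sample with illustrative S_U parameters.
sample = [1.0, 3.0, 7.5]
print([johnson_transform(v, "SU", 0.2, 1.1, 3.0, 2.0) for v in sample])
```

Raising on out-of-range values keeps the sketch honest about each family's support; a production tool would need its own policy for such points.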

Selecting the family and the quantiles

For a chosen spacing \(z\), four sample quantiles are read at standard-normal scores \(-3z, -z, +z, +3z\):

\[ x_1 = Q\!\left(\Phi(-3z)\right), \quad x_2 = Q\!\left(\Phi(-z)\right), \quad x_3 = Q\!\left(\Phi(z)\right), \quad x_4 = Q\!\left(\Phi(3z)\right) \]
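The page does not say which sample-quantile definition \(Q\) uses. The sketch below assumes linear interpolation between order statistics (the same rule as NumPy's default for `numpy.quantile`) and takes \(\Phi\) from Python's `statistics.NormalDist`; the helper names are hypothetical.

```python
from statistics import NormalDist


def sample_quantile(sorted_data, prob):
    """Linear interpolation between order statistics (assumed definition of Q)."""
    h = (len(sorted_data) - 1) * prob
    lo = int(h)  # floor, because h >= 0
    hi = min(lo + 1, len(sorted_data) - 1)
    return sorted_data[lo] + (h - lo) * (sorted_data[hi] - sorted_data[lo])


def johnson_quantiles(data, z):
    """Return (x1, x2, x3, x4), read at the standard-normal scores -3z, -z, +z, +3z."""
    phi = NormalDist().cdf
    s = sorted(data)
    return tuple(sample_quantile(s, phi(k * z)) for k in (-3, -1, 1, 3))


def quantile_gaps(x1, x2, x3, x4):
    """Return the gaps (x_L, x_M, x_U)."""
    return x2 - x1, x3 - x2, x4 - x3


# Usage: for the symmetric sample 0..100 the lower and upper gaps match.
x1, x2, x3, x4 = johnson_quantiles(range(101), 0.5)
print(quantile_gaps(x1, x2, x3, x4))
```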

The candidate family is first screened with the quantile ratio:

\[ QR = \frac{x_L\, x_U}{x_M^{2}} \]

The quantile ratio picks the bounded/unbounded candidate: \(QR < 1\) makes \(S_B\) the candidate, and \(QR \ge 1\) makes \(S_U\) the candidate. The log-normal \(S_L\) is always evaluated alongside. Quantum XL sweeps \(z\) from \(0.25\) to \(1.25\) (in steps of \(0.01\)) and selects the family (\(S_L\), \(S_B\), or \(S_U\)) and spacing \(z\) whose transformed data has the largest Anderson–Darling normality p-value (see Goodness of fit).
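A sketch of the screening and the sweep, with hypothetical names throughout. `select_best` takes two stand-in callables, one returning the quantile gaps for a spacing \(z\) and one returning the Anderson–Darling p-value of a fitted family, so that only the selection logic described above is shown.

```python
def quantile_ratio(x_L, x_M, x_U):
    return x_L * x_U / x_M ** 2


def candidate_families(x_L, x_M, x_U):
    """S_L is always tried; QR picks S_B (QR < 1) or S_U (QR >= 1)."""
    return ["SL", "SB" if quantile_ratio(x_L, x_M, x_U) < 1 else "SU"]


def spacing_grid(start=0.25, stop=1.25, step=0.01):
    """z = 0.25, 0.26, ..., 1.25; rounding to 2 decimals assumes the 0.01 step."""
    count = round((stop - start) / step) + 1
    return [round(start + k * step, 2) for k in range(count)]


def select_best(gaps_for, p_value_for, z_grid):
    """Return (p, family, z) with the largest p-value.

    gaps_for(z) -> (x_L, x_M, x_U) and p_value_for(family, z) -> p are
    hypothetical callables standing in for the quantile and fitting steps.
    """
    best = None
    for z in z_grid:
        for family in candidate_families(*gaps_for(z)):
            p = p_value_for(family, z)
            if best is None or p > best[0]:
                best = (p, family, z)
    return best
```

Passing the fitting and scoring steps in as callables keeps the selection rule separate from the estimation formulas that follow.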

Parameter estimation

The four parameters are estimated in closed form from the quantile gaps (Slifker–Shapiro method). \(\sinh^{-1}\) and \(\cosh^{-1}\) are the inverse hyperbolic sine and cosine.

\(S_L\) (log-normal)

\[ \delta = \frac{2z}{\ln(x_U/x_M)}, \qquad \gamma = \delta \ln\!\frac{x_U/x_M - 1}{\sqrt{x_M\, x_U}}, \qquad \xi = \frac{x_2 + x_3}{2} - \frac{x_M}{2}\cdot\frac{x_U/x_M + 1}{x_U/x_M - 1} \]
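The \(S_L\) estimates translate directly into code. Because \(\delta = 2z/\ln(x_U/x_M)\) is positive only when \(x_U > x_M\), this hypothetical `fit_sl` rejects other inputs; the page does not say how Quantum XL handles that case. Only \(x_2, x_3, x_4\) enter the formulas.

```python
import math


def fit_sl(x2, x3, x4, z):
    """Closed-form S_L fit; returns (gamma, delta, xi). x1 is not needed."""
    x_M, x_U = x3 - x2, x4 - x3
    r = x_U / x_M
    if r <= 1:
        # Sketch assumption: require x_U > x_M so that delta is positive.
        raise ValueError("closed-form S_L fit needs x_U > x_M")
    delta = 2 * z / math.log(r)
    gamma = delta * math.log((r - 1) / math.sqrt(x_M * x_U))
    xi = (x2 + x3) / 2 - (x_M / 2) * (r + 1) / (r - 1)
    return gamma, delta, xi


# Round trip: quantiles of a known S_L curve should give its parameters back.
def x_at(t, g=0.5, d=1.5, xi=2.0):
    """Inverse of z = g + d*ln(x - xi)."""
    return xi + math.exp((t - g) / d)


z = 0.6
print(fit_sl(x_at(-z), x_at(z), x_at(3 * z), z))  # approximately (0.5, 1.5, 2.0)
```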

\(S_B\) (bounded)

\[ \delta = \frac{z}{\cosh^{-1}\!\left(\tfrac{1}{2}\sqrt{(1 + x_M/x_U)(1 + x_M/x_L)}\right)} \]
\[ \gamma = \delta\, \sinh^{-1}\!\left(\frac{(x_M/x_L - x_M/x_U)\,\sqrt{(1 + x_M/x_U)(1 + x_M/x_L) - 4}}{2\left(\dfrac{x_M^{2}}{x_L x_U} - 1\right)}\right) \]
\[ \lambda = \frac{x_M\,\sqrt{\left((1 + x_M/x_U)(1 + x_M/x_L) - 2\right)^{2} - 4}}{\dfrac{x_M^{2}}{x_L x_U} - 1}, \qquad \xi = \frac{x_2 + x_3}{2} - \frac{\lambda}{2} + \frac{x_M\left(x_M/x_L - x_M/x_U\right)}{2\left(\dfrac{x_M^{2}}{x_L x_U} - 1\right)} \]
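A direct transcription of the \(S_B\) formulas (hypothetical function `fit_sb`). The guard reflects that the shared denominator \(x_M^2/(x_L x_U) - 1 = 1/QR - 1\) is positive only when \(QR < 1\), the condition under which \(S_B\) is the candidate. Since the four formulas solve exactly for the four quantiles, quantiles generated from known parameters should recover them.

```python
import math


def fit_sb(x1, x2, x3, x4, z):
    """Closed-form S_B fit; returns (gamma, delta, xi, lam). Needs QR < 1."""
    x_L, x_M, x_U = x2 - x1, x3 - x2, x4 - x3
    mu, ml = x_M / x_U, x_M / x_L           # x_M/x_U and x_M/x_L
    prod = (1 + mu) * (1 + ml)
    denom = x_M ** 2 / (x_L * x_U) - 1      # equals 1/QR - 1
    if denom <= 0:
        raise ValueError("S_B formulas need QR < 1")
    delta = z / math.acosh(0.5 * math.sqrt(prod))
    gamma = delta * math.asinh((ml - mu) * math.sqrt(prod - 4) / (2 * denom))
    lam = x_M * math.sqrt((prod - 2) ** 2 - 4) / denom
    xi = (x2 + x3) / 2 - lam / 2 + x_M * (ml - mu) / (2 * denom)
    return gamma, delta, xi, lam


def x_at(t, g=0.3, d=0.9, xi=1.0, lam=4.0):
    """Inverse of the S_B map: a logistic curve between xi and xi + lam."""
    return xi + lam / (1 + math.exp(-(t - g) / d))


z = 0.5
print(fit_sb(*(x_at(k * z) for k in (-3, -1, 1, 3)), z))  # approximately (0.3, 0.9, 1.0, 4.0)
```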

\(S_U\) (unbounded)

\[ \delta = \frac{2z}{\cosh^{-1}\!\left(\tfrac{1}{2}\left(x_U/x_M + x_L/x_M\right)\right)}, \qquad \gamma = \delta\,\sinh^{-1}\!\left(\frac{x_L/x_M - x_U/x_M}{2\sqrt{\dfrac{x_U x_L}{x_M^{2}} - 1}}\right) \]
\[ \lambda = \frac{2\,x_M\sqrt{\dfrac{x_U x_L}{x_M^{2}} - 1}}{\left(x_U/x_M + x_L/x_M - 2\right)\sqrt{x_U/x_M + x_L/x_M + 2}}, \qquad \xi = \frac{x_2 + x_3}{2} + \frac{x_M\left(x_L/x_M - x_U/x_M\right)}{2\left(x_U/x_M + x_L/x_M - 2\right)} \]
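The \(S_U\) formulas in the same style (hypothetical `fit_su`). At exactly \(QR = 1\) the square root \(\sqrt{x_U x_L / x_M^{2} - 1}\) is zero and \(\gamma\) and \(\lambda\) degenerate; the page assigns \(QR \ge 1\) to \(S_U\) without describing that boundary, so this sketch simply rejects it.

```python
import math


def fit_su(x1, x2, x3, x4, z):
    """Closed-form S_U fit; returns (gamma, delta, xi, lam). Needs QR > 1."""
    x_L, x_M, x_U = x2 - x1, x3 - x2, x4 - x3
    ru, rl = x_U / x_M, x_L / x_M           # x_U/x_M and x_L/x_M
    if ru * rl <= 1:
        raise ValueError("S_U formulas need QR > 1")
    root = math.sqrt(ru * rl - 1)
    s = ru + rl                             # > 2 whenever QR > 1
    delta = 2 * z / math.acosh(0.5 * s)
    gamma = delta * math.asinh((rl - ru) / (2 * root))
    lam = 2 * x_M * root / ((s - 2) * math.sqrt(s + 2))
    xi = (x2 + x3) / 2 + x_M * (rl - ru) / (2 * (s - 2))
    return gamma, delta, xi, lam


def x_at(t, g=-0.4, d=1.2, xi=5.0, lam=2.0):
    """Inverse of the S_U map."""
    return xi + lam * math.sinh((t - g) / d)


z = 0.7
print(fit_su(*(x_at(k * z) for k in (-3, -1, 1, 3)), z))  # approximately (-0.4, 1.2, 5.0, 2.0)
```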

Goodness of fit

Each candidate transformation is scored by the Anderson–Darling normality statistic on the sorted transformed values \(z_{(1)} \le \dots \le z_{(n)}\):

\[ A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i - 1)\left[\ln \Phi(z_{(i)}) + \ln\!\left(1 - \Phi(z_{(n+1-i)})\right)\right] \]

with Stephens' finite-sample correction \(A^{*2} = A^2\left(1 + \tfrac{0.75}{n} + \tfrac{2.25}{n^2}\right)\) and a p-value from Stephens' approximation. The family and spacing \(z\) that maximize this p-value are selected; the unadjusted \(A^2\) is reported. If the original (untransformed) data already has an Anderson–Darling p-value above \(0.1\), Quantum XL notes that the data may already be normal.
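The scoring step can be sketched as below. Two assumptions: \(\Phi\) is applied directly to the transformed values, as in the formula for \(A^2\) (no re-standardization), and the p-value uses the piecewise approximation commonly paired with \(A^{*2}\) (for example in R's `nortest::ad.test`), since the page does not list Quantum XL's coefficients. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def anderson_darling_a2(z_values):
    """Unadjusted A^2 of the values against the standard normal distribution."""
    cdf = NormalDist().cdf
    zs = sorted(z_values)
    n = len(zs)
    total = 0.0
    for i in range(1, n + 1):  # i is the 1-based rank, as in the formula
        lower = math.log(cdf(zs[i - 1]))   # ln Phi(z_(i))
        upper = math.log(cdf(-zs[n - i]))  # ln(1 - Phi(z_(n+1-i))), using symmetry
        total += (2 * i - 1) * (lower + upper)
    return -n - total / n


def ad_p_value(a2, n):
    """p-value for an unadjusted A^2 from n values (assumed coefficients)."""
    a = a2 * (1 + 0.75 / n + 2.25 / n ** 2)  # A*^2, the finite-sample correction
    if a < 0.2:
        return 1 - math.exp(-13.436 + 101.14 * a - 223.73 * a ** 2)
    if a < 0.34:
        return 1 - math.exp(-8.318 + 42.796 * a - 59.938 * a ** 2)
    if a < 0.6:
        return math.exp(0.9177 - 4.279 * a - 1.38 * a ** 2)
    a = min(a, 10.0)  # stop the last quadratic from turning upward for huge A*^2
    return math.exp(1.2937 - 5.709 * a + 0.0186 * a ** 2)


def may_already_be_normal(p_original):
    """The page's rule: flag untransformed data whose p-value exceeds 0.1."""
    return p_original > 0.1


z_vals = [-1.2, -0.3, 0.1, 0.4, 1.5]
a2 = anderson_darling_a2(z_vals)
print(a2, ad_p_value(a2, len(z_vals)))
```

Computing \(1 - \Phi(z)\) as \(\Phi(-z)\) avoids taking the log of a value that rounds to zero in the upper tail.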

References

  • Johnson, N. L. (1949). Systems of frequency curves generated by methods of translation. Biometrika, 36(1/2), 149–176.
  • Slifker, J. F., & Shapiro, S. S. (1980). The Johnson system: Selection and parameter estimation. Technometrics, 22(2), 239–246.
  • Chou, Y.-M., Polansky, A. M., & Mason, R. L. (1998). Transforming non-normal data to normality in statistical process control. Journal of Quality Technology, 30(2), 133–141.
  • Anderson, T. W., & Darling, D. A. (1952). Asymptotic theory of certain goodness-of-fit criteria based on stochastic processes. Annals of Mathematical Statistics, 23(2), 193–212.
  • Stephens, M. A. (1974). EDF statistics for goodness of fit and some comparisons. Journal of the American Statistical Association, 69(347), 730–737.