Math Details
This page gives the exact formulas Quantum XL uses to fit the Johnson transformation. Each equation lists what it computes and where it appears in the output.
Notation
| Term | Description |
|---|---|
| \(x\) | an original data value |
| \(z\) | the transformed value of \(x\) (target: standard normal) |
| \(\gamma, \delta\) | shape parameters (\(\delta > 0\)) |
| \(\xi\) | location parameter |
| \(\lambda\) | scale parameter (\(\lambda > 0\)) |
| \(x_1 \le x_2 \le x_3 \le x_4\) | four sample quantiles read at the standard-normal scores \(-3z, -z, +z, +3z\), where \(z > 0\) here denotes the chosen quantile spacing rather than a transformed value |
| \(x_L, x_M, x_U\) | the quantile gaps \(x_L = x_2 - x_1\), \(x_M = x_3 - x_2\), \(x_U = x_4 - x_3\) |
| \(n\) | sample size |
| \(\Phi\) | standard normal cumulative distribution function |
The transformation
Quantum XL fits one of three Johnson families, each mapping a data value \(x\) to a value \(z\) that is approximately standard normal:
| Family | Transformation | Valid range of \(x\) |
|---|---|---|
| \(S_L\) (log-normal) | \(z = \gamma + \delta \ln(x - \xi)\) | \(x > \xi\) |
| \(S_B\) (bounded) | \(z = \gamma + \delta \ln\!\dfrac{x - \xi}{\xi + \lambda - x}\) | \(\xi < x < \xi + \lambda\) |
| \(S_U\) (unbounded) | \(z = \gamma + \delta \sinh^{-1}\!\dfrac{x - \xi}{\lambda}\) | all real \(x\) |
For \(S_L\) the parameter \(\lambda\) is not used.
Used by: each original value \(x\) is replaced by its transformed value \(z\); the transformed data are then treated as normal.
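As an illustration of the table above, the three mappings can be written as small Python functions. This is a minimal sketch: the function names and the choice to raise `ValueError` outside the valid range are illustrative assumptions, not part of Quantum XL.

```python
import math


def johnson_sl(x, gamma, delta, xi):
    """S_L: z = gamma + delta * ln(x - xi), valid for x > xi."""
    if x <= xi:
        raise ValueError("S_L requires x > xi")
    return gamma + delta * math.log(x - xi)


def johnson_sb(x, gamma, delta, xi, lam):
    """S_B: z = gamma + delta * ln((x - xi) / (xi + lam - x)), valid for xi < x < xi + lam."""
    if not (xi < x < xi + lam):
        raise ValueError("S_B requires xi < x < xi + lam")
    return gamma + delta * math.log((x - xi) / (xi + lam - x))


def johnson_su(x, gamma, delta, xi, lam):
    """S_U: z = gamma + delta * asinh((x - xi) / lam), valid for all real x."""
    return gamma + delta * math.asinh((x - xi) / lam)
```

For example, `johnson_sb(0.5, 0.0, 1.0, 0.0, 1.0)` evaluates the bounded family at the midpoint of its range, where the log-ratio term vanishes and the result equals \(\gamma\).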
Selecting the family and the quantiles
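For a chosen spacing \(z > 0\), the four sample quantiles \(x_1, x_2, x_3, x_4\) are read at the standard-normal scores \(-3z, -z, +z, +3z\), that is, at the cumulative probabilities \(\Phi(-3z), \Phi(-z), \Phi(z), \Phi(3z)\).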
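The family is chosen from the quantile ratio \(QR\), which in the Slifker–Shapiro method is formed from the quantile gaps:
\[ QR = \frac{x_L \, x_U}{x_M^2} \]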
The quantile ratio picks the bounded/unbounded candidate: \(QR < 1\) makes \(S_B\) the candidate, and \(QR \ge 1\) makes \(S_U\) the candidate. The log-normal \(S_L\) is always evaluated alongside. Quantum XL sweeps \(z\) from \(0.25\) to \(1.25\) (in steps of \(0.01\)) and selects the family (\(S_L\), \(S_B\), or \(S_U\)) and spacing \(z\) whose transformed data has the largest Anderson–Darling normality p-value (see Goodness of fit).
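The quantile step for a single spacing can be sketched in Python as follows. Two things here are assumptions rather than documented Quantum XL behavior: the linear-interpolation quantile rule in `sample_quantile`, and the Slifker–Shapiro form of the ratio, \(x_L x_U / x_M^2\). The function names are illustrative.

```python
from statistics import NormalDist


def sample_quantile(sorted_data, p):
    """Quantile at probability p by linear interpolation (an assumed convention)."""
    n = len(sorted_data)
    pos = p * (n - 1)
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return sorted_data[lo] + frac * (sorted_data[hi] - sorted_data[lo])


def candidate_family(data, z):
    """Return (candidate, QR, quantiles) for one spacing z > 0."""
    xs = sorted(data)
    phi = NormalDist().cdf
    # Quantiles at the standard-normal scores -3z, -z, +z, +3z.
    x1, x2, x3, x4 = (sample_quantile(xs, phi(s)) for s in (-3 * z, -z, z, 3 * z))
    x_l, x_m, x_u = x2 - x1, x3 - x2, x4 - x3
    qr = x_l * x_u / x_m ** 2  # assumed Slifker-Shapiro quantile ratio
    family = "SB" if qr < 1 else "SU"
    return family, qr, (x1, x2, x3, x4)
```

A full selection would repeat this for each \(z\) in the sweep, fit both the candidate and \(S_L\), and keep the fit with the largest Anderson–Darling p-value; that outer loop is omitted here.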
Parameter estimation
The four parameters are estimated in closed form from the quantile gaps (Slifker–Shapiro method). \(\sinh^{-1}\) and \(\cosh^{-1}\) are the inverse hyperbolic sine and cosine.
\(S_L\) (log-normal)
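For reference, the published Slifker–Shapiro (1980) estimates for \(S_L\), transcribed into this page's notation, are given below. They are assumed, not confirmed, to be the expressions Quantum XL evaluates; \(z\) here is the quantile spacing, and the form requires \(x_U / x_M > 1\).

\[
\begin{aligned}
\delta &= \frac{2z}{\ln\!\left(x_U / x_M\right)}, \\
\gamma &= \delta \, \ln\!\left[\frac{x_U / x_M - 1}{x_M \left(x_U / x_M\right)^{1/2}}\right], \\
\xi &= \frac{x_2 + x_3}{2} - \frac{x_M}{2} \cdot \frac{x_U / x_M + 1}{x_U / x_M - 1}.
\end{aligned}
\]

A minimal Python sketch of the same expressions follows; the name `fit_sl` and its argument order are illustrative choices.

```python
import math


def fit_sl(x1, x2, x3, x4, z):
    """Closed-form S_L estimates (gamma, delta, xi) from four quantiles.

    x1 is accepted for a uniform signature but is not used by this family.
    """
    x_m = x3 - x2
    x_u = x4 - x3
    r = x_u / x_m  # must exceed 1 for the logarithms below to be valid
    delta = 2 * z / math.log(r)
    gamma = delta * math.log((r - 1) / (x_m * math.sqrt(r)))
    xi = (x2 + x3) / 2 - (x_m / 2) * (r + 1) / (r - 1)
    return gamma, delta, xi
```

Feeding the function exact quantiles of a known \(S_L\) distribution, \(x = \xi + e^{(s - \gamma)/\delta}\) at \(s = -3z, -z, z, 3z\), recovers the original parameters, which is a convenient self-check.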
\(S_B\) (bounded)
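For reference, the published Slifker–Shapiro (1980) estimates for \(S_B\), transcribed into this page's notation, are given below. They are assumed, not confirmed, to be the expressions Quantum XL evaluates. Write \(a = x_M / x_U\) and \(b = x_M / x_L\), so that \(ab > 1\) whenever \(x_L x_U / x_M^2 < 1\); \(z\) is the quantile spacing.

\[
\begin{aligned}
\delta &= \frac{z}{\cosh^{-1}\!\left(\tfrac{1}{2}\left[(1 + a)(1 + b)\right]^{1/2}\right)}, \\
\gamma &= \delta \, \sinh^{-1}\!\left(\frac{(b - a)\left[(1 + a)(1 + b) - 4\right]^{1/2}}{2\,(ab - 1)}\right), \\
\lambda &= \frac{x_M \left\{\left[(1 + a)(1 + b) - 2\right]^2 - 4\right\}^{1/2}}{ab - 1}, \\
\xi &= \frac{x_2 + x_3}{2} - \frac{\lambda}{2} + \frac{x_M\,(b - a)}{2\,(ab - 1)}.
\end{aligned}
\]

A minimal Python sketch of the same expressions follows; the name `fit_sb` is an illustrative choice.

```python
import math


def fit_sb(x1, x2, x3, x4, z):
    """Closed-form S_B estimates (gamma, delta, xi, lam) from four quantiles."""
    x_l, x_m, x_u = x2 - x1, x3 - x2, x4 - x3
    a = x_m / x_u
    b = x_m / x_l
    prod = (1 + a) * (1 + b)
    excess = a * b - 1  # x_M^2 / (x_L * x_U) - 1, positive for a bounded fit
    delta = z / math.acosh(0.5 * math.sqrt(prod))
    gamma = delta * math.asinh((b - a) * math.sqrt(prod - 4) / (2 * excess))
    lam = x_m * math.sqrt((prod - 2) ** 2 - 4) / excess
    xi = (x2 + x3) / 2 - lam / 2 + x_m * (b - a) / (2 * excess)
    return gamma, delta, xi, lam
```

As a self-check, exact quantiles of a known \(S_B\) distribution, \(x = \xi + \lambda / (1 + e^{-(s - \gamma)/\delta})\) at \(s = -3z, -z, z, 3z\), should return the parameters that generated them.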
\(S_U\) (unbounded)
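For reference, the published Slifker–Shapiro (1980) estimates for \(S_U\), transcribed into this page's notation, are given below. They are assumed, not confirmed, to be the expressions Quantum XL evaluates. Write \(a = x_U / x_M\) and \(b = x_L / x_M\), so that \(ab = x_L x_U / x_M^2\) is the Slifker–Shapiro quantile ratio and must exceed 1; \(z\) is the quantile spacing.

\[
\begin{aligned}
\delta &= \frac{2z}{\cosh^{-1}\!\left(\tfrac{1}{2}(a + b)\right)}, \\
\gamma &= \delta \, \sinh^{-1}\!\left(\frac{b - a}{2\,(ab - 1)^{1/2}}\right), \\
\lambda &= \frac{2\,x_M\,(ab - 1)^{1/2}}{(a + b - 2)\,(a + b + 2)^{1/2}}, \\
\xi &= \frac{x_2 + x_3}{2} + \frac{x_M\,(b - a)}{2\,(a + b - 2)}.
\end{aligned}
\]

A minimal Python sketch of the same expressions follows; the name `fit_su` is an illustrative choice.

```python
import math


def fit_su(x1, x2, x3, x4, z):
    """Closed-form S_U estimates (gamma, delta, xi, lam) from four quantiles."""
    x_l, x_m, x_u = x2 - x1, x3 - x2, x4 - x3
    a = x_u / x_m
    b = x_l / x_m
    excess = a * b - 1  # quantile ratio minus 1, positive for an unbounded fit
    delta = 2 * z / math.acosh(0.5 * (a + b))
    gamma = delta * math.asinh((b - a) / (2 * math.sqrt(excess)))
    lam = 2 * x_m * math.sqrt(excess) / ((a + b - 2) * math.sqrt(a + b + 2))
    xi = (x2 + x3) / 2 + x_m * (b - a) / (2 * (a + b - 2))
    return gamma, delta, xi, lam
```

As a self-check, exact quantiles of a known \(S_U\) distribution, \(x = \xi + \lambda \sinh\!\big((s - \gamma)/\delta\big)\) at \(s = -3z, -z, z, 3z\), should return the parameters that generated them.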
Goodness of fit
Each candidate transformation is scored by the Anderson–Darling normality statistic \(A^2\), computed on the sorted transformed values \(z_{(1)} \le \dots \le z_{(n)}\). Stephens' finite-sample correction \(A^{*2} = A^2\left(1 + \tfrac{0.75}{n} + \tfrac{2.25}{n^2}\right)\) is then applied, and a p-value is obtained from Stephens' approximation.
The family and spacing \(z\) that maximize this p-value are selected; the unadjusted \(A^2\) is reported. If the original (untransformed) data already have an Anderson–Darling p-value above \(0.1\), Quantum XL notes that the data may already be normal.
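For reference, the standard Anderson–Darling statistic for a normality test with estimated mean and standard deviation is shown below, with \(u_i = (z_{(i)} - \bar{z}) / s\). Standardizing by the sample mean \(\bar{z}\) and sample standard deviation \(s\) is an assumption here; it is the case Stephens' correction factor above is designed for.

\[
A^2 = -n - \frac{1}{n} \sum_{i=1}^{n} (2i - 1)\left[\ln \Phi(u_i) + \ln\!\big(1 - \Phi(u_{n+1-i})\big)\right]
\]

A minimal Python sketch follows. The piecewise p-value formula is the widely used D'Agostino–Stephens approximation (the same coefficients appear in R's `nortest::ad.test`); whether Quantum XL uses these exact coefficients is not confirmed, and the function names are illustrative.

```python
import math
from statistics import NormalDist, mean, stdev


def anderson_darling(values):
    """Return (A2, A2_star) for a normality test with estimated mean and sd."""
    n = len(values)
    zs = sorted(values)
    m, s = mean(zs), stdev(zs)  # stdev uses the n - 1 denominator
    std_normal = NormalDist()
    cdf = [std_normal.cdf((v - m) / s) for v in zs]
    total = sum(
        (2 * i - 1) * (math.log(cdf[i - 1]) + math.log(1 - cdf[n - i]))
        for i in range(1, n + 1)
    )
    a2 = -n - total / n
    a2_star = a2 * (1 + 0.75 / n + 2.25 / n ** 2)
    return a2, a2_star


def ad_p_value(a2_star):
    """Approximate p-value from the corrected statistic (assumed coefficients)."""
    a = min(a2_star, 10.0)  # illustrative cap: the fit is not meant for extreme values
    if a < 0.2:
        return 1 - math.exp(-13.436 + 101.14 * a - 223.73 * a * a)
    if a < 0.34:
        return 1 - math.exp(-8.318 + 42.796 * a - 59.938 * a * a)
    if a < 0.6:
        return math.exp(0.9177 - 4.279 * a - 1.38 * a * a)
    return math.exp(1.2937 - 5.709 * a + 0.0186 * a * a)
```

The sketch does not guard against \(\Phi(u_i)\) rounding to exactly 0 or 1 for extreme values, which would make the logarithm fail; a production implementation would need to handle that case.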
References
- Johnson, N. L. (1949). Systems of frequency curves generated by methods of translation. Biometrika, 36(1/2), 149–176.
- Slifker, J. F., & Shapiro, S. S. (1980). The Johnson system: Selection and parameter estimation. Technometrics, 22(2), 239–246.
- Chou, Y.-M., Polansky, A. M., & Mason, R. L. (1998). Transforming non-normal data to normality in statistical process control. Journal of Quality Technology, 30(2), 133–141.
- Anderson, T. W., & Darling, D. A. (1952). Asymptotic theory of certain goodness-of-fit criteria based on stochastic processes. Annals of Mathematical Statistics, 23(2), 193–212.
- Stephens, M. A. (1974). EDF statistics for goodness of fit and some comparisons. Journal of the American Statistical Association, 69(347), 730–737.