Math Details
This page gives the exact formulas Quantum XL uses to fit a scatter plot's regression line and report its statistics. Each equation lists what it computes and where it appears in the output.
Notation
| Term | Description |
|---|---|
| \((x_i, y_i)\) | the \(i\)-th data pair — predictor \(x\), response \(y\) |
| \(w_i\) | frequency weight of point \(i\) (\(w_i = 1\) when no frequency column is used) |
| \(n\) | effective sample size, \(n = \sum_i w_i\) |
| \(p\) | number of model parameters, including the intercept |
| \(\hat{y}_i\) | fitted (predicted) value at \(x_i\) |
| \(\bar{y}\) | weighted mean of the response, \(\bar{y} = \dfrac{\sum_i w_i y_i}{\sum_i w_i}\) |
| \(X\) | design matrix whose \(i\)-th row is the model's terms at \(x_i\) |
| \(W\) | diagonal weight matrix, \(W = \operatorname{diag}(w_1, \dots)\) |
Regression models
The fitted curve is one of five model forms (selected in Options). Coefficients are estimated by weighted least squares.
| Model | Equation | Constraint |
|---|---|---|
| Linear | \(\hat{y} = a + b x\) | — |
| Polynomial (degree \(d\), 2–6) | \(\hat{y} = a_0 + a_1 x + a_2 x^2 + \dots + a_d x^d\) | — |
| Logarithmic | \(\hat{y} = a + b \ln x\) | \(x > 0\) |
| Power | \(\hat{y} = a\, x^{b}\) | \(x > 0,\ y > 0\) |
| Exponential | \(\hat{y} = a\, e^{b x}\) | \(y > 0\) |
Power and Exponential are fit by linearizing in log space, fitting by least squares there, and then back-transforming, so the least-squares criterion is minimized on \(\ln y\), not on \(y\):
\[
\text{Power:}\quad \ln y = \ln a + b \ln x \qquad\qquad \text{Exponential:}\quad \ln y = \ln a + b x
\]
In both cases the intercept from the log-space fit is back-transformed as \(a = e^{(\text{intercept})}\), while the slope \(b\) is identical in log space and after back-transformation. The reported coefficients are the back-transformed \(a\) and \(b\).
For these two models the fit statistics (\(R^2\), the F-test, the residual standard error, and VIF) are computed in log space, and the coefficient standard errors are the log-space standard errors. Note that the intercept t-statistic divides the back-transformed intercept \(a = e^{(\text{intercept})}\) by the log-space standard error of the intercept, so it differs from the t-statistic of the log-space intercept.
Used by: the fitted line/curve drawn on the chart, and the model equation shown in the results.
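The log-space procedure can be illustrated with a minimal sketch. It assumes unit weights (\(w_i = 1\)) and uses hypothetical helper names (`fit_line`, `fit_power`, `fit_exponential`) that are not part of Quantum XL; it only shows the linearize, fit, back-transform sequence described above.

```python
import math

def fit_line(u, v):
    """Ordinary least-squares intercept and slope of v on u (unit weights assumed)."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    sxy = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    sxx = sum((ui - u_bar) ** 2 for ui in u)
    slope = sxy / sxx
    return v_bar - slope * u_bar, slope

def fit_power(x, y):
    """Fit y = a * x**b by regressing ln y on ln x (requires x > 0 and y > 0)."""
    intercept, b = fit_line([math.log(xi) for xi in x], [math.log(yi) for yi in y])
    return math.exp(intercept), b  # only the intercept is back-transformed

def fit_exponential(x, y):
    """Fit y = a * exp(b * x) by regressing ln y on x (requires y > 0)."""
    intercept, b = fit_line(list(x), [math.log(yi) for yi in y])
    return math.exp(intercept), b

# Noise-free illustrative data, so the fits recover the generating parameters.
x = [1.0, 2.0, 4.0, 8.0]
a_pow, b_pow = fit_power(x, [2.0 * xi ** 1.5 for xi in x])
a_exp, b_exp = fit_exponential(x, [3.0 * math.exp(0.25 * xi) for xi in x])
```

Because the criterion is minimized on \(\ln y\), a fit to noisy data will generally differ from a direct nonlinear least-squares fit on \(y\).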
Weighted least-squares estimation
The coefficient vector \(\hat{\beta}\) minimizes the weighted residual sum of squares:
\[
\hat{\beta} = \arg\min_{\beta} \sum_i w_i \left(y_i - \mathbf{x}_i \beta\right)^2
\]
where \(\mathbf{x}_i\) is the \(i\)-th row of \(X\) (e.g. \([1,\ x_i]\) for linear; \([1,\ x_i,\ x_i^2,\ \dots,\ x_i^{d}]\) for polynomial). Quantum XL solves this by QR decomposition of the weighted design matrix \(X^{*} = \sqrt{W}\,X\) (so that \(X^{*\top}X^{*} = X^{\top}WX\)), which is more numerically stable than forming the normal equations directly.
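As an illustration, the sketch below solves the weighted problem through a QR factorization of \(\sqrt{W}\,X\) followed by back-substitution on \(R\hat{\beta} = Q^{\top}\sqrt{W}\,y\). The QR step here uses modified Gram-Schmidt purely to keep the example dependency-free; that choice, and the function names `qr_mgs` and `wls_fit`, are assumptions of this sketch and not a statement of which QR algorithm Quantum XL uses.

```python
import math

def qr_mgs(A):
    """Thin QR of a full-column-rank matrix (list of rows) by modified Gram-Schmidt."""
    m, n = len(A), len(A[0])
    Q = [[0.0] * n for _ in range(m)]
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = [A[i][j] for i in range(m)]
        for k in range(j):
            # Project out the k-th orthonormal column from the running vector.
            R[k][j] = sum(Q[i][k] * v[i] for i in range(m))
            v = [v[i] - R[k][j] * Q[i][k] for i in range(m)]
        R[j][j] = math.sqrt(sum(vi * vi for vi in v))
        for i in range(m):
            Q[i][j] = v[i] / R[j][j]
    return Q, R

def wls_fit(X, y, w):
    """Minimize sum_i w_i * (y_i - x_i . beta)**2 via QR of sqrt(W) X."""
    Xs = [[math.sqrt(wi) * xij for xij in row] for row, wi in zip(X, w)]
    ys = [math.sqrt(wi) * yi for yi, wi in zip(y, w)]
    Q, R = qr_mgs(Xs)
    p = len(R)
    rhs = [sum(Q[i][j] * ys[i] for i in range(len(ys))) for j in range(p)]  # Q^T y*
    beta = [0.0] * p
    for j in reversed(range(p)):  # back-substitution on R beta = Q^T y*
        s = sum(R[j][k] * beta[k] for k in range(j + 1, p))
        beta[j] = (rhs[j] - s) / R[j][j]
    return beta

# Linear model rows [1, x_i]; the data lie exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
X = [[1.0, x] for x in xs]
y = [1.0, 3.0, 5.0, 7.0]
w = [1.0, 2.0, 1.0, 3.0]
beta = wls_fit(X, y, w)
```

With frequency weights, a point with \(w_i = 2\) contributes exactly as two identical unweighted points would, which is a convenient way to sanity-check an implementation.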
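Coefficient of determination (\(R^2\))
\[
R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}, \qquad SS_{\text{res}} = \sum_i w_i \left(y_i - \hat{y}_i\right)^2, \qquad SS_{\text{tot}} = \sum_i w_i \left(y_i - \bar{y}\right)^2
\]
Used by: the fit-statistics table. Undefined (\(R^2 = \text{NaN}\)) when \(SS_{\text{tot}} = 0\) (constant \(y\)).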
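A small sketch of this statistic follows, assuming the usual weighted definition \(R^2 = 1 - SS_{\text{res}}/SS_{\text{tot}}\) with both sums of squares weighted by \(w_i\) and \(\bar{y}\) taken as the weighted mean from the Notation table. The function name `weighted_r_squared` is hypothetical.

```python
import math

def weighted_r_squared(y, y_hat, w):
    """R^2 = 1 - SS_res / SS_tot with frequency weights; NaN when y is constant."""
    n = sum(w)
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / n  # weighted mean of the response
    ss_res = sum(wi * (yi - fi) ** 2 for wi, yi, fi in zip(w, y, y_hat))
    ss_tot = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, y))
    if ss_tot == 0:
        return math.nan  # undefined for a constant response
    return 1.0 - ss_res / ss_tot

# Frequency weight 3 on the first point behaves like three copies of it.
r2_weighted = weighted_r_squared([1.0, 3.0], [1.5, 2.5], [3, 1])
r2_expanded = weighted_r_squared([1.0, 1.0, 1.0, 3.0], [1.5, 1.5, 1.5, 2.5], [1, 1, 1, 1])
```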
Adjusted \(R^2\)
\[
R^2_{\text{adj}} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p}
\]
Penalizes added terms; reported as NaN when \(n \le p\).
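For a quick numeric check, the sketch below assumes the standard adjustment \(1 - (1 - R^2)(n - 1)/(n - p)\), with \(n\) the effective sample size and \(p\) counting the intercept. The function name `adjusted_r_squared` is hypothetical.

```python
import math

def adjusted_r_squared(r2, n, p):
    """Adjusted R^2; NaN when there are no residual degrees of freedom (n <= p)."""
    if n <= p:
        return math.nan
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p)

# A linear fit (p = 2) on n = 12 points with R^2 = 0.9.
adj = adjusted_r_squared(0.9, 12, 2)
```

Here the adjustment lowers 0.9 to about 0.89; the penalty grows as \(p\) approaches \(n\).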
Residual standard error
\[
s = \sqrt{\text{MSE}}
\]
| Term | Description |
|---|---|
| MSE | mean squared error, \(SS_{\text{res}}/(n-p)\) |
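Overall F-test
Tests whether the model explains a significant share of the variance in the response.
\[
F = \frac{\left(SS_{\text{tot}} - SS_{\text{res}}\right)/(p - 1)}{SS_{\text{res}}/(n - p)}
\]
The p-value is the upper-tail probability of the \(F\) distribution with \((p-1,\ n-p)\) degrees of freedom:
\[
p\text{-value} = P\left(F_{p-1,\ n-p} \ge F\right)
\]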
A perfect fit (\(SS_{\text{res}} = 0\)) gives \(F = \infty\), p-value \(= 0\).
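The sketch below computes the statistic and its upper-tail probability under stated assumptions: \(F\) is taken as the regression mean square \((SS_{\text{tot}} - SS_{\text{res}})/(p-1)\) over the residual mean square, and the tail probability is evaluated through the identity \(P(F_{d_1,d_2} \ge f) = I_{d_2/(d_2 + d_1 f)}(d_2/2,\ d_1/2)\), where \(I\) is the regularized incomplete beta function. The continued-fraction evaluation and the names `_beta_cf`, `reg_inc_beta`, and `f_test` are illustrative only; how Quantum XL evaluates the distribution is not stated here, and in practice a statistics library would normally supply this function.

```python
import math

def _beta_cf(a, b, x, max_iter=300, eps=1e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        for num in (
            m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
            -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1)),
        ):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last correction factor is ~1: converged
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b

def f_test(ss_tot, ss_res, n, p):
    """Return (F, p_value) for the overall F-test with (p - 1, n - p) degrees of freedom."""
    df1, df2 = p - 1, n - p
    if ss_res == 0:
        return math.inf, 0.0  # perfect fit
    f = ((ss_tot - ss_res) / df1) / (ss_res / df2)
    return f, reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))

# With (2, 2) degrees of freedom the tail probability is exactly 1 / (1 + F).
f_stat, p_value = f_test(ss_tot=4.0, ss_res=1.0, n=5, p=3)
```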
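Coefficient t-tests
For each coefficient \(\hat{\beta}_j\):
\[
SE(\hat{\beta}_j) = \sqrt{\text{MSE}\cdot\left[(X^{\top}WX)^{-1}\right]_{jj}}, \qquad t_j = \frac{\hat{\beta}_j}{SE(\hat{\beta}_j)}
\]
The two-tailed p-value uses the \(t\) distribution with \(n - p\) degrees of freedom:
\[
p\text{-value} = 2\,P\left(T_{n-p} \ge |t_j|\right)
\]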
Quantum XL obtains \((X^{\top}WX)^{-1}\) from the QR factor \(R\) of \(X^{*}\) as \((R^{\top}R)^{-1}\).
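To make the use of \(R\) concrete, the sketch below computes standard errors and t-statistics from an upper-triangular factor, assuming \(SE(\hat{\beta}_j) = \sqrt{\text{MSE}\,[(R^{\top}R)^{-1}]_{jj}}\) and \(t_j = \hat{\beta}_j / SE(\hat{\beta}_j)\). Since \((R^{\top}R)^{-1} = R^{-1}R^{-\top}\), each diagonal entry is the squared norm of a row of \(R^{-1}\), so the full inverse of \(X^{\top}WX\) never has to be formed. The function names and the sample numbers are hypothetical.

```python
import math

def inv_upper(R):
    """Invert an upper-triangular matrix by back-substitution, one column at a time."""
    p = len(R)
    inv = [[0.0] * p for _ in range(p)]
    for c in range(p):
        for i in reversed(range(c + 1)):
            rhs = 1.0 if i == c else 0.0
            s = sum(R[i][k] * inv[k][c] for k in range(i + 1, c + 1))
            inv[i][c] = (rhs - s) / R[i][i]
    return inv

def coef_t_stats(beta, R, mse):
    """Return (standard errors, t-statistics) from the QR factor R and the MSE."""
    r_inv = inv_upper(R)
    # Diagonal of R^-1 R^-T: squared Euclidean norm of each row of R^-1.
    diag = [sum(v * v for v in row) for row in r_inv]
    se = [math.sqrt(mse * d) for d in diag]
    t = [b / s for b, s in zip(beta, se)]
    return se, t

# Illustrative values only: R^T R = [[4, 2], [2, 2]], whose inverse has diagonal (0.5, 1).
se, t = coef_t_stats(beta=[3.0, -4.0], R=[[2.0, 1.0], [0.0, 1.0]], mse=2.0)
```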
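Variance inflation factor (VIF)
For predictor \(j\) (intercept excluded), the predictors are first standardized (weighted mean subtracted, then divided by the weighted sample standard deviation) and each row is scaled by \(\sqrt{w_i}\). A QR decomposition of the resulting standardized weighted matrix \(Z^{*}\) gives the triangular factor \(R\), and:
\[
VIF_j = (n - 1)\left[(R^{\top}R)^{-1}\right]_{jj} = \frac{1}{1 - R_j^2}
\]
where \(R_j^2\) is the coefficient of determination from regressing predictor \(j\) on the other predictors. The factor \(n - 1\) assumes the sample standard deviation uses an \(n - 1\) divisor, so that \(R^{\top}R = Z^{*\top}Z^{*}\) is \(n - 1\) times the predictor correlation matrix.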
\(VIF = 1\) indicates no multicollinearity; values above roughly \(5\)–\(10\) indicate a problem. When the model has a single predictor, its VIF is defined as \(1\).
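The standardize, weight, and factor sequence can be sketched as follows. Two points are assumptions of this sketch rather than documented behavior: the weighted sample standard deviation is taken with an \(n - 1\) divisor (with \(n = \sum_i w_i\)), which makes \(R^{\top}R\) equal to \(n - 1\) times the predictor correlation matrix, and the QR factor is obtained by modified Gram-Schmidt. All function names are hypothetical.

```python
import math

def r_factor(cols):
    """Upper-triangular R from modified Gram-Schmidt on a list of columns."""
    k = len(cols)
    R = [[0.0] * k for _ in range(k)]
    Q = []
    for j, col in enumerate(cols):
        v = list(col)
        for i, q in enumerate(Q):
            R[i][j] = sum(a * b for a, b in zip(q, v))
            v = [vi - R[i][j] * qi for vi, qi in zip(v, q)]
        R[j][j] = math.sqrt(sum(vi * vi for vi in v))
        Q.append([vi / R[j][j] for vi in v])
    return R

def inv_upper(R):
    """Invert an upper-triangular matrix by back-substitution, one column at a time."""
    p = len(R)
    inv = [[0.0] * p for _ in range(p)]
    for c in range(p):
        for i in reversed(range(c + 1)):
            rhs = 1.0 if i == c else 0.0
            s = sum(R[i][k] * inv[k][c] for k in range(i + 1, c + 1))
            inv[i][c] = (rhs - s) / R[i][i]
    return inv

def vif(predictors, w):
    """VIF for each predictor column (intercept excluded), with frequency weights w."""
    if len(predictors) == 1:
        return [1.0]  # a single predictor is defined as VIF = 1
    n = sum(w)
    z_cols = []
    for col in predictors:
        mean = sum(wi * xi for wi, xi in zip(w, col)) / n
        # Assumption: sample standard deviation with an (n - 1) divisor.
        sd = math.sqrt(sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, col)) / (n - 1))
        z_cols.append([math.sqrt(wi) * (xi - mean) / sd for wi, xi in zip(w, col)])
    r_inv = inv_upper(r_factor(z_cols))
    # (n - 1) * diag((R^T R)^-1) is the diagonal of the inverse correlation matrix.
    return [(n - 1) * sum(v * v for v in row) for row in r_inv]

# Quadratic model terms x and x^2 on x = 1..5 are strongly correlated.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
vifs = vif([x, [xi ** 2 for xi in x]], [1, 1, 1, 1, 1])
```

For this example both values come out near 26.7, matching \(1/(1 - r^2)\) for the correlation \(r\) between \(x\) and \(x^2\), which illustrates why polynomial terms on uncentered \(x\) tend to show high VIF.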
Optional X transformations
If enabled in Options, \(x\) is transformed before fitting:
where \(s_x\) is the sample standard deviation of \(x\).
References
- Draper, N. R., & Smith, H. (1998). Applied Regression Analysis (3rd ed.). New York: John Wiley & Sons.
- Montgomery, D. C., Peck, E. A., & Vining, G. G. (2012). Introduction to Linear Regression Analysis (5th ed.). Hoboken, NJ: John Wiley & Sons.
- Kutner, M. H., Nachtsheim, C. J., Neter, J., & Li, W. (2005). Applied Linear Statistical Models (5th ed.). New York: McGraw-Hill/Irwin.
- Weisberg, S. (2005). Applied Linear Regression (3rd ed.). Hoboken, NJ: John Wiley & Sons.
- Seber, G. A. F., & Lee, A. J. (2003). Linear Regression Analysis (2nd ed.). Hoboken, NJ: John Wiley & Sons.
- Belsley, D. A., Kuh, E., & Welsch, R. E. (1980). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. New York: John Wiley & Sons.
- Golub, G. H., & Van Loan, C. F. (2013). Matrix Computations (4th ed.). Baltimore: Johns Hopkins University Press.