
Math Details

This page gives the exact formulas Quantum XL uses to fit a scatter plot's regression line and report its statistics. Each section states what the formula computes and, where applicable, where the result appears in the output.

Notation

Term	Description
\((x_i, y_i)\)	the \(i\)-th data pair — predictor \(x\), response \(y\)
\(w_i\)	frequency weight of point \(i\) (\(w_i = 1\) when no frequency column is used)
\(n\)	effective sample size, \(n = \sum_i w_i\)
\(p\)	number of model parameters, including the intercept
\(\hat{y}_i\)	fitted (predicted) value at \(x_i\)
\(\bar{y}\)	weighted mean of the response, \(\bar{y} = \dfrac{\sum_i w_i y_i}{\sum_i w_i}\)
\(X\)	design matrix whose \(i\)-th row is the model's terms at \(x_i\)
\(W\)	diagonal weight matrix, \(W = \operatorname{diag}(w_1, w_2, \dots)\), with one diagonal entry per data point

Regression models

The fitted curve is one of five model forms (selected in Options). Coefficients are estimated by weighted least squares.

Model	Equation	Constraint
Linear	\(\hat{y} = a + b x\)	—
Polynomial (degree \(d\), 2–6)	\(\hat{y} = a_0 + a_1 x + a_2 x^2 + \dots + a_d x^d\)	—
Logarithmic	\(\hat{y} = a + b \ln x\)	\(x > 0\)
Power	\(\hat{y} = a\, x^{b}\)	\(x > 0,\ y > 0\)
Exponential	\(\hat{y} = a\, e^{b x}\)	\(y > 0\)

Power and Exponential are fit by linearizing the model in log space, fitting by least squares there, and back-transforming. The least-squares criterion is therefore minimized on \(\ln y\), not on \(y\):

\[ \text{Power:}\quad \ln y = \ln a + b\,\ln x \qquad\qquad \text{Exponential:}\quad \ln y = \ln a + b\,x \]

In both cases the intercept from the log-space fit is back-transformed as \(a = e^{(\text{intercept})}\). For these two models the fit statistics (\(R^2\), the F-test, the residual standard error, and VIF) are computed in log space, and the coefficient standard errors are the log-space standard errors. The reported coefficients are the back-transformed \(a\) and \(b\). The slope \(b\) is identical in log space and after back-transformation, so its t-statistic is the log-space one. The intercept t-statistic, however, divides the back-transformed intercept \(a = e^{(\text{intercept})}\) by the log-space standard error of the intercept, so it is not the same as the log-space ratio \(\ln a / SE\).

Used by: the fitted line/curve drawn on the chart, and the model equation shown in the results.
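The log-space fit described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `fit_power` and the sample data are hypothetical, the fit is unweighted for brevity, and it uses the closed-form simple-regression formulas rather than Quantum XL's QR-based solver.

```python
import math

def fit_power(xs, ys):
    """Fit y = a * x**b by least squares on (ln x, ln y). Hypothetical helper."""
    if any(x <= 0 for x in xs) or any(y <= 0 for y in ys):
        raise ValueError("Power model requires x > 0 and y > 0")
    u = [math.log(x) for x in xs]   # ln x
    v = [math.log(y) for y in ys]   # ln y
    m = len(u)
    u_bar = sum(u) / m
    v_bar = sum(v) / m
    sxx = sum((ui - u_bar) ** 2 for ui in u)
    sxy = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    b = sxy / sxx                   # slope: unchanged by the back-transform
    intercept = v_bar - b * u_bar   # log-space intercept, i.e. ln a
    a = math.exp(intercept)         # back-transformed intercept
    return a, b

# Illustrative data generated exactly from y = 2 * x**1.5
xs = [1.0, 2.0, 4.0, 8.0]
ys = [2.0 * x ** 1.5 for x in xs]
a, b = fit_power(xs, ys)
```

Because the residuals are measured on \(\ln y\), noisy data fitted this way will generally give different coefficients from a nonlinear least-squares fit on \(y\) itself. The Exponential model follows the same pattern with \(x\) in place of \(\ln x\).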

Weighted least-squares estimation

The coefficient vector \(\hat{\beta}\) minimizes the weighted residual sum of squares:

\[ \hat{\beta} = \arg\min_{\beta} \sum_{i} w_i\left(y_i - \mathbf{x}_i^{\top}\beta\right)^2 \]

where \(\mathbf{x}_i\) is the \(i\)-th row of \(X\) (e.g. \([1,\ x_i]\) for linear; \([1,\ x_i,\ x_i^2,\ \dots,\ x_i^{d}]\) for polynomial). Quantum XL solves this by QR decomposition of the weighted design matrix \(X^{*} = \sqrt{W}\,X\) (so that \(X^{*\top}X^{*} = X^{\top}WX\)), which is more numerically stable than forming the normal equations directly.
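A minimal NumPy sketch of this approach is shown below. It assumes frequency weights and a full-rank design matrix; the function name `wls_qr` and the sample data are hypothetical, and the code is not Quantum XL's implementation.

```python
import numpy as np

def wls_qr(X, y, w):
    """Solve min_beta sum_i w_i (y_i - x_i^T beta)^2 via QR of sqrt(W) X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    sw = np.sqrt(np.asarray(w, dtype=float))
    Xs = X * sw[:, None]                 # X* = sqrt(W) X
    ys = y * sw                          # y* = sqrt(W) y
    Q, R = np.linalg.qr(Xs)              # reduced QR: X* = Q R, R upper triangular
    return np.linalg.solve(R, Q.T @ ys)  # solve R beta = Q^T y*

# Illustrative data with frequency weights
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.9])
w = np.array([1, 2, 1, 3])
X = np.column_stack([np.ones_like(x), x])   # rows [1, x_i] for the linear model
beta = wls_qr(X, y, w)                      # [intercept a, slope b]
```

With integer frequency weights, the result should match an unweighted fit on a data set in which each point is repeated \(w_i\) times.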

Coefficient of determination (\(R^2\))

\[ R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}, \qquad SS_{\text{tot}} = \sum_i w_i\left(y_i - \bar{y}\right)^2, \qquad SS_{\text{res}} = \sum_i w_i\left(y_i - \hat{y}_i\right)^2 \]

Used by: the fit-statistics table. Undefined (\(R^2 = \text{NaN}\)) when \(SS_{\text{tot}} = 0\) (constant \(y\)).
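The weighted sums of squares can be computed directly, as in the following sketch. The function name `weighted_r2` and the numbers are illustrative assumptions, not Quantum XL output.

```python
import math

def weighted_r2(y, y_hat, w):
    """Weighted R^2 = 1 - SS_res / SS_tot; NaN when SS_tot is zero."""
    w_sum = sum(w)
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / w_sum   # weighted mean of y
    ss_tot = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, y))
    ss_res = sum(wi * (yi - fi) ** 2 for wi, yi, fi in zip(w, y, y_hat))
    if ss_tot == 0:
        return math.nan          # constant y: R^2 is undefined
    return 1.0 - ss_res / ss_tot

y = [1.0, 2.0, 3.0]
y_hat = [1.0, 2.5, 3.0]          # hypothetical fitted values
w = [1, 2, 1]                    # frequency weights
r2 = weighted_r2(y, y_hat, w)    # SS_tot = 2.0, SS_res = 0.5
```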

Adjusted \(R^2\)

\[ R^2_{\text{adj}} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p} \]

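Adjusted \(R^2\) penalizes added model terms. It is reported as NaN when \(n \le p\), because the residual degrees of freedom \(n - p\) are then zero or negative.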
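As a quick numeric check of the formula, take the illustrative values \(R^2 = 0.75\), \(n = 10\), and \(p = 2\) (chosen for this example, not taken from Quantum XL output):

\[ R^2_{\text{adj}} = 1 - (1 - 0.75)\,\frac{10 - 1}{10 - 2} = 1 - 0.25 \times 1.125 = 0.71875 \]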

Residual standard error

\[ s = \sqrt{\text{MSE}} = \sqrt{\frac{SS_{\text{res}}}{\,n - p\,}} \]

Term	Description
MSE	mean squared error, \(SS_{\text{res}}/(n-p)\)

Overall F-test

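Tests the null hypothesis that every coefficient other than the intercept is zero, i.e. whether the model explains a statistically significant share of the variance in \(y\).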

\[ F = \frac{\text{MSR}}{\text{MSE}} = \frac{SS_{\text{reg}}/(p-1)}{SS_{\text{res}}/(n-p)}, \qquad SS_{\text{reg}} = SS_{\text{tot}} - SS_{\text{res}} \]

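The p-value is the upper-tail probability of the \(F\) distribution with \((p-1,\ n-p)\) degrees of freedom, evaluated at the observed statistic. Here \(F_{\,p-1,\ n-p}(\cdot)\) denotes that distribution's cumulative distribution function: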

\[ \text{p-value} = 1 - F_{\,p-1,\ n-p}(F) \]

A perfect fit (\(SS_{\text{res}} = 0\)) gives \(F = \infty\), p-value \(= 0\).
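The following sketch computes the F statistic, together with the MSE and residual standard error it depends on, from the sums of squares. The function name `f_statistic` and the input values are illustrative assumptions; the sketch assumes \(p \ge 2\) and \(n > p\).

```python
import math

def f_statistic(ss_tot, ss_res, n, p):
    """Return (F, MSR, MSE) for a model with p parameters and effective size n."""
    ss_reg = ss_tot - ss_res
    msr = ss_reg / (p - 1)            # mean square for regression
    if ss_res == 0:
        return math.inf, msr, 0.0     # perfect fit: F is infinite
    mse = ss_res / (n - p)            # mean squared error
    return msr / mse, msr, mse

F, msr, mse = f_statistic(ss_tot=2.0, ss_res=0.5, n=4, p=2)
s = math.sqrt(mse)                    # residual standard error
```

The p-value needs the \(F\) distribution's upper tail, which the Python standard library does not provide. If SciPy is available, `scipy.stats.f.sf(F, p - 1, n - p)` returns it.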

Coefficient t-tests

For each coefficient \(\hat{\beta}_j\):

\[ SE(\hat{\beta}_j) = s\,\sqrt{\left[(X^{\top}WX)^{-1}\right]_{jj}}, \qquad t_j = \frac{\hat{\beta}_j}{SE(\hat{\beta}_j)} \]

The two-tailed p-value uses the \(t\) distribution with \(n - p\) degrees of freedom, where \(T_{\,n-p}(\cdot)\) denotes its cumulative distribution function:

\[ p_j = 2\left[\,1 - T_{\,n-p}\!\left(\lvert t_j \rvert\right)\right] \]

Quantum XL obtains \((X^{\top}WX)^{-1}\) from the QR factor \(R\) of \(X^{*}\) as \((R^{\top}R)^{-1}\).
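The standard errors and t-statistics can be sketched with NumPy as follows. The function name `coef_t_stats` and the data are hypothetical, and the explicit inverse of \(R^{\top}R\) is used only to mirror the formula; it is an assumption that a sketch like this is close to what Quantum XL does internally.

```python
import numpy as np

def coef_t_stats(X, y, w):
    """Return (beta, SE, t) for a weighted least-squares fit. Hypothetical helper."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    sw = np.sqrt(w)
    Q, R = np.linalg.qr(X * sw[:, None])        # QR of X* = sqrt(W) X
    beta = np.linalg.solve(R, Q.T @ (y * sw))
    n, p = w.sum(), X.shape[1]                  # n is the sum of the weights
    resid = y - X @ beta
    s = np.sqrt(np.sum(w * resid ** 2) / (n - p))   # residual standard error
    xtwx_inv = np.linalg.inv(R.T @ R)           # equals (X^T W X)^{-1}
    se = s * np.sqrt(np.diag(xtwx_inv))
    return beta, se, beta / se

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.9])
w = np.array([1, 2, 1, 3])
X = np.column_stack([np.ones_like(x), x])
beta, se, t = coef_t_stats(X, y, w)
```

If SciPy is available, the two-tailed p-values follow as `2 * scipy.stats.t.sf(abs(t), n - p)`.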

Variance inflation factor (VIF)

For predictor \(j\) (intercept excluded), the predictors are first standardized (weighted mean subtracted, divided by weighted sample standard deviation) and weighted by \(\sqrt{W}\); a QR decomposition of that standardized weighted matrix \(Z^{*}\) gives factor \(R\), and:

\[ VIF_j = \left[(R^{\top}R)^{-1}\right]_{jj}\,(n - 1) \]

\(VIF = 1\) indicates no multicollinearity; values above roughly \(5\)–\(10\) are commonly taken to indicate problematic multicollinearity. When the model has a single predictor, its VIF is defined as \(1\).
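A NumPy sketch of the VIF computation is given below. It assumes frequency weights, with the weighted sample variance using \(n - 1\) in the denominator where \(n = \sum_i w_i\); the function name `vif` and the data are hypothetical.

```python
import numpy as np

def vif(P, w):
    """VIF per predictor column of P (no intercept column). Hypothetical helper."""
    P = np.asarray(P, dtype=float)
    w = np.asarray(w, dtype=float)
    n = w.sum()
    if P.shape[1] == 1:
        return np.ones(1)                          # single predictor: VIF = 1
    mean = (w[:, None] * P).sum(axis=0) / n        # weighted means
    var = (w[:, None] * (P - mean) ** 2).sum(axis=0) / (n - 1)
    Z = (P - mean) / np.sqrt(var)                  # standardized predictors
    R = np.linalg.qr(Z * np.sqrt(w)[:, None], mode="r")   # R factor of Z*
    return np.diag(np.linalg.inv(R.T @ R)) * (n - 1)

# Quadratic model: predictors x and x^2 are strongly correlated
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
P = np.column_stack([x, x ** 2])
v = vif(P, np.ones(5))
```

With two predictors the result reduces to \(1/(1 - r^2)\), where \(r\) is their correlation, which gives a simple way to check the output.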

Optional X transformations

If enabled in Options, \(x\) is transformed before fitting:

\[ \text{Coded to } [-1, 1]:\quad x' = \frac{2\left(x - x_{\min}\right)}{x_{\max} - x_{\min}} - 1 \qquad\qquad \text{Standardized (z-score)}:\quad x' = \frac{x - \bar{x}}{s_x} \]

where \(s_x\) is the sample standard deviation of \(x\).
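Both transformations are simple to reproduce, as this sketch shows. The function names are hypothetical, and the code is unweighted: whether Quantum XL applies frequency weights to \(\bar{x}\) and \(s_x\) is not stated here, so that is left as an assumption. Both functions fail with a division by zero when all \(x\) values are equal.

```python
import statistics

def code_to_unit_range(xs):
    """Map x linearly so that min(x) -> -1 and max(x) -> +1."""
    lo, hi = min(xs), max(xs)
    return [2 * (x - lo) / (hi - lo) - 1 for x in xs]

def standardize(xs):
    """z-score using the sample standard deviation (n - 1 denominator)."""
    mean = statistics.mean(xs)
    sd = statistics.stdev(xs)
    return [(x - mean) / sd for x in xs]

xs = [10.0, 15.0, 20.0]
coded = code_to_unit_range(xs)   # endpoints map to -1 and +1
z = standardize(xs)              # mean 15, sample standard deviation 5
```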
