PCA vs. Regression Slope
The article compares the slope of a PCA fit with the slope of a least-squares regression fit. The regression line minimizes squared vertical residuals (the \(y\)-error at each fixed \(x\)), while the PCA line minimizes total squared perpendicular distance to the line. Working out the exact angle between the two fits, the article finds the difference is largest when the regression slope is \(m = 1/\sqrt{3}\), that is, when the regression line runs 30 degrees off the x-axis.
- A regression fit can look poor next to the PCA line because the two fits minimize different quantities.
- The PCA slope is influenced by the variance of the noise in the data.
- The angular difference between the PCA and regression slopes is maximized at a regression slope of m = 1/sqrt(3).
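The contrast between the two fits can be checked numerically. The following is a minimal sketch, not the article's own code: it assumes data of the form y = m x + noise with standard-normal x and Gaussian noise of variance `sigma2` (the article's exact model is not shown in this summary), and the names `fit_slopes`, `m_true`, and `sigma2` are hypothetical.

```python
import numpy as np

def fit_slopes(x, y):
    """Return (ols_slope, pca_slope) for paired samples x, y."""
    cov = np.cov(x, y)                      # 2x2 sample covariance matrix
    ols_slope = cov[0, 1] / cov[0, 0]       # least squares: cov(x, y) / var(x)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    v = eigvecs[:, -1]                      # direction of largest variance
    pca_slope = v[1] / v[0]                 # slope of the first principal axis
    return ols_slope, pca_slope

# Assumed synthetic data: y = m*x + noise (illustrative values only).
rng = np.random.default_rng(0)
m_true, sigma2 = 1 / np.sqrt(3), 0.25
x = rng.normal(0.0, 1.0, size=20_000)
y = m_true * x + rng.normal(0.0, np.sqrt(sigma2), size=x.size)

ols_slope, pca_slope = fit_slopes(x, y)
angle_gap_deg = np.degrees(np.arctan(pca_slope) - np.arctan(ols_slope))
print(f"OLS slope: {ols_slope:.3f}")
print(f"PCA slope: {pca_slope:.3f}")
print(f"angle between fits: {angle_gap_deg:.2f} degrees")
```

With noise only in y, the least-squares slope stays close to `m_true`, while the PCA line tilts toward the noisy direction and comes out steeper, which is why the regression line can look too shallow when drawn over the point cloud.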
Opening excerpt (first ~120 words)
PCA slope vs regression slope

Published on Sat 16 May 2026

A regression fit can sometimes look surprisingly poor when plotted on top of data, with the two having visually different slopes. The reason is that the line that visually seems to go through the data best is that which minimizes total projection error, while the least squares fit minimizes squared \(y\)-error at each fixed \(x\) (e.g., see this prior Hacker News thread for discussion). Here, we make these points concrete by working out the exact angular difference between the two fits. We find the effect is strongest when the least squares fit has a slope of \(m = 1/\sqrt{3}\), running \(30\) degrees off the x-axis, as in Figure 1 below.

Figure 1: Synthetic data generated from (\ref{1}) with \(m=1/\sqrt{3}\), \(\sigma^2=2.25\).
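The quoted slope of \(1/\sqrt{3}\) can be sanity-checked with a short scan. This is a sketch under stated assumptions, since the excerpt does not show the model in (\ref{1}): it assumes y = m x + noise with the variance of x held fixed at `var_x` and a small noise variance `sigma2`, and it uses population covariances rather than sampled data. Under those assumptions the angle between the fits is, to leading order in the noise, proportional to m / (1 + m^2)^2, which peaks at m = 1/sqrt(3). The function name `angle_gap` is hypothetical.

```python
import numpy as np

def angle_gap(m, sigma2, var_x=1.0):
    """Population angle (radians) between the PCA line and the OLS line,
    assuming y = m*x + noise with Var(x) = var_x and Var(noise) = sigma2."""
    cov_xy = m * var_x
    var_y = m**2 * var_x + sigma2
    # Angle of the major principal axis of a 2x2 covariance matrix.
    theta_pca = 0.5 * np.arctan2(2.0 * cov_xy, var_x - var_y)
    # In this model the population OLS slope equals m.
    return theta_pca - np.arctan(m)

slopes = np.linspace(0.01, 3.0, 30_000)
gaps = angle_gap(slopes, sigma2=1e-4)   # small-noise regime (assumption)
m_star = slopes[np.argmax(gaps)]

print(f"slope with largest angle gap: {m_star:.4f}")
print(f"1/sqrt(3):                    {1 / np.sqrt(3):.4f}")
```

In this sketch the peak sits at \(1/\sqrt{3}\) only when the noise is small relative to the spread in x; at larger `sigma2` it moves, so the article's exact statement presumably rests on its own model and normalization.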
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at Jonathan Landy.