And stepping backwards is almost always better than stepping forward,

for the obvious reason that X1 and X2 may fit Y perfectly as a pair, but

neither will do as well by itself as X3.
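
A minimal sketch of that situation, on made-up data (the eight-point design, the helper names solve and r_squared, and the use of R^2 as the selection criterion are all assumptions for illustration, not anything from a particular package). Y is exactly X1 + X2, yet each of X1 and X2 alone is nearly useless, while X3 alone explains about half the variation:

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def r_squared(y, columns):
    """R^2 of a least-squares fit of y on an intercept plus the given columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(p)]
    beta = solve(xtx, xty)  # normal equations; fine for a tiny, well-behaved example
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - rss / tss


# Hypothetical data: y, z and e are mutually orthogonal and each sums to zero.
y = [1, 1, 1, 1, -1, -1, -1, -1]
z = [10, -10, 10, -10, 10, -10, 10, -10]
e = [1, 1, -1, -1, 1, 1, -1, -1]

predictors = {
    "X1": z,                                  # uncorrelated with Y on its own
    "X2": [yi - zi for yi, zi in zip(y, z)],  # Y = X1 + X2 exactly
    "X3": [yi + ei for yi, ei in zip(y, e)],  # Y plus noise
}

# Forward, first step: pick the single predictor with the highest R^2.
single = {name: r_squared(y, [col]) for name, col in predictors.items()}
forward_first = max(single, key=single.get)

# Backward, first step: from the full model, drop the predictor whose
# removal leaves the highest R^2.
after_drop = {
    name: r_squared(y, [c for other, c in predictors.items() if other != name])
    for name in predictors
}
backward_drop = max(after_drop, key=after_drop.get)

pair_r2 = r_squared(y, [predictors["X1"], predictors["X2"]])

print("single-predictor R^2:", single)
print("forward enters first:", forward_first)
print("backward drops first:", backward_drop)
print("R^2 of X1 and X2 together:", pair_r2)
```

On this data the forward step enters X3 first, whereas the backward step discards X3 and keeps the pair that fits perfectly. Real stepwise routines use F-tests or information criteria rather than raw R^2, so take this only as an illustration of the idea.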

This comment needs to be further clarified.

An extreme outlier can be in the (X-y) space or in the space of

residuals (observed errors).

In the older books, papers, and older software packages, undue

emphasis had often been placed on the observed error space only.

But some of those "outliers" near the center of the X-space may

have as little as ZERO influence on the slope of the fitted model (if

they coincide with X-bar).

Conversely, there may be points far from the center of X that

exert tremendous influence on the fitted model but would not

show up as "outliers" in the residual space.
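
Both cases can be sketched with a one-predictor fit, assuming a toy data set of eleven points lying exactly on y = 2 + x (the data, the helper name fit_line, and the two contaminated points are invented for illustration). Leverage here is the diagonal of the hat matrix, which for a straight-line fit is 1/n + (x - X-bar)^2 / Sxx:

```python
def fit_line(x, y):
    """Least-squares line: returns slope, intercept, residuals, leverages."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Hat-matrix diagonal for a model with an intercept and one predictor.
    hat = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
    return slope, intercept, resid, hat


x = list(range(11))        # 0..10, so X-bar = 5
y = [2 + xi for xi in x]   # clean points, exactly on y = 2 + x
slope_clean, _, _, _ = fit_line(x, y)

# Case A: a gross error in y sitting exactly at X-bar.
y_a = y[:]
y_a[5] += 10
slope_a, _, resid_a, hat_a = fit_line(x, y_a)

# Case B: one extra point far from the centre of X, 15 units below the line.
x_b = x + [40]
y_b = y + [27]
slope_b, _, resid_b, hat_b = fit_line(x_b, y_b)

print("clean slope:", slope_clean)
print("case A slope:", slope_a, " residual at X-bar:", resid_a[5], " leverage:", hat_a[5])
print("case B slope:", slope_b, " residual at x=40:", resid_b[-1], " leverage:", hat_b[-1])
```

In case A the contaminated point has by far the largest residual and the smallest possible leverage (1/n), and the slope does not move at all; only the intercept shifts slightly. In case B the far-out point has leverage above 0.9 and drags the slope well away from 1, yet its own residual is smaller than those of some perfectly good points, so a residual plot alone would not flag it.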

Jerry Dallal gave a simple expository explanation of these concepts

of "leverage points" and "influential points" in the link:

http://www.tufts.edu/~gdallal/diagnose.htm

The newer subject area is known as "regression diagnostics", and it

covers much more than just the analysis of "outliers", e.g.

http://www.google.com/search?hl=en&q=textbook+on+regression+diagnostics

Cook and Weisberg (1982) is another standard reference text:

http://www.math.montana.edu/Rweb/Rhelp/influence.measures.html

Chatterjee, Hadi, and Price (1984?) is still another:

http://www.ats.ucla.edu/stat/examples/chp/default.htm

See also Chatterjee, S., and A. S. Hadi, "Influential Observations,

High Leverage Points, and Outliers in Linear Regression," Statistical

Science, 1:379-416, 1986.

and many other references in post-1980 literature.

-- Bob.