And stepping backwards is almost always better than stepping forward,
for the obvious reason that X1 and X2 may fit Y perfectly as a pair,
while neither does as well by itself as X3, so that forward stepping
enters X3 first and may never arrive at the pair.
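
To make the X1, X2, X3 point concrete, here is a minimal sketch in
Python with NumPy. Everything in it is my own illustration: the data
are made up so that Y = X1 - X2 exactly while X3 is a noisy copy of Y,
and the helper names (r_squared, forward, backward) are hypothetical,
not taken from any package. Both searches are greedy on R-squared and
simply stop at k variables, which is a simplification of real stepwise
procedures that use F-to-enter and F-to-remove rules.

```python
import numpy as np

def r_squared(X, y, cols):
    """R-squared of the least-squares fit of y on an intercept plus X[:, cols]."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - rss / tss

def forward(X, y, k):
    """Greedy forward stepping: add the best variable until k are in."""
    chosen = []
    while len(chosen) < k:
        rest = [j for j in range(X.shape[1]) if j not in chosen]
        best = max(rest, key=lambda j: r_squared(X, y, chosen + [j]))
        chosen.append(best)
    return chosen

def backward(X, y, k):
    """Greedy backward stepping: drop the least useful variable until k remain."""
    chosen = list(range(X.shape[1]))
    while len(chosen) > k:
        drop = max(chosen,
                   key=lambda j: r_squared(X, y, [c for c in chosen if c != j]))
        chosen.remove(drop)
    return chosen

# Made-up data: u is a large shared trend, e a small signal, d is noise.
u = np.arange(1.0, 9.0)
e = np.array([1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0])
d = 0.5 * np.array([1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0])

x1 = u + e
x2 = u - e
y = x1 - x2          # X1 and X2 together reproduce Y exactly
x3 = y + d           # X3 alone is a good but imperfect proxy for Y
X = np.column_stack([x1, x2, x3])

fwd = forward(X, y, 2)
bwd = backward(X, y, 2)
print("single-variable R^2:", [round(r_squared(X, y, [j]), 3) for j in range(3)])
print("forward picks columns ", fwd, " R^2 =", round(r_squared(X, y, fwd), 4))
print("backward keeps columns", bwd, " R^2 =", round(r_squared(X, y, bwd), 4))
```

With these particular numbers, forward stepping takes X3 first (column
index 2) and ends with an imperfect two-variable fit, while backward
stepping discards X3 and keeps the pair X1, X2, which fits Y exactly.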
This comment needs to be further clarified.
An extreme outlier can be in the (X-y) space or in the space of
residuals (observed errors).
In the older books, papers, and software packages, undue
emphasis was often placed on the observed error space only.
But some of those "outliers" near the center of the X-space may
have as little as ZERO influence on the fitted slopes (a point that
coincides with X-bar can shift only the intercept).
Conversely, there may be points far from the center of X that
exert tremendous influence on the fitted model but do not
show up as "outliers" in the residual space.
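
A small numerical sketch of these two cases, in Python with NumPy.
The data and the helper names (fit_line, slope_without) are made up
for illustration and are not from any of the references below. Case A
puts a gross outlier exactly at X-bar; case B puts a point far out in
X that pulls the line toward itself.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares line y = a + b*x; returns (a, b), residuals, leverages."""
    X = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Leverage is the diagonal of the hat matrix H = X (X'X)^-1 X'.
    leverage = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
    return beta, resid, leverage

def slope_without(x, y, i):
    """Slope of the line refitted after deleting observation i."""
    keep = np.arange(len(x)) != i
    beta, _, _ = fit_line(x[keep], y[keep])
    return beta[1]

# Case A: the point at index 2 sits at X-bar with a wild y value.
xa = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
ya = np.array([-2.0, -1.0, 10.0, 1.0, 2.0])
beta_a, resid_a, lev_a = fit_line(xa, ya)
print("A: residual", resid_a[2], "leverage", lev_a[2])
print("A: slope with point", beta_a[1], "without", slope_without(xa, ya, 2))

# Case B: the point at index 5 is far from the other X values.
xb = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 20.0])
yb = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 0.0])
beta_b, resid_b, lev_b = fit_line(xb, yb)
print("B: residual", resid_b[5], "leverage", lev_b[5])
print("B: slope with point", beta_b[1], "without", slope_without(xb, yb, 5))
```

In case A the center point has by far the largest residual, yet the
slope is the same with or without it; only the intercept moves. In
case B the far point has a leverage close to 1 and a smaller residual
than several of the well-behaved points, yet deleting it changes the
slope from nearly 0 to 1.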
Jerry Dallal gave a simple expository explanation of these concepts
of "leverage points" and "influential points" in the link:
http://www.tufts.edu/~gdallal/diagnose.htm
The new subject area has been known as "regression diagnostics",
much more than just the analysis of "outliers", e.g.
http://www.google.com/search?hl=en&q=textbook+on+regression+diagnostics
Cook and Weisberg (1982) is another standard reference text:
http://www.math.montana.edu/Rweb/Rhelp/influence.measures.html
Chatterjee, Hadi, and Price (1984?) is still another:
http://www.ats.ucla.edu/stat/examples/chp/default.htm
See also Chatterjee, S., and A. S. Hadi, "Influential Observations,
High Leverage Points, and Outliers in Linear Regression," Statistical
Science, 1:379-416, 1986.
and many other references in post-1980 literature.
-- Bob.