sas >> Order of records returned by SELECTs without ORDER Bys (was :

by pardee.r » Thu, 14 Jul 2005 03:58:55 GMT

Well...

I too would be astounded if proc sql returned recs in a different order
when the SELECTs and base datasets are identical. But otherwise, all
bets are off.

In _SQL For Smarties_ (2d ed.) Celko has this to say:

Most optimizers will look at the result set and
see from the query if it is already in sorted
order as a result of fetches done with an index,
thus avoiding a redundant sorting operation. The
bad news is that many programmers have written
code that depends on the way that their particular
release of a particular brand of SQL product
presented the result. This auttomatic ordering
can disappear when an index is dropped or changed,
or when the database is upgraded to a new release
or ported to another product.

So... To the extent that sort order depends on index use, and index use
depends on characterisitcs of the underlying datasets involved (e.g.,
when you join a big table to a smaller table on an indexed key, and the
smaller dataset size changes to the point where sas figures it's no
longer profitable to use the index) you could definitely wind up with
inconsistant results.

Can I demonstrate the problem w/test code? Nope. But I'm inclined to
keep my ORDER BY in cases where sort order matters...

-----Original Message-----
From: SAS(r) Discussion [mailto: XXXX@XXXXX.COM ] On Behalf Of
Michael Murff
Sent: Wednesday, July 13, 2005 11:47 AM
To: XXXX@XXXXX.COM
Subject: Re: Views and passes (was RE: Output last record of fantom by
group)


Roy et al.,

I omitted the order by statement for reasons previously noted. I would
be astounded if Proc SQL actually returned a different order without an
order by statement, at least in this simple application. This would
really shake my faith in its usefulness.

Anyone else surprised that the SQL view took so much longer to process
than the datastep view?

Michael

-----Original Message-----
From: Pardee, Roy [mailto: XXXX@XXXXX.COM ]
Sent: Wednesday, July 13, 2005 11:58 AM
To: Michael Murff; XXXX@XXXXX.COM
Subject: RE: Re: Views and passes (was RE: Output last record of fantom
by
group)

Interesting--thanks! So did you include the ORDER BY in that view def?

I'm actually kind of attached to that ORDER BY--tho I totally take
Dale's point that it's unnecessary for this application. Having drunk
the rdbms-kool-aid ;-) I've internalized the idea that SELECTs not
sporting ORDER BYs return rows in an 'undefined' order, meaning that the
order can change from execution to execution (or version to version of
the software executing the SELECT).

Now, I'd *hope* that proc sql would check the source dset's sort
assertion, and avoid actually sorting the data (and I bet that's just
what it would do, but look how often I'm wrong lately). But even the
check would take some time. And of course plenty of sorted datasets
don't have their sort assertion set...

-----Original Message-----
From: SAS(r) Discussion [mailto: XXXX@XXXXX.COM ] On Behalf Of
Michael Murff
Sent: Wednesday, July 13, 2005 10:39 AM
To: XXXX@XXXXX.COM
Subject: Re: Views and passes (was RE: Output last record of fantom by
group)


I'm not sure what happened with the merge, but I may have miffed the
implementation. Still, this should give some idea of run times. Datastep
view is still the winner.

ONE PASS:
NOTE: The data set WORK.ONE_PASS has 245760 observations and 8
variables.
NOTE: DATA statement used:
real time 14:27.92
cpu time 11:53.12


SQL VIEW:
NOTE: There were 56620562 observations read from the data set
WORK.WITH_YEAR.
NOTE: The data set WORK.WANT has 245760 observations and 9 variables.
NOTE: DATA statement used:
real time 28:02.35
cpu time 25:18.08

DATASTEP MERGE:
NOTE: The data set WORK.ONE_SHIFT has 0 observations and 8 variables.
NOTE: DATA statement used:
real time 14:28.21
cpu time 13:22.24

-----Original Message-----
From: SAS(r) Discussion [mailto: XXXX@XXXXX.COM ] On Behalf Of
Dale McLerran
Sent: Wednesday, July 13, 2005 10:24 AM
To: XXXX@XXXXX.COM
Subject: Re: Views and passes (was RE: Output last record of fantom by
group)

--- "Pardee, Roy" < XXXX@XXXXX.COM > wrote:

> Excellent thread--thanks much everyone. I'm happy to be disabused of
> the notion that dstep views get 'realized' on disk on each use. Very
> cool stuff.
>
> So now I'm naturally suspicious of my intuition WRT sql views. If
> Michael or anybody else w/real data to exercise this on feels like it,

> I'd be curious to see how this performed relative to the dstep view:
>
> proc sql ;
> create view with_year as
> select *, year(date) as yr
> from have
> order by id, date ;
> quit ;
>
> data want;
> set with_year ;
> by id yr ;
> if last.yr;
> run;
>
> Thanks!
>
> -Roy
>

Roy,

I would not include the order by clause in the SQL view if you know that
the data are already ordered by ID and YEAR. SQL would have to check on
ordering before passing the data along. When views require random access
into the data (as is required when performing a sort), then the data
constructed by the view are first written to disk - at least when the
view is a data step view. I would think the same would be true for a
SQL view. This was discussed by Jeffrey Polzin in a SUGI 18
presentation "Introduction to DATA Step Views".

In the present case, we were informed that the data have already been
written by ID and date variables. Therefore, the data are already
ordered by ID and year. It is not necessary, and indeed would be
wasteful (counterproductive), to have SQL attempt to order the data as
part of the view processing.

Dale


---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: XXXX@XXXXX.COM
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------

__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com


Similar Threads

1. Order of records returned by SELECTs without ORDER Bys (was :

2. "define / order order=internal" not being honored by proc

3. why simple kappa are different in proc freq order=data, and order=internal

I found the simple kappa value different in proc freq:
proc freq; (by default order=internal)
proc freq order=data;
why? The simple kappa will be higher in order=internal than in
order=data;

order=DATA
orders values according to their order in the input data set.
order=INTERNAL
orders values by their unformatted values, which yields the same order
that the SORT procedure does. This order is operating-environment
dependent.

proc freq data=aaa;
tables q11ts6s1*q11ts6s1r2 q11ts6s2*q11ts6s2r2 / nocol norow agree;
run;

why this happen? related to the definition of simple kappa
coefficient?  thanks.

4. Returning to original sort order

5. Select Top 5 * From Databases Order By Sales was: 6/20-6/26

6. Rank order of records in an ordinal logistic

7. Ordered Linking of Records on two Variables

Hi SAS-Lers -

I can't come up with a nice solution to this simple sounding problem:

A set of records has two numeric variables, Start and End.  For each
record there is another record such that the first end variable uniquely
matches the second start variable. The records need to be ordered based
on this start-end linkage.

They form a ring, so the first and last record in the set doesn't
matter, just the ordering.  My example has four records but the actual
number may be in the thousands.  I'd like to do this with a limited
number of joins, not with the number of joins dependent on the number of
records.

Example:
R1  1  3
R2  2  1
R3  3  4
R4  4  2

One of four possible results (any result okay):
R1  1  3
R2  3  4
R3  4  2
R4  2  1


Here's some starter code...
data ordered;
     input record $2.
           Start
           End;
cards;
R1  1  3
R2  2  1
R3  3  4
R4  4  2
;


Background - the records are endpoints from map boundary segments - for
GMAP to work they need to be ordered around the full boundary. I have
clusters of counties from which I've swept out the shared borders, now I
need to order the remaining segments around the exterior of each
cluster. There may be a GMAP solution to this - if so I'd be interested
in that as an alternative.

I'll have a large bowl of left-over candy for best solution.  :)

Paul Choate
DDS Data Extraction
(916) 654-2160

8. making the orders of records vice versa