sas >> How to Proc SQL select top N records?

by Ruve » Wed, 12 Mar 2008 02:40:29 GMT

How to Proc SQL select top N records in SAS?

Just like "Select Top 10" in SQL Server? I just want to see the top
10 results without having to saving all query results into a dataset.

Thanks!


sas >> How to Proc SQL select top N records?

by rsermer@gmail.com » Wed, 12 Mar 2008 02:48:10 GMT


top 10 results use the following where condition:

WHERE ROWNUM <= 10





sas >> How to Proc SQL select top N records?

by sraimi » Wed, 12 Mar 2008 03:04:01 GMT






proc sql outobs=10; /* limit to first 10 results */
select groupvar, count(*)
from table
group by groupvar
order by 2 desc; /* this is how you make them the TOP 10 */
run;


Steve Raimi


How to Proc SQL select top N records?

by pardee.r » Wed, 12 Mar 2008 03:33:24 GMT

Couple of notes here:

1) You may want to add the option NOWARN after the outobs= spec to
suppress WARNINGs in your log due to th outobs spec.
2) If you have a WHERE clause, note that SAS will process your entire
dataset in order to apply it--so this isn't something to do for
performance reasons.

HTH,

-Roy

-----Original Message-----
From: SAS(r) Discussion [mailto: XXXX@XXXXX.COM ] On Behalf Of
Steven Raimi
Sent: Tuesday, March 11, 2008 12:04 PM
To: XXXX@XXXXX.COM
Subject: Re: How to Proc SQL select top N records?






proc sql outobs=10; /* limit to first 10 results */
select groupvar, count(*)
from table
group by groupvar
order by 2 desc; /* this is how you make them the TOP 10 */ run;


Steve Raimi


How to Proc SQL select top N records?

by hs AT dc-sug DOT org » Wed, 12 Mar 2008 09:54:30 GMT





I think that more or less begs the question. How do you derive the ROWNUM
column?


How to Proc SQL select top N records?

by hs AT dc-sug DOT org » Wed, 12 Mar 2008 10:00:58 GMT





3. If there is a tie for 10th (or a 3-way tie for 9th, etc.) this method
will arbitrarily report some but not all of the tied rows.




How to Proc SQL select top N records?

by pardee.r » Thu, 13 Mar 2008 00:33:25 GMT

Yeah, this won't work in SAS. Our correspondent here may be used to
working with oracle, which adds that 'pseudocolumn' automatically (tho
even on oracle I think doing top N is a tad bit more complicated).

-----Original Message-----
From: SAS(r) Discussion [mailto: XXXX@XXXXX.COM ] On Behalf Of
Howard Schreier <hs AT dc-sug DOT org>
Sent: Tuesday, March 11, 2008 6:54 PM
To: XXXX@XXXXX.COM
Subject: Re: How to Proc SQL select top N records?

On Tue, 11 Mar 2008 11:48:10 -0700, XXXX@XXXXX.COM



I think that more or less begs the question. How do you derive the
ROWNUM column?


How to Proc SQL select top N records?

by HERMANS1 » Thu, 13 Mar 2008 01:26:42 GMT

In a much earlier post I recommended creating a row index variable such
as ROWNUM in a Data step view using the automatic _N_ variable.
S

-----Original Message-----
From: XXXX@XXXXX.COM [mailto: XXXX@XXXXX.COM ]
On Behalf Of Pardee, Roy
Sent: Wednesday, March 12, 2008 12:33 PM
To: Howard Schreier <hs AT dc-sug DOT org>; XXXX@XXXXX.COM
Subject: RE: Re: How to Proc SQL select top N records?


Yeah, this won't work in SAS. Our correspondent here may be used to
working with oracle, which adds that 'pseudocolumn' automatically (tho
even on oracle I think doing top N is a tad bit more complicated).

-----Original Message-----
From: SAS(r) Discussion [mailto: XXXX@XXXXX.COM ] On Behalf Of
Howard Schreier <hs AT dc-sug DOT org>
Sent: Tuesday, March 11, 2008 6:54 PM
To: XXXX@XXXXX.COM
Subject: Re: How to Proc SQL select top N records?

On Tue, 11 Mar 2008 11:48:10 -0700, XXXX@XXXXX.COM



I think that more or less begs the question. How do you derive the
ROWNUM column?


Similar Threads

1. Selecting the top 3 records with a twist

2. Selecting TOP X child records for a parent record

Hi,

I have a stored procedure that has to extract the child records for
particular parent records.

The issue is that in some cases I do not want to extract all the child
records only a certain number of them.

Firstly I identify all the parent records that have the requird number
of child records and insert them into the result table.

insert into t_AuditQualifiedNumberExtractDetails
 	(BatchNumber,
	 EntryRecordID,
	 LN,
	 AdditionalQualCritPassed)
(select t1.BatchNumber,
        t1.EntryRecordID,
        t1.LN,
        t1.AdditionalQualCritPassed
from
(select BatchNumber,
       RecordType,
       EntryRecordID,
       LN,
       AdditionalQualCritPassed
 from t_AuditQualifiedNumberExtractDetails_Temp) as t1
 inner join
(select BatchNumber,
	RecordType,
	EntryRecordID,
	Count(*) as AssignedNumbers,
	max(TotalNumbers) as TotalNumbers
from t_AuditQualifiedNumberExtractDetails_Temp
group by BatchNumber, RecordType, EntryRecordID
having count(*) = max(TotalNumbers)) as t2
on t1.BatchNumber = t2.BatchNumber
and t1.RecordType = t2.RecordType
and t1.EntryRecordID = t2.EntryRecordID)

then insert the remaining records into a temp table where the number of
records required does not equal the total number of child records, and
thenloop through each record manipulating the ROWNUMBER to only select
the number of child records needed.

insert into @t_QualificationMismatchedAllocs
	 	([BatchNumber],
		 [RecordType],
		 [EntryRecordID],
		 [AssignedNumbers],
		 [TotalNumbers])
	(select BatchNumber,
	   	RecordType,
	       	EntryRecordID,
	       	Count(*) as AssignedNumbers,
	       	max(TotalNumbers) as TotalNumbers
	from t_AuditQualifiedNumberExtractDetails_Temp
	group by BatchNumber, RecordType, EntryRecordID
	having count(*) <> max(TotalNumbers))

	SELECT @QualificationMismatched_RowCnt = 1

	SELECT @MaxQualificationMismatched = (select count(*) from
@t_QualificationMismatchedAllocs)

while @QualificationMismatched_RowCnt <= @MaxQualificationMismatched
	begin
	--## Get Prize Draw to extract numbers for
		select 	@RecordType  = RecordType,
			@EntryRecordID = EntryRecordID,
			@AssignedNumbers = AssignedNumbers,
			@TotalNumbers = TotalNumbers
		from @t_QualificationMismatchedAllocs
		where QualMismatchedAllocsRowNum = @QualificationMismatched_RowCnt

			SET ROWCOUNT @TotalNumbers

		insert into t_AuditQualifiedNumberExtractDetails
		                (BatchNumber,
			 EntryRecordID,
			 LN,
			 AdditionalQualCritPassed)
     		(select BatchNumber,
		        EntryRecordID,
		        LN,
		        AdditionalQualCritPassed
		 from t_AuditQualifiedNumberExtractDetails_Temp
		 where RecordType = @RecordType
		 and EntryRecordID = @EntryRecordID)

		SET @QualificationMismatched_RowCnt =

QualificationMismatched_RowCnt + 1
		SET ROWCOUNT 0
	end

Is there a better methodology for doing this .....

Is the use of a table variable here incorrect ?

Should I be using a temporary table or indexed table if there are a
large number of parent records where the child records required does
not match the total number of child records ?

3. How to update top 20 records in sql server 2000 using [stored proc - SQL Server

4. Display Top Ns and sum of All others

We need to display a Geography Dimension with levels

NATIONAL
REGIONS
AREAS

As 

National
	Region1
		Area1
		Area2
		All Other Areas Under Region1
	Region2
		Area3
		Area4
		All Other Areas Under Region2
	All Other Regions Under National

All Other Areas Under Region1 Sum of the areas under Region1 that are not
displayed in the report under region1. Similarly Region2

All Other Regions Under National Sum of the Regions under National that are
not displayed in the report under National.

The members that are selected to be displayed in the report can be a variable
list.
Similarly the conditions to display the child for each levels also is
variable

(For eg) 
Top 2 Regions, Bottom 3 Areas for one report
And  
Regions whose valu> > 100, Top 3 Areas for one report

We have formed the query to display the report but for the LL Other..
members

SELECT { [Measures].[values] } on axis(0),
ORDER(
Hierarchize(
{
GENERATE(
GENERATE(
Head(Order([GEOGRAPHY].[GEO].[NATIONAL].members, [Measures]. [VALUES], DESC),
1),
{
[GEOGRAPHY].[geo].CurrentMember,
Head(Order(Descendants ([GEOGRAPHY]. [geo].CurrentMember,[GEOGRAPHY].[GEO].
[REGIONS]), [Measures]. [VALUES], DESC), 2)
}
),

[GEOGRAPHY].[geo].CurrentMember,
Head(Order(Descendants ([GEOGRAPHY]. [geo].CurrentMember,[GEOGRAPHY].[GEO].
[AREAS]), [Measures]. [VALUES], DESC), 2)
}
)
}
), 
[Measures].[VALUES], DESC)
on axis(1)
FROM [USERDEMO 1 4 U ABC C99980_4325__20030317]

Can you help us in completing this query or suggesting a better optimized way
of retrieving the report?

5. PROC SQL - select records according to FORMAT value rather

6. PROC SQL - select records according to FORMAT value rather than

Hi

Question:=20
How do I use a WHERE statement in PROC SQL to select a record according
to a variable's formatted value rather than its actual value? =20

E.g.:=20
Suppose X1 =3D C000 and applying a (character) format makes X1 =3D '1'.
Here, 1 is treated as a character rather than a number (reasoning
below). A straight WHERE X =3D '1' statement in PROC SQL doesn't pick up
this record, because the actual value is 'C000' not '1'.

Background:=20
I have a large patient discharge dataset. I want to select specific
records if their primary diagnosis (DIAG01) or their external injury
(ECODE01) match the list I'm interested in, so I can ultimately sum a
cost variable (WIES).  A record may have both a DIAG01 and ECODE01 that
I am interested in, but I only want to count each record once. The
ECODE01 takes priority.=20

So, I want to split the dataset into two, and analyse the resulting sets
separately.  That is, I want a set with matched ECODE01s and residual
set with matching DIAG01s (the remaining, unmatched records are
irrelevant and I don't want to keep them in the analytical datasets).

I've managed to do this by a torturous PROC SQL where I list all the
ECODE01s I want to keep.  But this is hard coded in with a statemetn
WHERE ECODE01 =3D 'xxxx' OR ECODE01 =3D 'xxxx' OR ...
But I want to be able to easily track what's on the list, and add or
remove conditions by changing an input file rather than my code.

Partial solution:
I've just thought, instead, of using PROC FORMAT, as follows using a
dataset (TEMP01), which is a list of all the ECODE01s I want to match,
to create a format ($INJURY), applying this to my dataset (TEMP00), and
then using PROC SQL to create two datasets with the appropriate records
(matched ECODE01s and matched DIAG01s; I haven't done the latter step
for matching DIAG01s yet).

data temp00;
      input ID DIAG01 $ ECODE01 $ WIES ;
      cards ;
      1      C000      .        0.7129  /* Keep because of relevant
DIAG01 */
      2      C000      W1009    0.4105  /* Keep because of relevant
ECODE01; overrides relevant DIAG01 */
      3      C000      Y838     6.9079  /* Keep because of relevant
DIAG01 despite irrelevant ECODE01 */
      4      S010      .        3.0588  /* Discard, nothing relevant */
      5      S010      W2208    0.2986  /* Keep because of relevant
ECODE01 */
      6      S010      Y870     2.4198  /* Discard, nothing relevant */
;

data temp01;
      input ECODE01 $;
      cards ;
        W1009
        W2208
;

DATA temp02 (rename=3D (ecode01=3Dstart));
        set temp01;
        fmtname=3D'$injury';                    /* Defines a character
format, so I can apply it to an existing character varialbe
        label =3D 1;                            /* This comes out as a
character variable; I don't want to use a letter. (Because my format is
LENGTH 1, it only picks up the first element of the variable to which I
apply the format (and which always starts with a letter.) */
RUN;

PROC FORMAT cntlin=3Dtemp02;
QUIT;

DATA temp03;
SET temp02;
FROMAT ecode01 $injury.;
RUN;

PROC SQL;
CRATE TABLE temp04 AS
SELECT *=20
FROM temp03=20
WHERE ecode01 =3D '1';
QUIT;

PROC PRINT DATA=3D temp04; QUIT;

NOTE: No observations in data set WORK.TEMP04.
NOTE: The PROCEDURE PRINT used 0.0 seconds.

Problem:
There are no records where ecode01 =3D '1', only (some) records where =
the
format of ecode01 =3D '1', i.e. records 2 and 5. Can I modify the WHERE
statement so it targets the formatted value rather than the actual
value?

Thanks in advance.

Adrian.

------------------------------------------------------------------------
--------------------------------------
School of Economics and Finance
Victoria University of Wellington

P   +64 4 463 5233 xtn 8571
E    XXXX@XXXXX.COM 
A   Room 315, Rutherford House, PO Box 600, Wellington, NEW ZEALAND=20
------------------------------------------------------------------------
--------------------------------------

7. CORRECTION: PROC SQL - select records according to FORMAT

8. CORRECTION: PROC SQL - select records according to FORMAT value

Sorry, I pasted the example code over with errors. Corrected example
below.

DATA temp00;
      INPUT id diag01 $ ecode01 $ wies ;
      CARDS ;
      1      C000      .        0.7129
      2      C000      W1009    0.4105
      3      C000      Y838     6.9079
      4      S010      .        3.0588
      5      S010      W2208    0.2986
      6      S010      Y870     2.4198
;

DATA temp01;
      INPUT ecode01 $;
      CARDS ;
        W1009
        W2208
;

DATA temp02 (RENAME=3D (ecode01=3Dstart));
        SET temp01;
        fmtname=3D'$injury';
        label =3D 1;
RUN;

PROC FORMAT cntlin=3Dtemp02;
QUIT;

DATA temp03;
SET temp01;
FORMAT ecode01 $injury.;
RUN;

PROC SQL;
CREATE TABLE temp04 AS
SELECT *
FROM temp03
WHERE ecode01 =3D '1';
QUIT;

PROC PRINT DATA=3D temp04; QUIT;

> ______________________________________________=20
> From:         Adrian Slack =20
> Sent: Wednesday, 10 December 2008 12:53 p.m.
> To:   ' XXXX@XXXXX.COM '
> Subject:      PROC SQL - select records according to FORMAT value
> rather than actual value
>=20
> Hi
>=20
> Question:=20
> How do I use a WHERE statement in PROC SQL to select a record
> according to a variable's formatted value rather than its actual
> value? =20
>=20
> E.g.:=20
> Suppose X1 =3D C000 and applying a (character) format makes X1 =3D =
'1'.
> Here, 1 is treated as a character rather than a number (reasoning
> below). A straight WHERE X =3D '1' statement in PROC SQL doesn't pick =
up
> this record, because the actual value is 'C000' not '1'.
>=20
> Background:=20
> I have a large patient discharge dataset. I want to select specific
> records if their primary diagnosis (DIAG01) or their external injury
> (ECODE01) match the list I'm interested in, so I can ultimately sum a
> cost variable (WIES).  A record may have both a DIAG01 and ECODE01
> that I am interested in, but I only want to count each record once.
> The ECODE01 takes priority.=20
>=20
> So, I want to split the dataset into two, and analyse the resulting
> sets separately.  That is, I want a set with matched ECODE01s and
> residual set with matching DIAG01s (the remaining, unmatched records
> are irrelevant and I don't want to keep them in the analytical
> datasets).
>=20
> I've managed to do this by a torturous PROC SQL where I list all the
> ECODE01s I want to keep.  But this is hard coded in with a statemetn
> WHERE ECODE01 =3D 'xxxx' OR ECODE01 =3D 'xxxx' OR ...
> But I want to be able to easily track what's on the list, and add or
> remove conditions by changing an input file rather than my code.
>=20
> Partial solution:
> I've just thought, instead, of using PROC FORMAT, as follows using a
> dataset (TEMP01), which is a list of all the ECODE01s I want to match,
> to create a format ($INJURY), applying this to my dataset (TEMP00),
> and then using PROC SQL to create two datasets with the appropriate
> records (matched ECODE01s and matched DIAG01s; I haven't done the
> latter step for matching DIAG01s yet).
>=20
> data temp00;
>       input ID DIAG01 $ ECODE01 $ WIES ;
>       cards ;
>       1      C000      .        0.7129        /* Keep because of
> relevant DIAG01 */
>       2      C000      W1009    0.4105        /* Keep because of
> relevant ECODE01; overrides relevant DIAG01 */
>       3      C000      Y838     6.9079        /* Keep because of
> relevant DIAG01 despite irrelevant ECODE01 */
>       4      S010      .        3.0588        /* Discard, nothing
> relevant */
>       5      S010      W2208    0.2986        /* Keep because of
> relevant ECODE01 */
>       6      S010      Y870     2.4198        /* Discard, nothing
> relevant */
> ;
>=20
> data temp01;
>       input ECODE01 $;
>       cards ;
>         W1009
>         W2208
> ;
>=20
> DATA temp02 (rename=3D (ecode01=3Dstart));
>         set temp01;
>         fmtname=3D'$injury';                  /* Defines a character
> format, so I can apply it to an existing character varialbe
>         label =3D 1;                          /* This comes out as a
> character variable; I don't want to use a letter. (Because my format
> is LENGTH 1, it only picks up the first element of the variable to
> which I apply the format (and which always starts with a letter.) */
> RUN;
>=20
> PROC FORMAT cntlin=3Dtemp02;
> QUIT;
>=20
> DATA temp03;
> SET temp02;
> FROMAT ecode01 $injury.;
> RUN;
>=20
> PROC SQL;
> CRATE TABLE temp04 AS
> SELECT *=20
> FROM temp03=20
> WHERE ecode01 =3D '1';
> QUIT;
>=20
> PROC PRINT DATA=3D temp04; QUIT;
>=20
> NOTE: No observations in data set WORK.TEMP04.
> NOTE: The PROCEDURE PRINT used 0.0 seconds.
>=20
> Problem:
> There are no records where ecode01 =3D '1', only (some) records where
> the format of ecode01 =3D '1', i.e. records 2 and 5. Can I modify the
> WHERE statement so it targets the formatted value rather than the
> actual value?
>=20
> Thanks in advance.
>=20
> Adrian.
>=20
> ----------------------------------------------------------------------
> ----------------------------------------
> School of Economics and Finance
> Victoria University of Wellington
>=20
> P   +64 4 463 5233 xtn 8571
> E    XXXX@XXXXX.COM 
> A   Room 315, Rutherford House, PO Box 600, Wellington, NEW ZEALAND=20
> ----------------------------------------------------------------------
> ----------------------------------------
>=20