sas >> select certain number of records

by Mark » Wed, 25 May 2005 02:06:04 GMT

I need to work on a huge dataset. The first thing is to derive some
new variables based on existing variables.

So I want to pick the first 100 or 200 records into a small dataset to
test my code.

I know I can use ranuni() function to select a small number of records
from the original datasets.

But is there a proc sql Or data set function can do this?

Thanks,



sas >> select certain number of records

by harry.droogendyk » Wed, 25 May 2005 02:11:51 GMT


Data set OPTION.

data small_data;
set big_data ( obs=100);
run;

-----Original Message-----
From: SAS(r) Discussion [mailto: XXXX@XXXXX.COM ]On Behalf Of Mark
Sent: Tuesday, May 24, 2005 2:06 PM
To: XXXX@XXXXX.COM
Subject: select certain number of records


I need to work on a huge dataset. The first thing is to derive some
new variables based on existing variables.

So I want to pick the first 100 or 200 records into a small dataset to
test my code.

I know I can use ranuni() function to select a small number of records
from the original datasets.

But is there a proc sql Or data set function can do this?

Thanks,
__________________________________________________________________________________________________________________________________

This e-mail may be privileged and/or confidential, and the sender does not waive any related rights and obligations.
Any distribution, use or copying of this e-mail or the information it contains by other than an intended recipient is unauthorized.
If you received this e-mail in error, please advise me (by return e-mail or otherwise) immediately.

Ce courrier ectronique est confidentiel et prot L'expiteur ne renonce pas aux droits et obligations qui s'y rapportent.
Toute diffusion, utilisation ou copie de ce message ou des renseignements qu'il contient par une personne autre que le (les) destinataire(s) digns) est interdite.
Si vous recevez ce courrier ectronique par erreur, veuillez m'en aviser immiatement, par retour de courrier ectronique ou par un autre moyen.



sas >> select certain number of records

by jairajs » Wed, 25 May 2005 02:19:20 GMT

Mark,
You could just use firstobs= & obs= options in a datastep to create
the small dataset you want. This won't be a random sample, but I don't
think that matters to you.

Jairaj




select certain number of records

by LWn » Wed, 25 May 2005 02:31:56 GMT

Try

data new ;
set old (obs=100) ;
run ;

/LWn

"Mark" < XXXX@XXXXX.COM > skrev i meddelandet





select certain number of records

by pchoate » Wed, 25 May 2005 02:41:40 GMT

Hi Mark -

You can use point= to pull random lines out of a dataset -


<stolen shamelessly from sas-l>

/* efficient small random selection with replacement
from large file (no sorting)
*/
/* choose 10 random obs with replacement from w */
data samp ( drop = i ) ;
do i = 1 to 10 ;
pt = ceil ( ranuni ( 64320653 ) * nobs ) ;
set w point = pt nobs = nobs ;
output ;
end ;
stop ;
run ;

</stolen shamelessly from sas-l>

have fun

Paul Choate
DDS Data Extraction
(916) 654-2160

-----Original Message-----
From: SAS(r) Discussion [mailto: XXXX@XXXXX.COM ] On Behalf Of Mark
Sent: Tuesday, May 24, 2005 11:06 AM
To: XXXX@XXXXX.COM
Subject: select certain number of records

I need to work on a huge dataset. The first thing is to derive some
new variables based on existing variables.

So I want to pick the first 100 or 200 records into a small dataset to
test my code.

I know I can use ranuni() function to select a small number of records
from the original datasets.

But is there a proc sql Or data set function can do this?

Thanks,


select certain number of records

by pchoate » Wed, 01 Jun 2005 07:36:38 GMT

Hi Mark -

A different method I often use to "sample" a dataset just occured to
me. I say "sample" because it is just a selection of every nth record
until a certain number is reached, using the modulus function. It is
useful to take a quick look at a few records out of a dataset.

data sample;
set large.dataset;
if mod(_n_,1000)=567;
_n+1; drop _n;
if _n>100 then stop;
run;

This takes every 1000th record starting at 567, keeping the first 100.

As you can see it is easily adjusted to take a certain percentage of
records, or the first n spaced by any constant increment.

regards

Paul Choate





select certain number of records

by nospam » Wed, 01 Jun 2005 11:45:07 GMT

concur. A couple of additional points ...

John's code can be made more efficient for very large data sets by avoiding
the POINT= option. It's possible to have one SET statement for the purpose
of discovering the observation count:

if 0 then set largedataset nobs = totobs;

and another to actually fetch in the observations in the loop:

set largedataset;

That eliminates the need for the counter PICKIT as well.

In addition to FSP, the MODIFY statement and PROC SQL can also mark
observations for deletion.

On Tue, 31 May 2005 18:16:32 -0700, David L. Cassell
< XXXX@XXXXX.COM > wrote:



select certain number of records

by Randy.Dai » Wed, 01 Jun 2005 14:16:58 GMT

INOBS=n
restricts the number of rows (observations) that PROC SQL retrieves from any
single source. Tip: This option is useful for debugging queries on large
tables.

Randy Dai

-----Original Message-----
From: Mark [mailto: XXXX@XXXXX.COM ]
Sent: 2005?5?25? 2:06
To: XXXX@XXXXX.COM
Subject: select certain number of records


I need to work on a huge dataset. The first thing is to derive some
new variables based on existing variables.

So I want to pick the first 100 or 200 records into a small dataset to
test my code.

I know I can use ranuni() function to select a small number of records
from the original datasets.

But is there a proc sql Or data set function can do this?

Thanks,


select certain number of records

by pchoate » Thu, 02 Jun 2005 01:01:00 GMT

While David is as always correct that for a statistician wanting a
probabilistic sample survey select offers some nifty choices, the
modulus method can be easily adapted to handle many testing situations
that survey select is ill equiped to handle.

Apparently to the statistical hammerer every subsetting problem looks
like a probabilistic nail. Alternatively, the modulus method offers a
full toolbox of control and flexibility.
:-)

The choices are endless, here are a couple quick examples -

If you want 100 records spaced 1000 apart from the end of the file:

data largedataset;
do i=1 to 10e5;
output;
end;

data sample;
set largedataset NOBS=nobs;
if _n_>nobs-1000*100;
if mod(_n_,1000)=567;
run;


If you want 30 groups of 25 records spaced 100 apart (start to start)
from the middle of the file:

data sample;
set largedataset NOBS=nobs;
if _n_>nobs/2-100*30/2;
if 1 le mod(_n_,100) le 25;
if _n_>nobs/2+100*30/2 then stop;
run;

etc.

Any sequential selection of records for any size record subset is as
easily and efficiently obtained. To the programmer this
non-probablistc control can be essential when exploring a file for
testing or other purposes.

Regards to all, statisticians and otherwise-

Paul Choate






Similar Threads

1. number records by a certain variable

2. logical error when writing a certain number of records from one file to another

Hi,
This is the code I have made to traverse a file and write the first 4
records out to another file:

filename in     's:\fremkon\r20-prel.dat';
filename out    'D:\TEST\fixed_files\out.dat';
%let num_rec = 4; *number of records to write;
%let rl = 65; *record length;

data _null_;
    infile in lrecl=&rl recfm=v;
    file out lrecl=&rl recfm=v pad; 
    input;
    if _n_ <= &num_rec then ;
        do;
            *removing crlf that are disturbing SAS;
            if length(_infile_) < &rl then 
                do;
                    _infile_=tranwrd(_infile_,'0d0a'x,"  ");
                end;
            put _infile_;
        end;
run;


The SAS log doesn't seem to agree with the logic I had in mind when I
wrote the code:

22   filename in     's:\fremkon\r20-prel.dat';
23   filename out    'D:\TEST\fixed_files\out.dat';
24   %let num_rec = 4; *number of records to write;
25   %let rl = 65; *record length;
26
27   data _null_;
28       infile in lrecl=&rl recfm=v;
29       file out lrecl=&rl recfm=v pad;
30       input;
31       if _n_ <= &num_rec then ;
32           do;
33               *removing crlf that are disturbing SAS;
34               if length(_infile_) < &rl then
35                   do;
36                       _infile_=tranwrd(_infile_,'0d0a'x,"  ");
37                   end;
38               put _infile_;
39           end;
40   run;

NOTE: The infile IN is:
      File Name=s:\fremkon\r20-prel.dat,
      RECFM=V,LRECL=65

NOTE: The file OUT is:
      File Name=D:\TEST\fixed_files\out.dat,
      RECFM=V,LRECL=65

NOTE: 423554 records were read from the infile IN.
      The minimum record length was 0.
      The maximum record length was 65.
NOTE: 423554 records were written to the file OUT.
      The minimum record length was 65.
      The maximum record length was 65.
NOTE: DATA statement used (Total process time):
      real time           2.12 seconds
      cpu time            1.10 seconds

---
Why are not just 4 records written to the file OUT ?


Regards 
Rune Runnest

3. Function to select one record in each of multiple subsets of records

4. How to randomly select a certain data from a large dataset

Hello SAS-Ls,

I have a large dataset, about 20,000 data, and I need randomly select just 400 of it. How do I use ranuni to retrieve only 400 data? Thanks a lot.

Jane

5. How to randomly select a certain data from a large dataset -

6. how to get the records meet certain requirements

dataset1

name  ID Gender
Jack   18  M
B        20  F
D        10  F
E        20 F
Mike   18  M
F        20  F
H        10  F
L        20 F

Dataset2

State   Label   City
IA        18      M
GA       20      F
LA       10      F
NC      20      F

How to write a program to create a dataset "Search" where hold the
records where the value of label in dataset 2 equal the value of ID in
dataset1. The variables in the new datasets are Name, Label, and City.

7. Identifying certain records

8. how to fix a dislocation in a certain record of a file

Hi,
In the example below, I try to fix a dislocation in a file.
The file have a dislocation i records # 4, 8 and 13.

filename wrong 'd:\wrong.dat';
filename right 'd:\rigth.dat';


data _null_;
    file wrong recfm=f lrecl=34;
    put '123456789012345678901234567890-a ';
    put '123456789012345678901234567890-b ';
    put '123456789012345678901234567890-c ';
    put '1234567890 12345678901234567890-d '; *rec# 4;
    put '123456789012345678901234567890-e ';
    put '123456789012345678901234567890-f ';
    put '123456789012345678901234567890-g ';
    put '1234567890 12345678901234567890-h '; *rec# 8;
    put '123456789012345678901234567890-i ';
    put '123456789012345678901234567890-j ';
    put '123456789012345678901234567890-k ';
    put '123456789012345678901234567890-l ';
    put '1234567890 12345678901234567890-m '; *rec# 13;
run;

data _null_;
    infile wrong recfm=f lrecl=33;    
    file right recfm=f lrecl=33;
    input;
    if _n_ = 4 then
        do;
            substr(_infile_,11) = substr(_infile_,12);
        end;
    put _infile_;
run;

The result is not what I expect. First, 14 rather than 13 records are
read from  wrong.dat. Second, there is a change in the file, but not
the one I wanted.

What is wrong with this code ?

Regards
Rune Runnest