Hello,
I have to pick 30.000 random rows from a file of 20 millions rows. The
file is 12GB. I am new at SAS. Any advices, please?
Thanks,
-k
2. How to pick up the random variables in a dataset
In a data set I have 10 variables.just i want to pickup only even varibles like ex: data new; input a1 a2 a3 a4 a5 a6 a7 a8 a9 a10; cards; 1 2 3 4 5 6 7 8 9 10 ; run; I need out put like a2 a4 a6 a8 a10 2 4 6 8 10 If u know anyone to solve this problem
3. how to pick the last 65000 rows
4. Reading the data from the excel file into SAS by row by row
5. Pick up highest observations from each variable in a data set
6. picking up duplicate records in a data set
Below is shown a way to identify duplicate records in a data set, using PROC
SQL. One of the records in DS1 has one duplicate, and the other has two.
I want is to make another data set which contains the actual duplicates. Not
only a record with their unique value. In the case below, the data set of
duplicates should have 3 records, not 2.
I have looked in the online documentation for PROC SORT with the NODUPRECS
option, but the out= data set contains the records *without* the duplicates.
Like this:
proc sort data = ds1 noduprecs out = ds1_without_dups ;
by
PartNumber
Description
InStock
ReceivedDate
Price;
run;
proc print; run;
Is there a way then to make a third data set - ds1_the_dups - that could
contain all records that are in ds1 but not in ds1_without_dups ?
Rune
data ds1;
input
PartNumber $
Description $
InStock
@19 ReceivedDate date9.
Price;
format ReceivedDate date9.;
datalines;
K89R seal 34 27jul1998 245.00
M4J7 sander 98 20jun1998 45.88
LK43 filter 121 19may1999 10.99
MN21 brace 43 10aug1999 27.87
BC85 clamp 80 16aug1999 9.55
NCF3 valve 198 20mar1999 24.50
NCF3 valve 198 20mar1999 24.50
NCF3 valve 198 20mar1999 24.50
KJ66 cutter 6 18jun1999 19.77
UYN7 rod 211 09sep1999 11.55
UYN7 rod 211 09sep1999 11.55
JD03 switch 383 09jan2000 13.99
BV1E timer 26 03aug2000 34.50
;
proc sql;
create table ds1_dup as
select
PartNumber,
Description,
InStock,
ReceivedDate,
Price,
count(*)
from ds1
group by
PartNumber,
Description,
InStock,
ReceivedDate,
Price
having count(*) > 1
;
quit;
proc print;run;
7. How to pick records randomly from a data sets
8. FW: picking up duplicate records in a data set
From: Sigurd Hermansen
Sent: Thursday, April 27, 2006 2:14 PM
To: 'Rune Runnest; XXXX@XXXXX.COM
Subject: RE: picking up duplicate records in a data set
Rune:
SQL tells you that a distinct row PartNumber,Description,InStock,ReceivedDate,Price occurs n times. It does not keep track of the physical location of rows in a dataset. By design ....
If you store row order as data (as in the example below), SQL will select the rows that have duplicates of the values of interest: data ds1;
input
PartNumber $
Description $
InStock
@19 ReceivedDate date9.
Price;
format ReceivedDate date9.;
rowN=_N_;
datalines;
K89R seal 34 27jul1998 245.00
M4J7 sander 98 20jun1998 45.88
LK43 filter 121 19may1999 10.99
MN21 brace 43 10aug1999 27.87
BC85 clamp 80 16aug1999 9.55
NCF3 valve 198 20mar1999 24.50
NCF3 valve 198 20mar1999 24.50
NCF3 valve 198 20mar1999 24.50
KJ66 cutter 6 18jun1999 19.77
UYN7 rod 211 09sep1999 11.55
UYN7 rod 211 09sep1999 11.55
JD03 switch 383 09jan2000 13.99
BV1E timer 26 03aug2000 34.50
;
proc sql;
create table ds1_dup as
select
rowN,
PartNumber,
Description,
InStock,
ReceivedDate,
Price
from ds1
group by
PartNumber,
Description,
InStock,
ReceivedDate,
Price
having count(*)>> 1
;
quit;
Sig
-----Original Message-----
From: XXXX@XXXXX.COM [mailto: XXXX@XXXXX.COM ] On Behalf Of Rune Runnest
Sent: Thursday, April 27, 2006 3:16 AM
To: XXXX@XXXXX.COM
Subject: picking up duplicate records in a data set
Below is shown a way to identify duplicate records in a data set, using PROC SQL. One of the records in DS1 has one duplicate, and the other has two.
I want is to make another data set which contains the actual duplicates. Not only a record with their unique value. In the case below, the data set of duplicates should have 3 records, not 2.
I have looked in the online documentation for PROC SORT with the NODUPRECS option, but the out= data set contains the records *without* the duplicates.
Like this:
proc sort data = ds1 noduprecs out = ds1_without_dups ;
by
PartNumber
Description
InStock
ReceivedDate
Price;
run;
proc print; run;
Note that the query selects one set of three rows and one set of two rows. Sig
Is there a way then to make a third data set - ds1_the_dups - that could contain all records that are in ds1 but not in ds1_without_dups ?
Rune
data ds1;
input
PartNumber $
Description $
InStock
@19 ReceivedDate date9.
Price;
format ReceivedDate date9.;
datalines;
K89R seal 34 27jul1998 245.00
M4J7 sander 98 20jun1998 45.88
LK43 filter 121 19may1999 10.99
MN21 brace 43 10aug1999 27.87
BC85 clamp 80 16aug1999 9.55
NCF3 valve 198 20mar1999 24.50
NCF3 valve 198 20mar1999 24.50
NCF3 valve 198 20mar1999 24.50
KJ66 cutter 6 18jun1999 19.77
UYN7 rod 211 09sep1999 11.55
UYN7 rod 211 09sep1999 11.55
JD03 switch 383 09jan2000 13.99
BV1E timer 26 03aug2000 34.50
;
proc sql;
create table ds1_dup as
select
PartNumber,
Description,
InStock,
ReceivedDate,
Price,
count(*)
from ds1
group by
PartNumber,
Description,
InStock,
ReceivedDate,
Price
having count(*> > 1
;
quit;
proc print;run;