sas >> Which ID has multiple records?

by irinfigvam » Fri, 22 Feb 2008 02:07:09 GMT

I have two columns: ID and subID

I need to know which ID does not have multiple subIDs. How can I do it?

Id subid

400b2 400a1
400b2 400a2
400b2 400a3
400b1 400c1
400b4 400d4
400b4 400d5
400b0 400d2

ID= 400b2 has multiple subid
ID=400b1 has only subid
ID=400b0 has only subid


Thank you in advance

Irin



---------------------------------
Never miss a thing. Make Yahoo your homepage.


sas >> Which ID has multiple records?

by Bob_Abelson » Fri, 22 Feb 2008 02:24:51 GMT


Here's a PROC SQL solution:

proc sql;
select id
from a
group by id
having count(*)=1;
quit;

Bob Abelson
HGSI
240 314 4400 x1374
XXXX@XXXXX.COM



"Irin later" < XXXX@XXXXX.COM >
Sent by: "SAS(r) Discussion" < XXXX@XXXXX.COM >
02/21/2008 01:08 PM
Please respond to
"Irin later" < XXXX@XXXXX.COM >


To
XXXX@XXXXX.COM
cc

Subject
Which ID has multiple records?






I have two columns: ID and subID

I need to know which ID does not have multiple subIDs. How can I do it?

Id subid

400b2 400a1
400b2 400a2
400b2 400a3
400b1 400c1
400b4 400d4
400b4 400d5
400b0 400d2

ID= 400b2 has multiple subid
ID=400b1 has only subid
ID=400b0 has only subid


Thank you in advance

Irin



---------------------------------
Never miss a thing. Make Yahoo your homepage.



sas >> Which ID has multiple records?

by Nathaniel.Wooding » Fri, 22 Feb 2008 02:26:10 GMT

Irin

Here is one solution:

Data Irin;
input Id :$5. subid : $5. ;
cards;
400b2 400a1
400b2 400a2
400b2 400a3
400b1 400c1
400b4 400d4
400b4 400d5
400b0 400d2
run;

Data dups;
set irin;
by id notsorted ;* if there are repeats of id later in the set, you
should first sort
the set;
if first.id ;
if last.id =0;
run;



This is set up so you will see only the first instance of an id with
multiple subids.

Nat Wooding
Environmental Specialist III
Dominion, Environmental Biology
4111 Castlewood Rd
Richmond, VA 23234
Phone:804-271-5313, Fax: 804-271-2977



Irin later
<irinfigvam@YAHOO
.COM> To
Sent by: "SAS(r) XXXX@XXXXX.COM
Discussion" cc
< XXXX@XXXXX.COM
GA.EDU> Subject
Which ID has multiple records?

02/21/2008 01:07
PM


Please respond to
Irin later
<irinfigvam@YAHOO
.COM>






I have two columns: ID and subID

I need to know which ID does not have multiple subIDs. How can I do it?

Id subid

400b2 400a1
400b2 400a2
400b2 400a3
400b1 400c1
400b4 400d4
400b4 400d5
400b0 400d2

ID= 400b2 has multiple subid
ID=400b1 has only subid
ID=400b0 has only subid


Thank you in advance

Irin



---------------------------------
Never miss a thing. Make Yahoo your homepage.



-----------------------------------------
CONFIDENTIALITY NOTICE: This electronic message contains
information which may be legally confidential and/or privileged and
does not in any case represent a firm ENERGY COMMODITY bid or offer
relating thereto which binds the sender without an additional
express written confirmation to that effect. The information is
intended solely for the individual or entity named above and access
by anyone else is unauthorized. If you are not the intended
recipient, any disclosure, copying, distribution, or use of the
contents of this information is prohibited and may be unlawful. If
you have received this electronic transmission in error, please
reply immediately to the sender that you have received the message
in error, and delete it. Thank you.


Which ID has multiple records?

by JClark » Fri, 22 Feb 2008 02:32:17 GMT

Irin,

To only report on those with a single value for SUBID, you can use PROC
SQL to group by ID with a count of SUBID values, then limit to those
with 1.



data have;
infile cards missover;
input Id $ subid $;
cards;
400b2 400a1
400b2 400a2
400b2 400a3
400b1 400c1
400b4 400d4
400b4 400d5
400b0 400d2
;
run;


proc sql;
select id, count(subid) as subid_cnt
from have
group by id
having subid_cnt eq 1
;
quit;



Jack Clark
Research Analyst
Center for Health Program Development and Management
University of Maryland, Baltimore County


-----Original Message-----
From: SAS(r) Discussion [mailto: XXXX@XXXXX.COM ] On Behalf Of
Irin later
Sent: Thursday, February 21, 2008 1:07 PM
To: XXXX@XXXXX.COM
Subject: Which ID has multiple records?

I have two columns: ID and subID

I need to know which ID does not have multiple subIDs. How can I do
it?

Id subid

400b2 400a1
400b2 400a2
400b2 400a3
400b1 400c1
400b4 400d4
400b4 400d5
400b0 400d2

ID= 400b2 has multiple subid
ID=400b1 has only subid
ID=400b0 has only subid


Thank you in advance

Irin



---------------------------------
Never miss a thing. Make Yahoo your homepage.


Which ID has multiple records?

by sashole » Fri, 22 Feb 2008 02:32:31 GMT

Irin,

I am sure you are just overlooking the standard simple

data uniq ;
set all ;
by id ;
if first.id and last.id ;
run ;

Kind regards
------------
Paul Dorfman
Jax, FL
------------



-------------- Original message ----------------------
From: Irin later < XXXX@XXXXX.COM >


Which ID has multiple records?

by JClark » Fri, 22 Feb 2008 02:48:40 GMT

Paul,

Maybe we need Irin to clarify what he means by "multiple subIDs". I was
assuming he meant different VALUES of the SUBID variable.

The method you suggested would not necessarily work with that assumption
if there are observations with the same ID and SUBID values. The method
you suggested works well if "multiple subIDs" refers to an ID with more
than 1 observation - regardless of the value (or even presence) of
SUBID.


I have added an additional observation to Irin's data at the end
(ID=400b0, SUBID=400d2), which is identical to the prior observation.
This ID would not be output with the data step approach below, but would
be output in the SQL solution presented.


data have;
infile cards missover;
input Id $ subid $;
cards;
400b2 400a1
400b2 400a2
400b2 400a3
400b1 400c1
400b4 400d4
400b4 400d5
400b0 400d2
400b0 400d2
;
run;

proc sort data = have;
by id;
run;

data uniq ;
set have ;
by id ;
if first.id and last.id ;
run ;


So Irin, which is it?



Jack Clark
Research Analyst
Center for Health Program Development and Management
University of Maryland, Baltimore County
410-455-6256

-----Original Message-----
From: SAS(r) Discussion [mailto: XXXX@XXXXX.COM ] On Behalf Of
Paul Dorfman
Sent: Thursday, February 21, 2008 1:33 PM
To: XXXX@XXXXX.COM
Subject: Re: Which ID has multiple records?

Irin,

I am sure you are just overlooking the standard simple

data uniq ;
set all ;
by id ;
if first.id and last.id ;
run ;

Kind regards
------------
Paul Dorfman
Jax, FL
------------



-------------- Original message ----------------------
From: Irin later < XXXX@XXXXX.COM >
it?


Which ID has multiple records?

by Nathaniel.Wooding » Fri, 22 Feb 2008 02:51:12 GMT

rin

Looking at the other replies, I just realize that I read your note
backwards meaning that I gave you the opposite of what you wanted.
Hopefully my code will be useful some other time.

Nat

Nat Wooding
Environmental Specialist III
Dominion, Environmental Biology
4111 Castlewood Rd
Richmond, VA 23234
Phone:804-271-5313, Fax: 804-271-2977



Nathaniel
Wooding/Services/
6/Dom To
Irin later < XXXX@XXXXX.COM >
02/21/2008 01:26 cc
PM XXXX@XXXXX.COM
Subject
Re: Which ID has multiple records?
(Document link: Nathaniel Wooding)









Irin

Here is one solution:

Data Irin;
input Id :$5. subid : $5. ;
cards;
400b2 400a1
400b2 400a2
400b2 400a3
400b1 400c1
400b4 400d4
400b4 400d5
400b0 400d2
run;

Data dups;
set irin;
by id notsorted ;* if there are repeats of id later in the set, you
should first sort
the set;
if first.id ;
if last.id =0;
run;



This is set up so you will see only the first instance of an id with
multiple subids.

Nat Wooding
Environmental Specialist III
Dominion, Environmental Biology
4111 Castlewood Rd
Richmond, VA 23234
Phone:804-271-5313, Fax: 804-271-2977



Irin later
<irinfigvam@YAHOO
.COM> To
Sent by: "SAS(r) XXXX@XXXXX.COM
Discussion" cc
< XXXX@XXXXX.COM
GA.EDU> Subject
Which ID has multiple records?

02/21/2008 01:07
PM


Please respond to
Irin later
<irinfigvam@YAHOO
.COM>






I have two columns: ID and subID

I need to know which ID does not have multiple subIDs. How can I do it?

Id subid

400b2 400a1
400b2 400a2
400b2 400a3
400b1 400c1
400b4 400d4
400b4 400d5
400b0 400d2

ID= 400b2 has multiple subid
ID=400b1 has only subid
ID=400b0 has only subid


Thank you in advance

Irin



---------------------------------
Never miss a thing. Make Yahoo your homepage.




-----------------------------------------
CONFIDENTIALITY NOTICE: This electronic message contains
information which may be legally confidential and/or privileged and
does not in any case represent a firm ENERGY COMMODITY bid or offer
relating thereto which binds the sender without an additional
express written confirmation to that effect. The information is
intended solely for the individual or entity named above and access
by anyone else is unauthorized. If you are not the intended
recipient, any disclosure, copying, distribution, or use of the
contents of this information is prohibited and may be unlawful. If
you have received this electronic transmission in error, please
reply immediately

Which ID has multiple records?

by sashole » Fri, 22 Feb 2008 03:29:09 GMT

ack,

But of course! I assumed that the file is unduped by the composite ID*SUBID. If not, it can be unduped beforehand, and the DATA step will work. Either way, I am sure Irin has enough input from SAS-L to work with.

Your note, however, gains much more significance in situations when the file is sorted by ID only, we still need to identify ID-groups having 1 distinct SUBID, but it is highly undesirable or impossible to sort the file (say, the input is a production view of a huge RDBMS table ordered by ID, but not SUBID). In this case, the step can be recoded as

data uniq (drop = _:) ;
do count = 1 by 1 until (last.id) ;
set all ;
by id ;
if first.id then _subid = subid ;
else if subid not = _subid then _mult = 1 ;
end ;
if not _mult ;
run ;

and it will work regardless of the order of SUBID within each ID and/or the presence or absence of composite duplicates. COUNT will indicate how many such duplicates are in every ID group with 1 unique SUBID.

Kind regards
------------
Paul Dorfman
Jax, FL
------------

-------------- Original message ----------------------
From: "Jack Clark" < XXXX@XXXXX.COM >


Which ID has multiple records?

by rjf2 » Fri, 22 Feb 2008 04:10:22 GMT

basic data review:

Proc Freq data = Libref.MyData;
tables SubId
/ noprint
out = NotMultiples
(where = (Count eq 1));

Ron Fehd the macro maven CDC Atlanta GA USA RJF2 at cdc dot gov




Similar Threads

1. selecting id that have multiple records

2. how to merge the records when there are multiple transitions of id

Assuming we know there are multiple transition of id as in the table
"transition". How to merge the records to create a complete history?

There are large number of iteration in the origianl data, for example,
a id can go through 60+ transitions.
So I am looking for a solutoin that is more efficient than left joins.


data transition;
input id_prev id maxtermnumber d_prev_obs yymmdd10.;
format d_prev_obs date9.;
datalines;
1 2 1 2001-01-01
6 9 6 2006-01-01
2 3 2 2002-01-01
3 4 3 2003-01-01
4 5 4 2004-01-01
5 . 3 2005-01-01
9 . 5 2009-01-01
10 . 1 2010-01-01
;
run;

/*what we want to see*/
data history;
input id_first id totaltermnumber d_prev_obs yymmdd10.;
format d_prev_obs date9.;
datalines;
1 1 1 2001-01-01
1 2 3 2001-01-01
1 3 6 2001-01-01
1 4 10 2001-01-01
1 5 13 2001-01-01
6 6 6 2006-01-01
6 9 11 2006-01-01
10 10 1 2010-01-01
;
run;


thanks and regards

3. Convert multiple values of variable into multiple obs per id

4. Convert multiple values of variable into multiple obs per id -

Hello everbody,

I have a dataset that has a single observation for each id, and has 3
zones and sizes, looks like this:

Id     Zone1   Zone2  Zone3  Zone1_Size   Zone2_Size   Zone3_Size
1         1      0      1      10           0               15
2         1      1      0       5           10              0

I want to convert this data set to look like this:

Id   Zone   Size
1      1      10
1      3      15
2      1       5
2      2       10


Is there an elegant way to process this data?

Thanks for your help,
Shukla

5. Function to select one record in each of multiple subsets of records

6. Transforming multiple records to one record

Hi everyone,

I have a dataset with multiple records for same ID like this:

data sam1;
	input id a1 a2 a3 a4 a5 $ a6 $;
cards;
101 1 . . . * 
101 . 2 . . * 
101 . . 3 . * 
101 . . . 4 * 
102 1 . 3 . * .
102 . 2 . 4 . 
;

I would like to have my output dataset to look like this: one record
per id;

 id   a1  a2  a3  a4  a5  a6
===================
101  1   2    3    4    *    
102  1   2    3    4    *    

I tried to use both array and proc transpose to get the output, but
the variables are different data types (Character & Numeric), it gives
me an error.

Please input your suggestions. Thanks for your help.

Regards!
Nat

7. collapsing multiple records into a single record

8. reading delimited file with some records having two lines

Hi,
Is there anyone know how to use proc import to read in a delimited file
with some records having two lines?
Most of the records in the file just takes one line, but 3 or 4 of them
takes two lines and there is a '+' at the end of the first line to
separate it from the second line. What I'm doing now is manually removing
the '+' first and moving the second line to the end of the first line. Is
there a way for Proc import to recognize '+' sign and read in two lines as
one record?

Here is an shorted example of the raw data (the real data line is much
longer, having 50 variables)
-->first obs: (it takes only one line in raw data)
x|01|11|1901|2|x|.|2111|LOS ANGELES|CA|90007|(326) 391-2489|0

-->second obs:(it takes two lines in raw data, and using '+' sign to
separate the lines;
   -->first line:
   xxxx|01|01|1901|1|xxx|.|2555|LOS ANGELES|CA|90007|(326) 391-276+
   -->second line:
   3|0
Any help will be appreciated.