sas >> proc format value assignment oddity - missing values

by pchoate » Mon, 31 Oct 2005 02:47:27 GMT

Question for proc freq wonks -

Generally, missing character values in SAS can be represented by either
'' (no blanks), ' ' (single blank), or '     ' (many blanks). In
comparisons there usually aren't any functional differences between
them. The way I think about it is that a character variable must be at
least length 1 and trailing blanks are ignored, so all blank
values are treated as ' ', and SAS internally equates '' with ' '.
Any comparison of two missing character values therefore equates
regardless of length.

I was looking at the distribution of missing values in some data and so
coded the following format and freq:

proc format;
value $MISS '' = 'Missing'
other = 'Not Msg';
proc freq ;
tables clientid*age / missing;
format clientid age $miss.;
run;

Although the Age column had missing values they weren't shown - the
blanks equated to 'Not Msg', and so I found that in Proc Freq '' does
not equate to ' '.

I tried:
proc format;
value $MISS ' ' = 'Missing' /* note the blank */
other = 'Not Msg';
proc freq ;
tables clientid*age / missing;
format clientid age $miss.;
run;

I got the expected results. I also tried more blanks than the variables
are long, and they matched as expected:

proc format; /* age is $5 client is $9 */
value $MISS '          ' = 'Missing'
other = 'Not Msg';
proc freq ;
tables clientid*age / missing;
format clientid age $miss.;
run;


On the left side of a proc format value statement assignment, does ''
have any meaning? Is it '00'x rather than '20'x?

Thanks to all that reply.


sas >> proc format value assignment oddity - missing values

by HERMANS1 » Mon, 31 Oct 2005 07:38:06 GMT


Paul:
An 'empty' character value ('') does not appear to have any meaning in
PROC FORMAT (MSW SAS, V9.13). This program demonstrates that:
proc format;
   value $miss ' '   = 'missing'
               '00'x = 'null '
               other = 'notMiss'
   ;
run;
quit;

data test;
   length x $ 10;
   x='';
   output;
   x=' ';
   output;
   x='00'x;
   output;
   x='z';
   output;
run;

proc freq data=test;
   tables x/missing;
   format x $miss. ;
run;
quit;

It does not seem to matter how many blanks one uses to make a missing
character value in PROC FORMAT. Any number match to '', ' ', etc.

I haven't checked the latest documentation of PROC FREQ, but, as far as
I have tested, PROC FORMAT recognizes none of the usual tags for missing
values (MISSING, NULL, ...). I'd prefer to see a value that matches to
all standard missing values now that '' and ' ' evaluate differently in
other sides of SAS, and access to DBMS' data often involves references
to NULL values.

Sig


sas >> proc format value assignment oddity - missing values

by pchoate » Tue, 01 Nov 2005 02:00:31 GMT

Thanks Ron - this seems very sensible (after it's been explained, that
is).

So in this context '' is seen by Proc Format as ASCII '27'x and "" is
seen as '22'x. It seems that Mike's response gets at the reason: open
text is allowed to the left of the assignment statement, and '' and ""
are defined as you say in open text.

Sig's comment re standards in this era of DBMS connectivity is well
taken!
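
A quick way to check this conclusion without a full dataset is to apply the format with a PUT statement. This is only a minimal sketch: the variable name age and its $5 length are borrowed from the original post, and the format uses a one-blank range rather than ''.

```sas
/* Sketch only: map blank character values to 'Missing' using ' ' (one blank), */
/* since '' on the left side was read by PROC FORMAT as a quote character.     */
proc format;
   value $miss ' '   = 'Missing'
               other = 'Not Msg';
run;

data _null_;
   length age $5;      /* assumed length, as in the original post */
   age = '';           /* stored as five blanks */
   put age= $miss.;    /* expected to write age=Missing */
   age = '42';
   put age= $miss.;    /* expected to write age=Not Msg */
run;
```

Running PROC FORMAT with the FMTLIB option, as Ron does below, is another way to see exactly which start values were stored.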

Paul Choate
DDS Data Extraction
(916) 654-2160

-----Original Message-----
From: SAS(r) Discussion [mailto: XXXX@XXXXX.COM ] On Behalf Of
Fehd, Ronald J
Sent: Monday, October 31, 2005 5:30 AM
To: XXXX@XXXXX.COM
Subject: Re: proc format value assignment oddity - missing values


<honk!> sorry, that value is not null, but sQuote

PROC Format;
value $test '' = 'sQuote'
"" = 'dQuote'
' ' = 'blank';*
'  '= 'two blanks';
PROC Format FmtLib;
run;

FORMAT NAME: $TEST   LENGTH: 6   NUMBER OF VALUES: 3
MIN LENGTH: 1   MAX LENGTH: 40   DEFAULT LENGTH 6   FUZZ: 0

START  | END    | LABEL (VER. V7|V8 31OCT2005:08:23:14)
-------+--------+--------------------------------------
       |        | blank
"      | "      | dQuote
'      | '      | sQuote

this is one of those gotchas that we find in every artificial language,
\begin{compiler head-scratcher}
Is this, indeed, a special character,
or is it used as an escape character,
meaning I won't know until I read the next character?
\end{compiler head-scratcher}

in this case single and double quotes
are escape characters when immediately followed
by their own selves, again.
redundant? well, not when you have figured how to be explicit:

Who can guess what the output of the following will be?
PROC Format;
value $test '" = 'sQuote'
"' = 'dQuote'
' ' = 'blank';
PROC Format FmtLib;
run;

Ron Fehd the occasionally dQuoted
macro maven CDC Atlanta GA USA RJF2 at cdc dot gov

remember perspective: the error is not always where it seems to occur!
-- RJF2
... nor does the special character actually mean what you typed it to
be.
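
For comparison, the doubled-quote convention Ron describes can be seen in ordinary DATA step string literals. The sketch below is illustrative only (the variable names s1 to s3 are made up) and does not reproduce the PROC FORMAT behaviour discussed in this thread, where a bare '' on the left side was stored as a quote character.

```sas
/* Sketch only: a quote doubled inside a literal of the same quote type */
/* stands for one literal quote character.                              */
data _null_;
   s1 = 'don''t';       /* value is don't */
   s2 = "say ""hi""";   /* value is say "hi" */
   s3 = "don't";        /* alternative: enclose in the other quote type */
   put s1= s2= s3=;
run;
```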



sas >> proc format value assignment oddity - missing values

by iw1junk » Tue, 01 Nov 2005 06:39:28 GMT

Paul,

I see you have some good answers. I think several years ago Peter Lund
gave a SUGI paper on formats in which he discussed the problem. Over the
years I have developed the following conventions for using quote marks in
SAS.

1) Never use single quotes unless you have to, i.e. to hide macro
triggers. (65% doesn't matter, 30% either matters or will when
modified, 5% must be single quotes and they will stick out as
important.)

2) Never use "" to mean " " - usually they mean the same in SAS, but
not always. (options formdlim=""; and inside double quotes are some
other cases.)

The reason that it did not matter whether you used one blank or multiple
blanks is that SAS character variables have a fixed length and are
blank-padded when a value is short. Without the convention that trailing
blanks compare equal, the TRIM function would be needed far more often in
SAS code.
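
The padding and comparison rules described above can be checked in a short DATA step. This is a minimal sketch with made-up variable names (a, b, len_a, lenn_a), not code from the thread.

```sas
/* Sketch only: blank character values compare equal regardless of length. */
data _null_;
   length a $1 b $5;
   a = '';      /* stored as one blank: minimum length of a character variable is 1 */
   b = ' ';     /* padded to five blanks */
   if a = b then put 'a and b compare equal';
   if b = '          ' then put 'b equals a ten-blank literal';
   if missing(a) and missing(b) then put 'both are missing';
   len_a  = length(a);    /* LENGTH returns 1 for an all-blank value */
   lenn_a = lengthn(a);   /* LENGTHN returns 0 for an all-blank value */
   put len_a= lenn_a=;
run;
```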

Ian Whitlock

Similar Threads

1. Odd Results with Proc Summary Missing Value assignment

2. PROC COMPARE compare missing value with nonmissing numeric value

Hi,

I am comparing two datasets with proc compare. In the report I exported
from SAS, mismatched character values were marked with X, while
mismatched numeric values were reported as a difference. When comparing
two numeric values (one missing and the other nonmissing), the report
gave no indication (i.e. no X and no difference in DIF). Is there any
way to output the difference when comparing a missing value with a
nonmissing value? Thanks a lot.



Lin

3. format different types of missing values (for Proc Report)

4. replacing missing values with the values of the previous observation

Hi all,
I need to fill missing values with the values of the previous
observation within the same group.
The code below provides the answer. However, I have about 10 variables
to fill, so I wonder if there is a more
efficient solution.
Thanks,
Josip

data xy;
input x x1 x2 y $;
datalines;
1 2  3 A
2 3  4 A
3 .  . A
4 5  6 A
5 .  . B
. .  . B
7 8  9 B
8 9 10 B
;
proc sort data=xy;
by y;
run;
data xy1;
set xy;
array v(3) x x1 x2;
retain _x _x1 _x2;
by y;
if first.y then do;
_x=x; _x1=x1; _x2=x2;
end; else do;
if missing(x) or missing(x1) or missing(x2) then do;
x=_x; x1=_x1; x2=_x2;
end; else do;
_x=x; _x1=x1; _x2=x2;
end;
end;
drop _x _x1 _x2;
run;
proc print data=xy1;
run;
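
One possible way to scale this to more variables is to loop over an array instead of naming each helper variable. The sketch below assumes the dataset xy from the post, already sorted by y; the names xy_filled and prev are made up for illustration. Note that it fills each variable independently, which differs from the code above, where all three variables are replaced as soon as any one of them is missing.

```sas
/* Sketch only: carry the last nonmissing value forward within each BY group. */
data xy_filled;
   set xy;                       /* assumes xy is sorted by y */
   by y;
   array v(*)    x x1 x2;        /* variables to fill; extend this list as needed */
   array prev(3) _temporary_;    /* last nonmissing value per variable; retained automatically */
   do i = 1 to dim(v);
      if first.y then prev(i) = .;            /* do not carry values across groups */
      if missing(v(i)) then v(i) = prev(i);   /* fill from the previous observation */
      else prev(i) = v(i);                    /* remember the latest nonmissing value */
   end;
   drop i;
run;
```

When more variables are added to the first array, the dimension of prev has to be raised to match.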

5. How to replace the Missing Value with the correct Value by

6. Checking all missing values with the value of a particular

Not sure if I understand your question correctly. Anyway, here is mine:

data xx;
input e a b c;
cards;
0 . 1 2
0 . 2 .
0 1 . 3
1 . 1 2
1 . 2 .
1 . . 3
1 1 1 2
1 . 2 .
1 . . 3
2 . 1 2
2 . 2 .
2 . . 3
2 . 1 2
2 . 2 .
;

proc sql;
select
 count(case when e=2 then e else . end) as e2cnt,
 case when nmiss(a) = calculated e2cnt then 'Yes' else 'No' end as a,
 case when nmiss(b) = calculated e2cnt then 'Yes' else 'No' end as b,
 case when nmiss(c) = calculated e2cnt then 'Yes' else 'No' end as c
from xx
;
quit;


   e2cnt  a    b    c
-----------------------
       5  No   No   Yes


So there are 5 obs with e=2. A has 12 missing, so A=No.
B has 4 missing and also gets a No. C has 5 missing, which
equals the number of obs with e=2, therefore c=Yes.


Ya

On Wed, 24 May 2006 10:03:51 -0700, divinedst < XXXX@XXXXX.COM > wrote:

>I have a dataset of about 500 variables, and I would like to verify whether
>the number of missing values for each separate variable equals the
>value within a particular variable. Example:   Variable A has 1000
>missing values, Variable B has 200 missing values, Variable C has 2500
>missing values;
>Variable E has value ranges
>0,1,2 and the frequency for Variable E looks like:
>0    500
>1    300
>2    600
>I am trying to verify that variable A missing values (1000) equal to
>where variable E=2 (600) and do this for every variable in the dataset.
>In this case, the two are not equal and this is what I need to know for
>every variable in the dataset.
>
>Let me know if this isn't clear.

7. format different types of missing values

8. removing ?formatted missing values

Hi,

I'm working with a national survey that has the missing values set as .L and
.M- (I'm not sure whether this is some kind of a format as these show up as
. missing in frequencies etc.)- is there a general statement that I can use
to remove these from the 100+ variables in the dataset- such as format _all_
for real formats?

The reason I need to do this is that I'm running an imputation using IVEWARE
(SAS callable procedure)- and for some reason- this program doesn't like the
.L and .M's

ps: Any users of IVEWARE on this list- willing to humor a few questions?

Thank you,
Mah-J

M. Soobader, PhD
STATWORKS
www.statworks.com
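
Values such as .L and .M are SAS special missing values rather than formats. One possible approach, sketched below under the assumption that the distinction between them is not needed for the imputation, is to reset every missing numeric value to the ordinary missing value. The dataset names survey and survey_clean are hypothetical.

```sas
/* Sketch only: replace special missing values (.A-.Z, ._) with plain . */
data survey_clean;
   set survey;                   /* hypothetical input dataset name */
   array nums(*) _numeric_;      /* every numeric variable read from the input */
   do i = 1 to dim(nums);
      /* MISSING() is true for ordinary and special missing values alike */
      if missing(nums(i)) then nums(i) = .;
   end;
   drop i;                       /* i is created after the array, so it is not part of it */
run;
```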