sas >> How the IF statement works - How to trim 1st and last data

by Mterjeson » Sat, 25 Jun 2005 00:45:37 GMT

re: How the IF statement works
re: Take The Fear Out (motto when I teach classes)

PS: If you, or any other readers are new to
programming, or vague on how the IF statement
is processed, here is how the line:
if first.plot or last.plot then delete;
works.

It could have been written as:
if first.plot eq 1 or last.plot eq 1 then delete;

When you added the debugging to the log:
data result;
set sample;
by COMP STAND PLOT TRANS;
put _all_; * for visual debugging ;
if first.plot or last.plot then delete;
run;

you saw that the added flags are displayed as if
they are variables just like your COMP STAND PLOT
TRANS and DIA. They are variables but they are
internal variables in SAS, meaning you can use
them in the datastep just like variables, but they
do not get saved into your resulting dataset. This
is similar to other internal variables such as _N_
and others. If someone wants these values in the
resulting output dataset you merely have to assign
the internal variables into a new variable that you
create and then your variable can be saved into the
dataset.
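
A minimal sketch of that last point, assuming the sorted
SAMPLE dataset from the earlier message quoted further down.
The names flagged, first_plot, last_plot and iter_num are made
up here for illustration only:

```sas
data flagged;
  set sample;
  by COMP STAND PLOT TRANS;
  /* first.plot, last.plot and _n_ are dropped automatically, */
  /* so copy them into ordinary variables to keep them.       */
  first_plot = first.plot;
  last_plot  = last.plot;
  iter_num   = _n_;   /* datastep iteration counter */
run;
```

The new variables land in FLAGGED alongside COMP, STAND,
PLOT, TRANS and DIA.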

So, if first.plot eq 1 or last.plot eq 1 then delete;
can be read by most as IF (first.plot = 1) or
(last.plot = 1) then do something. Most folks
are already familiar with "condition evaluating"
expressions in the IF statement. However, it is
surprising to find that the majority of programmers
(sad to say) have been using IF THEN statements for
years, know the condition tests work, and get along
fine, yet do not know *how* they work. For those
that do, great! :o) Most folks know that
(first.plot = 1) tests to see if the variable equals
1, and that if true the THEN portion of the code runs
and if false the THEN portion does not run. What most
don't know is that each comparison is evaluated to an
intermediate internal value of zero for false and
non-zero for true. In SAS a comparison returns 1 for
true and 0 for false, and a missing value also counts
as false. (some programming languages typically use 1
or -1 for true. some languages years ago even utilized
ones-complement and twos-complement for their
distinguishing values. but nearly all use zero as false.)

SAS uses 1 and 0 for first. and last. flags. So just
like we all did back in 8th grade algebra, the computer
languages resolve the multiple expressions the same way.
For example, if first.plot is a value of 1 and last.plot
is a value of 0 then the line:
if (first.plot eq 1) or (last.plot eq 1) then delete;
can be resolved/rewritten/viewed as:
if (1 eq 1) or (0 eq 1) then delete;
The computer, just as you did in 8th grade,
will resolve these pairs down as well.
(1 eq 1) is true so the intermediate result for true is (1)
(0 eq 1) is false so the intermediate result for false is (0)
thus,
if (1 eq 1) or (0 eq 1) then delete;
becomes
if ( 1 ) or ( 0 ) then delete;

The hierarchy rules will resolve all of the comparisons
such as eq,=,gt,>,ge,>=,lt,<,le,<=,ne,^=, etc. first,
left-to-right (just like algebra, adding sets of
parentheses will override the ordering to your liking).
Note that in a datastep expression <> is the MAX
operator, not "not equal", so use ne or ^= instead.
Then, after the comparison operators, the 'and's are
resolved, and after those the 'or's. Thus,
if ( 1 ) or ( 0 ) then delete;
becomes
if ( 1 ) then delete;
which now means
if true then delete;

Just like in algebra, many pairs and combinations of
comparisons and 'and's and 'or's can be cascaded together,
and just resolve logically down to the final single value
result.
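
You can watch those intermediate values yourself. This is
only a sketch: first_plot and last_plot are ordinary
stand-in variables (not the real first. and last. flags),
and a, b and c are made-up names.

```sas
data _null_;
  first_plot = 1;          /* stand-in for first.plot */
  last_plot  = 0;          /* stand-in for last.plot  */
  a = (first_plot eq 1);   /* comparison is true,  so a is 1 */
  b = (last_plot  eq 1);   /* comparison is false, so b is 0 */
  c = a or b;              /* 1 or 0 resolves to 1 */
  put a= b= c=;            /* writes a=1 b=0 c=1 to the log */
run;
```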

So basic 1 and 0 values in any of your variables can be
treated/written the short way syntactically.

In addition to just the boring binary values of 1 and 0,
there is more you can do with the simple governing rule
of the IF statement.

remembering zero is false...
(and more importantly)
remembering non-zero is true...

If you had a variable abc and you wanted to consider
values of 1,2,3,4,5,6,7,... as true for your condition
and a value of 0 would be false for your desired condition,
then:
if abc > 0 then
if abc then
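behave the same for those values due to the basic rule that
"IF evaluations of zero are false and non-zero are true".
(They do differ for negative values: -1 is non-zero, so
if abc is true while if abc > 0 is false. A missing
value is false either way.)
e.g.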
data _null_;
x = 0; if x > 0 then put x= 'is true.'; else put x= 'is false.';
x = 1; if x > 0 then put x= 'is true.'; else put x= 'is false.';
x = 2; if x > 0 then put x= 'is true.'; else put x= 'is false.';
x = 3; if x > 0 then put x= 'is true.'; else put x= 'is false.';
x = 4; if x > 0 then put x= 'is true.'; else put x= 'is false.';
x = 5; if x > 0 then put x= 'is true.'; else put x= 'is false.';
run;

gives exactly the same results as

data _null_;
x = 0; if x then put x= 'is true.'; else put x= 'is false.';
x = 1; if x then put x= 'is true.'; else put x= 'is false.';
x = 2; if x then put x= 'is true.'; else put x= 'is false.';
x = 3; if x then put x= 'is true.'; else put x= 'is false.';
x = 4; if x then put x= 'is true.'; else put x= 'is false.';
x = 5; if x then put x= 'is true.'; else put x= 'is false.';
run;
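
One caution, shown as a small sketch in the same style: the
two forms only agree while x is zero or positive. A negative
value is non-zero, so the short form calls it true, and a
missing value is false in both forms.

```sas
data _null_;
  x = -2; if x     then put x= 'is true.'; else put x= 'is false.'; /* non-zero, so true       */
  x = -2; if x > 0 then put x= 'is true.'; else put x= 'is false.'; /* not positive, so false  */
  x = .;  if x     then put x= 'is true.'; else put x= 'is false.'; /* missing, so false       */
  x = .;  if x > 0 then put x= 'is true.'; else put x= 'is false.'; /* missing sorts below 0   */
run;
```

So if negative values can occur, pick the form that says
what you actually mean.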


Isn't it amazing that the simpler the rules, the more you can do!

As you learn and understand the rules built into language
statements, you will find that each statement or function has
fairly basic and simple rules, though it may have one or many of
these simple separate aspects to how it is going to process (i.e.
its behavior). Once comfortable with the one or few rules of each
statement/function you will begin to utilize them in many more
scenarios and be totally confident in all resulting behaviors.
This is one of those things that separates the best-of-the-best,
or those that know lots of different ways to tackle a problem.
It is merely expanding slightly past the 'ruts' that many get
into, which keep them from being comfortable or knowledgeable
enough to write or tweak code and *know* exactly what the
resulting behavior will be, every time! Most good programmers are
only slightly short of total comfort with each statement or
function. Those few of us that have actually written programming
languages had intimate knowledge of each discrete rule written
into each statement or function, so we can pass along that
programming languages and all these 200+ statements or functions
are not some big black cloud to live under for years. These
things really are black-n-white and are finite. Just like the
programs you write, if one step has three tests and three
distinct behaviors, then so too do the languages themselves.
Remember, all programming languages are also written with
...programming languages --- ...makes ya ponder, huh?... :o)



The long story short is that:
if first.plot eq 1 then
if first.plot then
are exactly the same and the eq 1 is okay but
just a little redundant. So actually it boils down
to personal preference. One point I might suggest
is that typically I would just write the shorter
IF true THEN version, such as if first.plot then
However, good documentation and also self-documenting-code
are still a must, and occasionally I will write out the
long version such as if first.plot eq 1 then if it
makes the *intent* of the desired logic or flow more readable
to the reader. If you have novice or intermediate programmers
that will be maintaining the code, it sometimes is prudent
to write out the long version to ensure against
misinterpretation. So there are always additional aspects
to good programming beyond good skills and logic
coding, such as aesthetics, readability for all intended and
sometimes unintended readers, and reducing maintenance
difficulties for you and, many times, not-you in the future.
The day may come when those that come after you may not be
the expert you are (gasp). Like they say out on the highway,
Give 'em a break. :o)




Hope this is helpful.


Mark Terjeson
Senior Programmer Analyst, IM&R
Russell Investment Group


-----Original Message-----
From: SAS(r) Discussion [mailto: XXXX@XXXXX.COM ] On Behalf Of
Terjeson, Mark (IM&R)
Sent: Friday, June 24, 2005 8:11 AM
To: XXXX@XXXXX.COM
Subject: Re: How to trim 1st and last data in a series?


Hi Prab,

FIRST. and LAST. processing in the datastep
is very handy. It requires that you have
the dataset sorted and that in your datastep
you add a BY statement after the SET statement.
The BY with the SET automatically creates
first. and last. flags for each variable in
the BY statement. Very Handy!
Add a put _all_; to the datastep to
get a visual of all these flags created. By
the way, these flags get set with 0 or 1 for
false or true.




data sample;
input COMP STAND PLOT $ TRANS DIA;
cards;
23 10 1A 1 0
23 10 1A 2 9
23 10 1A 3 0
23 10 1A 4 0
23 10 1A 5 8.5
23 10 1A 6 0
23 10 1A 7 9
23 10 1A 8 0
23 10 1A 9 0
23 10 1A 10 0
23 10 1A 11 0
23 10 1A 12 0
23 10 2A 1 28.5
23 10 2A 2 0
23 10 2A 3 0
23 10 2A 4 0
23 10 2A 5 0
23 10 2A 6 0
23 10 2A 7 0
23 10 2A 8 13
23 10 2A 8 24.5
23 10 3A 1 0
;
run;

proc sort data=sample;
by COMP STAND PLOT TRANS;
run;

data result;
set sample;
by COMP STAND PLOT TRANS;
if first.plot or last.plot then delete;
run;
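
One thing to watch: first.plot and last.plot each flag a
single observation, so in Plot 2A only the final TRANS 8 row
is dropped by the step above, while the question asks for both
TRANS 8 rows to go. A possible alternative is sketched below.
It is not the method from the reply above; it assumes the first
and last TRANS in a plot are simply the smallest and largest
TRANS values there, and result2 is a made-up table name.

```sas
proc sql;
  create table result2 as
  select *
  from sample
  group by COMP, STAND, PLOT
  /* keep rows whose TRANS is neither the lowest nor the highest in the plot */
  having TRANS ne min(TRANS) and TRANS ne max(TRANS)
  order by COMP, STAND, PLOT, TRANS;
quit;
```

SAS will write a note to the log about remerging summary
statistics with the original data; that is expected here.
A plot with only one TRANS value drops out entirely.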





Hope this is helpful.


Mark Terjeson
Senior Programmer Analyst, IM&R
Russell Investment Group


-----Original Message-----
From: SAS(r) Discussion [mailto: XXXX@XXXXX.COM ] On Behalf Of
tantrik
Sent: Friday, June 24, 2005 7:47 AM
To: XXXX@XXXXX.COM
Subject: How to trim 1st and last data in a series?


HI
I have been too addicted to point and click but realized life's not that
easy. I have a dataset like this:

COMP STAND PLOT TRANS DIA
23 10 1A 1 0
23 10 1A 2 9
23 10 1A 3 0
23 10 1A 4 0
23 10 1A 5 8.5
23 10 1A 6 0
23 10 1A 7 9
23 10 1A 8 0
23 10 1A 9 0
23 10 1A 10 0
23 10 1A 11 0
23 10 1A 12 0
23 10 2A 1 28.5
23 10 2A 2 0
23 10 2A 3 0
23 10 2A 4 0
23 10 2A 5 0
23 10 2A 6 0
23 10 2A 7 0
23 10 2A 8 13
23 10 2A 8 24.5
23 10 3A 1 0
so on................................

I want to get rid of the first and the last TRANS in a PLOT because it
is outside the plot boundary. so for Plot 1A, Trans 1 and trans 12 would
be out, for Plot 2A, TRANS 1 and both of the Trans 8 would be out.. How
would I do it? As I mentioned, I hardly know how to spell programming.
Nevertheless, i'm a quick learner and dont mind working loooong hours to
learn including nights and weekends. I appreciate your valuable
suggestions. HAVE A GREAT WEEKEND.

Prab Dahal
Program Technician
University of Arkansas-Monticello


Similar Threads

1. How to trim 1st and last data in a series?


2. Detecting the last observation in a data statement

3. Problem that Data Step Statements doesn't work after a Macro

Eason,

Since nobody has responded, I presume that everyone is thinking that you
simply ought to ask for a better dataset.  I agree.

However, if your data has NO missing values, and ALWAYS follows the
pattern shown in your example, then the following just might work.

The code is definitely NOT guaranteed and, without question, is far from
optimal.  Hopefully, though, it will give you an idea of how to solve your
problem:

data _null_;
  file 'c:\have.txt';
  input;
  put @1 _infile_;
  cards;
1|2|3|4 4
 4
|5|6 6
2|2|3|4 4 4|5|6
 6
3|2|3|4 4 4|5|6 6
;

data _null_;
  array hold(6) $;
  array lengths(6) (1 1 1 5 1 3);
  infile 'c:\have.txt' missover;
  file 'c:\have_modified.txt';
  i=0;
  j=0;
  do until (i eq 6);
    i+1;
    j+1;
    if scan(_infile_,1,'|') eq ''
     or scan(_infile_,j,'|') eq '' then do;
      input;
      j=1;
    end;
    hold(i)=catt(hold(i),scan(_infile_,j,'|'));
    x=length(hold(i));
    if length(hold(i)) lt lengths(i) then do;
      i=i-1;
      j=0;
      _infile_='';
    end;
  end;
  do i=1 to 6;
    put hold(i) @;
    call missing(hold(i));
    if i lt 6 then put '|' @;
    else put;
  end;
  _infile_='';
  i=0;
  j=0;
run;

HTH,
Art
--------
On Mon, 22 Jun 2009 21:11:48 -0700, Eason Chu < XXXX@XXXXX.COM > wrote:

>Hi, all SAS-Ls
>
>I have made a Macro to manipulate the _infile_ variable from the input
>buffer. It is intended to read in raw data that is separated by a
>specific delimiter, where a record may be broken across lines due to
>line feed or carriage return characters contained in a field value.
>Data sample as below:
>
>                 1|2|3|4 4
>                  4
>                 |5|6 6
>                 2|2|3|4 4 4|5|6
>                  6
>                 3|2|3|4 4 4|5|6 6
>
>It should appear like below in table,
>
>                 1|2|3|4 4(LF or CR) 4(LF or CR)|5|6 6
>                 2|2|3|4 4 4|5|6(LF or CR) 6
>                 3|2|3|4 4 4|5|6 6
>
>but the LF or CR cause the raw data broken into lines.
>The Macro I made below is to solve this situation.
>
>                %Macro BLRDR(dlm,dlm_n,span_cut);
>                       format tmp_infile_line $32767.;
>                       informat tmp_infile_line $32767.;
>                       retain tmp_infile_line;
>                       input @;
>                       do while (&dlm_n - count(trimn
>(tmp_infile_line),"&dlm") >= &span_cut.);
>                                tmp_infile_line = trimn
>(tmp_infile_line)||_infile_;
>                                input;
>                                input @;
>                                if &dlm_n - count(trimn
>(tmp_infile_line)||_infile_,"&dlm") < &span_cut. then do;
>                                           input @@;
>                                           _infile_ = tmp_infile_line;
>                                           tmp_infile_line = "";
>                                end;
>                       end;
>                       *drop tmp_infile_line;
>                 %mend;
>
>&dlm: specify the delimiter;
>&dlm_n: indicate the delimiter number in one complete record
>&span_cut: set a broken line without &dlm (like " 6" between line4 and
>line6 in the data sample) as partial value of last field of last
>record or first field of next record. 0 as last field of last record;1
>as belonging to first field of next record. For example, when
>&span_cut = 0 then line4-6 records will be read as
>                 2|2|3|4 4 4|5|6 6
>                 3|2|3|4 4 4|5|6 6
>when &span_cut = 1 then line4-6 records will be read as
>                 2|2|3|4 4 4|5|6
>                  63|2|3|4 4 4|5|6 6
>
>If it worked as what I expect, many of my broken raw data would be
>read in correctly. However the result run out does not appear like
>that.
>I put this Macro into raw data reading in code.
>
>                Data rst.test;
>                      infile "C:\My SAS\SD.txt" dlm="|" dsd;
>                      length n m l o p q $10.;
>                      %BLRDR(|,5,1);
>                      input n m l o p q;
>                      put tmp_infile_line= n= m= l= o= p= q=;
>                Run;
>
>Logs as below after run,
>
>61         Data rst.test;
>62              infile "C:\My SAS\SD.txt" dlm="|" dsd;
>63              length n m l o p q $10.;
>64              %BLRDR(|,5,1);
>MLOGIC(BLRDR):  Beginning execution.
>MLOGIC(BLRDR):  Parameter DLM has value |
>MLOGIC(BLRDR):  Parameter DLM_N has value 5
>MLOGIC(BLRDR):  Parameter SPAN_CUT has value 1
>MPRINT(BLRDR):   format tmp_infile_line $32767.;
>MPRINT(BLRDR):   informat tmp_infile_line $32767.;
>MPRINT(BLRDR):   retain tmp_infile_line;
>MPRINT(BLRDR):   input @;
>SYMBOLGEN:  Macro variable DLM_N resolves to 5
>SYMBOLGEN:  Macro variable DLM resolves to |
>SYMBOLGEN:  Macro variable SPAN_CUT resolves to 1
>MPRINT(BLRDR):   do while (5 - count(trimn(tmp_infile_line),"|") >=
>1);
>MPRINT(BLRDR):   tmp_infile_line = trimn(tmp_infile_line)||_infile_;
>MPRINT(BLRDR):   input;
>MPRINT(BLRDR):   input @;
>SYMBOLGEN:  Macro variable DLM_N resolves to 5
>SYMBOLGEN:  Macro variable DLM resolves to |
>SYMBOLGEN:  Macro variable SPAN_CUT resolves to 1
>MPRINT(BLRDR):   if 5 - count(trimn(tmp_infile_line)||_infile_,"|") <
>1 then do;
>MPRINT(BLRDR):   input @@;
>MPRINT(BLRDR):   _infile_ = tmp_infile_line;
>MPRINT(BLRDR):   tmp_infile_line = "";
>MPRINT(BLRDR):   end;
>MPRINT(BLRDR):   end;
>MPRINT(BLRDR):   *drop tmp_infile_line;
>MLOGIC(BLRDR):  Ending execution.
>66              input n m l o p q;
>69              put _infile_ tmp_infile_line= n= m= l= o= p= q=;
>70         Run;
>
>NOTE: The infile "C:\My SAS\SD.txt" is:
>      File Name=C:\My SAS\SD.txt,
>      RECFM=V,LRECL=256
>
>NOTE: 6 records were read from the infile "C:\My SAS\SD.txt".
>      The minimum record length was 2.
>      The maximum record length was 17.
>NOTE: The data set RST.TEST has 0 observations and 7 variables.
>
>
>NO obs was read in! At first I guessed that _infile_ turned missing just
>before the input statement. So I inserted some put statements into this
>Macro and data step. Code inserted as below,
>
>                %Macro BLRDR(dlm,dlm_n,span_cut);
>                       format tmp_infile_line $32767.;
>                       informat tmp_infile_line $32767.;
>                       retain tmp_infile_line;
>                       input @;
>                       put _infile_;
>                       do while (&dlm_n - count(trimn
>(tmp_infile_line),"&dlm") >= &span_cut.);
>                                tmp_infile_line = trimn
>(tmp_infile_line)||_infile_;
>                                input;
>                                input @;
>                                put _infile_;
>                                if &dlm_n - count(trimn
>(tmp_infile_line)||_infile_,"&dlm") < &span_cut. then do;
>                                           input @@;
>                                           put _infile_;
>                                           _infile_ = tmp_infile_line;
>                                           put _infile_;
>                                           tmp_infile_line = "";
>                                end;
>                       end;
>                       *drop tmp_infile_line;
>                 %mend;
>
>                Data rst.test;
>                      infile "C:\My SAS\SD.txt" dlm="|" dsd;
>                      length n m l o p q $10.;
>                      %BLRDR(|,5,1);
>                      put "######";
>                      input n m l o p q;
>                      put "######";
>                      put _infile_;
>                      put _infile_ tmp_infile_line= n= m= l= o= p= q=;
>                Run;
>
>And here the logs,
>
>61         Data rst.test;
>62              infile "C:\My SAS\SD.txt" dlm="|" dsd;
>63              length n m l o p q $10.;
>64              %BLRDR(|,5,1);
>MLOGIC(BLRDR):  Beginning execution.
>MLOGIC(BLRDR):  Parameter DLM has value |
>MLOGIC(BLRDR):  Parameter DLM_N has value 5
>MLOGIC(BLRDR):  Parameter SPAN_CUT has value 1
>MPRINT(BLRDR):   format tmp_infile_line $32767.;
>MPRINT(BLRDR):   informat tmp_infile_line $32767.;
>MPRINT(BLRDR):   retain tmp_infile_line;
>MPRINT(BLRDR):   input @;
>MPRINT(BLRDR):   put _infile_;
>SYMBOLGEN:  Macro variable DLM_N resolves to 5
>SYMBOLGEN:  Macro variable DLM resolves to |
>SYMBOLGEN:  Macro variable SPAN_CUT resolves to 1
>MPRINT(BLRDR):   do while (5 - count(trimn(tmp_infile_line),"|") >=
>1);
>MPRINT(BLRDR):   tmp_infile_line = trimn(tmp_infile_line)||_infile_;
>MPRINT(BLRDR):   input;
>MPRINT(BLRDR):   input @;
>MPRINT(BLRDR):   put _infile_;
>SYMBOLGEN:  Macro variable DLM_N resolves to 5
>SYMBOLGEN:  Macro variable DLM resolves to |
>SYMBOLGEN:  Macro variable SPAN_CUT resolves to 1
>MPRINT(BLRDR):   if 5 - count(trimn(tmp_infile_line)||_infile_,"|") <
>1 then do;
>MPRINT(BLRDR):   input @@;
>MPRINT(BLRDR):   put _infile_;
>MPRINT(BLRDR):   _infile_ = tmp_infile_line;
>MPRINT(BLRDR):   put _infile_;
>MPRINT(BLRDR):   tmp_infile_line = "";
>MPRINT(BLRDR):   end;
>MPRINT(BLRDR):   end;
>MPRINT(BLRDR):   *drop tmp_infile_line;
>MLOGIC(BLRDR):  Ending execution.
>65              put "######";
>66              input n m l o p q;
>67              put "######";
>68              put _infile_;
>69              put _infile_ tmp_infile_line= n= m= l= o= p= q=;
>70         Run;
>
>NOTE: The infile "C:\My SAS\SD.txt" is:
>      File Name=C:\My SAS\SD.txt,
>      RECFM=V,LRECL=256
>
>1|2|3|4 4
> 4
>|5|6 6
>|5|6 6
>1|2|3|4 4
>4
>
>
>2|2|3|4 4 4|5|6
>2|2|3|4 4 4|5|6
>1|2|3|4 4
>4
>
>
> 6
>3|2|3|4 4 4|5|6 6
>3|2|3|4 4 4|5|6 6
>1|2|3|4 4 4
>6
>NOTE: 6 records were read from the infile "C:\My SAS\SD.txt".
>      The minimum record length was 2.
>      The maximum record length was 17.
>NOTE: The data set RST.TEST has 0 observations and 7 variables.
>
>
>
>We can see that nothing is put into the log after the Macro executes,
>so it seems the statements after the Macro don't run, and there is no
>error msg here. It confused me a lot. Does anyone know why?

4. Problem that Data Step Statements doesn't work after a Macro execution

5. 1st CFP: International Workshop on Feature Selection for Data Mining

6. How to remove the 1st space of the category data

7. Trim data by x characters

I have a dataset that holds a variable with a prefix code prior to the
description.

I want to trim the variable by the number of characters the prefix
code has but the prefix code can be 4, 5, 6, 7 or 8 characters long.

Is there a way of trimming the data using a procedure?


Thanks

8. libref for "work" data sets (one level name): WORK vs USER vs User= option