sas >> trend test

by 'rique » Thu, 02 Jun 2005 22:25:59 GMT

Hello,


I have race distribution (W,B,H) by year (2000-2003) and I want to test
whether there is any trending (increasing or decreasing) WITHIN each
race group. In other words, for Hispanics, is the population
significantly increasing or decreasing by year. I know there is a CHISQ
test for trend, but I don't know how to set it up. I know there is a
significant difference in race distribution BETWEEN groups, but that's
not what I want. Can anyone help me with this?


Thanks
Enrique


sas >> trend test

by dhssresearcher » Fri, 03 Jun 2005 09:58:23 GMT


First off, Hispanic is not a race. It's an ethnic group. You're going to muddy the data, as there are people who are Black and Hispanic, Black and non-Hispanic, and the same with White. Where do they go? And what are you losing by doing it this way?

Four years of data (2000-2003) is not enough for a real trend.

But, if you want to do it anyway, you're going to have to compare your data against something. Population, K-Mart shoppers, Pointy headed bosses, statisticians. Some standard you can gear your data against.





sas >> trend test

by flom » Fri, 03 Jun 2005 18:30:43 GMT

I missed the original post and only saw David Cassell's response, where
(as usual) he offered excellent advice. I add my thoughts at the end.



<<<<
I have race distribution (W,B,H) by year (2000-2003) and I want to test
whether there is any trending (increasing or decreasing) WITHIN each
race group. In other words, for Hispanics, is the population
significantly increasing or decreasing by year. I know there is a CHISQ
test for trend, but I don't know how to set it up. I know there is a
significant difference in race distribution BETWEEN groups, but that's
not what I want. Can anyone help me with this?

David replied

<<<<
I'm sure someone on SAS-L can help you. But I think you're going to
have to give us more information before we can help.
You say you have "race distribution (W,B,H) by year (2000-2003)". What
does that mean?

Do you have some percents in a 3x4 table? If that's what you have, then
you're not going to be able to perform a chi-squared test in PROC FREQ.
And if you only have a 3x4 table, then you're not going to have many
degrees of freedom for whatever test you want. Fewer degrees of freedom
makes it that much harder to establish a statistically valid effect.

When you plot your data, do you see trends for each separate race class?
If not, then this could be a problem in your analysis. Do they all go
in the same direction? If not, then you'll have interaction terms which
have to be dealt with.

So perhaps you could write back to SAS-L (not to me personally) and
explain more about your data and your goals. Then we can give you some
better advice.

I *THINK* that he has a 3x4 table. In each row, he has 4 %s (what these
are isn't clear - but maybe it's % of people who have or do something).
So, if he wants a test for trend WITHIN race, he could turn it into 3
2x4 tables, each with ONE race, and the rows being "yes" and "no" or
whatever, and then do a test for trend, or (my preference) a
Jonckheere-Terpstra test for each table.

Of course, I am operating somewhat blind here, given the lack of
information, and may be all wrong.
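
A minimal sketch of that setup in PROC FREQ, assuming the underlying counts (not just the percents) are available. The dataset name race_year and the variables race, year, outcome, and count are hypothetical; they do not come from the original post.

```sas
/* Hypothetical layout: one row per race x year x outcome, with a cell count. */
/* race: W/B/H; year: 2000-2003; outcome: 'yes'/'no'; count: number of people */
proc sort data=race_year;
    by race;
run;

proc freq data=race_year;
    by race;                      /* one 2x4 table per race */
    weight count;                 /* cell counts, not percents */
    tables outcome*year / trend   /* Cochran-Armitage test for trend */
                          jt;     /* Jonckheere-Terpstra test */
run;
```

The TREND option needs a table with exactly two rows or two columns, which is why each race gets its own 2x4 table here.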

Peter

Peter L. Flom, PhD
Assistant Director, Statistics and Data Analysis Core
Center for Drug Use and HIV Research
National Development and Research Institutes
71 W. 23rd St
www.peterflom.com
New York, NY 10010
(212) 845-4485 (voice)
(917) 438-0894 (fax)

sas >> Trend Test

by sudip.memphis » Thu, 28 Aug 2008 23:59:59 GMT


Dear All,

In my dataset I have the counts (number of people) infected with
disease A (only A). I have data for cities for the years 1980 -
2000. I want to do a simple trend test. My concern is: can I do
this test with counts?

My data looks like

City  year  People_infectd
A     1980  120
A     1981  122
A     1982  133
...
....
A     2000  500
....

E     1981  250
...
....
....

E     2000  700


Should I use GLIMMIX for this kind of analysis ?


Like :

Proc Glimmix data = mydata ;
   nloptions tech = nrridg ;
   class city ;
   model people_infectd = year / link = log dist = poisson solution ;
   random _residual_ / subject = city type = ar(1) ;
run ;

Need some feedback ???

Regards

sas >> Trend Test

by peterflomconsulting » Fri, 29 Aug 2008 00:57:39 GMT

sudip chatterjee < XXXX@XXXXX.COM > wrote



Hi Sudip

You should definitely use a mixed model, probably with random intercept, and maybe random slope as well. I happen to know that you have about 100 cities.

The question is whether the POISSON distribution is right. You probably want to look at the residuals from the model (using ODS GRAPHICS and the RESIDUAL option).

HTH

Peter

Peter L. Flom, PhD
Statistical Consultant
www DOT peterflom DOT com

sas >> Trend Test

by stringplayer_2 » Fri, 29 Aug 2008 01:26:58 GMT

-- On Thu, 8/28/08, sudip chatterjee < XXXX@XXXXX.COM > wrote:


Sudip,

Yes, the GLIMMIX procedure is appropriate for this analysis. But
I am not sure that your code is appropriate. Let me ask you a
couple of questions that might shed light on the appropriateness
of the code you present.

Question 1: Do the city populations increase over the 21 years of
data collection? If the populations increase over this time frame,
then wouldn't the counts increase just because of the population
increase? OK, that is two questions right there, but they are
really part and parcel of the same fundamental problem that interest
probably lies not in the absolute counts (which should increase over
time), but in the rate of occurrence. That is, the fundamental
question is probably not whether there is an increase in the number
of infections over that time frame, but rather whether the infection
rate has increased.

In order to address whether the infection rate has increased, you
need to include log(city population in year i) as an offset parameter
in your model. Actually, if the infections are expected in just a
particular population demographic, then you would ideally use the
population of that demographic in each city in each year. Do you
have or can you get such information?

Question 2: Apart from differences in population, are there
differences between cities in the infection rates? And if you
believe that the answer to that question is yes, then as a follow-up,
I would ask whether you are interested in trend in infection status
only in these 5 cities, or do these cities represent a larger
population of cities that you would want your inferences extended
to? (OK, now I am up to four questions, but who's counting except
me? Whoops, 5 questions!)

If there are (or could be) differences in infection rates across
cities, then you need to include city in your model. Assuming that
these 5 cities are not the only cities of interest and that you want
inference about trend to extend to other cities, then city should
enter the model as a random effect. If you are only interested in
the 5 cities that appear in your data, then city should enter the
model as a fixed effect.

Assuming that these cities are only representative of some larger
population of cities and that city population (for some target
demographic) is available in each year, then appropriate code
would be:

Proc Glimmix data=mydata ;
   nloptions tech=nrridg ;
   class city ;
   model people_infectd = year / offset=log_pop_cityXyear
                                 dist=poisson
                                 solution ;
   random intercept / subject=city ;
   random _residual_ / subject=city type=ar(1) ;
run ;


where log_pop_cityXyear=log(target_population) in a particular city
by year combination is computed in a data step prior to invoking
the GLIMMIX procedure.
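
A minimal sketch of that data step, assuming the population variable is named target_population and sits in the same dataset as the counts (both are assumptions, as is the output dataset name mydata_offset):

```sas
/* Assumed input: mydata with city, year, people_infectd, target_population */
data mydata_offset;
    set mydata;
    /* log() is undefined for zero or missing populations, so the offset */
    /* is left missing there and those rows drop out of the model fit    */
    if target_population > 0 then
        log_pop_cityXyear = log(target_population);
run;
```

With this sketch, the GLIMMIX call would read data=mydata_offset rather than data=mydata.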

Alternatively, since the infection counts are large and since the
Poisson converges to a normal distribution for large expectation,
you could compute a variable RATE = COUNT / (TARGET_POPULATION)
and then use RATE as the response variable. There would be no need
for the offset parameter and the distribution of the response would
be assumed normal. You can plot these rates by year for each city
and your audience will immediately see a trend in rates as well as
city to city differences in rates.
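
A rough sketch of this rate-based alternative, under the same assumptions as before (a population variable named target_population; the dataset name rates is illustrative). PROC MIXED is one way to fit the normal-theory version of the model; it is offered as an illustration, not as the only choice.

```sas
data rates;
    set mydata;
    if target_population > 0 then
        rate = people_infectd / target_population;
run;

/* AR(1) in the REPEATED statement relies on rows being in time order */
/* within each city, with no skipped years                            */
proc sort data=rates;
    by city year;
run;

/* One line per city: trend and city-to-city differences at a glance */
proc sgplot data=rates;
    series x=year y=rate / group=city;
run;

proc mixed data=rates;
    class city;
    model rate = year / solution;
    random intercept / subject=city;
    repeated / subject=city type=ar(1);
run;
```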

HTH,

Dale

-------------------------------------

sas >> Trend Test

by sudip.memphis » Fri, 29 Aug 2008 01:51:50 GMT

Dale, thank you.

Let me start " Bottom Up "

As you mentioned

" Alternatively, since the infection counts are large and since the
Poisson converges to a normal distribution for large expectation,
you could compute a variable RATE = COUNT / (TARGET_POPULATION)
and then use RATE as the response variable. There would be no need
for the offset parameter and the distribution of the response would
be assumed normal. You can plot these rates by year for each city
and your audience will immediately see a trend in rates as well as
city to city differences in rates."

I did this initially; I have RATE = (Count/city_pop) * 1000.

Now the problem with the counts is that in some years some cities either
did not report, or the case counts really were very low or very high.
The population of every city has increased, but only slightly, whereas
the disease counts in some cities have either increased fivefold or
decreased fivefold (the reasons for this are innumerable).

So I thought I should work with the reported counts rather than create a
Rate variable, which I think somehow does not capture the right process.
Hence I decided to go for this model.



" Apart from differences in population, are there
differences between cities in the infection rates? And if you
believe that the answer to that question is yes, then as a follow-up,
I would ask whether you are interested in trend in infection status
only in these 5 cities, or do these cities represent a larger
population of cities that you would want your inferences extended
to? If there are (or could be) differences in infection rates across
cities, then you need to include city in your model. Assuming that
these 5 cities are not the only cities of interest and that you want
inference about trend to extend to other cities, then city should
enter the model as a random effect. If you are only interested in
the 5 cities that appear in your data, then city should enter the
model as a fixed effect."

Yes, there are sometimes huge differences in both counts and rates
between cities, and sometimes within cities. I have more than 50 cities
in the dataset.


" Do the city populations increase over the 21 years of
data collection? If the populations increase over this time frame,
then wouldn't the counts increase just because of the population
increase? OK, that is two questions right there, but they are
really part and parcel of the same fundamental problem that interest
probably lies not in the absolute counts (which should increase over
time), but in the rate of occurrence. That is, the fundamental
question is probably not whether there is an increase in the number
of infections over that time frame, but rather whether the infection
rate has increased."

Yes, the city populations increased, but only modestly. The problem is
that the disease count in some years in some cities has increased or
decreased a lot.

My aim is to identify the cities which have had either an increase or a
decrease in disease count. The only information I have at the city level
is population.

I forgot to put the offset in my model, but you showed me a new way of
using the log of population for each city and year as the offset
(thank you!).

The problem is with the disease reporting, so I think that creating a
Rate variable won't capture the actual process going on, but if I do
the test on the actual counts, maybe it will pick up the actual
process.


sas >> Trend Test

by ajayohri » Fri, 29 Aug 2008 11:31:50 GMT

Trend test or time series (where disease is a function of time)?

Have you had a look at ARIMA, or better still ETS / HPF?

That will show trend or no trend.




From: sudip chatterjee < XXXX@XXXXX.COM >
Subject: Trend Test
To: XXXX@XXXXX.COM
Date: Thursday, August 28, 2008, 9:29 PM


sas >> Trend Test

by Warren.Schlechte » Fri, 29 Aug 2008 21:49:28 GMT

I'll respond, but not to the SAS side, as I think Dale is much more
qualified than I am.

You likely know this, but just a reminder: you also need to concern
yourself with reporting and testing rates. Disease incidence could be
the same, while increases/decreases in the counts reflect changes in
reporting and testing. I'm not quite sure how you could address it,
except for some ad hoc ideas: look for covariates that might indicate
changes in these rates, then adjust for those covariates.

Warren Schlechte

-----Original Message-----
From: sudip chatterjee [mailto: XXXX@XXXXX.COM ]
Sent: Thursday, August 28, 2008 12:52 PM
Subject: Re: Trend Test


sas >> Trend Test

by sudip.memphis » Fri, 29 Aug 2008 22:12:01 GMT

Warren,

Yeah, that's what I was wondering. Looking at the rate alone carries a
lot of confounding information. Latent variable modeling is one way I
know to approach it, but because the software is unavailable I am trying
to build a model based on counts only. I am struggling to understand the
role of the OFFSET in a Poisson model.

I understand what it means and which variables to put in it, but I have
doubts about the mechanism by which it works in the model.

I would appreciate it if somebody could explain that to me; then maybe I
will be more confident that what I am trying to do is worth doing!

Regards

On Fri, Aug 29, 2008 at 9:49 AM, Warren Schlechte
< XXXX@XXXXX.COM > wrote:

sas >> Trend Test

by Warren.Schlechte » Fri, 29 Aug 2008 22:29:29 GMT

The offset works just like the rate conversion does.

Consider a Poisson process with a rate of 1 occurrence per 1000 events.
If you observe 1000 events, you should, on average, see 1 occurrence. If
instead you observe 10,000 events, you should see 10 occurrences. If
you just modeled the occurrences, without accounting for the changing
number of events, you could infer something that is merely an artifact
of the unequal numbers of observations. In the model, log(events) is
used as the offset to account for this. The log is used because the
Poisson model uses a log link within the generalized linear modeling
framework.

HTH

Warren Schlechte


-----Original Message-----
From: sudip chatterjee [mailto: XXXX@XXXXX.COM ]
Sent: Friday, August 29, 2008 9:12 AM
To: Warren Schlechte
Cc: XXXX@XXXXX.COM
Subject: Re: Trend Test


sas >> Trend Test

by citam.sasl » Fri, 29 Aug 2008 22:53:50 GMT

On Fri, 29 Aug 2008 09:29:29 -0500, Warren Schlechte



Dale has clarified this and I have beat the question concerning the
logistic version into the ground.

Consider the generalized linear model formulation (McCullagh and Nelder):

We relate the mean rate to the linear combination of covariates using a
log link function:

log(lambda) = eta = Xb

For varying time spans, we also have

E(Y) = t * lambda

Noting that exp(log(u)) = u for u > 0, we then have

E(Y) = t * exp(log(lambda))

log[E(Y)] = log(t) + log(lambda)
          = log(t) + Xb


lambda is a rate or density. For instance, crashes per week at an
intersection or bacteria per cubic centimeter, respectively. If our rate
is 1 crash per week and we observe the intersection for a month, how many
crashes do you expect to observe?

We need to adjust for the number of units, if they vary. If not, we call
the unit = 1 and note that log(1) = 0....
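
To make the arithmetic concrete, here is a small sketch in SAS. The rate of 1 crash per week is from the example above; the observation windows of 1, 4, and 52 weeks and the dataset name offset_demo are illustrative assumptions.

```sas
data offset_demo;
    lambda = 1;                     /* rate: crashes per week */
    do t = 1, 4, 52;                /* weeks observed (illustrative) */
        offset = log(t);            /* enters the linear predictor with coefficient 1 */
        expected = exp(offset + log(lambda));   /* E(Y) = t * lambda */
        output;
    end;
run;

proc print data=offset_demo noobs;
run;
```

Taking a month as roughly four weeks, the expected count is 4 crashes. The offset log(t) just carries the length of observation into the log-scale model, with its coefficient fixed at 1 and not estimated.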

HTH,

Kevin
