Hello,

I have the race distribution (W, B, H) by year (2000-2003), and I want to test whether there is any trend (increasing or decreasing) WITHIN each race group. In other words, for Hispanics, is the population significantly increasing or decreasing by year? I know there is a chi-square (CHISQ) test for trend, but I don't know how to set it up. I also know there is a significant difference in race distribution BETWEEN groups, but that's not what I want. Can anyone help me with this?

Thanks

Enrique

First off, Hispanic is not a race. It's an ethnic group. You're going to muddy the data, because there are people who are Black and Hispanic, Black and non-Hispanic, and likewise for White. Where do they go? And what are you losing by doing it this way?

Three years of data is not enough for a real trend.

But if you want to do it anyway, you're going to have to compare your data against something: the population, K-Mart shoppers, pointy-headed bosses, statisticians. Some standard you can gauge your data against.

I missed the original post, and only saw David Cassell's response, where (as usual) he offered excellent advice. I add my thoughts at the end.

<<<<

I have race distribution (W,B,H) by year (2000-2003) and I want to test

whether there is any trending (increasing or decreasing) WITHIN each

race group. In other words, for Hispanics, is the population

significantly increasing or decreasing by year. I know there is a CHISQ

test for trend, but I don't know how to set it up. I know there is a

significant difference in race distribution BETWEEN groups, but that's

not what I want. Can anyone help me with this?

David replied

<<<<

I'm sure someone on SAS-L can help you. But I think you're going to

have to give us more information before we can help.

You say you have "race distribution (W,B,H) by year (2000-2003)". What

does that mean?

Do you have some percents in a 3x4 table? If that's what you have, then

you're not going to be able to perform a chi-squared test in PROC FREQ.

And if you only have a 3x4 table, then you're not going to have many

degrees of freedom for whatever test you want. Fewer degrees of freedom

makes it that much harder to establish a statistically valid effect.

When you plot your data, do you see trends for each separate race class?

If not, then this could be a problem in your analysis. Do they all go

in the same direction? If not, then you'll have interaction terms which

have to be dealt with.

So perhaps you could write back to SAS-L (not to me personally) and

explain more about your data and your goals. Then we can give you some

better advice.

I *THINK* that he has a 3x4 table. In each row, he has 4 %s (what these are isn't clear, but maybe it's the % of people who have or do something). So, if he wants a test for trend WITHIN race, he could turn it into 3 2x4 tables, each with ONE race and with the rows being "yes" and "no" or whatever (he will need the underlying counts, not just the percents), and then do a test for trend or (my preference) a Jonckheere-Terpstra test for each table.

Of course, I am operating somewhat blind here, given the lack of

information, and may be all wrong.
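As a rough sketch of what that could look like in PROC FREQ, assuming (and this is only a guess about the data) that for one race group there are raw counts of a yes/no outcome in each year: the TREND option requests the Cochran-Armitage test for trend, and JT requests the Jonckheere-Terpstra test. The data set name `hisp`, the variable names, and the counts below are all made up for illustration.

```sas
/* Hypothetical counts for ONE race group: outcome (yes/no) by year. */
/* All names and numbers are invented for illustration only.        */
data hisp;
   input year outcome $ count;
   datalines;
2000 yes 120
2000 no  880
2001 yes 135
2001 no  865
2002 yes 150
2002 no  850
2003 yes 170
2003 no  830
;
run;

proc freq data=hisp;
   weight count;                      /* each row is a cell count, not one person */
   tables outcome*year / trend jt;    /* 2x4 table: trend test and JT test */
run;
```

The same step would be repeated for each of the other race groups, or run once with a BY statement if the data carry a race variable and are sorted by it.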

Peter

Peter L. Flom, PhD

Assistant Director, Statistics and Data Analysis Core

Center for Drug Use and HIV Research

National Development and Research Institutes

71 W. 23rd St

www.peterflom.com

New York, NY 10010

(212) 845-4485 (voice)

(917) 438-0894 (fax)

Dear All,

In my dataset I have counts of the number of people infected with disease A (only A), by city, for the years 1980-2000. I want to do a simple trend test. My concern is: can I do this test with counts?

My data looks like

City year People_infectd

A 1980 120

A 1981 122

A 1982 133

...

....

A 2000 500

....

E 1981 250

...

....

....

E 2000 700

Should I use GLIMMIX for this kind of analysis? Something like:

proc glimmix data = mydata ;
   nloptions tech = nrridg ;
   class city ;
   model people_infectd = year / link = log dist = poisson solution ;
   random _residual_ / subject = city type = ar(1) ;
run ;

I would appreciate some feedback.

Regards

sudip chatterjee < XXXX@XXXXX.COM > wrote

Hi Sudip

You should definitely use a mixed model, probably with random intercept, and maybe random slope as well. I happen to know that you have about 100 cities.

The question is whether the POISSON distribution is right. You probably want to look at the residuals from the model (using ODS GRAPHICS and the RESIDUAL option).
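A minimal sketch of such a model, under some assumptions: `year_c` is a hypothetical centered copy of `year` (centering at 1990 is an arbitrary choice to keep the intercept and slope on a sensible scale), and PLOTS=RESIDUALPANEL is one way to get residual diagnostics through ODS Graphics. This is an illustration of a random-intercept, random-slope Poisson model, not a tested analysis of these data.

```sas
/* year_c is a hypothetical variable: year centered at 1990. */
data mydata_c;
   set mydata;
   year_c = year - 1990;
run;

ods graphics on;

proc glimmix data=mydata_c plots=residualpanel;
   class city;
   model people_infectd = year_c / dist=poisson link=log solution;
   /* random intercept and random slope for each city, unstructured covariance */
   random intercept year_c / subject=city type=un;
run;

ods graphics off;
```

If the residual panel shows much more spread than a Poisson model allows, that would be a hint to consider a different distribution or an overdispersion term.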

HTH

Peter

Peter L. Flom, PhD

Statistical Consultant

www DOT peterflom DOT com

-- On Thu, 8/28/08, sudip chatterjee < XXXX@XXXXX.COM > wrote:

Sudip,

Yes, the GLIMMIX procedure is appropriate for this analysis. But

I am not sure that your code is appropriate. Let me ask you a

couple of questions that might shed light on the appropriateness

of the code you present.

Question 1: Do the city populations increase over the 21 years of

data collection? If the populations increase over this time frame,

then wouldn't the counts increase just because of the population

increase? OK, that is two questions right there, but they are

really part and parcel of the same fundamental problem that interest

probably lies not in the absolute counts (which should increase over

time), but in the rate of occurrence. That is, the fundamental

question is probably not whether there is an increase in the number

of infections over that time frame, but rather whether the infection

rate has increased.

In order to address whether the infection rate has increased, you

need to include log(city population in year i) as an offset parameter

in your model. Actually, if the infections are expected in just a

particular population demographic, then you would ideally use the

population of that demographic in each city in each year. Do you

have or can you get such information?

Question 2: Apart from differences in population, are there

differences between cities in the infection rates? And if you

believe that the answer to that question is yes, then as a follow-up,

I would ask whether you are interested in trend in infection status

only in these 5 cities, or do these cities represent a larger

population of cities that you would want your inferences extended

to? (OK, now I am up to four questions, but who's counting except

me? Whoops, 5 questions!)

If there are (or could be) differences in infection rates across

cities, then you need to include city in your model. Assuming that

these 5 cities are not the only cities of interest and that you want

inference about trend to extend to other cities, then city should

enter the model as a random effect. If you are only interested in

the 5 cities that appear in your data, then city should enter the

model as a fixed effect.

Assuming that these cities are only representative of some larger

population of cities and that city population (for some target

demographic) is available in each year, then appropriate code

would be:

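proc glimmix data=mydata;
   nloptions tech=nrridg;
   class city;
   /* Poisson model for the counts; the offset turns it into a model for the rate */
   model people_infectd = year / offset=log_pop_cityXyear
                                 dist=poisson
                                 solution;
   /* random city-level intercept */
   random intercept / subject=city;
   /* AR(1) correlation among the residuals within a city */
   random _residual_ / subject=city type=ar(1);
run;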

where log_pop_cityXyear=log(target_population) in a particular city

by year combination is computed in a data step prior to invoking

the GLIMMIX procedure.
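The data step that creates the offset could be as small as the sketch below. `target_population` is a hypothetical name for the variable holding the population of the target demographic in each city and year; it must be positive for the log to be defined.

```sas
/* target_population is a hypothetical variable name for the        */
/* city-by-year population; it must be > 0 for log() to be valid.   */
data mydata;
   set mydata;
   log_pop_cityXyear = log(target_population);
run;
```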

Alternatively, since the infection counts are large and since the

Poisson converges to a normal distribution for large expectation,

you could compute a variable RATE = COUNT / (TARGET_POPULATION)

and then use RATE as the response variable. There would be no need

for the offset parameter and the distribution of the response would

be assumed normal. You can plot these rates by year for each city

and your audience will immediately see a trend in rates as well as

city to city differences in rates.
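For the plot of rates by year, one possible sketch uses PROC SGPLOT (available in SAS 9.2 and later). The data set name `rates` and the variable `target_population` are assumptions made for illustration.

```sas
/* target_population is a hypothetical variable name. */
data rates;
   set mydata;
   rate = people_infectd / target_population;
run;

/* SERIES connects points in data order, so sort within city by year. */
proc sort data=rates;
   by city year;
run;

proc sgplot data=rates;
   series x=year y=rate / group=city;   /* one line per city */
run;
```

With many cities the plot gets crowded, so it may be easier to read when drawn for a subset of cities at a time.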

HTH,

Dale

-------------------------------------

Dale, thank you.

Let me start " Bottom Up "

As you mentioned

" Alternatively, since the infection counts are large and since the

Poisson converges to a normal distribution for large expectation,

you could compute a variable RATE = COUNT / (TARGET_POPULATION)

and then use RATE as the response variable. There would be no need

for the offset parameter and the distribution of the response would

be assumed normal. You can plot these rates by year for each city

and your audience will immediately see a trend in rates as well as

city to city differences in rates."

I did this initially: I have RATE = (Count/city_pop) * 1000.

The problem with the counts is that in some years some cities either did not report, or the case counts really were very low or very high. The population has increased only slightly in all cities, whereas the disease counts in some cities have either increased fivefold or decreased fivefold (the reasons for this are innumerable).

So I thought I should work with the counts as actually reported rather than create a rate variable, which I think somehow does not capture the right process. Hence I decided to go for this model.

" Apart from differences in population, are there

differences between cities in the infection rates? And if you

believe that the answer to that question is yes, then as a follow-up,

I would ask whether you are interested in trend in infection status

only in these 5 cities, or do these cities represent a larger

population of cities that you would want your inferences extended

to? If there are (or could be) differences in infection rates across

cities, then you need to include city in your model. Assuming that

these 5 cities are not the only cities of interest and that you want

inference about trend to extend to other cities, then city should

enter the model as a random effect. If you are only interested in

the 5 cities that appear in your data, then city should enter the

model as a fixed effect."

Yes, there are sometimes huge differences in both counts and rates between cities, and sometimes within cities. I have more than 50 cities in the dataset.

" Do the city populations increase over the 21 years of

data collection? If the populations increase over this time frame,

then wouldn't the counts increase just because of the population

increase? OK, that is two questions right there, but they are

really part and parcel of the same fundamental problem that interest

probably lies not in the absolute counts (which should increase over

time), but in the rate of occurrence. That is, the fundamental

question is probably not whether there is an increase in the number

of infections over that time frame, but rather whether the infection

rate has increased."

Yes, the city populations increased, but only modestly. The problem is that the disease count in some years in some cities has increased or decreased a lot.

My aim is to identify the cities that have had either an increase or a decrease in disease count. The only information I have at the city level is population.

I forgot to put the offset in my model, but you showed me a new way: using the log of the population for each city-by-year combination as the offset (thank you!).

The problem is with the disease reporting, so I think that creating a rate variable won't capture the actual process going on, but a test on the actual counts may reveal it.

Trend test or time series... (where disease is a function of time?)

Have you had a look at ARIMA, or better still the ETS/HPF?

That will show trend or no trend.

From: sudip chatterjee < XXXX@XXXXX.COM >

Subject: Trend Test

To: XXXX@XXXXX.COM

Date: Thursday, August 28, 2008, 9:29 PM

Dear All,

In my dataset I have the counts or number of people infected with

disease A (only A). I have information of cities from year 1980 -

2000. What I want is to do a simple trend test : My concern is that

can I do this test with counts ?

My data looks like

City year People_infectd

A 1980 120

A 1981 122

A 1982 133

...

....

A 2000 500

....

E 1981 250

...

....

....

E 2000 700

Should I use GLIMMIX for this kind of analysis ?

Like :

Proc Glimmix data = mydata ;

nloptions tech = nrridg ;

class city ;

model people_infectd = year / link = log dist = poisson solution ;

random _residual_ / subject = city type = ar(1);

run ;

Need some feedback ???

Regards

I'll respond, but not to the SAS side, as I think Dale is much more qualified than I am.

You likely know this, but just a reminder: you also need to concern yourself with reporting and testing rates. Disease incidence could be the same, while the increases/decreases reflect changes in reporting and testing. I'm not quite sure how you could address it, except for some ad hoc ideas: look for covariates that might indicate changes in these rates, then adjust for those covariates.

Warren Schlechte

-----Original Message-----

From: sudip chatterjee [mailto: XXXX@XXXXX.COM ]

Sent: Thursday, August 28, 2008 12:52 PM

Subject: Re: Trend Test

Dale thank you.

Let me start " Bottom Up "

As you mentioned

" Alternatively, since the infection counts are large and since the

Poisson converges to a normal distribution for large expectation,

you could compute a variable RATE = COUNT / (TARGET_POPULATION)

and then use RATE as the response variable. There would be no need

for the offset parameter and the distribution of the response would

be assumed normal. You can plot these rates by year for each city

and your audience will immediately see a trend in rates as well as

city to city differences in rates."

I did this initially, I have RATE = (Count/city_pop) *1000

Now the problem with counts are that in some year in some city either

thay have not reported or may be they are in reality very low/high

cases . The city population in all cities has increased but slightly

whereas the disease counts in some cities either increased 5 times or

have decreased 5 times ( the reason for this are innumerable )

So I thought let me check with the real report rather than creating a

Rate variable which I think somehow not capturing the right process

hence I decided to go for this model

" Apart from differences in population, are there

differences between cities in the infection rates? And if you

believe that the answer to that question is yes, then as a follow-up,

I would ask whether you are interested in trend in infection status

only in these 5 cities, or do these cities represent a larger

population of cities that you would want your inferences extended

to? If there are (or could be) differences in infection rates across

cities, then you need to include city in your model. Assuming that

these 5 cities are not the only cities of interest and that you want

inference about trend to extend to other cities, then city should

enter the model as a random effect. If you are only interested in

the 5 cities that appear in your data, then city should enter the

model as a fixed effect."

Yes there are sometime huge differences in both counts and rates

between cities and sometime within cities. I have more than 50 cities

in the dataset.

" Do the city populations increase over the 21 years of

data collection? If the populations increase over this time frame,

then wouldn't the counts increase just because of the population

increase? OK, that is two questions right there, but they are

really part and parcel of the same fundamental problem that interest

probably lies not in the absolute counts (which should increase over

time), but in the rate of occurrence. That is, the fundamental

question is probably not whether there is an increase in the number

of infections over that time frame, but rather whether the infection

rate has increased

Warren,

Yeah, that's what I was wondering. Looking at the rate alone carries a lot of confounding information. Latent variable modeling is one approach I know of, but because the software is unavailable to me I am trying to build a model based on counts only. I am struggling to understand the role of the OFFSET in a Poisson model.

I understand what it means and which variables to put in it, but I am unsure about the mechanism by which it works in the model.

I would appreciate it if somebody could explain that to me; then I may be more confident that what I am trying to do is worth doing!

Regards

On Fri, Aug 29, 2008 at 9:49 AM, Warren Schlechte

< XXXX@XXXXX.COM > wrote:

The offset works just like the rate conversion does.

Consider a Poisson process with a rate of 1 occurrence per 1000 events. If you observe 1000 events, you should, on average, see 1 occurrence. If instead you observe 10,000 events, you should see 10 occurrences. If you just modeled the occurrences, without accounting for the changing number of events, you could infer something that is merely an artifact of the unequal numbers of observations. In the model, log(events), i.e. the log of the number of events observed (the exposure), is used as the offset to account for this. The log is used because the Poisson model uses a log link within the generalized linear modeling framework.

HTH

Warren Schlechte
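Warren's example can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the variable names and the two-period layout are hypothetical and are not taken from the thread. It shows that the counts differ tenfold while the rate is unchanged, and that subtracting log(events) on the log scale leaves the log rate.

```python
import math

# Hypothetical data matching the example: same underlying rate, unequal exposure.
events = [1000, 10000]   # number of events observed in each period
occurrences = [1, 10]    # occurrences counted in each period

# Counts alone suggest a tenfold increase ...
count_ratio = occurrences[1] / occurrences[0]

# ... but the rate (occurrences per event) has not changed.
rates = [y / n for y, n in zip(occurrences, events)]
rate_ratio = rates[1] / rates[0]

# On the log scale used by the Poisson model, subtracting the offset
# log(events) from log(occurrences) leaves the log rate.
log_rates = [math.log(y) - math.log(n) for y, n in zip(occurrences, events)]

print("count ratio:", count_ratio)
print("rate ratio:", rate_ratio)
print("log rates:", log_rates)
```

The point of the sketch is only that the offset moves the comparison from counts to rates; it does not fit a model.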

-----Original Message-----

From: sudip chatterjee [mailto: XXXX@XXXXX.COM ]

Sent: Friday, August 29, 2008 9:12 AM

To: Warren Schlechte

Cc: XXXX@XXXXX.COM

Subject: Re: Trend Test



On Fri, 29 Aug 2008 09:29:29 -0500, Warren Schlechte

Dale has clarified this and I have beaten the question concerning the

logistic version into the ground.

Consider the generalized linear model formulation (McCullagh and Nelder):

We relate the mean rate to the linear combination of covariates using a

log link function:

log(lambda) = eta = Xb

For varying time spans, we also have

E(Y) = t * lambda

Noting that exp(log(u)) = u for u > 0, we then have

E(Y) = t * exp(log(lambda))

log[E(Y)] = log(t) + log(lambda)

          = log(t) + Xb

lambda is a rate or density. For instance, crashes per week at an

intersection or bacteria per cubic centimeter, respectively. If our rate

is 1 crash per week and we observe the intersection for a month, how many

crashes do you expect to observe?

We need to adjust for the number of units, if they vary. If not, we call

the unit = 1 and note that log(1) = 0....

HTH,

Kevin
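The derivation above, log[E(Y)] = log(t) + Xb, can be checked numerically. What follows is a minimal sketch and not anyone's production code: the function name fit_poisson_trend, the made-up data, and the choice of a hand-written Newton-Raphson fit (instead of a statistics package) are all assumptions for illustration. The data have a constant rate of 1 per 1000 while the exposure t doubles each year, so the model with the offset should find no trend, and the counts-only model should find a spurious one.

```python
import math

def fit_poisson_trend(x, y, offset, max_iter=50, tol=1e-12):
    """Fit log E(Y) = offset + b0 + b1*x by Newton-Raphson (Poisson likelihood)."""
    # Start from a constant rate: b0 = log(total count / total exposure), b1 = 0.
    b0 = math.log(sum(y) / sum(math.exp(o) for o in offset))
    b1 = 0.0
    for _ in range(max_iter):
        mu = [math.exp(o + b0 + b1 * xi) for xi, o in zip(x, offset)]
        # Score vector: X'(y - mu)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix: X' diag(mu) X
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * d = score
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical data: exposure doubles each year, rate stays at 1 per 1000.
years = [0, 1, 2, 3]
exposure = [1000, 2000, 4000, 8000]
counts = [1, 2, 4, 8]

# With the offset log(t): the slope estimates the trend in the rate.
b0_off, b1_off = fit_poisson_trend(years, counts, [math.log(t) for t in exposure])

# Without an offset (log(1) = 0 for every row): the slope tracks the counts.
b0_raw, b1_raw = fit_poisson_trend(years, counts, [0.0] * len(years))

# Kevin's question, assuming a month of 4 weeks: E(Y) = t * lambda.
expected_crashes = 4 * 1.0

print("with offset:    rate =", math.exp(b0_off), " slope =", b1_off)
print("without offset: slope =", b1_raw)
print("expected crashes in a 4-week month:", expected_crashes)
```

With the offset, the fitted slope is essentially zero and exp(b0) recovers the rate; without it, the slope is close to log(2), which reflects only the doubling of the exposure.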

