comp.soft-sys.sas - The SAS statistics package.
Hello everyone,

I was wondering if there is an easy way to get inverse predicted values from a mixed model; that is, the values of the independent variable (X) that correspond to given predicted values of the dependent variable (Y).

Thanks,
Tony
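For a model that is linear in X, one reading of the question is to solve the fitted fixed-effects line for X at a target Y. The sketch below is only an illustration under assumptions that are not in the post: a hypothetical data set `mydata` with response `y`, one continuous predictor `x`, a grouping variable `subj` with a random intercept, and a made-up target value `y0 = 10`. It returns a point estimate only; confidence limits for an inverse prediction would need extra work (for example Fieller's method or the delta method).

```sas
/* Hypothetical names: mydata, y, x, subj and the target y0 are assumptions */
proc mixed data=mydata;
   class subj;
   model y = x / solution;
   random intercept / subject=subj;
   ods output SolutionF=fixed;      /* fixed-effects estimates as a data set */
run;

data inverse;
   retain b0 b1;
   set fixed end=last;
   if upcase(Effect) = 'INTERCEPT' then b0 = Estimate;
   else if upcase(Effect) = 'X' then b1 = Estimate;
   if last then do;
      y0 = 10;                      /* assumed target value of Y */
      x0 = (y0 - b0) / b1;          /* population-average inverse prediction */
      output;
   end;
   keep b0 b1 y0 x0;
run;

proc print data=inverse;
run;
```

This uses only the fixed effects, so `x0` is a population-average value; a subject-specific version would add that subject's estimated random effects to `b0` (and to `b1`, if the slope is random) before solving.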
I am working on a project using the GLIMMIX procedure to estimate a random coefficients logit model, BUT I am struggling to understand some theoretical concepts and the SAS programming for the MIXED/GLIMMIX procedures. I have some questions I was hoping to get help with.

1A. Theoretical question: if you estimated a random coefficients model (say with PROC MIXED), would you expect the mean of the predicted values BY SUBJECT to equal the mean of the dependent variable BY SUBJECT?

For example, using data from SAS's example 41.5 for the MIXED procedure:
-------------------------------
proc mixed data=rc;
   class Batch Monthc;
   model Y = / s outp=predicted;
   random Monthc / sub=Batch s;
run;

proc sort data=predicted;
   by Batch;
run;

proc summary data=predicted;
   where Y~=.;
   by Batch;
   var Y Pred;
   output out=testst1 mean= ;
run;

proc print data=testst1;
run;

proc means data=predicted;
   where Y~=.;
   var Y Pred;
run;
----------------------------
I found that the mean of Y and the mean of Pred over all observations were the same, but that the mean of Y and the mean of Pred within each Batch were NOT the same. Why would that be? Under what conditions would you expect them to be the same?

1B. In the non-linear world of GLIMMIX, I find that the mean of Y and the mean of Pred over ALL observations are not the same. Is this due to the non-linear nature of the model (would one expect this theoretically), or is this likely a programming problem?

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

2. SAS programming question: I am trying to figure out what happens when you include a categorical variable in the RANDOM statement. In the code based on example 41.5 pasted above, is there an equivalent, using dummy variables, to the "random Monthc" part of the program? (Note: in the example Monthc takes on 6 values: 0, 1, 3, 6, 9, 12.)

What I am struggling with is how PROC MIXED comes up with an estimated coefficient for the fixed effect (the intercept) AS WELL AS estimates of random effects for each value of Monthc and each Batch (6 Monthc values * 3 Batch values = 18 estimated coefficients). If I were doing this via dummy variables, I would think I would have to leave a dummy out and hence not have all those estimated coefficients. (Note: on page 2089 of the manual it says, in reference to the opening example: "The CLASS statement instructs PROC MIXED to consider (variables listed in the CLASS statement) as classification variables. Dummy (indicator) variables are, as a result, created corresponding to all of the distinct levels of (variables listed in the CLASS statement).")

I tried to recreate example 41.5 putting 6 dummies (month00 month01 month03 month06 month09 month12) in the RANDOM statement instead of Monthc. SAS smartly excluded one of them. (Though I'm not sure why it picked month06 to exclude.)
----------------------------
data rc2;
   set rc;
   if month = 0 then month00 = 1; else month00 = 0;
   if month = 1 then month01 = 1; else month01 = 0;
   if month = 3 then month03 = 1; else month03 = 0;
   if month = 6 then month06 = 1; else month06 = 0;
   if month = 9 then month09 = 1; else month09 = 0;
   if month = 12 then month12 = 1; else month12 = 0;
run;

proc mixed data=rc2;
   class Batch;
   model Y = / s outp=predicted2;
   random month00 month01 month03 month06 month09 month12 / sub=Batch s;
run;
----------------------------
Any help, or a reference for understanding what happens to classification variables in the RANDOM statement in PROC MIXED and GLIMMIX, would be welcome.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

3. (Related to question 1) In the model that I am interested in, when I take the mean of the predicted values and the mean of the dependent variable over all observations, the mean of the predicted values is less than the mean of the dependent variable. Also, when I take the means within subjects (for me, subjects are states), the mean of the predicted values for each subject (state) is *less* than the mean of the dependent variable for that subject (state). Am I doing something wrong? Why does this happen?

My code:
----------------------------
proc glimmix data=t21.hlmALL03 method=MMPL;
   class sdtype sevetype CLtype FIPS;
   model IEPinc = / dist=binomial link=logit SOLUTION;
   random sdtype CLtype sevetype / sub=FIPS SOLUTION G;
   NLOPTIONS tech=nrridg;
   weight ORIGWT;
   output out=preddata PREDICTED(ilink blup)=pred;
run;

proc means data=preddata;
   var IEPinc pred;
   weight ORIGWT;
run;
----------------------------
INFO on my project is given below.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

4. About the project: I am trying to get estimates, for each state, of the relationship between a student's characteristics and whether or not he/she is included in an assessment. I intend to apply the coefficients of the model to a subsequent year of data to do a decomposition (Oaxaca-Blinder / Findley for this non-linear case) FOR EACH STATE of the portion of the change in inclusion rates that is due to changes in the student population in the state and the portion that is due to all other factors.

I could estimate separate logits for each state, since I am really just interested in state-by-state changes, but particular SDtypes are not common, so a state may have 1 observation for SDtype=4 in the first year and 10 in the year I am applying the model to. It was thought that combining the estimation in a multi-level model would help me estimate coefficients for states with 0 or few observations for a particular type (say SDtype=4) by drawing on data from states with many observations. It is thought that states have some similarities in how they handle different types of students but also some differences, and that the differences are systemic state-wide.

IEPinc = 0 if not included; 1 if included on the assessment
Sdtype has 13 values
Sevetype has 4 values
CLtype has 3 values
FIPS is the state identification variable

Note also that students are nested within schools within states, but I ignore, perhaps incorrectly, the within-school nesting. The weights are given to make the students within each state representative of that state.

Comments on my modeling of the situation in "proc glimmix" above would be very welcome. Specifically:
4a. Is this the right way to get separate coefficients by state?
4b. Do I need to include "sdtype CLtype sevetype" in the MODEL statement? I want separate estimates by state, so I left out the fixed effects portion.
4c. Do I need to include "Intercept" in the RANDOM statement? I left it out b/c it caused the G matrix to not have full rank. The estimated coefficients come out the same.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Thanks for reading and thanks for all the help people give on this list... the archives have always been a great resource for me.

Sami
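The by-state comparison described in question 3 is not shown in the post. A minimal sketch of one way to compute it from the `preddata` output data set might look like the following; the names `state_means`, `mean_IEPinc`, `mean_pred`, and `diff` are made up here for illustration.

```sas
/* Weighted means of observed inclusion and predicted probability, one row per state */
proc means data=preddata nway noprint;
   class FIPS;
   var IEPinc pred;
   weight ORIGWT;
   output out=state_means mean=mean_IEPinc mean_pred;   /* hypothetical names */
run;

data state_means;
   set state_means;
   diff = mean_pred - mean_IEPinc;   /* negative when predictions fall short */
run;

proc print data=state_means;
   var FIPS _FREQ_ mean_IEPinc mean_pred diff;
run;
```

Printing `diff` next to `_FREQ_` makes it easy to see whether the shortfall is concentrated in states with few observations.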
Hi All,

Does anyone know if it is possible to calculate predicted values AND their confidence intervals for an individual with known values of the explanatory variables, using proc glm?

In proc genmod it is possible to do this as follows:
1) Extend the dataset by one or more row(s) corresponding to the values that you want to predict.
2) Create a weight variable, such that the original data are given a weight of 1 and the new rows a weight of zero, so that only the original data are used in the model fitting calculations.
When predicted values are asked for, proc genmod gives predicted values for both the original data and the new rows.

I tried this in proc glm, but the confidence limits for the predicted values of my new rows are just listed as missing, with a note that 'observation was not used in this analysis'.

I have created some dummy data. The last two rows are the ones for which I want predicted values and confidence intervals of y, but which should not be taken into account when fitting the model.

data MadeUp;
   input y x1$ x2$ weight;
   datalines;
10 y n 1
15 n y 1
12 y n 1
10 y y 1
11 n y 0
12 y n 0
;
run;

proc glm data=MadeUp;
   class x1 x2;
   weight weight;
   model y = x1 x2 / cli;
run;

To put this into context, I have data on fruit yield (y) with several factors that predict growth (x1 - x4). Each of x1-x4 has a cost associated with it, and I will go on to calculate the profit, and a CI for the profit, based on the predicted yield and the cost of each factor (i.e. calculate the economic optimum).

I'm running SAS 8.2 on Windows XP. Any help would be much appreciated.

Thanks,
Mark
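One commonly suggested alternative to the zero-weight approach is to give the rows to be predicted a missing response. The sketch below assumes that PROC GLM leaves such rows out of the fit but still computes predicted values and limits for them when the predictors are present; that is the usual behavior of SAS regression procedures, but it is worth confirming on SAS 8.2. The data set name `MadeUp2` and the output variable names are made up for illustration.

```sas
/* Rows to be predicted carry a missing response instead of a zero weight */
data MadeUp2;
   input y x1 $ x2 $;
   datalines;
10 y n
15 n y
12 y n
10 y y
.  n y
.  y n
;
run;

proc glm data=MadeUp2;
   class x1 x2;
   model y = x1 x2;
   /* p= predicted value; lcl=/ucl= limits for an individual prediction;
      lclm=/uclm= limits for the mean response */
   output out=pred p=yhat lcl=lower_ind ucl=upper_ind
                   lclm=lower_mean uclm=upper_mean;
run;
quit;

proc print data=pred;
run;
```

With only four fitted rows and three estimated parameters there is a single error degree of freedom, so the limits from this toy data will be very wide.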
Referencing much-appreciated prior posts by Dale McLerran, I fit a zero-inflated Poisson model using:

proc nlmixed data=mmf.model;
   by groupvar;
   ETA_PROB = BP_0 + BP_1*z1;
   p_0 = exp(eta_prob)/(1 + exp(eta_prob));
   ETA_LAMBDA = B0 + B1*x1 + B2*x2 + B3*x3 + B4*x4 + B5*x5 + B6*x6 + B7*x7;
   lambda = exp(eta_lambda);
   if y=0 then prob = p_0 + (1-p_0)*exp(-lambda);
   if y=0 then loglike = log(prob);
   else loglike = log(1-p_0) + y*log(lambda) - lambda - lgamma(y+1);
   model y ~ general(loglike);
run;

and now I need to plot the predicted values of y versus the observed values of y. Can these predicted values of y be output somehow from PROC NLMIXED (as a colleague, who is not available for questioning, indicated)? I can't seem to find a way.

Otherwise, I think that what I need to do is compute for each observation "p_0_hat" and "lambda_hat" (that is, the predicted values of p_0 and lambda, respectively) and then compute y_hat = (1 - p_0_hat)*lambda_hat. Is that right?

Thank you!
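The formula in the post matches the mean of a zero-inflated Poisson variable, E[y] = (1 - p_0)*lambda. A minimal sketch of how the fitted means might be written out uses the PREDICT statement of PROC NLMIXED, which evaluates an expression for every observation and stores it in a variable named Pred. The output data set name `zip_pred` is made up for illustration, and the PROC SGPLOT step assumes SAS 9.2 or later.

```sas
proc nlmixed data=mmf.model;
   by groupvar;
   eta_prob   = BP_0 + BP_1*z1;
   p_0        = exp(eta_prob)/(1 + exp(eta_prob));
   eta_lambda = B0 + B1*x1 + B2*x2 + B3*x3 + B4*x4 + B5*x5 + B6*x6 + B7*x7;
   lambda     = exp(eta_lambda);
   if y=0 then loglike = log(p_0 + (1-p_0)*exp(-lambda));
   else loglike = log(1-p_0) + y*log(lambda) - lambda - lgamma(y+1);
   model y ~ general(loglike);
   /* Fitted ZIP mean for each observation, stored in variable Pred */
   predict (1 - p_0)*lambda out=zip_pred;   /* zip_pred is a hypothetical name */
run;

/* Predicted versus observed, one plot per BY group (assumes SAS 9.2+) */
proc sgplot data=zip_pred;
   by groupvar;
   scatter x=y y=Pred;
run;
```

Further PREDICT statements (for example `predict p_0 out=...;` or `predict lambda out=...;`) could be added if the component estimates are wanted as well.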