sas >> how to generate multivariate random data from a given

by statsjeff » Thu, 31 Jan 2008 23:43:20 GMT


I want to generate multivariate random data from a given distribution that
is neither multivariate normal nor Student's t.
Specifically, the idea comes from the paper by Clayton et al. (1985), Journal of
the Royal Statistical Society, Series A.
Does anybody have any suggestions? Thanks.


Similar Threads

1. Generating Multivariate Exponential Random Vector

2. how to generate correlated random variables with given

3. how to generate random data from uniform distribution?

For instance, randomly select one value from (1,2,3,4). I know ranuni
samples randomly from U(0,1).

4. Generating Random Data

5. Generating random dummy data


I need to create random, dummy data in the following scenarios.  Where I 
believe I have a solution, I'll post it, but am open to improvements.

1.  Uniform distribution of contiguous numeric values (eg. 1,2,3,4,5)
var = int(ranuni(0)*5)+1;  where 5 = number of elements, 1 = starting value

data _null_;
   do i=1 to 20;
      var = int(ranuni(0)*5)+1;
      put var=;
   end;
run;

2.  Uniform distribution of non-contiguous numeric values (eg. 9,3,7,4,5)
array list{5} _temporary_ (9,3,7,4,5);
var = list{int(ranuni(0)*dim(list))+1};

data _null_;
   array list{5} _temporary_ (9,3,7,4,5);
   do i=1 to 20;
      var = list{int(ranuni(0)*dim(list))+1};
      put var=;
   end;
run;

Note:  this approach is generic, and could also be used for #1.

3.  Uniform distribution of either contiguous or non-contiguous character values

Use similar approach to #2

data _null_;
   array list{5} $ _temporary_ ("S","C","O","T","B");
   do i=1 to 20;
      var = list{int(ranuni(0)*dim(list))+1};
      put var=;
   end;
run;

4.  Non-uniform distribution of boolean values (desire a weighting toward 
one or the other)

var = (ranuni(0) > .8);  where we desire 80% of hits to be 0, 20% to be 1.
This assumes 0-based numbering; use an offset if using 1 (or n) based numbering.

reverse the comparison if we want 80% of hits to be 1, i.e.

var = (ranuni(0) < .8);

data _null_;
   do i=1 to 20;
      var = (ranuni(0) > .8);
      put var=;
   end;
run;


Here is where I'm stuck for ideas...

5.  Non-uniform distribution of multiple values, weighting desired for one 
item, remaining percentage spread amongst other values.

eg. (1,2,3,4,5), desire 80% of hits on 4, 20% of hits spread between 1,2,3,5

Perhaps this is a good approach???

data _null_;
   array list{4} _temporary_ (1,2,3,5);   * the values sharing the remaining 20% ;
   do i=1 to 40;
      if (ranuni(0) < .8) then
         var = 4;                          * 80% of hits on 4 ;
      else
         var = list{int(ranuni(0)*dim(list))+1};
      put var=;
   end;
run;

6.  Non-uniform distribution of multiple values, weighting desired for 
multiple items, remaining percentage spread amongst other values.

eg. (1,2,3,4,5), desire 30% of hits on 2, 20% of hits on 4, rest of hits 
spread between 1,3,5

I'm stuck on the best approach on this one.  Any good ideas?

The solution needs to run within a data step, as this algorithm would be
part of a larger data step.

Any input, esp. on #5 and #6, is appreciated.
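
One possible approach to #6, offered as a minimal sketch rather than a definitive answer: the RANTBL function returns 1..n with the probabilities given, and returns n+1 with the leftover probability when they sum to less than 1. That leftover case can stand for "one of the remaining values", which is then drawn uniformly as in #2. The array names heavy and rest are made up for this illustration, and the probabilities are the ones from the example (30% on 2, 20% on 4).

```sas
data _null_;
   array heavy{2} _temporary_ (2,4);     * values that get an explicit weight ;
   array rest{3}  _temporary_ (1,3,5);   * values that share the remainder ;
   do i=1 to 40;
      * k=1 with prob .3, k=2 with prob .2, k=3 with the remaining prob .5 ;
      k = rantbl(0, .3, .2);
      if k <= dim(heavy) then
         var = heavy{k};
      else
         var = rest{int(ranuni(0)*dim(rest))+1};
      put var=;
   end;
run;
```

The same pattern covers #5 by listing a single probability, e.g. rantbl(0, .8) with heavy holding only the value 4. Everything stays inside one data step, so it could be dropped into a larger step.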


P.S.:  The final solution would be a macro that would get the values of a 
format and create this code.  For example (pseudocode and untested):

proc format;
   value code (NOTSORTED)
      9 = "Code 1"
      3 = "Code 2"
      7 = "Code 3"
      4 = "Code 4"
      5 = "Code 5"
   ;
run;

data testdata;
   attrib code1 length=8 format=code.;
   attrib code2 length=8 format=code.;
   do pt=10011001 to 10011010;
      * uniform distribution across uncoded format values ;
      code1=%dummy_data(code);
      * 80% of values are 7, remainder are spread across rest of values ;
      code2=%dummy_data(code,wval=7,wpct=.8);
      output;
   end;
run;

This would likely involve creating a proc format cntlout dataset, using 
%sysfunc to open that dataset, build some macro variables, and generate the 
appropriate SAS code.  Of course, the generated SAS code must be 
syntactically correct for a data step (i.e. cannot invoke a procedure, etc). 

6. Help Generating Multivariate Normal Observations from specified means, vars, and correlation matrix

7. multivariate random effect using proc mixed

Dear All,

I have a dataset with a hierarchical structure (individuals nested within countries) and four continuous outcomes, Y1, Y2, Y3 and Y4. The correlation between these outcomes varies from 0.15 to 0.56.

I have two continuous and 4 categorical independent variables.

I first used a univariate random effect model, i.e. a separate mixed model for each Yi.

Only 1 of my 4 outcomes, Y2, had a country effect, with 5% of the variance of Y2 explained by the country effect. The other three have no significant country effect, as judged by the p-value of the random effect solution.

I then went on to use proc mixed in a multivariate framework, analysing the 4 outcomes (Y1,Y2,Y3,Y4) in one single model. I used the code kindly provided by our friend Dale.

I ran out of memory in the full multivariate model, so I considered 6 separate bivariate random effect models, (Y1,Y2), (Y1,Y3), ..., (Y3,Y4), to take into account the bivariate correlation of the outcomes.

I have noticed some differences in the fixed part between the different bivariate models, and between them and the univariate models (when I run proc mixed for each outcome separately).

My question is how to interpret the results of these 6 models and the separate univariate mixed models, and what would be the best way to take into account both the correlation of the outcomes and the hierarchy of my design.

  Thanks a lot for all your comments.



8. Multivariate Random Regression using Proc Mixed ?