Hi,

Could someone suggest how to calculate a random variable with 0 correlation to a series of known observations?

I have a dataset that consists of daily stock price returns for one stock. The total number of observations is 200. I need to create a random variable with 200 observations that has a correlation of 0.2 with this dataset.

You ask for a random variable that has zero correlation with another RV? That's easy: just use any random number generator, and your new variable is uncorrelated with the old.

You ask for a random variable that has 0.2 correlation with another RV? The method is here: http://www.uvm.edu/~dhowell/StatPages/More_Stuff/Gener_Correl_Numbers.html

Paige Miller

paige\dot\miller \at\ kodak\dot\com

Paige,

The population correlation may be 0, but the sample correlation isn't. I'm looking to create a second random variable with a sample correlation of 0; once that's done, getting the sample correlation to equal 0.2 is relatively straightforward. If you use the rand() function in Excel and create two random variables, you quickly see what I'm talking about. Thanks.

I am getting a little over my head here. Although random x and y are theoretically uncorrelated/independent, and I expect corr --> 0 as the sample size increases, the sample correlation is unlikely to be exactly 0. I think you can create a second variable with correlation zero by making it a special orthogonal vector to the first. I think the dot product has to be zero, plus something else (sum of components = 1? x + y = 1? I am not sure).

Here is an example of two vectors with zero correlation:

data z;
  /* four observations covering every (x, y) combination of 0 and 1 */
  x=1; y=0; output;
  x=0; y=1; output;
  x=0; y=0; output;
  x=1; y=1; output;
run;

proc print;
run;

Obs    X    Y
  1    1    0
  2    0    1
  3    0    0
  4    1    1

proc corr data=z;
  var x;
  with y;
run;

Pearson Correlation Coefficients, N = 4
Prob > |r| under H0: Rho=0

           X
Y    0.00000
      1.0000

Okay, so now you go from talking about random variables to samples. Different animal. You simply need to take your first vector of observations and create a vector orthogonal to it (with both vectors centered at their means). Simple geometry.

Since I never use the rand() function in Excel (and in fact never use Excel for anything statistical), I do not "quickly see what you are talking about". Perhaps you could explain what you are talking about here without referring to Excel.

Paige Miller

paige\dot\miller \at\ kodak\dot\com
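
One way to build such an orthogonal vector, offered only as a minimal sketch: regress a column of random noise on the observed series and keep the residuals. Least-squares residuals from a model with an intercept have mean zero and are orthogonal to the mean-centered regressor, so their sample correlation with it is zero up to floating-point rounding. The data set names (have, ortho), the variable names (x, noise, e), the seed, and the simulated stand-in for the 200 returns are all assumptions made for illustration.

```sas
/* Stand-in for the 200 observed daily returns (hypothetical data);
   replace this step with the real series stored in x. */
data have;
  call streaminit(2024);              /* arbitrary seed */
  do i = 1 to 200;
    x     = rand('normal', 0, 0.02);  /* placeholder returns */
    noise = rand('normal');           /* raw random variable */
    output;
  end;
run;

/* Regress the noise on x; the residuals e are what is left of the
   noise after removing everything linearly related to x. */
proc reg data=have noprint;
  model noise = x;
  output out=ortho r=e;
run;
quit;

/* Check: the sample correlation of x and e should print as zero. */
proc corr data=ortho;
  var x;
  with e;
run;
```

The residual e is still random, since it inherits its values from the noise column, but its sample correlation with x is pinned at zero by construction.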

Paige,

It's just the difference between a population statistic and a sample statistic. Use any statistical package and create two random variables with n observations. When you measure the correlation over the sample, it won't be 0. As n increases, the correlation will get closer and closer to 0. The distribution of sample correlations should have a mean of 0, but any one of them will be different.
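
The point about the sampling distribution can be checked with a small simulation. This is a sketch under assumed settings, not anything taken from the thread: 1000 replicates of 200 independent standard normal pairs, with hypothetical data set names (sim, corrs, r) and an arbitrary seed. For independent normal variables the sample correlations should average close to 0 with a standard deviation near 1/sqrt(n-1), roughly 0.07 for n = 200.

```sas
/* 1000 replicate samples, each with 200 independent N(0,1) pairs */
data sim;
  call streaminit(12345);             /* arbitrary seed */
  do rep = 1 to 1000;
    do i = 1 to 200;
      x = rand('normal');
      y = rand('normal');
      output;
    end;
  end;
run;

/* Sample correlation of x and y within each replicate */
proc corr data=sim noprint outp=corrs;
  by rep;
  var x y;
run;

/* OUTP= stores each correlation matrix row by row;
   keep the row for x and read its correlation with y. */
data r;
  set corrs;
  where _type_ = 'CORR' and upcase(_name_) = 'X';
  r = y;
  keep rep r;
run;

/* Summarize the 1000 sample correlations */
proc means data=r n mean std min max;
  var r;
run;
```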

Hi,

Here is some code based on a previous post. It produces Y with a ~0.5 correlation with X.

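data p5;
  seed1 = 373765061;
  seed2 = 535327321;
  r = 0.50;                      /* target population correlation */
  r2 = r*r;
  do i = 1 to 1000000;
    x = rannor(seed1);           /* standard normal draws */
    y = rannor(seed2);
    y = x*r + y*sqrt(1-r2);      /* mix so that corr(x, y) = r in the population */
    output;
  end;
  keep x y;
run;

proc corr data=p5;
  var x;
  with y;
run;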

Pearson Correlation Coefficients, N = 1000000
Prob > |r| under H0: Rho=0

           X
Y    0.49864
      <.0001
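
The mixing formula above hits the target only in the population, which is why the sample value comes out as 0.49864 rather than 0.5. A possible variation, sketched here under assumptions, gets the sample correlation to match the target up to rounding: first make the noise exactly uncorrelated with x in the sample (regression residuals), then standardize both columns to sample mean 0 and standard deviation 1 before mixing. The macro variable r, the data set names (have, ortho, std, want), the variable names, the seed, and the simulated stand-in for the 200 returns are all hypothetical.

```sas
%let r = 0.2;                         /* target sample correlation */

/* Stand-in for the 200 observed returns; replace with the real x. */
data have;
  call streaminit(2024);              /* arbitrary seed */
  do i = 1 to 200;
    x     = rand('normal', 0, 0.02);
    noise = rand('normal');
    output;
  end;
run;

/* Residuals e have zero sample correlation with x. */
proc reg data=have noprint;
  model noise = x;
  output out=ortho r=e;
run;
quit;

/* Copy x and e so the originals survive standardization. */
data ortho;
  set ortho;
  zx = x;
  ze = e;
run;

/* Give zx and ze sample mean 0 and sample standard deviation 1. */
proc standard data=ortho mean=0 std=1 out=std;
  var zx ze;
run;

/* Same mixing formula, now applied to exactly uncorrelated,
   exactly standardized columns. */
data want;
  set std;
  y = &r*zx + sqrt(1 - &r**2)*ze;
  keep x y;
run;

proc corr data=want;
  var x;
  with y;
run;
```

Because correlation is unchanged by shifting or rescaling x, the correlation of y with the original x equals its correlation with zx.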

Just thinking out loud. Say your known variable is X, with n observations. Create a random variable Y for the first n-1 observations of X. Then calculate the n-th value of Y so that r is whatever you want it to be. To do that, invert the r formula. r is a function of x, y and their means. If you know all values but y(n), then inverting the formula should be easy and you would end up with the exact value of r you want.
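
For reference, inverting the formula works out to a quadratic in the unknown last value. The derivation below is a sketch; the symbols t, a_i, A, B, C and rho are notation introduced here, with t standing for y(n) and rho for the target correlation.

```latex
% Notation (introduced for this sketch):
%   a_i = x_i - \bar{x},  S_{xx} = \sum_{i=1}^{n} a_i^2,  t = y_n (unknown)
%   A = \sum_{i<n} a_i y_i,  B = \sum_{i<n} y_i,  C = \sum_{i<n} y_i^2
\begin{align*}
S_{xy}(t) &= A + a_n t, \\
S_{yy}(t) &= C + t^2 - \frac{(B + t)^2}{n}, \\
r(t) &= \frac{S_{xy}(t)}{\sqrt{S_{xx}\, S_{yy}(t)}} .
\end{align*}
% Setting r(t) = \rho and squaring both sides:
\begin{align*}
\Bigl[a_n^2 - \rho^2 S_{xx}\bigl(1 - \tfrac{1}{n}\bigr)\Bigr] t^2
+ \Bigl[2 A a_n + \tfrac{2}{n}\,\rho^2 S_{xx} B\Bigr] t
+ \Bigl[A^2 - \rho^2 S_{xx}\bigl(C - \tfrac{B^2}{n}\bigr)\Bigr] = 0 .
\end{align*}
```

Squaring can introduce a spurious root, so a solution t is usable only if A + a_n t has the same sign as rho. A real, admissible root is not guaranteed to exist for every draw of the first n-1 values, and when one does exist y(n) may turn out to be an extreme outlier relative to the rest of Y.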
