MATLAB >> k fold cross validation algorithm

by Tony M » Tue, 05 May 2009 14:22:01 GMT

Hello,
Here I have one more question about 10-fold cross validation. I'm not sure about the following sequence of actions:

generate indices using crossvalind
for k = 1 to 10
    select trainset and testset
    make a new net using newff
    train the net (a new net every time??)
    calculate accuracy (from confusion matrix construction) using the testset
    construct ROC parameters
end
average accuracy over all 10 iterations
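In MATLAB terms I mean roughly the following sketch (it assumes a feature matrix X of size nFeatures-by-nSamples and a 0/1 target row vector Y; the newff call may need adjusting for your toolbox version):

nSamples = size(X, 2);
K = 10;
indices = crossvalind('Kfold', nSamples, K);  % fold label for every sample

acc = zeros(K, 1);
for k = 1:K
    testMask  = (indices == k)';
    trainMask = ~testMask;

    % brand-new network every fold (10 hidden units is just a placeholder)
    net = newff(X(:, trainMask), Y(trainMask), 10, {'tansig', 'logsig'});
    net = train(net, X(:, trainMask), Y(trainMask));

    yTest = sim(net, X(:, testMask));   % logsig outputs in (0,1)
    pred  = yTest > 0.5;                % provisional threshold
    yTrue = Y(testMask) > 0.5;

    % 2x2 confusion matrix and accuracy for this fold
    cm = [sum( pred &  yTrue), sum( pred & ~yTrue);
          sum(~pred &  yTrue), sum(~pred & ~yTrue)];
    acc(k) = trace(cm) / sum(cm(:));
end
meanAcc = mean(acc);

The per-fold ROC parameters would come from yTest and yTrue inside the same loop.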


For accuracy I can get the average over the 10 runs, but what should I do with the ROC?
Can that also be averaged?
Is this pseudocode-like sequence correct? I appreciate your comments.

MATLAB >> k fold cross validation algorithm

by Greg Heath » Tue, 05 May 2009 18:28:00 GMT



You will need a validation set if you are using Early
Stopping and/or need to make unbiased estimates
of hyperparameters (e.g., learning rate, momentum
coefficient, classification threshold, etc.). Use the partition
trn/val/tst = 8/1/1. Make sure that each val-tst pair
is unique, i.e., don't use subsets p and q for val and
test and then use them again for test and val. Try
the fold indexing (10,1), (1,2), ..., (9,10).
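A minimal sketch of that rotation (reading the pairs as (validation fold, test fold) and assuming fold labels from crossvalind('Kfold', nSamples, 10) as above):

K = 10;
for k = 1:K
    testFold = k;
    valFold  = mod(k - 2, K) + 1;   % k-1, with fold 0 wrapping back to 10
    trainFolds = setdiff(1:K, [valFold testFold]);

    testMask  = (indices == testFold);
    valMask   = (indices == valFold);
    trainMask = ismember(indices, trainFolds);
    % design on trainMask (8 folds), tune on valMask, report on testMask
end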


Yes: a brand new net with a brand new random weight initialization every time.

If you use the weights from fold j-1 to initialize the
net at fold j, you violate the independent, unbiased
holdout status of the test set: it was part of the design
(training or validation) set at all previous folds.


Confusion matrix implies a classifier.
Are there only two classes?
Are you using one output with logsig output activation
and unipolar binary {0,1} targets?
Are you using a threshold of T = 0.5 to construct the
confusion matrix?

If so, I would postpone this step because one of the purposes
of the ROC analysis is to
1. Use the validation set to find the optimal value of T to
optimize some objective (e.g., equal error rate, fixed error
rate for false positives, or minimum cost weighted risk).
2. Use that value of T and the test set to construct the confusion
matrix

There will be a different value of T and corresponding test set
confusion matrix for each fold.

It is wise to keep a complete record of every fold just in case
you want to modify your criterion for choosing T:

For each input keep track of
1. Fold
2. Class membership: "Positive" or "Negative"
3. Subset membership: "Training", "Validation" or "Test"
4. Output value (0 < y < 1 for logsig).
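For example, a per-sample log could be grown inside the fold loop like this (just a sketch; the field names and the variables idx and yTest are my own):

rec = struct('fold', {}, 'class', {}, 'subset', {}, 'y', {});  % empty log

% inside the fold loop, after simulating the net on the test samples:
idx = find(testMask);                        % sample indices in this test fold
for j = 1:numel(idx)
    rec(end+1) = struct('fold',   k, ...
                        'class',  Y(idx(j)), ...  % 1 = positive, 0 = negative
                        'subset', 'test', ...     % 'train', 'val' or 'test'
                        'y',      yTest(j));      % logsig output, 0 < y < 1
end

The same loop, with 'train' or 'val' in place of 'test', logs the other two subsets.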


For training, validation and test sets
Sort y for each class
Obtain CDF vs y for positives
Obtain 1-CDF vs y for negatives
Combine to obtain ROC coordinates

Determine T from validation ROC
Obtain confusion matrix from test ROC.
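A rough sketch of that construction for one subset, say the validation samples of one fold (y is a vector of network outputs, c the matching 0/1 class labels; the equal error rate criterion is only an example objective):

yPos = sort(y(c == 1));              % sorted outputs, positive class
yNeg = sort(y(c == 0));              % sorted outputs, negative class

T   = unique(y(:)).';                % candidate thresholds
TPR = zeros(size(T));
FPR = zeros(size(T));
for j = 1:numel(T)
    TPR(j) = mean(yPos > T(j));      % 1 - CDF of positives at T(j)
    FPR(j) = mean(yNeg > T(j));      % 1 - CDF of negatives at T(j)
end

% example criterion: equal error rate (miss rate ~ false alarm rate)
[errMin, jStar] = min(abs((1 - TPR) - FPR));
Topt = T(jStar);

Topt from the validation ROC is then applied to the test-fold outputs (yTest > Topt) to build that fold's confusion matrix.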

It is interesting to demonstrate the importance of
independent holdout validation and testing
by comparing the values at the optimum points
on the three ROC curves.


The overall test-set accuracy and confusion matrix come from adding the 10 test set confusion matrices.
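i.e., something like this, assuming cmFold{k} holds the 2x2 test-set confusion matrix from fold k:

cmTotal = zeros(2);
for k = 1:10
    cmTotal = cmTotal + cmFold{k};   % pool the 10 test-set confusion matrices
end
overallAcc = trace(cmTotal) / sum(cmTotal(:));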


As for the ROC: since you have a complete record of every fold,
you can obtain two pooled ROCs:
1. Training: For each class, sort all of the training
set outputs
2. Nontraining (validation + test): For each class,
sort the mixture of validation and test set outputs.
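With the per-sample log sketched above, pooling the nontraining outputs is a few lines of indexing:

keep = ~strcmp({rec.subset}, 'train');   % validation + test entries, all folds
y = [rec(keep).y].';
c = [rec(keep).class].';
% reuse the ROC construction above on this pooled (y, c)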

Hope this helps.

Greg
