comp.soft-sys.sas - The SAS statistics package.
Slow performance on large NHANES data. SAS 9.1.3 SP4.
Hi,

I normally use SAS on datasets with only 1,000-10,000 subjects/lines of data. When I run SAS on such datasets, the calculations usually complete within seconds, or at most minutes.

Now I am working on a dataset that has ~500,000 entries, stored in an MS Access database. When I tried simply to read the relevant table in from the Access db, it took over 10 hours of real time! Then I tried a simple PROC MEANS, and that has been running for over two hours and is still not complete!

Is this performance typical with large datasets? If not, how can one improve the situation (besides getting a faster computer, more memory, etc.)? What are your experiences?

Thanks,
Howard Alper
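One common fix for this pattern is to pay the slow Access-to-SAS transfer cost only once: pull the table into a native SAS dataset, save it to a permanent library, and run all subsequent PROCs against that. A minimal sketch, assuming SAS/ACCESS Interface to PC Files is licensed; the file path, table name, and variable names below are placeholders, not from the original post:

```sas
/* Pull the Access table across ONCE and land it as a SAS dataset.
   (Path and table name are hypothetical.) */
proc import out=work.nhanes
    datatable="LabResults"
    dbms=access
    replace;
    database="C:\data\nhanes.mdb";
run;

/* Persist it to a permanent library so the slow import never repeats. */
libname mylib "C:\sasdata";
data mylib.nhanes;
    set work.nhanes;
run;

/* Subsequent analyses read the native SAS dataset, not Access. */
proc means data=mylib.nhanes n mean std min max;
    var age weight;   /* hypothetical variables */
run;
```

With ~500,000 rows in a native SAS dataset, a simple PROC MEANS should normally finish in seconds to minutes; hours-long runs usually mean SAS is re-reading the data through the Access/Jet engine on every pass.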
Hey Bob,

Have you turned on dataset compression? I have found that datasets with large numbers of repeating elements are actually accessed more quickly with compression turned on, in addition to (and because of) the space savings. Of course, certain datasets with a high number of distinct values and very little text data can actually end up larger and slower when compressed.

Regards,
Stephen

On Wed, Apr 2, 2008 at 1:02 PM, <XXXX@XXXXX.COM> wrote:
> We have some Phase 3 trials, with 1000 or more patients in each trial, and
> durations of a year or more. This means the volume of lab data will be
> (already is) quite large. I have already split the overall lab dataset
> into chemistry, hematology, urinalysis, etc. and added indexing to
> retrieve specific lab tests, but is there anything else that I should be
> doing? Is hashing applicable here?
>
> Thanks,
>
> Bob Abelson
> HGSI
> 240 314 4400 x1374
> XXXX@XXXXX.COM
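To make the compression and indexing advice concrete, and to answer the hashing question: compression is a dataset option, indexes can be built at creation time, and a hash object is a good fit when a large table must be matched against a small in-memory lookup (e.g., reference ranges by test code). A sketch only; library, dataset, and variable names are hypothetical:

```sas
/* COMPRESS=CHAR pays off when rows contain long repeated character
   values; an index on the test code speeds WHERE-clause retrieval. */
data lab.chemistry (compress=char
                    index=(lbtestcd
                           subj_test=(usubjid lbtestcd)));
    set lab.all_labs;
    where panel = "CHEM";
run;

/* A WHERE clause on an indexed variable lets SAS avoid a full scan. */
proc means data=lab.chemistry n mean std;
    where lbtestcd = "ALT";
    var lbstresn;
run;

/* Hashing: load a small reference-range table into memory and look
   up each lab record without sorting or merging. */
data work.flagged;
    if 0 then set lab.ranges;   /* put lownorm/highnorm in the PDV */
    if _n_ = 1 then do;
        declare hash h(dataset: "lab.ranges");
        h.defineKey("lbtestcd");
        h.defineData("lownorm", "highnorm");
        h.defineDone();
    end;
    set lab.chemistry;
    if h.find() = 0 then
        abnormal = (lbstresn < lownorm or lbstresn > highnorm);
run;
```

So hashing helps most for lookups and joins against tables small enough to fit in memory; for retrieving subsets of the big lab datasets themselves, the indexes you already built are the right tool.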