sas >> proc import, csv file with comma in varaibles

by Mindy » Thu, 03 May 2007 08:03:04 GMT

Does anyone know how to imprt a csv file, which inlude comma as
variable values (such as firstname.lastname) to a sas file?

I try to use wizard to do this, but ger error messages. I don't think
dsd or dlm will take of this because from the wizard log, they use dsd
and dlm options.

Thanks.

Min



sas >> proc import, csv file with comma in varaibles

by davidlcassell » Thu, 03 May 2007 10:50:57 GMT





If it is a *real* CSV file, then character fields with commas embedded will
be surrounded by quotes to 'protect' the real comma. So I don't think
you have a real CSV file. You have a file with some separators.

I don't think you should use the wizard for this.

Could you show us a small sample from the data set, cut down so it would
not wrap when you paste it into an email?

HTH,
David
_________________________________________________________________
Need a break? Find your escape route with Live Search Maps.
http://maps.live.com/default.aspx?ss=Restaurants ~Hotels~Amusement%20Park&cp=33.832922~-117.915659&style=r&lvl=13&tilt=-90&dir=0&alt=-1000&scene=1118863&encType=1&FORM=MGAC01



Similar Threads

1. proc import with comma delimited files

Does anyone know how to  rig proc import so that if some variables are
missing in first several rows  it will not truncate the field  length?

2. proc import .csv file question!

3. Infile for csv file where char variable contains inbedded comma

In the following data step, variable VAR12 contains an inbedded comma in the
value and the variable is wrapped with 2 sets of double quotes.  How do I
modify the infile, informat, or input statement to read the value with the
comma and exclude the quote?

Desired output
VAR12 = 123 MAIN TREET, LOT 126 TR


DATA TEST;
  INFILE DATALINES MISSOVER DSD;

  INFORMAT
    VAR01 $20.
    VAR02 8.
    VAR03 MMDDYY8.
    VAR04 $20.
    VAR05 $8.
    VAR06 8.
    VAR07 8.
    VAR08 8.
    VAR09 MMDDYY8.
    VAR10 $35.
    VAR11 $4.
    VAR12 $100.
    VAR13 $10.
    VAR14 $28.
    VAR15 $2.
    VAR16 $10.
    VAR17 $25.
    VAR18 $1.
    VAR19 $5.
    VAR20 COMMA15.2
    VAR21 COMMA15.2
    VAR22 $10.
    VAR23 COMMA15.2
    VAR24 8.
    VAR25 $100.
    VAR26 $25. ;

  INPUT
    VAR01 $ VAR02 VAR03 VAR04 $ VAR05 $ VAR06 VAR07
    VAR08 VAR09 VAR10 $ VAR11 $ VAR12 $ VAR13 $ VAR14 $
    VAR15 $ VAR16 $ VAR17 $ VAR18 $ VAR19 $ VAR20 VAR21
    VAR22 $ VAR23 VAR24 VAR25 $ VAR26 $
    ;
datalines;
"ABCDEFGHIJKLM","1234","01/01/04","ABC123456789","Q12345","315.00","0","10","01/23/04","Completed","A123",""123
MAIN TREET, LOT 126
TR"","n/a","DALLAS","TX","11111","COUNTY","Y","O","30000.00","26000.00","","35000.00","16.67","No
Reason","ABC"
;
RUN;

4. proc import question for CSV file

5. FW: Infile for csv file where char variable contains inbedded comma

-----Original Message-----
From: Peter Crawford [mailto: XXXX@XXXXX.COM ]
Sent: 30 September 2004 20:48
To: 'Pudding Man'
Subject: RE: Infile for csv file where char variable contains inbedded comma


Hey "Man"

I'm sorry to find I want to criticize your data for not being in valid .CSV
format. That, I consider would require, not this line
"Completed","A123",""123 MAIN TREET, LOT 126 TR"","n/a"
   but either
              this
"Completed","A123","123 MAIN TREET, LOT 126 TR","n/a"
           or this
"Completed","A123","""123 MAIN TREET, LOT 126 TR""","n/a"

These two are the cards I used in this simple data step
55   data uu;
56     infile cards dsd truncover;
57     length var10 $35. var11 $4. var12 $100. var13 $10.  ;
58     input  var10 - var13   @1 card1 $char100.;
59     put (_all_)(/=);
60   cards;


var10=Completed
var11=A123
var12=123 MAIN TREET, LOT 126 TR
var13=n/a
card1="Completed","A123","123 MAIN TREET, LOT 126 TR","n/a"

var10=Completed
var11=A123
var12="123 MAIN TREET, LOT 126 TR"
var13=n/a
card1="Completed","A123","""123 MAIN TREET, LOT 126 TR""","n/a"
NOTE: The data set WORK.UU has 2 observations and 5 variables.


When I pass your original line in, I get this output in the log
var10=Completed var11=A123 var12=""123 MAIN TREET var13=LOT 126 TR
card1="Completed","A123",""123 MAIN TREET, LOT 126 TR"","n/a" This is
because quotes(") need to be balanced, and when a quote is to be treated as
data it is repeated. By starting with '""' your data cell is interpreted as
starting with '"' but having no embedded delimiters. So the first comma it
finds is expected to be a data delimiter

I really don't know how you expect to parse that data
          ,""123 MAIN TREET, LOT 126 TR"",

Have you tried it through the Microsoft import routine in excel or access ?

What sort of application created that data?

Peter

-----Original Message-----
From: Pudding Man [mailto: XXXX@XXXXX.COM ]
Sent: 30 September 2004 17:05
To: Peter Crawford
Cc:  XXXX@XXXXX.COM 
Subject: Re: Infile for csv file where char variable contains inbedded comma


On Wed, 29 Sep 2004 19:08:31 -0400, Peter Crawford
< XXXX@XXXXX.COM > wrote:
>
>DSD infile option
>is meant to deal with all this

But does it? Effectively? Perhaps a demo?

Plunking around with W2k V9, I find DSD behavior to
be both a benefit and a hindrance.

I had hoped to just read the entire intended VAR12
field and parse. The combination of contigious dq's
and embedded comma seem to confound DSD.

If I'm not mistaken (always possible), it _should_
be practical to read VAR12 as 2 comma-separated
fields, then concatenate and parse. When I tried
this, reading the second var ate the terminating
comma and subsequent field value. Nicht Gut!

The following is materially ugly, but seems to
work:

data uu(drop = _:);
  infile cards dsd column=_p1;
  length var10 $35. var11 $4. var12 $100. var13 $10.;
  input  var10 $ var11 $ @;
    _p2 = index(substr(_infile_, _p1 + 2), '""') + _p1 + 3;
    _len = _p2 - _p1;
  input @_p1 _var12x $varying100. _len +(-2) _dum $ var13 $;
  var12 = input(tranwrd(_var12x, '""', '"'), $quote100.);
  put _all_;
cards;
"Completed","A123",""123 MAIN TREET, LOT 126 TR"","n/a"
; run;

  Skoal,
  Puddin'

******************************************************
*** Puddin' Man      PuddingDotMan at GmailDotCom  ***
******************************************************;

"I'm sitting here wonderin'
 would a matchbox hold my clothes.
 I ain't got so many matches
 but I got so far to go."
    from "Matchbox Blues", Blind Lemon Jefferson, maybe 1927

>On Wed, 29 Sep 2004 15:47:28 -0500, Pudding Man < XXXX@XXXXX.COM >
>wrote:

>>I will studiously avoid all your code, most of your data.
>>
>>This shouldn't be necessary, but ...
>>
>>Consider use of TRANWRD for "" --> ", then $QUOTE to
>>remove remaining dq's. Following might suggest something
>>viable:
>>
>>data _null_;
>> input v12x :&$100.;
>> v12 = input(tranwrd(v12x, '""', '"'), $quote100.);
>> put _all_;
>>cards;
>>""123 MAIN TREET, LOT 126 TR""
>>; run;
>>
>>Hope it hep's ...
>>
>>  Puddin'
>>
>>******************************************************
>>*** Puddin' Man      PuddingDotMan at GmailDotCom  ***
>>******************************************************;
>>
>>"Now, I may -look- like I'm crazy,
>> but po' John do know right from wrong!"
>>    -from "Drop Down, Mama", Sleepy John Estes
>>
>>On 23 Sep 04 22:59:48 GMT,  XXXX@XXXXX.COM  (Chuck Enright)
>>wrote:
>>
>>>In the following data step, variable VAR12 contains an inbedded comma
>>>in
>the
>>>value and the variable is wrapped with 2 sets of double quotes.  How
>>>do I modify the infile, informat, or input statement to read the
>>>value with the comma and exclude the quote?
>>>
>>>Desired output
>>>VAR12 = 123 MAIN TREET, LOT 126 TR
>>>
>>>
>>>DATA TEST;
>>>  INFILE DATALINES MISSOVER DSD;
>>>
>>>  INFORMAT
>>>    VAR01 $20.
>>>    VAR02 8.
>>>    VAR03 MMDDYY8.
>>>    VAR04 $20.
>>>    VAR05 $8.
>>>    VAR06 8.
>>>    VAR07 8.
>>>    VAR08 8.
>>>    VAR09 MMDDYY8.
>>>    VAR10 $35.
>>>    VAR11 $4.
>>>    VAR12 $100.
>>>    VAR13 $10.
>>>    VAR14 $28.
>>>    VAR15 $2.
>>>    VAR16 $10.
>>>    VAR17 $25.
>>>    VAR18 $1.
>>>    VAR19 $5.
>>>    VAR20 COMMA15.2
>>>    VAR21 COMMA15.2
>>>    VAR22 $10.
>>>    VAR23 COMMA15.2
>>>    VAR24 8.
>>>    VAR25 $100.
>>>    VAR26 $25. ;
>>>
>>>  INPUT
>>>    VAR01 $ VAR02 VAR03 VAR04 $ VAR05 $ VAR06 VAR07
>>>    VAR08 VAR09 VAR10 $ VAR11 $ VAR12 $ VAR13 $ VAR14 $
>>>    VAR15 $ VAR16 $ VAR17 $ VAR18 $ VAR19 $ VAR20 VAR21
>>>    VAR22 $ VAR23 VAR24 VAR25 $ VAR26 $
>>>    ;
>>>datalines;
>>>"ABCDEFGHIJKLM","1234","01/01/04","ABC123456789","Q12345","315.00","0
>>>","10
>","01/23/04","Completed","A123",""123
>>>MAIN TREET, LOT 126
>>>TR"","n/a","DALLAS","TX","11111","COUNTY","Y","O","30000.00","26000.0
>>>0",""
>,"35000.00","16.67","No
>>>Reason","ABC"
>>>;
>>>RUN;

6. GUESSINGCOLUMNS: PROC IMPORT/IMPORT Wizard CSV/TAB/DLM

7. Problem with importing from a csv file

Hello again

I have a .csv file with two columns, named CORE and CLUSTER. Core is a character field, cluster is numeric.  The first few lines of CORE are 6 characters long e.g.

00614D
08185C
08394B
08545A

but later on, they are longer, e.g.,
BW27011a
BW28011a
BW29011a
BW31011a
EN20101A
EN20102A


(Yes, I know, that's a terrible way to write a variable, but it was due to it being input over a period of 25 years or so, from machines that required certain formats, which differed over time....)

I am importing it into SAS using

PROC IMPORT OUT= WORK.twoclust
            DATAFILE= "C:\Fivelead\Data\twoclust.csv"
            DBMS=CSV REPLACE;
     GETNAMES=YES;
     DATAROW=2;
RUN;


and this is creating two variable, appropriately CORE and CLUSTER, but CORE is formatted as $6.  This causes major problems, because the first 6 characters of CORE are not unique.

Two questions:
1) How can I fix this?
2) Isn't this a horrible bug?

Thanks in advance

Peter

PS I will also write to SAS tech support about this.


Peter L. Flom, PhD
Statistical Consultant
www DOT peterflom DOT com

8. Import CSV file, variable field numbers