公告

[公告]
2014/01/17
由於已經是faculty的關係,不太有足夠時間寫部落格。因此更新的速度會相當緩慢。再加上近幾年來SAS GLOBAL FORUM沒有出現讓我覺得驚艷的技術文件,所以能分享的文章相對也減少許多。若有人推薦值得分享的SAS技術文件,請利用『問題討論區』告知。

2013/07/19
臉書留言板的功能因為有不明原因故障,因此特此移除。而intensedebate的留言板因管理不易,也一併移除。目前已經開啟內建的 G+ 留言系統,所以請有需要留言的朋友,可直接至『問題討論區』裡面留言。


2007年10月3日 星期三

A Macro for Importing Multiple Excel Worksheets into SAS® Data Sets

[註]本文在 2007/12/30 做過修正,主要是調整程式碼混亂問題。

原文載點:http://www2.sas.com/proceedings/sugi30/040-30.pdf

由於 SAS 並沒有提供比較方便的介面讓使用者輸入資料,因此很多人會利用別的軟體將資料輸入後再轉成 SAS 資料檔。其中最常被用來當作資料輸入環境介面的軟體當屬 Microsoft Excel。當資料完整地輸入到 Excel 並存成 .xls 檔時,可用 infile statement 或工具列上的 import 選項來一步一步地將資料轉成 SAS 資料檔。可是如果一個 Excel 資料檔裡面含有大量頁面,則上述的導入動作將會變成相當繁瑣。Helen San 和 Cindy Wong 在 SUGI 30 發表了一篇技術文件來解決這個問題。

San 和 Wong 提供了一個 macro 程式來完成大量轉換 EXCEL 資料檔頁面的動作,程式如下所示:

%xl2sas(Path=,
File=,
StartRow=,
StartCol=,
EndRow=,
EndCol=,
GetVarName=);


在眾多參數中,只有 Path 和 File 是一定要強制宣告的,其餘都是選擇性的,視情況來使用。每個參數的詳細說明如下:
  • Path:放置檔案的路徑
  • File:Excel 檔名
  • StartRow:從第幾行開始讀檔
  • StartCol:從第幾列開始讀檔
  • EndRow:到第幾行結束讀檔
  • EndCol:到第幾列結束讀檔
  • GetVarName:變數放在哪一行(預設是1,因為通常變數名稱是放在第一行)
如果 EndRow 和 EndCol 沒有特別宣告,程式會自動讀到最後一行(列)有資料為止。如果你只是要讀到某個特地的行或列,就必須要在這兩個參數上面進行宣告。

範例:
%xl2sas(path=S:\Probe3\SAS\EXCEL\,file=log.xls);


上述的範例只是最簡單的告訴 %xl2sas 要轉換的路徑和資料在哪裡,其餘的參數都由程式自行判斷,因此相當適合每個頁面含有不同大小的樣本和變數數量的情況。

原始的 macro 程式內碼如下:

%macro xl2sas(Path=,
File=,
StartRow=,
StartCol=,
EndRow=,
EndCol=,
GetVarName=);
%*launch Excel;
options noxsync noxwait noxmin;
%local excelstarted;
filename excelchk dde 'excel|system' command;
%let excelstarted = %sysfunc(fopen(excelchk,S));
data _null_;
length fid rc start stop time 8;
fid=&excelstarted;
if fid le 0 then do;
rc=system('start excel');
start=datetime();
stop=start+10;
do while (fid le 0);
fid=fopen('excelchk','s');
time=datetime();
if (time ge stop) then fid=1;
end;
end;
run;
%let rc = %sysfunc(fclose(&excelstarted));
filename excelchk clear;

%*open the excel file and insert macro sheet;
filename xl2sas dde 'excel|system' lrecl=200;
data _null_;
length string $200;
file xl2sas;
string='[open("'||"&path.&file."||'",0,false)]';
put string;
put '[workbook.next()]';
put '[workbook.insert(3)]';
string='[workbook.move("Macro1","'||"&file"||'",1)]';
put string;
run;
filename xlmacro dde "excel|macro1!r1c1:r1000c1" notab lrecl=200;

%*initialize local macro variables;
%local nsheets sh wn i sheet xlrows xlcols EndNewRow EndNewCol
MaxCol VarWds ind rowi row1 row2 StartRowNew VarNames VarNum;

%*get number of sheets and put in &nsheets;
%let nsheets=;
data _null_;
file xlmacro;
put '=select("r1c2:r1c2")';
put '=set.name("nsheet",selection())';
put '=set.value(nsheet,get.workbook(4))';
put '=halt(true)';
put '!dde_flush';
file xl2sas;
put '[run("macro1!r1c1")]';
put '[error(false)]';
run;
filename nsheets dde "excel|macro1!r1c2:r1c2" notab lrecl=200;
data _null_;
length nsheets 8;
infile nsheets;
input nsheets;
call symput('nsheets',trim(left(put(nsheets,2.))));
run;
%let nsheets=%eval(&nsheets-1);
filename nsheets clear;

%*clear macro sheet;
data _null_;
file xl2sas;
put '[workbook.activate("macro1")]';
put '[select("r1c1:r1000c2")]';
put '[clear(1)]';
put '[select("r1c1")]';
run;

%*get names of sheets and put in &SheetNames;
data _null_;
length MacCmd $200;
file xlmacro;
%do sh=1 %to &nsheets; /*choose one sheet at a time*/
MacCmd="=select(!$b$&sh,!$b$&sh)";
put MacCmd;
put '=set.name("cell",selection())';
%do wn=1 %to &sh;
put '=workbook.next()'; /*move to current sheet*/
%end;
put '=set.value(cell,get.workbook(3))';/*get name of current sheet */
put '=workbook.activate("Macro1")';
%end;
put '=halt(true)';
put '!dde_flush';
file xl2sas;
put '[run("macro1!r1c1")]';
put '[error(false)]';
run;

filename sheets dde "excel|macro1!r1c2:r&nsheets.c2" lrecl=200;
data _sheet_names;
length junk SheetName $256;
infile sheets delimiter=']';
input junk SheetName;
drop junk;
run;

filename sheets clear;
data _null_;
file xl2sas;
put '[workbook.activate("macro1")]';
put '[select("r1c1:r1000c2")]';
put '[clear(1)]';
put '[select("r1c1")]';
run;

proc sql noprint;
select SheetName into: SheetNames separated by '@'
from _sheet_names;
quit;
proc datasets nolist;
delete _sheet_names;
run;

%do i=1 %to &nsheets; /*for every sheet*/
%let sheet=%scan(&SheetNames,&i,@);
%*get # rows and # cols of sheet;
data _null_;
length ddestring $ 200;
file xlmacro;
put '=select("r1c2:r1c2")';
put '=set.name("rows",selection())';
put '=select("r2c2:r2c2")';
put '=set.name("cols",selection())';
ddestring='=workbook.activate("'||"&sheet"||'")';
put ddestring;
put '=set.value(rows,get.document(10))'; /*last row number of sheet*/
put '=set.value(cols,get.document(12))'; /*last column number of sheet*/
put '=workbook.activate("macro1")';
put '=halt(true)';
put '!dde_flush';
file xl2sas;
put '[run("macro1!r1c1")]';
put '[error(false)]';
run;

filename xlrows dde "excel|macro1!r1c2:r1c2" lrecl=200;
filename xlcols dde "excel|macro1!r2c2:r2c2" lrecl=200;
%let xlrows=0;
data _null_;
length xlrows 8;
infile xlrows;
input xlrows;
call symput('xlrows',trim(left(put(xlrows,5.))));
run;
%if &xlrows=0 %then %goto exit; /*if this sheet is empty, no more processing*/
%let xlcols=1;
data _null_;
length xlcols 8;
infile xlcols;
input xlcols;
call symput('xlcols',trim(left(put(xlcols,5.))));
run;
filename xlrows clear;
filename xlcols clear;

%if &EndRow= %then %let EndNewRow=&xlrows; %else %let EndNewRow=&EndRow;
%if &EndCol= %then %let EndNewCol=&xlcols; %else %let EndNewCol=&EndCol;
%put EndNewRow=&EndNewRow EndNewCol=&EndNewCol;
%*clear macro sheet;
data _null_;
file xl2sas;
put '[workbook.activate("macro1")]';
put '[select("r1c1:r1000c2")]';
put '[clear(1)]';
put '[select("r1c1")]';
run;

%*get width of each column and put in &xlcols+1 column in &sheet;
%*since dates in Excel can take on length from 5 to 10, extend length to 11
for dd-mmm-yyyy format;
%let MaxCol=%eval(&xlcols+1);
%let VarWds=; /*width of columns*/

%do ind=&StartCol %to &EndNewCol;
%let rowi=%eval(&StartRow+&ind);
data _null_;
file xlmacro;
length ddestring $200 c1 c2 $1;
/*switch to Excel Application as front end*/
put '=app.maximize()';
ddestring ='=app.activate("Microsoft Excel - '||"&file"||'")';
put ddestring;
ddestring ='=workbook.activate("'||"&sheet"||'")';
put ddestring;
chars='ABCDEFGHIJKLMNOPQRSTUVWXYZ';
if floor(&ind/27) then c1=substr(chars,floor(&ind/27),1);
else c1=' ';
if mod(&ind,26) then c2=substr(chars,mod(&ind,26),1);
else c2='Z';
ddestring ='=select("r'||trim(left("&rowi"))||'c'||"&MaxCol"||'")';
put ddestring;
ddestring='=send.keys("=max{(}len{(}'||trim(left(c1))||trim(left(c2))
||"&StartRow.:" ||trim(left(c1))||trim(left(c2))||"&xlrows"||'{)}{)}")';
put ddestring;
put '=send.keys("^+{return}")';
put '=halt(true)';
put '!dde_flush';
file xl2sas;
put '[run("macro1!r1c1")]';
run;
%end;

%*clear macro1 sheet;
data _null_;
file xl2sas;
put '[workbook.activate("macro1")]';
put '[select("r1c1:r1000c2")]';
put '[clear(1)]';
put '[select("r1c1")]';
run;
%let row1=%eval(&StartRow+&StartCol);
%let row2=%eval(&StartRow+&EndNewCol);
filename xlmax dde "excel|&sheet!r&row1.c&MaxCol:r&row2.c&MaxCol" lrecl=200;
data _xlmax_;
length xlmax 8 ;
infile xlmax;
input xlmax;
if xlmax=0 then xlmax=1;
else if xlmax<=10 then xlmax=11; run; filename xlmax clear; %*put widths of columns to &VarWds; proc sql noprint; select xlmax into: VarWds separated by ' ' from _xlmax_; quit; proc datasets nolist; delete _xlmax_; run; %if &VarWds= %then %goto exit; %*put all variable names to &VarNames; %if &GetVarName %then %do; %* get names and numbers from start row; data _null_; length ddestring $100; ddestring="'excel|&sheet!r&StartRow.c&StartCol.:r&StartRow.c&EndNewCol'"; call symput('ddestring',ddestring); run; filename GetName dde &ddestring notab lrecl=2000; data _VarName_; length TempVar $200; infile GetName dlm='09'x ; input TempVar @@; %*remove special characters, only keep A-Z, a-z and 0-9; len=length(trim(left(TempVar))); length var $32;/*truncate to 32 if longer*/ do i=1 to len; if not verify(substr(trim(left(TempVar)),i,1), ' _0123456789abcdedfghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ') then var=trim(var)||substr(trim(left(TempVar)),i,1); end; %*if the compressed variable name is empty, then name it as Col#; if var=' ' then var='Col'||compress(put(_n_,best3.)); %*if the compressed variable name begins with a number, then add underscore before it; if indexc('0123456789',substr(trim(left(var)),1,1)) then var='_'||trim(left(var)); drop len i TempVar; run; %*check for duplicate names, if duplicate name, rename as Col#; proc transpose data=_VarName_ out=_VarNamet; var var; run; proc sql noprint; select count(*) into: VarNum from _VarName_; quit; data _VarNamet; set _VarNamet; %do ind=1 %to &VarNum-1; %do indj=&ind+1 %to &VarNum; if col&ind=col&indj then col&indj="Col&indj"; %end; %end; run; proc transpose data=_VarNamet out=_VarName; var %do ind=1 %to &VarNum; Col&ind %end;; run; %*put all variable names to &VarNames; proc sql noprint; select var into: VarNames separated by ' ' from _VarName; quit; filename GetName clear; proc datasets nolist; delete _VarName_ _VarNamet _VarName; run; %let StartRowNew=%eval(&StartRow+1); %end; %else %do; /*simply name variables as Col1, Col2, etc*/ %let VarNames=; %let VarNum=%eval(&EndNewCol-&StartCol+1); %do ind=1 %to &VarNum; %let VarNames=&VarNames Col&ind; %end; %let StartRowNew=&StartRow; %end; %if &VarNum=0 %then %goto exit; %*if length of variable is 1 (the column is empty) and variable name is Col#, then put it to &DelCols to be deleted; %let DelCols=; %do ind=1 %to &VarNum; %if %scan(&VarWds,&ind)=1 and %index(%scan(&VarNames,&ind),Col)=1 %then %do; %let DelCols=&DelCols %scan(&VarNames,&ind); %end; %end; %*input data to SAS; data _null_; length ddestring $100; ddestring="'excel|&sheet!r&StartRowNew.c&StartCol.:r&EndNewRow.c&EndNewCol'"; call symput('ddestring',ddestring); run; filename indata dde &ddestring notab lrecl=2000; data sheet&i; length SheetName $30; SheetName="&sheet"; infile indata notab dsd dlm='09'x missover; length %do ind=1 %to &VarNum; %scan(&VarNames,&ind) $%scan(&VarWds,&ind) %end;; input &VarNames; %if &DelCols ne %then drop &DelCols;; %*delete observation if all variables are empty; if %do ind=1 %to &VarNum-1; %scan(&VarNames,&ind)=' ' and %end; %scan(&VarNames,&VarNum)=' ' then delete; run; filename indata clear; %exit:; %end; /*end of %do i=1 %to &nsheets "for every sheet"*/

文內有詳盡的 macro 內部架構介紹,有興趣的人可以參考原文。一般使用者是要懂得如何使用這個 macro 程式即可。不過兩人也提出這個程式的一些使用限制:
  1. Excel 頁面文字不能包含一些特殊字元,如冒號或引號等,所以請儘量使用單純的文字就好。
  2. 所有資料經過轉換後會變成文字格式,所以數值變數和日期變數需要再做一次轉換。
  3. 轉出來的 SAS 資料檔名一律是 sheet1, sheet2, sheet3, ... etc。這有個好處就是比較容易將這些轉好的資料再 merge 成一個資料檔。如果想要用別的名稱的話,就必須要自行重新命名。
CONTACT INFORMATION
Helen Sun
SAS Programmer
hsun@robarts.ca
Robarts Clinical Trials
Robarts Research Institute
P.O. Box 5015, 100 Perth Drive
London, Ontario, Canada
N6A 5K8
Tel: 519-663-3400
Fax: 519-663-3807

沒有留言:

張貼留言

要問問題的人請在文章下方的intensedebate欄位留言,請勿使用blogger預設的意見表單。今後用blogger意見表單留言的人我就不回應了。

CODE { display: block; /* fixes a strange ie margin bug */ font-family: Courier New; font-size: 8pt; overflow:auto; background: #f0f0f0 url(http://klcintw.images.googlepages.com/Code_BG.gif) left top repeat-y; border: 1px solid #ccc; padding: 10px 10px 10px 21px; max-height:200px; height:200px; // for IE6 line-height: 1.2em; }