Stata 12’s new import excel command will help you simply import real-world Excel recordsdata into Stata. Excel recordsdata usually include header and footer info within the first few and previous couple of rows of a sheet, and it’s possible you’ll not need that info loaded. Additionally, the column labels used within the sheet are invalid Stata variable names and due to this fact can’t be loaded. Each of those points may be simply solved utilizing import excel.
Let’s begin by taking a look at an Excel spreadsheet, metro_gdp.xls, that’s downloaded from the Bureau of Financial Evaluation web site.
As you may see, the primary 5 rows of the Excel file include an outline of the info, and rows 374 by means of 381 include footer notes. We don’t wish to load these rows into Stata. import excel has a cellrange() possibility that may assist us keep away from undesirable info being loaded.
With cellrange(), you specify the higher left cell and the decrease proper cell (utilizing normal Excel notation) of the world of knowledge you need loaded. Within the file metro_gdp.xls, we wish all the info from column A row 6 (higher left cell) to column L row 373 (decrease proper cell) loaded into Stata. To do that, we sort
. import excel metro_gdp.xls, cellrange(A6:L373) clear
In Stata, we open the Information Editor to examine the loaded information.
The primary row of the info we loaded contained column labels. Due to these labels, import excel loaded all the info as strings. import excel once more has a simple repair. We have to specify the firstrow possibility to inform import excel that the primary row of knowledge comprises the variable names.
. import excel metro_gdp.xls, cellrange(A6:L373) firstrow clear
We once more open the Information Editor to examine the info.
The info at the moment are within the appropriate format, however we’re lacking the yr column labels. Stata doesn’t settle for numeric variable names, so import excel has to make use of the Excel column title (C, D, …) for the variable names as an alternative of 2001, 2002, …. The straightforward resolution is to rename the column headers in Excel to one thing like y2001, y2002, and many others., earlier than loading. You too can use Stata to rename the column headers. import excel saves the values within the first row of knowledge as variable labels in order that the knowledge will not be misplaced. If we describe the info, we are going to see all of the column labels from the Excel file saved as variable labels.
. describe
Incorporates information
obs: 367
vars: 12
measurement: 37,067
-------------------------------------------------------------------------------
storage show worth
variable title sort format label variable label
-------------------------------------------------------------------------------
Fips str5 %9s Fips
Space str56 %56s Space
C lengthy %10.0g 2001
D lengthy %10.0g 2002
E lengthy %10.0g 2003
F lengthy %10.0g 2004
G lengthy %10.0g 2005
H lengthy %10.0g 2006
I lengthy %10.0g 2007
J lengthy %10.0g 2008
Ok lengthy %10.0g 2009
L lengthy %10.0g 2010
-------------------------------------------------------------------------------
Sorted by:
Word: dataset has modified since final saved
We wish to seize the variable label for every variable through the use of the prolonged macro operate :variable label varname, create a sound lowercase variable title from that label through the use of the strtoname() and decrease() features, and rename the variable to the brand new title through the use of rename. We are able to do that with a foreach loop.
foreach var of varlist _all {
native label : variable label `var'
native new_name = decrease(strtoname("`label'"))
rename `var' `new_name'
}
Now once we describe our information, they appear to be this:
. describe
Incorporates information
obs: 367
vars: 12
measurement: 37,067
-------------------------------------------------------------------------------
storage show worth
variable title sort format label variable label
-------------------------------------------------------------------------------
fips str5 %9s Fips
space str56 %56s Space
_2001 lengthy %10.0g 2001
_2002 lengthy %10.0g 2002
_2003 lengthy %10.0g 2003
_2004 lengthy %10.0g 2004
_2005 lengthy %10.0g 2005
_2006 lengthy %10.0g 2006
_2007 lengthy %10.0g 2007
_2008 lengthy %10.0g 2008
_2009 lengthy %10.0g 2009
_2010 lengthy %10.0g 2010
-------------------------------------------------------------------------------
Sorted by:
Word: dataset has modified since final saved
One very last thing we’d wish to do is to rename the yr variables from _20## to y20##, which we will simply accomplish with rename:
. rename (_*) (y*)
. describe
Incorporates information
obs: 367
vars: 12
measurement: 37,067
-------------------------------------------------------------------------------
storage show worth
variable title sort format label variable label
-------------------------------------------------------------------------------
fips str5 %9s Fips
space str56 %56s Space
y2001 lengthy %10.0g 2001
y2002 lengthy %10.0g 2002
y2003 lengthy %10.0g 2003
y2004 lengthy %10.0g 2004
y2005 lengthy %10.0g 2005
y2006 lengthy %10.0g 2006
y2007 lengthy %10.0g 2007
y2008 lengthy %10.0g 2008
y2009 lengthy %10.0g 2009
y2010 lengthy %10.0g 2010
-------------------------------------------------------------------------------
Sorted by:
Word: dataset has modified since final saved


