Saturday, April 4, 2026

Utilizing import excel with actual world information


Stata 12’s new import excel command will help you simply import real-world Excel recordsdata into Stata. Excel recordsdata usually include header and footer info within the first few and previous couple of rows of a sheet, and it’s possible you’ll not need that info loaded. Additionally, the column labels used within the sheet are invalid Stata variable names and due to this fact can’t be loaded. Each of those points may be simply solved utilizing import excel.

Let’s begin by taking a look at an Excel spreadsheet, metro_gdp.xls, that’s downloaded from the Bureau of Financial Evaluation web site.

 

As you may see, the primary 5 rows of the Excel file include an outline of the info, and rows 374 by means of 381 include footer notes. We don’t wish to load these rows into Stata. import excel has a cellrange() possibility that may assist us keep away from undesirable info being loaded.

With cellrange(), you specify the higher left cell and the decrease proper cell (utilizing normal Excel notation) of the world of knowledge you need loaded. Within the file metro_gdp.xls, we wish all the info from column A row 6 (higher left cell) to column L row 373 (decrease proper cell) loaded into Stata. To do that, we sort


. import excel metro_gdp.xls, cellrange(A6:L373) clear

In Stata, we open the Information Editor to examine the loaded information.

Stata Data Editor

 

The primary row of the info we loaded contained column labels. Due to these labels, import excel loaded all the info as strings. import excel once more has a simple repair. We have to specify the firstrow possibility to inform import excel that the primary row of knowledge comprises the variable names.


. import excel metro_gdp.xls, cellrange(A6:L373) firstrow clear

We once more open the Information Editor to examine the info.

Stata Data Editor

 

The info at the moment are within the appropriate format, however we’re lacking the yr column labels. Stata doesn’t settle for numeric variable names, so import excel has to make use of the Excel column title (C, D, …) for the variable names as an alternative of 2001, 2002, …. The straightforward resolution is to rename the column headers in Excel to one thing like y2001, y2002, and many others., earlier than loading. You too can use Stata to rename the column headers. import excel saves the values within the first row of knowledge as variable labels in order that the knowledge will not be misplaced. If we describe the info, we are going to see all of the column labels from the Excel file saved as variable labels.


. describe

Incorporates information
  obs:           367
 vars:            12
 measurement:        37,067
-------------------------------------------------------------------------------
              storage  show     worth
variable title   sort   format      label      variable label
-------------------------------------------------------------------------------
Fips            str5   %9s                    Fips
Space            str56  %56s                   Space
C               lengthy   %10.0g                 2001
D               lengthy   %10.0g                 2002
E               lengthy   %10.0g                 2003
F               lengthy   %10.0g                 2004
G               lengthy   %10.0g                 2005
H               lengthy   %10.0g                 2006
I               lengthy   %10.0g                 2007
J               lengthy   %10.0g                 2008
Ok               lengthy   %10.0g                 2009
L               lengthy   %10.0g                 2010
-------------------------------------------------------------------------------
Sorted by:
     Word:  dataset has modified since final saved

We wish to seize the variable label for every variable through the use of the prolonged macro operate :variable label varname, create a sound lowercase variable title from that label through the use of the strtoname() and decrease() features, and rename the variable to the brand new title through the use of rename. We are able to do that with a foreach loop.


foreach var of varlist _all {
        native label : variable label `var'
        native new_name = decrease(strtoname("`label'"))
        rename `var' `new_name'
}

Now once we describe our information, they appear to be this:


. describe

Incorporates information
  obs:           367
 vars:            12
 measurement:        37,067                          
-------------------------------------------------------------------------------
              storage  show     worth      
variable title   sort   format      label      variable label
-------------------------------------------------------------------------------
fips            str5   %9s                    Fips
space            str56  %56s                   Space
_2001           lengthy   %10.0g                 2001
_2002           lengthy   %10.0g                 2002
_2003           lengthy   %10.0g                 2003
_2004           lengthy   %10.0g                 2004
_2005           lengthy   %10.0g                 2005
_2006           lengthy   %10.0g                 2006
_2007           lengthy   %10.0g                 2007
_2008           lengthy   %10.0g                 2008
_2009           lengthy   %10.0g                 2009
_2010           lengthy   %10.0g                 2010
-------------------------------------------------------------------------------
Sorted by:  
     Word:  dataset has modified since final saved

One very last thing we’d wish to do is to rename the yr variables from _20## to y20##, which we will simply accomplish with rename:


. rename (_*) (y*)

. describe

Incorporates information
  obs:           367
 vars:            12
 measurement:        37,067                          
-------------------------------------------------------------------------------
              storage  show     worth      
variable title   sort   format      label      variable label
-------------------------------------------------------------------------------
fips            str5   %9s                    Fips
space            str56  %56s                   Space
y2001           lengthy   %10.0g                 2001
y2002           lengthy   %10.0g                 2002
y2003           lengthy   %10.0g                 2003
y2004           lengthy   %10.0g                 2004
y2005           lengthy   %10.0g                 2005
y2006           lengthy   %10.0g                 2006
y2007           lengthy   %10.0g                 2007
y2008           lengthy   %10.0g                 2008
y2009           lengthy   %10.0g                 2009
y2010           lengthy   %10.0g                 2010
-------------------------------------------------------------------------------
Sorted by:  
     Word:  dataset has modified since final saved



Related Articles

Latest Articles