Stata/Python integration half 9: Utilizing the Stata Perform Interface to repeat information from Python to Stata

November 18, 2025

134

In my earlier put up, we discovered the best way to use the Stata Perform Interface (SFI) module to repeat information from Stata to Python. On this put up, I’ll present you the best way to use the SFI module to repeat information from Python to Stata. We can be utilizing the yfinance module to obtain monetary information from the Yahoo! finance web site. You possibly can set up this module in your Python setting by typing pip set up yfinance. Our purpose is to make use of Python to obtain historic information for the Dow Jones Industrial Common (DJIA) and use Stata to create the next graph.

If you’re not aware of Python, it could be useful to learn the primary 4 posts in my Stata/Python Integration sequence earlier than you learn additional.

Utilizing the yfinance module to obtain Yahoo! finance information

Let’s start by importing the yfinance module utilizing the alias pd. Then, we’ll use the obtain() technique to obtain information for the DJIA from the Yahoo! finance web site. The obtain() technique accepts three arguments. The primary argument is the inventory image for the DJIA. The second argument is the beginning date for the inventory information, and the third argument is the top date.

python:
import yfinance as yf
dowjones = yf.obtain("^DJI", begin="2010-01-01", finish="2019-12-31")
dowjones
finish

The obtain() technique returns a Pandas information body with 2,515 rows and 6 columns listed by date.

. python:
----------------------------------------------- python (kind finish to exit) ------
>>> import yfinance as yf
>>> dowjones = yf.obtain("^DJI", begin="2010-01-01", finish="2019-12-31")
[*********************100%***********************]  1 of 1 accomplished
>>> dowjones
                    Open          Excessive  ...     Adj Shut     Quantity
Date                                    ...
2010-01-04  10430.690430  10604.969727  ...  10583.959961  179780000
2010-01-05  10584.559570  10584.559570  ...  10572.019531  188540000
2010-01-06  10564.719727  10594.990234  ...  10573.679688  186040000
2010-01-07  10571.110352  10612.370117  ...  10606.860352  217390000
2010-01-08  10606.400391  10619.400391  ...  10618.190430  172710000
...                  ...           ...  ...           ...        ...
2019-12-23  28491.779297  28582.490234  ...  28551.529297  223530000
2019-12-24  28572.570312  28576.800781  ...  28515.449219   86150000
2019-12-26  28539.460938  28624.099609  ...  28621.390625  155970000
2019-12-27  28675.339844  28701.660156  ...  28645.259766  182280000
2019-12-30  28654.759766  28664.689453  ...  28462.140625  181600000

[2515 rows x 6 columns]
>>> finish
--------------------------------------------------------------------------------

We are able to view the main points of the index by typing dowjones.index in Python. The output beneath reveals us that the index column is called ‘Date’, which is saved as the information kind datetime64[ns].

. python:
----------------------------------------------- python (kind finish to exit) ------
>>> dowjones.index
DatetimeIndex(['2010-01-04', '2010-01-05', '2010-01-06', '2010-01-07',
               '2010-01-08', '2010-01-11', '2010-01-12', '2010-01-13',
               '2010-01-14', '2010-01-15',
               ...
               '2019-12-16', '2019-12-17', '2019-12-18', '2019-12-19',
               '2019-12-20', '2019-12-23', '2019-12-24', '2019-12-26',
               '2019-12-27', '2019-12-30'],
              dtype="datetime64[ns]", identify="Date", size=2515, freq=None)
>>> finish
--------------------------------------------------------------------------------

We should convert the index to a string variable earlier than we copy it to a Stata variable. We are able to use the astype() technique to create a brand new column named dowdate that incorporates the index saved as a string.

. python:
----------------------------------------------- python (kind finish to exit) ------
>>> dowjones['dowdate'] = dowjones.index.astype(str)
>>> dowjones['dowdate']
Date
2010-01-04    2010-01-04
2010-01-05    2010-01-05
2010-01-06    2010-01-06
2010-01-07    2010-01-07
2010-01-08    2010-01-08
                 ...
2019-12-23    2019-12-23
2019-12-24    2019-12-24
2019-12-26    2019-12-26
2019-12-27    2019-12-27
2019-12-30    2019-12-30
Title: dowdate, Size: 2515, dtype: object
>>> finish
--------------------------------------------------------------------------------

Use the SFI module to repeat the Python information to Stata

Now, we’re prepared to make use of the SFI module to repeat the dowjones information body to Stata. We start by importing the Knowledge class from the SFI module. Then, we are able to use the setObsTotal() technique so as to add observations to the present Stata dataset. We should add sufficient observations to the Stata dataset to retailer all of the rows of the dowjones information body. We are able to rely the variety of rows utilizing the
len() technique.

. python:
----------------------------------------------- python (kind finish to exit) ------
... from sfi import Knowledge
>>> Knowledge.setObsTotal(len(dowjones))
>>> finish
--------------------------------------------------------------------------------

Subsequent, we’ll add three variables to the Stata dataset. The addVarStr() technique provides a 10-character string variable named “dowdate”. The addVarDouble() technique provides a double-precision variable named “dowclose”. And the addVarInt() technique provides an int variable named “dowvolume”.

. python:
----------------------------------------------- python (kind finish to exit) ------
>>> Knowledge.addVarStr("dowdate",10)
>>> Knowledge.addVarDouble("dowclose")
>>> Knowledge.addVarInt("dowvolume")
>>> finish
--------------------------------------------------------------------------------

Now, we are able to use the retailer() technique to repeat columns from the Pandas information body to variables within the Stata dataset. retailer() accepts 4 arguments. The primary argument is the identify of the Stata variable to which the Pandas column is to be copied. The second argument can be utilized to specify a subset of the information body to be copied. For instance, we may use the vary() operate to specify a spread of rows to repeat. I want to copy all of the observations so I’ve specified “None”. The third argument is the Python information that’s to be copied to Stata. The fourth argument is the identify of an indicator variable that specifies which rows within the Python information body are to be copied to the Stata variable. This argument is non-obligatory. When it’s not specified,
all of the rows within the Python information body are copied.

The primary assertion within the output beneath copies column dowdate from the Python information body dowjones to the Stata variable dowdate. The second assertion copies column Adj Shut from the Python information body dowjones to the Stata variable dowclose. The third assertion within the output beneath copies column Quantity from the Python information body dowjones to the Stata variable dowvolume. Be aware that I’ve specified “None” for the second and fourth arguments as a result of I want to copy all of the rows from the Pandas information body.

. python:
----------------------------------------------- python (kind finish to exit) ------
>>> Knowledge.retailer("dowdate", None, dowjones['dowdate'], None)
>>> Knowledge.retailer("dowclose", None, dowjones['Adj Close'], None)
>>> Knowledge.retailer("dowvolume", None, dowjones['Volume'], None)
>>> finish
--------------------------------------------------------------------------------

Clear the Stata dataset

Let’s checklist the primary 5 observations in our Stata dataset to test our work.

. checklist in 1/5, abbreviate(9)

     +-----------------------------------+
     |    dowdate   dowclose   dowvolume |
     |-----------------------------------|
  1. | 2010-01-04   10583.96     1.8e+08 |
  2. | 2010-01-05   10572.02     1.9e+08 |
  3. | 2010-01-06   10573.68     1.9e+08 |
  4. | 2010-01-07   10606.86     2.2e+08 |
  5. | 2010-01-08   10618.19     1.7e+08 |
     +-----------------------------------+

Our information look fairly good, however we have now a number of information administration duties to finish earlier than we are able to graph the information. First, recall that dowdate is saved as a string. Let’s use Stata’s date() operate to generate a brand new variable named date. The primary argument is the string variable that’s to be transformed to a date. The second argument is the order of the 12 months, month, and day within the first argument. I’ve specified “YMD” as a result of the information in dowdate are saved with the 12 months first, the month second, and the day third.

. generate date = date(dowdate,"YMD")

. checklist in 1/5, abbreviate(9)

     +-------------------------------------------+
     |    dowdate   dowclose   dowvolume    date |
     |-------------------------------------------|
  1. | 2010-01-04   10583.96   1.798e+08   18266 |
  2. | 2010-01-05   10572.02   1.885e+08   18267 |
  3. | 2010-01-06   10573.68   1.860e+08   18268 |
  4. | 2010-01-07   10606.86   2.174e+08   18269 |
  5. | 2010-01-08   10618.19   1.727e+08   18270 |
     +-------------------------------------------+

The information in date don’t seem like dates to you and me. The date() operate returns the variety of days since January 1, 1960. We are able to use format to show the information in a well-known date format.

. format %tdCCYY-NN-DD date

. checklist in 1/5, abbreviate(9)

     +------------------------------------------------+
     |    dowdate   dowclose   dowvolume         date |
     |------------------------------------------------|
  1. | 2010-01-04   10583.96   1.798e+08   2010-01-04 |
  2. | 2010-01-05   10572.02   1.885e+08   2010-01-05 |
  3. | 2010-01-06   10573.68   1.860e+08   2010-01-06 |
  4. | 2010-01-07   10606.86   2.174e+08   2010-01-07 |
  5. | 2010-01-08   10618.19   1.727e+08   2010-01-08 |
     +------------------------------------------------+

Subsequent, dowvolume is displayed in scientific notation. We are able to once more use format to show the numbers with commas within the hundreds place.

. format %16.0fc dowvolume

. checklist in 1/5, abbreviate(9)

     +--------------------------------------------------+
     |    dowdate   dowclose     dowvolume         date |
     |--------------------------------------------------|
  1. | 2010-01-04   10583.96   179,780,000   2010-01-04 |
  2. | 2010-01-05   10572.02   188,540,000   2010-01-05 |
  3. | 2010-01-06   10573.68   186,040,000   2010-01-06 |
  4. | 2010-01-07   10606.86   217,390,000   2010-01-07 |
  5. | 2010-01-08   10618.19   172,710,000   2010-01-08 |
     +--------------------------------------------------+

It appears like dowvolume is rounded to the closest 10,000. So let’s divide dowvolume by 1,000,000 and use format to show the information with two decimal locations. Let’s additionally label dowvolume in order that we don’t overlook that the information are saved as “tens of millions of shares”.

. exchange dowvolume = dowvolume/1000000
(2,515 actual modifications made)

. format %10.2fc dowvolume

. label variable dowvolume "DJIA Quantity (Tens of millions of Shares)"

. checklist in 1/5, abbreviate(9)

     +------------------------------------------------+
     |    dowdate   dowclose   dowvolume         date |
     |------------------------------------------------|
  1. | 2010-01-04   10583.96      179.78   2010-01-04 |
  2. | 2010-01-05   10572.02      188.54   2010-01-05 |
  3. | 2010-01-06   10573.68      186.04   2010-01-06 |
  4. | 2010-01-07   10606.86      217.39   2010-01-07 |
  5. | 2010-01-08   10618.19      172.71   2010-01-08 |
     +------------------------------------------------+

Our information are formatted and able to graph.

Graph the information

I’d wish to create a graph that features a line graph for dowclose and a bar chart for dowvolume. That is straightforward with graph twoway. We are able to use graph twoway line to create a line plot for dowclose and graph twoway bar for dowvolume. Chances are you’ll be aware of most of the graphics choices within the code block beneath. I’ve used title() so as to add a title and used xtitle(“”) and ytitle(“”) to take away the titles from the x and y axes, respectively. I’ve used xlabel and ylabel() to label the axes. And I’ve used legend() to format the legend.

The choices yaxis(2) and axis(2) could also be unfamiliar to you. The yaxis(2) possibility creates a second y axis on the appropriate facet of the graph for the bar chart. The axis(2) possibility within the ytitle() and ylabel() specify that these choices apply to the second y axis. Be aware that I’ve plotted dowclose with a inexperienced line and labeled the left y axis with a inexperienced font to emphasise that the left axis describes the road graph. Equally, I’ve I’ve plotted dowvolume with blue bars and labeled the appropriate y axis with a blue font to emphasise that the appropriate axis describes the bar graph.

twoway (line dowclose date, lcolor(inexperienced) lwidth(medium))         ///
       (bar dowvolume date, fcolor(blue) lcolor(blue) yaxis(2)),  ///
       title("Dow Jones Industrial Common (2010 - 2019)")        ///
       xtitle("") ytitle("") ytitle("", axis(2))                  ///
       xlabel(, labsize(small) angle(horizontal))                 ///
       ylabel(5000(5000)30000,                                    ///
              labsize(small) labcolor(inexperienced)                      ///
              angle(horizontal) format(%9.0fc))                   ///
       ylabel(0(500)3000,                                         ///
              labsize(small) labcolor(blue)                       ///
              angle(horizontal) axis(2))                          ///
       legend(order(1 "Closing Worth" 2 "Quantity (tens of millions)")      ///
              cols(1) place(10) ring(0))

The code block above creates the next graph.

Conclusion

We did it! We used the Yahoo! finance module in Python to obtain information for the Dow Jones Industrial Common and import it to a Pandas information body. We used the Stata Perform Interface (SFI) module to create new variables in Stata and duplicate particular columns from the Pandas information body to Stata variables. And we used graph twoway to create a graph with Stata. I’ve collected the examples beneath.

Subsequent time, I’ll present you the best way to use the Stata Perform Interface (SFI) module to repeat scalars, macros, and matrices backwards and forwards between Stata and Python.

instance.do

python:
import yfinance as yf
from sfi import Knowledge, Macro

# Use the yfinance module to obtain the DJIA information
dowjones = yf.obtain("^DJI", begin="2010-01-01", finish="2019-12-31")

# Show the column names
dowjones.columns      
# Show the identify of the index
dowjones.index.identify    
# Show the index information                          
dowjones.index   
# Copy the index to a string column                                
dowjones['dowdate'] = dowjones.index.astype(str) 
dowjones[['dowdate','Adj Close', 'Volume']]

# Set the variety of observations within the Stata dataset
Knowledge.setObsTotal(len(dowjones))  # Add observations to the present dataset

# Create the variables within the Stata dataset
Knowledge.addVarStr("dowdate",10)
Knowledge.addVarDouble("dowclose")
Knowledge.addVarInt("dowvolume")

# Write the information from the Pandas information body to the Stata variables
Knowledge.retailer("dowdate", None, dowjones['dowdate'], None)
# This works too
#Knowledge.retailer("dowdate", None, dowjones.index.astype(str), None) 
Knowledge.retailer("dowclose", None, dowjones['Adj Close'], None)
Knowledge.retailer("dowvolume", None, dowjones['Volume'], None)
finish

// Use the date() operate to create the variable date
generate date = date(dowdate,"YMD")
format %tdCCYY-NN-DD date
label var date "Date"

// Format the variable dowclose
format %10.2f dowclose
label var dowclose "DJIA Closing Worth"

// Format the variable dowvolume
exchange dowvolume = dowvolume/1000000
format %10.2fc dowvolume
label var dowvolume "DJIA Quantity (Tens of millions of Shares)"

checklist in 1/5, abbrev(9)

// Create the graph
twoway (line dowclose date, lcolor(inexperienced) lwidth(medium))         ///
       (bar dowvolume date, fcolor(blue) lcolor(blue) yaxis(2)),  ///
       title("Dow Jones Industrial Common (2010 - 2019)")        ///
       xtitle("") ytitle("") ytitle("", axis(2))                  ///
       xlabel(, labsize(small) angle(horizontal))                 ///
       ylabel(5000(5000)30000,                                    ///
              labsize(small) labcolor(inexperienced)                      ///
              angle(horizontal) format(%9.0fc))                   ///
       ylabel(0(500)3000,                                         ///
              labsize(small) labcolor(blue)                       ///
              angle(horizontal) axis(2))                          ///
       legend(order(1 "Closing Worth" 2 "Quantity (tens of millions)")      ///
              cols(1) place(10) ring(0))

Stata/Python integration half 9: Utilizing the Stata Perform Interface to repeat information from Python to Stata

Related Articles

But One other Solution to Middle an (Absolute) Aspect

Switching Inference Suppliers With out Downtime

Nothing confirms Headphone (a) launch with daring yellow design and lower cost

Latest Articles

But One other Solution to Middle an (Absolute) Aspect

Switching Inference Suppliers With out Downtime

Nothing confirms Headphone (a) launch with daring yellow design and lower cost

These Offers Can Have You Zipping Round on a New E-Scooter This Spring

The actual fact that this text calls to thoughts a Roald Dahl quick story might be a crimson flag