Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
458 views
in Technique[技术] by (71.8m points)

python - pandas multiindex (hierarchical index) subtract columns and append result

I have this bit of code which is working beautifully...

from pandas_datareader import data as pdr
import pandas as pd

myStartDate = '2020-12-15'   # (Format: Year-Month-Day)
myEndDate = '2020-12-31'   # (Format: Year-Month-Day)
myTickers = 'TSLA'
myData = pdr.get_data_yahoo(myTickers, myStartDate, myEndDate)

pd.set_option('display.max_columns', None)
pd.set_option('display.width', 200)

print(myData.head())

The result is:

           High     Low    Open    Close   Volume    Adj Close
Date                                                                             
20-12-15  646.90  623.79  643.28  633.25  45223600  633.250000
20-12-16  632.50  605.00  628.22  622.77  42095800  622.770020
20-12-17  658.82  619.50  628.19  655.90  56270100  655.900024
20-12-18  695.00  628.53  668.90  695.00  22212620  695.000000
20-12-21  668.50  646.07  666.23  649.85  58045300  649.859985

Then, I have this:

myData['Some New Column Name'] = myData['High'] - myData['Low']
print(myData.head())

And it works beautifully and the result is this:

           High     Low    Open    Close   Volume   Adj Close   Some New Column Name
Date                                                                                                   
20-12-15  646.90  623.79  643.28  633.25   45223600  633.25            23.10
20-12-16  632.50  605.00  628.22  622.77   42095800  622.77            27.50
20-12-17  658.82  619.50  628.19  655.90   56270100  655.90            39.32
20-12-18  695.00  628.53  668.90  695.00   22212620  695.00            66.46
20-12-21  668.50  646.07  666.23  649.85   58045300  649.85            22.42

I want to modify the code to do multiple "myTickers" like this:

from pandas_datareader import data as pdr
import pandas as pd

myStartDate = '2020-12-15'   # (Format: Year-Month-Day)
myEndDate = '2020-12-31'   # (Format: Year-Month-Day)
myTickers = ['TSLA', 'SQ']
myData = pdr.get_data_yahoo(myTickers, myStartDate, myEndDate)

pd.set_option('display.max_columns', None)
pd.set_option('display.width', 200)

print(myData.head())

This works beautifully and the result is:

Attributes  Adj Close         Close            High            Low             Open           Volume    
Symbols    TSLA     SQ     TSLA     SQ     TSLA     SQ     TSLA     SQ     TSLA     SQ      TSLA    SQ
Date                                                
12/15/20  633.25  219.99  633.25  219.99  646.90  221.72  623.80  216.73  643.28  218.41  45223600  5713600
12/16/20  622.77  227.08  622.77  227.08  632.50  227.96  605.00  220.03  628.23  223.86  42095800  8216500
12/17/20  655.90  230.74  655.90  230.74  658.82  237.09  619.50  227.70  628.19  230.00  56270100  10440100
12/18/20  695.00  235.45  695.00  235.45  695.00  236.37  628.54  231.10  668.90  235.00  22212620  8099500
12/21/20  649.86  233.50  649.86  233.50  668.50  241.85  646.07  232.26  666.24  236.09  58045300  1115540

Similar to the first example, I want to do: (Tesla's high) - (Tesla's low) = (Some New Column Name) and (SQ's high) - (SQ's low) = (Some New Column Name) but I am not very experienced with pandas on how to do this. My understanding is that this is a multiindex or (hierarchical index) pandas dataframe. Reading some documentation, I thought something like this might work:

myData['Some New Column Name']['TSLA', 'SQ'] = myData['High'] - myData['Low']

or

myData[['Some New Column Name'],['TSLA', 'SQ']] = myData['High'] - myData['Low']

But this doesn't work... I feel like maybe it is close?

To be clear, I want something like this:

Attributes  Adj Close         Close       ...       Open            Volume      Some New Col
Symbols    TSLA     SQ     TSLA     SQ    ...   TSLA     SQ      TSLA     SQ     TSLA    SQ
Date                                      ...           
12/15/20  633.25  219.99  633.25  219.99  ...  643.28  218.41  45223600 5713600  23.10  4.99
12/16/20  622.77  227.08  622.77  227.08  ...  628.23  223.86  42095800 8216500  27.50  7.93
12/17/20  655.90  230.74  655.90  230.74  ...  628.19  230.00  56270100 1044010  39.32  9.39
12/18/20  695.00  235.45  695.00  235.45  ...  668.90  235.00  22212620 8099500  66.46  5.27
12/21/20  649.86  233.50  649.86  233.50  ...  666.24  236.09  58045300 1115540  22.43  9.59

I just tried:

myNewData = myData['High'] - myData['Low']

and the result is this:

Symbols          TSLA        SQ
Date                           
2020-12-15  23.100037  4.990005
2020-12-16  27.500000  7.930008
2020-12-17  39.320007  9.389999
2020-12-18  66.460022  5.269989
2020-12-21  22.429993  9.590012

I can probably work with this... But... It would be nice to append this to myData in some clean way instead


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I think you were pretty close...

myData[[('Some new column', c) for c in myData['High'].columns]] = myData['High'] - myData['Low']

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...