Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
2.2k views
in Technique[技术] by (71.8m points)

pandas - How to build a Python function with a rolling total?

I am looking to build a function that creates a rolling total by code by day for the below DataFrames, where the In for each code on a date is subtracted from the Out for each code on a date, this subtotal is the subtracted from the previous days total but the total must be >=0 ( I have included an example of this it the Desired Output below).

Below are an example of my inputs and the function I am using along with an example of my desired output.

df1 - In

s = """        Date    Code    Quantity
0   10/01/2019  A   20
3   11/01/2019  A   2
7   12/01/2019  A   4
11  13/01/2019  A   10
"""

df2 - Out

s ='''    Date     Code   Quantity
0   11/01/2019  A   5
3   12/01/2019  A   100
4   15/01/2019  A   1
6   16/01/2019  A   2
'''

Code

df3 = df1.merge(df2, how='outer', left_on=['date', 'code'], right_on=['date', 'code']).fillna(0)
df3['qty1'] = df3['qty_x'] - df3['qty_y']
df3['qty'] = 0
def final_adder(x):
    x.qty_x = x.qty_x
    print(x)
    return x
df_final = df3.groupby(['code']).apply(final_adder)
df_final['qty'] = df_final['qty'].clip(lower=0)
df_final.drop(['qty_x', 'qty_y','qty1'], inplace=True, axis=1)


          date code  qty_x  qty_y qty1  qty
0   10/01/2019   A   20.0    0.0  20.0    0
3   11/01/2019   A    2.0    5.0  -3.0    0
7   12/01/2019   A    4.0  100.0 -96.0    0
11  13/01/2019   A   10.0    0.0  10.0    0

Desired Output

s = """        Date    Code    Quantity
0   10/01/2019  A   20
3   11/01/2019  A   17
7   12/01/2019  A   0
11  13/01/2019  A   10
12  14/01/2019  A   10
15  15/01/2019  A   9
16  16/01/2019  A   7
"""
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

There is a whole sub-genre of questions on SO involving cumulative operations with limits (e.g.: "reset to zero when the cumulative sum becomes negative"). This is not the same as cumulative operation with known reset points (e.g. from another column, or where there are NaNs, etc.) because the condition involves the cumulative value itself.

There is no clean way in the current pandas or numpy to do that in a vectorized way.

The best (fastest) way I am aware of for this is this SO answer involving numba. Slightly modified and adapted to your problem:

from numba import njit
@njit
def poscumsum(x):
    total = 0
    result = np.empty(x.shape)
    for i, y in enumerate(x):
        total += y
        if total < 0:
            total = 0
        result[i] = total
    return result

With this, you can do:

a = df1.set_index(['Code', 'Date'])
b = df2.set_index(['Code', 'Date'])
idx = a.index.union(b.index).sort_values()
df3 = (a.reindex(idx, fill_value=0) - b.reindex(idx, fill_value=0))
# optional: resample Date to daily within each group:
df3 = df3.groupby('Code').resample('D', level='Date').sum()
df3['Quantity'] = df3.groupby('Code')['Quantity'].transform(
    lambda g: poscumsum(g.values))

On the data provided in the question:

>>> df3
                 Quantity
Code Date                
A    2019-01-10        20
     2019-01-11        17
     2019-01-12         0
     2019-01-13        10
     2019-01-14        10
     2019-01-15         9
     2019-01-16         7

If you prefer, you can also use merge. Here is an example where all the intermediary results are kept (for forensic analysis):

df3 = df1.merge(df2, on=['Code', 'Date'], how='outer', sort=True).fillna(0)
# optional: resample Date to daily within each group:
df3 = df3.set_index(['Code', 'Date']).groupby('Code').resample('D', level='Date').sum()
df3['diff'] = df3['Quantity_x'] - df3['Quantity_y']
df3['cumdiff'] = df3.groupby('Code')['diff'].transform(
   lambda g: poscumsum(g.values))

df3
# out:
                 Quantity_x  Quantity_y  diff  cumdiff
Code Date                                             
A    2019-01-10        20.0         0.0  20.0     20.0
     2019-01-11         2.0         5.0  -3.0     17.0
     2019-01-12         4.0       100.0 -96.0      0.0
     2019-01-13        10.0         0.0  10.0     10.0
     2019-01-14         0.0         0.0   0.0     10.0
     2019-01-15         0.0         1.0  -1.0      9.0
     2019-01-16         0.0         2.0  -2.0      7.0

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...