Anyone who works with time-series data runs into moving averages sooner or later. Computing a mean over a sliding window sounds simple, but the details can get complicated quickly.
That is why we prepared this guide on how to compute a moving average with NumPy in Python. We hope you find it useful.
How To Perform Numpy Moving Average In Python
With numpy.convolve()
The first solution is the simplest and most approachable one. People familiar with signal processing will know what convolve() does. For those who don't, it performs a linear convolution of two one-dimensional arrays and returns the result.
More specifically, convolving the data with an array of w ones slides a window of length w across the data and returns the sum of the values under it at each position. Dividing those sums by w gives the mean of each window, and the 'valid' mode keeps only the positions where the window fits entirely inside the data, so the result has len(x) - w + 1 elements.
import numpy as np
def moving_average(x, w):
    return np.convolve(x, np.ones(w), 'valid') / w
data = np.array([10,5,8,9,15,22,26,11,15,16,18,7])
print(moving_average(data,4))
All you need as input parameters are the array and the window size, that is, the number of consecutive values to average.
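To make the mechanics concrete, here is a minimal sketch that checks the convolution against window sums computed by hand and shows how the mode argument changes the output length. The variable names are only illustrative.

```python
import numpy as np

data = np.array([10, 5, 8, 9, 15, 22, 26, 11, 15, 16, 18, 7])
w = 4  # window size

# Convolving with w ones yields the sum of each w-element window.
window_sums = np.convolve(data, np.ones(w), mode="valid")

# The same sums computed by hand, one slice per window position.
manual_sums = np.array([data[i:i + w].sum() for i in range(len(data) - w + 1)])
print(np.allclose(window_sums, manual_sums))

# Dividing by the window size turns the sums into means.
averages = window_sums / w
print(averages)

# Output length for each mode: only 'valid' uses complete windows everywhere.
mode_lengths = {
    mode: len(np.convolve(data, np.ones(w), mode=mode))
    for mode in ("valid", "same", "full")
}
print(mode_lengths)
```

With 'same' and 'full', the values near the edges come from windows that hang over the ends of the data, so dividing them by w would understate the mean there. That is why 'valid' is the usual choice for a moving average.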
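With scipy.signal.convolve()
As we mentioned, numpy.convolve() is the classic solution to this issue, but it is not always the fastest, especially for large arrays and wide windows. If you want to keep the same convolution approach, SciPy offers scipy.signal.convolve(). (Older SciPy releases also exposed a top-level scipy.convolve(), but that was only an alias for the NumPy function.)
It has the same goal and uses the same principle, but it can switch to an FFT-based method when that is expected to be faster, which helps with large inputs. Note that the listing below does not call SciPy at all: it computes the same moving average with a cumulative sum, np.cumsum(), which is another common way to avoid recomputing every window from scratch.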
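def moving_average(a, n):
    # Running total of the array, accumulated as floats
    ret = np.cumsum(a, dtype=float)
    # Subtract the total from n steps earlier, leaving the sum of each n-element window
    ret[n:] = ret[n:] - ret[:-n]
    # The first n - 1 entries cover incomplete windows, so drop them
    return ret[n - 1:] / n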
data = np.array([10,5,8,9,15,22,26,11,15,16,18,7])
print(moving_average(data,4))
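The cumulative-sum trick can be easier to follow with a leading zero in the prefix sums. The sketch below is a variant written for illustration (the function names are our own) and checks that it agrees with the convolution approach.

```python
import numpy as np

def moving_average_cumsum(a, n):
    # Prefix sums with a leading zero: csum[k] is the sum of the first k elements.
    csum = np.concatenate(([0.0], np.cumsum(a, dtype=float)))
    # The sum of a[i:i + n] is csum[i + n] - csum[i].
    return (csum[n:] - csum[:-n]) / n

def moving_average_convolve(a, n):
    # Reference implementation using convolution with n ones.
    return np.convolve(a, np.ones(n), "valid") / n

data = np.array([10, 5, 8, 9, 15, 22, 26, 11, 15, 16, 18, 7])
by_cumsum = moving_average_cumsum(data, 4)
by_convolve = moving_average_convolve(data, 4)
print(by_cumsum)
print(np.allclose(by_cumsum, by_convolve))
```

One caveat: on very long floating-point inputs, a running total can accumulate rounding error, so the two approaches may differ in the last few digits.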
There is also the fact that numpy.convolve() only works with one-dimensional arrays, so you need a separate helper function if you want to work with multi-dimensional data. That is not a problem with scipy.signal.convolve(), which accepts N-dimensional inputs as long as both arrays have the same number of dimensions.
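As a rough sketch of the SciPy route, assuming SciPy is installed, the example below applies scipy.signal.convolve() to a one-dimensional array and then to a two-dimensional one. The two-row table and the (1, w) kernel shape are our own choices for illustration; the kernel spans one row at a time, so each row is averaged independently.

```python
import numpy as np
from scipy import signal

data = np.array([10, 5, 8, 9, 15, 22, 26, 11, 15, 16, 18, 7], dtype=float)
w = 4  # window size

# One-dimensional case: same idea as numpy.convolve().
avg_1d = signal.convolve(data, np.ones(w), mode="valid") / w
print(avg_1d)

# Two-dimensional case: two series stored as rows of one array.
table = np.vstack([data, data[::-1]])

# A kernel of shape (1, w) covers a single row, so rows are never mixed.
kernel = np.ones((1, w)) / w
avg_2d = signal.convolve(table, kernel, mode="valid")
print(avg_2d.shape)
print(avg_2d)
```

Because SciPy may pick an FFT-based method internally, results can differ from the NumPy version by tiny floating-point amounts, so compare them with np.allclose() instead of exact equality.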
With bottleneck Module
The bottleneck module is a collection of fast NumPy array functions written in C, with a focus on moving-window and NaN-aware operations. It is only natural for it to have a way to deal with this issue: the move_mean() function.
import bottleneck as bn
import numpy as np
def rollavg_bottleneck(a, n):
    # min_count=None requires a full window of n values before producing a result
    return bn.move_mean(a, window=n, min_count=None)
data = np.array([10,5,8,9,15,22,26,11,15,16,18,7])
print(rollavg_bottleneck(data, 4))
As long as you use this method, you only need to pass the array and the window size. Remember, though, that the first window - 1 results are nan: with min_count=None, move_mean() only produces a value once a full window is available. Passing a smaller min_count, such as min_count=1, returns averages over the partial windows instead.
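To show where those nan values sit, here is a plain NumPy sketch that mimics the layout of the output: one result per input element, with nan wherever the trailing window is incomplete. It is not how bottleneck is implemented, and moving_average_padded is a hypothetical helper name.

```python
import numpy as np

def moving_average_padded(a, n):
    # Means of the complete windows only.
    valid = np.convolve(a, np.ones(n), "valid") / n
    # Pad the front with n - 1 NaNs so the result lines up with the input.
    return np.concatenate((np.full(n - 1, np.nan), valid))

data = np.array([10, 5, 8, 9, 15, 22, 26, 11, 15, 16, 18, 7])
padded = moving_average_padded(data, 4)
print(padded)

# Drop the NaN entries when only the complete windows are of interest.
trimmed = padded[~np.isnan(padded)]
print(trimmed)
```

Keeping the padding is handy when the averages need to stay aligned with the original timestamps. Trimming is simpler when only the values matter.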
With pandas
It is only natural to mention pandas whenever time-series data comes up. Working with time series is one of the things it was designed for, after all.
import pandas as pd
import numpy as np
data = np.array([10,5,8,9,15,22,26,11,15,16,18,7])
d = pd.Series(data)
print(d.rolling(4).mean())
Two methods do the job in combination: rolling() and mean(). rolling() defines the moving window over the Series, while mean() computes the average of each window. As with bottleneck, the first window - 1 entries of the result are NaN because those windows are incomplete.
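If the leading NaN entries are unwanted, two common options are to drop them or to let pandas average over partial windows with min_periods. The following is a minimal sketch, assuming pandas is installed; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

data = np.array([10, 5, 8, 9, 15, 22, 26, 11, 15, 16, 18, 7])
s = pd.Series(data)

# Default behaviour: a value appears only once 4 observations are available.
full = s.rolling(4).mean()
print(full)

# Option 1: drop the incomplete windows.
no_nan = full.dropna()
print(no_nan.to_numpy())

# Option 2: average whatever is available, starting from a single value.
partial = s.rolling(4, min_periods=1).mean()
print(partial.to_numpy())
```

Note that with min_periods=1 the first few results are means over fewer than four values, so they are not directly comparable with the rest of the series.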
Conclusion
All in all, we have shown four ways to compute a moving average in Python with NumPy and related libraries. The first and second are the most alike, with the former being simpler to grasp and the latter more flexible.
The third and fourth methods each have a clear cost, namely an extra dependency and nan values at the start of the result, but they are very strong in the right situations: bottleneck when speed matters, and pandas when the data already lives in a Series or DataFrame.