本文整理汇总了Python中zipline.utils.math_utils.nanmean函数的典型用法代码示例。如果您正苦于以下问题:Python nanmean函数的具体用法?Python nanmean怎么用?Python nanmean使用的例子?那么恭喜您, 这里精选的函数代码示例或许可以为您提供帮助。
在下文中一共展示了nanmean函数的9个代码示例,这些例子默认根据受欢迎程度排序。您可以为喜欢或者感觉有用的代码点赞,您的评价将有助于系统推荐出更棒的Python代码示例。
示例1: compute
def compute(self, today, assets, out, closes):
diffs = diff(closes, axis=0)
ups = nanmean(clip(diffs, 0, inf), axis=0)
downs = abs(nanmean(clip(diffs, -inf, 0), axis=0))
return evaluate(
"100 - (100 / (1 + (ups / downs)))", local_dict={"ups": ups, "downs": downs}, global_dict={}, out=out
)
示例2: zscore
def zscore(self, mask=NotSpecified, groupby=NotSpecified):
"""
Construct a Factor that Z-Scores each day's results.
The Z-Score of a row is defined as::
(row - row.mean()) / row.stddev()
If ``mask`` is supplied, ignore values where ``mask`` returns False
when computing row means and standard deviations, and output NaN
anywhere the mask is False.
If ``groupby`` is supplied, compute by partitioning each row based on
the values produced by ``groupby``, z-scoring the partitioned arrays,
and stitching the sub-results back together.
Parameters
----------
mask : zipline.pipeline.Filter, optional
A Filter defining values to ignore when Z-Scoring.
groupby : zipline.pipeline.Classifier, optional
A classifier defining partitions over which to compute Z-Scores.
Returns
-------
zscored : zipline.pipeline.Factor
A Factor producing that z-scores the output of self.
Notes
-----
Mean and standard deviation are sensitive to the magnitudes of
outliers. When working with factor that can potentially produce large
outliers, it is often useful to use the ``mask`` parameter to discard
values at the extremes of the distribution::
>>> base = MyFactor(...)
>>> normalized = base.zscore(mask=base.percentile_between(1, 99))
``zscore()`` is only supported on Factors of dtype float64.
Example
-------
See :meth:`~zipline.pipeline.factors.Factor.demean` for an in-depth
example of the semantics for ``mask`` and ``groupby``.
See Also
--------
:meth:`pandas.DataFrame.groupby`
"""
return GroupedRowTransform(
transform=lambda row: (row - nanmean(row)) / nanstd(row),
factor=self,
mask=mask,
groupby=groupby,
)
示例3: get_simple_transform
def get_simple_transform(self, asset, transform_name, dt, data_frequency,
bars=None):
if transform_name == "returns":
# returns is always calculated over the last 2 days, regardless
# of the simulation's data frequency.
hst = self.get_history_window(
[asset], dt, 2, "1d", "price", ffill=True
)[asset]
return (hst.iloc[-1] - hst.iloc[0]) / hst.iloc[0]
if bars is None:
raise ValueError("bars cannot be None!")
if data_frequency == "minute":
freq_str = "1m"
calculated_bar_count = int(self._get_minute_count_for_transform(
dt, bars
))
else:
freq_str = "1d"
calculated_bar_count = bars
price_arr = self.get_history_window(
[asset], dt, calculated_bar_count, freq_str, "price", ffill=True
)[asset]
if transform_name == "mavg":
return nanmean(price_arr)
elif transform_name == "stddev":
return nanstd(price_arr, ddof=1)
elif transform_name == "vwap":
volume_arr = self.get_history_window(
[asset], dt, calculated_bar_count, freq_str, "volume",
ffill=True
)[asset]
vol_sum = nansum(volume_arr)
try:
ret = nansum(price_arr * volume_arr) / vol_sum
except ZeroDivisionError:
ret = np.nan
return ret
示例4: compute
def compute(self, today, assets, out, data):
out[:] = nanmean(data, axis=0)
示例5: compute
def compute(self, today, assets, out, close, volume):
out[:] = nanmean(close * volume, axis=0)
示例6: zscore
def zscore(row):
return (row - nanmean(row)) / nanstd(row)
示例7: demean
def demean(row):
return row - nanmean(row)
示例8: vectorized_beta
def vectorized_beta(dependents, independent, allowed_missing, out=None):
"""
Compute slopes of linear regressions between columns of ``dependents`` and
``independent``.
Parameters
----------
dependents : np.array[N, M]
Array with columns of data to be regressed against ``independent``.
independent : np.array[N, 1]
Independent variable of the regression
allowed_missing : int
Number of allowed missing (NaN) observations per column. Columns with
more than this many non-nan observations in both ``dependents`` and
``independents`` will output NaN as the regression coefficient.
Returns
-------
slopes : np.array[M]
Linear regression coefficients for each column of ``dependents``.
"""
# Cache these as locals since we're going to call them multiple times.
nan = np.nan
isnan = np.isnan
N, M = dependents.shape
if out is None:
out = np.full(M, nan)
# Copy N times as a column vector and fill with nans to have the same
# missing value pattern as the dependent variable.
#
# PERF_TODO: We could probably avoid the space blowup by doing this in
# Cython.
# shape: (N, M)
independent = np.where(
isnan(dependents),
nan,
independent,
)
# Calculate beta as Cov(X, Y) / Cov(X, X).
# https://en.wikipedia.org/wiki/Simple_linear_regression#Fitting_the_regression_line # noqa
#
# NOTE: The usual formula for covariance is::
#
# mean((X - mean(X)) * (Y - mean(Y)))
#
# However, we don't actually need to take the mean of both sides of the
# product, because of the folllowing equivalence::
#
# Let X_res = (X - mean(X)).
# We have:
#
# mean(X_res * (Y - mean(Y))) = mean(X_res * (Y - mean(Y)))
# (1) = mean((X_res * Y) - (X_res * mean(Y)))
# (2) = mean(X_res * Y) - mean(X_res * mean(Y))
# (3) = mean(X_res * Y) - mean(X_res) * mean(Y)
# (4) = mean(X_res * Y) - 0 * mean(Y)
# (5) = mean(X_res * Y)
#
#
# The tricky step in the above derivation is step (4). We know that
# mean(X_res) is zero because, for any X:
#
# mean(X - mean(X)) = mean(X) - mean(X) = 0.
#
# The upshot of this is that we only have to center one of `independent`
# and `dependent` when calculating covariances. Since we need the centered
# `independent` to calculate its variance in the next step, we choose to
# center `independent`.
# shape: (N, M)
ind_residual = independent - nanmean(independent, axis=0)
# shape: (M,)
covariances = nanmean(ind_residual * dependents, axis=0)
# We end up with different variances in each column here because each
# column may have a different subset of the data dropped due to missing
# data in the corresponding dependent column.
# shape: (M,)
independent_variances = nanmean(ind_residual ** 2, axis=0)
# shape: (M,)
np.divide(covariances, independent_variances, out=out)
# Write nans back to locations where we have more then allowed number of
# missing entries.
nanlocs = isnan(independent).sum(axis=0) > allowed_missing
out[nanlocs] = nan
return out
示例9: demean
#.........这里部分代码省略.........
A classifier defining partitions over which to compute means.
Example
-------
Let ``f`` be a Factor which would produce the following output::
AAPL MSFT MCD BK
2017-03-13 1.0 2.0 3.0 4.0
2017-03-14 1.5 2.5 3.5 1.0
2017-03-15 2.0 3.0 4.0 1.5
2017-03-16 2.5 3.5 1.0 2.0
Let ``c`` be a Classifier producing the following output::
AAPL MSFT MCD BK
2017-03-13 1 1 2 2
2017-03-14 1 1 2 2
2017-03-15 1 1 2 2
2017-03-16 1 1 2 2
Let ``m`` be a Filter producing the following output::
AAPL MSFT MCD BK
2017-03-13 False True True True
2017-03-14 True False True True
2017-03-15 True True False True
2017-03-16 True True True False
Then ``f.demean()`` will subtract the mean from each row produced by
``f``.
::
AAPL MSFT MCD BK
2017-03-13 -1.500 -0.500 0.500 1.500
2017-03-14 -0.625 0.375 1.375 -1.125
2017-03-15 -0.625 0.375 1.375 -1.125
2017-03-16 0.250 1.250 -1.250 -0.250
``f.demean(mask=m)`` will subtract the mean from each row, but means
will be calculated ignoring values on the diagonal, and NaNs will
written to the diagonal in the output. Diagonal values are ignored
because they are the locations where the mask ``m`` produced False.
::
AAPL MSFT MCD BK
2017-03-13 NaN -1.000 0.000 1.000
2017-03-14 -0.500 NaN 1.500 -1.000
2017-03-15 -0.166 0.833 NaN -0.666
2017-03-16 0.166 1.166 -1.333 NaN
``f.demean(groupby=c)`` will subtract the group-mean of AAPL/MSFT and
MCD/BK from their respective entries. The AAPL/MSFT are grouped
together because both assets always produce 1 in the output of the
classifier ``c``. Similarly, MCD/BK are grouped together because they
always produce 2.
::
AAPL MSFT MCD BK
2017-03-13 -0.500 0.500 -0.500 0.500
2017-03-14 -0.500 0.500 1.250 -1.250
2017-03-15 -0.500 0.500 1.250 -1.250
2017-03-16 -0.500 0.500 -0.500 0.500
``f.demean(mask=m, groupby=c)`` will also subtract the group-mean of
AAPL/MSFT and MCD/BK, but means will be calculated ignoring values on
the diagonal , and NaNs will be written to the diagonal in the output.
::
AAPL MSFT MCD BK
2017-03-13 NaN 0.000 -0.500 0.500
2017-03-14 0.000 NaN 1.250 -1.250
2017-03-15 -0.500 0.500 NaN 0.000
2017-03-16 -0.500 0.500 0.000 NaN
Notes
-----
Mean is sensitive to the magnitudes of outliers. When working with
factor that can potentially produce large outliers, it is often useful
to use the ``mask`` parameter to discard values at the extremes of the
distribution::
>>> base = MyFactor(...)
>>> normalized = base.demean(mask=base.percentile_between(1, 99))
``demean()`` is only supported on Factors of dtype float64.
See Also
--------
:meth:`pandas.DataFrame.groupby`
"""
return GroupedRowTransform(
transform=lambda row: row - nanmean(row),
factor=self,
mask=mask,
groupby=groupby,
)