Time Series Forecasting

This example shows how to use Prophet and Dask for scalable time series forecasting.

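Prophet is a procedure for forecasting time series data based on an additive model, in which non-linear trends are fit together with yearly, weekly, and daily seasonality, plus holiday effects.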
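To make the idea of an additive model concrete, here is a minimal sketch that builds a synthetic series as the sum of a trend, a weekly seasonal term, and a holiday effect. It is a toy illustration only: the function names, the coefficients, and the holiday days are all made up, and this is not how Prophet parameterizes or fits its components.

```python
import math

def trend(t):
    # Piecewise-linear growth: the slope changes at day 60 (a hypothetical changepoint).
    return 0.05 * t if t < 60 else 3.0 + 0.01 * (t - 60)

def weekly(t):
    # Periodic component that repeats every 7 days.
    return 0.5 * math.sin(2 * math.pi * t / 7)

def holiday(t, holidays=(25, 90)):
    # Fixed bump on made-up holiday days, zero everywhere else.
    return 1.0 if t in holidays else 0.0

def additive_model(t):
    # The components are simply summed; that is what "additive" means here.
    return trend(t) + weekly(t) + holiday(t)

# 120 days of synthetic observations (noise omitted for simplicity).
y = [additive_model(t) for t in range(120)]
```

Because the components are summed, each one can be inspected or plotted separately, which is the property that makes such models easy to interpret.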

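As discussed in Forecasting at Scale, large datasets aren't the only type of scaling challenge teams run into. In this example we'll focus on the third type of scaling challenge identified in that paper: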

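In most realistic settings, a large number of forecasts will be created, necessitating efficient, automated means of evaluating and comparing them, as well as detecting when they are likely to be performing poorly. When hundreds or even thousands of forecasts are made, it becomes important to let machines do the hard work of model evaluation and comparison while efficiently using human feedback to fix performance problems.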

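That sounds like a good fit for Dask. We'll use Prophet and Dask together to parallelize the diagnostics stage of research. This example does not attempt to parallelize the training of the model itself.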

[1]:

import pandas as pd
from prophet import Prophet

Importing plotly failed. Interactive plots will not work.

We'll walk through the example from the Prophet quickstart. The values are the log of the daily page views for Peyton Manning's Wikipedia page.

[2]:

df = pd.read_csv(
    'https://raw.githubusercontent.com/facebook/prophet/master/examples/example_wp_log_peyton_manning.csv',
    parse_dates=['ds']
)
df.head()

[2]:

	ds	y
0	2007-12-10	9.590761
1	2007-12-11	8.519590
2	2007-12-12	8.183677
3	2007-12-13	8.072467
4	2007-12-14	7.893572

[3]:

df.plot(x='ds', y='y');

[Figure: line plot of y against ds]

Fitting the model takes a few seconds. Dask isn't involved at all here.

[4]:

%%time
m = Prophet(daily_seasonality=False)
m.fit(df)

/usr/share/miniconda3/envs/dask-examples/lib/python3.9/site-packages/prophet/forecaster.py:896: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  components = components.append(new_comp)

CPU times: user 2.46 s, sys: 108 ms, total: 2.56 s
Wall time: 2.61 s

[4]:

<prophet.forecaster.Prophet at 0x7f73c3efc9d0>

Now we can make a forecast. Again, Dask isn't involved here.

[5]:

future = m.make_future_dataframe(periods=365)
forecast = m.predict(future)
m.plot(forecast);

/usr/share/miniconda3/envs/dask-examples/lib/python3.9/site-packages/prophet/forecaster.py:896: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  components = components.append(new_comp)

[Figure: Prophet forecast plot produced by m.plot(forecast)]

Parallel Diagnostics

Prophet includes a prophet.diagnostics.cross_validation function, which uses simulated historical forecasts to give some idea of a model's quality.

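This is done by selecting cutoff points in the history and, for each of them, fitting the model using only the data up to that cutoff. We can then compare the forecasted values to the actual values.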
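The cutoff selection described above can be sketched in a few lines. This is a simplified illustration, not Prophet's own implementation: the function name `sketch_cutoffs` is hypothetical, the durations are plain day counts, and the history end date of 2016-01-20 is an assumption (the start date 2007-12-10 is the first row of the dataset shown earlier). The sketch walks backwards from the last date that still leaves a full horizon, stepping by the period, and stops once less than the initial training window would remain.

```python
from datetime import date, timedelta

def sketch_cutoffs(start, end, initial_days, period_days, horizon_days):
    """Return cutoff dates, oldest first, for simulated historical forecasts."""
    initial = timedelta(days=initial_days)
    period = timedelta(days=period_days)
    horizon = timedelta(days=horizon_days)

    cutoffs = []
    cutoff = end - horizon              # latest cutoff with a full horizon after it
    while cutoff >= start + initial:    # keep at least `initial` of training data
        cutoffs.append(cutoff)
        cutoff -= period                # step back one period at a time
    return cutoffs[::-1]

cutoffs = sketch_cutoffs(
    start=date(2007, 12, 10),   # first row of the example dataset
    end=date(2016, 1, 20),      # assumed last date of the history
    initial_days=730,
    period_days=180,
    horizon_days=365,
)
print(len(cutoffs), cutoffs[0], cutoffs[-1])
```

With these inputs the sketch yields 11 cutoffs running from 2010-02-15 to 2015-01-20, which is consistent with the cutoffs Prophet logs for the cross-validation call later in this example.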

For more information, see https://fbdocs.cn/prophet/docs/diagnostics.html.

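Internally, cross_validation generates a list of cutoff values to try. Prophet fits a model for each cutoff and computes some metrics. By default the models are fit sequentially, but they can be trained in parallel using the parallel= keyword. On a single machine, parallel="processes" is a good choice. For larger problems where you'd like to distribute the work across a cluster, use parallel="dask" after you've connected to the cluster by creating a Client.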
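The reason this step parallelizes well is that each cutoff is an independent task: fit on the history up to the cutoff, then score the following horizon. The sketch below shows that pattern with the standard library only. Everything in it is illustrative: `evaluate_cutoff` is a hypothetical stand-in that uses a naive mean forecast on a toy series instead of a Prophet model, and a thread pool stands in for the process pool or Dask cluster so the example stays self-contained.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy series: one value per day, following a simple repeating weekly pattern.
series = [float(day % 7) for day in range(100)]

def evaluate_cutoff(cutoff, horizon=14):
    """Hypothetical stand-in for "fit up to the cutoff, then score the horizon"."""
    train = series[:cutoff]
    test = series[cutoff:cutoff + horizon]
    forecast = sum(train) / len(train)  # naive model: mean of the history
    mae = sum(abs(actual - forecast) for actual in test) / len(test)
    return cutoff, mae

cutoffs = [30, 50, 70]

# No task depends on another, so all cutoffs can be evaluated concurrently.
# pool.map returns results in the same order as the input cutoffs.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(evaluate_cutoff, cutoffs))

for cutoff, mae in results:
    print(f"cutoff={cutoff} mae={mae:.3f}")
```

Swapping the executor for a pool of processes or a cluster changes where the tasks run, not the shape of the computation, which is why a single keyword is enough to switch backends.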

[6]:

import dask
from distributed import Client, performance_report
import prophet.diagnostics

client = Client(threads_per_worker=1)
client

[6]:

Client

Client-97840748-0de0-11ed-9f87-000d3a8f7959

Connection method: Cluster object	Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status

Cluster Info

LocalCluster

c05d4eee

Dashboard: http://127.0.0.1:8787/status	Workers: 2
Total threads: 2	Total memory: 6.78 GiB
Status: running	Using processes: True

Scheduler Info

Scheduler

Scheduler-e9054925-07ca-4262-a860-0ebc0723bd70

Comm: tcp://127.0.0.1:44751	Workers: 2
Dashboard: http://127.0.0.1:8787/status	Total threads: 2
Started: Just now	Total memory: 6.78 GiB

Workers

Worker: 0

Comm: tcp://127.0.0.1:34575	Total threads: 1
Dashboard: http://127.0.0.1:39267/status	Memory: 3.39 GiB
Nanny: tcp://127.0.0.1:39955
Local directory: /home/runner/work/dask-examples/dask-examples/applications/dask-worker-space/worker-toj_a9te

Worker: 1

Comm: tcp://127.0.0.1:35829	Total threads: 1
Dashboard: http://127.0.0.1:34391/status	Memory: 3.39 GiB
Nanny: tcp://127.0.0.1:43645
Local directory: /home/runner/work/dask-examples/dask-examples/applications/dask-worker-space/worker-1yf153x7

[7]:

%%time
df_cv = prophet.diagnostics.cross_validation(
    m, initial="730 days", period="180 days", horizon="365 days",
    parallel="dask"
)

INFO:prophet:Making 11 forecasts with cutoffs between 2010-02-15 00:00:00 and 2015-01-20 00:00:00
INFO:prophet:Applying in parallel with <Client: 'tcp://127.0.0.1:44751' processes=2 threads=2, memory=6.78 GiB>

CPU times: user 868 ms, sys: 122 ms, total: 990 ms
Wall time: 27.5 s

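Be sure to watch the Dask dashboard while this runs. The models are fit in parallel on the cluster. At the start there is some overhead from moving the model and data to the workers, but after that the scaling looks quite good.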
