
Deep Learning Function Acceleration for Custom Training Loops

When you use the dlfeval function in a custom training loop, the software traces each input dlarray object of the model loss function to determine the computation graph used for automatic differentiation. This tracing process can take some time and can spend time recomputing the same trace. By optimizing, caching, and reusing the traces, you can speed up gradient computation in deep learning functions. You can also optimize, cache, and reuse traces to accelerate other deep learning functions that do not require automatic differentiation; for example, you can also accelerate model functions and functions used for prediction.

To speed up calls to deep learning functions, you can use the dlaccelerate function to create an AcceleratedFunction object that automatically optimizes, caches, and reuses the traces. You can use the dlaccelerate function to accelerate model functions and model loss functions directly.

The returned AcceleratedFunction object caches the traces of calls to the underlying function and reuses the cached result when the same input pattern reoccurs.

Try using dlaccelerate for function calls that:

  • are long-running

  • have dlarray objects, structures of dlarray objects, or dlnetwork objects as inputs

  • do not have side effects like writing to files or displaying output

Invoke the accelerated function as you would invoke the underlying function. Note that the accelerated function is not a function handle.
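For example, a minimal sketch of creating and invoking an accelerated function. Here, myModel, parameters, and X are hypothetical placeholders for your own model function, its learnable parameters, and a formatted dlarray input.

% Create the AcceleratedFunction object (myModel is a placeholder for your own function).
accfun = dlaccelerate(@myModel);

% Invoke it exactly as you would invoke myModel.
Y = accfun(parameters,X);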

Note

When you use the dlfeval function, the software automatically accelerates the forward and predict functions for dlnetwork input. If you accelerate a deep learning function where the majority of the computation takes place in calls to the forward or predict functions for dlnetwork input, then you might not see an improvement in training time.

Because of the nature of caching traces, not all functions support acceleration.

The caching process can cache values that you might expect to change or that depend on external factors. You must take care when accelerating functions that:

  • have inputs with random or frequently changing values

  • have outputs with frequently changing values

  • generate random numbers

  • use if statements and while loops with conditions that depend on the values of dlarray objects

  • have inputs that are handles or that depend on handles

  • read data from external sources (for example, by using a datastore or a minibatchqueue object)

Because the caching process requires extra computation, acceleration can lead to longer running code in some cases. This scenario can happen when the software spends time creating new caches that do not get reused often. For example, when you pass multiple mini-batches of different sequence lengths to the function, the software triggers a new trace for each unique sequence length.

Accelerated functions can do the following only when calculating a new trace:

  • modify the global state, such as the random number stream or global variables

  • use file input or output

  • display data using graphics or the command line display

When using accelerated functions in parallel, for example when using a parfor loop, each worker maintains its own cache. The cache is not transferred to the host.

Functions and custom layers used in accelerated functions must also support acceleration.

You can nest and recursively call accelerated functions. However, it is usually more efficient to have a single accelerated function.

Accelerate Deep Learning Function Directly

In most cases, you can accelerate deep learning functions directly. For example, you can accelerate the model loss function directly by replacing calls to the model loss function with calls to the corresponding accelerated function:

Consider the following use of the dlfeval function in a custom training loop.

[loss,gradients,state] = dlfeval(@modelLoss,parameters,X,T,state)

To accelerate the model loss function and evaluate the accelerated function, use the dlaccelerate function and evaluate the returned AcceleratedFunction object:

accfun = dlaccelerate(@modelLoss);
[loss,gradients,state] = dlfeval(accfun,parameters,X,T,state)

Because the cached traces are not directly attached to the AcceleratedFunction object, and are shared between AcceleratedFunction objects that use the same underlying function, you can create the AcceleratedFunction object either in or before the custom training loop body.
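For example, a sketch of this pattern, assuming a modelLoss function with the signature used above; numEpochs, the mini-batch preparation, and the parameter update are placeholders.

% Create the accelerated function once, before the custom training loop.
accfun = dlaccelerate(@modelLoss);

for epoch = 1:numEpochs
    % ... prepare the mini-batch as dlarray objects X and T ...

    % Evaluate the accelerated model loss function and gradients.
    [loss,gradients,state] = dlfeval(accfun,parameters,X,T,state);

    % ... update the learnable parameters using gradients, for example with adamupdate ...
end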

Accelerate Parts of Deep Learning Function

If a deep learning function does not fully support acceleration, for example, a function that requires an if statement with a condition that depends on the value of a dlarray object, then you can accelerate parts of the deep learning function by creating a separate function that contains any supported function calls you want to accelerate.

For example, consider the following code snippet that calls different functions depending on whether the sum of the dlarray object X is negative or nonnegative.

if sum(X,"all") < 0
    Y = negFun1(parameters,X);
    Y = negFun2(parameters,Y);
else
    Y = posFun1(parameters,X);
    Y = posFun2(parameters,Y);
end

Because the if statement depends on the value of a dlarray object, a function that contains this code snippet does not support acceleration. However, if the blocks of code used in the body of the if statement support acceleration, then you can accelerate these parts separately by creating new functions containing those blocks and accelerating the new functions instead.

For example, create the functions negFunAll and posFunAll that contain the blocks of code used in the body of the if statement.

function Y = negFunAll(parameters,X)
    Y = negFun1(parameters,X);
    Y = negFun2(parameters,Y);
end

function Y = posFunAll(parameters,X)
    Y = posFun1(parameters,X);
    Y = posFun2(parameters,Y);
end

Then, accelerate these functions and use them in the body of the if statement instead.

accfunNeg = dlaccelerate(@negFunAll);
accfunPos = dlaccelerate(@posFunAll);

if sum(X,"all") < 0
    Y = accfunNeg(parameters,X);
else
    Y = accfunPos(parameters,X);
end

Reusing Caches

Reusing a cached trace depends on the function inputs and outputs:

  • For any dlarray object or structure of dlarray objects input, the trace depends on the size, format, and underlying data type of the dlarray. That is, the accelerated function triggers a new trace for dlarray inputs with size, format, or underlying data type not contained in the cache. Any dlarray inputs differing only by value from a previously cached trace do not trigger a new trace.

  • For any dlnetwork input, the trace depends on the size, format, and underlying data type of the dlnetwork state and learnable parameters. That is, the accelerated function triggers a new trace for dlnetwork inputs with learnable parameters or state with size, format, or underlying data type not contained in the cache. Any dlnetwork inputs differing only by the value of the state and learnable parameters from a previously cached trace do not trigger a new trace.

  • For other types of input, the trace depends on the values of the input. That is, the accelerated function triggers a new trace for other types of input with values not contained in the cache. Any other inputs that have the same value as a previously cached trace do not trigger a new trace.

  • The trace depends on the number of function outputs. That is, the accelerated function triggers a new trace for function calls with previously unseen numbers of output arguments. Any function calls with the same number of output arguments as a previously cached trace do not trigger a new trace.

When necessary, the software caches any new traces by evaluating the underlying function and caching the resulting trace in the AcceleratedFunction object.
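For example, a sketch illustrating these rules for dlarray inputs, assuming a hypothetical function myFun that accepts a structure of parameters and a formatted dlarray input.

accfun = dlaccelerate(@myFun);

X1 = dlarray(rand(28,28,1,128),"SSCB");
Y = accfun(parameters,X1);    % first call: triggers a new trace

X2 = dlarray(rand(28,28,1,128),"SSCB");
Y = accfun(parameters,X2);    % same size, format, and data type: reuses the cached trace

X3 = dlarray(rand(32,32,1,128),"SSCB");
Y = accfun(parameters,X3);    % different size: triggers a new trace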

Caution

An AcceleratedFunction object is not aware of updates to the underlying function. If you modify the function associated with the accelerated function, then clear the cache using the clearCache object function or, alternatively, use the command clear functions.

Storing and Clearing Caches

AcceleratedFunction objects store the cache in a queue:

  • The software adds new traces to the back of the queue.

  • When the cache is full, the software discards the cached item at the head of the queue.

  • When a cache is reused, the software moves the cached item toward the back of the queue. This helps prevent the software from discarding commonly reused cached items.

AcceleratedFunction objects do not directly hold the cache. This means that:

  • AcceleratedFunction objects that have the same underlying function share the same cache.

  • Clearing or overwriting a variable containing an AcceleratedFunction object does not clear the cache.

  • Overwriting a variable containing an AcceleratedFunction object with another AcceleratedFunction object with the same underlying function does not clear the cache.

Accelerated functions that have the same underlying function share the same cache.

To clear the cache of an accelerated function, use the clearCache object function. Alternatively, you can clear all functions in the current MATLAB® session using the commands clear functions or clear all.

Note

Clearing the AcceleratedFunction variable does not clear the cache associated with the underlying function. To clear the cache for an AcceleratedFunction object that no longer exists in the workspace, create a new AcceleratedFunction object with the same underlying function and use the clearCache function on the new object. Alternatively, you can clear all functions in the current MATLAB session using the commands clear functions or clear all.
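For example, a sketch of clearing a cache after the AcceleratedFunction variable has been cleared, assuming the underlying function is your own modelLoss function.

% Recreate an AcceleratedFunction object for the same underlying function.
accfun = dlaccelerate(@modelLoss);

% Clear the cache shared by all accelerated functions of modelLoss.
clearCache(accfun)

% Alternatively, clear the caches of all accelerated functions.
clear functions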

Acceleration Considerations

Because of the nature of caching traces, not all functions support acceleration.

The caching process can cache values that you might expect to change or that depend on external factors. You must take care when accelerating functions that:

  • have inputs with random or frequently changing values

  • have outputs with frequently changing values

  • generate random numbers

  • use if statements and while loops with conditions that depend on the values of dlarray objects

  • have inputs that are handles or that depend on handles

  • read data from external sources (for example, by using a datastore or a minibatchqueue object)

Because the caching process requires extra computation, acceleration can lead to longer running code in some cases. This scenario can happen when the software spends time creating new caches that do not get reused often. For example, when you pass multiple mini-batches of different sequence lengths to the function, the software triggers a new trace for each unique sequence length.

Accelerated functions can do the following only when calculating a new trace:

  • modify the global state, such as the random number stream or global variables

  • use file input or output

  • display data using graphics or the command line display

When using accelerated functions in parallel, for example when using a parfor loop, each worker maintains its own cache. The cache is not transferred to the host.

Functions and custom layers used in accelerated functions must also support acceleration.

Function Inputs with Random or Frequently Changing Values

You must take care when accelerating functions that have inputs with random or frequently changing values, for example, a model loss function that takes random noise as input and adds it to the input data. If any random or frequently changing inputs to an accelerated function are not dlarray objects, then the function triggers a new trace for each previously unseen value.

You can check for scenarios like this by inspecting the Occupancy and HitRate properties of the AcceleratedFunction object. If the Occupancy property is high and the HitRate property is low, then this can indicate that the AcceleratedFunction object creates many new traces that it does not reuse.
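For example, a sketch of this check after a number of calls; the threshold values are illustrative only, not fixed rules.

% Inspect the cache statistics of the accelerated function.
accfun.Occupancy    % percentage of the cache in use
accfun.HitRate      % percentage of calls that reused a cached trace

% Illustrative heuristic only: a nearly full cache with few reuses suggests
% that non-dlarray inputs are triggering a new trace for every call.
if accfun.Occupancy > 90 && accfun.HitRate < 10
    disp("Many traces are cached but rarely reused; check for non-dlarray random inputs.")
end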

For dlarray object input, changes in value do not trigger a new trace. To prevent frequently changing input from triggering a new trace for each evaluation, refactor your code so that the random input is a dlarray input.

For example, consider the following model loss function that accepts a random array of noise values:

function [loss,gradients,state] = modelLoss(parameters,X,T,state,noise)
    X = X + noise;
    [Y,state] = model(parameters,X,state);
    loss = crossentropy(Y,T);
    gradients = dlgradient(loss,parameters);
end

To accelerate this model loss function, convert the input noise to a dlarray object before evaluating the accelerated function. Because the modelLoss function also supports dlarray input for noise, you do not need to change the function.

noise = dlarray(noise,"SSCB");
accfun = dlaccelerate(@modelLoss);
[loss,gradients,state] = dlfeval(accfun,parameters,X,T,state,noise);

Alternatively, you can accelerate the parts of the model loss function that do not require the random input.

Functions with Random Number Generation

You must take care when accelerating functions that use random number generation, such as a function that generates random noise to add to the input. When the software caches the trace of a function that generates random numbers that are not dlarray objects, the software caches the resulting random samples in the trace. When reusing the trace, the accelerated function uses the cached random samples. The accelerated function does not generate new random values.

Random number generation using the "like" option of the rand function with a dlarray object supports acceleration. To use random number generation in an accelerated function, ensure that the function uses the rand function with the "like" option set to a traced dlarray object (a dlarray object that depends on an input dlarray object).

For example, consider the following model loss function.

function [loss,gradients,state] = modelLoss(parameters,X,T,state)
    sz = size(X);
    noise = rand(sz);
    X = X + noise;
    [Y,state] = model(parameters,X,state);
    loss = crossentropy(Y,T);
    gradients = dlgradient(loss,parameters);
end

To ensure that the rand function generates a new value for each evaluation, use the "like" option with the traced dlarray object X.

function [loss,gradients,state] = modelLoss(parameters,X,T,state)
    sz = size(X);
    noise = rand(sz,"like",X);
    X = X + noise;
    [Y,state] = model(parameters,X,state);
    loss = crossentropy(Y,T);
    gradients = dlgradient(loss,parameters);
end

Alternatively, you can accelerate the parts of the model loss function that do not require random number generation.

Using if Statements and while Loops

You must take care when accelerating functions that use if statements and while loops. In particular, you can get unexpected results when you accelerate functions with if statements or while loops that yield different code paths for function inputs of the same size and format.

Accelerating functions with if statement or while loop conditions that depend on the values of the function input or on values from external sources (for example, results of random number generation) can lead to unexpected behavior. When the accelerated function caches a new trace, if the function contains an if statement or while loop, then the software caches the trace of the resulting code path given by the if statement or while loop condition for that particular trace. Because changes in the value of a dlarray input do not trigger a new trace, when reusing the trace with different values, the software uses the same cached trace (which contains the same cached code path), even when a difference in value should result in a different code path.

Usually, accelerating functions that contain if statements or while loops with conditions that do not depend on the values of the function input or on external factors (for example, a while loop that iterates over the elements of an array) does not result in unexpected behavior. For example, because a change in the size of a dlarray input triggers a new trace, when reusing the trace with inputs of the same size, the cached code path for inputs of that size remains consistent, even when there are differences in values.

To avoid unexpected behavior from caching code paths of if statements, you can refactor your code so that it determines the correct result by combining the results of all branches and extracting the desired solution.

For example, consider this code.

if tf
    Y = funcA(X);
else
    Y = funcB(X);
end

To support acceleration, you can replace it with code of the following form.

Y = tf*funcA(X) + ~tf*funcB(X);

Alternatively, to avoid unnecessary multiply operations, you can also use this replacement.

Y = cat(3,funcA(X),funcB(X));
Y = Y(:,:,[tf ~tf]);

Note that these techniques can result in longer running code because they require executing the code used in both branches of the if statement.

To use if statements and while loops that depend on dlarray object values, accelerate the body of the if statement or while loop only.

Function Inputs that Depend on Handles

You must take care when accelerating functions that take objects that depend on handles as input, such as a minibatchqueue object that has a preprocessing function specified as a function handle. The AcceleratedFunction object throws an error when evaluating the function with inputs that depend on handles.

Instead, you can accelerate the parts of the model loss function that do not require inputs that depend on handles.
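For example, a sketch of this pattern, assuming a minibatchqueue object mbq and a modelLoss function with the signature used earlier. The minibatchqueue, which depends on function handles, stays outside the accelerated code, and only dlarray data is passed to the accelerated function.

% Accelerate only the function that operates on dlarray data.
accfun = dlaccelerate(@modelLoss);

while hasdata(mbq)
    % Read a mini-batch from the minibatchqueue outside the accelerated function.
    [X,T] = next(mbq);

    [loss,gradients,state] = dlfeval(accfun,parameters,X,T,state);

    % ... update the learnable parameters ...
end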

Debugging

You must take care when debugging accelerated functions. Cached traces do not support breakpoints. When using accelerated functions, the software reaches breakpoints in the underlying function during the tracing process only.

To debug the code in the underlying function using breakpoints, disable acceleration by setting the Enabled property to false.

To debug the cached traces, you can compare the outputs of the accelerated function with the outputs of the underlying function by setting the CheckMode property to "tolerance".
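For example, a sketch of toggling these properties on an existing AcceleratedFunction object; the tolerance value shown is illustrative.

% Disable acceleration so that breakpoints in the underlying function are hit.
accfun.Enabled = false;

% ... debug the underlying function ...

% Re-enable acceleration and compare cached traces against the underlying function.
accfun.Enabled = true;
accfun.CheckMode = "tolerance";
accfun.CheckTolerance = 1e-4;    % illustrative tolerance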

dlode45 Does Not Support Acceleration When GradientMode Is "direct"

The dlaccelerate function does not support accelerating the dlode45 function when the GradientMode option is "direct". To accelerate code that calls the dlode45 function, set the GradientMode option to "adjoint", or accelerate parts of your code that do not call the dlode45 function with the GradientMode option set to "direct".

See Also


Related Topics