From Android 10, the Neural Networks API (NNAPI) provides functions to support caching of compilation artifacts, which reduces the time spent on compilation when an app starts. With this caching functionality, the driver doesn't need to manage or clean up the cached files. This is an optional feature that can be implemented with NN HAL 1.2. For more information about this function, see ANeuralNetworksCompilation_setCaching.
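For illustration, a minimal sketch of the app-side call, assuming a compilation object has already been created. ANeuralNetworksCompilation_setCaching and ANEURALNETWORKS_BYTE_SIZE_OF_CACHE_TOKEN are real NDK symbols; the helper name and token derivation are illustrative.

#include <android/NeuralNetworks.h>
#include <cstdint>

// Hypothetical helper: enables compilation caching on an existing compilation.
int enableCompilationCaching(ANeuralNetworksCompilation* compilation,
                             const char* cacheDir) {
    // The token must uniquely identify the model across app restarts. A real
    // app would derive it from a checksum of the model; zeros are a placeholder.
    uint8_t token[ANEURALNETWORKS_BYTE_SIZE_OF_CACHE_TOKEN] = {};
    return ANeuralNetworksCompilation_setCaching(compilation, cacheDir, token);
}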
The driver can also implement compilation caching independently of the NNAPI. This can be done whether or not the NNAPI NDK and HAL caching features are used. AOSP provides a low-level utility library (a caching engine). For more information, see Implementing a caching engine.
Workflow overview
This section describes general workflows with the compilation caching feature implemented.
Cache information provided and cache hit
- The app passes a caching directory and a checksum unique to the model.
- The NNAPI runtime looks for the cache files based on the checksum, the execution preference, and the partitioning outcome and finds the files.
- The NNAPI opens the cache files and passes the handles to the driver with prepareModelFromCache.
- The driver prepares the model directly from the cache files and returns the prepared model.
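As a sketch of what the driver sees on this path, here is one possible shape of the NN HAL 1.2 prepareModelFromCache implementation. Driver is a hypothetical class, and prepareFromCache is the helper shown in the Security section below; the HIDL types are from android::hardware::neuralnetworks.

Return<V1_0::ErrorStatus> Driver::prepareModelFromCache(
        const hidl_vec<hidl_handle>& modelCache, const hidl_vec<hidl_handle>& dataCache,
        const HidlToken& token, const sp<V1_2::IPreparedModelCallback>& callback) {
    // Rebuild the prepared model from the already-opened cache file handles.
    sp<V1_2::IPreparedModel> preparedModel = prepareFromCache(modelCache, dataCache, token);
    if (preparedModel == nullptr) {
        callback->notify_1_2(V1_0::ErrorStatus::GENERAL_FAILURE, nullptr);
        return V1_0::ErrorStatus::GENERAL_FAILURE;
    }
    callback->notify_1_2(V1_0::ErrorStatus::NONE, preparedModel);
    return V1_0::ErrorStatus::NONE;
}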
Cache information provided and cache miss
- The app passes a checksum unique to the model and a caching directory.
- The NNAPI runtime looks for the cache files based on the checksum, the execution preference, and the partitioning outcome, but doesn't find them.
- The NNAPI creates empty cache files based on the checksum, the execution preference, and the partitioning, opens the cache files, and passes the handles and the model to the driver with prepareModel_1_2.
- The driver compiles the model, writes caching information to the cache files, and returns the prepared model.
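Correspondingly, a sketch of prepareModel_1_2 on this path. Driver, compileModel, and the saveToCache helper (defined in the Security section below) are illustrative names:

Return<V1_0::ErrorStatus> Driver::prepareModel_1_2(
        const V1_2::Model& model, V1_1::ExecutionPreference preference,
        const hidl_vec<hidl_handle>& modelCache, const hidl_vec<hidl_handle>& dataCache,
        const HidlToken& token, const sp<V1_2::IPreparedModelCallback>& callback) {
    // Compile as usual; compileModel stands in for the driver's backend.
    sp<V1_2::IPreparedModel> preparedModel = compileModel(model, preference);
    if (preparedModel == nullptr) {
        callback->notify_1_2(V1_0::ErrorStatus::GENERAL_FAILURE, nullptr);
        return V1_0::ErrorStatus::GENERAL_FAILURE;
    }
    // When the runtime passed cache handles, persist the compilation so a
    // later call can take the prepareModelFromCache path.
    if (modelCache.size() != 0 || dataCache.size() != 0) {
        saveToCache(preparedModel, modelCache, dataCache, token);
    }
    callback->notify_1_2(V1_0::ErrorStatus::NONE, preparedModel);
    return V1_0::ErrorStatus::NONE;
}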
Cache information not provided
- The app invokes compilation without providing any caching information and passes nothing related to caching.
- The NNAPI runtime passes the model to the driver with prepareModel_1_2.
- The driver compiles the model and returns the prepared model.
Cache information
The caching information that is provided to a driver consists of a token and cache file handles.
Token
The token is a caching token of length Constant::BYTE_SIZE_OF_CACHE_TOKEN that identifies the prepared model. The same token is provided when saving the cache files with prepareModel_1_2 and when retrieving the prepared model with prepareModelFromCache. The client of the driver should choose a token with a low collision rate. The driver can't detect a token collision. A collision results in a failed execution, or in a successful execution that produces incorrect output values.
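In NN HAL 1.2 the token arrives as a fixed-size byte array (Constant::BYTE_SIZE_OF_CACHE_TOKEN is 32). A small sketch of copying it into a std::array so it can serve as an ordered container key, for example in the mapping manager discussed under Security; toCacheToken is an illustrative helper:

#include <algorithm>
#include <array>
#include <cstdint>

using CacheToken = std::array<uint8_t, 32>;  // Constant::BYTE_SIZE_OF_CACHE_TOKEN

// Copy the HIDL token (hidl_array<uint8_t, 32>) into a std::array, which
// provides the ordering operators needed for use as a std::map key.
template <typename HidlTokenT>
CacheToken toCacheToken(const HidlTokenT& token) {
    CacheToken key;
    std::copy(token.data(), token.data() + key.size(), key.begin());
    return key;
}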
Cache file handles (two types of cache files)
The two types of cache files are the data cache and the model cache.
- Data cache: Use for caching constant data, including preprocessed and transformed tensor buffers. A modification to the data cache shouldn't result in any effect worse than generating bad output values at execution time.
- Model cache: Use for caching security-sensitive data, such as compiled executable machine code in the device's native binary format. A modification to the model cache might affect the driver's execution behavior, and a malicious client could use this to execute beyond the granted permissions. Therefore, the driver must check whether the model cache is corrupted before preparing the model from the cache. For more information, see Security.
The driver must decide how the cache information is distributed between the two types of cache files, and report how many cache files it needs for each type with getNumberOfCacheFilesNeeded.
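A minimal sketch, assuming the driver keeps its compiled code in one model cache file and its preprocessed constants in one data cache file; the counts are illustrative.

Return<void> Driver::getNumberOfCacheFilesNeeded(getNumberOfCacheFilesNeeded_cb cb) {
    const uint32_t numModelCacheFiles = 1;  // compiled executable machine code
    const uint32_t numDataCacheFiles = 1;   // preprocessed and transformed tensors
    cb(V1_0::ErrorStatus::NONE, numModelCacheFiles, numDataCacheFiles);
    return Void();
}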
The NNAPI runtime always opens cache file handles with both read and write permission.
Security
In compilation caching, the model cache might contain security-sensitive data such as compiled executable machine code in the device's native binary format. If not properly protected, a modification to the model cache can affect the driver's execution behavior. Because the cache contents are stored in the app directory, the cache files are modifiable by the client. A buggy client might accidentally corrupt the cache, and a malicious client could intentionally exploit this to execute unverified code on the device. Depending on the characteristics of the device, this can be a security issue. Thus, the driver must be able to detect potential model cache corruption before preparing the model from the cache.
One way to do this is for the driver to maintain a map from the token to a cryptographic hash of the model cache. The driver can store the token and the hash of its model cache when saving the compilation to cache. The driver checks the new hash of the model cache against the recorded token and hash pair when retrieving the compilation from the cache. This mapping should persist across system reboots. The driver can use the Android keystore service, the utility library in frameworks/ml/nn/driver/cache, or any other suitable mechanism to implement a mapping manager. Upon driver update, this mapping manager should be reinitialized to prevent preparing cache files from an earlier version.
To prevent time-of-check to time-of-use (TOCTOU) attacks, the driver must compute the recorded hash before saving to file, and compute the new hash after copying the file contents to an internal buffer.
The following sample code demonstrates how to implement this logic.
bool saveToCache(const sp<V1_2::IPreparedModel> preparedModel,
                 const hidl_vec<hidl_handle>& modelFds, const hidl_vec<hidl_handle>& dataFds,
                 const HidlToken& token) {
    // Serialize the prepared model to internal buffers.
    auto buffers = serialize(preparedModel);

    // This implementation detail is important: the cache hash must be computed from internal
    // buffers instead of cache files to prevent time-of-check to time-of-use (TOCTOU) attacks.
    auto hash = computeHash(buffers);

    // Store the {token, hash} pair to a mapping manager that is persistent across reboots.
    CacheManager::get()->store(token, hash);

    // Write the cache contents from internal buffers to cache files.
    return writeToFds(buffers, modelFds, dataFds);
}

sp<V1_2::IPreparedModel> prepareFromCache(const hidl_vec<hidl_handle>& modelFds,
                                          const hidl_vec<hidl_handle>& dataFds,
                                          const HidlToken& token) {
    // Copy the cache contents from cache files to internal buffers.
    auto buffers = readFromFds(modelFds, dataFds);

    // This implementation detail is important: the cache hash must be computed from internal
    // buffers instead of cache files to prevent time-of-check to time-of-use (TOCTOU) attacks.
    auto hash = computeHash(buffers);

    // Validate the {token, hash} pair by a mapping manager that is persistent across reboots.
    if (CacheManager::get()->validate(token, hash)) {
        // Retrieve the prepared model from internal buffers.
        return deserialize<V1_2::IPreparedModel>(buffers);
    } else {
        return nullptr;
    }
}
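CacheManager in the sample above is not a framework class; the driver must supply its own persistent mapping manager. A minimal file-backed sketch under that assumption (the storage path, record format, and the CacheToken and Hash aliases are illustrative; a HidlToken can be converted with the toCacheToken helper sketched earlier, and a production driver might use the Android keystore service instead):

#include <array>
#include <cstdint>
#include <fstream>
#include <map>
#include <mutex>
#include <string>

using CacheToken = std::array<uint8_t, 32>;  // Constant::BYTE_SIZE_OF_CACHE_TOKEN
using Hash = std::array<uint8_t, 32>;        // e.g., a SHA-256 digest

class CacheManager {
  public:
    static CacheManager* get() {
        // Illustrative storage path; it must be writable by the driver only.
        static CacheManager instance("/data/vendor/nn/cache_hashes.bin");
        return &instance;
    }

    // Record the hash computed from internal buffers before the cache files are written.
    void store(const CacheToken& token, const Hash& hash) {
        std::lock_guard<std::mutex> guard(mMutex);
        mMap[token] = hash;
        persist();
    }

    // Check the hash recomputed from internal buffers after the cache files are read.
    bool validate(const CacheToken& token, const Hash& hash) {
        std::lock_guard<std::mutex> guard(mMutex);
        auto it = mMap.find(token);
        return it != mMap.end() && it->second == hash;
    }

  private:
    explicit CacheManager(std::string path) : mPath(std::move(path)) { load(); }

    // Rewrite the whole map as fixed-size {token, hash} records.
    void persist() {
        std::ofstream out(mPath, std::ios::binary | std::ios::trunc);
        for (const auto& [token, hash] : mMap) {
            out.write(reinterpret_cast<const char*>(token.data()), token.size());
            out.write(reinterpret_cast<const char*>(hash.data()), hash.size());
        }
    }

    void load() {
        std::ifstream in(mPath, std::ios::binary);
        CacheToken token;
        Hash hash;
        while (in.read(reinterpret_cast<char*>(token.data()), token.size()) &&
               in.read(reinterpret_cast<char*>(hash.data()), hash.size())) {
            mMap[token] = hash;
        }
    }

    std::mutex mMutex;
    std::map<CacheToken, Hash> mMap;
    const std::string mPath;
};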
Advanced use cases
In certain advanced use cases, a driver requires access to the cache contents (read or write) after the compilation call. Example use cases include:
- Just-in-time compilation: Compilation is delayed until the first execution.
- Multi-stage compilation: A fast compilation is performed initially, and an optional optimized compilation is performed at a later time, depending on usage frequency.
To access the cache contents (read or write) after the compilation call, ensure that the driver does both of the following (a sketch of both points follows the list):
- Duplicates the file handles during the invocation of prepareModel_1_2 or prepareModelFromCache, and reads or updates the cache contents at a later time.
- Implements file locking logic outside of the ordinary compilation call to prevent a write from occurring concurrently with a read or another write.
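A minimal sketch of both points, assuming POSIX file descriptors; duplicateFds and updateCacheFile are illustrative helpers.

#include <sys/file.h>
#include <unistd.h>
#include <cstdint>
#include <vector>

// Keep the cache file descriptors alive after the HIDL call returns: the
// runtime owns the original handles, so the driver must dup() what it needs.
std::vector<int> duplicateFds(const hidl_vec<hidl_handle>& handles) {
    std::vector<int> fds;
    for (const auto& handle : handles) {
        if (handle.getNativeHandle() != nullptr && handle->numFds > 0) {
            fds.push_back(dup(handle->data[0]));
        }
    }
    return fds;
}

// Later (for example, when a background optimizing compilation finishes),
// take an exclusive lock so the write can't race a concurrent read or write.
bool updateCacheFile(int fd, const std::vector<uint8_t>& contents) {
    if (flock(fd, LOCK_EX) != 0) return false;
    const bool ok =
            pwrite(fd, contents.data(), contents.size(), 0) ==
                    static_cast<ssize_t>(contents.size()) &&
            ftruncate(fd, contents.size()) == 0;
    flock(fd, LOCK_UN);
    return ok;
}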
Implementing a caching engine
In addition to the NN HAL 1.2 compilation caching interface, you can also find a caching utility library in the frameworks/ml/nn/driver/cache directory. The nnCache subdirectory contains persistent storage code for a driver to implement compilation caching without using the NNAPI caching features. This form of compilation caching can be implemented with any version of the NN HAL. If a driver chooses to implement caching disconnected from the HAL interface, the driver is responsible for freeing cached artifacts when they're no longer needed.
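For example, a driver that manages its own cache directory could periodically evict stale artifacts. A hypothetical sketch (the age-based retention policy and the use of std::filesystem are illustrative; nnCache's actual API may differ):

#include <chrono>
#include <filesystem>

// Remove cached artifacts that haven't been written within the retention
// window. The policy and directory layout are illustrative.
void evictStaleArtifacts(const std::filesystem::path& cacheDir,
                         std::chrono::hours maxAge) {
    const auto now = std::filesystem::file_time_type::clock::now();
    for (const auto& entry : std::filesystem::directory_iterator(cacheDir)) {
        if (entry.is_regular_file() && now - entry.last_write_time() > maxAge) {
            std::filesystem::remove(entry.path());
        }
    }
}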