- Nicholas Marion
- Andreas Krebbel
1.1.0
Deep Learning Library: the deep learning library support (zDNN) is the software enablement technology provided by IBM to meet the following requirements:
- Specialized-function-assist instructions are intended to provide performance improvements for specific operations used in software libraries, utilities, and operating system (OS) services. The facilities and instructions described as specialized-function-assist instructions may be replaced or removed in the future. As such, the IBM recommendation for these instructions is that a software library or operating system function be used instead of directly accessing the instructions. This is the function provided by zDNN.
- zAIU has very complex data layout requirements; these requirements arrange the tensor to enhance the performance characteristics of the operations. zDNN will format the tensor appropriately on behalf of the caller, and it will do so using an optimized approach.
- For deep learning operations, zAIU requires the use of an internal data type (DLFLOAT16). This is a 2-byte data type, similar in concept to Brain float (BFLOAT); that is, it is an AI-optimized format that is used to speed up training and inference (compared with 4-byte formats) while minimizing the loss of accuracy at inference time.
The zDNN library will provide a set of APIs that an exploiter will utilize to drive the desired request. zDNN will be available on both z/OS and Linux on Z; the inclusion of Linux on Z provides particular benefit, as it will allow us to enable acceleration in frameworks for z/OS via z/OS Container Extensions (zCX).
z/OS:
- Problem state
- AMODE64
- XPLINK
The following AIU limits imply a zDNN limitation as well at this point.

For all ops:

- Number of elements in any dimension must not exceed the value returned by zdnn_get_nnpa_max_dim_idx_size().
- Total number of bytes required for storing a transformed tensor must not exceed the value returned by zdnn_get_nnpa_max_tensor_size().
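A minimal sketch of checking both limits before building a tensor follows. The helper shape_within_limits is hypothetical and takes the limits and the transformed byte count as plain parameters; in a real program they would come from zdnn_get_nnpa_max_dim_idx_size(), zdnn_get_nnpa_max_tensor_size() and zdnn_getsize_ztensor(). Passing them in keeps the sketch compilable without zDNN.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of zDNN): returns true when every dimension
 * (dim4..dim1) is within max_dim_idx_size and the transformed tensor's byte
 * count is within max_tensor_size. */
bool shape_within_limits(const uint32_t dims[4], uint32_t max_dim_idx_size,
                         uint64_t transformed_bytes, uint64_t max_tensor_size) {
  for (size_t i = 0; i < 4; i++) {
    if (dims[i] > max_dim_idx_size) {
      return false; /* too many elements in this dimension */
    }
  }
  return transformed_bytes <= max_tensor_size; /* total bytes must fit */
}
```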
The zDNN deep learning library provides the standard IBM Z software interface to the zAIU. This IBM-provided C library provides a set of functions that handle the data transformation requirements of the AIU and provide wrapper functions for the NNPA instruction primitives.
The zDNN functions use the following criteria to determine if zAIU can be used to accelerate a deep learning primitive:
- Neural Network Processing Assist (NNPA) facility indicator in the system STFLE output.
- Output of the NNPA-QAF (Query Available Functions) request.
To use the IBM-provided zDNN C library for the NNPA instruction, follow these steps:
- Link or re-link applications to use the IBM-provided zDNN. The IBM-provided zDNN is a library file in the z/OS UNIX System Services file system and can be statically or dynamically linked into your applications. The paths for the zDNN archive file and the zDNN header files are:
z/OS (LE required):
Path for 64-bit dynamic library files:
/lib/libzdnn.so
/lib/libzdnn.x
Path for the zDNN header files:
/usr/include/
The XL C/C++ compiler and the z/OS Language Environment provide various environment variables to control processing, in addition to the variables provided by the zDNN library itself.
- Use the environment variable _CEE_RUNOPTS to specify invocation Language Environment runtime options. For more information about using the environment variable _CEE_RUNOPTS and other C and LE variables, see z/OS XL C/C++ Programming Guide.
- For environment variables accepted by the zDNN library, see Runtime Environment Variables.
Linux on Z:
On Linux on Z we expect to ship source as well as a package-installable library and header. The library installation will conform to the standards of the packaging method chosen.
Include Files: zdnn.h
#define ZDNN_VERSION "1.1.0"
#define ZDNN_VERNUM 0x010100 // 0x[major][minor][patch]
#define ZDNN_VER_MAJOR 1
#define ZDNN_VER_MINOR 1
#define ZDNN_VER_PATCH 0
- zDNN major version (ZDNN_VER_MAJOR) will be incremented if any backwards incompatible changes are introduced to the API. It may also include minor and patch level changes. Patch and minor version will be reset to 0 when major version is incremented.
- zDNN minor version (ZDNN_VER_MINOR) will be incremented if new, backwards compatible functionalities are introduced to the API or if any API functionalities are marked as deprecated. It may also include patch level changes. Patch version will be reset to 0 when minor version is incremented.
- zDNN patch version (ZDNN_VER_PATCH) will be incremented if only backwards compatible bug fixes are introduced. A bug fix is defined as an internal change that fixes incorrect behavior.
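The version number packs the three components as 0x[major][minor][patch], one byte each (ZDNN_VERNUM is 0x010100 for "1.1.0"). The helpers below are hypothetical and not part of zdnn.h; they sketch how the packing and the reset rules above fit together.

```c
#include <stdint.h>

/* Hypothetical helpers mirroring the 0x[major][minor][patch] layout. */
static inline uint32_t make_vernum(uint32_t major, uint32_t minor, uint32_t patch) {
  return (major << 16) | (minor << 8) | patch;
}
static inline uint32_t vernum_major(uint32_t v) { return (v >> 16) & 0xFF; }
static inline uint32_t vernum_minor(uint32_t v) { return (v >> 8) & 0xFF; }
static inline uint32_t vernum_patch(uint32_t v) { return v & 0xFF; }

/* A major bump resets minor and patch to 0. */
uint32_t bump_major(uint32_t v) { return make_vernum(vernum_major(v) + 1, 0, 0); }

/* A minor bump resets patch to 0. */
uint32_t bump_minor(uint32_t v) {
  return make_vernum(vernum_major(v), vernum_minor(v) + 1, 0);
}

/* A patch bump leaves major and minor unchanged. */
uint32_t bump_patch(uint32_t v) {
  return make_vernum(vernum_major(v), vernum_minor(v), vernum_patch(v) + 1);
}
```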
Functions for checking version incompatibility with the zDNN load library are provided and described in the Support Functions section.
typedef struct zdnn_ztensor {
zdnn_tensor_desc *pre_transformed_desc; // tensor's shape information before transformation
zdnn_tensor_desc *transformed_desc; // transformed tensor's shape information
uint64_t buffer_size; // tensor size in bytes
void *buffer; // pointer to the tensor in memory
bool is_transformed; // indicator if data in buffer has been transformed
char reserved[31]; // not currently used, should contain zeros.
} zdnn_ztensor;
- buffer requirements:
  - Calling zdnn_init_ztensor_with_malloc automatically allocates and sets a valid buffer for a tensor.
  - buffer field must point to storage allocated of sufficient size to contain the transformed tensor data described by its transformed_desc field.
    - Calling zdnn_getsize_ztensor with the tensor's transformed_desc returns the required size.
  - Start of buffer field must be 4k aligned.
- reserved should contain zeros, otherwise the program may not operate compatibly in the future.
  - Calling zdnn_init_ztensor or zdnn_init_ztensor_with_malloc will set reserved to zeros.
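To illustrate the 4k alignment requirement for buffer, here is a sketch of allocating and checking 4K-aligned storage with standard C11 aligned_alloc. The helper names are hypothetical. zdnn_init_ztensor_with_malloc and zdnn_allochelper_ztensor already satisfy the alignment requirement for callers who use them, and in practice the size would come from zdnn_getsize_ztensor.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define AIU_BUFFER_ALIGNMENT 4096u /* buffer must start on a 4K boundary */

/* Hypothetical helper: allocate at least `size` bytes on a 4K boundary.
 * aligned_alloc() expects the size to be a multiple of the alignment,
 * so round up first. Release the result with free(). */
void *alloc_4k_aligned(uint64_t size) {
  uint64_t rounded = (size + AIU_BUFFER_ALIGNMENT - 1) &
                     ~(uint64_t)(AIU_BUFFER_ALIGNMENT - 1);
  if (rounded == 0) {
    rounded = AIU_BUFFER_ALIGNMENT; /* avoid a zero-size request */
  }
  return aligned_alloc(AIU_BUFFER_ALIGNMENT, (size_t)rounded);
}

/* Hypothetical helper: true if p starts on a 4K boundary. */
bool is_4k_aligned(const void *p) {
  return ((uintptr_t)p & (AIU_BUFFER_ALIGNMENT - 1)) == 0;
}
```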
Concatenated zTensor requirements:

- For use with weights/biases/hidden-weights/hidden-biases RNN-gates tensors.
- You must use zdnn_generate_transformed_desc_concatenated with the appropriate concatenation info.
  - Do not use zdnn_generate_transformed_desc with concatenated tensors.
- The pre-transformed shape dimensions should not include the concatenation.
  - Thus, the pre-transformed shape should be that of a single gate, not the shape of the combined gates.
- Afterward, transform with zdnn_transform_ztensor as normal.
- Must follow general tensor requirements.
typedef struct zdnn_tensor_desc {
zdnn_data_layouts layout; // data layout
zdnn_data_formats format; // internal use only
zdnn_data_types type; // data type
uint32_t dim4; // number of elements in outermost dimension
uint32_t dim3; // ... outer dimension
uint32_t dim2; // ... inner dimension
uint32_t dim1; // number of elements in innermost dimension
} zdnn_tensor_desc;
- Helper methods zdnn_init_pre_transformed_desc and zdnn_generate_transformed_desc or zdnn_generate_transformed_desc_concatenated will set the correct dims based on the layout and format.
- The layout of the tensor descriptor affects the expected order of the dims. For example:
  - For tensors with fewer than 4 dimensions, unspecified dims:
    - In the pre_transformed_desc are ignored. For example a ZDNN_3D expects values in dim4, dim3, and dim2.
    - In the transformed_desc "unused" dims must be 1.
  - A ZDNN_NCHW expects dims such that dim4 = N, dim3 = C, dim2 = H, dim1 = W
  - A ZDNN_HWCK expects dims such that dim4 = H, dim3 = W, dim2 = C, dim1 = K
- The format changes the expected dims order for ZDNN_4D tensor layouts
  - ZDNN_FORMAT_4DFEATURE expects dims such that dim4 = N, dim3 = H, dim2 = W, dim1 = C
  - ZDNN_FORMAT_4DKERNEL expects dims such that dim4 = H, dim3 = W, dim2 = C, dim1 = K
The following are layouts for zDNN ztensor descriptors. These indicate the number and order of dimensions to expect for the ztensor data.
typedef enum zdnn_data_layouts {
ZDNN_1D, // 1d tensor
ZDNN_2D, // 2d tensor
ZDNN_2DS, // represents special 2D tensors required by LSTM/GRU
ZDNN_3D, // 3d tensor
ZDNN_3DS, // represents special 3D tensors required by
// LSTM/GRU/Softmax/Matmul
ZDNN_ZRH, // represents (update, reset, hidden) used by GRU
ZDNN_4D, // 4d tensor
ZDNN_4DS, // represents special 4D tensors required by LSTM/GRU output
ZDNN_NHWC, // 4d feature tensor in NHWC
ZDNN_NCHW, // 4d feature tensor in NCHW
ZDNN_FICO, // represents (forget, input, cell, output) used by LSTM
ZDNN_HWCK, // 4d kernel CNN tensor
ZDNN_BIDIR_ZRH, // ZRH variant to work with bidirectional LSTM/GRU output
ZDNN_BIDIR_FICO // FICO variant to work with bidirectional LSTM/GRU output
} zdnn_data_layouts;
Some layouts also indicate special re-arrangement of the data during ztensor transformation.

- ZDNN_2DS - The outermost dimension of the original shape is promoted to dim4 during transformation. For example, a shape of (a, b) becomes [a, 1, 1, b] (dim4, dim3, dim2, dim1) in the transformed_desc.
- ZDNN_3DS - The outermost dimension of the original shape is promoted to dim4 during transformation. For example, a shape of (a, b, c) becomes [a, 1, b, c] (dim4, dim3, dim2, dim1) in the transformed_desc.
- ZDNN_4DS - Arrangement for RNN output tensor.

The following are set automatically in transformed_desc based on info when calling zdnn_generate_transformed_desc_concatenated():

- ZDNN_ZRH/FICO - During transformation, the RNN input gates data are concatenated on the innermost dimension. Supported with pre_transformed_layout of ZDNN_2DS or ZDNN_3DS.
- ZDNN_BIDIR_ZRH/FICO - Similar to ZDNN_ZRH/FICO, used when:
  - transforming RNN input weight gate data, and
  - the input tensor for the current RNN layer is a bidirectional RNN output from a previous RNN layer.
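The ZDNN_2DS and ZDNN_3DS promotions described above amount to inserting 1s into the dimension list. The sketch below uses a hypothetical dims4 struct in place of zdnn_tensor_desc and reproduces only the two documented examples.

```c
#include <stdint.h>

/* Hypothetical stand-in for the four transformed dims. */
typedef struct {
  uint32_t dim4, dim3, dim2, dim1;
} dims4;

/* ZDNN_2DS: (a, b) becomes [a, 1, 1, b]; a is promoted to dim4. */
dims4 promote_2ds(uint32_t a, uint32_t b) {
  dims4 d = {a, 1, 1, b};
  return d;
}

/* ZDNN_3DS: (a, b, c) becomes [a, 1, b, c]; a is promoted to dim4. */
dims4 promote_3ds(uint32_t a, uint32_t b, uint32_t c) {
  dims4 d = {a, 1, b, c};
  return d;
}
```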
typedef enum zdnn_data_formats {
ZDNN_FORMAT_4DFEATURE, // tensor in AIU data layout format 0
ZDNN_FORMAT_4DKERNEL, // tensor in AIU data layout format 1
} zdnn_data_formats;
typedef enum zdnn_data_types {
ZDNN_DLFLOAT16, // 16-bit deep learning format
BFLOAT, // Brain floating point format
FP16, // 16-bit IEEE-754 floating point format
FP32, // 32-bit IEEE-754 floating point format
} zdnn_data_types;
| Mnemonic Constant | Value | Meaning |
|---|---|---|
| ZDNN_OK | 0x00000000 | Success. |

| Mnemonic Constant | Value | Meaning |
|---|---|---|
| ZDNN_ELEMENT_RANGE_VIOLATION | 0x00020001 | AIU operation resulted in data that was out of the normal range. |
Note: ZDNN_ELEMENT_RANGE_VIOLATION indicates a range violation occurred for the AIU operation based on the data in the tensors. This usually indicates an overflow of the NNPA internal data type, but can also be associated with operation specific errors, such as "divide by zero". See the "z/Architecture Principles of Operation" for information about range violation on the operation that encountered the violation.
| Mnemonic Constant | Value | Meaning |
|---|---|---|
| ZDNN_INVALID_SHAPE* | 0x00040001 | Invalid shape information in one (or more) of the input/output tensor(s). |
| ZDNN_INVALID_LAYOUT | 0x00040002 | Invalid layout information in one (or more) of the input/output tensor(s). |
| ZDNN_INVALID_TYPE* | 0x00040003 | Invalid type information in one (or more) of the input/output tensor(s). |
| ZDNN_INVALID_FORMAT* | 0x00040004 | Invalid format information in one (or more) of the input/output tensor(s). |
| ZDNN_INVALID_DIRECTION | 0x00040005 | Invalid RNN direction. |
| ZDNN_INVALID_CONCAT_INFO | 0x00040006 | Invalid concatenation info. |
| ZDNN_INVALID_STRIDE_PADDING* | 0x00040007 | Invalid padding type parameter for current strides. |
| ZDNN_INVALID_STRIDES* | 0x00040008 | Invalid stride height or width parameter. |
| ZDNN_MISALIGNED_PARMBLOCK* | 0x00040009 | NNPA parameter block is not on double word boundary. |
| ZDNN_INVALID_CLIPPING_VALUE | 0x0004000A | Invalid clipping for the specified operation. |
| ZDNN_ALLOCATION_FAILURE | 0x00100001 | Cannot allocate storage. |
| ZDNN_INVALID_BUFFER | 0x00100002 | Buffer address is NULL or not on 4K-byte boundary or insufficient buffer size. |
| ZDNN_CONVERT_FAILURE | 0x00100003 | Floating point data conversion failure. |
| ZDNN_INVALID_STATE | 0x00100004 | Invalid zTensor state. |
| ZDNN_UNSUPPORTED_AIU_EXCEPTION | 0x00100005 | AIU operation returned an unexpected exception. |
Note: *In certain scenarios, these statuses are returned only if ZDNN_ENABLE_PRECHECK is enabled. When not enabled, these scenarios will lead to abnormal program termination.
The following statuses indicate issues returned from the hardware.
| Mnemonic Constant | Value | Meaning |
|---|---|---|
| ZDNN_UNSUPPORTED_PARMBLOCK | 0x000C0001 | NNPA parameter block format is not supported by the model. |
| ZDNN_UNAVAILABLE_FUNCTION | 0x000C0002 | Specified NNPA function is not defined or installed on the machine. |
| ZDNN_UNSUPPORTED_FORMAT | 0x000C0010 | Specified tensor data layout format is not supported. |
| ZDNN_UNSUPPORTED_TYPE | 0x000C0011 | Specified tensor data type is not supported. |
| ZDNN_EXCEEDS_MDIS | 0x000C0012 | Tensor dimension exceeds maximum dimension index size (MDIS). |
| ZDNN_EXCEEDS_MTS | 0x000C0013 | Total number of bytes in tensor exceeds maximum tensor size (MTS). |
| ZDNN_MISALIGNED_TENSOR | 0x000C0014 | Tensor address is not on 4K-byte boundary. |
| ZDNN_MISALIGNED_SAVEAREA | 0x000C0015 | Function specific save area address is not on 4K-byte boundary. |
The meaning of the following hardware statuses varies based on operation. See the operation that returned the status for the specific meaning.
| Mnemonic Constant | Value | Meaning |
|---|---|---|
| ZDNN_FUNC_RC_F000 | 0x000CF000 | Function specific response code (F000). |
| ZDNN_FUNC_RC_F001 | 0x000CF001 | Function specific response code (F001). |
| ZDNN_FUNC_RC_F002 | 0x000CF002 | Function specific response code (F002). |
| ZDNN_FUNC_RC_F003 | 0x000CF003 | Function specific response code (F003). |
| ZDNN_FUNC_RC_F004 | 0x000CF004 | Function specific response code (F004). |
| ZDNN_FUNC_RC_F005 | 0x000CF005 | Function specific response code (F005). |
| ZDNN_FUNC_RC_F006 | 0x000CF006 | Function specific response code (F006). |
| ZDNN_FUNC_RC_F007 | 0x000CF007 | Function specific response code (F007). |
| ZDNN_FUNC_RC_F008 | 0x000CF008 | Function specific response code (F008). |
| ZDNN_FUNC_RC_F009 | 0x000CF009 | Function specific response code (F009). |
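Across the tables above, the upper 16 bits of each status value group the codes: 0x0002 for the range-violation warning, 0x0004 and 0x0010 for library-detected failures, and 0x000C for hardware-reported statuses. A hypothetical classifier built on that observation might look like this; the group names are illustrative and are not defined by zDNN.

```c
#include <stdint.h>

/* Hypothetical grouping of status codes, derived from the tables above. */
typedef enum {
  STATUS_OK,
  STATUS_WARNING,   /* 0x0002xxxx, e.g. ZDNN_ELEMENT_RANGE_VIOLATION */
  STATUS_PARAMETER, /* 0x0004xxxx, e.g. ZDNN_INVALID_SHAPE */
  STATUS_GENERAL,   /* 0x0010xxxx, e.g. ZDNN_ALLOCATION_FAILURE */
  STATUS_HARDWARE,  /* 0x000Cxxxx, e.g. ZDNN_EXCEEDS_MDIS, ZDNN_FUNC_RC_F000 */
  STATUS_UNKNOWN
} status_class;

status_class classify_status(uint32_t status) {
  if (status == 0x00000000u) {
    return STATUS_OK; /* ZDNN_OK */
  }
  switch (status & 0xFFFF0000u) { /* keep only the group bits */
  case 0x00020000u: return STATUS_WARNING;
  case 0x00040000u: return STATUS_PARAMETER;
  case 0x00100000u: return STATUS_GENERAL;
  case 0x000C0000u: return STATUS_HARDWARE;
  default: return STATUS_UNKNOWN;
  }
}
```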
- ZDNN_ENABLE_PRECHECK: true/false
  - If set to true, tensor integrity prechecks are run before issuing NNPA operations.
  - Enabling precheck may impact performance.
  - Enable to debug issues which cause hardware exceptions that otherwise would result in abnormal program termination.
- ZDNN_STATUS_DIAG: nnnnnnnn (decimal) or 0xnnnnnnnn (hexadecimal)
  - Prints or produces diagnostic information whenever zDNN status code is equal to the specified value. Only one status value can be specified.
The following are only available when the zDNN library was built with ZDNN_CONFIG_DEBUG enabled.

- ZDNN_LOGLEVEL: off/fatal/error/warn/info/debug/trace
  - Sets logging facility's output level.
- ZDNN_LOGMODULE: module name(s)
  - Produces log output only when the issuer's module name is in the list. You may specify multiple module names by separating them with either commas or spaces.
- Environment variable settings are checked during initial library load by zdnn_init.
- To change environment variable settings afterward, zdnn_init must be called again manually.
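Because ZDNN_STATUS_DIAG accepts either decimal or 0x-prefixed hexadecimal, a parser has to pick the base explicitly. The function below is a hypothetical sketch of such parsing with strtoul; it is not zDNN's own implementation.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical parser for a ZDNN_STATUS_DIAG-style value: "nnnnnnnn" is
 * decimal, "0xnnnnnnnn" is hexadecimal. Returns false unless the whole
 * string is a number that fits in 32 bits. */
bool parse_status_diag(const char *text, uint32_t *out) {
  /* strtoul() would also accept leading blanks and a sign; reject them. */
  if (text == NULL || !isdigit((unsigned char)text[0])) {
    return false;
  }
  int base = (text[0] == '0' && (text[1] == 'x' || text[1] == 'X')) ? 16 : 10;
  errno = 0;
  char *end = NULL;
  unsigned long value = strtoul(text, &end, base); /* base 16 skips "0x" */
  if (errno == ERANGE || *end != '\0' || value > UINT32_MAX) {
    return false;
  }
  *out = (uint32_t)value;
  return true;
}
```

A caller would pass getenv("ZDNN_STATUS_DIAG") to it. Choosing base 10 explicitly, rather than base 0, avoids reading a decimal value with a leading zero as octal.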
- Initialization
- Query
- Get Size
- Initialize pre-transformed tensor descriptor
- Generate transformed tensor descriptor
- Generate concatenated transformed tensor descriptor
- Initialize zTensor
- Initialize zTensor with memory allocate
- Reset zTensor
- Allocate memory for zTensor
- De-allocate memory for zTensor
- Retrieve status message of the status code
- Reshape zTensor
- Check if version is runnable
- Get maximum runnable version
Initialize the zDNN library. This sends an NNPA_QAF to query the NNPA and loads the current environment variable settings.
This needs to be invoked at least once if the zDNN library is statically linked. It is automatically invoked if the zDNN library is dynamically loaded.
void zdnn_init();
None
None
Retrieve the maximum dimension index size value currently supported by the AIU from zDNN's internal memory.
uint32_t zdnn_get_nnpa_max_dim_idx_size();
None
Maximum dimension index size supported by the AIU
Retrieve the maximum tensor size value (number of bytes required for storing a transformed tensor) currently supported by the AIU from zDNN's internal memory.
uint64_t zdnn_get_nnpa_max_tensor_size();
None
Maximum tensor size supported by the AIU
Interrogates the hardware to determine if the NNPA and NNP-internal data type (DLFLOAT16) conversion instructions are installed.
Use this function during application initialization to determine whether the AIU hardware is available.
bool zdnn_is_nnpa_installed();
- None.
true if NNPA and zdnn conversion instructions are installed, false
otherwise.
Query, from zDNN internal memory, if requested NNPA functions are available.
bool zdnn_is_nnpa_function_installed(int count, ...);
- int count - number of NNPA functions to check
- ... (additional arguments) - Function names separated by commas, e.g., NNPA_MUL, NNPA_MIN
NNPA_QAF
NNPA_ADD
NNPA_SUB
NNPA_MUL
NNPA_DIV
NNPA_MIN
NNPA_MAX
NNPA_LOG
NNPA_EXP
NNPA_RELU
NNPA_TANH
NNPA_SIGMOID
NNPA_SOFTMAX
NNPA_BATCHNORMALIZATION
NNPA_MAXPOOL2D
NNPA_AVGPOOL2D
NNPA_LSTMACT
NNPA_GRUACT
NNPA_CONVOLUTION
NNPA_MATMUL_OP
NNPA_MATMUL_OP_BCAST23
true if all queried functions are installed or if count is zero, false
otherwise.
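The count-plus-variadic calling pattern can be modeled with stdarg.h. In this hypothetical sketch a 64-bit mask stands in for the cached NNPA-QAF result and small integers stand in for the NNPA_* function identifiers; it mirrors the documented rule that a count of zero returns true.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a count-based variadic query: each extra argument
 * is a function number (assumed to be in 0..63), and the answer is true
 * only if every queried bit is set in installed_mask. */
bool all_functions_installed(uint64_t installed_mask, int count, ...) {
  va_list args;
  va_start(args, count);
  bool all = true; /* count == 0 leaves this true */
  for (int i = 0; i < count; i++) {
    int fn = va_arg(args, int);
    if (fn < 0 || fn >= 64 || !(installed_mask & (UINT64_C(1) << fn))) {
      all = false;
    }
  }
  va_end(args);
  return all;
}
```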
Query, from zDNN internal memory, if requested parameter block formats are installed.
bool zdnn_is_nnpa_parmblk_fmt_installed(int count, ...);
- int count - number of NNPA parameter block formats to check
- ... (additional arguments) - NNPA parameter block formats separated by commas
NNPA_PARMBLKFORMAT_0
true if all queried formats are installed or if count is zero, false
otherwise.
Query, from zDNN internal memory, if requested NNPA data types are installed.
bool zdnn_is_nnpa_datatype_installed(uint16_t types_bitmask);
- uint16_t types_bitmask - OR'd type bitmasks as defined in zdnn_query_datatypes enum
QUERY_DATATYPE_INTERNAL1
true if all queried data types are installed, false otherwise.
Query, from zDNN internal memory, if requested NNPA data layout formats are installed.
bool zdnn_is_nnpa_layout_fmt_installed(uint32_t layout_bitmask);
- uint32_t layout_bitmask - OR'd layout bitmasks as defined in zdnn_query_layoutfmts enum
QUERY_LAYOUTFMT_4DFEATURE
QUERY_LAYOUTFMT_4DKERNEL
true if all queried data layouts are installed, false otherwise.
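The OR'd bitmask queries reduce to a subset test: every requested bit must also be set in the installed set. The bit values below are hypothetical placeholders, not the values of the zdnn_query_layoutfmts enum.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for illustration only. */
enum {
  DEMO_LAYOUTFMT_4DFEATURE = 1u << 0,
  DEMO_LAYOUTFMT_4DKERNEL = 1u << 1,
};

/* True only if every bit set in `requested` is also set in `installed`. */
bool all_bits_installed(uint32_t installed, uint32_t requested) {
  return (installed & requested) == requested;
}
```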
Query, from zDNN internal memory, if requested NNPA data-type to/from BFP format conversions are installed.
bool zdnn_is_nnpa_conversion_installed(nnpa_data_type type,
uint16_t format_bitmask);
- nnpa_data_type type - NNPA data-type number as defined in nnpa_data_type enum
NNPA_DATATYPE_1
- uint16_t format_bitmask - OR'd BFP format bitmasks as defined in zdnn_query_bfpfmts enum
QUERY_BFPFMT_TINY (FP16)
QUERY_BFPFMT_SHORT (FP32/BFLOAT)
true if all queried conversions are installed, false otherwise.
Retrieve library version number as a 32-bit hex value
(0x00[major][minor][patch]).
uint32_t zdnn_get_library_version();
Library version number in 0x00[major][minor][patch] format.
Retrieve the library version number and build information as a string.
char *zdnn_get_library_version_str();
Library version number and build information as a string.
Refresh zDNN in-memory query result from zAIU.
zdnn_status zdnn_refresh_nnpa_query_result();
None
This is called automatically as a part of zdnn_init and should not need to be
called directly. Manually refreshing query results before making other
zdnn_query_* calls may noticeably impact performance.
Returns:
- ZDNN_OK
- ZDNN_UNAVAILABLE_FUNCTION
Used to determine the buffer size required for the transformed tensor (including
concatenated) in zDNN transformed format. Requires tensor descriptor
(zdnn_tensor_desc) with transformed shape information.
uint64_t zdnn_getsize_ztensor(const zdnn_tensor_desc *tfrmd_desc);
- const zdnn_tensor_desc *tfrmd_desc - Contains transformed information about the shape, layout and data type.
- required buffer size in bytes
Initialize tensor descriptor (zdnn_tensor_desc) struct with pre-transformed
(original) shape information.
void zdnn_init_pre_transformed_desc(zdnn_data_layouts layout,
zdnn_data_types type,
zdnn_tensor_desc *pre_tfrmd_desc, ...);
- zdnn_data_layouts layout - data layout
- zdnn_data_types type - data type
- zdnn_tensor_desc *pre_tfrmd_desc - output zdnn_tensor_desc struct
- ... (additional arguments) - Variadic: number of elements in each dimension in accordance with the layout, in outermost to innermost order
- None
Generate transformed tensor descriptor information based on supplied pre-transformed tensor descriptor.
zdnn_status zdnn_generate_transformed_desc(
const zdnn_tensor_desc *pre_tfrmd_desc, zdnn_tensor_desc *tfrmd_desc);
- const zdnn_tensor_desc *pre_tfrmd_desc - input tensor descriptor with pre-transformed shape information
- zdnn_tensor_desc *tfrmd_desc - output zdnn_tensor_desc struct

Returns:
- ZDNN_OK
- ZDNN_INVALID_LAYOUT - pre-transformed layout is not recognized or is a layout only used for concatenated tensors.
Generate concatenated transformed tensor descriptor information for RNN input-gates tensors based on a supplied pre-transformed tensor descriptor.
zdnn_status zdnn_generate_transformed_desc_concatenated(
const zdnn_tensor_desc *pre_tfrmd_desc,
zdnn_concat_info info, zdnn_tensor_desc *tfrmd_desc);
- const zdnn_tensor_desc *pre_tfrmd_desc - input tensor descriptor with pre-transformed shape information
- zdnn_concat_info info - Information about how the tensors will be concatenated, consisting of the RNN_TYPE, PREV_LAYER and USAGE flags OR'd together:
RNN_TYPE flags:
- RNN_TYPE_LSTM - For LSTM
- RNN_TYPE_GRU - For GRU
PREV_LAYER flags:
- PREV_LAYER_UNI - Previous RNN layer is uni-directional
- PREV_LAYER_NONE - Previous layer is not an RNN layer
- PREV_LAYER_BIDIR - Previous RNN layer is bi-directional
USAGE flags:
- USAGE_WEIGHTS - Concatenate as input weights
- USAGE_HIDDEN_WEIGHTS - Concatenate as input hidden-weights
- USAGE_BIASES - Concatenate as input biases
- USAGE_HIDDEN_BIASES - Concatenate as input hidden-biases
- zdnn_tensor_desc *tfrmd_desc - output zdnn_tensor_desc struct

Returns:
- ZDNN_OK
- ZDNN_INVALID_LAYOUT - pre-transformed layout is not recognized or is not supported for concatenated tensors.
- ZDNN_INVALID_CONCAT_INFO - invalid concatenation information.
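Since zdnn_concat_info is built by OR-ing one flag from each group, each group must occupy its own bits. The sketch below uses hypothetical flag values (the real constants are in zdnn.h) to show composing and decoding an info value, plus the gate count implied by the RNN type: 4 FICO gates for LSTM, 3 ZRH gates for GRU.

```c
#include <stdint.h>

/* Hypothetical flag values for illustration only; each group gets its own
 * nibble so the groups can be OR'd together and masked apart again. */
enum {
  DEMO_RNN_TYPE_LSTM = 0x001,
  DEMO_RNN_TYPE_GRU = 0x002,
  DEMO_RNN_TYPE_MASK = 0x00F,

  DEMO_PREV_LAYER_UNI = 0x010,
  DEMO_PREV_LAYER_NONE = 0x020,
  DEMO_PREV_LAYER_BIDIR = 0x040,
  DEMO_PREV_LAYER_MASK = 0x0F0,

  DEMO_USAGE_WEIGHTS = 0x100,
  DEMO_USAGE_HIDDEN_WEIGHTS = 0x200,
  DEMO_USAGE_BIASES = 0x400,
  DEMO_USAGE_HIDDEN_BIASES = 0x800,
  DEMO_USAGE_MASK = 0xF00,
};

typedef uint32_t demo_concat_info;

/* One flag from each group, OR'd together. */
demo_concat_info make_concat_info(uint32_t rnn_type, uint32_t prev_layer,
                                  uint32_t usage) {
  return rnn_type | prev_layer | usage;
}

uint32_t concat_rnn_type(demo_concat_info info) { return info & DEMO_RNN_TYPE_MASK; }
uint32_t concat_prev_layer(demo_concat_info info) { return info & DEMO_PREV_LAYER_MASK; }
uint32_t concat_usage(demo_concat_info info) { return info & DEMO_USAGE_MASK; }

/* Number of gate data pointers the concatenated tensor expects. */
int concat_gate_count(demo_concat_info info) {
  switch (concat_rnn_type(info)) {
  case DEMO_RNN_TYPE_LSTM: return 4; /* Forget, Input, Cell, Output */
  case DEMO_RNN_TYPE_GRU: return 3;  /* (Z)update, Reset, Hidden */
  default: return 0;
  }
}
```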
Initialize a zdnn_ztensor struct using the pre-transformed and transformed
tensor shape information
void zdnn_init_ztensor(zdnn_tensor_desc *pre_tfrmd_desc,
zdnn_tensor_desc *tfrmd_desc, zdnn_ztensor *output);
- zdnn_tensor_desc *pre_tfrmd_desc - input tensor descriptor with pre-transformed shape information
- zdnn_tensor_desc *tfrmd_desc - input tensor descriptor with transformed shape information
- zdnn_ztensor *output - The zdnn_ztensor struct being initialized.

Returns: None
Same functionality as zdnn_init_ztensor, and computes the size required for
the tensor in the zDNN transformed format and allocates the storage for it. Sets
buffer and buffer_size fields within output.
zdnn_status zdnn_init_ztensor_with_malloc(zdnn_tensor_desc *pre_tfrmd_desc,
zdnn_tensor_desc *tfrmd_desc,
zdnn_ztensor *output);
- zdnn_tensor_desc *pre_tfrmd_desc - input tensor descriptor with pre-transformed shape information
- zdnn_tensor_desc *tfrmd_desc - input tensor descriptor with transformed shape information
- zdnn_ztensor *output - The zdnn_ztensor struct being initialized.

Returns:
- ZDNN_OK
- ZDNN_INVALID_FORMAT - tfrmd_desc->format is not recognized.
- ZDNN_INVALID_TYPE - tfrmd_desc->type is not recognized or is a pre_tfrmd_desc type.
- ZDNN_INVALID_SHAPE - (if any of the following are true)
  - One of tfrmd_desc->dim* dimensions is 0.
  - One of tfrmd_desc->dim* dimensions is greater than zdnn_get_nnpa_max_dim_idx_size.
  - The total number of tfrmd_desc elements is larger than zdnn_get_nnpa_max_tensor_size.
- ZDNN_ALLOCATION_FAILURE - Unable to allocate required memory on a 4K boundary.
Reset a zdnn_ztensor struct for reuse.
Note that this operation neither sets nor resets the buffer and buffer_size fields,
and does not free the transformed area storage.
void zdnn_reset_ztensor(zdnn_ztensor *ztensor);
- zdnn_ztensor *ztensor - The zdnn_ztensor struct being reset.

Returns: None
Calculate the size required for the tensor in the zDNN transformed format and
allocate the needed storage, satisfying alignment requirements. Sets buffer
and buffer_size fields within ztensor.
Note that the calling application assumes ownership of this storage and is responsible for freeing it.
zdnn_status zdnn_allochelper_ztensor(zdnn_ztensor *ztensor);
- zdnn_ztensor *ztensor - A zdnn_ztensor struct that contains the transformed shape information in the transformed_desc field.

Returns:
- ZDNN_OK
- ZDNN_INVALID_FORMAT - ztensor->transformed_desc->format is not recognized.
- ZDNN_INVALID_TYPE - ztensor->transformed_desc->type is not recognized or is a pre_transformed_desc type.
- ZDNN_INVALID_SHAPE - (if any of the following are true)
  - One of ztensor->transformed_desc->dim* dimensions is 0.
  - One of ztensor->transformed_desc->dim* dimensions is greater than zdnn_get_nnpa_max_dim_idx_size.
  - The total number of transformed_desc elements is larger than zdnn_get_nnpa_max_tensor_size.
- ZDNN_ALLOCATION_FAILURE - Unable to allocate required memory on a 4K boundary.
Given an input zdnn_ztensor, zdnn_free_ztensor_buffer will free the transformed area storage associated with it.
Note that the routine does not free the storage allocated for the zdnn_ztensor struct itself.
zdnn_status zdnn_free_ztensor_buffer(const zdnn_ztensor *ztensor);
- const zdnn_ztensor *ztensor - A zdnn_ztensor struct with field buffer pointing to storage allocated.

Returns:
- ZDNN_OK
- ZDNN_INVALID_BUFFER - ztensor->buffer is NULL
Retrieve the status message of the status code.
const char *zdnn_get_status_message(zdnn_status status);
- zdnn_status status - Status code
Pointer to the description string or "(Status string is not defined.)" if
status is not defined.
Reshape and copy buffer content from the source zTensor's buffer to the destination zTensor's buffer in accordance with the destination zTensor's shape.
The following conditions must be satisfied:
- Both tensors' transformed_desc must be fully initialized.
- dest->buffer must be pre-allocated.
- src must be transformed.
- dest must not already be transformed.
- Both transformed_desc->layout must be the same and either NHWC or HWCK.
- Both zTensors must contain an equal number of elements.
zdnn_status zdnn_reshape_ztensor(const zdnn_ztensor *src, zdnn_ztensor *dest);
- src - Source zTensor to copy from
- dest - Destination zTensor to copy to

Programming notes:
- If src and dest have the same transformed_desc->dim1 dimension size, the transformed data is directly copied to the destination without untransformation.
- If src and dest have different transformed_desc->dim1 dimension sizes, reshaping will internally un-transform the source and then re-transform the values into the destination.
Returns:
- ZDNN_OK
- ZDNN_INVALID_SHAPE - (if any of the following are true)
  - src's and dest's transformed_desc->dim* total to different numbers of elements.
  - One of dest->transformed_desc->dim* dimensions is 0.
  - One of dest->transformed_desc->dim* dimensions is greater than zdnn_get_nnpa_max_dim_idx_size.
  - The total number of dest->transformed_desc->dim* elements is larger than zdnn_get_nnpa_max_tensor_size.
- ZDNN_INVALID_LAYOUT - (if any of the following are true)
  - src's and dest's transformed_desc->layout are not the same.
  - transformed_desc->layout is neither ZDNN_NHWC nor ZDNN_HWCK.
  - src->pre_transformed_desc->layout is not recognized or is not a valid pre_transformed_desc layout.
  - dest->pre_transformed_desc->layout is not recognized or is not a valid pre_transformed_desc layout.
- ZDNN_INVALID_STATE - (if any of the following are true)
  - src is not already transformed.
  - dest is already transformed.
- ZDNN_INVALID_FORMAT - src->transformed_desc->format is not ZDNN_FORMAT_4DFEATURE.
- ZDNN_INVALID_TYPE - (if any of the following are true)
  - src->pre_transformed_desc->type is not recognized or is a transformed_desc type.
  - dest->pre_transformed_desc->type is not recognized or is a transformed_desc type.
  - dest->transformed_desc->type is not recognized or is a pre_transformed_desc type.
- ZDNN_INVALID_BUFFER - (if any of the following are true)
  - src->buffer is NULL.
  - src->buffer is not on a 4K boundary.
  - dest->buffer is NULL.
  - dest->buffer is not on a 4K boundary.
  - dest->buffer_size is too small to hold transformed values.
- ZDNN_CONVERT_FAILURE - Values failed to un-transform or transform.
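The precondition and path-selection rules of zdnn_reshape_ztensor can be summarized as a small decision function. This is a hypothetical model using a simplified descriptor struct. It leaves out the state, type, format and buffer checks and is not the library's implementation.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the layout and transformed_desc. */
typedef enum { DEMO_NHWC, DEMO_HWCK, DEMO_OTHER } demo_layout;
typedef struct {
  demo_layout layout;
  uint32_t dim4, dim3, dim2, dim1;
} demo_desc;

typedef enum {
  RESHAPE_INVALID,                /* preconditions not met */
  RESHAPE_DIRECT_COPY,            /* same dim1: copy transformed data as-is */
  RESHAPE_UNTRANSFORM_RETRANSFORM /* different dim1: go through original format */
} reshape_path;

static uint64_t element_count(const demo_desc *d) {
  return (uint64_t)d->dim4 * d->dim3 * d->dim2 * d->dim1;
}

reshape_path choose_reshape_path(const demo_desc *src, const demo_desc *dest) {
  if (src->layout != dest->layout) {
    return RESHAPE_INVALID; /* layouts must match */
  }
  if (src->layout != DEMO_NHWC && src->layout != DEMO_HWCK) {
    return RESHAPE_INVALID; /* only NHWC or HWCK */
  }
  if (element_count(src) != element_count(dest)) {
    return RESHAPE_INVALID; /* element counts must be equal */
  }
  return src->dim1 == dest->dim1 ? RESHAPE_DIRECT_COPY
                                 : RESHAPE_UNTRANSFORM_RETRANSFORM;
}
```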
Check if an application built for zDNN version ver_num can be run on the current
AIU hardware with the installed zDNN library.
bool zdnn_is_version_runnable(uint32_t ver_num);
- uint32_t ver_num - zDNN version number from the application in 0x00[major][minor][patch] form. Typically this is the ZDNN_VERNUM used to compile the application.

Returns: true/false
Returns the maximum zDNN version number that the current hardware and installed
zDNN library can run together. The returned value means the current runtime
environment fully supports the zDNN API set of that major.minor version and
below.
uint32_t zdnn_get_max_runnable_version();
- None
- A 32-bit zDNN version number in 0x00[major][minor]FF form.
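Reading the 0x00[major][minor]FF value literally, an application built with a given ZDNN_VERNUM is covered when its version number does not exceed the returned maximum; the FF patch byte makes every patch level of that major.minor pass. The helper below is a hypothetical sketch of that comparison only; zdnn_is_version_runnable remains the authoritative check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch: max_runnable is in 0x00[major][minor]FF form (as
 * returned by zdnn_get_max_runnable_version()); app_vernum is in
 * 0x00[major][minor][patch] form, e.g. ZDNN_VERNUM. */
bool within_max_runnable(uint32_t app_vernum, uint32_t max_runnable) {
  /* "that major.minor version and below": any patch of the same
   * major.minor is <= ...FF, and any lower version compares smaller. */
  return app_vernum <= max_runnable;
}
```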
zAIU requires the tensor data to be arranged in a format that enhances the performance characteristics of the operations. In this documentation, it is referred to as "transformed format". In addition, data conversions are necessary from the common formats (FP32, FP16, BFLOAT) to the internal format (DLFLOAT16) supported by the AIU. Two functions are provided:
- zdnn_transform_ztensor
  - zdnn_transform_ztensor will transform the input tensor and convert the input data to the format required by the AIU. The resulting transformed ztensor can be reused as many times as necessary.
  - See zdnn_transform_ztensor for details on transforming an input tensor to the internal format.
- zdnn_transform_origtensor
  - zdnn_transform_origtensor transforms a ztensor (usually output from an operation or network) to the format and data types that are usable by the application.
  - See zdnn_transform_origtensor for details on transforming a ztensor back to its original format.
Converts the input tensor to the supported transformed format for execution by
zdnn operations. If transformation is successful, the is_transformed field
within ztensor will be set to true; otherwise, it is set to false.
Transformation will fail if is_transformed was already true.
Note that the tensor layout in memory, once in transformed format, is dependent
on the content of the input tensor's descriptors (zdnn_tensor_desc fields).
Once converted, a zdnn_ztensor should only be manipulated by zDNN API
functions.
zdnn_status zdnn_transform_ztensor(zdnn_ztensor *ztensor, ...);
- zdnn_ztensor *ztensor - The input zdnn_ztensor struct. pre_transformed_desc and transformed_desc must be set, and is_transformed must be false. A 4k-aligned tensor storage must be pre-allocated by the caller (directly or by calling the zDNN allocation helper function), and field buffer must point to the storage.
- ... (additional arguments) - Variadic: list of pointers for input data to be transformed:
  - Non-concatenated: 1 data pointer
  - LSTM concatenated: 4 data pointers, one for each input gate in Forget, Input, Cell, Output (FICO) order
  - GRU concatenated: 3 data pointers, one for each input gate in (Z)update, Reset, Hidden (ZRH) gate order
Programming notes:
- This function clears the per-thread floating-point exception flags at entry, and may set FE_UNDERFLOW/FE_INVALID/FE_INEXACT/FE_OVERFLOW when it encounters errors during data conversion.
Returns:
- ZDNN_OK
- ZDNN_INVALID_FORMAT - ztensor->transformed_desc->format is not recognized.
- ZDNN_INVALID_LAYOUT - (if any of the following are true)
  - ztensor->pre_transformed_desc->layout is not recognized or is not a valid pre_transformed_desc layout.
  - ztensor->transformed_desc->layout is not recognized or is not a valid transformed_desc layout.
- ZDNN_INVALID_TYPE - (if any of the following are true)
  - ztensor->pre_transformed_desc->type is not recognized or is a transformed_desc type.
  - ztensor->transformed_desc->type is not recognized or is a pre_transformed_desc type.
- ZDNN_INVALID_BUFFER - (if any of the following are true)
  - buffer is NULL.
  - buffer is not on a 4K boundary.
  - buffer_size is too small to hold transformed values.
- ZDNN_INVALID_SHAPE - (if any of the following are true)
  - One of ztensor->transformed_desc->dim* dimensions is 0.
  - One of ztensor->transformed_desc->dim* dimensions is greater than zdnn_get_nnpa_max_dim_idx_size.
  - The total number of transformed_desc elements is larger than zdnn_get_nnpa_max_tensor_size.
- ZDNN_INVALID_STATE - Tensor is already transformed.
- ZDNN_CONVERT_FAILURE - Values failed to transform.
Converts the input tensor from the zDNN transformed format back to a standard
non-transformed layout. The is_transformed field within ztensor must be
true.
All stick format tensors are supported, except:
- Kernel tensors
- Concatenated RNN input-gates tensors
zdnn_status zdnn_transform_origtensor(const zdnn_ztensor *ztensor, void *out_buf);
- const zdnn_ztensor *ztensor
  - The input zdnn_ztensor struct. pre_transformed_desc, transformed_desc and buffer must be set, and is_transformed must be true.
- void *out_buf
  - The buffer for storing the standard non-transformed tensor data. Must be pre-allocated by the caller.
- This function clears the per-thread floating-point exception flags at entry, and may set FE_UNDERFLOW/FE_INVALID/FE_INEXACT/FE_OVERFLOW when it encounters errors during data conversion.
Returns (see zDNN Statuses for descriptions)
- ZDNN_OK
- ZDNN_INVALID_FORMAT - ztensor->transformed_desc->format is not ZDNN_FORMAT_4DFEATURE.
- ZDNN_INVALID_LAYOUT - (if any of the following are true)
  - zdnn_ztensor->pre_transformed_desc->layout is not recognized or is not a valid pre_transformed_desc layout.
  - zdnn_ztensor->transformed_desc->layout is not recognized or is not a valid transformed_desc layout required by this function.
- ZDNN_INVALID_TYPE - (if any of the following are true)
  - ztensor->pre_transformed_desc->type is not recognized or is a transformed_desc type.
  - ztensor->transformed_desc->type is not recognized or is a pre_transformed_desc type.
- ZDNN_INVALID_BUFFER - (if any of the following are true)
  - ztensor->buffer is NULL.
  - ztensor->buffer is not on a 4K boundary.
- ZDNN_INVALID_STATE - ztensor is not transformed.
- ZDNN_CONVERT_FAILURE - Values failed to un-transform.
See Table of Contents for operations list
Given two input tensors in zDNN transformed format, performs element-wise addition and stores the result into the provided output zDNN tensor.
Note that for zDNN use, broadcasting of the input tensor(s) must be performed by the caller. As such, the input tensors must be of the same shape.
zdnn_status zdnn_add(const zdnn_ztensor *input_a, const zdnn_ztensor *input_b,
zdnn_ztensor *output);
- zdnn_ztensor *input_a
  - Tensor with addends to add to input_b tensor
  - Must follow general tensor requirements
- zdnn_ztensor *input_b
  - Tensor with addends to add to input_a tensor
  - Must follow general tensor requirements
- zdnn_ztensor *output
  - Tensor to hold the result of the addition
  - Must follow general tensor requirements
Returns (see zDNN Statuses for descriptions)
- ZDNN_OK
- warning statuses
- ZDNN_INVALID_SHAPE
- ZDNN_INVALID_TYPE
- ZDNN_INVALID_FORMAT
- hardware statuses
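Since zdnn_add (like the other element-wise operations) does not broadcast, a smaller operand such as a per-channel bias has to be expanded to the full shape in host memory before its ztensor is transformed. A minimal sketch of that expansion for a row-major [n, c] shape follows; broadcast_rows is a hypothetical helper, and descriptor setup and zdnn_transform_ztensor are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of zDNN): copies a length-c vector into each
 * of the n rows of a row-major [n, c] buffer, so the result has the same
 * shape as the other operand before both are transformed. */
static void broadcast_rows(const float *vec, size_t c, size_t n, float *dst) {
  assert(vec != NULL && dst != NULL);
  for (size_t row = 0; row < n; row++) {
    memcpy(dst + row * c, vec, c * sizeof *vec);
  }
}
```

The expanded buffer is then described with the same pre_transformed_desc shape as input_a and transformed as usual.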
Given two input tensors in zDNN transformed format, performs element-wise subtraction and stores the result into the provided output zDNN tensor.
Note that for zDNN use, broadcasting of the input tensor(s) must be performed by the caller. As such, the input tensors must be of the same shape.
zdnn_status zdnn_sub(const zdnn_ztensor *input_a, const zdnn_ztensor *input_b,
zdnn_ztensor *output);
- zdnn_ztensor *input_a
  - Tensor with minuends from which the input_b tensor is subtracted.
  - Must follow general tensor requirements
- zdnn_ztensor *input_b
  - Tensor with subtrahends to subtract from input_a tensor.
  - Must follow general tensor requirements
- zdnn_ztensor *output
  - Tensor to hold the result of the subtraction
  - Must follow general tensor requirements
Returns (see zDNN Statuses for descriptions)
- ZDNN_OK
- warning statuses
- ZDNN_INVALID_SHAPE
- ZDNN_INVALID_TYPE
- ZDNN_INVALID_FORMAT
- hardware statuses
Given two input tensors in zDNN transformed format, performs element-wise multiplication and stores the result into the provided output zDNN tensor.
Note that for zDNN use, broadcasting of the input tensor(s) must be performed by the caller. As such, the input tensors must be of the same shape.
zdnn_status zdnn_mul(const zdnn_ztensor *input_a, const zdnn_ztensor *input_b,
zdnn_ztensor *output);
- zdnn_ztensor *input_a
  - Tensor with multiplicands that will be multiplied by input_b tensor.
  - Must follow general tensor requirements
- zdnn_ztensor *input_b
  - Tensor with multipliers for input_a tensor.
  - Must follow general tensor requirements
- zdnn_ztensor *output
  - Tensor to hold the result of the multiplication.
  - Must follow general tensor requirements
Returns (see zDNN Statuses for descriptions)
- ZDNN_OK
- warning statuses
- ZDNN_INVALID_SHAPE
- ZDNN_INVALID_TYPE
- ZDNN_INVALID_FORMAT
- hardware statuses
Given two input tensors in zDNN transformed format, performs element-wise division and stores the result into the provided output zDNN tensor.
Note that for zDNN use, broadcasting of the input tensor(s) must be performed by the caller. As such, the input tensors must be of the same shape.
zdnn_status zdnn_div(const zdnn_ztensor *input_a, const zdnn_ztensor *input_b,
zdnn_ztensor *output);
- zdnn_ztensor *input_a
  - Tensor with dividends that will be divided by input_b tensor.
  - Must follow general tensor requirements
- zdnn_ztensor *input_b
  - Tensor with divisors for input_a tensor.
  - Must follow general tensor requirements
- zdnn_ztensor *output
  - Tensor to hold the result of the division.
  - Must follow general tensor requirements
Returns (see zDNN Statuses for descriptions)
- ZDNN_OK
- warning statuses
- ZDNN_INVALID_SHAPE
- ZDNN_INVALID_TYPE
- ZDNN_INVALID_FORMAT
- hardware statuses
Given two input tensors in zDNN transformed format, computes the element-wise minimum and stores the result into the provided output zDNN tensor.
Note that for zDNN use, broadcasting of the input tensor(s) must be performed by the caller. As such, the input tensors must be of the same shape.
zdnn_status zdnn_min(const zdnn_ztensor *input_a, const zdnn_ztensor *input_b,
zdnn_ztensor *output);
- zdnn_ztensor *input_a
  - Tensor with values that will be compared with input_b tensor.
  - Must follow general tensor requirements
- zdnn_ztensor *input_b
  - Tensor with values that will be compared with input_a tensor.
  - Must follow general tensor requirements
- zdnn_ztensor *output
  - Tensor that holds the smaller value from each comparison of the inputs.
  - Must follow general tensor requirements
Returns (see zDNN Statuses for descriptions)
- ZDNN_OK
- warning statuses
- ZDNN_INVALID_SHAPE
- ZDNN_INVALID_TYPE
- ZDNN_INVALID_FORMAT
- hardware statuses
Given two input tensors in zDNN transformed format, computes the element-wise maximum and stores the result into the provided output zDNN tensor.
Note that for zDNN use, broadcasting of the input tensor(s) must be performed by the caller. As such, the input tensors must be of the same shape.
zdnn_status zdnn_max(const zdnn_ztensor *input_a, const zdnn_ztensor *input_b,
zdnn_ztensor *output);
- zdnn_ztensor *input_a
  - Tensor with values that will be compared with input_b tensor.
  - Must follow general tensor requirements
- zdnn_ztensor *input_b
  - Tensor with values that will be compared with input_a tensor.
  - Must follow general tensor requirements
- zdnn_ztensor *output
  - Tensor that holds the larger value from each comparison of the inputs.
  - Must follow general tensor requirements
Returns (see zDNN Statuses for descriptions)
- ZDNN_OK
- warning statuses
- ZDNN_INVALID_SHAPE
- ZDNN_INVALID_TYPE
- ZDNN_INVALID_FORMAT
- hardware statuses
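When validating results from zdnn_add, zdnn_sub, zdnn_mul, zdnn_div, zdnn_min or zdnn_max, a plain-C reference over the untransformed data can help. The sketch below is one such reference under stated assumptions: ref_binop, ref_elementwise and nearly_equal are hypothetical names, operand order follows the argument descriptions above (input_a is the minuend or dividend), and because the accelerator computes in the 2-byte DLFLOAT16 format, results should be compared with a tolerance rather than exactly. NaN handling in the min/max branches is not meant to match the hardware.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical op selector for a host-side reference; not a zDNN type. */
typedef enum { REF_ADD, REF_SUB, REF_MUL, REF_DIV, REF_MIN, REF_MAX } ref_binop;

/* Element-wise reference over two same-shape buffers of n elements. */
static void ref_elementwise(ref_binop op, const float *a, const float *b,
                            float *out, size_t n) {
  assert(a != NULL && b != NULL && out != NULL);
  for (size_t i = 0; i < n; i++) {
    float x = a[i], y = b[i];
    switch (op) {
      case REF_ADD: out[i] = x + y; break;
      case REF_SUB: out[i] = x - y; break;   /* input_a minus input_b */
      case REF_MUL: out[i] = x * y; break;
      case REF_DIV: out[i] = x / y; break;   /* input_a divided by input_b */
      case REF_MIN: out[i] = (x < y) ? x : y; break;
      case REF_MAX: out[i] = (x > y) ? x : y; break;
    }
  }
}

/* Relative-tolerance comparison for accelerator results, which pass through
 * the 2-byte DLFLOAT16 format; choosing rel_tol is up to the caller. */
static int nearly_equal(float got, float want, float rel_tol) {
  float scale = fabsf(want) > 1.0f ? fabsf(want) : 1.0f;
  return fabsf(got - want) <= rel_tol * scale;
}
```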
Given an input tensor in zDNN transformed format, computes the natural logarithm element-wise and stores the result into the provided output zDNN tensor.
zdnn_status zdnn_log(const zdnn_ztensor *input, zdnn_ztensor *output);
- zdnn_ztensor *input
  - Tensor with values to evaluate.
  - Must follow general tensor requirements
- zdnn_ztensor *output
  - Tensor that holds the calculated natural logarithm of each value from input
  - Must follow general tensor requirements
Returns (see zDNN Statuses for descriptions)
- ZDNN_OK
- warning statuses
- ZDNN_INVALID_SHAPE
- ZDNN_INVALID_TYPE
- ZDNN_INVALID_FORMAT
- hardware statuses
Given an input tensor in zDNN transformed format, computes the exponential element-wise and stores the result into the provided output zDNN tensor.
zdnn_status zdnn_exp(const zdnn_ztensor *input, zdnn_ztensor *output);
- zdnn_ztensor *input
  - Tensor with values to evaluate.
  - Must follow general tensor requirements
- zdnn_ztensor *output
  - Tensor that holds the calculated exponential of each value from input
  - Must follow general tensor requirements
Returns (see zDNN Statuses for descriptions)
- ZDNN_OK
- warning statuses
- ZDNN_INVALID_SHAPE
- ZDNN_INVALID_TYPE
- ZDNN_INVALID_FORMAT
- hardware statuses
Given an input tensor in zDNN transformed format, produces an output tensor where the rectified linear function, y = max(0, x), is applied to the input element-wise. If an optional clipping_value is provided, clipping is performed against the intermediate output as z = min(y, clipping_value).
zdnn_status zdnn_relu(const zdnn_ztensor *input, const void *clipping_value,
zdnn_ztensor *output);
- zdnn_ztensor *input
  - Tensor with values to evaluate.
  - Must follow general tensor requirements
- void *clipping_value
  - A pointer to an FP32 value, used to clip input tensor's elements.
  - If the pointer is NULL or the value it points to is 0, no clipping occurs.
  - Must not be a negative value.
- zdnn_ztensor *output
  - Tensor that holds the rectified linear function result of each value from input
  - Must follow general tensor requirements
Returns (see zDNN Statuses for descriptions)
- ZDNN_OK
- warning statuses
- ZDNN_INVALID_SHAPE
- ZDNN_INVALID_TYPE
- ZDNN_INVALID_FORMAT
- ZDNN_INVALID_CLIPPING_VALUE
- hardware statuses
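A host-side sketch of the documented clipping semantics, assuming FP32 data: ReLU is applied first, then min(y, clipping_value) only when the pointer is non-NULL and the value is non-zero. ref_relu is a hypothetical name; it takes a const float * where zdnn_relu takes a void pointer, and it asserts on a negative clipping value where zdnn_relu instead returns ZDNN_INVALID_CLIPPING_VALUE.

```c
#include <assert.h>
#include <stddef.h>

/* y = max(0, x); when clip is non-NULL and *clip != 0, z = min(y, *clip). */
static void ref_relu(const float *in, const float *clip, float *out, size_t n) {
  float limit = (clip != NULL) ? *clip : 0.0f;
  /* zdnn_relu reports ZDNN_INVALID_CLIPPING_VALUE here instead. */
  assert(limit >= 0.0f);
  for (size_t i = 0; i < n; i++) {
    float y = (in[i] > 0.0f) ? in[i] : 0.0f;
    out[i] = (limit != 0.0f && y > limit) ? limit : y;
  }
}
```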
Given an input tensor in zDNN transformed format, produces an output tensor where the hyperbolic tangent is applied to the input element-wise.
zdnn_status zdnn_tanh(const zdnn_ztensor *input, zdnn_ztensor *output);
- zdnn_ztensor *input
  - Tensor with values to evaluate.
  - Must follow general tensor requirements
- zdnn_ztensor *output
  - Tensor that holds the hyperbolic tangent result of each value from input
  - Must follow general tensor requirements
Returns (see zDNN Statuses for descriptions)
- ZDNN_OK
- warning statuses
- ZDNN_INVALID_SHAPE
- ZDNN_INVALID_TYPE
- ZDNN_INVALID_FORMAT
- hardware statuses
Given an input tensor in zDNN transformed format, produces an output tensor where the sigmoid function is applied to the input element-wise.
zdnn_status zdnn_sigmoid(const zdnn_ztensor *input, zdnn_ztensor *output);
- zdnn_ztensor *input
  - Tensor with values to evaluate.
  - Must follow general tensor requirements
- zdnn_ztensor *output
  - Tensor that holds the sigmoid result of each value from input
  - Must follow general tensor requirements
Returns (see zDNN Statuses for descriptions)
- ZDNN_OK
- warning statuses
- ZDNN_INVALID_SHAPE
- ZDNN_INVALID_TYPE
- ZDNN_INVALID_FORMAT
- hardware statuses
Given an input tensor in zDNN transformed format, computes the softmax
(normalized exponential) for each vector formed in dimension-1, then if
act_func is not SOFTMAX_ACT_NONE, the activation function is applied to the
results. Finally stores the results into the provided output zDNN tensor.
Note: Other parameters, such as axis, are not supported.
zdnn_status zdnn_softmax(const zdnn_ztensor *input, void *save_area,
zdnn_softmax_act act_func, zdnn_ztensor *output);
- zdnn_ztensor *input
  - ZDNN_3DS tensor with pre-transformed shape [batch size, batch size, vector dimension size] or output from another operation that is of the correct shape.
  - Must follow general tensor requirements
- void *save_area
  - A preallocated memory address to use for temporary storage during internal operation processing.
  - The preallocated memory must be at least 8K bytes in size, aligned on a 4K boundary.
  - If set to NULL, the operation will determine, allocate and free storage automatically.
- zdnn_softmax_act act_func
  - Activation function to apply to the results.
  - SOFTMAX_ACT_NONE or SOFTMAX_ACT_LOG
- zdnn_ztensor *output
  - ZDNN_3DS tensor with the same shape as input that holds the softmax result of each value from input.
  - Must follow general tensor requirements
- If all elements of a dimension 1 vector are the largest magnitude negative number possible for the transformed data type, accuracy may be reduced.
- A ZDNN_3DS tensor is expected, where the transformed_desc dim1 describes the vector, and dim2 and dim4 are used to batch multiple vector requests together. Dim3 must always be 1. The zdnn_softmax operation is performed against the vector in dim1, repeating for each dim1 vector in the dim4 and dim2 dimensions.
- Tensors that cannot be processed as vectors in dim1 or as batches of dim1 vectors must be coerced or reshaped by the caller.
  - When the entire tensor is to be processed by softmax, it can be coerced by simply creating an alternate descriptor prior to zDNN transformation. For example:
    - A 4D tensor with pre_transformed_desc dimensions 2x2x2x2 and a data array of 16 FP32 entries could have an alternate ZDNN_3DS layout pre_transformed_desc using dimensions 1x1x16 and use the same original data array prior to zdnn_transform_ztensor. After transformation, such a tensor would be valid for zdnn_softmax.
    - In another example, the 4D 2x2x2x2 tensor could be processed as 2 vectors of 8 elements each, using a ZDNN_3DS layout pre_transformed_desc with dimensions 1x2x8.
Returns (see zDNN Statuses for descriptions)
- ZDNN_OK
- warning statuses
- ZDNN_INVALID_SHAPE
- ZDNN_INVALID_TYPE
- ZDNN_INVALID_FORMAT
- ZDNN_ALLOCATION_FAILURE - A preallocated save_area was not specified and internal allocation for the required memory failed.
- hardware statuses
- ZDNN_FUNC_RC_F000 - input tensor input->transformed_desc->dim3 was not 1.
- ZDNN_FUNC_RC_F001 - Invalid act_func
Given an input tensor in zDNN transformed format, produces a downsampled tensor reducing the middle dimensions to a size of 1 based on the mean of the original values and stores the result to the provided output zDNN tensor.
zdnn_status zdnn_meanreduce2d(const zdnn_ztensor *input, zdnn_ztensor *output);
- zdnn_ztensor *input
  - Must be a ZDNN_NHWC tensor with pre_transformed shape [batch_Num, Height, Width, Channel].
  - Height and Width dimensions must be less than or equal to 1024.
  - Must follow general tensor requirements
- zdnn_ztensor *output
  - The result tensor which will hold the result of the pooling operation in its buffer.
  - Shape:
    - output dimensions batch_Num and Channel must be the same as the respective input dimensions.
    - output dimensions Height and Width must be 1.
  - Must follow general tensor requirements
Returns (see zDNN Statuses for descriptions)
- ZDNN_OK
- ZDNN_INVALID_SHAPE - Shape of input or output tensor is invalid (see the input and output shape requirements above)
- ZDNN_INVALID_TYPE
- ZDNN_INVALID_FORMAT
- hardware statuses
- ZDNN_FUNC_RC_F001 - input tensor has a Height or Width dimension greater than allowed for zdnn_meanreduce2d.
Framework equivalent: TensorFlow Reduce Mean with axis set to the Height and Width axes and
keepdims set to True.
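A plain-C reference for the mean reduction, assuming a row-major NHWC buffer and producing the [batch_Num, 1, 1, Channel] result as n * c floats. ref_meanreduce2d is a hypothetical name; the 1024 limit on Height and Width is mirrored as an assertion.

```c
#include <assert.h>
#include <stddef.h>

/* Mean over Height and Width of a row-major NHWC buffer [n][h][w][c];
 * out receives n * c values laid out as [n][1][1][c]. */
static void ref_meanreduce2d(const float *in, float *out, size_t n, size_t h,
                             size_t w, size_t c) {
  assert(in != NULL && out != NULL);
  assert(h > 0 && w > 0 && h <= 1024 && w <= 1024);
  for (size_t b = 0; b < n; b++) {
    for (size_t ch = 0; ch < c; ch++) {
      float sum = 0.0f;
      for (size_t y = 0; y < h; y++) {
        for (size_t x = 0; x < w; x++) {
          sum += in[((b * h + y) * w + x) * c + ch];
        }
      }
      out[b * c + ch] = sum / (float)(h * w);
    }
  }
}
```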
Given three input zDNN tensors input_a, input_b, and input_c, computes the
batch-normalized result for each vector formed in dimension-1 as follows:
output = input_b * input_a + input_c
where input_b is a precomputed element-wise divide of the scale and variance
tensors, and input_c is precomputed element-wise as (-1) * mean * input_b plus
the input bias tensor.
zdnn_status zdnn_batchnorm(const zdnn_ztensor *input_a,
const zdnn_ztensor *input_b,
const zdnn_ztensor *input_c, zdnn_ztensor *output);
- zdnn_ztensor *input_a
  - Must be a 4D ZDNN_NHWC tensor
  - Must follow general tensor requirements
- zdnn_ztensor *input_b
  - Must be a 1D ZDNN_1D tensor
  - Must follow general tensor requirements
- zdnn_ztensor *input_c
  - Must be a 1D ZDNN_1D tensor
  - Must follow general tensor requirements
- zdnn_ztensor *output
  - A zdnn_ztensor of the same size as input_a representing the computed value of the above formula
  - Must follow general tensor requirements
Returns (see zDNN Statuses for descriptions)
- ZDNN_OK
- warning statuses
- ZDNN_INVALID_SHAPE
- ZDNN_INVALID_TYPE
- ZDNN_INVALID_FORMAT
- hardware statuses
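The precomputation of input_c and the final formula can be sketched per channel, assuming NHWC data whose innermost dimension (C) matches the length of the 1D input_b and input_c tensors. fold_batchnorm_c and ref_batchnorm_nhwc are hypothetical names. How input_b is derived from scale and variance is left to the caller; in the usual batch-normalization formulation it is scale / sqrt(variance + epsilon), but that formula is an assumption here rather than something this document specifies.

```c
#include <assert.h>
#include <stddef.h>

/* input_c = (-1) * mean * input_b + bias, element-wise per channel. */
static void fold_batchnorm_c(const float *mean, const float *b,
                             const float *bias, float *c, size_t channels) {
  assert(mean != NULL && b != NULL && bias != NULL && c != NULL);
  for (size_t k = 0; k < channels; k++) {
    c[k] = -mean[k] * b[k] + bias[k];
  }
}

/* output = input_b * input_a + input_c for an NHWC buffer with `pixels`
 * (N * H * W) positions; b and c are indexed by the innermost C dimension. */
static void ref_batchnorm_nhwc(const float *a, const float *b, const float *c,
                               float *out, size_t pixels, size_t channels) {
  assert(a != NULL && b != NULL && c != NULL && out != NULL);
  for (size_t p = 0; p < pixels; p++) {
    for (size_t k = 0; k < channels; k++) {
      size_t idx = p * channels + k;
      out[idx] = b[k] * a[idx] + c[k];
    }
  }
}
```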
Given three input zDNN tensors input_a, input_b, and input_c, computes the
matrix multiplication input_a * input_b, then performs one of the following
operations using input_c against the dot product, storing the result into
the specified output zDNN tensor:
- Addition
- Compare - If dot product is greater than element.
- Compare - If dot product is greater than or equal to element.
- Compare - If dot product is equal to element.
- Compare - If dot product is not equal to element.
- Compare - If dot product is less than or equal to element.
- Compare - If dot product is less than element.
For an operation type of addition, input_c is added to the intermediate dot
product. For operation types of comparison, the intermediate dot product is
compared to input_c and if the comparison is true, the result is set to a
value of 1; otherwise it is set to a value of 0.
The outermost dimension can optionally indicate that the inputs are stacks of matrices. The results for each matrix stack are independent of other stacks, but all stacks are calculated in a single call.
zdnn_status zdnn_matmul_op(const zdnn_ztensor *input_a,
const zdnn_ztensor *input_b,
const zdnn_ztensor *input_c,
zdnn_matmul_ops op_type, zdnn_ztensor *output);
- See table in this section for pre_transformed_desc and shape requirements for each tensor.
- All tensors must either all be stacked or all be unstacked.
- Must follow general tensor requirements

| type | input_a | input_b | input_c | result |
|---|---|---|---|---|
| unstacked | ZDNN_2D (m, n) | ZDNN_2D (n, p) | ZDNN_1D (p) | ZDNN_2D (m, p) |
| stacked | ZDNN_3DS (s, m, n) | ZDNN_3DS (s, n, p) | ZDNN_2DS (s, p) | ZDNN_3DS (s, m, p) |

- zdnn_ztensor *input_a
  - Input tensor with the first matrix for multiplication
  - pre_transformed shape and layout must match matmul tensor requirements
- zdnn_ztensor *input_b
  - Input tensor with the second matrix for multiplication
  - pre_transformed shape and layout must match matmul tensor requirements
- zdnn_ztensor *input_c
  - Input tensor that will have the requested operation performed against the intermediate dot product of input_a and input_b.
  - pre_transformed shape and layout must match matmul tensor requirements
- zdnn_matmul_ops op_type
  - Operation to perform on dot product.
    - MATMUL_OP_ADDITION
    - MATMUL_OP_GREATER
    - MATMUL_OP_GREATER_EQUAL
    - MATMUL_OP_EQUAL
    - MATMUL_OP_NOT_EQUAL
    - MATMUL_OP_LESSER_EQUAL
    - MATMUL_OP_LESSER
- zdnn_ztensor *output
  - The output tensor which will hold the result of the operation in its buffer.
  - pre_transformed shape and layout must match matmul tensor requirements
- Care must be taken when comparing values for equality or inequality: the order of operations and rounding may produce slightly different values where the exact results would be the same.
Returns (see zDNN Statuses for descriptions)
- ZDNN_OK
- ZDNN_INVALID_SHAPE
- ZDNN_INVALID_TYPE
- ZDNN_INVALID_FORMAT
- hardware statuses
- ZDNN_FUNC_RC_F000 - Invalid op_type.
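A plain-C reference for the unstacked case of zdnn_matmul_op, following the shape table above: input_a is (m, n), input_b is (n, p), and element j of input_c (p) is applied to column j of every output row. ref_mm_op and ref_matmul_op are hypothetical names; the enum values only mirror the MATMUL_OP_* list and are not the zDNN enum.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical op codes mirroring the MATMUL_OP_* list (not the zDNN enum). */
typedef enum { REF_OP_ADD, REF_OP_GT, REF_OP_GE, REF_OP_EQ, REF_OP_NE,
               REF_OP_LE, REF_OP_LT } ref_mm_op;

/* Unstacked case: a is [m][n], b is [n][p], c is [p], out is [m][p].
 * Comparisons store 1.0f when true and 0.0f when false. */
static void ref_matmul_op(const float *a, const float *b, const float *c,
                          ref_mm_op op, float *out,
                          size_t m, size_t n, size_t p) {
  assert(a != NULL && b != NULL && c != NULL && out != NULL);
  for (size_t i = 0; i < m; i++) {
    for (size_t j = 0; j < p; j++) {
      float dot = 0.0f;
      for (size_t k = 0; k < n; k++) dot += a[i * n + k] * b[k * p + j];
      float e = c[j];  /* input_c element for output column j */
      float r;
      switch (op) {
        case REF_OP_ADD: r = dot + e; break;
        case REF_OP_GT:  r = (dot >  e) ? 1.0f : 0.0f; break;
        case REF_OP_GE:  r = (dot >= e) ? 1.0f : 0.0f; break;
        case REF_OP_EQ:  r = (dot == e) ? 1.0f : 0.0f; break;
        case REF_OP_NE:  r = (dot != e) ? 1.0f : 0.0f; break;
        case REF_OP_LE:  r = (dot <= e) ? 1.0f : 0.0f; break;
        default:         r = (dot <  e) ? 1.0f : 0.0f; break;  /* REF_OP_LT */
      }
      out[i * p + j] = r;
    }
  }
}
```

For the stacked case, the same loop runs once per stack s, offsetting input_a, input_b and output by one matrix and using row s of the (s, p) input_c. The exact == and != comparisons in this reference show why the caution above applies: two computations that should agree can differ in the last bits.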
Given three input zDNN tensors input_a, input_b, and input_c, computes the
matrix multiplication input_a * input_b, then performs one of the following
operations using input_c against the dot product, storing the result into
the specified output zDNN tensor:
- Addition
The outermost dimension for input_a can optionally indicate that the input is
a stack of matrices. Each stack of input_a is then multiplied by the same
input_b matrix, and the same input_c is applied to each result; both input_b
and input_c are broadcast over every stack of input_a. Results for each stack
are returned in the corresponding stack index of output.
zdnn_status zdnn_matmul_bcast_op(const zdnn_ztensor *input_a,
const zdnn_ztensor *input_b,
const zdnn_ztensor *input_c,
zdnn_matmul_bcast_ops op_type, zdnn_ztensor *output);
- See table in this section for pre_transformed_desc and shape requirements for each tensor.
- Must follow general tensor requirements

| input_a | input_b | input_c | result |
|---|---|---|---|
| ZDNN_3DS (s, m, n) | ZDNN_2D (n, p) | ZDNN_1D (p) | ZDNN_3DS (s, m, p) |

- zdnn_ztensor *input_a
  - Input tensor with the first matrix for multiplication.
  - pre_transformed shape and layout must match matmul broadcast tensor requirements
- zdnn_ztensor *input_b
  - Input tensor with the second matrix for multiplication.
  - The same single input_b matrix is broadcast and used as the multiplier for each stack dimension of input_a
  - pre_transformed shape and layout must match matmul broadcast tensor requirements
- zdnn_ztensor *input_c
  - Input tensor that will have the requested operation performed against the intermediate dot product for each "m" dimension in output.
  - pre_transformed shape and layout must match matmul broadcast tensor requirements
- zdnn_matmul_bcast_ops op_type
  - Operation to perform on dot product.
    - MATMUL_BCAST_OP_ADDITION
- zdnn_ztensor *output
  - The output tensor which will hold the result of the operation in its buffer.
  - pre_transformed shape and layout must match matmul broadcast tensor requirements
- zdnn_matmul_bcast_ops only supports the MATMUL_BCAST_OP_ADDITION op_type; any other op_types will be ignored and may not operate compatibly in the future.
Returns (see zDNN Statuses for descriptions)
- ZDNN_OK
- ZDNN_INVALID_SHAPE
- ZDNN_INVALID_TYPE
- ZDNN_INVALID_FORMAT
- hardware statuses
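The broadcast variant differs from zdnn_matmul_op in that one input_b matrix and one input_c vector are reused for every stack of input_a. A plain-C sketch of MATMUL_BCAST_OP_ADDITION under the shapes in the table above; ref_matmul_bcast_add is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* input_a: [s][m][n]; input_b: [n][p] and input_c: [p] are shared by every
 * stack; out: [s][m][p]. Models MATMUL_BCAST_OP_ADDITION only. */
static void ref_matmul_bcast_add(const float *a, const float *b,
                                 const float *c, float *out,
                                 size_t s, size_t m, size_t n, size_t p) {
  assert(a != NULL && b != NULL && c != NULL && out != NULL);
  for (size_t st = 0; st < s; st++) {
    const float *as = a + st * m * n;  /* this stack of input_a */
    float *os = out + st * m * p;      /* matching stack of output */
    for (size_t i = 0; i < m; i++) {
      for (size_t j = 0; j < p; j++) {
        float dot = 0.0f;
        for (size_t k = 0; k < n; k++) dot += as[i * n + k] * b[k * p + j];
        os[i * p + j] = dot + c[j];
      }
    }
  }
}
```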
Implements the Long Short-Term Memory layer (LSTM, Hochreiter 1997).
The following formulas are computed for the input tensor input(t) for all time steps
(Default: f=Sigmoid, g=Tanh, h=Tanh):
- it = f(Xt*(Wi^T) + Ht-1*(Ri^T) + Wbi + Rbi)
- ft = f(Xt*(Wf^T) + Ht-1*(Rf^T) + Wbf + Rbf)
- ct = g(Xt*(Wc^T) + Ht-1*(Rc^T) + Wbc + Rbc)
- Ct = ft (.) Ct-1 + it (.) ct
- ot = f(Xt*(Wo^T) + Ht-1*(Ro^T) + Wbo + Rbo)
- Ht = ot (.) h(Ct)
zdnn_status zdnn_lstm(const zdnn_ztensor *input, const zdnn_ztensor *h0,
const zdnn_ztensor *c0, const zdnn_ztensor *weights,
const zdnn_ztensor *biases,
const zdnn_ztensor *hidden_weights,
const zdnn_ztensor *hidden_biases,
lstm_gru_direction direction, void *work_area,
zdnn_ztensor *hn_output, zdnn_ztensor *cf_output);
Also see an example in the usage example section.
- zdnn_ztensor *input
  - Input must be a tensor with the shape (num_timesteps, num_batches, num_features) prior to transformation with the zdnn_transform_ztensor API.
  - Expects pre_transformed_desc->layout to be ZDNN_3DS.
  - Must follow general tensor requirements
- zdnn_ztensor *h0
  - Tensor containing the initial hidden state with shape (num_dirs, num_batches, num_hidden) prior to transformation with the zdnn_transform_ztensor API.
  - Expects pre_transformed_desc->layout to be ZDNN_3DS.
  - Must follow general tensor requirements
  - Must follow num_hidden requirements
- zdnn_ztensor *c0
  - Tensor containing the initial cell state with shape (num_dirs, num_batches, num_hidden) prior to transformation with the zdnn_transform_ztensor API.
  - Expects pre_transformed_desc->layout to be ZDNN_3DS.
  - Must follow general tensor requirements
  - Must follow num_hidden requirements
- zdnn_ztensor *weights
  - Tensor containing the concatenated input connection weights in Forget, Input, Cell, Output (FICO) order.
  - Prior to transformation, each gate needs to be transposed to shape (num_dirs, num_features, num_hidden) by the caller.
  - Expects pre_transformed_desc->layout to be ZDNN_3DS.
  - Expects zdnn_concat_info having the following flags turned on:
    - RNN_TYPE_LSTM
    - USAGE_WEIGHTS
    - Appropriate PREV_LAYER flag:
      - PREV_LAYER_NONE if input tensor is not from a previous RNN layer
      - PREV_LAYER_UNI if input tensor is uni-directional output from a previous RNN layer
      - PREV_LAYER_BIDIR if input tensor is bi-directional output from a previous RNN layer
  - Must follow concatenated tensor requirements
  - Must follow num_hidden requirements
- zdnn_ztensor *biases
  - Tensor containing the concatenated input connection bias in Forget, Input, Cell, Output (FICO) order.
  - Prior to transformation, each gate is expected to have shape (num_dirs, num_hidden).
  - Expects pre_transformed_desc->layout to be ZDNN_2DS.
  - Expects zdnn_concat_info having the following flags turned on:
    - RNN_TYPE_LSTM
    - USAGE_BIASES
    - Appropriate PREV_LAYER flag:
      - PREV_LAYER_NONE if input tensor is not from a previous RNN layer
      - PREV_LAYER_UNI if input tensor is uni-directional output from a previous RNN layer
      - PREV_LAYER_BIDIR if input tensor is bi-directional output from a previous RNN layer
  - Must follow concatenated tensor requirements
  - Must follow num_hidden requirements
- zdnn_ztensor *hidden_weights
  - Tensor containing the concatenated hidden connection weights in Forget, Input, Cell, Output (FICO) order.
  - Prior to transformation, each gate needs to be transposed to shape (num_dirs, num_hidden, num_hidden) by the caller.
  - Expects pre_transformed_desc->layout to be ZDNN_3DS.
  - Expects zdnn_concat_info having the following flags turned on:
    - RNN_TYPE_LSTM
    - USAGE_HIDDEN_WEIGHTS
    - Appropriate PREV_LAYER flag:
      - PREV_LAYER_NONE if input tensor is not from a previous RNN layer
      - PREV_LAYER_UNI if input tensor is uni-directional output from a previous RNN layer
      - PREV_LAYER_BIDIR if input tensor is bi-directional output from a previous RNN layer
  - Must follow concatenated tensor requirements
  - Must follow num_hidden requirements
- zdnn_ztensor *hidden_biases
  - Tensor containing the concatenated hidden connection bias in Forget, Input, Cell, Output (FICO) order.
  - Prior to transformation, each gate is expected to have shape (num_dirs, num_hidden).
  - Expects pre_transformed_desc->layout to be ZDNN_2DS.
  - Expects zdnn_concat_info having the following flags turned on:
    - RNN_TYPE_LSTM
    - USAGE_HIDDEN_BIASES
    - Appropriate PREV_LAYER flag:
      - PREV_LAYER_NONE if input tensor is not from a previous RNN layer
      - PREV_LAYER_UNI if input tensor is uni-directional output from a previous RNN layer
      - PREV_LAYER_BIDIR if input tensor is bi-directional output from a previous RNN layer
  - Must follow concatenated tensor requirements
  - Must follow num_hidden requirements
- lstm_gru_direction direction
  - Direction indicator of lstm_gru_direction direction type. Valid values:
    - FWD (forward)
    - BWD (backward)
    - BIDIR (bi-directional)
  - For input and output shapes, the num_dirs dimension should be:
    - 1 for unidirectional calls such as FWD or BWD
    - 2 for bidirectional calls such that:
      - dimension 0 contains FWD values.
      - dimension 1 contains BWD values.
- void *work_area
  - A preallocated memory address to use for temporary storage during internal operation processing.
  - If set to NULL, the operation will determine, allocate and free storage automatically.
  - Amount of required storage can be determined given the LSTM timestep, batch, and num_hidden values.
    - The sample code below creates a ztensor descriptor that is an equivalent size of the required work_area. To use this sample code yourself, replace the num_timesteps, num_batches, and num_hidden variables with your own values.

      ```c
      zdnn_tensor_desc desc;
      desc.dim4 = (4 * num_timesteps) + 6;
      desc.dim3 = 1;
      desc.dim2 = num_batches;
      desc.dim1 = num_hidden;
      uint64_t work_area_size = zdnn_getsize_ztensor(&desc);
      ```

  - For bidirectional, twice the amount of contiguous storage is required.
  - The start of the buffer must be 4K aligned.
- zdnn_ztensor *hn_output
  - Output results of the hidden states
  - Expects pre_transformed_desc->layout to be ZDNN_4DS.
  - Must follow general tensor requirements
  - Must follow num_hidden requirements
  - Output pre-transformed shapes:
    - all timesteps: (num_timesteps, num_dirs, num_batches, num_hidden)
    - final timestep only: (1, num_dirs, num_batches, num_hidden)
  - For bidirectional (BIDIR) output:
    - Forward and backward results are concatenated on the innermost dimension.
    - Can be used directly as input for subsequent RNN layers without needing untransformation.
    - Cannot be used directly as input for other non-RNN zDNN ops.
    - Untransformation is supported.
  - Note that for BWD and the backward component of BIDIR directions, the output order matches the order of the input, not the processing order. For example, the first input timestep is the last to be processed and its result is the first timestep of the output.
- zdnn_ztensor *cf_output
  - Output results of the cell state for the last processed timestep
  - Expects pre_transformed_desc->layout to be ZDNN_4DS.
  - Must follow general tensor requirements
  - Must follow num_hidden requirements
  - Output pre-transformed shapes:
    - (1, num_dirs, num_batches, num_hidden)
  - For bidirectional (BIDIR):
    - Forward and backward results are concatenated on the innermost dimension.
    - Cannot be used directly as input for other non-RNN zDNN ops.
    - Untransformation is supported.

| tensor | pre-transformed layout | pre-transformed shape |
|---|---|---|
| input | ZDNN_3DS | (num_timesteps, num_batches, num_features) |
| h0 | ZDNN_3DS | (num_dirs, num_batches, num_hidden) |
| c0 | ZDNN_3DS | (num_dirs, num_batches, num_hidden) |
| weights | ZDNN_3DS | (num_dirs, num_features, num_hidden) |
| biases | ZDNN_2DS | (num_dirs, num_hidden) |
| hidden_weights | ZDNN_3DS | (num_dirs, num_hidden, num_hidden) |
| hidden_biases | ZDNN_2DS | (num_dirs, num_hidden) |
| hn_output | ZDNN_4DS | (num_timesteps, num_dirs, num_batches, num_hidden), or (1, num_dirs, num_batches, num_hidden) for the last timestep only |
| cf_output | ZDNN_4DS | (1, num_dirs, num_batches, num_hidden) |


| tensor | create transformed descriptor via |
|---|---|
| input | zdnn_generate_transformed_desc |
| h0 | zdnn_generate_transformed_desc |
| c0 | zdnn_generate_transformed_desc |
| weights | zdnn_generate_transformed_desc_concatenated - RNN_TYPE_LSTM + USAGE_WEIGHTS + one of the following: PREV_LAYER_NONE/PREV_LAYER_UNI/PREV_LAYER_BIDIR |
| biases | zdnn_generate_transformed_desc_concatenated - RNN_TYPE_LSTM + USAGE_BIASES + one of the following: PREV_LAYER_NONE/PREV_LAYER_UNI/PREV_LAYER_BIDIR |
| hidden_weights | zdnn_generate_transformed_desc_concatenated - RNN_TYPE_LSTM + USAGE_HIDDEN_WEIGHTS + one of the following: PREV_LAYER_NONE/PREV_LAYER_UNI/PREV_LAYER_BIDIR |
| hidden_biases | zdnn_generate_transformed_desc_concatenated - RNN_TYPE_LSTM + USAGE_HIDDEN_BIASES + one of the following: PREV_LAYER_NONE/PREV_LAYER_UNI/PREV_LAYER_BIDIR |
| hn_output | zdnn_generate_transformed_desc |
| cf_output | zdnn_generate_transformed_desc |

Returns (see zDNN Statuses for descriptions)
- ZDNN_OK
- ZDNN_INVALID_TYPE
- ZDNN_INVALID_FORMAT
- ZDNN_INVALID_SHAPE - (if any of the following are not true)
  - hn_output timesteps dimension must be 1 or the same size as input timestep dimension.
  - All tensors with a direction dimension have the same direction dimension size.
  - input timestep dimension must be greater than or equal to 1.
  - Other general shape violations (exceeds MDIS, etc.)
- ZDNN_INVALID_DIRECTION - direction parameter was not a recognized lstm_gru_direction.
- ZDNN_ALLOCATION_FAILURE - A preallocated work_area was not specified and internal allocation for the required memory failed.
- hardware statuses
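Implements the Gated Recurrent Unit (Kyunghyun Cho, 2014). Only the "reset after linear" variant is supported: the reset gate is applied after the linear transformation of the previous hidden state, as shown in the ht formula below.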
The following formulas are computed for the input tensor input(t) for all timesteps. By default f = Sigmoid and g = Tanh, and (.) denotes element-wise multiplication:
- zt = f(Xt*(Wz^T) + Ht-1*(Rz^T) + Wbz + Rbz)
- rt = f(Xt*(Wr^T) + Ht-1*(Rr^T) + Wbr + Rbr)
- ht = g(Xt*(Wh^T) + (rt (.) (Ht-1*(Rh^T) + Rbh)) + Wbh)
- Ht = (1 - zt) (.) ht + zt (.) Ht-1
zdnn_status zdnn_gru(const zdnn_ztensor *input, const zdnn_ztensor *h0,
const zdnn_ztensor *weights, const zdnn_ztensor *biases,
const zdnn_ztensor *hidden_weights,
const zdnn_ztensor *hidden_biases,
lstm_gru_direction direction, void *work_area,
zdnn_ztensor *hn_output);
Also see an example in the usage example section.
- zdnn_ztensor *input
  - Input must be a tensor with the shape (num_timesteps, num_batches, num_features) prior to transformation with the zdnn_transform_ztensor API.
  - Expects pre_transformed_desc->layout to be ZDNN_3DS.
  - Must follow general tensor requirements
- zdnn_ztensor *h0
  - Tensor containing the initial hidden state with shape (num_dirs, num_batches, num_hidden) prior to transformation with the zdnn_transform_ztensor API.
  - Expects pre_transformed_desc->layout to be ZDNN_3DS.
  - Must follow general tensor requirements
  - Must follow num_hidden requirements
- zdnn_ztensor *weights
  - Tensor containing the concatenated input connection weights in update (Z), reset (R), hidden (H) order (ZRH).
  - Prior to transformation, each gate needs to be transposed to shape (num_dirs, num_features, num_hidden) by the caller.
  - Expects pre_transformed_desc->layout to be ZDNN_3DS.
  - Expects zdnn_concat_info having the following flags turned on:
    - RNN_TYPE_GRU
    - USAGE_WEIGHTS
    - Appropriate PREV_LAYER flag:
      - PREV_LAYER_NONE if input tensor is not from a previous RNN layer
      - PREV_LAYER_UNI if input tensor is uni-directional output from a previous RNN layer
      - PREV_LAYER_BIDIR if input tensor is bi-directional output from a previous RNN layer
  - Must follow concatenated tensor requirements
  - Must follow num_hidden requirements
- zdnn_ztensor *biases
  - Tensor containing the concatenated input connection bias in update (Z), reset (R), hidden (H) order (ZRH).
  - Prior to transformation, each gate must have shape (num_dirs, num_hidden).
  - Expects pre_transformed_desc->layout to be ZDNN_2DS.
  - Expects zdnn_concat_info having the following flags turned on:
    - RNN_TYPE_GRU
    - USAGE_BIASES
    - Appropriate PREV_LAYER flag:
      - PREV_LAYER_NONE if input tensor is not from a previous RNN layer
      - PREV_LAYER_UNI if input tensor is uni-directional output from a previous RNN layer
      - PREV_LAYER_BIDIR if input tensor is bi-directional output from a previous RNN layer
  - Must follow concatenated tensor requirements
  - Must follow num_hidden requirements
- zdnn_ztensor *hidden_weights
  - Tensor containing the concatenated hidden connection weights in update (Z), reset (R), hidden (H) order (ZRH).
  - Prior to transformation, each gate needs to be transposed to shape (num_dirs, num_hidden, num_hidden) by the caller.
  - Expects pre_transformed_desc->layout to be ZDNN_3DS.
  - Expects zdnn_concat_info having the following flags turned on:
    - RNN_TYPE_GRU
    - USAGE_HIDDEN_WEIGHTS
    - Appropriate PREV_LAYER flag:
      - PREV_LAYER_NONE if input tensor is not from a previous RNN layer
      - PREV_LAYER_UNI if input tensor is uni-directional output from a previous RNN layer
      - PREV_LAYER_BIDIR if input tensor is bi-directional output from a previous RNN layer
  - Must follow concatenated tensor requirements
  - Must follow num_hidden requirements
- zdnn_ztensor *hidden_biases
  - Tensor containing the concatenated hidden connection bias in update (Z), reset (R), hidden (H) order (ZRH).
  - Prior to transformation, each gate must have shape (num_dirs, num_hidden).
  - Expects pre_transformed_desc->layout to be ZDNN_2DS.
  - Expects zdnn_concat_info having the following flags turned on:
    - RNN_TYPE_GRU
    - USAGE_HIDDEN_BIASES
    - Appropriate PREV_LAYER flag:
      - PREV_LAYER_NONE if input tensor is not from a previous RNN layer
      - PREV_LAYER_UNI if input tensor is uni-directional output from a previous RNN layer
      - PREV_LAYER_BIDIR if input tensor is bi-directional output from a previous RNN layer
  - Must follow concatenated tensor requirements
  - Must follow num_hidden requirements
- lstm_gru_direction direction
  - Direction indicator of lstm_gru_direction type. Valid values:
    - FWD (forward)
    - BWD (backward)
    - BIDIR (bi-directional)
  - For input shapes, the num_dirs dimension should be:
    - 1 for unidirectional calls such as FWD or BWD
    - 2 for bidirectional calls such that:
      - dimension 0 contains FWD values.
      - dimension 1 contains BWD values.
- void *work_area
  - A preallocated memory address to use for temporary storage during internal operation processing.
  - If set to NULL, the operation will determine, allocate and free storage automatically.
  - The amount of required storage can be determined from the GRU timestep, batch, and num_hidden values.
    - The sample code below creates a ztensor descriptor that is equivalent in size to the required work_area. To use this sample code yourself, replace the num_timesteps, num_batches, and num_hidden variables with your own values.

          zdnn_tensor_desc desc;
          desc.dim4 = (3 * num_timesteps) + 5;
          desc.dim3 = 1;
          desc.dim2 = num_batches;
          desc.dim1 = num_hidden;
          uint64_t work_area_size = zdnn_getsize_ztensor(&desc);

  - For bidirectional, twice the amount of contiguous storage is required.
  - The start of the buffer must be 4k aligned.
- zdnn_ztensor *hn_output
  - Output results of the hidden states
  - Expects pre_transformed_desc->layout to be ZDNN_4DS.
  - Must follow general tensor requirements
  - Must follow num_hidden requirements
  - Output pre-transformed shapes:
    - all timesteps: (num_timesteps, num_dirs, num_batches, num_hidden)
    - final timestep only: (1, num_dirs, num_batches, num_hidden)
  - For bidirectional (BIDIR) output:
    - Forward and backward results are concatenated on the innermost dimension.
    - Can be used directly as input for subsequent RNN layers without needing untransformation.
    - Cannot be used directly as input for other non-RNN zDNN ops.
    - Untransformation is supported.
  - Note that for BWD and the backward component of BIDIR directions, the output order matches the order of the input, not the processing order. For example, the first input timestep is the last to be processed, and its result is the first timestep of the output.
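For callers that supply their own work_area, the work_area sizing and alignment rules listed above can be combined as in the following minimal sketch. It assumes a C11 environment with aligned_alloc. The helper alloc_rnn_work_area is hypothetical and not a zDNN API. Its per_dir_size argument stands for the value obtained from zdnn_getsize_ztensor in the descriptor-based sample.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define WORK_AREA_ALIGN 4096u /* "the start of the buffer must be 4k aligned" */

/* Hypothetical helper (not part of zDNN). per_dir_size is the work_area size
 * computed for one direction; BIDIR calls need twice that, contiguously.
 * Returns NULL on failure; release the buffer with free(). */
void *alloc_rnn_work_area(uint64_t per_dir_size, int bidirectional) {
  uint64_t size = bidirectional ? 2 * per_dir_size : per_dir_size;
  if (size == 0)
    return NULL;
  /* aligned_alloc expects the size to be a multiple of the alignment */
  size = (size + WORK_AREA_ALIGN - 1) / WORK_AREA_ALIGN * WORK_AREA_ALIGN;
  void *buf = aligned_alloc(WORK_AREA_ALIGN, (size_t)size);
  assert(buf == NULL || (uintptr_t)buf % WORK_AREA_ALIGN == 0);
  return buf;
}
```

The returned pointer would be passed as the work_area argument of zdnn_gru (or zdnn_lstm) and freed once the call has returned.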
| | pre-transformed layout | pre-transformed shape |
|---|---|---|
| input | ZDNN_3DS | (num_timesteps, num_batches, num_features) |
| h0 | ZDNN_3DS | (num_dirs, num_batches, num_hidden) |
| weights | ZDNN_3DS | (num_dirs, num_features, num_hidden) |
| biases | ZDNN_2DS | (num_dirs, num_hidden) |
| hidden_weights | ZDNN_3DS | (num_dirs, num_hidden, num_hidden) |
| hidden_biases | ZDNN_2DS | (num_dirs, num_hidden) |
| hn_output | ZDNN_4DS | (num_timesteps, num_dirs, num_batches, num_hidden), or (1, num_dirs, num_batches, num_hidden) for the last timestep only |
| | create transformed descriptor via |
|---|---|
| input | zdnn_generate_transformed_desc |
| h0 | zdnn_generate_transformed_desc |
| weights | zdnn_generate_transformed_desc_concatenated - RNN_TYPE_GRU + USAGE_WEIGHTS + one of the following: PREV_LAYER_NONE/PREV_LAYER_UNI/PREV_LAYER_BIDIR |
| biases | zdnn_generate_transformed_desc_concatenated - RNN_TYPE_GRU + USAGE_BIASES + one of the following: PREV_LAYER_NONE/PREV_LAYER_UNI/PREV_LAYER_BIDIR |
| hidden_weights | zdnn_generate_transformed_desc_concatenated - RNN_TYPE_GRU + USAGE_HIDDEN_WEIGHTS + one of the following: PREV_LAYER_NONE/PREV_LAYER_UNI/PREV_LAYER_BIDIR |
| hidden_biases | zdnn_generate_transformed_desc_concatenated - RNN_TYPE_GRU + USAGE_HIDDEN_BIASES + one of the following: PREV_LAYER_NONE/PREV_LAYER_UNI/PREV_LAYER_BIDIR |
| hn_output | zdnn_generate_transformed_desc |
Returns (see zDNN Statuses for descriptions)

- ZDNN_OK
- ZDNN_INVALID_TYPE
- ZDNN_INVALID_FORMAT
- ZDNN_INVALID_SHAPE - (if any of the following are not true)
  - hn_output timesteps dimension must be 1 or the same size as input timestep dimension.
  - All tensors with a direction dimension have the same direction dimension size.
  - input timestep dimension must be greater than or equal to 1.
  - Other general shape violations (exceeds MDIS, etc.)
- ZDNN_INVALID_DIRECTION - direction parameter was not a recognized lstm_gru_direction.
- ZDNN_ALLOCATION_FAILURE - A preallocated work_area was not specified and internal allocation for the required memory failed.
- hardware statuses
Given an input tensor in zDNN transformed format, a padding type, a kernel size and a kernel stride, produces a downsampled tensor by reducing the middle (height and width) dimensions to the mean value within the kernel window at each step, and stores the results in the provided output zDNN tensor.
zdnn_status zdnn_avgpool2d(const zdnn_ztensor *input,
zdnn_pool_padding padding_type,
uint32_t kernel_height, uint32_t kernel_width,
uint32_t stride_height, uint32_t stride_width,
zdnn_ztensor *output);
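The following naive reference sketch illustrates what "mean value within the kernel window" means. It is plain C, not zDNN code, and handles one H x W channel with VALID padding and strides greater than 0. The function name is hypothetical. It uses ordinary float arithmetic and does not model the reduced-precision DLFLOAT16 format used by the accelerator.

```c
#include <assert.h>

/* Hypothetical reference (not zDNN code): VALID-padding average pool over one
 * row-major H x W channel with strides > 0. out must hold out_h * out_w floats,
 * where out_h = ceil((h - kh + 1) / sh) and out_w = ceil((w - kw + 1) / sw). */
void avgpool2d_valid_ref(const float *in, int h, int w, int kh, int kw,
                         int sh, int sw, float *out) {
  assert(kh >= 1 && kw >= 1 && kh <= h && kw <= w && sh > 0 && sw > 0);
  int out_h = (h - kh) / sh + 1; /* == ceil((h - kh + 1) / sh) */
  int out_w = (w - kw) / sw + 1; /* == ceil((w - kw + 1) / sw) */
  for (int oy = 0; oy < out_h; oy++) {
    for (int ox = 0; ox < out_w; ox++) {
      float sum = 0.0f;
      for (int ky = 0; ky < kh; ky++)
        for (int kx = 0; kx < kw; kx++)
          sum += in[(oy * sh + ky) * w + (ox * sw + kx)];
      out[oy * out_w + ox] = sum / (float)(kh * kw); /* mean of the window */
    }
  }
}
```

For example, a 4 x 4 input holding 0..15 with a 2 x 2 kernel and strides of 2 yields the four window means 2.5, 4.5, 10.5 and 12.5.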
- zdnn_ztensor *input
  - Tensor with original values to be downsampled in the output tensor.
  - Must be a ZDNN_NHWC tensor with pre_transformed shape [batch_Num, Height, Width, Channel].
  - See Parameter Restrictions below for information on the expected shape of the input tensor.
  - Must follow general tensor requirements
- padding_type
  - The type of padding to use for the pooling operations.
  - Valid values are SAME_PADDING or VALID_PADDING.
  - See Parameter Restrictions below for information on the expected value of padding_type.
  - For information on "same" vs "valid" padding see: https://www.pico.net/kb/what-is-the-difference-between-same-and-valid-padding-in-tf-nn-max-pool-of-tensorflow.
- kernel_height
  - Size of the kernel window that passes over the input's height dimension.
  - See Parameter Restrictions below for information on the expected value of kernel_height.
- kernel_width
  - Size of the kernel window that passes over the input's width dimension.
  - See Parameter Restrictions below for information on the expected value of kernel_width.
- stride_height
  - Number of positions the kernel moves over the input's height dimension at each step.
  - If stride_height is 0 then stride_width must also be 0.
  - If strides are greater than 0 then stride_height must be less than or equal to 30.
- stride_width
  - Number of positions the kernel moves over the input's width dimension at each step.
  - If stride_width is 0 then stride_height must also be 0.
  - If strides are greater than 0 then stride_width must be less than or equal to 30.
- zdnn_ztensor *output
  - The result tensor, which will hold the result of the pooling operation in its buffer.
  - Must be a ZDNN_NHWC tensor with pre_transformed shape [batch_Num, Height, Width, Channel].
  - See Parameter Restrictions below for information on the expected shape of the output tensor.
  - Must follow general tensor requirements
Parameter restrictions may vary based on provided strides and padding_type.

- Input tensor batch_Num and Channel dimensions must always match the output tensor's respective dimensions.
- If strides are 0:
  - Both input tensor's Height dimension and the kernel_height must match and be less than or equal to 1024.
  - Both input tensor's Width dimension and the kernel_width must match and be less than or equal to 1024.
  - Output tensor's height and width dimensions must be 1.
  - padding_type must be VALID_PADDING.
- If strides are greater than zero:
  - kernel_width and kernel_height must be less than or equal to 64.
  - Neither the input tensor's height nor its width dimension may be greater than 1024.
  - If padding_type is SAME_PADDING:
    - Output tensor's height dimension must equal ceil((float)input's height / stride_height).
    - Output tensor's width dimension must equal ceil((float)input's width / stride_width).
  - If padding_type is VALID_PADDING:
    - Output tensor's height dimension must equal ceil((float)(input's height - kernel_height + 1) / stride_height).
    - Output tensor's width dimension must equal ceil((float)(input's width - kernel_width + 1) / stride_width).
- If the magnitude of difference between elements of input is large (greater than 10), accuracy may be reduced.
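The numeric restrictions above can be condensed into a single pre-flight check. The following minimal sketch is plain C and independent of zDNN. It reports whether a proposed output height and width satisfy those restrictions. The enum pool_pad (standing in for SAME_PADDING/VALID_PADDING) and the function pool2d_params_ok are hypothetical. The sketch mirrors only the documented numeric limits, not the MDIS checks that depend on zdnn_get_nnpa_max_dim_idx_size().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for SAME_PADDING / VALID_PADDING. */
typedef enum { POOL_SAME, POOL_VALID } pool_pad;

static uint32_t ceil_div(uint32_t a, uint32_t b) { return (a + b - 1) / b; }

/* Hypothetical checker: true if out_h/out_w are what the documented
 * restrictions require for the given input, kernel, strides and padding. */
bool pool2d_params_ok(pool_pad pad, uint32_t in_h, uint32_t in_w,
                      uint32_t k_h, uint32_t k_w, uint32_t s_h, uint32_t s_w,
                      uint32_t out_h, uint32_t out_w) {
  assert(pad == POOL_SAME || pad == POOL_VALID);
  if (k_h == 0 || k_w == 0)
    return false;
  if ((s_h == 0) != (s_w == 0))
    return false; /* one stride zero, the other not */
  if (s_h == 0)   /* kernel must cover the whole input */
    return pad == POOL_VALID && in_h == k_h && in_w == k_w && in_h <= 1024 &&
           in_w <= 1024 && out_h == 1 && out_w == 1;
  if (s_h > 30 || s_w > 30 || k_h > 64 || k_w > 64 || in_h > 1024 ||
      in_w > 1024)
    return false;
  if (pad == POOL_SAME)
    return out_h == ceil_div(in_h, s_h) && out_w == ceil_div(in_w, s_w);
  if (k_h > in_h || k_w > in_w)
    return false; /* VALID formula would not give a positive extent */
  return out_h == ceil_div(in_h - k_h + 1, s_h) &&
         out_w == ceil_div(in_w - k_w + 1, s_w);
}
```

For a 32 x 32 input with a 3 x 3 kernel and strides of 2, the expected output is 16 x 16 with SAME padding and 15 x 15 with VALID padding. zdnn_maxpool2d states the same restrictions, so the same check applies there.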
Returns (see zDNN Statuses for descriptions)

- ZDNN_OK
- ZDNN_INVALID_SHAPE
  - Shape of input or output tensor is invalid based on given kernel and stride parameters
  - Other general shape violations (exceeds MDIS, etc.)
- ZDNN_INVALID_TYPE
- ZDNN_INVALID_FORMAT
- ZDNN_INVALID_STRIDE_PADDING
- ZDNN_INVALID_STRIDES - One stride was non-zero, but not the other.
- hardware statuses
  - ZDNN_EXCEEDS_MDIS will also occur if any of the following conditions occur:
    - stride_height is larger than zdnn_get_nnpa_max_dim_idx_size.
    - stride_width is larger than zdnn_get_nnpa_max_dim_idx_size.
    - kernel_height is 0 or is larger than zdnn_get_nnpa_max_dim_idx_size.
    - kernel_width is 0 or is larger than zdnn_get_nnpa_max_dim_idx_size.
  - ZDNN_FUNC_RC_F000 - Invalid padding_type
  - ZDNN_FUNC_RC_F001 - stride_height = 0 and stride_width = 0, but a kernel parameter is greater than allowed (see kernel_height or kernel_width above)
  - ZDNN_FUNC_RC_F002 - stride_height > 0 and stride_width > 0, but a kernel parameter is greater than allowed (see kernel_height or kernel_width above)
  - ZDNN_FUNC_RC_F003 - stride_height > 0 and stride_width > 0, but a stride parameter is greater than allowed (see stride_height or stride_width above)
  - ZDNN_FUNC_RC_F004 - stride_height > 0 and stride_width > 0, but either the input tensor's height or width dimension is greater than 1024.
Given an input tensor in zDNN transformed format, a padding type, a kernel size and a kernel stride, produces a downsampled tensor by reducing the middle (height and width) dimensions to the maximum value within the kernel window at each step, and stores the results in the provided output zDNN tensor.
zdnn_status zdnn_maxpool2d(const zdnn_ztensor *input,
zdnn_pool_padding padding_type,
uint32_t kernel_height, uint32_t kernel_width,
uint32_t stride_height, uint32_t stride_width,
zdnn_ztensor *output);
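The following naive reference sketch shows the maximum-value reduction on a full NHWC buffer. It also makes visible that batch_Num and Channel pass through unchanged while only Height and Width shrink. It is plain C rather than zDNN code, covers VALID padding with strides greater than 0 only, and uses a hypothetical function name.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical reference (not zDNN code): VALID-padding max pool over an NHWC
 * float buffer [n][h][w][c] with strides > 0. out is NHWC [n][oh][ow][c] with
 * oh = ceil((h - kh + 1) / sh) and ow = ceil((w - kw + 1) / sw). */
void maxpool2d_valid_nhwc_ref(const float *in, int n, int h, int w, int c,
                              int kh, int kw, int sh, int sw, float *out) {
  assert(kh >= 1 && kw >= 1 && kh <= h && kw <= w && sh > 0 && sw > 0);
  int oh = (h - kh) / sh + 1;
  int ow = (w - kw) / sw + 1;
  for (int b = 0; b < n; b++)          /* batch passes through */
    for (int oy = 0; oy < oh; oy++)
      for (int ox = 0; ox < ow; ox++)
        for (int ch = 0; ch < c; ch++) { /* channels pass through */
          float m = -FLT_MAX;
          for (int ky = 0; ky < kh; ky++)
            for (int kx = 0; kx < kw; kx++) {
              float v = in[((b * h + oy * sh + ky) * w + ox * sw + kx) * c + ch];
              if (v > m)
                m = v;
            }
          out[((b * oh + oy) * ow + ox) * c + ch] = m;
        }
}
```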
- zdnn_ztensor *input
  - Tensor with original values to be downsampled in the output tensor.
  - Must be a ZDNN_NHWC tensor with pre_transformed shape [batch_Num, Height, Width, Channel].
  - See Parameter Restrictions below for information on the expected shape of the input tensor.
  - Must follow general tensor requirements
- padding_type
  - The type of padding to use for the pooling operations.
  - Valid values are SAME_PADDING or VALID_PADDING.
  - See Parameter Restrictions below for information on the expected value of padding_type.
  - For information on "same" vs "valid" padding see: https://www.pico.net/kb/what-is-the-difference-between-same-and-valid-padding-in-tf-nn-max-pool-of-tensorflow.
- kernel_height
  - Size of the kernel window that passes over the input's height dimension.
  - See Parameter Restrictions below for information on the expected value of kernel_height.
- kernel_width
  - Size of the kernel window that passes over the input's width dimension.
  - See Parameter Restrictions below for information on the expected value of kernel_width.
- stride_height
  - Number of positions the kernel moves over the input's height dimension at each step.
  - If stride_height is 0 then stride_width must also be 0.
  - If strides are greater than 0 then stride_height must be less than or equal to 30.
- stride_width
  - Number of positions the kernel moves over the input's width dimension at each step.
  - If stride_width is 0 then stride_height must also be 0.
  - If strides are greater than 0 then stride_width must be less than or equal to 30.
- zdnn_ztensor *output
  - The result tensor, which will hold the result of the pooling operation in its buffer.
  - Must be a ZDNN_NHWC tensor with pre_transformed shape [batch_Num, Height, Width, Channel].
  - See Parameter Restrictions below for information on the expected shape of the output tensor.
  - Must follow general tensor requirements
Parameter restrictions may vary based on provided strides and padding_type.

- Input tensor batch_Num and Channel dimensions must always match the output tensor's respective dimensions.
- If strides are 0:
  - Both input tensor's Height dimension and the kernel_height must match and be less than or equal to 1024.
  - Both input tensor's Width dimension and the kernel_width must match and be less than or equal to 1024.
  - Output tensor's height and width dimensions must be 1.
  - padding_type must be VALID_PADDING.
- If strides are greater than zero:
  - kernel_width and kernel_height must be less than or equal to 64.
  - Neither the input tensor's height nor its width dimension may be greater than 1024.
  - If padding_type is SAME_PADDING:
    - Output tensor's height dimension must equal ceil((float)input's height / stride_height).
    - Output tensor's width dimension must equal ceil((float)input's width / stride_width).
  - If padding_type is VALID_PADDING:
    - Output tensor's height dimension must equal ceil((float)(input's height - kernel_height + 1) / stride_height).
    - Output tensor's width dimension must equal ceil((float)(input's width - kernel_width + 1) / stride_width).
- If the magnitude of difference between elements of input is large (greater than 10), accuracy may be reduced.
Returns (see zDNN Statuses for descriptions)

- ZDNN_OK
- ZDNN_INVALID_SHAPE
  - Shape of input or output tensor is invalid based on given kernel and stride parameters
  - Other general shape violations (exceeds MDIS, etc.)
- ZDNN_INVALID_TYPE
- ZDNN_INVALID_FORMAT
- ZDNN_INVALID_STRIDE_PADDING
- ZDNN_INVALID_STRIDES - One stride was non-zero, but not the other.
- hardware statuses
  - ZDNN_EXCEEDS_MDIS will also occur if any of the following conditions occur:
    - stride_height is larger than zdnn_get_nnpa_max_dim_idx_size.
    - stride_width is larger than zdnn_get_nnpa_max_dim_idx_size.
    - kernel_height is 0 or is larger than zdnn_get_nnpa_max_dim_idx_size.
    - kernel_width is 0 or is larger than zdnn_get_nnpa_max_dim_idx_size.
  - ZDNN_FUNC_RC_F000 - Invalid padding_type
  - ZDNN_FUNC_RC_F001 - stride_height = 0 and stride_width = 0, but a kernel parameter is greater than allowed (see kernel_height or kernel_width above)
  - ZDNN_FUNC_RC_F002 - stride_height > 0 and stride_width > 0, but a kernel parameter is greater than allowed (see kernel_height or kernel_width above)
  - ZDNN_FUNC_RC_F003 - stride_height > 0 and stride_width > 0, but a stride parameter is greater than allowed (see stride_height or stride_width above)
  - ZDNN_FUNC_RC_F004 - stride_height > 0 and stride_width > 0, but either the input tensor's height or width dimension is greater than 1024.
Perform 2D convolution over an input tensor in zDNN transformed format.
First the input tensor is convolved with the kernel tensor. Then the bias
tensor is added to the results. Then if act_func is not CONV2D_ACT_NONE, the
activation function is applied to the results. Then if act_func is set to
CONV2D_ACT_RELU, and clipping_value is not NULL or 0, clipping is
performed against the intermediate result where z = min(intermediate_result,
clipping_value). Finally the results are stored into the provided output zDNN
tensor.
zdnn_status zdnn_conv2d(const zdnn_ztensor *input,
const zdnn_ztensor *kernel,
const zdnn_ztensor *bias,
zdnn_pool_padding padding_type,
uint32_t stride_height, uint32_t stride_width,
zdnn_conv2d_act act_func,
const void *clipping_value, zdnn_ztensor *output);
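The following naive reference sketch traces the order of operations described above: convolve, add bias, apply ReLU, then clip. It is plain float C, not zDNN code. The buffers use the documented pre-transformed layouts (NHWC input and output, HWCK kernel, 1D bias). It covers VALID padding with strides greater than 0 only, and the function name and its relu/clip parameters are hypothetical simplifications of act_func and clipping_value.

```c
#include <assert.h>

/* Hypothetical reference (not zDNN code): VALID padding, strides > 0.
 *   in   : NHWC [n][h][w][ci]       k   : HWCK [kh][kw][ci][co]
 *   bias : [co]                     out : NHWC [n][oh][ow][co]
 *   oh = ceil((h - kh + 1) / sh), ow = ceil((w - kw + 1) / sw)
 * relu != 0 applies ReLU; clip > 0 then caps the result (ignored without relu). */
void conv2d_valid_ref(const float *in, int n, int h, int w, int ci,
                      const float *k, int kh, int kw, int co,
                      const float *bias, int sh, int sw, int relu, float clip,
                      float *out) {
  assert(kh >= 1 && kw >= 1 && kh <= h && kw <= w && sh > 0 && sw > 0);
  int oh = (h - kh) / sh + 1, ow = (w - kw) / sw + 1;
  for (int b = 0; b < n; b++)
    for (int oy = 0; oy < oh; oy++)
      for (int ox = 0; ox < ow; ox++)
        for (int o = 0; o < co; o++) {
          float acc = 0.0f; /* 1. convolve input with kernel */
          for (int ky = 0; ky < kh; ky++)
            for (int kx = 0; kx < kw; kx++)
              for (int i = 0; i < ci; i++)
                acc += in[((b * h + oy * sh + ky) * w + ox * sw + kx) * ci + i] *
                       k[((ky * kw + kx) * ci + i) * co + o];
          acc += bias[o]; /* 2. add bias */
          if (relu) {
            if (acc < 0.0f)
              acc = 0.0f; /* 3. ReLU */
            if (clip > 0.0f && acc > clip)
              acc = clip; /* 4. z = min(intermediate_result, clip) */
          }
          out[((b * oh + oy) * ow + ox) * co + o] = acc;
        }
}
```

With a 3 x 3 input holding 1..9, a 2 x 2 all-ones kernel, bias -20, ReLU and a clip of 6, the window sums 12, 16, 24 and 28 become 0, 0, 4 and 6.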
- zdnn_ztensor *input
  - Tensor with original values to be convolved with the kernel tensor.
  - Must be a ZDNN_NHWC tensor with pre_transformed shape [num_batches, height_in, width_in, channels_in].
  - See Convolution 2D Requirements for requirements.
  - Must follow general tensor requirements
- zdnn_ztensor *kernel
  - The kernel tensor to convolve with the input tensor.
  - Must be a ZDNN_HWCK tensor with pre_transformed shape [kernel_height, kernel_width, channels_in, channels_out].
  - See Convolution 2D Requirements for requirements.
  - Must follow general tensor requirements
- zdnn_ztensor *bias
  - The bias tensor to add to the convolved results.
  - Must be a ZDNN_1D tensor with pre_transformed shape [channels_out].
  - See Convolution 2D Requirements for requirements.
  - Must follow general tensor requirements
- zdnn_pool_padding padding_type
  - The type of padding to use for the convolution operation.
  - Valid values are SAME_PADDING or VALID_PADDING.
  - For information on "same" vs "valid" padding see: https://www.pico.net/kb/what-is-the-difference-between-same-and-valid-padding-in-tf-nn-max-pool-of-tensorflow.
- uint32_t stride_height
  - Number of positions the kernel moves over the input's dim3 (height) dimension at each step.
  - See Convolution 2D Requirements for requirements.
- uint32_t stride_width
  - Number of positions the kernel moves over the input's dim2 (width) dimension at each step.
  - See Convolution 2D Requirements for requirements.
- zdnn_conv2d_act act_func
  - Activation function to apply to the results.
  - Valid values are CONV2D_ACT_NONE or CONV2D_ACT_RELU.
- const void *clipping_value
  - A pointer to an FP32 value used to clip the intermediate results after the ReLU activation.
  - If the pointer is NULL or the value it points to is 0, no clipping will occur.
  - Must not be a negative value.
  - Value is ignored if act_func is not set to CONV2D_ACT_RELU.
- zdnn_ztensor *output
  - The result tensor which will hold the results.
  - Must be a ZDNN_NHWC tensor with pre_transformed shape [num_batches, height_out, width_out, channels_out].
  - See Convolution 2D Requirements for requirements.
  - Must follow general tensor requirements
| strides and padding | input (num_batches, height_in, width_in, channels_in) | kernel (kernel_height, kernel_width, channels_in, channels_out) | bias (channels_out) | output (num_batches, height_out, width_out, channels_out) |
|---|---|---|---|---|
| both strides > 0 and <= 13, SAME padding | | both kernel_height and kernel_width must be <= 64 | | height_out = ceil(height_in / stride_height), width_out = ceil(width_in / stride_width) |
| both strides > 0 and <= 13, VALID padding | height_in must be >= kernel_height, width_in must be >= kernel_width | both kernel_height and kernel_width must be <= 64 | | height_out = ceil((height_in - kernel_height + 1) / stride_height), width_out = ceil((width_in - kernel_width + 1) / stride_width) |
| both strides = 0, VALID padding | height_in must equal kernel_height, width_in must equal kernel_width | both kernel_height and kernel_width must be <= 448 | | both height_out and width_out must be 1 |
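The table above can be turned into an output-size calculator. The following minimal sketch, in plain C and independent of zDNN, returns the expected height_out or width_out for one spatial axis, or 0 when the combination is not allowed. It is meant to be called once for height and once for width, with both strides either zero or non-zero. The enum conv_pad (standing in for SAME_PADDING/VALID_PADDING) and conv2d_out_dim are hypothetical. The sketch checks only the limits listed in the table.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for SAME_PADDING / VALID_PADDING. */
typedef enum { CONV_PAD_SAME, CONV_PAD_VALID } conv_pad;

/* Expected output extent for one axis per the requirements table; 0 = invalid.
 * Integer ceil(a / b) is computed as (a + b - 1) / b. */
uint32_t conv2d_out_dim(conv_pad pad, uint32_t in, uint32_t kernel,
                        uint32_t stride) {
  assert(pad == CONV_PAD_SAME || pad == CONV_PAD_VALID);
  if (kernel == 0)
    return 0;
  if (stride == 0) /* VALID only; kernel spans the whole input */
    return (pad == CONV_PAD_VALID && kernel == in && kernel <= 448) ? 1 : 0;
  if (stride > 13 || kernel > 64)
    return 0;
  if (pad == CONV_PAD_SAME)
    return (in + stride - 1) / stride; /* ceil(in / stride) */
  if (in < kernel)
    return 0;
  return (in - kernel + stride) / stride; /* ceil((in - kernel + 1) / stride) */
}
```

For a 28-wide input with a 3-wide kernel and a stride of 2, this gives 14 with SAME padding and 13 with VALID padding.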
Returns (see zDNN Statuses for descriptions)

- ZDNN_OK
  - warning statuses
- ZDNN_INVALID_SHAPE
  - Shape of input or output tensor is invalid based on given kernel and stride parameters
  - Other general shape violations (exceeds MDIS, etc.)
- ZDNN_INVALID_TYPE
- ZDNN_INVALID_FORMAT
- ZDNN_INVALID_STRIDE_PADDING
- ZDNN_INVALID_STRIDES
- ZDNN_INVALID_CLIPPING_VALUE
- hardware statuses
  - ZDNN_FUNC_RC_F000 - Invalid padding_type
  - ZDNN_FUNC_RC_F001 - Invalid act_func
  - ZDNN_FUNC_RC_F002 - stride_height = 0 and stride_width = 0, but either kernel_height or kernel_width > 448
  - ZDNN_FUNC_RC_F003 - stride_height > 0 and stride_width > 0, but either kernel_height or kernel_width > 64
  - ZDNN_FUNC_RC_F004 - Either stride_height or stride_width > 13
- None
#include <assert.h>
#include <inttypes.h> // for PRIu64 used in printf below
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "zdnn.h"
// ***************************************************************************
// Sample:
//
// Create 2 zTensors a and b, and add them together via zdnn_add()
// ***************************************************************************
int main(int argc, char *argv[]) {
zdnn_tensor_desc pre_tfrmd_desc, tfrmd_desc;
zdnn_ztensor ztensor_a;
zdnn_ztensor ztensor_b;
zdnn_ztensor ztensor_out;
zdnn_status status;
#ifdef STATIC_LIB
zdnn_init(); // needed when zDNN is linked statically
#endif
uint32_t dim_n = 1, dim_h = 32, dim_w = 32, dim_c = 3;
zdnn_data_types type = FP32;
short element_size = 4; // size of each element in bytes
uint64_t num_elements = dim_n * dim_h * dim_w * dim_c;
// allocate tensor data storage
void *data1 = malloc(num_elements * element_size);
void *data2 = malloc(num_elements * element_size);
void *data_out = malloc(num_elements * element_size);
// read input_data
// check status for AIU availability, supported ops, etc. here
// status = zdnn_query(…);
// set input tensor data to 0 to 127 sequentially and repeat
for (uint64_t i = 0; i < num_elements; i++) {
((float *)data1)[i] = (float)(i & 0x7f);
((float *)data2)[i] = (float)(i & 0x7f);
}
zdnn_init_pre_transformed_desc(ZDNN_NHWC, type, &pre_tfrmd_desc, dim_n, dim_h,
dim_w, dim_c);
// generate transformed shape information
status = zdnn_generate_transformed_desc(&pre_tfrmd_desc, &tfrmd_desc);
assert(status == ZDNN_OK);
// initialize zTensors and allocate 4k-aligned storage via helper function
status =
zdnn_init_ztensor_with_malloc(&pre_tfrmd_desc, &tfrmd_desc, &ztensor_a);
assert(status == ZDNN_OK);
status =
zdnn_init_ztensor_with_malloc(&pre_tfrmd_desc, &tfrmd_desc, &ztensor_b);
assert(status == ZDNN_OK);
status =
zdnn_init_ztensor_with_malloc(&pre_tfrmd_desc, &tfrmd_desc, &ztensor_out);
assert(status == ZDNN_OK);
// transform the feature tensor
status = zdnn_transform_ztensor(&ztensor_a, data1);
assert(status == ZDNN_OK);
status = zdnn_transform_ztensor(&ztensor_b, data2);
assert(status == ZDNN_OK);
// perform element-wise add between the two input tensors
status = zdnn_add(&ztensor_a, &ztensor_b, &ztensor_out);
assert(status == ZDNN_OK);
// transform resultant zTensor back to original data format
status = zdnn_transform_origtensor(&ztensor_out, data_out);
assert(status == ZDNN_OK);
for (uint64_t i = 0; i < num_elements; i++) {
printf("out element %" PRIu64 " %f\n", i, ((float *)data_out)[i]);
}
// Free zTensors
status = zdnn_free_ztensor_buffer(&ztensor_a);
assert(status == ZDNN_OK);
status = zdnn_free_ztensor_buffer(&ztensor_b);
assert(status == ZDNN_OK);
status = zdnn_free_ztensor_buffer(&ztensor_out);
assert(status == ZDNN_OK);
free(data1);
free(data2);
free(data_out);
}
// SPDX-License-Identifier: Apache-2.0
/*
* Copyright IBM Corp. 2021
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "zdnn.h"
// Sample: LSTM
int main(int argc, char *argv[]) {
zdnn_status status;
#ifdef STATIC_LIB
zdnn_init();
#endif
/***********************************************************************
*
* LSTM (FWD/BWD):
*
* INPUTS --------------------------------------------------------------
* input | ZDNN_3DS | (num_timesteps, num_batches, num_features)
* h0 | ZDNN_3DS | (1, num_batches, num_hidden)
* c0 | ZDNN_3DS | (1, num_batches, num_hidden)
* weights | ZDNN_3DS | (1, num_features, num_hidden)
* biases | ZDNN_2DS | (1, num_hidden)
* hidden_weights | ZDNN_3DS | (1, num_hidden, num_hidden)
* hidden_biases | ZDNN_2DS | (1, num_hidden)
*
* OUTPUTS -------------------------------------------------------------
* hn_output | ZDNN_4DS | (num_timesteps, 1, num_batches, num_hidden)
* | | or (1, 1, num_batches, num_hidden)
* cf_output | ZDNN_4DS | (1, 1, num_batches, num_hidden)
***********************************************************************/
/***********************************************************************
* Create input zTensor
***********************************************************************/
zdnn_tensor_desc input_pre_tfrmd_desc, input_tfrmd_desc;
zdnn_ztensor input;
uint32_t num_timesteps = 5;
uint32_t num_batches = 3;
uint32_t num_features = 32;
uint32_t num_hidden = 5;
zdnn_data_types type = FP32;
short element_size = 4; // size of each element in bytes
lstm_gru_direction dir = FWD;
uint8_t num_dirs = 1;
zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &input_pre_tfrmd_desc,
num_timesteps, num_batches, num_features);
status =
zdnn_generate_transformed_desc(&input_pre_tfrmd_desc, &input_tfrmd_desc);
assert(status == ZDNN_OK);
status = zdnn_init_ztensor_with_malloc(&input_pre_tfrmd_desc,
&input_tfrmd_desc, &input);
assert(status == ZDNN_OK);
uint64_t input_data_size =
num_timesteps * num_batches * num_features * element_size;
void *input_data = malloc(input_data_size); // fill with real input values before transforming
status = zdnn_transform_ztensor(&input, input_data);
assert(status == ZDNN_OK);
/***********************************************************************
* Create initial hidden and cell state zTensors
***********************************************************************/
zdnn_tensor_desc h0c0_pre_tfrmd_desc, h0c0_tfrmd_desc;
zdnn_ztensor h0, c0;
zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &h0c0_pre_tfrmd_desc, num_dirs,
num_batches, num_hidden);
status =
zdnn_generate_transformed_desc(&h0c0_pre_tfrmd_desc, &h0c0_tfrmd_desc);
assert(status == ZDNN_OK);
status = zdnn_init_ztensor_with_malloc(&h0c0_pre_tfrmd_desc, &h0c0_tfrmd_desc,
&h0);
assert(status == ZDNN_OK);
status = zdnn_init_ztensor_with_malloc(&h0c0_pre_tfrmd_desc, &h0c0_tfrmd_desc,
&c0);
assert(status == ZDNN_OK);
uint64_t h0c0_data_size = num_batches * num_hidden * element_size;
void *hidden_state_data = malloc(h0c0_data_size);
void *cell_state_data = malloc(h0c0_data_size);
status = zdnn_transform_ztensor(&h0, hidden_state_data);
assert(status == ZDNN_OK);
status = zdnn_transform_ztensor(&c0, cell_state_data);
assert(status == ZDNN_OK);
/***********************************************************************
* Create input weights zTensor
* Resultant zTensor is concatenated
***********************************************************************/
zdnn_tensor_desc weights_pre_tfrmd_desc, weights_tfrmd_desc;
zdnn_ztensor weights;
zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &weights_pre_tfrmd_desc,
num_dirs, num_features, num_hidden);
status = zdnn_generate_transformed_desc_concatenated(
&weights_pre_tfrmd_desc, RNN_TYPE_LSTM | USAGE_WEIGHTS | PREV_LAYER_NONE,
&weights_tfrmd_desc);
assert(status == ZDNN_OK);
status = zdnn_init_ztensor_with_malloc(&weights_pre_tfrmd_desc,
&weights_tfrmd_desc, &weights);
assert(status == ZDNN_OK);
uint64_t weights_data_size = num_features * num_hidden * element_size;
void *weights_data_f = malloc(weights_data_size);
void *weights_data_i = malloc(weights_data_size);
void *weights_data_c = malloc(weights_data_size);
void *weights_data_o = malloc(weights_data_size);
status = zdnn_transform_ztensor(&weights, weights_data_f, weights_data_i,
weights_data_c, weights_data_o);
assert(status == ZDNN_OK);
/***********************************************************************
* Create biases zTensors
* Resultant zTensors are concatenated
***********************************************************************/
zdnn_tensor_desc biases_pre_tfrmd_desc, biases_tfrmd_desc;
zdnn_ztensor biases;
zdnn_init_pre_transformed_desc(ZDNN_2DS, type, &biases_pre_tfrmd_desc,
num_dirs, num_hidden);
status = zdnn_generate_transformed_desc_concatenated(
&biases_pre_tfrmd_desc, RNN_TYPE_LSTM | USAGE_BIASES | PREV_LAYER_NONE,
&biases_tfrmd_desc);
assert(status == ZDNN_OK);
status = zdnn_init_ztensor_with_malloc(&biases_pre_tfrmd_desc,
&biases_tfrmd_desc, &biases);
assert(status == ZDNN_OK);
uint64_t biases_data_size = num_hidden * element_size;
void *biases_data_f = malloc(biases_data_size);
void *biases_data_i = malloc(biases_data_size);
void *biases_data_c = malloc(biases_data_size);
void *biases_data_o = malloc(biases_data_size);
status = zdnn_transform_ztensor(&biases, biases_data_f, biases_data_i,
biases_data_c, biases_data_o);
assert(status == ZDNN_OK);
/***********************************************************************
* Create hidden weights zTensor
* Resultant zTensor is concatenated
***********************************************************************/
zdnn_tensor_desc hidden_weights_pre_tfrmd_desc, hidden_weights_tfrmd_desc;
zdnn_ztensor hidden_weights;
zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &hidden_weights_pre_tfrmd_desc,
num_dirs, num_hidden, num_hidden);
status = zdnn_generate_transformed_desc_concatenated(
&hidden_weights_pre_tfrmd_desc,
RNN_TYPE_LSTM | USAGE_HIDDEN_WEIGHTS | PREV_LAYER_NONE,
&hidden_weights_tfrmd_desc);
assert(status == ZDNN_OK);
status = zdnn_init_ztensor_with_malloc(&hidden_weights_pre_tfrmd_desc,
&hidden_weights_tfrmd_desc,
&hidden_weights);
assert(status == ZDNN_OK);
uint64_t hidden_weights_data_size = num_hidden * num_hidden * element_size;
void *hidden_weights_data_f = malloc(hidden_weights_data_size);
void *hidden_weights_data_i = malloc(hidden_weights_data_size);
void *hidden_weights_data_c = malloc(hidden_weights_data_size);
void *hidden_weights_data_o = malloc(hidden_weights_data_size);
status = zdnn_transform_ztensor(&hidden_weights, hidden_weights_data_f,
hidden_weights_data_i, hidden_weights_data_c,
hidden_weights_data_o);
assert(status == ZDNN_OK);
/***********************************************************************
* Create hidden biases zTensor
* Resultant zTensor is concatenated
***********************************************************************/
zdnn_tensor_desc hidden_biases_pre_tfrmd_desc, hidden_biases_tfrmd_desc;
zdnn_ztensor hidden_biases;
zdnn_init_pre_transformed_desc(ZDNN_2DS, type, &hidden_biases_pre_tfrmd_desc,
num_dirs, num_hidden);
status = zdnn_generate_transformed_desc_concatenated(
&hidden_biases_pre_tfrmd_desc,
RNN_TYPE_LSTM | USAGE_HIDDEN_BIASES | PREV_LAYER_NONE,
&hidden_biases_tfrmd_desc);
assert(status == ZDNN_OK);
status = zdnn_init_ztensor_with_malloc(
&hidden_biases_pre_tfrmd_desc, &hidden_biases_tfrmd_desc, &hidden_biases);
assert(status == ZDNN_OK);
uint64_t hidden_biases_data_size = num_hidden * element_size;
void *hidden_biases_data_f = malloc(hidden_biases_data_size);
void *hidden_biases_data_i = malloc(hidden_biases_data_size);
void *hidden_biases_data_c = malloc(hidden_biases_data_size);
void *hidden_biases_data_o = malloc(hidden_biases_data_size);
status = zdnn_transform_ztensor(&hidden_biases, hidden_biases_data_f,
hidden_biases_data_i, hidden_biases_data_c,
hidden_biases_data_o);
assert(status == ZDNN_OK);
/***********************************************************************
* Create output zTensors (hn and cf)
***********************************************************************/
// only the last timestep is requested, so hn and cf have the same shape and
// can share one descriptor
zdnn_tensor_desc hncf_pre_tfrmd_desc, hncf_tfrmd_desc;
zdnn_ztensor hn_output_ztensor, cf_output_ztensor;
zdnn_init_pre_transformed_desc(ZDNN_4DS, type, &hncf_pre_tfrmd_desc, 1, 1,
num_batches, num_hidden);
status =
zdnn_generate_transformed_desc(&hncf_pre_tfrmd_desc, &hncf_tfrmd_desc);
assert(status == ZDNN_OK);
status = zdnn_init_ztensor_with_malloc(&hncf_pre_tfrmd_desc, &hncf_tfrmd_desc,
&hn_output_ztensor);
assert(status == ZDNN_OK);
status = zdnn_init_ztensor_with_malloc(&hncf_pre_tfrmd_desc, &hncf_tfrmd_desc,
&cf_output_ztensor);
assert(status == ZDNN_OK);
/***********************************************************************
* Call the AIU
***********************************************************************/
void *work_area = NULL;
status = zdnn_lstm(&input, &h0, &c0, &weights, &biases, &hidden_weights,
&hidden_biases, dir, work_area, &hn_output_ztensor,
&cf_output_ztensor);
assert(status == ZDNN_OK);
/***********************************************************************
* Output and Cleanup
***********************************************************************/
uint64_t hncf_data_size = num_batches * num_hidden * element_size;
void *hn_output_data = malloc(hncf_data_size);
void *cf_output_data = malloc(hncf_data_size);
status = zdnn_transform_origtensor(&hn_output_ztensor, hn_output_data);
assert(status == ZDNN_OK);
status = zdnn_transform_origtensor(&cf_output_ztensor, cf_output_data);
assert(status == ZDNN_OK);
status = zdnn_free_ztensor_buffer(&input);
assert(status == ZDNN_OK);
status = zdnn_free_ztensor_buffer(&h0);
assert(status == ZDNN_OK);
status = zdnn_free_ztensor_buffer(&c0);
assert(status == ZDNN_OK);
status = zdnn_free_ztensor_buffer(&weights);
assert(status == ZDNN_OK);
status = zdnn_free_ztensor_buffer(&biases);
assert(status == ZDNN_OK);
status = zdnn_free_ztensor_buffer(&hidden_weights);
assert(status == ZDNN_OK);
status = zdnn_free_ztensor_buffer(&hidden_biases);
assert(status == ZDNN_OK);
status = zdnn_free_ztensor_buffer(&hn_output_ztensor);
assert(status == ZDNN_OK);
status = zdnn_free_ztensor_buffer(&cf_output_ztensor);
assert(status == ZDNN_OK);
free(input_data);
free(hidden_state_data);
free(cell_state_data);
free(weights_data_f);
free(weights_data_i);
free(weights_data_c);
free(weights_data_o);
free(hidden_weights_data_f);
free(hidden_weights_data_i);
free(hidden_weights_data_c);
free(hidden_weights_data_o);
free(biases_data_f);
free(biases_data_i);
free(biases_data_c);
free(biases_data_o);
free(hidden_biases_data_f);
free(hidden_biases_data_i);
free(hidden_biases_data_c);
free(hidden_biases_data_o);
free(hn_output_data);
free(cf_output_data);
}
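In the sample above, hn and cf share one descriptor because only the last timestep is requested. The size of the host buffer handed to `zdnn_transform_origtensor()` follows directly from the 4D output shapes listed in the sample headers. The sketch below computes it; `hn_output_bytes` is a hypothetical helper for illustration, not a zDNN function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of the zDNN API: bytes needed to hold the
 * hn output of zdnn_lstm()/zdnn_gru() after zdnn_transform_origtensor().
 * The output descriptor is ZDNN_4DS with shape
 *   (num_timesteps, num_dirs, num_batches, num_hidden)  all timesteps, or
 *   (1,             num_dirs, num_batches, num_hidden)  last timestep only.
 * The cf output of zdnn_lstm() always has the last-timestep shape. */
static uint64_t hn_output_bytes(bool all_timesteps, uint32_t num_timesteps,
                                uint32_t num_dirs, uint32_t num_batches,
                                uint32_t num_hidden, uint32_t element_size) {
  /* start from a 64-bit value so the whole product is computed in 64 bits */
  uint64_t steps = all_timesteps ? num_timesteps : 1;
  return steps * num_dirs * num_batches * num_hidden * element_size;
}
```

With FP32 elements, 5 timesteps, 3 batches and 5 hidden units, a bidirectional layer needs 600 bytes when all timesteps are returned and 120 bytes for the last timestep only.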
// SPDX-License-Identifier: Apache-2.0
/*
* Copyright IBM Corp. 2021
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "zdnn.h"
// Sample: LSTM BI-DIR
int main(int argc, char *argv[]) {
zdnn_status status;
#ifdef STATIC_LIB
zdnn_init();
#endif
/***********************************************************************
*
* LSTM (BI-DIR):
*
* INPUTS --------------------------------------------------------------
* input | ZDNN_3DS | (num_timesteps, num_batches, num_features)
* h0 | ZDNN_3DS | (2, num_batches, num_hidden)
* c0 | ZDNN_3DS | (2, num_batches, num_hidden)
* weights | ZDNN_3DS | (2, num_features, num_hidden)
* biases | ZDNN_2DS | (2, num_hidden)
* hidden_weights | ZDNN_3DS | (2, num_hidden, num_hidden)
* hidden_biases | ZDNN_2DS | (2, num_hidden)
*
* OUTPUTS -------------------------------------------------------------
* hn_output | ZDNN_4DS | (num_timesteps, 2, num_batches, num_hidden)
* | | or (1, 2, num_batches, num_hidden)
* cf_output | ZDNN_4DS | (1, 2, num_batches, num_hidden)
***********************************************************************/
/***********************************************************************
* Create input zTensor
***********************************************************************/
zdnn_tensor_desc input_pre_tfrmd_desc, input_tfrmd_desc;
zdnn_ztensor input;
uint32_t num_timesteps = 5;
uint32_t num_batches = 3;
uint32_t num_features = 32;
uint32_t num_hidden = 5;
zdnn_data_types type = FP32;
short element_size = 4; // size of each element in bytes
lstm_gru_direction dir = BIDIR;
uint8_t num_dirs = 2;
zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &input_pre_tfrmd_desc,
num_timesteps, num_batches, num_features);
status =
zdnn_generate_transformed_desc(&input_pre_tfrmd_desc, &input_tfrmd_desc);
assert(status == ZDNN_OK);
status = zdnn_init_ztensor_with_malloc(&input_pre_tfrmd_desc,
&input_tfrmd_desc, &input);
assert(status == ZDNN_OK);
uint64_t input_data_size =
num_timesteps * num_batches * num_features * element_size;
void *input_data = malloc(input_data_size);
status = zdnn_transform_ztensor(&input, input_data);
assert(status == ZDNN_OK);
/***********************************************************************
* Create initial hidden and cell state zTensors
***********************************************************************/
zdnn_tensor_desc h0c0_pre_tfrmd_desc, h0c0_tfrmd_desc;
zdnn_ztensor h0, c0;
zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &h0c0_pre_tfrmd_desc, num_dirs,
num_batches, num_hidden);
status =
zdnn_generate_transformed_desc(&h0c0_pre_tfrmd_desc, &h0c0_tfrmd_desc);
assert(status == ZDNN_OK);
status = zdnn_init_ztensor_with_malloc(&h0c0_pre_tfrmd_desc, &h0c0_tfrmd_desc,
&h0);
assert(status == ZDNN_OK);
status = zdnn_init_ztensor_with_malloc(&h0c0_pre_tfrmd_desc, &h0c0_tfrmd_desc,
&c0);
assert(status == ZDNN_OK);
// h0/c0 hold one (num_batches, num_hidden) slice per direction
uint64_t h0c0_data_size =
(uint64_t)num_dirs * num_batches * num_hidden * element_size;
void *hidden_state_data = malloc(h0c0_data_size);
void *cell_state_data = malloc(h0c0_data_size);
status = zdnn_transform_ztensor(&h0, hidden_state_data);
assert(status == ZDNN_OK);
status = zdnn_transform_ztensor(&c0, cell_state_data);
assert(status == ZDNN_OK);
/***********************************************************************
* Create input weights zTensor
* Resultant zTensor is concatenated
***********************************************************************/
zdnn_tensor_desc weights_pre_tfrmd_desc, weights_tfrmd_desc;
zdnn_ztensor weights;
zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &weights_pre_tfrmd_desc,
num_dirs, num_features, num_hidden);
status = zdnn_generate_transformed_desc_concatenated(
&weights_pre_tfrmd_desc, RNN_TYPE_LSTM | USAGE_WEIGHTS | PREV_LAYER_NONE,
&weights_tfrmd_desc);
assert(status == ZDNN_OK);
status = zdnn_init_ztensor_with_malloc(&weights_pre_tfrmd_desc,
&weights_tfrmd_desc, &weights);
assert(status == ZDNN_OK);
// each gate buffer holds num_dirs slices of (num_features, num_hidden)
uint64_t weights_data_size =
(uint64_t)num_dirs * num_features * num_hidden * element_size;
void *weights_data_f = malloc(weights_data_size);
void *weights_data_i = malloc(weights_data_size);
void *weights_data_c = malloc(weights_data_size);
void *weights_data_o = malloc(weights_data_size);
status = zdnn_transform_ztensor(&weights, weights_data_f, weights_data_i,
weights_data_c, weights_data_o);
assert(status == ZDNN_OK);
/***********************************************************************
* Create biases zTensor
* Resultant zTensor is concatenated
***********************************************************************/
zdnn_tensor_desc biases_pre_tfrmd_desc, biases_tfrmd_desc;
zdnn_ztensor biases;
zdnn_init_pre_transformed_desc(ZDNN_2DS, type, &biases_pre_tfrmd_desc,
num_dirs, num_hidden);
status = zdnn_generate_transformed_desc_concatenated(
&biases_pre_tfrmd_desc, RNN_TYPE_LSTM | USAGE_BIASES | PREV_LAYER_NONE,
&biases_tfrmd_desc);
assert(status == ZDNN_OK);
status = zdnn_init_ztensor_with_malloc(&biases_pre_tfrmd_desc,
&biases_tfrmd_desc, &biases);
assert(status == ZDNN_OK);
uint64_t biases_data_size = (uint64_t)num_dirs * num_hidden * element_size;
void *biases_data_f = malloc(biases_data_size);
void *biases_data_i = malloc(biases_data_size);
void *biases_data_c = malloc(biases_data_size);
void *biases_data_o = malloc(biases_data_size);
status = zdnn_transform_ztensor(&biases, biases_data_f, biases_data_i,
biases_data_c, biases_data_o);
assert(status == ZDNN_OK);
/***********************************************************************
* Create hidden weights zTensor
* Resultant zTensor is concatenated
***********************************************************************/
zdnn_tensor_desc hidden_weights_pre_tfrmd_desc, hidden_weights_tfrmd_desc;
zdnn_ztensor hidden_weights;
zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &hidden_weights_pre_tfrmd_desc,
num_dirs, num_hidden, num_hidden);
status = zdnn_generate_transformed_desc_concatenated(
&hidden_weights_pre_tfrmd_desc,
RNN_TYPE_LSTM | USAGE_HIDDEN_WEIGHTS | PREV_LAYER_NONE,
&hidden_weights_tfrmd_desc);
assert(status == ZDNN_OK);
status = zdnn_init_ztensor_with_malloc(&hidden_weights_pre_tfrmd_desc,
&hidden_weights_tfrmd_desc,
&hidden_weights);
assert(status == ZDNN_OK);
uint64_t hidden_weights_data_size =
(uint64_t)num_dirs * num_hidden * num_hidden * element_size;
void *hidden_weights_data_f = malloc(hidden_weights_data_size);
void *hidden_weights_data_i = malloc(hidden_weights_data_size);
void *hidden_weights_data_c = malloc(hidden_weights_data_size);
void *hidden_weights_data_o = malloc(hidden_weights_data_size);
status = zdnn_transform_ztensor(&hidden_weights, hidden_weights_data_f,
hidden_weights_data_i, hidden_weights_data_c,
hidden_weights_data_o);
assert(status == ZDNN_OK);
/***********************************************************************
* Create hidden biases zTensor
* Resultant zTensor is concatenated
***********************************************************************/
zdnn_tensor_desc hidden_biases_pre_tfrmd_desc, hidden_biases_tfrmd_desc;
zdnn_ztensor hidden_biases;
zdnn_init_pre_transformed_desc(ZDNN_2DS, type, &hidden_biases_pre_tfrmd_desc,
num_dirs, num_hidden);
status = zdnn_generate_transformed_desc_concatenated(
&hidden_biases_pre_tfrmd_desc,
RNN_TYPE_LSTM | USAGE_HIDDEN_BIASES | PREV_LAYER_NONE,
&hidden_biases_tfrmd_desc);
assert(status == ZDNN_OK);
status = zdnn_init_ztensor_with_malloc(
&hidden_biases_pre_tfrmd_desc, &hidden_biases_tfrmd_desc, &hidden_biases);
assert(status == ZDNN_OK);
uint64_t hidden_biases_data_size =
(uint64_t)num_dirs * num_hidden * element_size;
void *hidden_biases_data_f = malloc(hidden_biases_data_size);
void *hidden_biases_data_i = malloc(hidden_biases_data_size);
void *hidden_biases_data_c = malloc(hidden_biases_data_size);
void *hidden_biases_data_o = malloc(hidden_biases_data_size);
status = zdnn_transform_ztensor(&hidden_biases, hidden_biases_data_f,
hidden_biases_data_i, hidden_biases_data_c,
hidden_biases_data_o);
assert(status == ZDNN_OK);
/***********************************************************************
* Create output zTensors (hn and cf)
***********************************************************************/
zdnn_tensor_desc hn_pre_tfrmd_desc, hn_tfrmd_desc, cf_pre_tfrmd_desc,
cf_tfrmd_desc;
zdnn_ztensor hn_output_ztensor, cf_output_ztensor;
zdnn_init_pre_transformed_desc(ZDNN_4DS, type, &hn_pre_tfrmd_desc,
num_timesteps, 2, num_batches, num_hidden);
status = zdnn_generate_transformed_desc(&hn_pre_tfrmd_desc, &hn_tfrmd_desc);
assert(status == ZDNN_OK);
zdnn_init_pre_transformed_desc(ZDNN_4DS, type, &cf_pre_tfrmd_desc, 1, 2,
num_batches, num_hidden);
status = zdnn_generate_transformed_desc(&cf_pre_tfrmd_desc, &cf_tfrmd_desc);
assert(status == ZDNN_OK);
status = zdnn_init_ztensor_with_malloc(&hn_pre_tfrmd_desc, &hn_tfrmd_desc,
&hn_output_ztensor);
assert(status == ZDNN_OK);
status = zdnn_init_ztensor_with_malloc(&cf_pre_tfrmd_desc, &cf_tfrmd_desc,
&cf_output_ztensor);
assert(status == ZDNN_OK);
/***********************************************************************
* Call the AIU
***********************************************************************/
void *work_area = NULL;
status = zdnn_lstm(&input, &h0, &c0, &weights, &biases, &hidden_weights,
&hidden_biases, dir, work_area, &hn_output_ztensor,
&cf_output_ztensor);
assert(status == ZDNN_OK);
/***********************************************************************
* Output and Cleanup
***********************************************************************/
uint64_t hn_data_size =
num_timesteps * 2 * num_batches * num_hidden * element_size;
uint64_t cf_data_size = 2 * num_batches * num_hidden * element_size;
void *hn_output_data = malloc(hn_data_size);
void *cf_output_data = malloc(cf_data_size);
status = zdnn_transform_origtensor(&hn_output_ztensor, hn_output_data);
assert(status == ZDNN_OK);
status = zdnn_transform_origtensor(&cf_output_ztensor, cf_output_data);
assert(status == ZDNN_OK);
status = zdnn_free_ztensor_buffer(&input);
assert(status == ZDNN_OK);
status = zdnn_free_ztensor_buffer(&h0);
assert(status == ZDNN_OK);
status = zdnn_free_ztensor_buffer(&c0);
assert(status == ZDNN_OK);
status = zdnn_free_ztensor_buffer(&weights);
assert(status == ZDNN_OK);
status = zdnn_free_ztensor_buffer(&biases);
assert(status == ZDNN_OK);
status = zdnn_free_ztensor_buffer(&hidden_weights);
assert(status == ZDNN_OK);
status = zdnn_free_ztensor_buffer(&hidden_biases);
assert(status == ZDNN_OK);
status = zdnn_free_ztensor_buffer(&hn_output_ztensor);
assert(status == ZDNN_OK);
status = zdnn_free_ztensor_buffer(&cf_output_ztensor);
assert(status == ZDNN_OK);
free(input_data);
free(hidden_state_data);
free(cell_state_data);
free(weights_data_f);
free(weights_data_i);
free(weights_data_c);
free(weights_data_o);
free(hidden_weights_data_f);
free(hidden_weights_data_i);
free(hidden_weights_data_c);
free(hidden_weights_data_o);
free(biases_data_f);
free(biases_data_i);
free(biases_data_c);
free(biases_data_o);
free(hidden_biases_data_f);
free(hidden_biases_data_i);
free(hidden_biases_data_c);
free(hidden_biases_data_o);
free(hn_output_data);
free(cf_output_data);
}
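For a bidirectional layer, every pre-transformed input passed to `zdnn_transform_ztensor()` carries a leading `num_dirs` dimension, so each per-gate buffer (and h0/c0) must hold `num_dirs` slices rather than one. A minimal sketch of that arithmetic follows; `gate_buffer_bytes` is a hypothetical helper, not part of the zDNN API, and the `uint64_t` cast keeps the product out of 32-bit arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the zDNN API: bytes needed for ONE
 * pre-transformed buffer of shape (num_dirs, rows, cols), e.g.
 *   input weights per gate : (num_dirs, num_features, num_hidden)
 *   hidden weights per gate: (num_dirs, num_hidden,   num_hidden)
 *   biases per gate        : (num_dirs, 1,            num_hidden)  (2DS)
 *   h0 / c0                : (num_dirs, num_batches,  num_hidden)
 * Casting to uint64_t first keeps the whole product in 64-bit arithmetic. */
static uint64_t gate_buffer_bytes(uint32_t num_dirs, uint32_t rows,
                                  uint32_t cols, uint32_t element_size) {
  return (uint64_t)num_dirs * rows * cols * element_size;
}
```

With the BI-DIR sample's values (num_dirs = 2, num_features = 32, num_hidden = 5, FP32), each input-weights gate buffer is 1280 bytes and each hidden-weights gate buffer is 200 bytes, twice the unidirectional sizes.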
#include <assert.h>
#include <stdbool.h> // for bool in do_bidir_layer()
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "zdnn.h"
void do_bidir_layer(zdnn_ztensor *input, uint32_t num_hidden,
zdnn_ztensor *hn_output, bool is_prev_layer_bidir) {
zdnn_status status;
uint32_t num_batches = input->pre_transformed_desc->dim2;
// if input is the bidir output of the previous layer, then the number of
// features for this layer is 2x the hidden-state size (dim1) of that layer
uint32_t num_features =
input->pre_transformed_desc->dim1 * (is_prev_layer_bidir ? 2 : 1);
zdnn_data_types type = FP32;
short element_size = 4; // size of each element in bytes
lstm_gru_direction dir = BIDIR;
uint8_t num_dirs = 2;
/***********************************************************************
* Create initial hidden and cell state zTensors
***********************************************************************/
zdnn_tensor_desc h0c0_pre_tfrmd_desc, h0c0_tfrmd_desc;
zdnn_ztensor h0, c0;
zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &h0c0_pre_tfrmd_desc, num_dirs,
num_batches, num_hidden);
status =
zdnn_generate_transformed_desc(&h0c0_pre_tfrmd_desc, &h0c0_tfrmd_desc);
assert(status == ZDNN_OK);
status = zdnn_init_ztensor_with_malloc(&h0c0_pre_tfrmd_desc, &h0c0_tfrmd_desc,
&h0);
assert(status == ZDNN_OK);
status = zdnn_init_ztensor_with_malloc(&h0c0_pre_tfrmd_desc, &h0c0_tfrmd_desc,
&c0);
assert(status == ZDNN_OK);
// h0/c0 hold one (num_batches, num_hidden) slice per direction
uint64_t h0c0_data_size =
(uint64_t)num_dirs * num_batches * num_hidden * element_size;
void *hidden_state_data = malloc(h0c0_data_size);
void *cell_state_data = malloc(h0c0_data_size);
status = zdnn_transform_ztensor(&h0, hidden_state_data);
assert(status == ZDNN_OK);
status = zdnn_transform_ztensor(&c0, cell_state_data);
assert(status == ZDNN_OK);
/***********************************************************************
* Create input weights zTensor
* Resultant zTensor is concatenated
***********************************************************************/
zdnn_tensor_desc weights_pre_tfrmd_desc, weights_tfrmd_desc;
zdnn_ztensor weights;
// num_features was computed above: when this layer consumes the previous
// layer's bidir output, it is 2x the previous layer's num_hidden
zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &weights_pre_tfrmd_desc,
num_dirs, num_features, num_hidden);
status = zdnn_generate_transformed_desc_concatenated(
&weights_pre_tfrmd_desc,
RNN_TYPE_LSTM | USAGE_WEIGHTS |
(is_prev_layer_bidir ? PREV_LAYER_BIDIR : PREV_LAYER_UNI),
&weights_tfrmd_desc);
assert(status == ZDNN_OK);
status = zdnn_init_ztensor_with_malloc(&weights_pre_tfrmd_desc,
&weights_tfrmd_desc, &weights);
assert(status == ZDNN_OK);
// each gate buffer holds num_dirs slices of (num_features, num_hidden)
uint64_t weights_data_size =
(uint64_t)num_dirs * num_features * num_hidden * element_size;
void *weights_data_f = malloc(weights_data_size);
void *weights_data_i = malloc(weights_data_size);
void *weights_data_c = malloc(weights_data_size);
void *weights_data_o = malloc(weights_data_size);
status = zdnn_transform_ztensor(&weights, weights_data_f, weights_data_i,
weights_data_c, weights_data_o);
assert(status == ZDNN_OK);
/***********************************************************************
* Create biases zTensor
* Resultant zTensor is concatenated
***********************************************************************/
zdnn_tensor_desc biases_pre_tfrmd_desc, biases_tfrmd_desc;
zdnn_ztensor biases;
zdnn_init_pre_transformed_desc(ZDNN_2DS, type, &biases_pre_tfrmd_desc,
num_dirs, num_hidden);
status = zdnn_generate_transformed_desc_concatenated(
&biases_pre_tfrmd_desc,
RNN_TYPE_LSTM | USAGE_BIASES |
(is_prev_layer_bidir ? PREV_LAYER_BIDIR : PREV_LAYER_UNI),
&biases_tfrmd_desc);
assert(status == ZDNN_OK);
status = zdnn_init_ztensor_with_malloc(&biases_pre_tfrmd_desc,
&biases_tfrmd_desc, &biases);
assert(status == ZDNN_OK);
uint64_t biases_data_size = (uint64_t)num_dirs * num_hidden * element_size;
void *biases_data_f = malloc(biases_data_size);
void *biases_data_i = malloc(biases_data_size);
void *biases_data_c = malloc(biases_data_size);
void *biases_data_o = malloc(biases_data_size);
status = zdnn_transform_ztensor(&biases, biases_data_f, biases_data_i,
biases_data_c, biases_data_o);
assert(status == ZDNN_OK);
/***********************************************************************
* Create hidden weights zTensor
* Resultant zTensor is concatenated
***********************************************************************/
zdnn_tensor_desc hidden_weights_pre_tfrmd_desc, hidden_weights_tfrmd_desc;
zdnn_ztensor hidden_weights;
zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &hidden_weights_pre_tfrmd_desc,
num_dirs, num_hidden, num_hidden);
status = zdnn_generate_transformed_desc_concatenated(
&hidden_weights_pre_tfrmd_desc,
RNN_TYPE_LSTM | USAGE_HIDDEN_WEIGHTS |
(is_prev_layer_bidir ? PREV_LAYER_BIDIR : PREV_LAYER_UNI),
&hidden_weights_tfrmd_desc);
assert(status == ZDNN_OK);
status = zdnn_init_ztensor_with_malloc(&hidden_weights_pre_tfrmd_desc,
&hidden_weights_tfrmd_desc,
&hidden_weights);
assert(status == ZDNN_OK);
uint64_t hidden_weights_data_size =
(uint64_t)num_dirs * num_hidden * num_hidden * element_size;
void *hidden_weights_data_f = malloc(hidden_weights_data_size);
void *hidden_weights_data_i = malloc(hidden_weights_data_size);
void *hidden_weights_data_c = malloc(hidden_weights_data_size);
void *hidden_weights_data_o = malloc(hidden_weights_data_size);
status = zdnn_transform_ztensor(&hidden_weights, hidden_weights_data_f,
hidden_weights_data_i, hidden_weights_data_c,
hidden_weights_data_o);
assert(status == ZDNN_OK);
/***********************************************************************
* Create hidden biases zTensor
* Resultant zTensor is concatenated
***********************************************************************/
zdnn_tensor_desc hidden_biases_pre_tfrmd_desc, hidden_biases_tfrmd_desc;
zdnn_ztensor hidden_biases;
zdnn_init_pre_transformed_desc(ZDNN_2DS, type, &hidden_biases_pre_tfrmd_desc,
num_dirs, num_hidden);
status = zdnn_generate_transformed_desc_concatenated(
&hidden_biases_pre_tfrmd_desc,
RNN_TYPE_LSTM | USAGE_HIDDEN_BIASES |
(is_prev_layer_bidir ? PREV_LAYER_BIDIR : PREV_LAYER_UNI),
&hidden_biases_tfrmd_desc);
assert(status == ZDNN_OK);
status = zdnn_init_ztensor_with_malloc(
&hidden_biases_pre_tfrmd_desc, &hidden_biases_tfrmd_desc, &hidden_biases);
assert(status == ZDNN_OK);
uint64_t hidden_biases_data_size =
(uint64_t)num_dirs * num_hidden * element_size;
void *hidden_biases_data_f = malloc(hidden_biases_data_size);
void *hidden_biases_data_i = malloc(hidden_biases_data_size);
void *hidden_biases_data_c = malloc(hidden_biases_data_size);
void *hidden_biases_data_o = malloc(hidden_biases_data_size);
status = zdnn_transform_ztensor(&hidden_biases, hidden_biases_data_f,
hidden_biases_data_i, hidden_biases_data_c,
hidden_biases_data_o);
assert(status == ZDNN_OK);
/***********************************************************************
* Create cf output zTensor
***********************************************************************/
zdnn_tensor_desc cf_pre_tfrmd_desc, cf_tfrmd_desc;
zdnn_ztensor cf_output_ztensor;
zdnn_init_pre_transformed_desc(ZDNN_4DS, type, &cf_pre_tfrmd_desc, 1, 2,
num_batches, num_hidden);
status = zdnn_generate_transformed_desc(&cf_pre_tfrmd_desc, &cf_tfrmd_desc);
assert(status == ZDNN_OK);
status = zdnn_init_ztensor_with_malloc(&cf_pre_tfrmd_desc, &cf_tfrmd_desc,
&cf_output_ztensor);
assert(status == ZDNN_OK);
/***********************************************************************
* Call the AIU
***********************************************************************/
void *work_area = NULL;
status =
zdnn_lstm(input, &h0, &c0, &weights, &biases, &hidden_weights,
&hidden_biases, dir, work_area, hn_output, &cf_output_ztensor);
assert(status == ZDNN_OK);
/***********************************************************************
* Cleanup and Return
***********************************************************************/
status = zdnn_free_ztensor_buffer(&h0);
assert(status == ZDNN_OK);
status = zdnn_free_ztensor_buffer(&c0);
assert(status == ZDNN_OK);
status = zdnn_free_ztensor_buffer(&weights);
assert(status == ZDNN_OK);
status = zdnn_free_ztensor_buffer(&biases);
assert(status == ZDNN_OK);
status = zdnn_free_ztensor_buffer(&hidden_weights);
assert(status == ZDNN_OK);
status = zdnn_free_ztensor_buffer(&hidden_biases);
assert(status == ZDNN_OK);
status = zdnn_free_ztensor_buffer(&cf_output_ztensor);
assert(status == ZDNN_OK);
free(hidden_state_data);
free(cell_state_data);
free(weights_data_f);
free(weights_data_i);
free(weights_data_c);
free(weights_data_o);
free(hidden_weights_data_f);
free(hidden_weights_data_i);
free(hidden_weights_data_c);
free(hidden_weights_data_o);
free(biases_data_f);
free(biases_data_i);
free(biases_data_c);
free(biases_data_o);
free(hidden_biases_data_f);
free(hidden_biases_data_i);
free(hidden_biases_data_c);
free(hidden_biases_data_o);
}
// Sample: LSTM multi-layer BIDIR
int main(int argc, char *argv[]) {
zdnn_status status;
#ifdef STATIC_LIB
zdnn_init();
#endif
uint32_t num_hidden[2] = {5, 4};
/***********************************************************************
* Create input zTensor
***********************************************************************/
zdnn_tensor_desc input_pre_tfrmd_desc, input_tfrmd_desc;
zdnn_ztensor input;
uint32_t num_timesteps = 5;
uint32_t num_batches = 3;
uint32_t num_features = 32;
zdnn_data_types type = FP32;
short element_size = 4; // size of each element in bytes
zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &input_pre_tfrmd_desc,
num_timesteps, num_batches, num_features);
status =
zdnn_generate_transformed_desc(&input_pre_tfrmd_desc, &input_tfrmd_desc);
assert(status == ZDNN_OK);
status = zdnn_init_ztensor_with_malloc(&input_pre_tfrmd_desc,
&input_tfrmd_desc, &input);
assert(status == ZDNN_OK);
uint64_t input_data_size =
num_timesteps * num_batches * num_features * element_size;
void *input_data = malloc(input_data_size);
status = zdnn_transform_ztensor(&input, input_data);
assert(status == ZDNN_OK);
/***********************************************************************
* Create 2 hn output zTensors
***********************************************************************/
zdnn_tensor_desc hn_pre_tfrmd_desc[2], hn_tfrmd_desc[2];
zdnn_ztensor hn_output[2];
for (int i = 0; i < 2; i++) {
zdnn_init_pre_transformed_desc(ZDNN_4DS, type, &hn_pre_tfrmd_desc[i],
num_timesteps, 2, num_batches,
num_hidden[i]);
status = zdnn_generate_transformed_desc(&hn_pre_tfrmd_desc[i],
&hn_tfrmd_desc[i]);
assert(status == ZDNN_OK);
status = zdnn_init_ztensor_with_malloc(&hn_pre_tfrmd_desc[i],
&hn_tfrmd_desc[i], &hn_output[i]);
assert(status == ZDNN_OK);
}
/***********************************************************************
* Do the layers
***********************************************************************/
// call the first layer with input, previous layer bidir = false, output goes
// to hn_output[0]
do_bidir_layer(&input, num_hidden[0], &hn_output[0], false);
// call the second layer with hn_output[0] from layer 1, previous layer
// bidir = true, output goes to hn_output[1]
do_bidir_layer(&hn_output[0], num_hidden[1], &hn_output[1], true);
/***********************************************************************
* Output and Cleanup
***********************************************************************/
void *hn_output_data[2];
for (int i = 0; i < 2; i++) {
uint64_t hn_output_data_size = (uint64_t)num_timesteps * num_batches *
num_hidden[i] * 2 * element_size;
hn_output_data[i] = malloc(hn_output_data_size);
status = zdnn_transform_origtensor(&hn_output[i], hn_output_data[i]);
assert(status == ZDNN_OK);
}
status = zdnn_free_ztensor_buffer(&input);
assert(status == ZDNN_OK);
status = zdnn_free_ztensor_buffer(&hn_output[0]);
assert(status == ZDNN_OK);
status = zdnn_free_ztensor_buffer(&hn_output[1]);
assert(status == ZDNN_OK);
free(input_data);
free(hn_output_data[0]);
free(hn_output_data[1]);
}
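`do_bidir_layer()` derives `num_features` from its input zTensor: a layer fed by a bidirectional layer sees 2x the previous layer's `num_hidden`, because that layer's hn output carries one `num_hidden`-wide result per direction. The arithmetic for a whole stack is sketched below with a hypothetical `layer_num_features` helper (not a zDNN function), using the same `num_hidden = {5, 4}` as the sample.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of the zDNN API: input feature count of
 * layer `layer` in a stack of LSTM/GRU layers. Layer 0 reads the raw input;
 * each later layer reads the previous layer's hn output, whose innermost
 * dimension is that layer's num_hidden, doubled when it was bidirectional
 * (one result per direction). */
static uint32_t layer_num_features(uint32_t input_features,
                                   const uint32_t *num_hidden, uint32_t layer,
                                   bool prev_layer_bidir) {
  if (layer == 0)
    return input_features;
  return num_hidden[layer - 1] * (prev_layer_bidir ? 2u : 1u);
}
```

For the sample's two-layer stack, layer 0 reads 32 features and layer 1 reads 2 * 5 = 10, so layer 1's input weights have the pre-transformed shape (2, 10, 4).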
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "zdnn.h"
// Sample: GRU
int main(int argc, char *argv[]) {
zdnn_status status;
#ifdef STATIC_LIB
zdnn_init();
#endif
/***********************************************************************
*
* GRU (FWD/BWD):
*
* INPUTS --------------------------------------------------------------
* input | ZDNN_3DS | (num_timesteps, num_batches, num_features)
* h0 | ZDNN_3DS | (1, num_batches, num_hidden)
* weights | ZDNN_3DS | (1, num_features, num_hidden)
* input_biases | ZDNN_2DS | (1, num_hidden)
* hidden_weights | ZDNN_3DS | (1, num_hidden, num_hidden)
* hidden_biases | ZDNN_2DS | (1, num_hidden)
*
* OUTPUTS -------------------------------------------------------------
* hn_output | ZDNN_4DS | (num_timesteps, 1, num_batches, num_hidden)
* | | or (1, 1, num_batches, num_hidden)
***********************************************************************/
/***********************************************************************
* Create input zTensor
***********************************************************************/
zdnn_tensor_desc input_pre_tfrmd_desc, input_tfrmd_desc;
zdnn_ztensor input;
uint32_t num_timesteps = 5;
uint32_t num_batches = 3;
uint32_t num_features = 32;
uint32_t num_hidden = 5;
zdnn_data_types type = FP32;
short element_size = 4; // size of each element in bytes
lstm_gru_direction dir = FWD;
uint8_t num_dirs = 1;
zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &input_pre_tfrmd_desc,
num_timesteps, num_batches, num_features);
status =
zdnn_generate_transformed_desc(&input_pre_tfrmd_desc, &input_tfrmd_desc);
assert(status == ZDNN_OK);
status = zdnn_init_ztensor_with_malloc(&input_pre_tfrmd_desc,
&input_tfrmd_desc, &input);
assert(status == ZDNN_OK);
uint64_t input_data_size =
num_timesteps * num_batches * num_features * element_size;
void *input_data = malloc(input_data_size);
status = zdnn_transform_ztensor(&input, input_data);
assert(status == ZDNN_OK);
  /***********************************************************************
   * Create initial hidden zTensor
   ***********************************************************************/

  zdnn_tensor_desc h0_pre_tfrmd_desc, h0_tfrmd_desc;
  zdnn_ztensor h0;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &h0_pre_tfrmd_desc, num_dirs,
                                 num_batches, num_hidden);
  status = zdnn_generate_transformed_desc(&h0_pre_tfrmd_desc, &h0_tfrmd_desc);
  assert(status == ZDNN_OK);

  status =
      zdnn_init_ztensor_with_malloc(&h0_pre_tfrmd_desc, &h0_tfrmd_desc, &h0);
  assert(status == ZDNN_OK);

  uint64_t h0_data_size = num_dirs * num_batches * num_hidden * element_size;
  void *hidden_state_data = calloc(1, h0_data_size); // zero initial state
  status = zdnn_transform_ztensor(&h0, hidden_state_data);
  assert(status == ZDNN_OK);
  /***********************************************************************
   * Create input weights zTensor
   * Resultant zTensor is concatenated from the z, r and h gate weights
   ***********************************************************************/

  zdnn_tensor_desc weights_pre_tfrmd_desc, weights_tfrmd_desc;
  zdnn_ztensor weights;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &weights_pre_tfrmd_desc,
                                 num_dirs, num_features, num_hidden);
  status = zdnn_generate_transformed_desc_concatenated(
      &weights_pre_tfrmd_desc, RNN_TYPE_GRU | USAGE_WEIGHTS | PREV_LAYER_NONE,
      &weights_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&weights_pre_tfrmd_desc,
                                         &weights_tfrmd_desc, &weights);
  assert(status == ZDNN_OK);

  uint64_t weights_data_size =
      num_dirs * num_features * num_hidden * element_size;
  // One zero-filled placeholder buffer per gate, passed in z, r, h order
  void *weights_data_z = calloc(1, weights_data_size);
  void *weights_data_r = calloc(1, weights_data_size);
  void *weights_data_h = calloc(1, weights_data_size);
  status = zdnn_transform_ztensor(&weights, weights_data_z, weights_data_r,
                                  weights_data_h);
  assert(status == ZDNN_OK);
  /***********************************************************************
   * Create input biases zTensor
   * Resultant zTensor is concatenated from the z, r and h gate biases
   ***********************************************************************/

  zdnn_tensor_desc biases_pre_tfrmd_desc, biases_tfrmd_desc;
  zdnn_ztensor biases;

  zdnn_init_pre_transformed_desc(ZDNN_2DS, type, &biases_pre_tfrmd_desc,
                                 num_dirs, num_hidden);
  status = zdnn_generate_transformed_desc_concatenated(
      &biases_pre_tfrmd_desc, RNN_TYPE_GRU | USAGE_BIASES | PREV_LAYER_NONE,
      &biases_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&biases_pre_tfrmd_desc,
                                         &biases_tfrmd_desc, &biases);
  assert(status == ZDNN_OK);

  uint64_t biases_data_size = num_dirs * num_hidden * element_size;
  // One zero-filled placeholder buffer per gate, passed in z, r, h order
  void *biases_data_z = calloc(1, biases_data_size);
  void *biases_data_r = calloc(1, biases_data_size);
  void *biases_data_h = calloc(1, biases_data_size);
  status = zdnn_transform_ztensor(&biases, biases_data_z, biases_data_r,
                                  biases_data_h);
  assert(status == ZDNN_OK);
  /***********************************************************************
   * Create hidden weights zTensor
   * Resultant zTensor is concatenated from the z, r and h gate weights
   ***********************************************************************/

  zdnn_tensor_desc hidden_weights_pre_tfrmd_desc, hidden_weights_tfrmd_desc;
  zdnn_ztensor hidden_weights;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &hidden_weights_pre_tfrmd_desc,
                                 num_dirs, num_hidden, num_hidden);
  status = zdnn_generate_transformed_desc_concatenated(
      &hidden_weights_pre_tfrmd_desc,
      RNN_TYPE_GRU | USAGE_HIDDEN_WEIGHTS | PREV_LAYER_NONE,
      &hidden_weights_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&hidden_weights_pre_tfrmd_desc,
                                         &hidden_weights_tfrmd_desc,
                                         &hidden_weights);
  assert(status == ZDNN_OK);

  uint64_t hidden_weights_data_size =
      num_dirs * num_hidden * num_hidden * element_size;
  // One zero-filled placeholder buffer per gate, passed in z, r, h order
  void *hidden_weights_data_z = calloc(1, hidden_weights_data_size);
  void *hidden_weights_data_r = calloc(1, hidden_weights_data_size);
  void *hidden_weights_data_h = calloc(1, hidden_weights_data_size);
  status = zdnn_transform_ztensor(&hidden_weights, hidden_weights_data_z,
                                  hidden_weights_data_r, hidden_weights_data_h);
  assert(status == ZDNN_OK);
  /***********************************************************************
   * Create hidden biases zTensor
   * Resultant zTensor is concatenated from the z, r and h gate biases
   ***********************************************************************/

  zdnn_tensor_desc hidden_biases_pre_tfrmd_desc, hidden_biases_tfrmd_desc;
  zdnn_ztensor hidden_biases;

  zdnn_init_pre_transformed_desc(ZDNN_2DS, type, &hidden_biases_pre_tfrmd_desc,
                                 num_dirs, num_hidden);
  status = zdnn_generate_transformed_desc_concatenated(
      &hidden_biases_pre_tfrmd_desc,
      RNN_TYPE_GRU | USAGE_HIDDEN_BIASES | PREV_LAYER_NONE,
      &hidden_biases_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(
      &hidden_biases_pre_tfrmd_desc, &hidden_biases_tfrmd_desc, &hidden_biases);
  assert(status == ZDNN_OK);

  uint64_t hidden_biases_data_size = num_dirs * num_hidden * element_size;
  // One zero-filled placeholder buffer per gate, passed in z, r, h order
  void *hidden_biases_data_z = calloc(1, hidden_biases_data_size);
  void *hidden_biases_data_r = calloc(1, hidden_biases_data_size);
  void *hidden_biases_data_h = calloc(1, hidden_biases_data_size);
  status = zdnn_transform_ztensor(&hidden_biases, hidden_biases_data_z,
                                  hidden_biases_data_r, hidden_biases_data_h);
  assert(status == ZDNN_OK);
  /***********************************************************************
   * Create output zTensor
   ***********************************************************************/

  // Output only the last timestep: the outermost dimension is 1 rather than
  // num_timesteps
  zdnn_tensor_desc hn_pre_tfrmd_desc, hn_tfrmd_desc;
  zdnn_ztensor hn_output_ztensor;

  zdnn_init_pre_transformed_desc(ZDNN_4DS, type, &hn_pre_tfrmd_desc, 1, 1,
                                 num_batches, num_hidden);
  status = zdnn_generate_transformed_desc(&hn_pre_tfrmd_desc, &hn_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&hn_pre_tfrmd_desc, &hn_tfrmd_desc,
                                         &hn_output_ztensor);
  assert(status == ZDNN_OK);
  /***********************************************************************
   * Call the AIU
   ***********************************************************************/

  void *work_area = NULL; // NULL: let zDNN allocate and free the work area
  status = zdnn_gru(&input, &h0, &weights, &biases, &hidden_weights,
                    &hidden_biases, dir, work_area, &hn_output_ztensor);
  assert(status == ZDNN_OK);
  /***********************************************************************
   * Output and Cleanup
   ***********************************************************************/

  uint64_t hn_data_size = num_batches * num_hidden * element_size;
  void *hn_output_data = malloc(hn_data_size);
  status = zdnn_transform_origtensor(&hn_output_ztensor, hn_output_data);
  assert(status == ZDNN_OK);

  status = zdnn_free_ztensor_buffer(&input);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&h0);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&weights);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&biases);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&hidden_weights);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&hidden_biases);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&hn_output_ztensor);
  assert(status == ZDNN_OK);

  free(input_data);
  free(hidden_state_data);
  free(weights_data_z);
  free(weights_data_r);
  free(weights_data_h);
  free(hidden_weights_data_z);
  free(hidden_weights_data_r);
  free(hidden_weights_data_h);
  free(biases_data_z);
  free(biases_data_r);
  free(biases_data_h);
  free(hidden_biases_data_z);
  free(hidden_biases_data_r);
  free(hidden_biases_data_h);
  free(hn_output_data);

  return 0;
}