Run Sequence-to-Sequence Regression on FPGAs
This example shows how to create, compile, and deploy a long short-term memory (LSTM) network trained on remaining useful life (RUL) of engines. Use the deployed network to predict the RUL for an engine. Use MATLAB® to retrieve the prediction results from the target device.
This example uses the turbofan engine degradation data used in [1]. The example uses an LSTM network to predict the remaining useful life of an engine measured in cycles when given time series data representing various sensors in the engine. The training data contains simulated time series data for 100 engines. Each sequence varies in length and corresponds to a full run to failure (RTF) instance. The test data contains 100 partial sequences and the corresponding values for the remaining useful life at the end of each sequence.
The data set contains 100 training observations and 100 test observations.
To learn more about how to train this network, see Sequence-to-Sequence Regression Using Deep Learning. For this example, you must have a Xilinx® Zynq® Ultrascale+™ ZCU102 SoC development kit.
Download Data
Download and unzip the turbofan engine degradation simulation data set.
Each time series in the turbofan engine degradation simulation data set represents a different engine. Each engine starts with unknown degrees of initial wear and manufacturing variation. The engine operates normally at the start of each time series, and develops a fault at some point during the series. In the training set, the fault grows in magnitude until system failure.
The data contains ZIP-compressed text files with 26 columns of numbers, separated by spaces. Each row is a snapshot of data taken during a single operational cycle, and each column is a different variable. The columns correspond to the following:
Column 1 — Unit number
Column 2 — Time in cycles
Columns 3–5 — Operational settings
Columns 6–26 — Sensor measurements 1–21
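The column layout above can be turned into one sequence per engine. The following is a minimal sketch on a small synthetic matrix. It is not the helper code attached to this example, and the variable names (raw, unitIDs, X) and the toy values are assumptions made for illustration.

```matlab
% Synthetic stand-in for a few rows of the 26-column text file (illustrative only).
% Column 1: unit number, column 2: time in cycles,
% columns 3-5: operational settings, columns 6-26: sensor measurements.
engine1 = [ones(3,1),   (1:3)', zeros(3,3), rand(3,21)];  % engine 1, 3 cycles
engine2 = [2*ones(2,1), (1:2)', zeros(2,3), rand(2,21)];  % engine 2, 2 cycles
raw = [engine1; engine2];

unitIDs = unique(raw(:,1));          % one entry per engine
X = cell(numel(unitIDs),1);
for k = 1:numel(unitIDs)
    rows = raw(:,1) == unitIDs(k);   % all cycles recorded for this engine
    % Keep settings and sensors (columns 3-26) and transpose so that
    % rows are features and columns are time steps.
    X{k} = raw(rows,3:26)';
end

disp(size(X{1}))   % 24 features by 3 time steps
disp(size(X{2}))   % 24 features by 2 time steps
```

Storing each engine as a features-by-time-steps matrix in a cell array matches the way XTrain and XTest are indexed later in this example, where each cell holds one sequence.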
Create a directory to store the turbofan engine degradation simulation data set.
dataFolder = fullfile(tempdir,"turbofan");
if ~exist(dataFolder,'dir')
    mkdir(dataFolder);
end
Download and extract the turbofan engine degradation simulation data set.
filename = matlab.internal.examples.downloadSupportFile("nnet","data/TurbofanEngineDegradationSimulationData.zip");
unzip(filename,dataFolder)
Prepare Test Data
Load the test data using the processTurboFanDataTest function attached to this example. The processTurboFanDataTest function extracts the data from filenamePredictors and filenameResponses and returns the cell arrays XTest and YTest, which contain the test predictor and response sequences, respectively.
filenamePredictors = fullfile(dataFolder,"test_FD001.txt");
filenameResponses = fullfile(dataFolder,"RUL_FD001.txt");
[XTest,YTest] = processTurboFanDataTest(filenamePredictors,filenameResponses);
Remove features with constant values using idxConstant calculated from the training data. Normalize the test predictors using the same parameters as in the training data. Clip the test responses at the same threshold used for the training data. To compute these parameters, first load the training data.
filenamePredictors = fullfile(dataFolder,"train_FD001.txt");
[XTrain,YTrain] = processTurboFanDataTrain(filenamePredictors);
Remove Features with Constant Values
Features that remain constant for all time steps can negatively impact the training. Find the rows of data that have the same minimum and maximum values, and remove the rows. Then apply the same row indices, normalization statistics (mean and standard deviation), and response threshold to the test data set.
% Find features whose minimum and maximum coincide across all training data.
m = min([XTrain{:}],[],2);
M = max([XTrain{:}],[],2);
idxConstant = M == m;

for i = 1:numel(XTrain)
    XTrain{i}(idxConstant,:) = [];
end
numFeatures = size(XTrain{1},1);

% Compute the normalization statistics from the training predictors.
mu = mean([XTrain{:}],2);
sig = std([XTrain{:}],0,2);
for i = 1:numel(XTrain)
    XTrain{i} = (XTrain{i} - mu) ./ sig;
end

thr = 150; % threshold
for i = 1:numel(XTest)
    XTest{i}(idxConstant,:) = [];
    XTest{i} = (XTest{i} - mu) ./ sig;
    YTest{i}(YTest{i} > thr) = thr;
end
Load the Pretrained Network
Load the LSTM network, which was trained on the NASA CMAPSS data described in [1]. Enter:
load CMAPSSDataNetwork
View the layers of the network by using the analyzeNetwork function. The function returns a graphical representation of the network and the parameter settings for the layers in the network.
analyzeNetwork(net)
Define FPGA Board Interface
Define the target FPGA board programming interface by using the dlhdl.Target object. Specify that the interface is for a Xilinx board with an Ethernet interface.
To create the target object, enter:
hTarget = dlhdl.Target('Xilinx','Interface','Ethernet');
Alternatively, to use the JTAG interface, install Xilinx™ Vivado™ Design Suite 2020.2. To set the Xilinx Vivado toolpath, enter:
hdlsetuptoolpath('ToolName', 'Xilinx Vivado', 'ToolPath', 'C:\Xilinx\Vivado\2020.2\bin\vivado.bat');
hTarget = dlhdl.Target('Xilinx','Interface','JTAG');
Prepare Network for Deployment
Prepare the network for deployment by creating a dlhdl.Workflow object. Specify the network and the bitstream name. Ensure that the bitstream name matches the data type and your FPGA board. In this example, the target board is the Xilinx ZCU102 SoC. The bitstream uses a single data type.
hW = dlhdl.Workflow('network', net, 'Bitstream', 'zcu102_lstm_single','Target',hTarget);
Alternatively, to run the example on the Xilinx ZC706 board, enter:
hW = dlhdl.Workflow('Network', net, 'Bitstream', 'zc706_lstm_single','Target',hTarget);
Compile Network
Run the compile method of the dlhdl.Workflow object to compile the network and generate the instructions, weights, and biases for deployment. The total number of frames exceeds the default value of 30. Set the InputFrameNumberLimit name-value argument to 500 to run predictions in chunks of 500 frames to prevent timeouts.
dn = compile(hW,'InputFrameNumberLimit',500)
### Compiling network for Deep Learning FPGA prototyping ...
### Targeting FPGA bitstream zcu102_lstm_single.
### The network includes the following layers:
     1   'sequenceinput'      Sequence Input      Sequence input with 17 dimensions             (SW Layer)
     2   'lstm'               LSTM                LSTM with 200 hidden units                    (HW Layer)
     3   'fc_1'               Fully Connected     50 fully connected layer                      (HW Layer)
     4   'fc_2'               Fully Connected     1 fully connected layer                       (HW Layer)
     5   'regressionoutput'   Regression Output   mean-squared-error with response 'Response'   (SW Layer)
### Notice: The layer 'sequenceinput' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.
### Notice: The layer 'regressionoutput' with type 'nnet.cnn.layer.RegressionOutputLayer' is implemented in software.
### Compiling layer group: lstm.wi ...
### Compiling layer group: lstm.wi ... complete.
### Compiling layer group: lstm.wo ...
### Compiling layer group: lstm.wo ... complete.
### Compiling layer group: lstm.wg ...
### Compiling layer group: lstm.wg ... complete.
### Compiling layer group: lstm.wf ...
### Compiling layer group: lstm.wf ... complete.
### Compiling layer group: fc_1>>fc_2 ...
### Compiling layer group: fc_1>>fc_2 ... complete.
### Allocating external memory buffers:
          offset_name          offset_address     allocated_space
    _______________________    ______________    ________________
    "InputDataOffset"           "0x00000000"     "4.0 MB"
    "OutputResultOffset"        "0x00400000"     "4.0 MB"
    "SchedulerDataOffset"       "0x00800000"     "4.0 MB"
    "SystemBufferOffset"        "0x00c00000"     "20.0 MB"
    "InstructionDataOffset"     "0x02000000"     "4.0 MB"
    "FCWeightDataOffset"        "0x02400000"     "4.0 MB"
    "EndOffset"                 "0x02800000"     "Total: 40.0 MB"
### Network compilation complete.
dn = struct with fields:
weights: [1×1 struct]
instructions: [1×1 struct]
registers: [1×1 struct]
syncInstructions: [1×1 struct]
constantData: {}
ddrInfo: [1×1 struct]
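The offsets in the buffer table of the compile log are hexadecimal byte addresses, and the space reported for each buffer equals the gap to the next offset. As a quick check, the following sketch converts the addresses copied from the log into megabytes. The variable names are illustrative and are not part of the dlhdl API.

```matlab
% Offsets copied from the "Allocating external memory buffers" table.
names   = ["InputDataOffset";"OutputResultOffset";"SchedulerDataOffset"; ...
           "SystemBufferOffset";"InstructionDataOffset";"FCWeightDataOffset";"EndOffset"];
offsets = {'00000000';'00400000';'00800000';'00c00000';'02000000';'02400000';'02800000'};

addr    = hex2dec(offsets);       % byte addresses as doubles
addr    = addr(:);                % force a column vector
sizeMB  = diff(addr) / 2^20;      % gap between consecutive offsets, in MB
totalMB = addr(end) / 2^20;       % EndOffset marks the end of the allocation

for k = 1:numel(sizeMB)
    fprintf('%-24s %5.1f MB\n', names(k), sizeMB(k));
end
fprintf('Total: %.1f MB\n', totalMB);
```

The computed sizes (4, 4, 4, 20, 4, and 4 MB, with a total of 40 MB) agree with the allocated_space column in the log.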
Program Bitstream onto FPGA and Download Network Weights
To deploy the network on the Xilinx ZCU102 SoC hardware, run the deploy function of the dlhdl.Workflow object. This function uses the output of the compile function to program the FPGA board by using the programming file. The deploy function starts programming the FPGA device and displays progress messages and the time required to deploy the network.
hW.deploy
### FPGA bitstream programming has been skipped as the same bitstream is already loaded on the target FPGA.
### Resetting network state.
### Loading weights to FC Processor.
### FC Weights loaded. Current time is 09-May-2023 14:02:27
Predict Remaining Useful Life
Run the predict method of the dlhdl.Workflow object to make predictions on the test data.
for i = 1:numel(XTest)
    YPred{i} = hW.predict(XTest{i},Profile='on');
end
### Resetting network state.
### Finished writing input activations.
### Running a sequence of length 31.

              Deep Learning Processor Profiler Performance Results

                          LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                               -------------              -------------          ---------     ---------     ---------
Network                          85597                 0.00039              31         2660531      2563.4
    memSeparator_0                 102                 0.00000
    lstm.wi                      19406                 0.00009
    lstm.wo                      19287                 0.00009
    lstm.wg                      19288                 0.00009
    lstm.wf                      19315                 0.00009
    lstm.sigmoid_1                 275                 0.00000
    lstm.sigmoid_3                 267                 0.00000
    lstm.tanh_1                    287                 0.00000
    lstm.sigmoid_2                 267                 0.00000
    lstm.multiplication_2          487                 0.00000
    lstm.multiplication_1          417                 0.00000
    lstm.c_add                     421                 0.00000
    lstm.tanh_2                    281                 0.00000
    lstm.multiplication_3          481                 0.00000
    fc_1                          4680                 0.00002
    fc_2                           335                 0.00000
 * The clock frequency of the DL processor is: 220MHz

### Resetting network state.
### Finished writing input activations.
### Running a sequence of length 49.

              Deep Learning Processor Profiler Performance Results

                          LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                               -------------              -------------          ---------     ---------     ---------
Network                          85575                 0.00039              49         4204571      2563.9
    memSeparator_0                 102                 0.00000
    lstm.wi                      19473                 0.00009
    lstm.wo                      19317                 0.00009
    lstm.wg                      19287                 0.00009
    lstm.wf                      19287                 0.00009
    lstm.sigmoid_1                 285                 0.00000
    lstm.sigmoid_3                 317                 0.00000
    lstm.tanh_1                    277                 0.00000
    lstm.sigmoid_2                 267                 0.00000
    lstm.multiplication_2          427                 0.00000
    lstm.multiplication_1          427                 0.00000
    lstm.c_add                     431                 0.00000
    lstm.tanh_2                    271                 0.00000
    lstm.multiplication_3          401                 0.00000
    fc_1                          4719                 0.00002
    fc_2                           286                 0.00000
 * The clock frequency of the DL processor is: 220MHz

### Resetting network state.
### Finished writing input activations.
### Running a sequence of length 126.

              Deep Learning Processor Profiler Performance Results

                          LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                               -------------              -------------          ---------     ---------     ---------
Network                          85581                 0.00039             126        10810895      2564.1
    memSeparator_0                 102                 0.00000
    lstm.wi                      19479                 0.00009
    lstm.wo                      19317                 0.00009
    lstm.wg                      19287                 0.00009
    lstm.wf                      19287                 0.00009
    lstm.sigmoid_1                 285                 0.00000
    lstm.sigmoid_3                 317                 0.00000
    lstm.tanh_1                    287                 0.00000
    lstm.sigmoid_2                 267                 0.00000
    lstm.multiplication_2          427                 0.00000
    lstm.multiplication_1          427                 0.00000
    lstm.c_add                     421                 0.00000
    lstm.tanh_2                    271                 0.00000
    lstm.multiplication_3          401                 0.00000
    fc_1                          4720                 0.00002
    fc_2                           285                 0.00000
 * The clock frequency of the DL processor is: 220MHz

### Resetting network state.
### Finished writing input activations.
### Running a sequence of length 106.

              Deep Learning Processor Profiler Performance Results

                          LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                               -------------              -------------          ---------     ---------     ---------
Network                          85577                 0.00039             106         9093836      2564.4
    memSeparator_0                 102                 0.00000
    lstm.wi                      19406                 0.00009
    lstm.wo                      19287                 0.00009
    lstm.wg                      19288                 0.00009
    lstm.wf                      19315                 0.00009
    lstm.sigmoid_1                 285                 0.00000
    lstm.sigmoid_3                 257                 0.00000
    lstm.tanh_1                    277                 0.00000
    lstm.sigmoid_2                 277                 0.00000
    lstm.multiplication_2          487                 0.00000
    lstm.multiplication_1          427                 0.00000
    lstm.c_add                     411                 0.00000
    lstm.tanh_2                    271                 0.00000
    lstm.multiplication_3          481                 0.00000
    fc_1                          4673                 0.00002
    fc_2                           332                 0.00000
 * The clock frequency of the DL processor is: 220MHz

### Resetting network state.
### Finished writing input activations.
### Running a sequence of length 98.

              Deep Learning Processor Profiler Performance Results

                          LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                               -------------              -------------          ---------     ---------     ---------
Network                          85596                 0.00039              98         8408321      2564.1
    memSeparator_0                 102                 0.00000
    lstm.wi                      19462                 0.00009
    lstm.wo                      19320                 0.00009
    lstm.wg                      19277                 0.00009
    lstm.wf                      19296                 0.00009
    lstm.sigmoid_1                 285                 0.00000
    lstm.sigmoid_3                 317                 0.00000
    lstm.tanh_1                    287                 0.00000
    lstm.sigmoid_2                 277                 0.00000
    lstm.multiplication_2          427                 0.00000
    lstm.multiplication_1          427                 0.00000
    lstm.c_add                     411                 0.00000
    lstm.tanh_2                    291                 0.00000
    lstm.multiplication_3          401                 0.00000
    fc_1                          4726                 0.00002
    fc_2                           289                 0.00000
 * The clock frequency of the DL processor is: 220MHz

### Resetting network state.
### Finished writing input activations.
### Running a sequence of length 105.

              Deep Learning Processor Profiler Performance Results

                          LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                               -------------              -------------          ---------     ---------     ---------
Network                          85596                 0.00039             105         9008955      2564.1
    memSeparator_0                 102                 0.00000
    lstm.wi                      19406                 0.00009
    lstm.wo                      19287                 0.00009
    lstm.wg                      19290                 0.00009
    lstm.wf                      19312                 0.00009
    lstm.sigmoid_1                 285                 0.00000
    lstm.sigmoid_3                 267                 0.00000
    lstm.tanh_1                    277                 0.00000
    lstm.sigmoid_2                 277                 0.00000
    lstm.multiplication_2          487                 0.00000
    lstm.multiplication_1          427                 0.00000
    lstm.c_add                     411                 0.00000
    lstm.tanh_2                    291                 0.00000
    lstm.multiplication_3          461                 0.00000
    fc_1                          4675                 0.00002
    fc_2                           340                 0.00000
 * The clock frequency of the DL processor is: 220MHz

### Resetting network state.
### Finished writing input activations.
### Running a sequence of length 160.

              Deep Learning Processor Profiler Performance Results

                          LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                               -------------              -------------          ---------     ---------     ---------
Network                          85586                 0.00039             160        13727877      2564.1
    memSeparator_0                 102                 0.00000
    lstm.wi                      19406                 0.00009
    lstm.wo                      19297                 0.00009
    lstm.wg                      19281                 0.00009
    lstm.wf                      19312                 0.00009
    lstm.sigmoid_1                 284                 0.00000
    lstm.sigmoid_3                 267                 0.00000
    lstm.tanh_1                    277                 0.00000
    lstm.sigmoid_2                 267                 0.00000
    lstm.multiplication_2          497                 0.00000
    lstm.multiplication_1          427                 0.00000
    lstm.c_add                     411                 0.00000
    lstm.tanh_2                    291                 0.00000
    lstm.multiplication_3          461                 0.00000
    fc_1                          4672                 0.00002
    fc_2                           333                 0.00000
 * The clock frequency of the DL processor is: 220MHz

### Resetting network state.
### Finished writing input activations.
### Running a sequence of length 166.

              Deep Learning Processor Profiler Performance Results

                          LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                               -------------              -------------          ---------     ---------     ---------
Network                          85578                 0.00039             166        14242348      2564.2
    memSeparator_0                 102                 0.00000
    lstm.wi                      19396                 0.00009
    lstm.wo                      19287                 0.00009
    lstm.wg                      19287                 0.00009
    lstm.wf                      19317                 0.00009
    lstm.sigmoid_1                 285                 0.00000
    lstm.sigmoid_3                 267                 0.00000
    lstm.tanh_1                    287                 0.00000
    lstm.sigmoid_2                 267                 0.00000
    lstm.multiplication_2          477                 0.00000
    lstm.multiplication_1          427                 0.00000
    lstm.c_add                     411                 0.00000
    lstm.tanh_2                    291                 0.00000
    lstm.multiplication_3          471                 0.00000
    fc_1                          4670                 0.00002
    fc_2                           335                 0.00000
 * The clock frequency of the DL processor is: 220MHz

### Resetting network state.
### Finished writing input activations.
### Running a sequence of length 55.

              Deep Learning Processor Profiler Performance Results

                          LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                               -------------              -------------          ---------     ---------     ---------
Network                          85636                 0.00039              55         4719211      2564.0
    memSeparator_0                 102                 0.00000
    lstm.wi                      19456                 0.00009
    lstm.wo                      19297                 0.00009
    lstm.wg                      19281                 0.00009
    lstm.wf                      19312                 0.00009
    lstm.sigmoid_1                 284                 0.00000
    lstm.sigmoid_3                 267                 0.00000
    lstm.tanh_1                    277                 0.00000
    lstm.sigmoid_2                 277                 0.00000
    lstm.multiplication_2          487                 0.00000
    lstm.multiplication_1          427                 0.00000
    lstm.c_add                     411                 0.00000
    lstm.tanh_2                    291                 0.00000
    lstm.multiplication_3          461                 0.00000
    fc_1                          4671                 0.00002
    fc_2                           334                 0.00000
 * The clock frequency of the DL processor is: 220MHz

### Resetting network state.
### Finished writing input activations.
### Running a sequence of length 192.

              Deep Learning Processor Profiler Performance Results

                          LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                               -------------              -------------          ---------     ---------     ---------
Network                          85578                 0.00039             192        16473028      2564.2
    memSeparator_0                 102                 0.00000
    lstm.wi                      19396                 0.00009
    lstm.wo                      19287                 0.00009
    lstm.wg                      19287                 0.00009
    lstm.wf                      19317                 0.00009
    lstm.sigmoid_1                 285                 0.00000
    lstm.sigmoid_3                 267                 0.00000
    lstm.tanh_1                    277                 0.00000
    lstm.sigmoid_2                 277                 0.00000
    lstm.multiplication_2          477                 0.00000
    lstm.multiplication_1          427                 0.00000
    lstm.c_add                     411                 0.00000
    lstm.tanh_2                    291                 0.00000
    lstm.multiplication_3          471                 0.00000
    fc_1                          4673                 0.00002
    fc_2                           332                 0.00000
 * The clock frequency of the DL processor is: 220MHz

### Resetting network state.
### Finished writing input activations.
### Running a sequence of length 83.

              Deep Learning Processor Profiler Performance Results

                          LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                               -------------              -------------          ---------     ---------     ---------
Network                          85585                 0.00039              83         7121410      2564.1
    memSeparator_0                 102                 0.00000
    lstm.wi                      19463                 0.00009
    lstm.wo                      19317                 0.00009
    lstm.wg                      19287                 0.00009
    lstm.wf                      19287                 0.00009
    lstm.sigmoid_1                 285                 0.00000
    lstm.sigmoid_3                 317                 0.00000
    lstm.tanh_1                    287                 0.00000
    lstm.sigmoid_2                 277                 0.00000
    lstm.multiplication_2          427                 0.00000
    lstm.multiplication_1          427                 0.00000
    lstm.c_add                     421                 0.00000
    lstm.tanh_2                    281                 0.00000
    lstm.multiplication_3          401                 0.00000
    fc_1                          4717                 0.00002
    fc_2                           288                 0.00000
 * The clock frequency of the DL processor is: 220MHz

### Resetting network state.
### Finished writing input activations.
### Running a sequence of length 217.

              Deep Learning Processor Profiler Performance Results

                          LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                               -------------              -------------          ---------     ---------     ---------
Network                          85605                 0.00039             217        18617229      2564.3
    memSeparator_0                 102                 0.00000
    lstm.wi                      19483                 0.00009
    lstm.wo                      19317                 0.00009
    lstm.wg                      19287                 0.00009
    lstm.wf                      19287                 0.00009
    lstm.sigmoid_1                 275                 0.00000
    lstm.sigmoid_3                 327                 0.00000
    lstm.tanh_1                    277                 0.00000
    lstm.sigmoid_2                 267                 0.00000
    lstm.multiplication_2          427                 0.00000
    lstm.multiplication_1          427                 0.00000
    lstm.c_add                     431                 0.00000
    lstm.tanh_2                    291                 0.00000
    lstm.multiplication_3          401                 0.00000
    fc_1                          4722                 0.00002
    fc_2                           283                 0.00000
 * The clock frequency of the DL processor is: 220MHz

### Resetting network state.
### Finished writing input activations.
### Running a sequence of length 195.

              Deep Learning Processor Profiler Performance Results

                          LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                               -------------              -------------          ---------     ---------     ---------
Network                          85636                 0.00039             195        16729884      2564.3
    memSeparator_0                 102                 0.00000
    lstm.wi                      19456                 0.00009
    lstm.wo                      19297                 0.00009
    lstm.wg                      19281                 0.00009
    lstm.wf                      19312                 0.00009
    lstm.sigmoid_1                 284                 0.00000
    lstm.sigmoid_3                 267                 0.00000
    lstm.tanh_1                    277                 0.00000
    lstm.sigmoid_2                 277                 0.00000
    lstm.multiplication_2          487                 0.00000
    lstm.multiplication_1          427                 0.00000
    lstm.c_add                     411                 0.00000
    lstm.tanh_2                    291                 0.00000
    lstm.multiplication_3          461                 0.00000
    fc_1                          4672                 0.00002
    fc_2                           333                 0.00000
 * The clock frequency of the DL processor is: 220MHz

### Resetting network state.
### Finished writing input activations.
### Running a sequence of length 46.

              Deep Learning Processor Profiler Performance Results

                          LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                               -------------              -------------          ---------     ---------     ---------
Network                          85596                 0.00039              46         3947354      2563.7
    memSeparator_0                 102                 0.00000
    lstm.wi                      19406                 0.00009
    lstm.wo                      19296                 0.00009
    lstm.wg                      19287                 0.00009
    lstm.wf                      19307                 0.00009
    lstm.sigmoid_1                 274                 0.00000
    lstm.sigmoid_3                 267                 0.00000
    lstm.tanh_1                    277                 0.00000
    lstm.sigmoid_2                 277                 0.00000
    lstm.multiplication_2          497                 0.00000
    lstm.multiplication_1          427                 0.00000
    lstm.c_add                     421                 0.00000
    lstm.tanh_2                    281                 0.00000
    lstm.multiplication_3          461                 0.00000
    fc_1                          4679                 0.00002
    fc_2                           336                 0.00000
 * The clock frequency of the DL processor is: 220MHz

### Resetting network state.
### Finished writing input activations.
### Running a sequence of length 76.

              Deep Learning Processor Profiler Performance Results

                          LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                               -------------              -------------          ---------     ---------     ---------
Network                          85584                 0.00039              76         6520835      2564.1
    memSeparator_0                 102                 0.00000
    lstm.wi                      19482                 0.00009
    lstm.wo                      19317                 0.00009
    lstm.wg                      19287                 0.00009
    lstm.wf                      19287                 0.00009
    lstm.sigmoid_1                 265                 0.00000
    lstm.sigmoid_3                 337                 0.00000
    lstm.tanh_1                    277                 0.00000
    lstm.sigmoid_2                 277                 0.00000
    lstm.multiplication_2          427                 0.00000
    lstm.multiplication_1          427                 0.00000
    lstm.c_add                     421                 0.00000
    lstm.tanh_2                    271                 0.00000
    lstm.multiplication_3          401                 0.00000
    fc_1                          4717                 0.00002
    fc_2                           288                 0.00000
 * The clock frequency of the DL processor is: 220MHz

### Resetting network state.
### Finished writing input activations.
### Running a sequence of length 113.

              Deep Learning Processor Profiler Performance Results

                          LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                               -------------              -------------          ---------     ---------     ---------
Network                          85585                 0.00039             113         9695264      2564.1
    memSeparator_0                 102                 0.00000
    lstm.wi                      19406                 0.00009
    lstm.wo                      19287                 0.00009
    lstm.wg                      19290                 0.00009
    lstm.wf                      19312                 0.00009
    lstm.sigmoid_1                 284                 0.00000
    lstm.sigmoid_3                 257                 0.00000
    lstm.tanh_1                    277                 0.00000
    lstm.sigmoid_2                 267                 0.00000
    lstm.multiplication_2          497                 0.00000
    lstm.multiplication_1          427                 0.00000
    lstm.c_add                     411                 0.00000
    lstm.tanh_2                    291                 0.00000
    lstm.multiplication_3          471                 0.00000
    fc_1                          4671                 0.00002
    fc_2                           334                 0.00000
 * The clock frequency of the DL processor is: 220MHz

### Resetting network state.
### Finished writing input activations.
### Running a sequence of length 165.

              Deep Learning Processor Profiler Performance Results

                          LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                               -------------              -------------          ---------     ---------     ---------
Network                          85597                 0.00039             165        14156683      2564.2
    memSeparator_0                 102                 0.00000
    lstm.wi                      19475                 0.00009
    lstm.wo                      19317                 0.00009
    lstm.wg                      19287                 0.00009
    lstm.wf                      19287                 0.00009
    lstm.sigmoid_1                 285                 0.00000
    lstm.sigmoid_3                 317                 0.00000
    lstm.tanh_1                    287                 0.00000
    lstm.sigmoid_2                 267                 0.00000
    lstm.multiplication_2          427                 0.00000
    lstm.multiplication_1          427                 0.00000
    lstm.c_add                     421                 0.00000
    lstm.tanh_2                    291                 0.00000
    lstm.multiplication_3          401                 0.00000
    fc_1                          4716                 0.00002
    fc_2                           289                 0.00000
 * The clock frequency of the DL processor is: 220MHz

### Resetting network state.
### Finished writing input activations.
### Running a sequence of length 133.

              Deep Learning Processor Profiler Performance Results

                          LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                               -------------              -------------          ---------     ---------     ---------
Network                          85578                 0.00039             133        11411341      2564.1
    memSeparator_0                 102                 0.00000
    lstm.wi                      19458                 0.00009
    lstm.wo                      19315                 0.00009
    lstm.wg                      19287                 0.00009
    lstm.wf                      19287                 0.00009
    lstm.sigmoid_1                 285                 0.00000
    lstm.sigmoid_3                 317                 0.00000
    lstm.tanh_1                    287                 0.00000
    lstm.sigmoid_2                 277                 0.00000
    lstm.multiplication_2          427                 0.00000
    lstm.multiplication_1          427                 0.00000
    lstm.c_add                     421                 0.00000
    lstm.tanh_2                    281                 0.00000
    lstm.multiplication_3          401                 0.00000
    fc_1                          4718                 0.00002
    fc_2                           287                 0.00000
 * The clock frequency of the DL processor is: 220MHz

### Resetting network state.
### Finished writing input activations.
### Running a sequence of length 135.

              Deep Learning Processor Profiler Performance Results

                          LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                               -------------              -------------          ---------     ---------     ---------
Network                          85576                 0.00039             135        11582414      2564.2
    memSeparator_0                 102                 0.00000
    lstm.wi                      19456                 0.00009
    lstm.wo                      19307                 0.00009
    lstm.wg                      19286                 0.00009
    lstm.wf                      19296                 0.00009
    lstm.sigmoid_1                 285                 0.00000
    lstm.sigmoid_3                 317                 0.00000
    lstm.tanh_1                    287                 0.00000
    lstm.sigmoid_2                 277                 0.00000
    lstm.multiplication_2          427                 0.00000
    lstm.multiplication_1          427                 0.00000
    lstm.c_add                     411                 0.00000
    lstm.tanh_2                    291                 0.00000
    lstm.multiplication_3          401                 0.00000
    fc_1                          4721                 0.00002
    fc_2                           284                 0.00000
 * The clock frequency of the DL processor is: 220MHz

### Resetting network state.
### Finished writing input activations.
### Running a sequence of length 184.

              Deep Learning Processor Profiler Performance Results

                          LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                               -------------              -------------          ---------     ---------     ---------
Network                          85546                 0.00039             184        15786933      2564.1
    memSeparator_0                 102                 0.00000
    lstm.wi                      19366                 0.00009
    lstm.wo                      19296                 0.00009
    lstm.wg                      19287                 0.00009
    lstm.wf                      19307                 0.00009
    lstm.sigmoid_1                 284                 0.00000
    lstm.sigmoid_3                 257                 0.00000
    lstm.tanh_1                    277                 0.00000
    lstm.sigmoid_2                 277                 0.00000
    lstm.multiplication_2          497                 0.00000
    lstm.multiplication_1          427                 0.00000
    lstm.c_add                     421                 0.00000
    lstm.tanh_2                    271                 0.00000
    lstm.multiplication_3          471                 0.00000
    fc_1                          4669                 0.00002
    fc_2                           336                 0.00000
 * The clock frequency of the DL processor is: 220MHz

### Resetting network state.
### Finished writing input activations.
### Running a sequence of length 148.

              Deep Learning Processor Profiler Performance Results

                          LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                               -------------              -------------          ---------     ---------     ---------
Network                          85587                 0.00039             148        12698421      2564.1
    memSeparator_0                 102                 0.00000
    lstm.wi                      19406                 0.00009
    lstm.wo                      19287                 0.00009
    lstm.wg                      19288                 0.00009
    lstm.wf                      19315                 0.00009
    lstm.sigmoid_1                 285                 0.00000
    lstm.sigmoid_3                 267                 0.00000
    lstm.tanh_1                    287                 0.00000
    lstm.sigmoid_2                 277                 0.00000
    lstm.multiplication_2          477                 0.00000
    lstm.multiplication_1          417                 0.00000
    lstm.c_add                     421                 0.00000
    lstm.tanh_2                    271                 0.00000
    lstm.multiplication_3          481                 0.00000
    fc_1                          4679                 0.00002
    fc_2                           326                 0.00000
 * The clock frequency of the DL processor is: 220MHz

### Resetting network state.
### Finished writing input activations.
### Running a sequence of length 39.

              Deep Learning Processor Profiler Performance Results

                          LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                               -------------              -------------          ---------     ---------     ---------
Network                          85596                 0.00039              39         3346572      2563.8
    memSeparator_0                 102                 0.00000
    lstm.wi                      19462                 0.00009
    lstm.wo                      19320                 0.00009
    lstm.wg                      19277                 0.00009
    lstm.wf                      19296                 0.00009
    lstm.sigmoid_1                 285                 0.00000
    lstm.sigmoid_3                 317                 0.00000
    lstm.tanh_1                    287                 0.00000
    lstm.sigmoid_2                 277                 0.00000
    lstm.multiplication_2          427                 0.00000
    lstm.multiplication_1          427                 0.00000
    lstm.c_add                     411                 0.00000
    lstm.tanh_2                    291                 0.00000
    lstm.multiplication_3          401                 0.00000
    fc_1                          4726                 0.00002
    fc_2                           289                 0.00000
 * The clock frequency of the DL processor is: 220MHz

### Resetting network state.
### Finished writing input activations.
### Running a sequence of length 130.

              Deep Learning Processor Profiler Performance Results

                          LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                               -------------              -------------          ---------     ---------     ---------
Network                          85586                 0.00039             130        11153050      2564.3
    memSeparator_0                 102                 0.00000
    lstm.wi                      19406                 0.00009
    lstm.wo                      19297                 0.00009
    lstm.wg                      19281                 0.00009
    lstm.wf                      19312                 0.00009
    lstm.sigmoid_1                 274                 0.00000
    lstm.sigmoid_3                 257                 0.00000
    lstm.tanh_1                    277                 0.00000
    lstm.sigmoid_2                 277                 0.00000
    lstm.multiplication_2          497                 0.00000
    lstm.multiplication_1          427                 0.00000
    lstm.c_add                     421                 0.00000
    lstm.tanh_2                    281                 0.00000
    lstm.multiplication_3          471                 0.00000
    fc_1                          4672                 0.00002
    fc_2                           333                 0.00000
 * The clock frequency of the DL processor is: 220MHz

### Resetting network state.
### Finished writing input activations.
### Running a sequence of length 186.

              Deep Learning Processor Profiler Performance Results

                          LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                               -------------              -------------          ---------     ---------     ---------
Network                          85586                 0.00039             186        15958277      2564.2
    memSeparator_0                 102                 0.00000
    lstm.wi                      19462                 0.00009
    lstm.wo                      19320                 0.00009
    lstm.wg                      19277                 0.00009
    lstm.wf                      19296                 0.00009
    lstm.sigmoid_1                 275                 0.00000
    lstm.sigmoid_3                 327                 0.00000
    lstm.tanh_1                    277                 0.00000
    lstm.sigmoid_2                 267                 0.00000
    lstm.multiplication_2          427                 0.00000
    lstm.multiplication_1          427                 0.00000
    lstm.c_add                     431                 0.00000
    lstm.tanh_2                    291                 0.00000
    lstm.multiplication_3          401                 0.00000
    fc_1                          4715                 0.00002
    fc_2                           290                 0.00000
 * The clock frequency of the DL processor is: 220MHz

### Resetting network state.
### Finished writing input activations.
### Running a sequence of length 48.

              Deep Learning Processor Profiler Performance Results

                          LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                               -------------              -------------          ---------     ---------     ---------
Network                          85575                 0.00039              48         4119029      2563.7
    memSeparator_0                 102                 0.00000
    lstm.wi                      19473                 0.00009
    lstm.wo                      19317                 0.00009
    lstm.wg                      19287                 0.00009
    lstm.wf                      19287                 0.00009
    lstm.sigmoid_1                 285                 0.00000
    lstm.sigmoid_3                 317                 0.00000
    lstm.tanh_1                    277                 0.00000
    lstm.sigmoid_2                 277                 0.00000
    lstm.multiplication_2          427                 0.00000
    lstm.multiplication_1          417                 0.00000
    lstm.c_add                     431                 0.00000
    lstm.tanh_2                    271                 0.00000
    lstm.multiplication_3          401                 0.00000
    fc_1                          4719                 0.00002
    fc_2                           286                 0.00000
 * The clock frequency of the DL processor is: 220MHz

### Resetting network state.
### Finished writing input activations.
### Running a sequence of length 76.

              Deep Learning Processor Profiler Performance Results

                          LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                               -------------              -------------          ---------     ---------     ---------
Network                          85595                 0.00039              76         6521203      2563.9
    memSeparator_0                 102                 0.00000
    lstm.wi                      19473                 0.00009
    lstm.wo                      19317                 0.00009
    lstm.wg                      19287                 0.00009
    lstm.wf                      19287                 0.00009
    lstm.sigmoid_1                 275                 0.00000
    lstm.sigmoid_3                 327                 0.00000
    lstm.tanh_1                    277                 0.00000
    lstm.sigmoid_2                 277                 0.00000
    lstm.multiplication_2          427                 0.00000
    lstm.multiplication_1          427                 0.00000
    lstm.c_add                     421                 0.00000
    lstm.tanh_2                    281                 0.00000
    lstm.multiplication_3          411                 0.00000
    fc_1                          4717                 0.00002
    fc_2                           288                 0.00000
 * The clock frequency of the DL processor is: 220MHz

### Resetting network state.
### Finished writing input activations.
### Running a sequence of length 140.

              Deep Learning Processor Profiler Performance Results

                          LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                               -------------              -------------          ---------     ---------     ---------
Network                          85586                 0.00039             140        12011869      2564.1
    memSeparator_0                 102                 0.00000
    lstm.wi                      19406                 0.00009
    lstm.wo                      19287                 0.00009
    lstm.wg                      19288                 0.00009
    lstm.wf                      19314                 0.00009
    lstm.sigmoid_1                 275                 0.00000
    lstm.sigmoid_3                 267                 0.00000
    lstm.tanh_1                    287                 0.00000
    lstm.sigmoid_2                 267                 0.00000
    lstm.multiplication_2          497                 0.00000
    lstm.multiplication_1          427                 0.00000
    lstm.c_add                     411                 0.00000
    lstm.tanh_2                    281                 0.00000
    lstm.multiplication_3          471                 0.00000
    fc_1                          4676                 0.00002
    fc_2                           329                 0.00000
 * The clock frequency of the DL processor is: 220MHz

### Resetting network state.
### Finished writing input activations.
### Running a sequence of length 158.

              Deep Learning Processor Profiler Performance Results

                          LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                               -------------              -------------          ---------     ---------     ---------
Network                          85586                 0.00039             158        13556375      2564.1
    memSeparator_0                 102                 0.00000
    lstm.wi                      19406                 0.00009
    lstm.wo                      19287                 0.00009
    lstm.wg                      19288                 0.00009
    lstm.wf                      19314                 0.00009
    lstm.sigmoid_1                 285                 0.00000
    lstm.sigmoid_3                 267                 0.00000
    lstm.tanh_1                    277                 0.00000
    lstm.sigmoid_2                 267                 0.00000
    lstm.multiplication_2          497                 0.00000
    lstm.multiplication_1          427                 0.00000
    lstm.c_add                     411                 0.00000
    lstm.tanh_2                    291                 0.00000
    lstm.multiplication_3          461                 0.00000
    fc_1                          4676                 0.00002
    fc_2                           329                 0.00000
 * The clock frequency of the DL processor is: 220MHz

### Resetting network state.
### Finished writing input activations.
### Running a sequence of length 171.

              Deep Learning Processor Profiler Performance Results

                          LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                               -------------              -------------          ---------     ---------     ---------
Network                          85585                 0.00039             171        14671396      2564.2
    memSeparator_0                 102                 0.00000
    lstm.wi                      19450                 0.00009
    lstm.wo                      19312                 0.00009
    lstm.wg                      19286                 0.00009
    lstm.wf                      19296                 0.00009
    lstm.sigmoid_1                 285                 0.00000
    lstm.sigmoid_3                 317                 0.00000
    lstm.tanh_1                    287                 0.00000
    lstm.sigmoid_2                 277                 0.00000
    lstm.multiplication_2          427                 0.00000
    lstm.multiplication_1          427                 0.00000
    lstm.c_add                     421                 0.00000
    lstm.tanh_2                    281                 0.00000
    lstm.multiplication_3          401                 0.00000
    fc_1                          4724                 0.00002
    fc_2                           291                 0.00000
 * The clock frequency of the DL processor is: 220MHz

### Resetting network state.
### Finished writing input activations.
### Running a sequence of length 143.

              Deep Learning Processor Profiler Performance Results

                          LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                               -------------              -------------          ---------     ---------     ---------
Network                          85601                 0.00039             143        12269177      2564.1
    memSeparator_0                 102                 0.00000
    lstm.wi                      19479                 0.00009
    lstm.wo                      19317                 0.00009
    lstm.wg                      19287                 0.00009
    lstm.wf                      19287                 0.00009
    lstm.sigmoid_1                 285                 0.00000
    lstm.sigmoid_3                 317                 0.00000
    lstm.tanh_1                    287                 0.00000
    lstm.sigmoid_2                 277                 0.00000
    lstm.multiplication_2          427                 0.00000
    lstm.multiplication_1          427                 0.00000
    lstm.c_add                     411                 0.00000
    lstm.tanh_2                    291                 0.00000
    lstm.multiplication_3          401                 0.00000
    fc_1                          4722                 0.00002
    fc_2                           283                 0.00000
 * The clock frequency of the DL processor is: 220MHz

### Resetting network state.
### Finished writing input activations.
### Running a sequence of length 196.

              Deep Learning Processor Profiler Performance Results

                          LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                               -------------              -------------          ---------     ---------     ---------
Network                          85577                 0.00039             196        16815689      2564.3
    memSeparator_0                 102                 0.00000
    lstm.wi                      19396                 0.00009
    lstm.wo                      19287                 0.00009
    lstm.wg                      19288                 0.00009
    lstm.wf                      19315                 0.00009
    lstm.sigmoid_1                 275                 0.00000
    lstm.sigmoid_3                 257                 0.00000
    lstm.tanh_1                    277                 0.00000
    lstm.sigmoid_2                 277                 0.00000
    lstm.multiplication_2          497                 0.00000
    lstm.multiplication_1          427                 0.00000
    lstm.c_add                     421                 0.00000
    lstm.tanh_2                    281                 0.00000
    lstm.multiplication_3          471                 0.00000
    fc_1                          4680                 0.00002
    fc_2                           325                 0.00000
 * The clock frequency of the DL processor is: 220MHz

### Resetting network state.
### Finished writing input activations.
### Running a sequence of length 145.

              Deep Learning Processor Profiler Performance Results

                          LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                               -------------              -------------          ---------     ---------     ---------
Network                          85590                 0.00039             145        12440830      2564.1
    memSeparator_0                 102                 0.00000
    lstm.wi                      19488                 0.00009
    lstm.wo                      19317                 0.00009
    lstm.wg                      19287                 0.00009
    lstm.wf                      19287                 0.00009
    lstm.sigmoid_1                 285                 0.00000
    lstm.sigmoid_3                 317                 0.00000
    lstm.tanh_1                    287                 0.00000
    lstm.sigm...
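The columns of each profiler table are related through the clock frequency printed under the table. The following check uses the numbers reported for the first run (the sequence of length 31). The variable names are illustrative only, and the reading of Total Latency as a cycle count is an assumption that the arithmetic below is consistent with.

```matlab
clockHz         = 220e6;     % "The clock frequency of the DL processor is: 220MHz"
lastFrameCycles = 85597;     % Network row, LastFrameLatency(cycles)
numFrames       = 31;        % FramesNum, equal to the sequence length
totalCycles     = 2660531;   % Network row, Total Latency (assumed to be in cycles)

lastFrameSeconds = lastFrameCycles / clockHz;          % about 0.00039 s
framesPerSecond  = numFrames * clockHz / totalCycles;  % about 2563.4 frames/s

fprintf('Last-frame latency: %.5f s\n', lastFrameSeconds);
fprintf('Throughput: %.1f frames/s\n', framesPerSecond);
```

Both values reproduce the LastFrameLatency(seconds) and Frames/s entries of the Network row for that run.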
The LSTM network makes predictions on the partial sequence one time step at a time. At each time step, the network makes predictions using the value at this time step, and the network state calculated from the previous time steps. The network updates its state between each prediction. The predict function returns a sequence of these predictions. The last element of the prediction corresponds to the predicted RUL for the partial sequence.
Alternatively, you can make predictions one time step at a time by using predictAndUpdateState. This function is useful when you have the values of the time steps in a stream. Usually, it is faster to make predictions on full sequences when compared to making predictions one time step at a time. For an example showing how to forecast future time steps by updating the network between single time step predictions, see Time Series Forecasting Using Deep Learning.
Visualize some of the predictions in a plot.
idx = randperm(numel(YPred),4);
figure
for i = 1:numel(idx)
    subplot(2,2,i)
    plot(YTest{idx(i)},'--')
    hold on
    plot(YPred{idx(i)},'.-')
    hold off
    ylim([0 thr + 25])
    title("Test Observation " + idx(i))
    xlabel("Time Step")
    ylabel("RUL")
end
legend(["Test Data" "Predicted"],'Location','southeast')
For a given partial sequence, the predicted current RUL is the last element of the predicted sequences. Calculate the root-mean-square error (RMSE) of the predictions and visualize the prediction error in a histogram.
for i = 1:numel(YTest)
    YTestLast(i) = YTest{i}(end);
    YPredLast(i) = YPred{i}(end);
end
figure
rmse = sqrt(mean((YPredLast - YTestLast).^2))
rmse = single
20.7713
histogram(YPredLast - YTestLast)
title("RMSE = " + rmse)
ylabel("Frequency")
xlabel("Error")
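The last-element extraction and the RMSE formula used above can be checked by hand on a small made-up example. The sequences below are toy values chosen for illustration, not outputs of the network, and the variable names are hypothetical.

```matlab
% Toy "true" and "predicted" RUL sequences of different lengths (illustrative values).
YTrueToy = {[150 149 148], [90 89], [60 59 58 57]};
YPredToy = {[140 145 150], [95 91], [50 52 55 54]};

% The RUL estimate for each partial sequence is its last element.
lastTrue = cellfun(@(y) y(end), YTrueToy);   % [148 89 57]
lastPred = cellfun(@(y) y(end), YPredToy);   % [150 91 54]

err     = lastPred - lastTrue;               % [2 2 -3]
rmseToy = sqrt(mean(err.^2));                % sqrt(17/3), about 2.3805
fprintf('Toy RMSE = %.4f\n', rmseToy);
```

Using cellfun here is only a compact alternative to the explicit loop in the example. Both pick the final time step of every sequence before computing the error.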
References
[1] Saxena, Abhinav, Kai Goebel, Don Simon, and Neil Eklund. "Damage propagation modeling for aircraft engine run-to-failure simulation." 2008 International Conference on Prognostics and Health Management (2008): 1–9. https://doi.org/10.1109/PHM.2008.4711414.
See Also
dlhdl.Target | dlhdl.Workflow | compile | deploy | predict | classify