Improving the Performance of Your NI LabVIEW Applications
Dan Hedges Senior Software Engineer LabVIEW Performance Group
Agenda
How to find performance problems
Benchmarking
Profiling
Understanding LabVIEW under the hood
Memory usage
Execution system
Optimization Cycle
Benchmark
Evaluate Performance
Identify Problem Areas
Optimize
Improve Efficiency
Improve Speed
Benchmarking Code Execution
Timing Template (data dep) LabVIEW Shipping Example
Benchmarking Code Execution
Analysis
Calibration
Code
Benchmark Project LabVIEW Real-Time Shipping Example
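The three stages named on the benchmarking slide (calibration, code, analysis) can also be sketched in a text language. The following is a minimal Python sketch of that pattern, not the LabVIEW shipping example itself; the names `time_once`, `benchmark`, and `work` are hypothetical and chosen only for illustration.

```python
import statistics
import time


def time_once(func):
    """Return the wall-clock seconds taken by one call to func."""
    start = time.perf_counter()
    func()
    return time.perf_counter() - start


def benchmark(func, runs=50):
    """Calibrate timer overhead, time func, and summarize the results."""
    # Calibration: time an empty callable to estimate measurement overhead.
    overhead = min(time_once(lambda: None) for _ in range(runs))
    # Code: time the code under test several times.
    samples = [time_once(func) for _ in range(runs)]
    # Analysis: subtract the overhead and report robust statistics.
    corrected = [max(s - overhead, 0.0) for s in samples]
    return {
        "overhead_s": overhead,
        "min_s": min(corrected),
        "median_s": statistics.median(corrected),
    }


def work():
    """Hypothetical code under test."""
    return sum(i * i for i in range(10_000))


result = benchmark(work)
```

Reporting the minimum and median rather than a single run makes the numbers less sensitive to other activity on the machine.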
Tools for Measuring Resource Usage (Windows)
Task Manager
Perfmon
Windows Task Manager
Gives the user a rough idea of whether memory or CPU is the bottleneck
Can be helpful in identifying memory leaks
View >> Select Columns allows you to add additional statistics
Perfmon
Allows you to monitor
Processors
Disk I/O
Network Tx/Rx
Memory/Paging
Access by typing perfmon into the Windows Run dialog
Why Should You Profile Your VIs?
The 80/20 rule of software performance
80 percent of the execution time is spent in 20 percent of the code
Performance improvements are most effective in the 20 percent
Guessing which 20 percent is difficult
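The same idea applies in any language: measure where the time goes instead of guessing. As a rough text-language analogy to the VI Profiler, the sketch below uses Python's standard `cProfile` and `pstats` modules; the functions `fast_part`, `slow_part`, and `application` are hypothetical stand-ins for VIs.

```python
import cProfile
import io
import pstats


def fast_part():
    """Cheap routine that is called often."""
    return sum(range(100))


def slow_part():
    """Deliberately expensive: stands in for the hot 20 percent."""
    return sum(i * i for i in range(200_000))


def application():
    for _ in range(20):
        fast_part()
    slow_part()


# Profile one run of the application.
profiler = cProfile.Profile()
profiler.enable()
application()
profiler.disable()

# Collect the five entries with the largest cumulative time as text.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
```

Printing `report` shows `slow_part` near the top of the list even though `fast_part` is called far more often, which is the kind of result that is hard to guess without a profiler.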
VI Profiler
Tools >> Profile >> Performance and Memory
Demo VI Profiling
LabVIEW Desktop Execution Trace Toolkit
Detailed execution traces
Thread and VI information
Measurement of execution time
Multiple Sessions
Threads, CPU and Memory
VIs
Profiling and Benchmarking Summary
To answer this question:                    Use these tools:
What is my current performance?             Benchmark VIs
What are my limiting resources?             Task Manager, Perfmon
How much time is each of my VIs taking?     VI Profiler
In what order are events occurring?         LabVIEW Desktop Execution Trace Toolkit
Under LabVIEW's Hood
Memory Management
Execution System
What Is In Memory?
Panel
Diagram
Compiled Code
Data
VIs in Memory
When a VI is loaded into memory
We always load the data
We load the code if it matches our platform (x86 Windows, x86 Linux, x86 Mac, PowerPC Mac)
We load the panel and diagram only if we need to (for instance, we need to recompile the VI)
Panel and Diagram Data
How many bytes of memory does this VI use? The answer depends on:
Is the panel in memory?
Is the environment multi-threaded?
Execute, Operate and Transfer Data
4K Execute Data: Populated by Code
4K Transfer Data: Temporary Buffer
4K Operate Data: Copy for Indicator
Avoid Loading Panels, Save Memory
Wire Semantics
Every wire is a buffer
Branches typically create copies
Optimizations by LabVIEW
The theoretical 5 copies become 1 copy operation
Copy
Output is in place with input
The In Place Algorithm
Determines when a copy needs to be made
Weighs arrays and clusters higher than other types
Does not know the size of an array or cluster
Algorithm runs during compilation, not execution
Relies on the sequential aspects of the program
Branches may require copies
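The difference between reusing a buffer and copying it can be illustrated in a text language. The Python sketch below is only a loose analogy: LabVIEW makes this decision at compile time, whereas here the programmer chooses by hand, and the function names are hypothetical.

```python
def increment_in_place(values):
    """Reuse the input buffer: no second array is allocated."""
    for i in range(len(values)):
        values[i] += 1
    return values


def increment_with_copy(values):
    """Allocate a new buffer so the caller's data stays unchanged."""
    return [v + 1 for v in values]


# No branch: the original data is not needed afterwards, so the
# increment can reuse the same buffer.
data = [1, 2, 3]
result = increment_in_place(data)
same_buffer = result is data

# Branch: another reader still needs the original values, so the
# increment has to work on a copy.
original = [1, 2, 3]
incremented = increment_with_copy(original)
```

In the first case `result` and `data` are the same object; in the second, `original` is left untouched at the cost of a second buffer.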
Bottom Up
In-place information is propagated bottom up
Branched wire
Copy because of increment
No copies required
Increments array in place
Showing Buffer Allocations
The In-Place Element Structure
Allows you to explicitly modify data in place
Example of In Place Optimization
Operate on each element of an array of waveforms
Make the First SubVI In Place
Changes into
SubVI 2 Is Made In Place
Changes into
SubVI 3 Is Made In Place
Changes into
Final Result: Dots Are Hidden
Building Arrays
There are a number of ways to build arrays and some are better than others
Bad
Reallocates array memory on every loop iteration
No compile time optimization
Building Arrays
There are a number of ways to build arrays. Try to minimize reallocations.
Best
Memory preallocated
Indexing tunnel eliminates need for copies
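The contrast between the "Bad" and "Best" approaches can be sketched in a text language. This Python example is an analogy under stated assumptions, not a translation of the LabVIEW diagrams: list concatenation stands in for building the array inside the loop, and a preallocated list stands in for the indexing tunnel. Both function names are hypothetical.

```python
def build_by_concatenation(n):
    """Grow the array one element at a time; each step makes a new list."""
    out = []
    for i in range(n):
        out = out + [i * 2]  # allocates a new list and copies the old one
    return out


def build_preallocated(n):
    """Allocate once, then write each element into its slot."""
    out = [0] * n
    for i in range(n):
        out[i] = i * 2
    return out


slow = build_by_concatenation(1000)
fast = build_preallocated(1000)
```

Both functions return the same data, but the first performs a reallocation and copy on every iteration while the second allocates exactly once.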
Demo Effects of Memory Optimization
Under LabVIEW's Hood
Memory Management
Execution System
VIs Are Compiled
[Figure: binary machine code]
VIs Are Compiled: Clumps
Clump 0
Clump 1
Clump 2
VIs Are Compiled: Clumps
Clump 0: Start of diagram: Reads controls, then schedules Clumps 1 and 2, then sleeps ...
Clump 1: Top for loop, indicator is updated, Clump 0 scheduled, sleep ...
Clump 2: Bottom for loop, indicator is updated, Clump 0 scheduled, sleep ...
Clump 0: Completion of diagram: Divide nodes, display of indicators, then VI exit
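The scheduling sequence on this slide can be mimicked with a run queue. The Python sketch below is a loose analogy and not LabVIEW's actual scheduler: each function stands in for a clump, a clump that is not on the queue is "sleeping", and the `pending` counter is a hypothetical way to model Clump 0 waiting for both loops.

```python
from collections import deque

log = []
run_queue = deque()
pending = {"finish": 2}  # the finishing clump waits on two loop clumps


def clump0_start():
    log.append("clump 0: read controls")
    run_queue.append(clump1)
    run_queue.append(clump2)
    # Clump 0 now sleeps: it is no longer on the run queue.


def loop_done():
    """Wake the finishing part of Clump 0 once both loops have run."""
    pending["finish"] -= 1
    if pending["finish"] == 0:
        run_queue.append(clump0_finish)


def clump1():
    log.append("clump 1: top loop, update indicator")
    loop_done()


def clump2():
    log.append("clump 2: bottom loop, update indicator")
    loop_done()


def clump0_finish():
    log.append("clump 0: divide, display indicators, exit")


# Run scheduled clumps until nothing is left to execute.
run_queue.append(clump0_start)
while run_queue:
    run_queue.popleft()()
```

After the loop ends, `log` holds four entries: the start of Clump 0, the two loop clumps, and the completion of Clump 0.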
Single-Threaded LabVIEW
CPU
Thread
Coroutines Code Execution
User Interface Loop
Multithreaded LabVIEW
[Figure labels: CPU; Thread (UI Loop); Threads (Exec); messages]
LabVIEW on a Multicore Machine
[Figure labels: CPU0; CPU1; Thread (UI Loop); Threads (Exec); messages]
Some Operations Require the UI Thread
Front Panel Control References
Call Library Nodes
Control/Indicator Property Nodes
Execution Properties
Reentrant VIs
Reentrancy allows one subVI to be called simultaneously from different places
Requires extra memory for each instance
Use reentrant VIs in two different cases
To allow a subVI to be called in parallel
To allow a subVI instance to maintain its own state
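The second case, per-instance state, can be illustrated in a text language. In this Python sketch, which is an analogy rather than LabVIEW behavior, the hypothetical class `RunningTotal` stands in for a subVI that keeps state between calls: one shared object models a non-reentrant subVI, and one object per call site models reentrant instances.

```python
class RunningTotal:
    """Stands in for a subVI that keeps state between calls."""

    def __init__(self):
        self.total = 0

    def call(self, value):
        self.total += value
        return self.total


# Non-reentrant analogy: every call site shares one instance and its state.
shared = RunningTotal()
shared.call(1)                   # call site A
shared_result = shared.call(10)  # call site B also sees A's contribution

# Reentrant analogy: each call site gets its own instance (extra memory),
# so state is kept separately per call site.
site_a = RunningTotal()
site_b = RunningTotal()
site_a.call(1)
independent_result = site_b.call(10)
```

With the shared instance, call site B observes the value left behind by call site A; with separate instances, each site keeps its own total at the cost of one more copy of the data.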
LabVIEW 2010 Compiler
Generates code that runs faster (~30%)
The compiler itself takes longer to run (~5x-7x)
SSE Instructions
SubVI Inliner
Register Candidates
Dead Code Elimination
Loop Invariant Code Motion
Common Subexpression Elimination
And more
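Two of the listed optimizations, loop invariant code motion and common subexpression elimination, are easy to show by hand in a text language. The Python sketch below applies them manually to a hypothetical function; it illustrates what the transformations do, not what the LabVIEW compiler actually emits.

```python
import math


def scale_naive(values, angle):
    """Recomputes the same invariant expression twice per iteration."""
    out = []
    for v in values:
        # math.cos(angle) does not change inside the loop (loop invariant)
        # and appears twice (common subexpression).
        out.append(v * math.cos(angle) + v * v * math.cos(angle))
    return out


def scale_optimized(values, angle):
    """Same result, with the invariant computed once outside the loop."""
    c = math.cos(angle)  # hoisted out of the loop and shared by both terms
    out = []
    for v in values:
        out.append(v * c + v * v * c)
    return out


values = [0.5, 1.0, 2.0]
naive = scale_naive(values, 0.3)
optimized = scale_optimized(values, 0.3)
```

Both versions produce the same numbers; the optimized one simply does less work per iteration, which is the effect an optimizing compiler aims for automatically.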
Demo Effects of Execution Optimization
Next Steps
In LabVIEW
LabVIEW Help
On the Web
ni.com/multicore
ni.com/devzone