3.4.2 Step 1: Basic Implementation

In this section, you will use SmartHLS to compile the Sobel filter to hardware without any modifications to the C++ code.

  1. Start the SmartHLS IDE. On Windows, double-click the SmartHLS shortcut in the Start menu or on the desktop. On Linux, ensure that $(SMARTHLS_INSTALL_DIR)/SmartHLS/bin is on your PATH, and then open the SmartHLS IDE by running the following command: shls_ide. The Eclipse Launcher dialog box is displayed with a default workspace directory, as shown in the following figure.
  2. Click OK to use the default workspace for all parts of this tutorial.
    Figure 3-3. Choosing a SmartHLS workspace.
    Important: Ensure that there are no spaces in your workspace path. Otherwise, SmartHLS displays an error message while running synthesis.
  3. Once the SmartHLS IDE opens, choose File > New > SmartHLS C/C++ Project as shown in the following figure.
    Figure 3-4. Create a New SmartHLS C/C++ Project
  4. Enter sobel_part1 as the project name as shown in the following figure. Click Next.
    Figure 3-5. Creating a New SmartHLS Project
  5. Leave the Top-Level Function field blank, as shown in the following figure, because the sobel_filter function in sobel.cpp already has a pragma that marks it as the top-level function.
  6. Click Add Files to import the source files for part 1 of this tutorial into the project. Navigate to where you downloaded the tutorial files and open the part1 directory. Hold Shift to select all three source files: input.h, output.h, and sobel.cpp. After you have added the source files to the project, click Next.
    Figure 3-6. Adding part1 source files into the new SmartHLS project.
  7. A dialog box appears where you can specify your test bench, which is not needed for this part of the tutorial. Click Next without changing any of the options.
  8. To complete the project creation, choose the FPGA device you intend to target, using the selections shown in the following figure. For FPGA Family, choose PolarFire®. For FPGA Device, you can choose MPF300TS-1FCG1152I on the MPF300 Board or use another PolarFire device that is not listed.
    Important: For this tutorial, you will use another PolarFire device, MPF100T-FCVG484I, which can be used with a Microchip Libero® free Silver license (the bigger MPF300TS-1FCG1152I device requires the paid Gold license). To use MPF100T, choose Custom Device for the FPGA Device field, then type in MPF100T-FCVG484I in the Custom Device field. Click on Finish when you are done. It may take a few moments to create the project.
    Figure 3-7. Choose an FPGA Device
  9. If this is the first time you are using SmartHLS, set up the paths to ModelSim (and to Microchip Libero for later parts of this tutorial). To do so, click SmartHLS on the top menu bar, then click Tool Path Settings. In the dialog that opens, set the paths for ModelSim Simulator and Microchip Libero SoC as shown in the following figure, and click OK.
    Figure 3-8. SmartHLS Tool Path Settings
  10. An important panel of the SmartHLS IDE is the Project Explorer on the left side of the window, as shown in the following figure. You will use the project explorer throughout this tutorial to view source files and synthesis reports. Click on the small arrow icon to expand the sobel_part1 project. Now double click any of the source files, such as sobel.cpp, and you will see the source file appear in the main panel to the right of the Project Explorer.
    Figure 3-9. Project Explorer for browsing source files and reports.
  11. After creating a project in SmartHLS, you should open one of the source files, like sobel.cpp, or double-click on the sobel_part1 directory in the Project Explorer pane. This action will set sobel_part1 as the active project. If there are multiple projects open, you must select the project in the Project Explorer pane or open a file from the project to activate it before running any SmartHLS commands.

    The steps in the SmartHLS design flow are summarized in the following list:
    1. Create the SmartHLS project and follow the standard software development process for C++ (compile/run/debug).
    2. Apply HLS constraints (like target clock period) and compile the software into hardware using SmartHLS. You can check reports about the generated hardware.
    3. Perform software/hardware co-simulation to validate the generated hardware.
    4. Synthesize the hardware to our target FPGA to report hardware resource usage and Fmax.
    Figure 3-10. SmartHLS Design Flow Steps

    At the top of the SmartHLS IDE, you will find a toolbar, shown in the figure below, that lets you run the main features of the SmartHLS tool. Hover over each icon to discover its function. The icons are listed in the table below the figure, starting from the left.

    Figure 3-11. SmartHLS Toolbar Icons
    Table 3-1. Icons
    Icon  Description
    1     Add files to a project
    Software Development Flow
    2     Compile software with GCC
    3     Run compiled software
    4     Debug software
    5     Profile software with gprof
    Hardware Development Flow
    6     Compile Software to Hardware
    7     Compile Software to Processor/Accelerator SoC
    8     Simulate Hardware
    9     Software/Hardware Co-simulation
    10    Synthesize Hardware to FPGA
    Misc
    11    Set HLS Constraints
    12    Launch Schedule Viewer
    13    Clean SmartHLS Project

    You can also run SmartHLS commands from the SmartHLS top bar menu.

    You can now examine the code in sobel.cpp. In the sobel_filter function, the first line, #pragma HLS function top, marks sobel_filter as the top-level function of the project. SmartHLS creates hardware only for the top-level function and the functions it calls. The sobel_filter function contains nested loops that process each pixel in the image. Within these loops, another set of nested loops handles the filter window at the current image location. For each non-border pixel, the 3x3 area around the pixel is convolved with Gx and Gy, and the magnitudes of the two results are summed to produce the final output pixel.
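The loop structure described above can be sketched in C++ as follows. This is a minimal sketch, not the contents of sobel.cpp: it assumes the standard Sobel coefficients for Gx and Gy, approximates the gradient magnitude as |Gx| + |Gy| saturated to 255, and writes 0 to border pixels; the real file may differ in any of these choices. The 512x512 size matches the input image described in the next paragraph. A plain GCC build ignores the SmartHLS pragma (possibly with an unknown-pragma warning).

```cpp
#include <cstdint>
#include <cstdlib>

// Image size used by the tutorial (512x512, 8-bit grayscale).
constexpr int HEIGHT = 512;
constexpr int WIDTH = 512;

// Standard Sobel kernels (assumed; check sobel.cpp for the actual values).
constexpr int GX[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
constexpr int GY[3][3] = {{1, 2, 1}, {0, 0, 0}, {-1, -2, -1}};

void sobel_filter(const std::uint8_t input[HEIGHT][WIDTH],
                  std::uint8_t output[HEIGHT][WIDTH]) {
#pragma HLS function top
    // Outer loops: visit every pixel of the image.
    for (int i = 0; i < HEIGHT; i++) {
        for (int j = 0; j < WIDTH; j++) {
            // Border pixels lack a full 3x3 window; writing 0 is an assumed policy.
            if (i == 0 || j == 0 || i == HEIGHT - 1 || j == WIDTH - 1) {
                output[i][j] = 0;
                continue;
            }
            int gx = 0;
            int gy = 0;
            // Inner loops: convolve the 3x3 window centred on (i, j) with Gx and Gy.
            for (int m = -1; m <= 1; m++) {
                for (int n = -1; n <= 1; n++) {
                    int pixel = input[i + m][j + n];
                    gx += pixel * GX[m + 1][n + 1];
                    gy += pixel * GY[m + 1][n + 1];
                }
            }
            // Sum the magnitudes and saturate to the 8-bit output range.
            int magnitude = std::abs(gx) + std::abs(gy);
            output[i][j] = static_cast<std::uint8_t>(magnitude > 255 ? 255 : magnitude);
        }
    }
}
```

Fed an image whose left half is 0 and right half is 100, this sketch outputs 255 along the vertical edge and 0 in the flat regions and on the border.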

    The main function validates the functionality of the sobel_filter function. The grayscale (8-bit) input image is stored in the 512x512 array elaine_512_input defined in input.h, while the expected output image is stored in elaine_512_golden_output defined in output.h. The main function sends the input image to the sobel_filter function and displays “PASS!” if the computed output matches the expected output.

  12. Before compiling to hardware, verify that the C++ program is correct by compiling and running the software. This is typical of HLS design: the designer first verifies that the design is functionally correct in software and only then compiles it to hardware. Click the Compile Software icon in the toolbar. This compiles the software with the GCC compiler. The output from the compilation appears at the bottom of the screen in the Console window of the IDE.
  13. Execute the compiled software by clicking the Run Software icon in the toolbar. You should see the message PASS! appear in the Console window, as shown in the following figure.
    Figure 3-12. Console after running software execution.
  14. You can now compile the Sobel filter C++ software into hardware using SmartHLS by clicking the Compile Software to Hardware icon in the toolbar. This command invokes SmartHLS to compile the top-level sobel_filter function into hardware. If the top-level function calls descendant functions, those functions are also compiled to hardware. You can find the generated Verilog code in sobel_part1.v, as shown in the following figure.
    Figure 3-13. Finding the SmartHLS-generated Verilog in the Project Explorer.
    When the compilation finishes, a SmartHLS report file (summary.hls.rpt) opens. The report shows the RTL interface of the top-level module corresponding to the top-level C++ function, the number of cycles scheduled for each basic block of the function, and the memories used in the hardware. In this example, the top-level RTL module has three interfaces: the standard Control interface, which every SmartHLS-generated circuit has, and two Memory interfaces corresponding to the input and output array arguments of the top-level sobel_filter function. The Memory Usage section of the report shows no memories inside the generated hardware, because the input and output arrays are passed into the top-level function as arguments. These input/output function arguments are listed in the I/O Memories table.
  15. You can visualize the schedule and control flow of the hardware using the SmartHLS schedule viewer. Start the schedule viewer by clicking the Launch Schedule Viewer icon in the toolbar. The Explorer pane on the left side of the schedule viewer lists the functions and the basic blocks of each function. In this example, only one function, sobel_filter, was compiled to hardware, and it contains eight basic blocks whose names are prefixed with BB_.
  16. Double-click on the sobel_filter function in the call-graph pane, and you will see the control-flow graph for the function, similar to the following figure. The names of the basic blocks in the program are prefixed with BB_. Note that the basic block names may be slightly different depending on the version of SmartHLS you use. The basic block names are not easy to relate to the original C++ code. However, you can observe two loops in the control-flow graph, which correspond to the two outermost loops in the C++ code for the sobel_filter function. The inner loop contains basic blocks: BB_for_body3, BB_for_cond14_preheader, BB_for_body3_for_inc54_crit_edge, and BB_for_inc54. Try double-clicking on BB_for_cond14_preheader (if the basic block names are different from the figure, click on the left-most basic block).
    Figure 3-14. Control-Flow Graph for the Sobel Filter.
  17. The following figure shows the schedule for BB_for_cond14_preheader, which is the main part of the innermost loop body. The middle panel shows the names of the instructions. The right-most panel shows how the instructions are scheduled into states (in the figure, states 6 to 14 are scheduled for this basic block). Hover the mouse over some of the blue boxes in the schedule: the inputs of the current instruction turn red and its outputs turn orange. Look closely at the instruction names and try to match the computations with those in the original C++ program. You will see some loads, additions, subtractions, and shifts. Close the schedule viewer (File > Exit).
    Figure 3-15. Schedule for the Inner-Most Loop.
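The operations listed in step 17 include no multiplications. Assuming sobel.cpp uses the standard Sobel coefficients (0, ±1, ±2), one likely reason is that multiplying by these constants reduces to simpler operations: multiplying by 2 is a left shift by one bit, and multiplying by -1 becomes a subtraction. The sketch below, with hypothetical function names, computes the Gx response of one 3x3 window both ways to show that the shift-and-add form gives the same result. It illustrates the idea only and does not reproduce what SmartHLS actually emits.

```cpp
#include <cstdint>

// Gx response of one 3x3 window using only adds, subtracts and shifts,
// assuming the standard kernel {{-1,0,1},{-2,0,2},{-1,0,1}}.
// win[r][c] is the neighbourhood, with the centre pixel at win[1][1].
int gx_shift_add(const std::uint8_t win[3][3]) {
    int left = win[0][0] + (win[1][0] << 1) + win[2][0];   // weights 1, 2, 1
    int right = win[0][2] + (win[1][2] << 1) + win[2][2];  // weights 1, 2, 1
    return right - left;  // the negative weights become one subtraction
}

// Reference version with explicit multiplications, for comparison.
int gx_multiply(const std::uint8_t win[3][3]) {
    static constexpr int GX[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    int sum = 0;
    for (int r = 0; r < 3; r++)
        for (int c = 0; c < 3; c++)
            sum += win[r][c] * GX[r][c];
    return sum;
}
```

For the window {{10, 20, 30}, {40, 50, 60}, {70, 80, 90}}, both functions return 80.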
  18. Simulate the Verilog RTL hardware with ModelSim to find the number of cycles needed to execute the circuit, that is, the cycle latency. If the schedule viewer is still open, close it first, then click the SW/HW Co-Simulation icon in the toolbar. SW/HW co-simulation simulates the generated Verilog module, sobel_filter_top, in RTL using ModelSim while running the rest of the program, main, in software. The co-simulation flow lets you simulate and verify the SmartHLS-generated hardware without writing a custom RTL test bench. ModelSim prints various messages about loading simulation models for the hardware in the Console window. The hardware may take a few minutes to simulate. Focus on the messages near the end of the simulation, which look like this:
    ...
    # Cycle latency: 3392549
    # ** Note: $finish : ../simulation/cosim_tb.sv(279)
    # Time: 67851010 ns Iteration: 1 Instance: /cosim_tb
    # End time: 15:39:12 on Jun 30,2021, Elapsed time: 0:00:41
    # Errors: 0, Warnings: 0
    ...
    Info: Verifying RTL simulation
    ...
    Retrieving hardware outputs from RTL simulation for sobel_filter function call 1.
    PASS!
    ...
    Number of calls: 1
    Cycle latency: 3,392,549
    SW/HW co-simulation: PASS
    The co-simulation took 3,392,549 clock cycles to finish. The message SW/HW co-simulation: PASS indicates that the RTL generated by SmartHLS matches the software model. The co-simulation flow uses the return value of the main software function to determine whether co-simulation has passed: if main returns 0, co-simulation passes, and any nonzero return value makes it fail. Make sure that your main function always follows this convention and returns 0 only if all tests of the top-level function succeed.
    In the main function of sobel_part1, also called the software test bench, the program calls the top-level function and then compares every pixel of the computed output image with the expected value. A mismatch counter is incremented for every pixel that differs, and main returns this counter, so it returns 0 only if all pixels match. Because co-simulation printed PASS (main returned 0), the generated hardware is verified to produce the expected output for this test image.
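The test-bench convention described above (count mismatching pixels, return the count from main) can be sketched as a small helper. The name count_mismatches and its template signature are hypothetical, not taken from sobel_part1; in the tutorial's test bench, main would call sobel_filter on elaine_512_input and then return the result of comparing its output with elaine_512_golden_output.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical helper: compares a computed image with the golden image pixel
// by pixel and returns the number of mismatches (0 means every pixel matched).
template <int H, int W>
int count_mismatches(const std::uint8_t (&computed)[H][W],
                     const std::uint8_t (&golden)[H][W]) {
    int mismatches = 0;
    for (int i = 0; i < H; i++) {
        for (int j = 0; j < W; j++) {
            if (computed[i][j] != golden[i][j]) {
                mismatches++;
            }
        }
    }
    // Print a summary; "PASS!" mirrors the tutorial's console message.
    if (mismatches == 0) {
        std::printf("PASS!\n");
    } else {
        std::printf("FAIL: %d mismatching pixels\n", mismatches);
    }
    return mismatches;  // main returns this value to the co-simulation flow
}
```

One design choice worth considering, which is not part of the tutorial: on POSIX systems only the low 8 bits of main's return value survive as the process exit status, so if the flow checks the exit status, a count of exactly 256 would look like success. Returning mismatches != 0 instead of the raw count avoids that edge case.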
  19. You can also run co-simulation and launch ModelSim to show the Waveforms. From the SmartHLS top menu, select SW/HW Co-Simulation with Waveforms, as shown in the following figure.
    Figure 3-16. Run SW/HW Co-Simulation with Waveforms
  20. When ModelSim opens, it prompts, “Are you sure you want to finish?” Select No. You can then view the signal waveforms as shown in the following figure. When you are finished, close ModelSim (File > Quit).
    Figure 3-17. ModelSim waveforms shown during SW/HW Co-Simulation
    Libero is Microchip’s synthesis, placement, routing, and timing analysis tool. SmartHLS can run Libero to synthesize, place, and route the Verilog onto the Microchip PolarFire FPGA to obtain information such as the resource usage and the Fmax (maximum clock frequency) of this design.
  21. Click the Synthesize Hardware to FPGA icon in the toolbar. SmartHLS automatically invokes Libero to create a Libero project and synthesize the SmartHLS design targeting the PolarFire FPGA device. Libero may take a while to finish. Once the command completes, SmartHLS opens the summary.results.rpt report file, which summarizes the resource usage and Fmax results reported by Libero after place and route. You should get results similar to those shown below. Your numbers may differ slightly depending on the versions of SmartHLS and Libero you are using; this tutorial used Libero SoC v2021.1. The timing results and resource usage might also differ depending on the random seed used in the synthesis tool flow.
    ====== 2. Timing Result ======
    +--------------+---------------+-------------+-------------+----------+-------------+
    | Clock Domain | Target Period | Target Fmax | Worst Slack | Period   | Fmax        |
    +--------------+---------------+-------------+-------------+----------+-------------+
    | clk          | 10.000 ns     | 100.000 MHz | 7.815 ns    | 2.185 ns | 457.666 MHz |
    +--------------+---------------+-------------+-------------+----------+-------------+
    The reported Fmax is for the HLS core in isolation (from Libero's post-place-and-route timing analysis). 
    When the HLS core is integrated into a larger system, the system Fmax may be lower depending on the critical path of the system.
    ====== 3. Resource Usage ======
    +--------------------------+---------------+--------+------------+
    | Resource Type            | Used          | Total  | Percentage |
    +--------------------------+---------------+--------+------------+
    | Fabric + Interface 4LUT* | 684 + 0 = 684 | 108600 | 0.63       |
    | Fabric + Interface DFF*  | 432 + 0 = 432 | 108600 | 0.40       |
    | I/O Register             | 0             | 852    | 0.00       |
    | User I/O                 | 0             | 284    | 0.00       |
    | uSRAM                    | 0             | 1008   | 0.00       |
    | LSRAM                    | 0             | 352    | 0.00       |
    | Math                     | 0             | 336    | 0.00       |
    +--------------------------+---------------+--------+------------+
    * Interface 4LUTs and DFFs are occupied due to the uses of LSRAM, Math, and uSRAM.
    Number of interface 4LUTs/DFFs = (36 * #.LSRAM) + (36 * #.Math) + (12 * #.uSRAM) = (36 * 0) + (36 * 0) + (12 * 0) = 0. 
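The footnote's formula and the Percentage column can be reproduced with a few lines of arithmetic. This sketch simply encodes the per-block constants printed in this report (36 per LSRAM, 36 per Math block, 12 per uSRAM) and assumes they apply to your device; both function names are hypothetical.

```cpp
// Hypothetical helper: interface 4LUTs (and, by the same count, interface
// DFFs) occupied by hard blocks, using the constants from the report footnote.
int interface_cells(int lsram_blocks, int math_blocks, int usram_blocks) {
    return 36 * lsram_blocks + 36 * math_blocks + 12 * usram_blocks;
}

// Hypothetical helper: utilization as a percentage of the device total,
// matching the Percentage column of the resource table.
double utilization_percent(int used, int total) {
    return 100.0 * used / total;
}
```

For this design all hard-block counts are 0, so the used totals are just the fabric counts: 684 / 108600 is about 0.63% of the 4LUTs and 432 / 108600 is about 0.40% of the DFFs, matching the report.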
    
    Wall-clock time, the product of the cycle latency and the clock period, is one of the key performance metrics for an FPGA design. In this case, the cycle latency was 3,392,549 cycles and the clock period (the Period column of the timing table) was 2.185 ns, so the wall-clock time of this implementation is 3,392,549 × 2.185 ns ≈ 7.413 ms.
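The timing and wall-clock arithmetic can be sketched the same way. Consistent with the timing table, the achieved period is the target period minus the worst slack (10.000 ns - 7.815 ns = 2.185 ns), Fmax is the reciprocal of that period, and wall-clock time is the cycle latency multiplied by the period. The function names are hypothetical.

```cpp
// Hypothetical helper: achieved clock period (ns) = target period - worst slack.
double achieved_period_ns(double target_period_ns, double worst_slack_ns) {
    return target_period_ns - worst_slack_ns;
}

// Hypothetical helper: Fmax (MHz) for a period in ns.
// 1 / (period * 1e-9 s) Hz equals 1000 / period MHz.
double fmax_mhz(double period_ns) {
    return 1000.0 / period_ns;
}

// Hypothetical helper: wall-clock time (ms) = cycle latency * clock period.
// The product is in ns; dividing by 1e6 converts ns to ms.
double wall_clock_ms(long long cycle_latency, double period_ns) {
    return static_cast<double>(cycle_latency) * period_ns / 1e6;
}
```

With the values in the timing table, this gives a 2.185 ns period, an Fmax of about 457.666 MHz, and, for the co-simulated latency of 3,392,549 cycles, a wall-clock time of about 7.413 ms.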
  22. Close the project by right-clicking the sobel_part1 folder in the Project Explorer pane and clicking Close Project.