6.5.3.2.5 Filter Blit
Filter blit performs high-quality scaling, up or down, using a FIR resampling filter with 3/5 taps. The drawing engine generates sub-pixel coordinates (locations between pixel grid positions), and its filter block uses this sub-pixel information to select the appropriate filter kernel. The GPU2DC processes one pixel per cycle when performing a filter blit.
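The kernel selection described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the number of kernel phases (`PHASE_BITS`), the tent-shaped weights, and the edge clamping are hypothetical choices made for illustration, and the names `build_tent_table` and `filter_pixel` do not come from the hardware or its driver. The actual coefficient table is programmed by software.

```python
# Illustrative model only; none of these constants are taken from the GPU2DC.
SUBPIXEL_BITS = 16   # fractional bits carried by a source coordinate
PHASE_BITS = 5       # ASSUMPTION: 32 kernel phases in the coefficient table
TAPS = 3             # 3-tap case; a 5-tap table would follow the same pattern


def build_tent_table(phases=1 << PHASE_BITS):
    """Build a hypothetical 3-tap table with linear (tent) weights per phase."""
    table = []
    for p in range(phases):
        f = p / phases  # fractional offset in [0, 1)
        # Weights for the source pixels at offsets -1, 0, +1.
        table.append((0.0, 1.0 - f, f))
    return table


def filter_pixel(row, coord_fixed, table):
    """Produce one output sample from a row of source samples.

    coord_fixed is a source position with SUBPIXEL_BITS fractional bits.
    The fractional part selects the kernel; the integer part selects
    the center source pixel.
    """
    x = coord_fixed >> SUBPIXEL_BITS
    frac = coord_fixed & ((1 << SUBPIXEL_BITS) - 1)
    phase = frac >> (SUBPIXEL_BITS - PHASE_BITS)  # top bits of the fraction
    kernel = table[phase]
    acc = 0.0
    for tap, weight in enumerate(kernel):
        # ASSUMPTION: clamp to the row edges when a tap falls outside.
        idx = min(max(x + tap - (TAPS // 2), 0), len(row) - 1)
        acc += weight * row[idx]
    return acc
```

For example, `filter_pixel([0, 100, 200], 0x18000, build_tent_table())` samples halfway between the second and third source pixels, so the half-way phase of the table is selected.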
A stretch or shrink factor in 15.16 fixed-point format is supported. Generating a single destination pixel requires 9 source pixels. An image is scaled in two passes, one for the X dimension (HOR_FILTER_BLT) and one for the Y dimension (VER_FILTER_BLT). Software sets up the filter kernel (coefficient) table and the kernel size, as well as a temporary buffer for storing intermediate results. After the first pass completes, the intermediate results are written back to memory, and the second pass then scales the first-pass image. Because of this two-step procedure, the throughput of filter blit is lower than that of stretch blit. In addition, the filter kernel table may need to be reloaded, and some cycles are consumed in calculating the stepping parameters.
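As a rough illustration of the stepping parameters, the sketch below computes a scale factor with 16 fractional bits and walks the source coordinate across one destination line. It assumes the factor is simply source size divided by destination size; the hardware or driver may define it differently (for example, with an offset at the edges), and the function names are hypothetical.

```python
FRAC_BITS = 16                 # fractional bits of the 15.16 format
FRAC_MASK = (1 << FRAC_BITS) - 1


def to_fixed_15_16(src_size, dst_size):
    """Return src_size / dst_size as a fixed-point value with 16 fraction bits.

    ASSUMPTION: factor = source size / destination size, truncated.
    """
    return (src_size << FRAC_BITS) // dst_size


def source_coords(src_size, dst_size):
    """List (integer pixel, sub-pixel fraction) for each destination pixel."""
    step = to_fixed_15_16(src_size, dst_size)
    coord = 0
    coords = []
    for _ in range(dst_size):
        coords.append((coord >> FRAC_BITS, coord & FRAC_MASK))
        coord += step  # advance by one destination pixel
    return coords
```

With this definition, shrinking 640 pixels to 320 gives a factor of `0x20000` (2.0), and stretching 320 pixels to 640 gives `0x8000` (0.5), so every second destination pixel falls halfway between two source pixels.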
When the stretch or shrink factor is 1, the filter blit works as a bit blit copy. In this case, it can be used as a format converter, for instance, a YUV-to-RGB converter. Format conversion needs only one pass (HOR_FILTER_BLT or VER_FILTER_BLT). To optimize memory bandwidth when using filter blit for YUV-to-RGB filtering, the temporary target buffer format can be specified as YUY2 for the Y-dimension filtering pass (VER_FILTER_BLT). This avoids converting YUV to A8R8G8B8 in the first, vertical pass, which reduces memory bandwidth and increases the pixel processing rate. This is the only special case in which the GPU2DC may use YUY2 as the target format.
When the stretch or shrink factor (scale ratio) is not 1:1, filter blit requires both a vertical pass and a horizontal pass for scaling. Shrink performance is less than 1 pixel per cycle for each vertical pass and for each horizontal pass. Stretch performance is close to the performance at the 1:1 scale ratio.
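The pass rules above (one pass at 1:1, two passes otherwise) can be summarized in a short planning sketch. The function `plan_passes` and its tuple layout are hypothetical. The pass order is also an assumption: the text only implies a vertical-first order for the YUY2 case, so the order is left as a parameter here.

```python
def plan_passes(src_w, src_h, dst_w, dst_h, vertical_first=True):
    """Return a list of (pass name, input size, output size) tuples.

    Sizes are (width, height). The output of the first pass is the
    temporary buffer that software must allocate.
    """
    if (src_w, src_h) == (dst_w, dst_h):
        # 1:1 ratio: a single pass is enough (either pass type works;
        # HOR_FILTER_BLT is chosen here arbitrarily).
        return [("HOR_FILTER_BLT", (src_w, src_h), (dst_w, dst_h))]

    if vertical_first:
        tmp = (src_w, dst_h)  # height scaled, width still at source size
        return [
            ("VER_FILTER_BLT", (src_w, src_h), tmp),
            ("HOR_FILTER_BLT", tmp, (dst_w, dst_h)),
        ]

    tmp = (dst_w, src_h)      # width scaled, height still at source size
    return [
        ("HOR_FILTER_BLT", (src_w, src_h), tmp),
        ("VER_FILTER_BLT", tmp, (dst_w, dst_h)),
    ]
```

For a 640x480 to 1280x720 stretch with the vertical pass first, the temporary buffer in this model is 640x720. Choosing the order that yields the smaller temporary buffer is one possible way to reduce memory traffic, though the source does not state a preferred order.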
Primitive | Peak Performance | Source/destination overlap | Clipping |
---|---|---|---|
Rectangle | 1 pixel / cycle | N/A | Done on primitive basis |
Clear | 1 pixel / cycle | N/A | Done on primitive basis |
Blit | 1 pixel / cycle | Overlap is allowed | Done on primitive basis |
Stretch blit | 1 pixel / cycle | No overlap allowed | Done on pixel basis |
Filter blit | 1 pixel / cycle | N/A | N/A |