deepin-ocr/arm-a53-a55-dual-issue.md at 718c41634f1bed5a1b8cb3e9f1b6e7fd23d36cdb

wangzhengyang 718c41634f feat: 切换后端至PaddleOCR-NCNN，切换工程为CMake

1.项目后端整体迁移至PaddleOCR-NCNN算法，已通过基本的兼容性测试
2.工程改为使用CMake组织，后续为了更好地兼容第三方库，不再提供QMake工程
3.重整权利声明文件，重整代码工程，确保最小化侵权风险

Log: 切换后端至PaddleOCR-NCNN，切换工程为CMake
Change-Id: I4d5d2c5d37505a4a24b389b1a4c5d12f17bfa38c

2022-05-10 10:22:11 +08:00

2.7 KiB

Raw Blame History

natural assembly

no register dependency, no penalty

ld1     {v0.4s}, [r0], #16
fmla    v10.4s, v16.4s, v24.s[0]
fmla    v11.4s, v16.4s, v24.s[1]
fmla    v12.4s, v16.4s, v24.s[2]
fmla    v13.4s, v16.4s, v24.s[3]

A53

128bit vector load cannot be dual issued with fmla, wait 2 cycles
64bit vector load cannot be dual issued with fmla, wait 1 cycle
64bit integer load can be dual issued with fmla, no penalty
pointer update can be dual issued with fmla, no penalty
64bit vector load and 64bit vector insert can be dual issued, no penalty
any vector load cannot be issued on the 4th cycle of each fmla (enters the accumulator pipeline)

practical guide

use 64bit vector load only
issue vector load every three fmla
1 cycle to load 64bit, dual issue with the previous interleaved 64bit insert
load the remaining 64bit into integer register, dual issue with fmla
update pointer, dual issue with fmla
insert 64bit into vector from integer register, dual issue with the next interleaved 64bit load
add nop every three fmla if no load, seems to be faster

ldr     d0, [r0] // 1 cycle, v0 first 64bit
fmla
ldr     x23, [r0, #8] // 0 cycle, v0 second 64bit to temp register
fmla
add     r0, r0, #16 // 0 cycle, update pointer
fmla
ldr     d1, [r0] // 1 cycle, v1 first 64bit
ins     v0.d[1], x23 // 0 cycle, v0 second 64bit complete
fmla
ldr     x23, [r0, #8] // 0 cycle, v1 second 64bit to temp register
fmla
add     r0, r0, #16 // 0 cycle, update pointer
fmla
ins     v1.d[1], x23 // 1 cycle, v1 second 64bit complete
nop
fmla
fmla
fmla
nop
nop
fmla
fmla
fmla

A55

128bit vector load cannot be dual issued with fmla, wait 2 cycles
64bit vector load can be dual issued with fmla, no penalty
64bit integer load can be dual issued with fmla, no penalty
pointer update can be dual issued with fmla, no penalty
64bit vector insert can be dual issued with fmla, no penalty

practical guide

use 64bit vector load only
load 64bit, dual issue with fmla
load the remaining 64bit into integer register, dual issue with fmla
update pointer, dual issue with fmla
insert 64bit into vector from integer register, dual issue with fmla
interleaved load loose register dependency
nop trick is not needed

ldr     d0, [r0] // 0 cycle, v0 first 64bit
fmla
ldr     x23, [r0, #8] // 0 cycle, v0 second 64bit to temp register
fmla
add     r0, r0, #16 // 0 cycle, update pointer
fmla
ldr     d1, [r0] // 0 cycle, v1 first 64bit
fmla
ins     v0.d[1], x23 // 0 cycle, v0 second 64bit complete
fmla
ldr     x23, [r0, #8] // 0 cycle, v1 second 64bit to temp register
fmla
add     r0, r0, #16 // 0 cycle, update pointer
fmla
ins     v1.d[1], x23 // 0 cycle, v1 second 64bit complete
fmla

2.7 KiB Raw Blame History

natural assembly

A53

practical guide

A55

practical guide

2.7 KiB

Raw Blame History