## natural assembly
* no register dependency, no penalty
```
ld1 {v0.4s}, [x0], #16
fmla v10.4s, v16.4s, v24.s[0]
fmla v11.4s, v16.4s, v24.s[1]
fmla v12.4s, v16.4s, v24.s[2]
fmla v13.4s, v16.4s, v24.s[3]
```

## A53
* 128-bit vector load cannot be dual issued with fmla, wait 2 cycles
* 64-bit vector load cannot be dual issued with fmla, wait 1 cycle
* 64-bit integer load can be dual issued with fmla, no penalty
* pointer update can be dual issued with fmla, no penalty
* 64-bit vector load and 64-bit vector insert can be dual issued, no penalty
* no vector load can be issued on the 4th cycle of each fmla (when the fmla enters the accumulator pipeline)

### practical guide
* use 64-bit vector loads only
* issue a vector load every three fmla
* 1 cycle to load 64 bits, dual issued with the previous interleaved 64-bit insert
* load the remaining 64 bits into an integer register, dual issued with fmla
* update the pointer, dual issued with fmla
* insert the 64 bits from the integer register into the vector, dual issued with the next interleaved 64-bit load
* if there is no load to issue, add a nop every three fmla; this seems to be faster
```
ldr d0, [x0] // 1 cycle, v0 first 64bit
fmla
ldr x23, [x0, #8] // 0 cycle, v0 second 64bit to temp register
fmla
add x0, x0, #16 // 0 cycle, update pointer
fmla
ldr d1, [x0] // 1 cycle, v1 first 64bit
ins v0.d[1], x23 // 0 cycle, v0 second 64bit complete
fmla
ldr x23, [x0, #8] // 0 cycle, v1 second 64bit to temp register
fmla
add x0, x0, #16 // 0 cycle, update pointer
fmla
ins v1.d[1], x23 // 1 cycle, v1 second 64bit complete
nop
fmla
fmla
fmla
nop
nop
fmla
fmla
fmla
```
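The cycle annotations in the listing above can be tallied with a small cost model. The Python sketch below is only an illustration of the A53 rules as stated in this note, not a pipeline simulator. The names `A53_COST` and `extra_cycles` and the instruction labels (`ld128`, `ld64v`, `ld64i`) are hypothetical. The sketch assumes a lone `ins` costs 1 cycle (taken from the `ins v1.d[1], x23` annotation), ignores the "4th cycle of each fmla" restriction, and leaves out the `nop` padding.

```python
# Extra issue cycles each instruction adds on top of the fmla stream,
# following the A53 rules in this note (assumed values, see lead-in).
A53_COST = {
    "ld128": 2,  # 128-bit vector load: cannot dual issue with fmla
    "ld64v": 1,  # 64-bit vector load: cannot dual issue with fmla
    "ld64i": 0,  # 64-bit integer load: dual issues with fmla
    "add": 0,    # pointer update: dual issues with fmla
    "ins": 1,    # 64-bit insert with nothing to pair with
    "fmla": 0,   # baseline work, not counted as overhead
}


def extra_cycles(seq):
    """Sum the load-related overhead cycles of an instruction sequence."""
    total = 0
    prev = None
    for op in seq:
        if op == "ins" and prev == "ld64v":
            # The insert dual issues with the 64-bit vector load just before it.
            cost = 0
        else:
            cost = A53_COST[op]
        total += cost
        prev = op
    return total


# Two 128-bit loads done the natural way, one ld1 each.
naive = ["ld128", "fmla", "fmla", "fmla", "ld128", "fmla", "fmla", "fmla"]

# The same two loads split into 64-bit halves, in the order of the listing above.
split = [
    "ld64v", "fmla", "ld64i", "fmla", "add", "fmla",
    "ld64v", "ins", "fmla", "ld64i", "fmla", "add", "fmla",
    "ins",
]

print(extra_cycles(naive), extra_cycles(split))  # 4 3
```

Under these assumptions the split sequence costs 3 extra cycles for two vectors instead of 4. In a longer loop every `ins` except the last pairs with the next 64-bit vector load, so the overhead approaches 1 cycle per vector.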

## A55
* 128-bit vector load cannot be dual issued with fmla, wait 2 cycles
* 64-bit vector load can be dual issued with fmla, no penalty
* 64-bit integer load can be dual issued with fmla, no penalty
* pointer update can be dual issued with fmla, no penalty
* 64-bit vector insert can be dual issued with fmla, no penalty

### practical guide
* use 64-bit vector loads only
* load 64 bits, dual issued with fmla
* load the remaining 64 bits into an integer register, dual issued with fmla
* update the pointer, dual issued with fmla
* insert the 64 bits from the integer register into the vector, dual issued with fmla
* interleaving the loads loosens register dependencies
* the nop trick is not needed
```
ldr d0, [x0] // 0 cycle, v0 first 64bit
fmla
ldr x23, [x0, #8] // 0 cycle, v0 second 64bit to temp register
fmla
add x0, x0, #16 // 0 cycle, update pointer
fmla
ldr d1, [x0] // 0 cycle, v1 first 64bit
fmla
ins v0.d[1], x23 // 0 cycle, v0 second 64bit complete
fmla
ldr x23, [x0, #8] // 0 cycle, v1 second 64bit to temp register
fmla
add x0, x0, #16 // 0 cycle, update pointer
fmla
ins v1.d[1], x23 // 0 cycle, v1 second 64bit complete
fmla
```
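The interleaving in the listing above follows a regular pattern, so it can be produced mechanically. The Python sketch below is a hypothetical generator that emits the same load-side instruction order for any number of vectors. The function names `interleaved_loads` and `schedule` are assumptions made for illustration, and the register names are passed as parameters. The property it preserves is that the insert for vector `i-1` is emitted after the first-half load of vector `i` but before the temp register is overwritten.

```python
def interleaved_loads(n_vectors, ptr="x0", tmp="x23"):
    """Load-side instructions for n_vectors 128-bit loads split into 64-bit halves."""
    ops = []
    for i in range(n_vectors):
        # First half goes straight into the low 64 bits of v{i}.
        ops.append(f"ldr d{i}, [{ptr}]")
        if i > 0:
            # Finish the previous vector while tmp still holds its second half.
            ops.append(f"ins v{i - 1}.d[1], {tmp}")
        # Second half goes through an integer temp register.
        ops.append(f"ldr {tmp}, [{ptr}, #8]")
        ops.append(f"add {ptr}, {ptr}, #16")
    if n_vectors > 0:
        # The last vector has no following load to interleave with.
        ops.append(f"ins v{n_vectors - 1}.d[1], {tmp}")
    return ops


def schedule(n_vectors, ptr="x0", tmp="x23"):
    """Pair every load-side instruction with one fmla placeholder."""
    lines = []
    for op in interleaved_loads(n_vectors, ptr, tmp):
        lines.append(op)
        lines.append("fmla")
    return lines


if __name__ == "__main__":
    print("\n".join(schedule(2)))
```

Calling `schedule(2)` yields the sixteen-line pattern shown above, with `fmla` standing in for the real multiply-accumulate instructions. Operands for `fmla` are intentionally left out, as in the listing.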