# benchmark
Benchmark the op first, so every later step has a baseline to compare against.
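
A minimal timing harness for this step might look like the sketch below. The `op_fn` signature, `bench_min_ms`, and the `count_calls` stand-in workload are hypothetical names for illustration, not ncnn APIs; the harness assumes a C11 toolchain that provides `timespec_get`.

```c
#include <time.h>

/* Hypothetical op signature: the op gets an opaque context pointer. */
typedef void (*op_fn)(void* ctx);

/* Wall-clock time in milliseconds (C11 timespec_get). */
static double now_ms(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

/* Run `op` `warmup` times untimed, then `loops` times timed.
 * Returns the fastest timed run in milliseconds. */
static double bench_min_ms(op_fn op, void* ctx, int warmup, int loops)
{
    for (int i = 0; i < warmup; i++)
        op(ctx);

    double best = 1e30;
    for (int i = 0; i < loops; i++)
    {
        double t0 = now_ms();
        op(ctx);
        double t1 = now_ms();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}

/* Stand-in workload: only counts how often it was called. */
static void count_calls(void* ctx)
{
    ++*(int*)ctx;
}
```

Taking the minimum over several runs, after a warm-up, is one common way to reduce noise from cold caches and frequency scaling; an average would be an equally valid choice.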
# naive C with openmp
Start from plain nested `for` loops, parallelized with OpenMP.
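
The outline does not name a specific op, so the sketch below assumes a depthwise 3x3 convolution (stride 1, no padding, planar float layout) as the running example. The function name and layout are assumptions for illustration. The three outer loops run over channels, output rows, and output columns; only the channel loop is parallelized.

```c
/* Assumed example op: depthwise 3x3 convolution, stride 1, no padding.
 * in:     channels x h x w
 * kernel: channels x 9
 * out:    channels x (h-2) x (w-2) */
static void conv3x3_naive(const float* in, const float* kernel, float* out,
                          int channels, int h, int w)
{
    const int outh = h - 2;
    const int outw = w - 2;

    #pragma omp parallel for
    for (int c = 0; c < channels; c++)
    {
        const float* img = in + c * h * w;
        const float* k = kernel + c * 9;
        float* o = out + c * outh * outw;

        for (int i = 0; i < outh; i++)
        {
            for (int j = 0; j < outw; j++)
            {
                /* accumulate the 3x3 window */
                float sum = 0.f;
                for (int ky = 0; ky < 3; ky++)
                    for (int kx = 0; kx < 3; kx++)
                        sum += img[(i + ky) * w + (j + kx)] * k[ky * 3 + kx];
                o[i * outw + j] = sum;
            }
        }
    }
}
```

Without `-fopenmp` (or the equivalent compiler flag) the pragma is ignored and the code runs single-threaded with the same result.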
# unroll, first try
Unroll along the height (`h`) dimension.
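
One way to read this step, still assuming the 3x3 convolution example (single channel here), is to compute two output rows per iteration so that the two shared input rows are reused. `dot3x3` and `conv3x3_unroll_h` are hypothetical names.

```c
/* One output pixel from three input rows (3x3 window starting at column j). */
static float dot3x3(const float* r0, const float* r1, const float* r2,
                    const float* k, int j)
{
    float s = 0.f;
    for (int x = 0; x < 3; x++)
        s += r0[j + x] * k[x] + r1[j + x] * k[3 + x] + r2[j + x] * k[6 + x];
    return s;
}

/* Single-channel 3x3 convolution, two output rows per iteration. */
static void conv3x3_unroll_h(const float* img, const float* k, float* out,
                             int h, int w)
{
    const int outh = h - 2;
    const int outw = w - 2;

    int i = 0;
    for (; i + 1 < outh; i += 2)
    {
        /* four input rows feed two output rows; r1 and r2 are shared */
        const float* r0 = img + i * w;
        const float* r1 = r0 + w;
        const float* r2 = r1 + w;
        const float* r3 = r2 + w;
        float* o0 = out + i * outw;
        float* o1 = o0 + outw;

        for (int j = 0; j < outw; j++)
        {
            o0[j] = dot3x3(r0, r1, r2, k, j);
            o1[j] = dot3x3(r1, r2, r3, k, j);
        }
    }

    /* leftover row when outh is odd */
    for (; i < outh; i++)
    {
        const float* r0 = img + i * w;
        for (int j = 0; j < outw; j++)
            out[i * outw + j] = dot3x3(r0, r0 + w, r0 + 2 * w, k, j);
    }
}
```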
# register allocation
Keep the kernel weights in registers.
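
In plain C, a rough equivalent is to read the nine weights into local variables once, outside the loops, so the compiler can keep them in registers instead of reloading them per pixel. This is a sketch under the same assumed 3x3 convolution example; `conv3x3_regs` is a hypothetical name.

```c
/* Single-channel 3x3 convolution with the weights hoisted into locals. */
static void conv3x3_regs(const float* img, const float* k, float* out,
                         int h, int w)
{
    const int outh = h - 2;
    const int outw = w - 2;

    /* read the nine weights once; they are constant for the whole image */
    const float k0 = k[0], k1 = k[1], k2 = k[2];
    const float k3 = k[3], k4 = k[4], k5 = k[5];
    const float k6 = k[6], k7 = k[7], k8 = k[8];

    for (int i = 0; i < outh; i++)
    {
        const float* r0 = img + i * w;
        const float* r1 = r0 + w;
        const float* r2 = r1 + w;

        for (int j = 0; j < outw; j++)
        {
            out[i * outw + j] = r0[j] * k0 + r0[j + 1] * k1 + r0[j + 2] * k2
                              + r1[j] * k3 + r1[j + 1] * k4 + r1[j + 2] * k5
                              + r2[j] * k6 + r2[j + 1] * k7 + r2[j + 2] * k8;
        }
    }
}
```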
# unroll, second try
Unroll again, this time so the inner loop maps onto SIMD lanes.
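
A scalar sketch of that shape, assuming four float lanes (one 128-bit register) and the same 3x3 convolution example: each iteration produces four adjacent outputs, and a scalar loop handles whatever is left. `conv3x3_row_unroll4` is a hypothetical name and works on one output row.

```c
/* One output row of a 3x3 convolution, four outputs per iteration.
 * r0, r1, r2 point at the three input rows, each outw + 2 floats long. */
static void conv3x3_row_unroll4(const float* r0, const float* r1,
                                const float* r2, const float* k,
                                float* out, int outw)
{
    int j = 0;
    for (; j + 3 < outw; j += 4)
    {
        /* s[lane] plays the role of one SIMD lane */
        float s[4] = {0.f, 0.f, 0.f, 0.f};
        for (int x = 0; x < 3; x++)
        {
            for (int lane = 0; lane < 4; lane++)
            {
                s[lane] += r0[j + lane + x] * k[x]
                         + r1[j + lane + x] * k[3 + x]
                         + r2[j + lane + x] * k[6 + x];
            }
        }
        for (int lane = 0; lane < 4; lane++)
            out[j + lane] = s[lane];
    }

    /* remaining outputs, one at a time */
    for (; j < outw; j++)
    {
        float s = 0.f;
        for (int x = 0; x < 3; x++)
            s += r0[j + x] * k[x] + r1[j + x] * k[3 + x] + r2[j + x] * k[6 + x];
        out[j] = s;
    }
}
```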
# neon intrinsics
Optional: express the SIMD loop with NEON intrinsics.
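
A sketch of the intrinsics version for one output row, again assuming the 3x3 convolution example. It uses `vld1q_f32`, `vmlaq_n_f32`, `vdupq_n_f32`, and `vst1q_f32` from `arm_neon.h`; on targets without NEON the guard drops the vector loop and only the scalar loop runs. `conv3x3_row_neon` is a hypothetical name.

```c
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* One output row of a 3x3 convolution.
 * r0, r1, r2 point at the three input rows, each outw + 2 floats long. */
static void conv3x3_row_neon(const float* r0, const float* r1,
                             const float* r2, const float* k,
                             float* out, int outw)
{
    int j = 0;
#if defined(__ARM_NEON)
    for (; j + 3 < outw; j += 4)
    {
        /* sum += row[j + x .. j + x + 3] * k[...], four outputs at once */
        float32x4_t sum = vdupq_n_f32(0.f);
        sum = vmlaq_n_f32(sum, vld1q_f32(r0 + j),     k[0]);
        sum = vmlaq_n_f32(sum, vld1q_f32(r0 + j + 1), k[1]);
        sum = vmlaq_n_f32(sum, vld1q_f32(r0 + j + 2), k[2]);
        sum = vmlaq_n_f32(sum, vld1q_f32(r1 + j),     k[3]);
        sum = vmlaq_n_f32(sum, vld1q_f32(r1 + j + 1), k[4]);
        sum = vmlaq_n_f32(sum, vld1q_f32(r1 + j + 2), k[5]);
        sum = vmlaq_n_f32(sum, vld1q_f32(r2 + j),     k[6]);
        sum = vmlaq_n_f32(sum, vld1q_f32(r2 + j + 1), k[7]);
        sum = vmlaq_n_f32(sum, vld1q_f32(r2 + j + 2), k[8]);
        vst1q_f32(out + j, sum);
    }
#endif
    /* scalar path: the tail on NEON targets, everything elsewhere */
    for (; j < outw; j++)
    {
        float s = 0.f;
        for (int x = 0; x < 3; x++)
            s += r0[j + x] * k[x] + r1[j + x] * k[3 + x] + r2[j + x] * k[6 + x];
        out[j] = s;
    }
}
```

The widest load in the vector loop reads up to index `j + 5`, which stays inside a row of `outw + 2` floats because the loop only runs while `j + 3 < outw`.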
# naive neon assembly with pld
Hand-written inline assembly (`asm`), with `pld` prefetch hints.
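
The sketch below shows the general shape of such a loop for 32-bit ARM, using GCC-style extended inline assembly. To keep it short it switches to a simpler, assumed op (`out[i] += in[i] * k`) rather than a full convolution. The function name `axpy_pld` is hypothetical, the `#128` prefetch distance (in bytes) is an arbitrary choice, and the assembly path has not been tuned or measured. On any other target only the scalar loop is compiled.

```c
#if defined(__ARM_NEON) && defined(__arm__)
#include <arm_neon.h>
#endif

/* out[i] += in[i] * k for i in [0, n); in and out must not overlap. */
static void axpy_pld(const float* in, float* out, float k, int n)
{
    int remain = n;
#if defined(__ARM_NEON) && defined(__arm__)
    int nn = n >> 2;   /* number of 4-float blocks */
    remain = n & 3;    /* floats left for the scalar loop */
    if (nn > 0)
    {
        float32x4_t _k = vdupq_n_f32(k);
        __asm__ volatile(
            "0:                             \n"
            "pld        [%1, #128]          \n" /* prefetch input ahead   */
            "vld1.f32   {d0-d1}, [%1]!      \n" /* q0 = in[0..3], in += 4 */
            "pld        [%2, #128]          \n" /* prefetch output ahead  */
            "vld1.f32   {d2-d3}, [%2]       \n" /* q1 = out[0..3]         */
            "vmla.f32   q1, q0, %q6         \n" /* q1 += q0 * k           */
            "subs       %0, #1              \n" /* nn--                   */
            "vst1.f32   {d2-d3}, [%2]!      \n" /* store, out += 4        */
            "bne        0b                  \n"
            : "=r"(nn), "=r"(in), "=r"(out)
            : "0"(nn), "1"(in), "2"(out), "w"(_k)
            : "cc", "memory", "q0", "q1");
    }
#endif
    /* scalar path: the tail on 32-bit ARM, everything elsewhere */
    for (; remain > 0; remain--)
    {
        *out += *in * k;
        in++;
        out++;
    }
}
```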
# pipeline optimize, first try
Use more registers, so that several loads and `mla` instructions can be in flight at once.
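
The idea can be illustrated with intrinsics, under the assumption of a simple op `out[i] += in[i] * k`: eight floats per iteration, with all four loads issued before the two independent multiply-accumulates. A compiler is free to reschedule intrinsics, so this shows the intended instruction order rather than guaranteeing it; hand-written assembly would pin it down. `axpy_wide` is a hypothetical name.

```c
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* out[i] += in[i] * k for i in [0, n); in and out must not overlap. */
static void axpy_wide(const float* in, float* out, float k, int n)
{
    int i = 0;
#if defined(__ARM_NEON)
    for (; i + 7 < n; i += 8)
    {
        /* four registers live at once: all loads first ... */
        float32x4_t a0 = vld1q_f32(in + i);
        float32x4_t a1 = vld1q_f32(in + i + 4);
        float32x4_t s0 = vld1q_f32(out + i);
        float32x4_t s1 = vld1q_f32(out + i + 4);
        /* ... then two independent multiply-accumulates */
        s0 = vmlaq_n_f32(s0, a0, k);
        s1 = vmlaq_n_f32(s1, a1, k);
        vst1q_f32(out + i, s0);
        vst1q_f32(out + i + 4, s1);
    }
#endif
    /* scalar path: the tail on NEON targets, everything elsewhere */
    for (; i < n; i++)
        out[i] += in[i] * k;
}
```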
# pipeline optimize, second try
Interleave the loads with the `mla` instructions, so load latency is hidden behind arithmetic.
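
A software-pipelined sketch of that interleaving, again assuming the simple op `out[i] += in[i] * k`: the loads for block `i + 4` are issued before the multiply-accumulate on block `i`, and the last block is drained after the loop. As with any intrinsics code, the compiler may reorder it; `axpy_pipelined` is a hypothetical name.

```c
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* out[i] += in[i] * k for i in [0, n); in and out must not overlap. */
static void axpy_pipelined(const float* in, float* out, float k, int n)
{
    int i = 0;
#if defined(__ARM_NEON)
    if (n >= 4)
    {
        /* prologue: load the first block */
        float32x4_t a = vld1q_f32(in);
        float32x4_t s = vld1q_f32(out);

        for (; i + 7 < n; i += 4)
        {
            /* start loading block i + 4 ... */
            float32x4_t a_next = vld1q_f32(in + i + 4);
            float32x4_t s_next = vld1q_f32(out + i + 4);
            /* ... while block i is multiplied and stored */
            s = vmlaq_n_f32(s, a, k);
            vst1q_f32(out + i, s);
            a = a_next;
            s = s_next;
        }

        /* epilogue: finish the block that is already loaded */
        s = vmlaq_n_f32(s, a, k);
        vst1q_f32(out + i, s);
        i += 4;
    }
#endif
    /* scalar path: the tail on NEON targets, everything elsewhere */
    for (; i < n; i++)
        out[i] += in[i] * k;
}
```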
# pipeline optimize, third try
Handle the loop tail: the elements left over after the unrolled loop.
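
A common way to split the work, sketched here for an assumed in-place scaling op, is to compute the number of full 4-element blocks (`nn`) and the leftover count (`remain`) up front, then run one loop for each. `scale_inplace` is a hypothetical name; the block loop is written in scalar C where a SIMD or assembly body would normally go.

```c
/* p[i] *= s for i in [0, size). */
static void scale_inplace(float* p, float s, int size)
{
    int nn = size >> 2;             /* full blocks of 4 */
    int remain = size - (nn << 2);  /* 0..3 leftover elements */

    /* main loop: one block of 4 per iteration */
    for (; nn > 0; nn--)
    {
        p[0] *= s;
        p[1] *= s;
        p[2] *= s;
        p[3] *= s;
        p += 4;
    }

    /* loop tail: one element per iteration */
    for (; remain > 0; remain--)
    {
        *p *= s;
        p++;
    }
}
```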
# usual practice, load/save
233
# usual practice, unroll
233
# usual practice, save register
233