* New Feature:
1. Sum_Rows:
fix cuda kernel overflow
fix block shape error when nrows too big
2. Im2Col:
Support Batch in cuda
Support f32 to f32 both in cpu && cuda
3. DepthWiseConv:
Support by Im2Col && MulMat
4. Pool_2d:
Supoort avg pooling in cuda
5. HardSigmoid:
Imp in cuda
6. HardSwish:
Imp in cuda
* fix tabs instead of spaces
* code clean
* CUDA POOL2D
* ADD POOL2D test case in test-backend-ops.cpp
* code clean
* fix pool2d_kernel
nits
* fix bug in pool2d kernel
* fix avg pooling, count_include_pad
nits
* test-backend-ops : add more pool_2d tests
* cuda : fix warnings and formatting
* ggml : check types in release builds too in pool_2d
* test-backend-ops : remove f16 pool_2d tests
* cuda : more style fixes
* Add assert in ggml_cuda_op_pool2d
* pool2d float padding fallback
* test-backend-ops : add dst_type to im2col
---------
Co-authored-by: slaren <slarengh@gmail.com>
* MobileVLM native implementation
* delete depthwise_conv_2d and permute_cpy relative code, replace the two by the existed functions, and opt ldp definition, support LLAMA_PERF option for CMake
* move android script to example/llava directory
* Fix the editor config checks
---------
Co-authored-by: Chenxiaotao03 <chenxiaotao03@meituan.com>