To address the subjectivity and limitations of feature extraction in small-target detection for UAV aerial imagery in complex scenes, this paper proposes three improvement strategies: (1) to improve detection of targets at different scales, a multi-head self-attention (MHSA) mechanism is integrated into the last layer of the YOLOv5s backbone network; (2) to make better use of feature information, a bidirectional feature pyramid network (BiFPN) is constructed for feature fusion; (3) the SimAM module is incorporated into the YOLOv5s model to improve the matching of semantic and positional information. Combining these strategies in pairs yields three multi-strategy YOLOv5s detection models: the first combines MHSA with the BiFPN feature fusion network; the second combines MHSA with the SimAM attention mechanism; the third combines the SimAM attention mechanism with the BiFPN feature fusion network. Comparative experiments on the VisDrone2019 dataset show that the second multi-strategy model outperforms the other two, raising the mean average precision (mAP) to 38.9%, which is 4.8% higher than the original model.
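For readers unfamiliar with SimAM, the following is a minimal sketch of the parameter-free weighting it is commonly described as computing, written in plain Python for a single feature-map channel. It is not the implementation used in this paper: the function name `simam_channel`, the regularizer name `e_lambda`, and its default value are illustrative assumptions, and a real model would apply the same arithmetic to batched tensors inside the network.

```python
import math


def simam_channel(x, e_lambda=1e-4):
    """SimAM-style reweighting of one channel, given as a 2D list of floats.

    Assumes the channel holds at least two activations.
    `e_lambda` is an assumed regularization constant, not a value from the paper.
    """
    h, w = len(x), len(x[0])
    n = h * w - 1  # number of other neurons in the channel
    mean = sum(sum(row) for row in x) / (h * w)
    # Squared deviation of every activation from the channel mean.
    d2 = [[(v - mean) ** 2 for v in row] for row in x]
    var = sum(sum(row) for row in d2) / n
    out = []
    for row, drow in zip(x, d2):
        out_row = []
        for v, d in zip(row, drow):
            # Inverse energy: larger for activations that stand out from the rest.
            e_inv = d / (4.0 * (var + e_lambda)) + 0.5
            # Scale the activation by sigmoid(e_inv).
            out_row.append(v / (1.0 + math.exp(-e_inv)))
        out.append(out_row)
    return out
```

Calling `simam_channel([[1.0, 1.0], [1.0, 5.0]])` returns a map of the same shape in which the outlying activation is scaled by a larger factor than the three uniform ones. Because the weights come from channel statistics alone, a module of this kind adds no learnable parameters.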