Abstract: In intensive sheep farms, behavioral changes can indicate abnormalities in a sheep's physical condition. For example, rumination and feeding times change significantly when sheep are sick, so behavioral observation is one way to assess their health. Recognizing animal behavior provides a basis for disease prevention and rational feeding, thereby improving animal health and welfare, and it has therefore long been a focus for researchers and production managers. Traditional manual observation requires continuous human monitoring, and fatigue from long working hours tends to introduce subjective errors into the results. In addition, sensor-based methods that require direct contact with the animal's body tend to stress the animal, affecting its health and production performance. In this study, a deep learning model, AdRes3D-BiLSTM, was proposed that combines a three-dimensional residual convolutional neural network, a bidirectional long short-term memory (BiLSTM) network, and an attention mechanism. The AdRes3D component introduced depthwise separable convolution, which reduces computational complexity and improves network efficiency. Furthermore, an actionnet attention mechanism based on motion principles was embedded in the AdRes3D component, directing the network's focus toward behavioral details. This improved the model's ability to extract key behavioral points across consecutive video frames and strengthened feature extraction in both the temporal and spatial dimensions. The extracted feature vectors were then fed into the BiLSTM module, which filters and updates temporal features bidirectionally, and the final sheep behaviors were recognized. A dataset of 6000 videos was collected to train the proposed model, covering different sheep, time periods, lighting conditions, and poses. An additional 1200 behavioral videos, distinct from those used for training, were selected as the test set. The experimental results demonstrated the effectiveness of the AdRes3D-BiLSTM model, which achieved an overall recognition accuracy of 98.72% across five basic sheep behaviors: standing, lying, feeding, walking, and ruminating. Compared with five alternative network architectures (C3D, R(2+1)D, Res3D, Res3D-LSTM, and Res3D-BiLSTM), the AdRes3D-BiLSTM model achieved notable improvements in recognition metrics. Specifically, relative to these models, AdRes3D-BiLSTM improved precision by 11.32, 6.24, 4.34, 2.04, and 1.52 percentage points, respectively; recall by 11.78, 6.38, 4.38, 2.12, and 1.68 percentage points; F1-score by 11.70, 6.35, 4.38, 2.08, and 1.60 percentage points; and accuracy by 11.97, 6.33, 4.37, 2.32, and 2.01 percentage points.
Furthermore, the proposed method achieved a processing speed of 52.79 frames per second (FPS), confirming its real-time processing capability and meeting practical operational requirements. Additionally, a continuous 24-hour video segment was randomly selected from the collected videos to validate the model's effectiveness in a real-world environment. This study provides new methods and insights for video-based animal behavior recognition and offers a basis for further exploration and implementation in the field.
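To make the described pipeline concrete, the following is a minimal PyTorch sketch of one possible AdRes3D-BiLSTM-style arrangement: a 3D residual backbone built from depthwise separable convolutions, a simplified motion-based attention used here as a stand-in for the actionnet mechanism, and a BiLSTM over per-frame features. All layer sizes, the attention formulation, and the class names are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: shapes, layer widths, and the attention design
# below are assumptions for demonstration, not the published architecture.
import torch
import torch.nn as nn


class DepthwiseSeparableConv3d(nn.Module):
    """3D depthwise separable convolution: per-channel spatio-temporal
    filtering followed by a 1x1x1 pointwise projection."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.depthwise = nn.Conv3d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv3d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm3d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))


class MotionAttention(nn.Module):
    """Simplified motion attention (assumed stand-in for the actionnet idea):
    channels are re-weighted by the magnitude of frame-to-frame change."""
    def __init__(self, channels):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid())

    def forward(self, x):                        # x: (B, C, T, H, W)
        diff = x[:, :, 1:] - x[:, :, :-1]        # temporal differences
        motion = diff.abs().mean(dim=(2, 3, 4))  # (B, C) motion energy
        weights = self.fc(motion)[:, :, None, None, None]
        return x * weights                       # re-weight feature channels


class ResidualBlock3d(nn.Module):
    """Residual block built from depthwise separable 3D convolutions."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = DepthwiseSeparableConv3d(in_ch, out_ch, stride)
        self.conv2 = DepthwiseSeparableConv3d(out_ch, out_ch)
        self.skip = (nn.Conv3d(in_ch, out_ch, 1, stride=stride)
                     if (in_ch != out_ch or stride != 1) else nn.Identity())

    def forward(self, x):
        return self.conv2(self.conv1(x)) + self.skip(x)


class AdRes3DBiLSTMSketch(nn.Module):
    """Backbone extracts spatio-temporal features from a clip; a BiLSTM then
    models the remaining temporal sequence before classification."""
    def __init__(self, num_classes=5, hidden=128):
        super().__init__()
        self.stem = DepthwiseSeparableConv3d(3, 32)
        self.layer1 = ResidualBlock3d(32, 64, stride=2)
        self.attn = MotionAttention(64)
        self.layer2 = ResidualBlock3d(64, 128, stride=2)
        self.pool = nn.AdaptiveAvgPool3d((None, 1, 1))   # keep the time axis
        self.bilstm = nn.LSTM(128, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, clip):                     # clip: (B, 3, T, H, W)
        feat = self.layer2(self.attn(self.layer1(self.stem(clip))))
        feat = self.pool(feat).squeeze(-1).squeeze(-1)   # (B, C, T')
        seq, _ = self.bilstm(feat.transpose(1, 2))       # (B, T', 2*hidden)
        return self.head(seq.mean(dim=1))                # clip-level logits


if __name__ == "__main__":
    model = AdRes3DBiLSTMSketch()
    dummy = torch.randn(2, 3, 16, 112, 112)      # 2 clips of 16 RGB frames
    print(model(dummy).shape)                    # torch.Size([2, 5])
```

In this sketch, the five output logits correspond to the five behavior classes (standing, lying, feeding, walking, and ruminating); the depthwise separable convolutions keep the 3D backbone lightweight, and the BiLSTM reads the pooled per-frame features in both directions before the final classification.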