Real-time activity and fall detection using transformer-based deep learning models for elderly care applications

Abstract

Objective This study aims to develop a transformer-based deep learning model for real-time activity recognition and fall detection, addressing the limitations of existing methods in terms of accuracy and real-time applicability.

Methods The proposed system uses a sliding window segmentation technique to process wearable sensor data, including accelerometer, gyroscope and orientation signals. The transformer encoder models temporal dependencies through a self-attention mechanism, enabling the extraction of global and local temporal patterns. The performance of the model is evaluated on an updated version of the MobiAct data set, which includes over 14 million sensor records collected from 66 participants and covers 16 activities, including four types of falls and multiple scenario-based activities of daily living.

Results The transformer model achieved an accuracy of over 98% and demonstrated high precision and recall for difficult fall categories such as forward-lying and sideward-lying. Comparative analysis shows that the transformer outperforms convolutional neural network-long short-term memory (CNN-LSTM) and temporal convolutional network models in terms of classification metrics, confusion matrix results and training stability.

Discussion The results highlight the effectiveness of the transformer model in capturing complex temporal dependencies, addressing key challenges such as misclassification and false positives. Compared with traditional models, its parallel processing capabilities improve real-time deployment efficiency.

Conclusion This research establishes transformer-based models as powerful solutions for activity recognition and fall detection, providing reliable applications for elderly care and fall prevention. Future work will focus on optimising the model for edge devices and validating it on real-world data sets.

Introduction

Falls are a leading cause of injury and hospitalisation among older adults worldwide, resulting in millions of emergency room visits each year.1 As the global population ages, there is an increasing need for reliable systems to monitor daily activities and detect falls in real time.2 Accurate fall detection can mitigate injuries through rapid medical intervention, while activity recognition can provide valuable insights into behaviour and overall health.3 Advances in wearable sensor technology, including accelerometers, gyroscopes and orientation sensors, now enable continuous, high-frequency monitoring of body movement.4 These sensors, often integrated into wearable devices, make it possible to classify activities of daily living (ADL) such as walking (WAL), sitting and standing (STD), and to detect falls such as forward-lying or sideward-lying falls. However, existing systems face challenges in achieving high accuracy for underrepresented fall categories and meeting the computational demands of real-time processing.5–7

Human activity recognition (HAR) has attracted widespread research interest due to its applications in health monitoring, smart cities and elderly care. Many methods have been explored to improve HAR performance, including supervised learning methods such as random forests and k-nearest neighbours and hybrid models such as convolutional neural network-long short-term memory (CNN-LSTM), which extract spatial and temporal features of sensor data.4 8 9 Privacy-preserving methods, such as radar-based HAR using micro-Doppler signatures, have also attracted attention due to their contactless and non-intrusive nature.10 Ensemble techniques, such as combining decision trees with AdaBoost, have further improved recognition accuracy on benchmark data sets.6 Emerging deep learning architectures, including hybrid CNN-LSTM and transformer-based models, show promise in capturing complex spatial and temporal dependencies in activity data, making them well suited to HAR tasks.9 11 However, the overlap between the signals of daily activities and those of fall events often leads to misclassification, requiring more robust methods.12

In fall detection, wearable systems and Internet-of-Things (IoT)-based technologies have played a central role in recent advances. Accelerometer and gyroscope data are used with techniques such as random forests and Kalman filters to detect falls in real time, providing critical time for interventions.12 13 Context-aware systems, such as smart pads, use sensor data for real-time monitoring with high sensitivity and specificity.14 IoT-enabled architectures that combine edge and cloud computing have demonstrated their scalability and efficiency, using decision trees to transmit and process data efficiently in large-scale deployments.7 Despite these advances, current systems still suffer from high false positive rates and computational inefficiency, especially when dealing with complex falls in real-world scenarios.6 15

To address these limitations, we propose a transformer-based deep learning model designed for real-time activity recognition and fall detection. Leveraging the transformer’s self-attention mechanism, the model captures both local and global temporal dependencies in wearable sensor data. By employing techniques such as sliding window segmentation, majority voting and predictive smoothing, the model improves classification stability and reduces false positives. An in-depth evaluation on the MobiAct data set demonstrates the state-of-the-art performance of the system, particularly in detecting difficult fall categories such as forward-lying (FOL) and sideward-lying (SDL). By comparing the transformer model with traditional models such as CNN-LSTM and temporal convolutional network (TCN), we highlight its robustness and potential for practical applications in elderly care.7 9 11
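To make the smoothing step concrete, the following Python sketch applies a trailing majority vote to per-window predictions. It is only an illustration under stated assumptions: the function name smooth_predictions, the vote length k=5 and the example label sequence are hypothetical and are not taken from the study.

```python
from collections import Counter, deque


def smooth_predictions(labels, k=5):
    """Replace each per-window label by the majority label of the last k windows.

    k is an assumed vote length, not a value reported in the study.
    """
    recent = deque(maxlen=k)  # holds at most the k most recent labels
    smoothed = []
    for label in labels:
        recent.append(label)
        # most_common(1) returns the most frequent label in the buffer;
        # equal counts resolve to the label encountered first.
        smoothed.append(Counter(recent).most_common(1)[0][0])
    return smoothed


# Hypothetical raw predictions: one isolated FOL spike, then a sustained FOL run.
raw = ["WAL", "WAL", "FOL", "WAL", "WAL", "FOL", "FOL", "FOL", "FOL"]
smoothed = smooth_predictions(raw, k=5)
print(smoothed)
```

In this example the isolated FOL prediction at the third window is suppressed, while the sustained FOL run that begins at the sixth window is reported from the seventh window onwards. Voting therefore trades a short delay for fewer false positives.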

Discussion

The results of this study demonstrate the potential of transformer-based deep learning models for accurate and efficient real-time activity recognition and fall detection. By leveraging the temporal processing capabilities of the self-attention mechanism, the proposed model outperforms traditional methods such as CNN-LSTM and TCN in several key areas such as classification accuracy, handling of imbalanced categories and real-time performance. However, the results also highlight challenges and opportunities for further improvements.

Performance of transformer models

The transformer model achieves an overall classification accuracy of over 98%, with particularly high precision and recall for ADL. This performance is attributed to the model’s ability to capture local and global temporal dependencies, allowing it to distinguish subtle patterns in sensor data. For example, the model effectively differentiates between fall activities such as FOL and front-knees-lying (FKL), which exhibit different signal patterns. The use of global average pooling in the architecture further reduces complexity and helps the system run smoothly in a real-time environment. The choice of a window size of 100 samples (equivalent to 2 s at 50 Hz) is supported by previous HAR literature and by our own analysis (online supplemental figure S3), which shows that this window size achieves a high F1-score while striking a good balance between temporal context and computational cost.17
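As a rough illustration of the segmentation step, the following Python sketch splits a sensor stream into windows of 100 samples (2 s at 50 Hz). The stride of 50 samples (50% overlap), the function name sliding_windows and the synthetic stream are assumptions made for illustration; the stride used in the study is not stated in this text.

```python
SAMPLING_RATE_HZ = 50  # sampling rate stated in the text
WINDOW_SIZE = 100      # window size stated in the text (2 s at 50 Hz)
STRIDE = 50            # assumed 50% overlap, not taken from the study


def sliding_windows(samples, window_size=WINDOW_SIZE, stride=STRIDE):
    """Yield fixed-length windows; a trailing partial window is dropped."""
    for start in range(0, len(samples) - window_size + 1, stride):
        yield samples[start:start + window_size]


# Synthetic 10 s stream of 3-axis accelerometer samples (placeholder values).
stream = [(0.0, 0.0, 9.81)] * (10 * SAMPLING_RATE_HZ)
windows = list(sliding_windows(stream))

# Number of windows and the duration of each window in seconds.
print(len(windows), len(windows[0]) / SAMPLING_RATE_HZ)  # prints: 9 2.0
```

With these assumed settings a new window becomes available every second, so a larger stride lowers the computational load at the cost of coarser time resolution.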

Model comparison

Comparative analysis highlights the advantages of the transformer model over CNN-LSTM and TCN. Although CNN-LSTM performs well on ADL, its sequential nature limits its effectiveness in real-time applications. TCN, for its part, is good at capturing long-term dependencies but struggles with complex transitions between fall activities and ADL. The transformer model addresses these limitations by combining parallel processing with a powerful self-attention mechanism.
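The core operation can be sketched in a few lines of Python. The example below is a simplified, single-head scaled dot-product self-attention followed by global average pooling. It assumes identity query, key and value projections (no learned weights) and hypothetical function names, so it illustrates the mechanism rather than the architecture used in the study.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Single-head scaled dot-product self-attention.

    x is a list of T feature vectors of dimension d. Queries, keys and
    values are the inputs themselves (assumed identity projections).
    """
    d = len(x[0])
    output = []
    for query in x:
        # Every time step is scored against every other time step.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in x]
        weights = softmax(scores)
        output.append([sum(w * value[j] for w, value in zip(weights, x)) for j in range(d)])
    return output


def global_average_pool(x):
    """Average the T output vectors into one fixed-size vector."""
    return [sum(vector[j] for vector in x) / len(x) for j in range(len(x[0]))]


window = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy window: T=3, d=2
attended = self_attention(window)
pooled = global_average_pool(attended)
print(pooled)
```

Each query row depends only on the input window and not on the result for the previous time step, which is why deep learning frameworks can compute all rows at once as matrix products. A recurrent model such as an LSTM must instead process the time steps in order.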

Challenges with under-represented fall classes

Despite its strong performance on the MobiAct data set, the transformer model faced challenges in some under-represented fall classes, such as CSI and CSO. One potential reason for the misclassification between CSI, CSO and SDL is the limited discriminative power of signals captured by a single sensor (eg, at the waist or in a pocket). These activities involve similar vertical and transitional movements, which may be indistinguishable in torso-based inertial measurement unit (IMU) data. This issue highlights the need for additional strategies to increase model sensitivity to subtle differences between these activities. Techniques such as class-weighted loss functions and synthetic data augmentation can be used during training to address it.18 Although these methods improve performance, further gains may come from additional sensor placements: sensors on the feet or legs can capture gait dynamics and height changes, while sensors on the pelvis or hips can better reflect body transitions. Combining data from these positions could improve the model’s ability to distinguish these subtle activities.
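One common form of class weighting is sketched below in Python. It uses inverse-frequency weights, n_samples / (n_classes × class_count), the same heuristic as the "balanced" option in scikit-learn, applied to a cross-entropy loss. The label counts and predicted probabilities are synthetic, and the weighting scheme is an assumption, since the study does not specify which scheme was used.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def weighted_cross_entropy(probabilities, true_label, weights):
    """Cross-entropy for one sample, scaled by the weight of its true class."""
    return -weights[true_label] * math.log(probabilities[true_label])


# Synthetic, deliberately imbalanced label set (not MobiAct counts).
train_labels = ["WAL"] * 90 + ["CSI"] * 10
weights = inverse_frequency_weights(train_labels)

# The same uncertain prediction costs more when the true class is rare.
prediction = {"WAL": 0.5, "CSI": 0.5}
loss_common = weighted_cross_entropy(prediction, "WAL", weights)
loss_rare = weighted_cross_entropy(prediction, "CSI", weights)
print(weights, loss_rare / loss_common)
```

In this toy setting an error on the rare class is penalised nine times more heavily than the same error on the common class, which pushes training towards the under-represented activities.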

Applications in elderly care

The high accuracy and real-time capabilities of the transformer-based system make it a promising solution for deployment in wearable devices for elderly care. By continuously monitoring activity and detecting falls with minimal false alarms, the system can improve the safety and quality of life of older adults. For example, it can provide caregivers with timely alerts and detailed activity reports so that they can intervene faster in the event of a fall. However, practical deployment of such systems requires consideration of other factors such as energy efficiency, model optimisation for edge devices and user adaptability.19 A lightweight transformer architecture and hardware-specific optimisations such as quantisation and pruning can make the system more suitable for portable applications.
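To indicate what quantisation involves, the following Python sketch applies symmetric 8-bit quantisation to a small list of weights. The function names and weight values are hypothetical, and this per-tensor scheme is only one of several options; the study does not describe a specific quantisation method.

```python
def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in quantized]


# Hypothetical weights from one layer.
layer_weights = [0.5, -1.27, 0.003, 1.0]
q_weights, scale = quantize_int8(layer_weights)
restored = dequantize(q_weights, scale)

# Largest reconstruction error; rounding keeps it within about half a step.
max_error = max(abs(w - r) for w, r in zip(layer_weights, restored))
print(q_weights, scale, max_error)
```

Storing each weight in one byte instead of four reduces the memory footprint roughly fourfold, at the cost of a bounded rounding error per weight. Whether accuracy is preserved has to be checked empirically for each model.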

Limitations and future directions

Although the proposed system exhibits strong performance, some limitations remain. The current evaluation is based on the MobiAct data set, which, although comprehensive (66 participants and over 14 million sensor records), still relies on simulated falls and controlled experimental settings. As such, further testing is needed to confirm the model’s robustness under real-world conditions and across different populations, including older adults. Furthermore, although postprocessing techniques improve prediction stability, they introduce a slight delay that may be critical for time-sensitive applications such as fall prevention. The system’s current focus on identifying predefined activities and falls also limits its applicability; expanding the activity repertoire or enabling unsupervised detection of novel activities could address this issue. Future work could explore modality-specific ensembles or attention heads to better capture the different dynamics of each sensor type. Future work will also focus on evaluating the model on other data sets, such as SisFall, which contains a wider range of fall scenarios and sensor locations. This will allow us to evaluate the generalisability of the model to different populations and environments.20 In addition, integrating reinforcement learning techniques could enable the model to adapt dynamically to new activities and changing environments, improving its overall flexibility and real-world applicability.21

Conclusion

This study demonstrates the potential of transformer-based deep learning models for real-time activity recognition and fall detection using wearable sensor data. By leveraging the self-attention mechanism, the proposed model effectively captures complex temporal patterns, thereby achieving state-of-the-art accuracy on the MobiAct data set. The model’s overall classification accuracy exceeds 98%, outperforming traditional architectures such as CNN-LSTM and TCN, especially in differentiating difficult fall categories from ADL. The system’s applicability to elderly care is particularly promising. By continuously monitoring daily activities and providing timely fall alerts, the system can significantly improve the safety and quality of life of older adults. Its real-time functionality makes it suitable for deployment in wearable devices, allowing caregivers to respond quickly to emergencies and track activity patterns for long-term health monitoring.
