Human Activity Recognition System
Sequence modeling system using 2D pose estimation and LSTM networks for efficient activity classification.

Problem
Traditional activity recognition systems rely on raw RGB video or 3D pose data, which are computationally expensive and require large datasets. This makes them unsuitable for real-time or resource-constrained environments.
Solution
- Extracted 2D pose keypoints with OpenPose to reduce input dimensionality
- Built temporal sequences with a sliding-window approach (32 frames per window)
- Applied stacked LSTM networks for sequence classification
- Tuned the model architecture to balance accuracy and computational cost
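
The windowing step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the 32-frame window comes from the description above, while the stride of 16 frames, the function name `make_windows`, and the per-frame layout (a flat list of keypoint coordinates) are assumptions made for the example.

```python
from typing import List, Sequence

WINDOW_SIZE = 32  # frames per window, as stated in the write-up
STRIDE = 16       # assumption: 50% overlap between consecutive windows


def make_windows(
    frames: Sequence[Sequence[float]],
    window_size: int = WINDOW_SIZE,
    stride: int = STRIDE,
) -> List[List[Sequence[float]]]:
    """Split a per-frame keypoint sequence into fixed-length windows.

    Each element of `frames` is assumed to be one frame's flattened
    (x, y) keypoint coordinates. Trailing frames that do not fill a
    complete window are dropped.
    """
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")

    windows = []
    # The last valid start index is len(frames) - window_size.
    for start in range(0, len(frames) - window_size + 1, stride):
        windows.append(list(frames[start:start + window_size]))
    return windows
```

With these assumed defaults, a 64-frame clip yields three windows starting at frames 0, 16, and 32, and a clip shorter than 32 frames yields none. A smaller stride gives more training samples per clip at the cost of more redundancy between them.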
Tech Stack
TensorFlow, LSTM, OpenPose, Flask
Architecture
Video Input → OpenPose Keypoint Extraction → Sequence Windowing → LSTM (2-layer) → Softmax Classification → Activity Output
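
The classification half of this pipeline (2-layer LSTM followed by softmax) might look something like the Keras sketch below. Only the 32-frame window and the two stacked LSTM layers come from the description; the feature count (18 keypoints × 2 coordinates), the hidden size, the number of activity classes, and the optimizer and loss are placeholder assumptions, not the values used in the project.

```python
import tensorflow as tf

WINDOW_SIZE = 32    # frames per window, from the write-up
NUM_FEATURES = 36   # assumption: 18 keypoints x (x, y) per frame
HIDDEN_UNITS = 64   # assumption: hidden size is not given in the write-up
NUM_CLASSES = 6     # assumption: number of activity labels


def build_model() -> tf.keras.Model:
    """Two stacked LSTM layers followed by a softmax classifier."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(WINDOW_SIZE, NUM_FEATURES)),
        # First LSTM returns the full sequence so the second can consume it.
        tf.keras.layers.LSTM(HIDDEN_UNITS, return_sequences=True),
        # Second LSTM returns only its final hidden state.
        tf.keras.layers.LSTM(HIDDEN_UNITS),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    # Assumed training setup: integer class labels with Adam.
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Calling `build_model()` gives a model that maps a batch of shape `(batch, 32, 36)` to per-class probabilities of shape `(batch, 6)` under these assumptions.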
Challenges
Maintaining high accuracy with a reduced feature representation was challenging because 2D pose keypoints lose some of the spatial information present in the original video. Handling temporal dependencies also required careful sequence design and tuning of the LSTM parameters to avoid overfitting.
What I’d Improve Next
- Upgrade to Transformer-based sequence models for better temporal learning
- Enable real-time streaming inference using optimized pipelines
- Extend to multi-person and multi-activity detection scenarios