Human Activity Recognition System

Sequence modeling system using 2D pose estimation and LSTM networks for efficient activity classification.

Problem

Traditional activity recognition systems rely on raw RGB video or 3D pose data, which are computationally expensive and require large datasets. This makes them unsuitable for real-time or resource-constrained environments.

Solution

• Extracted 2D pose keypoints with OpenPose to reduce input dimensionality
• Constructed temporal sequences using a sliding-window approach (32 frames per window)
• Applied stacked LSTM networks for sequence classification
• Optimized the model architecture to balance accuracy and computational efficiency
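
The windowing step can be illustrated with a short sketch. It assumes each frame has already been reduced to a flat list of keypoint coordinates. The 36-value frame size used in the usage note (18 keypoints, x and y each) and the stride of 16 frames are illustrative assumptions, not values taken from the project; only the 32-frame window length comes from the description above. The helper name make_windows is hypothetical.

```python
from typing import List, Sequence

WINDOW_SIZE = 32  # frames per window, as stated in the write-up
STRIDE = 16       # assumption: 50% overlap between consecutive windows

Frame = List[float]  # one flattened pose vector, e.g. [x0, y0, x1, y1, ...]


def make_windows(
    frames: Sequence[Frame],
    window_size: int = WINDOW_SIZE,
    stride: int = STRIDE,
) -> List[List[Frame]]:
    """Split a per-frame keypoint sequence into fixed-length windows.

    Trailing frames that do not fill a complete window are dropped.
    """
    windows = []
    # The last valid start index is len(frames) - window_size.
    for start in range(0, len(frames) - window_size + 1, stride):
        windows.append([list(f) for f in frames[start:start + window_size]])
    return windows
```

For example, an 80-frame clip with 36 values per frame yields windows starting at frames 0, 16, 32, and 48, each of shape 32 x 36. A smaller stride produces more overlapping training samples at the cost of more redundancy between them.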

Tech Stack

TensorFlow, LSTM, OpenPose, Flask

Architecture

Video Input → OpenPose Keypoint Extraction → Sequence Windowing → LSTM (2-layer) → Softmax Classification → Activity Output
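
The classification stage of this pipeline could look roughly like the following Keras sketch. It is not the project's actual model: the hidden size (64 units), the number of features per frame (36), the number of activity classes (6), and the dropout layers are all assumptions chosen for illustration. Only the two LSTM layers, the 32-frame window, and the softmax output come from the description above.

```python
import tensorflow as tf

WINDOW_SIZE = 32   # frames per window (from the write-up)
NUM_FEATURES = 36  # assumption: 18 keypoints x (x, y)
NUM_CLASSES = 6    # assumption: number of activity labels


def build_model(
    window_size: int = WINDOW_SIZE,
    num_features: int = NUM_FEATURES,
    num_classes: int = NUM_CLASSES,
    units: int = 64,       # assumption: hidden size of each LSTM layer
    dropout: float = 0.3,  # assumption: dropout added to limit overfitting
) -> tf.keras.Model:
    """Two stacked LSTM layers followed by a softmax classifier."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(window_size, num_features)),
        # First LSTM returns the full sequence so the second can consume it.
        tf.keras.layers.LSTM(units, return_sequences=True),
        tf.keras.layers.Dropout(dropout),
        # Second LSTM returns only its final hidden state.
        tf.keras.layers.LSTM(units),
        tf.keras.layers.Dropout(dropout),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # expects integer class labels
        metrics=["accuracy"],
    )
    return model
```

Calling build_model() and passing a batch of shape (batch, 32, 36) returns one probability distribution over the activity classes per window. Because the input is a few dozen coordinates per frame rather than full images, a model like this stays small, which is the efficiency trade-off the project targets.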

Challenges

Maintaining high accuracy with a reduced feature representation was challenging because 2D pose keypoints discard depth and appearance information. Capturing temporal dependencies also required careful sequence design and tuning of the LSTM hyperparameters to avoid overfitting.

What I’d Improve Next

• Upgrade to Transformer-based sequence models for better temporal learning
• Enable real-time streaming inference using optimized pipelines
• Extend to multi-person and multi-activity detection scenarios