Ramon Figueiredo
Bachelor's degree and Master's degree in Computer Science
Ph.D. in Software and IT Engineering
DiveKineNet: End-to-End Spatio-Temporal Transformer for Marker-less Diving Kinematic Analysis
Overview
DiveKineNet is a unified end-to-end spatio-temporal transformer architecture for automated marker-less kinematic analysis of diving from monocular video. The system performs four tasks within a single differentiable framework: diver detection, 2D pose estimation (17 keypoints), center of mass tracking, and kinematic event classification.
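As a minimal illustration of this single-network, four-head design, the PyTorch sketch below shares one encoder across all task heads. The encoder stand-in, token layout, and head dimensions are assumptions for readability, not the released implementation:

```python
import torch
import torch.nn as nn

class DiveKineNetSketch(nn.Module):
    """Illustrative four-head multi-task layout (not the released model)."""

    def __init__(self, d_model=256, num_keypoints=17, num_events=7):
        super().__init__()
        # Stand-in for the Swin backbone + spatio-temporal transformer.
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=4,
        )
        self.detect_head = nn.Linear(d_model, 4)                # bbox (x, y, w, h)
        self.pose_head = nn.Linear(d_model, num_keypoints * 2)  # 17 keypoints (x, y)
        self.com_head = nn.Linear(d_model, 2)                   # center of mass (x, y)
        self.event_head = nn.Linear(d_model, num_events)        # 7 event-class logits

    def forward(self, tokens):  # tokens: (batch, frames, d_model)
        h = self.encoder(tokens)  # shared representation for all four tasks
        return {
            "box": self.detect_head(h),
            "pose": self.pose_head(h).unflatten(-1, (17, 2)),
            "com": self.com_head(h),
            "event": self.event_head(h),
        }

out = DiveKineNetSketch()(torch.randn(2, 16, 256))  # 2 clips x 16 frames
print({k: tuple(v.shape) for k, v in out.items()})
```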
Key Innovations
- Decoupled Spatio-Temporal Attention: Explicitly models body structure within frames and motion dynamics across temporal windows (see the attention sketch after this list)
- Physics-Informed Learning: Encodes biomechanical constraints (joint angle limits, anthropometric consistency, trajectory smoothness) as soft inductive biases
- Dual CoM Supervision: Bridges data-driven regression with biomechanical anthropometric models via consistency regularization
- Differentiable Trajectory Fitting: Enables gradient flow from physics-based parabolic residuals through the entire network
- Multi-Task Learning: Unified architecture with shared representations across all tasks (detection, pose, CoM, events)
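One plausible reading of the decoupled attention above, sketched under stated assumptions: per-joint tokens attend spatially within each frame, then each joint's track attends temporally across the window. Shapes, pre-norm placement, and block structure are illustrative guesses, not the paper's exact mechanism:

```python
import torch
import torch.nn as nn

class DecoupledSTAttention(nn.Module):
    """Factorized attention sketch: spatial over joints, temporal over frames."""

    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.spatial = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):  # x: (batch B, frames T, joints J, dim D)
        b, t, j, d = x.shape
        # Spatial pass: the J body-joint tokens of one frame attend to each other.
        s = x.reshape(b * t, j, d)
        sn = self.norm1(s)
        s = s + self.spatial(sn, sn, sn, need_weights=False)[0]
        # Temporal pass: each joint attends across the T-frame window.
        m = s.reshape(b, t, j, d).permute(0, 2, 1, 3).reshape(b * j, t, d)
        mn = self.norm2(m)
        m = m + self.temporal(mn, mn, mn, need_weights=False)[0]
        return m.reshape(b, j, t, d).permute(0, 2, 1, 3)

x = torch.randn(2, 16, 17, 256)  # 2 clips, 16 frames, 17 keypoint tokens
print(DecoupledSTAttention()(x).shape)  # torch.Size([2, 16, 17, 256])
```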
Performance Highlights
Evaluated on 77 diving videos (31,418 frames with annotated 2D diver poses) using 5-fold cross-validation.
Datasets
- DivingFy7797: 15 videos, 5,162 manually annotated frames (baseline and ablation studies)
- DivingFy46328: 77 videos, 31,418 frames with semi-automated annotations (scalability validation)
- Athletes: 4 elite Canadian national team divers (2 male, 2 female)
- Annotations: Bounding boxes, 17-keypoint poses, center of mass, kinematic events
Available Resources
All resources will be publicly released upon paper acceptance
Source Code
Complete PyTorch implementation including:
- Swin Transformer backbone integration
- Decoupled spatio-temporal attention mechanisms
- Physics-informed loss functions and differentiable trajectory fitting (see the sketch after this list)
- Multi-task decoder heads (pose, CoM, events)
- Training scripts with 5-fold cross-validation
- Evaluation tools and metrics computation
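As a taste of the trajectory-fitting idea listed above, here is a minimal sketch: during flight the CoM height should follow a parabola, so a least-squares fit is solved in closed form inside the computation graph and its residuals are penalized. The function name, the normal-equations solver, and the flight-phase-only framing are assumptions; the released loss may differ:

```python
import torch

def parabolic_residual_loss(com_y, times):
    """Penalize deviation of a predicted CoM height track from a parabola.

    Sketch only: fit y(t) = a*t^2 + b*t + c via the normal equations
    (fully differentiable), then penalize the residuals so gradients flow
    from the physics term back into the CoM predictions.
    """
    # Design matrix [t^2, t, 1] over the flight-phase timestamps.
    A = torch.stack([times**2, times, torch.ones_like(times)], dim=-1)
    AtA = A.T @ A
    Atb = A.T @ com_y.unsqueeze(-1)
    coeffs = torch.linalg.solve(AtA, Atb)        # closed-form fit, in-graph
    residual = A @ coeffs - com_y.unsqueeze(-1)
    return residual.pow(2).mean()

# Toy check: noisy parabola sampled at 100 FPS (matching the video rate).
t = torch.arange(0.0, 0.5, 0.01)
y = (-4.905 * t**2 + 3.0 * t + 1.0 + 0.01 * torch.randn_like(t)).requires_grad_()
loss = parabolic_residual_loss(y, t)
loss.backward()                                  # y.grad carries the physics signal
print(loss.item())
```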
Trained Models
Pre-trained model checkpoints ready for deployment:
- 15-Video Baseline: Pre-trained models from the 5-fold cross-validation and ablation studies, plus the best-performing checkpoint for deployment
- 77-Video Scalability: Pre-trained models from the 5-fold cross-validation, plus the best-performing checkpoint for production deployment
- Model architecture: 121M parameters, 57.5 GFLOPs per frame
- Includes configuration files and inference examples
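A hedged loading sketch, reusing the stand-in DiveKineNetSketch class from the Overview; the checkpoint file name and state-dict key are placeholders until the release documents the real ones:

```python
import torch

# Placeholder path and key; the released checkpoints will document both.
ckpt = torch.load("divekinenet_best.pth", map_location="cpu")
model = DiveKineNetSketch()                  # swap in the released model class
model.load_state_dict(ckpt["state_dict"], strict=False)

n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")   # released model: ~121M
```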
Annotations & Datasets
Ground truth annotations in COCO JSON format:
- DivingFy7797: 15 videos, 7,797 frames, fully manual annotations (one year of annotation and validation work)
- DivingFy46328: 77 videos (includes the 15 DivingFy7797 videos), 46,328 frames, semi-automated pipeline (an additional year of annotation and validation work for the 62 new videos)
- Bounding boxes, 17-keypoint poses (COCO topology)
- Center of mass markers (biomechanically validated)
- Event labels, 7 classes: No Dive; Highest Point in the Preparation Phase; Touch Down; Highest Point in Flight; Lowest Point in the Preparation Phase (maximum springboard depression); Water Contact/Entry; Under Water
- Compatible with standard COCO tooling, e.g., pycocotools for evaluation and COCO Annotator for browsing and editing annotations (see the loading example below)
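Since the annotations use the COCO JSON format, they load with the standard pycocotools API; the file name below is a placeholder for the released archive layout:

```python
from pycocotools.coco import COCO

# Placeholder file name; the released archives will document the real paths.
coco = COCO("DivingFy7797_annotations.json")

img_ids = coco.getImgIds()
print(f"{len(img_ids)} annotated frames")

# Inspect one frame's diver annotation: bbox + 17 COCO-topology keypoints.
anns = coco.loadAnns(coco.getAnnIds(imgIds=img_ids[0]))
for ann in anns:
    print(ann["bbox"])        # [x, y, w, h]
    kps = ann["keypoints"]    # flat [x1, y1, v1, ..., x17, y17, v17]
    print(len(kps) // 3, "keypoints")
```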
Video Data
Raw diving video footage:
- 77 videos from Canadian national team training sessions
- Recorded at Institut National du Sport du Québec (INS Québec)
- 1920×1080 resolution, 100 FPS, high-speed cameras
- 4 elite divers, 12 unique FINA dive codes
- Data Ownership: Videos belong to INS Québec
- Access: Available upon request for research purposes
Experimental Results
Complete experimental data for reproducibility:
- Training logs for all 5 folds (15-video and 77-video configurations)
- Ablation study results (7 configurations × 5 folds)
- Scalability analysis metrics and statistical tests
- Evaluation scripts for all figures and tables in the paper
- Raw metric CSVs for custom analysis
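For the raw metric CSVs, a small pandas sketch that aggregates per-fold scores, assuming a hypothetical one-row-per-fold layout (the released files will state the actual schema and column names):

```python
import pandas as pd

# Hypothetical layout: one row per fold, one column per metric.
df = pd.read_csv("fold_metrics.csv")       # e.g., columns: fold, AP, PCK, MPJPE
summary = df.drop(columns=["fold"]).agg(["mean", "std"])
print(summary.round(3))                    # mean and std across the 5 folds
```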
Publication Details
- Title: DiveKineNet: End-to-End Spatio-Temporal Transformer for Marker-less Diving Kinematic Analysis
- Authors: Ramon Figueiredo Pessoa, Rachid Aissaoui, Mathieu Charbonneau, Carlos Vazquez
- Journal: Coming soon (under review)
- Year: 2025
Citation
@article{figueiredo2025divekinenet,
  title={DiveKineNet: End-to-End Spatio-Temporal Transformer for Marker-less Diving Kinematic Analysis},
  author={Figueiredo Pessoa, Ramon and Aissaoui, Rachid and Charbonneau, Mathieu and Vazquez, Carlos},
  year={2025},
  note={Under review}
}
Licensing
- Code: MIT License (permissive open-source)
- Annotations: Creative Commons Attribution 4.0 International (CC BY 4.0)
- Pre-trained Models: Non-commercial research use only
- Video Data: Owned by INS Québec, available upon request for approved research
Acknowledgments
This PhD research was conducted at École de technologie supérieure (ÉTS), Université du Québec, in collaboration with the Institut National du Sport du Québec (INS Québec). The study benefited from the participation and expertise of Canadian national diving team coaches and athletes. The research received valuable support from INS Québec, MITACS, Own The Podium, and Diving Canada.
Status Update
This page will be updated with active download links and detailed documentation once the paper receives its acceptance notification. For inquiries, access requests, or research collaboration proposals, please contact Ramon Figueiredo via LinkedIn.