Fitness Data Analytics

I’ve been uploading my runs to Strava since 2019. Strava is an online service where athletes can share activity data from their GPS watches with friends, and they can also upload activities manually. It is popular among runners and cyclists, but Strava also has communities for a variety of other sports, like squash! I was interested in extracting insights from my own fitness data, so I built some visualizations to explore my running career.

Strava has several social media features. Athletes can comment on each other’s activities and give each other “kudos”, which are similar to a “like”. Since running has clear metrics for improvement, such as running longer or faster, I expected runs that demonstrate growth to generate the most kudos on my Strava. However, the scatter plot above shows that the factors behind receiving kudos are more complex and noisy than a simple linear association with personal records.
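One rough way to put a number on “no simple linear association” is the Pearson correlation coefficient between a run metric and its kudos count. The snippet below is a minimal sketch, not the code behind the plot: the `pearson_r` helper is written just for this illustration, and the `distance_km` and `kudos` values are made up rather than taken from my Strava data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative values, NOT real Strava data.
distance_km = [5.0, 8.0, 10.0, 16.0, 21.1]
kudos = [4, 9, 3, 7, 6]

r = pearson_r(distance_km, kudos)
print(f"r = {r:.2f}, r^2 = {r * r:.2f}")
```

A value of r near 0 would be consistent with a noisy scatter plot, while a value near 1 or -1 would point to a strong linear trend. Even a low r does not rule out other kinds of relationships, such as kudos depending on who happens to be online that day.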

Training programs for distance running have specific groups of runs that serve to improve an athlete in particular areas. There are long runs, tempo runs, fartleks, hill sprints, strides, etc. As an athlete familiar with my own runs, I am able to say which group a run belongs to based on its features. In this three-dimensional scatter plot I applied an algorithm called Density-Based Spatial Clustering of Applications with Noise (DBSCAN) to create groups of runs. At a high level, DBSCAN groups runs that lie close to one another in the space defined by the three features. Lowering the epsilon value for this algorithm reduces the distance within which DBSCAN considers two runs close enough to be in the same group. At the largest epsilon value we see that DBSCAN captures when I was running in Colorado vs. Illinois and when I had a heart rate monitor vs. when I didn’t. As the epsilon value decreases, DBSCAN partitions runs within those four groups (Colorado and HR monitor, Illinois and HR monitor, etc.). A cluster label of -1 marks a run as noise, meaning it doesn’t belong to any group; noise is abundant at the lowest epsilon value.
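To make the role of epsilon concrete, here is a minimal pure-Python sketch of the DBSCAN idea. It is not the code behind the plot. The toy points in `runs` are invented and stand in for three already-scaled features, and the `min_samples` value of 3 is an assumption chosen for the example. In practice a library implementation such as scikit-learn’s `DBSCAN` (which takes `eps` and `min_samples`) would be the more usual choice.

```python
import math

NOISE = -1        # label for points that belong to no cluster
UNVISITED = None  # placeholder before a point has been labeled


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Return one cluster label per point; -1 means noise."""
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        # points[i] is a core point: start a new cluster and grow it.
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also a core point
    return labels


# Invented, pre-scaled toy points (three hypothetical features per run).
runs = [
    (0.0, 0.0, 0.0), (0.1, 0.0, 0.1), (0.0, 0.1, 0.1),  # tight group A
    (1.0, 1.0, 1.0), (1.1, 1.0, 1.0), (1.0, 1.1, 1.1),  # tight group B
    (5.0, 5.0, 5.0),                                    # isolated run
]

for eps in (2.0, 0.3, 0.05):
    print(eps, dbscan(runs, eps, min_samples=3))
```

With these toy points, the two tight groups merge into one cluster at an epsilon of 2.0, split into two clusters at 0.3, and dissolve entirely into noise at 0.05, which mirrors the behavior described above. Because DBSCAN relies on raw distances, features measured in different units (for example kilometers and beats per minute) generally need to be scaled to comparable ranges first.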

Zooming out and looking at how my running metrics change over time helps me see what phases of training I was in. For example, around January 10 I developed plantar fasciitis and had to reduce my run length. Then in July of 2020 I began building aerobic fitness, with runs that were strictly for distance. Finally, when I left my D2 cross country program in July of 2021 I began lifting weights and running less, which shows up as a drop in distance around that time.