5 min read

Resting heart rate trend from Samsung Health app data

I have started to run a bit about two month ago and a I was told the good way to track your fitness level and to train is to monitor your heart rate and to run according to heart rate zones. So I did some “research” and bought fitness tracker from Samsung. In order to set-up heart rate zones you need to know your resting heart rate and your max heart rate. The latter is (as the name suggests) the maximum possible heart rate of you heart. You can google on your own many examples of how to find out your value. Basically all of them are about to put your body into high stress (like hill running) and then check the maximum recorded value.

I´ll try to explore the resting heart rate from my fitness tracker data. Resting heart rate is described usually as the heart rate right after you wake up in the morning. Not quite exact description, but let´s check what we can get from raw data.

Load data

There is potentially cool function in Samsung´s health app to view trends of your heart rate, however not as useuful as one would expect - we will explore it later. The good news is Samsung allows to export the data right from the app´s options (unfortunately only last three months). Not great, but better than nothing. Let´s explore them:)

library(tidyverse)
library(lubridate)
library(readxl)
df_raw <- read_excel("../../static/data/heart_rate.xls", skip = 8)
df_raw %>% glimpse()
## Observations: 1,211
## Variables: 5
## $ Date               <chr> "30 Oct", "30 Oct", "30 Oct", "30 Oct", "30...
## $ Time               <chr> "21:10", "20:50", "20:40", "20:30", "20:20"...
## $ `Heart rate (bpm)` <chr> "69", "73", "69", "69", "69", "76", "91", "...
## $ Tag                <chr> "Resting", "Resting", "Resting", "General",...
## $ Notes              <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...

I can use lubridate package and create one variable DateTime from Date and Time. It will make life easier for extracting minutes and hours later on.

df_clean <- df_raw %>% 
  mutate(DateTime = ydm_hm(paste("2018", Date, Time))) %>% 
  rename(HeartRate = `Heart rate (bpm)`) %>%
  select(DateTime, HeartRate, Tag, -Date, -Time, -Notes) %>% 
  mutate(HeartRate = HeartRate %>% as.numeric())
head(df_clean)
## # A tibble: 6 x 3
##   DateTime            HeartRate Tag    
##   <dttm>                  <dbl> <chr>  
## 1 2018-10-30 21:10:00        69 Resting
## 2 2018-10-30 20:50:00        73 Resting
## 3 2018-10-30 20:40:00        69 Resting
## 4 2018-10-30 20:30:00        69 General
## 5 2018-10-30 20:20:00        69 Resting
## 6 2018-10-30 19:50:00        76 Resting

How many samples is available per “Tag”?

df_clean %>% 
  count(Tag, sort = TRUE)
## # A tibble: 6 x 2
##   Tag                n
##   <chr>          <int>
## 1 Resting          752
## 2 General          452
## 3 <NA>               3
## 4 Exercising         2
## 5 After exercise     1
## 6 Tired              1

Well, this tells me something. First of all Samsung do not provide the raw data recorded during excerices (Excersising tag). Just two values? Really? After several running and swimming workouts it is strange. It raises a question why there is such tag? Secondly, Resting and General tags seems to be worth to check. What does the “general” actually mean?

df_clean %>% 
  filter(Tag %in% c("General", "Resting")) %>% 
  ggplot(aes(DateTime, HeartRate)) + 
  geom_point() +
  geom_smooth() +
  labs(y = "heart rate [bpm]", x = "date") +
  facet_wrap(~Tag, labeller = label_both)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Interesting what Samsung consider as “Resting” :) I´m either having some problem with my heart or the app misclassify the measurements. The latter seems to be more likely as I wouldn´t feel much resting with heart beats over 80:). Or perhaps those are some measurement after workout. Let´s check how looks the distribution per hour.

df_clean %>% 
  filter(Tag %in% c("Resting", "General")) %>% 
  mutate(hour = hour(DateTime),
         minute = minute(DateTime)) %>% 
  count(Tag, hour) %>% 
  ggplot(aes(hour,n)) +
  geom_col() + 
  scale_x_continuous(breaks = scales::pretty_breaks(n = 10)) +
  labs(y = "number of measurements", x = "hour") +
  facet_wrap(~Tag, labeller = label_both)

Most of the Resting tags comes from sleeping time ~(22:00 - 06:00) and it quite fits. The General tag is somehow not clear to me - I don´t understand why there isn´t less measurements in night time. I mean, when I´m sleeping, I´m resting, right?:)

What is the proportion of General and Resting during my usual sleeping hours?

df_clean %>% 
  mutate(hour = hour(DateTime),
         minute = minute(DateTime)) %>% 
  filter(hour %in% c(23, 0:5)) %>% 
  count(Tag)
## # A tibble: 2 x 2
##   Tag         n
##   <chr>   <int>
## 1 General   120
## 2 Resting   429

Ok, here I would expect something close to zero for General tag. In my case ~28% is misclassified as General instead of correct Resting.

So what is my resting heart rate?

It seems we can´t trust to app´s internal tagging. But we can trust our own code:). Let´s forget about internal tagging logic and simply filter sleeping hours and working days only out of all data - this should be the window when I´m definately sleeping. Furthermore just the wake-up time and calculate the value out of it.

df_clean %>% 
  mutate(hour = hour(DateTime),
         minute = minute(DateTime),
         wday = wday(DateTime, label = TRUE)) %>% 
  filter(!(wday %in% c("Sat", "Sun"))) %>% 
  filter(hour %in% c(0:5)) %>% 
  ggplot(aes(DateTime, HeartRate)) + 
  geom_point() +
  geom_smooth(method = "lm") +
  labs(y = "number of measurements", x = "date", title = "Heart rate during sleep") +
  scale_y_continuous(breaks = scales::pretty_breaks(n = 10))

Cool, it seems that running really have some effect on resting heart rate. In this case it is more “sleeping heart rate”. We can see some decreasing trend from ~60 to ~50 bpm over one month.

So where is the single value?

df_clean %>% 
  mutate(hour = hour(DateTime),
         minute = minute(DateTime),
         wday = wday(DateTime, label = TRUE)) %>% 
  filter(!(wday %in% c("Sat", "Sun"))) %>% 
  filter(hour == 5) %>%  
  filter(minute > 20 & minute < 40) %>% 
  .$HeartRate %>%
  mean
## [1] 59.8

Considering my usual wake-up time window my resting heart rate is ~60bmp.

The message for me is not to take the internal tagging logic of Samsung´s app seriously.