Explainable Sleep Quality Evaluation Model Using Machine Learning Approach
A. Data acquisition

This research used the Fitbit Charge HR fitness tracker, a wristband-type wearable device. The device consists of a 3-axis accelerometer, an altimeter, and optical heart rate sensors. The data provided include steps, distance, calories, floors, heart rate (beats per minute), and sleep modes (asleep, awake, and really awake) for every minute [14].

In the preliminary study, intraday time series data were stored in MongoDB, and Java 1.8 was used to collect Fitbit device data from the Fitbit RESTful service, as shown in Figure 1.

B. Heart Rate Based Sleep Quality Index

Figure 2. The estimation of sleep quality status [15]

Fig. 2 shows a flow chart of the heart rate-based sleep quality index. Initial data is transferred from the Fitbit cloud server to our main server. The Fitbit Charge HR denotes asleep as “1,” awake as “2,” and really awake as “3.” If the sleep [...]

Figure 3. Structure of explainable sleep quality evaluation model

First, the ‘data load’ procedure generates a datum or data set based on date information, e.g., YYYY/MM/DD, defined by users. To construct an explainable sleep quality model, we collected data for one month. The attribute set consists of 19 features, and the class label includes three sleep statuses. The study population consists of 280 factory and office workers, and the data cover the period from April 1st to April 30th, 2016. The total number of collected intraday data sets is 2829. There are 256 female data sets and 2573 male data sets; only the male data are used because of this imbalance. Table I lists the data characteristics used in the proposed model.

Second, we use a discretization technique, which converts numerical (or quantitative) attributes into discrete (or qualitative) ones, in order to generate candidate rules associated with the three sleep statuses. The attribute domains are divided into consecutive subintervals with a number of candidate cut-points. There are many discretization methods that can be applied before rule induction [18].

In this study, we used the equal size based discretization (ESD) algorithm. This method divides the data points into p sub-groups, each containing approximately the same number of attribute values. The best way of determining p is to look at the histogram and try different intervals or sub-groups [19].
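The equal-size idea described above can be sketched in a few lines of Python. This is only an illustration, not the implementation used in this study: the function names, the choice to place each cut point midway between neighbouring sorted values, and the sample values are all assumptions made for the example.

```python
from bisect import bisect_left


def equal_size_cut_points(values, p):
    """Return up to p-1 cut points splitting `values` into p groups
    holding approximately the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for i in range(1, p):
        idx = (i * n) // p  # index of the first value in group i
        if 0 < idx < n:
            # Assumption: the cut is the midpoint of the two neighbours.
            cut = (ordered[idx - 1] + ordered[idx]) / 2
            if not cuts or cut > cuts[-1]:  # skip duplicate cuts caused by ties
                cuts.append(cut)
    return cuts


def discretize(value, cuts):
    """Map a value to its interval index; a value equal to a cut point
    falls into the lower interval."""
    return bisect_left(cuts, value)


# Hypothetical attribute values (minutes), not taken from the study data.
sample = [310, 120, 450, 339, 280, 500]
cuts = equal_size_cut_points(sample, p=2)
print(cuts)                                  # [324.5]
print([discretize(v, cuts) for v in sample])  # [0, 0, 1, 1, 0, 1]
```

With p = 2, each interval holds about half of the instances, which corresponds to a sub-interval size of 50 percent of the data.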
Low_Sleep_Minutes     Sleep time with low heart rate during sleep (minutes)            20.22±65.04
Normal_Sleep_Minutes  Sleep time with normal heart rate during sleep (minutes)         333.95±146.01
High_Sleep_Minutes    Sleep time with high heart rate during sleep (minutes)           37.34±81.10
Low_Asleep            Asleep time with low heart rate during sleep (minutes)           19.21±61.62
Low_Awake             Awake time with low heart rate during sleep (minutes)            0.95±3.92
Low_Really_Awake      Really awake time with low heart rate during sleep (minutes)     0.07±0.41
Normal_Asleep         Asleep time with normal heart rate during sleep (minutes)        309.48±136.67
Normal_Awake          Awake time with normal heart rate during sleep (minutes)         22.11±15.58
Normal_Really_Awake   Really awake time with normal heart rate during sleep (minutes)  2.36±2.93
High_Asleep           Asleep time with high heart rate during sleep (minutes)          31.78±73.09
High_Awake            Awake time with high heart rate during sleep (minutes)           4.26±9.12
High_Really_Awake     Really awake time with high heart rate during sleep (minutes)    1.29±6.22

During the third procedure, a global covering rule induction algorithm is used to search the set of all attribute values. This algorithm, a component of the data mining system LERS (Learning from Examples using Rough Sets), is based on rough set concepts, i.e., lower and upper approximations [20]. The global covering algorithm checks whether the input data is consistent, i.e., whether the data contains no conflicting examples. If the data is inconsistent, it computes lower and upper approximations of all concepts, e.g., ‘Good,’ ‘Normal,’ and ‘Bad.’ The following shows the process to compute a single global covering.

Input: the set A of all attributes, partition {d}* on U.
Output: a single global covering, R.
begin
    Compute partition A*
    P := A
    R := Ø
    if A* ⊆ {d}* then
        for each attribute a in A do
            Q := P - {a}
            Compute partition Q*
            if Q* ⊆ {d}* then P := Q
        end for
        R := P
    end if
end

During the fourth procedure, the classifier predicts the decisions of test instances using the rule set induced from the training instances. Given a rule set containing rules for each class, we use the best k rules for prediction of each test instance, with the following procedure: (1) select all the rules whose conditions (or bodies) partially match the test instance; (2) select the best k rules from the candidate rules selected in step (1); and (3) choose the class of the rule with the highest matching degree of rule conditions as the predicted class.

We used multiple rules in prediction because the accuracy of rules cannot be precisely estimated, and one cannot expect that any single rule can perfectly predict the class label of every
example satisfying its conditions [21].

During the last procedure, we perform a cross-validation experiment to assess how the results of the statistical analysis will generalize to an independent data set. In general, it is mainly used in settings where the goal is prediction, and one wants to estimate how accurately a prediction model will perform in practice.

III. EXPERIMENT RESULTS

All variables, as listed in Table I, are considered as input attributes to construct an explainable sleep quality evaluation model. For discretization, the size of sub-intervals was defined as 50 percent of the total number of instances. In the case of rule induction, missing attribute values, labeled ‘Lost’, are ignored, as in the previous study [22].

We used 2-fold cross validation to provide an unbiased estimate of the generalization error. The full dataset was randomly divided into 2 subsets: one subset was used for training (50%), and the other subset was used for testing (50%). The process was then repeated twice.

The performance of the explainable sleep quality model was evaluated using the accuracy evaluation criterion. A confusion matrix summarizes the differences between the actual outcomes and the outcomes predicted by the classification system [23]. Table II shows the confusion matrix for a binary classification problem. The accuracy can be defined by using the elements of the confusion matrix:

Accuracy = (TP + TN) / (TP + FN + TN + FP) * 100    (2)

TABLE III. CLASSIFICATION PERFORMANCE RESULTS (2-FOLD CROSS VALIDATION).

1-fold    Accuracy (%)                             73.0
          No. of rules generated                   263
          The highest frequency in the rule set    10
2-fold    Accuracy (%)                             72.0
          No. of rules generated                   268
          The highest frequency in the rule set    10
Average   Average number of rules                  265.5
          Average accuracy (±SD*)                  72.5±0.5
*SD, standard deviation.

I) Sleep quality status ‘Bad’

Rule 18) IF 17.85 < BMI < 25.21 AND Smoking is Yes AND 61.81 < Normal_Avg_HR < 79.0 AND 0 < Normal_Awake < 19.0 AND 1.5 < Normal_Really_Awake < 24.0 AND 1.5 < High_Asleep < 606.5 AND 1.5 < High_Awake < 155.0 AND 0.5 < High_Really_Awake < 175.5 THEN Sleep Quality Status is ‘Bad’ with support 8

Rule 25) IF 17.85 < BMI < 25.21 AND Smoking is No AND 0.0 < Normal_Asleep < 324.5 AND 0 < Normal_Awake < 19.0 AND 1.5 < Normal_Really_Awake < 24.0 AND 1.5 < High_Asleep < 606.5 AND 1.5 < High_Awake < 155.0 AND 0.0 < High_Really_Awake < 0.5 THEN Sleep Quality Status is ‘Bad’ with support 8

Rule 9) IF 25.21 < BMI < 43.42 AND Smoking is Yes AND 83.61 < High_Avg_HR < 192.16 AND 0 < Normal_Awake < 19.0 AND 1.0 < Normal_Really_Awake < 0.5 AND 1.5 < High_Awake < 155.0 AND 0.5 < High_Really_Awake < 175.5 THEN Sleep Quality Status is ‘Bad’ with support 7
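To illustrate how interval rules of this form could be applied under the best-k prediction procedure of the fourth step, the following Python sketch scores a test instance against two rules. It is a minimal sketch and not the authors' implementation: the `Rule` class, the helper names, the definition of matching degree as the fraction of satisfied conditions, the tie-break on support, and the test instance are all assumptions. Only the conditions of Rule 18 (‘Bad’) and Rule 14 (‘Good’) are transcribed from the rules reported in this section.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Conjunctive rule: attribute -> open interval (low, high) or a category."""
    conditions: dict
    label: str
    support: int


def satisfies(value, condition):
    """True if a value meets one condition; missing values never match."""
    if value is None:
        return False
    if isinstance(condition, tuple):
        low, high = condition
        return low < value < high
    return value == condition


def matching_degree(rule, instance):
    """Assumed definition: fraction of the rule's conditions the instance meets."""
    met = sum(satisfies(instance.get(attr), cond)
              for attr, cond in rule.conditions.items())
    return met / len(rule.conditions)


def predict(rules, instance, k=3):
    """(1) keep partially matching rules, (2) keep the best k,
    (3) return the class of the top-ranked rule (None if nothing matches)."""
    scored = [(matching_degree(r, instance), r.support, r) for r in rules]
    candidates = [s for s in scored if s[0] > 0]
    best = sorted(candidates, key=lambda s: (s[0], s[1]), reverse=True)[:k]
    return best[0][2].label if best else None


RULE_18 = Rule({"BMI": (17.85, 25.21), "Smoking": "Yes",
                "Normal_Avg_HR": (61.81, 79.0), "Normal_Awake": (0, 19.0),
                "Normal_Really_Awake": (1.5, 24.0), "High_Asleep": (1.5, 606.5),
                "High_Awake": (1.5, 155.0), "High_Really_Awake": (0.5, 175.5)},
               label="Bad", support=8)
RULE_14 = Rule({"Age": (23.0, 36.5), "BMI": (25.21, 43.43), "Smoking": "Yes",
                "Normal_Asleep": (324.5, 840.5), "Normal_Really_Awake": (0, 1.5),
                "High_Awake": (1.5, 155.0), "High_Really_Awake": (0.0, 0.5)},
               label="Good", support=7)

# Hypothetical test instance, not taken from the study data.
instance = {"Age": 30, "BMI": 27.0, "Smoking": "Yes", "Normal_Avg_HR": 65.0,
            "Normal_Asleep": 400.0, "Normal_Awake": 10.0,
            "Normal_Really_Awake": 1.0, "High_Asleep": 20.0,
            "High_Awake": 5.0, "High_Really_Awake": 0.2}

print(matching_degree(RULE_18, instance))    # 0.625
print(matching_degree(RULE_14, instance))    # 1.0
print(predict([RULE_18, RULE_14], instance))  # Good
```

Because every rule is a plain conjunction of readable conditions, the conditions that an instance satisfies or fails can be reported alongside the prediction, which is what makes the model explainable.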
[...] AND 0.0 < Normal_Asleep < 324.5 AND 0 < Normal_Awake < 19.0 AND 1.5 < Normal_Really_Awake < 24.0 AND 1.5 < High_Asleep < 606.5 AND 1.5 < High_Awake < 155.0 AND 0.0 < High_Really_Awake < 0.5 THEN Sleep Quality Status is ‘Good’ with support 8.

Rule 14) IF 23.0 < Age < 36.5 AND 25.21 < BMI < 43.43 AND Smoking is Yes AND 324.5 < Normal_Asleep < 840.5 AND 0 < Normal_Really_Awake < 1.5 AND 1.5 < High_Awake < 155.0 AND 0.0 < High_Really_Awake < 0.5 THEN Sleep Quality Status is ‘Good’ with support 7.

Rule 19) IF Smoking is No AND 49.42 < Normal_Avg_HR < 61.81 AND 0 < Normal_Awake < 0.5 AND 324.5 < Normal_Asleep < 840.5 AND 19.0 < Normal_Awake < 109.0 AND 0 < Normal_Really_Awake < 1.5 AND 0.0 < High_Really_Awake < 0.5 THEN Sleep Quality Status is ‘Good’ with support 7.

The above rules are the most frequent or dominant rules used to interpret the three sleep statuses. Although they may not constitute a general logical representation, the differences in results between ‘Bad’ vs. ‘Normal’ and ‘Bad’ vs. ‘Good’ suggest the following: when comparing Rule 18 (‘Bad’) and Rule 56 (‘Normal’), smoking is a part of the ‘Bad’ status, and a high heart rate is frequently maintained when awake during sleep. A comparison of the ‘Normal’ and the ‘Good’ statuses shows that the High_Really_Awake state lasts longer, with a high heart rate, in the ‘Normal’ status than in the ‘Good’ status.

IV. CONCLUSION

The sleep intraday time-series data were measured with the Fitbit Charge HR, which is not as accurate as PSG but still provides a very good measurement result. Heart rate during the sleep period was measured, and in addition, the demographic characteristics and historical information of users were used by the global covering rule induction algorithm to generate rules related to sleep quality status. From the experimental results, we found it important to interpret the differences between the three sleep quality statuses, and confirmed that the proposed model makes the sleep quality status easier to understand by using objective measures rather than the subjective measurement of questionnaires.

REFERENCES

[1] O. J. Walch, A. Cochran, and D. B. Forger, “A global quantification of ‘normal’ sleep schedules using smartphone data,” Sci. Adv., vol. 2, no. 5, pp. e1501705–e1501705, May 2016.
[2] C. Asher, “Scientists identify molecular link between sleep and mood,” Science, Feb. 2016.
[3] H. K. Knudsen, L. J. Ducharme, and P. M. Roman, “Job stress and poor sleep quality: Data from an American sample of full-time workers,” Soc. Sci. Med., vol. 64, no. 10, pp. 1997–2007, May 2007.
[4] D. J. Buysse, C. F. Reynolds, T. H. Monk, S. R. Berman, and D. J. Kupfer, “The Pittsburgh sleep quality index: A new instrument for psychiatric practice and research,” Psychiatry Res., vol. 28, no. 2, pp. 193–213, May 1989.
[5] J. Backhaus, K. Junghanns, A. Broocks, D. Riemann, and F. Hohagen, “Test–retest reliability and validity of the Pittsburgh Sleep Quality Index in primary insomnia,” J. Psychosom. Res., vol. 53, no. 3, pp. 737–740, Sep. 2002.
[6] S. L. Beck, A. L. Schwartz, G. Towsley, W. Dudley, and A. Barsevick, “Psychometric evaluation of the Pittsburgh Sleep Quality Index in cancer patients,” J. Pain Symptom Manage., vol. 27, no. 2, pp. 140–148, Feb. 2004.
[7] M. D. Olfert, I. Holaskova, M. L. Barr, Z. Wenjun, J. Morrell, and S. E. Colby, “Shortening Pittsburgh Sleep Quality Index Survey Using Factor Analysis,” J. Nutr. Educ. Behav., vol. 48, no. 7, pp. S144–S145, Jul. 2016.
[8] M. de Zambotti et al., “Measures of sleep and cardiac functioning during sleep using a multi-sensory commercially-available wristband in adolescents,” Physiol. Behav., vol. 158, pp. 143–149, May 2016.
[9] T. Åkerstedt, K. Hume, D. Minors, and J. Waterhouse, “The meaning of good sleep: a longitudinal study of polysomnography and subjective sleep quality,” J. Sleep Res., vol. 3, no. 3, pp. 152–158, Sep. 1994.
[10] Z. Chen et al., “Unobtrusive sleep monitoring using smartphones,” in 2013 7th International Conference on Pervasive Computing Technologies for Healthcare and Workshops, 2013, pp. 145–152.
[11] B. Bei, J. Milgrom, J. Ericksen, and J. Trinder, “Subjective Perception of Sleep, but not its Objective Quality, is Associated with Immediate Postpartum Mood Disturbances in Healthy Women,” Sleep, vol. 33, no. 4, pp. 531–538, Apr. 2010.
[12] C. Bernardeschi, M. G. C. A. Cimino, A. Domenici, and G. Vaglini, “Using Smartwatch Sensors to Support the Acquisition of Sleep Quality Data for Supervised Machine Learning,” in Wireless Mobile Communication and Healthcare, 2016, pp. 251–259.
[13] K. A. Kaplan, P. P. Hardas, S. Redline, and J. M. Zeitzer, “Correlates of sleep quality in midlife and beyond: a machine learning analysis,” Sleep Med., vol. 34, pp. 162–167, Jun. 2017.
[14] “Fitbit.” [Online]. Available: https://dev.fitbit.com/kr. [Accessed: 10-Aug-2017].
[15] Dongjoo Kim, Chang-Sik Son, Won-Seok Kang, “Development of Sleep Quality Index Using Heart Rate,” Int. Conf. Bioeng. Biomed. Eng. 2016, vol. 10, no. 7, Jul. 2016.
[16] “Target Heart Rates.” [Online]. Available: http://www.heart.org/HEARTORG/HealthyLiving/PhysicalActivity/Fitn%20essBasics/Target-Heart-Rates_UCM_434341_Article.jsp#.WYv2WlEjEuU. [Accessed: 10-Aug-2017].
[17] K. Kräuchi and A. Wirz-Justice, “Circadian clues to sleep onset mechanisms,” Neuropsychopharmacol. Off. Publ. Am. Coll. Neuropsychopharmacol., vol. 25, no. 5 Suppl, pp. S92-96, Nov. 2001.
[18] J. W. Grzymala-Busse and J. Stefanowski, “Three discretization methods for rule induction,” Int. J. Intell. Syst., vol. 16, no. 1, pp. 29–38, Jan. 2001.
[19] Y. Yang and G. I. Webb, “Discretization for naive-Bayes learning: managing discretization bias and variance,” Mach. Learn., vol. 74, no. 1, pp. 39–74, Jan. 2009.
[20] J. W. Grzymala-Busse, “Rule Induction,” in Data Mining and Knowledge Discovery Handbook, Springer, Boston, MA, 2005, pp. 277–294.
[21] X. Yin and J. Han, “CPAR: Classification based on Predictive Association Rules,” in Proceedings of the 2003 SIAM International Conference on Data Mining, Society for Industrial and Applied Mathematics, 2003, pp. 331–335.
[22] J. W. Grzymala-Busse and W. J. Grzymala-Busse, “Handling Missing Attribute Values,” in Data Mining and Knowledge Discovery Handbook, Springer, Boston, MA, 2009, pp. 33–51.
[23] C.-S. Son, Y.-N. Kim, H.-S. Kim, H.-S. Park, and M.-S. Kim, “Decision-making model for early diagnosis of congestive heart failure using rough set and decision tree approaches,” J. Biomed. Inform., vol. 45, no. 5, pp. 999–1008, Oct. 2012.