Implementation of A Smartphone Application in Medical Education: A Randomised Trial (iSTART)
Felipe Martínez1,2*, Catalina Tobar3 and Carla Taramasco4
Abstract
Background: Smartphones are popular technologies that combine telephone communications and informatics in
portable devices. Limited evidence exists regarding their effectiveness in improving academic performance among
medical students. This study aims to assess whether a smartphone application could improve academic performance
in multiple-choice tests.
Methods: A double-masked randomised trial was held among interns at the School of Medicine of the Universidad de
Valparaiso. Participants were randomised to receive an application designed to review key concepts in Internal Medicine
and its subspecialties using clinical vignettes. Contents were selected and provided in a format akin to a mandatory
national examination required for practising medicine in Chile. Analyses were undertaken under the intention to treat
principle and missing data were handled using multiple imputation techniques.
Results: Eighty interns volunteered to participate in this trial. Most were female (48 students, 60%), and the mean age was
25.3 ± 2.2 years. Participants showed significant experience with smartphones, with a median use of 4 years (IQR
3–6 years) and 67 (83.7%) reporting routine use in clinical practice. Intention-to-treat analyses showed significant
improvements in performance amongst students allocated to the smartphone application (mean increase of 14.5 ± 8.9
vs 9.4 ± 11.6 points, p = 0.03). A reduction in total time and mean time per question was also found, which was
significant in complete-case analyses (p = 0.04).
Discussion: Smartphones were popular among medical trainees. Academic performance was significantly improved
by the use of our application, although the overall effect was smaller than expected from previous trials. This study
provides evidence that smartphone-based interventions can assist in teaching internal medicine.
Trial registration: ClinicalTrials NCT02723136.
Keywords: Medical education, Internal medicine, Smartphones, Student, medical
© The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to
the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver
(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Martínez et al. BMC Medical Education (2017) 17:168 Page 2 of 9
interest in new technologies and those with prior experiences with these platforms [7].
Despite this popularity, there is limited evidence regarding the effectiveness of smartphone use in improving academic performance among medical students [9]. While applications and resources for these platforms are widely available, only a few randomised trials have addressed their effectiveness in improving academic performance. In 2011, Low and coworkers published one of these studies using objective clinical competence scores as a primary endpoint [10]. The latter trial reported a statistically significant improvement of roughly 15% in the academic performance of students allocated to receive the application. Similar findings were seen in a second, before-and-after study that was conducted among Obstetrics & Gynecology residents [11].
Since 2003, a national examination has been held in Chile for undergraduate medical students who have completed their internships. This exam (Examen Unico Nacional de Conocimientos en Medicina - EUNACOM) is designed to assess the overall knowledge and practical skills that any medical student should attain before practising medicine in the country. Its design and administration are regulated by law, and its oversight has been delegated to the Association of Faculties of Medicine of Chile (ASOFAMECH). EUNACOM is made up of two sections, theoretical and practical, and is considered a qualifying examination for practising medicine in Chile. The contents of both sections are public knowledge and include 1543 items distributed according to the curricular time spent training in different areas of medicine, with special emphasis on internal medicine and its subspecialties [12]. The theoretical component is evaluated using 180 multiple-choice questions delivered in two 90-min sessions. Additionally, EUNACOM provides professional title validation or equivalencies for foreign physicians who wish to practise medicine in Chile. Given the importance of this exam, several medical schools have implemented preparation courses for their students. However, the methodologies used in the latter courses are heterogeneous, and uncertainty exists regarding the best way in which contents should be delivered.
This study aims to determine whether the implementation of a smartphone application designed to assist in delivering key concepts relevant to internal medicine might improve academic performance in EUNACOM.
Methods
iSTART is a double-masked randomised trial that was held among medical students at the School of Medicine of the Universidad de Valparaíso, Chile. The study protocol has been drafted in compliance with the Consolidated Standards of Reporting Trials (CONSORT) statement, in its version adapted for trials evaluating non-pharmacological interventions [13, 14]. The complete protocol was registered in March 2016 at clinicaltrials.gov (NCT02723136) and can be reviewed at https://clinicaltrials.gov/ct2/show/NCT02723136?term=NCT02723136&rank=1. A flowchart describing participant recruitment and overall study design is shown in Fig. 1.
Participants
Eligible participants were medical interns in their last year of training at the School of Medicine of the Universidad de Valparaíso who had a personal smartphone with an iOS®- or Android®-based operating system. Only those who did not wish to participate were excluded from this study. Informed consent was obtained from all participants.
Every student sat a baseline 90-question test designed to resemble EUNACOM (see below) and was randomised afterwards to either receive a smartphone-based training application or not. Randomisation was carried out using permuted blocks by a statistician who was unaware of treatment allocation. Allocation sequences were concealed from other researchers participating in this study. All participants were asked to complete an entry form with basic demographic data, including age, sex, year of training and prior experiences with smartphones or similar platforms (i.e. tablets). Data regarding academic performance were obtained from the University, including grades relevant to the area of Internal Medicine.
Interventions
Students allocated to receive the active intervention received a downloadable application that was installed on their smartphones. Those allocated to the control group did not receive any additional training for EUNACOM. The mobile application was devised by a team of informatics engineers and physicians and made available for free at the App Store® and PlayStore® for both iOS® and Android® operating systems. In order to monitor adherence, the application required an active internet connection for operation. Students also received a brief (5-min) description of its functionality, which was also made available in text form as a part of the software.
Contents were primarily directed at the area of Internal Medicine, which is the most important specialty within EUNACOM. The application included a series of questions in the form of brief clinical vignettes constructed in a format similar to the one described in EUNACOM’s website [15]. In short, these vignettes correspond to clinical scenarios against which the student must answer a key aspect relevant to the diagnosis, management or monitoring of several diseases. These multiple-choice questions must be answered from five possible options, with only one being the correct answer. The depth of knowledge required to answer was established using the
Fig. 1 CONSORT study flowchart. This figure depicts the flow of participants within the iSTART study
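The permuted-block randomisation mentioned under Participants can be illustrated with a minimal Python sketch. The block size of 4, the arm labels and the seed are assumptions made only for this example; the report does not state them, and this is not the authors' actual procedure.

```python
import random


def permuted_block_sequence(n_participants, block_size=4,
                            arms=("app", "control"), seed=2016):
    """Return an allocation list built from randomly permuted, balanced blocks.

    block_size, arms and seed are illustrative assumptions, not trial details.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # fixed seed so the sequence is reproducible
    per_arm = block_size // len(arms)
    sequence = []
    while len(sequence) < n_participants:
        # Each block holds every arm equally often, in random order.
        block = [arm for arm in arms for _ in range(per_arm)]
        rng.shuffle(block)
        sequence.extend(block)
    return sequence[:n_participants]


if __name__ == "__main__":
    allocation = permuted_block_sequence(80)
    print(allocation[:8])
    print({arm: allocation.count(arm) for arm in ("app", "control")})
```

Because every complete block is balanced, group sizes can never differ by more than half a block at any point during recruitment, which is the usual reason for preferring permuted blocks over simple randomisation in small trials.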
provisions of the EUNACOM agenda [12]. All contents of the application were designed by two internists with 5 years' experience in developing questions for the exam. Examples of these vignettes are provided in Additional file 1.
The application had two modes to present the aforementioned questions. In the first, study mode, students were not given time constraints to answer the clinical vignettes. Whenever an answer was provided, instant feedback was delivered alongside a brief explanation of the key concept that was being assessed by the question. In the second, training mode, participants had a restricted time window to provide answers. This mode was designed because of a perceived difficulty amongst interns in managing time when answering questions in previous simulations of the exam. A default of 60 s was established, but the application allowed the user to modify this timeframe to 30 or 90 s. Students had knowledge regarding their individual performance in both modes, but no additional feedback in terms of concept review was provided in training mode.
Outcomes
The primary outcome was the difference between groups in the mean change in overall scores on a 90-question practice test designed to resemble EUNACOM. The final test did not repeat any of the questions used within the application that was delivered to students and was held 4 weeks after randomisation. This timeframe was selected in order to allow students to practise and study internal medicine with the application given the extent of contents required by EUNACOM. Simulation tests were used because of the impossibility of using the actual exam as part of this study, since it is managed independently from universities and kept in strict reserve by ASOFAMECH. However, previous data have shown that both simulation exams (baseline and final) have good correlation with overall EUNACOM scores (r > 0.7, p < 0.001), as well as an excellent diagnostic accuracy for detecting students at risk of failing the exam (area under the ROC curve 0.95, 95% CI 0.90 to 0.99) and identifying students that will obtain high scores in the review (AUC 0.80, 95% CI 0.71 to 0.88, unpublished data). Both practice tests were marked by researchers who were kept unaware of allocation.
A secondary endpoint was to establish differences in the average time required to answer clinical vignettes. In order to allow reliable comparisons to be made, exams were conducted electronically and under supervision by the research team, thus allowing an objective assessment
of the total time required to complete the review. Data regarding adherence were also collected.
Statistical analyses
Sample size
Sample size was calculated using data regarding overall performance in prior experiences with practice exams and estimates from a randomised trial [10]. It was calculated that a sample size of 64 participants (32 per group) would be required to obtain 80% power to detect an absolute difference of 5 points between groups, assuming a standard deviation of 7 points for both groups at standard significance levels (two-tailed α of 5%). In order to correct for up to 20% losses to follow-up, the target was to randomise 75 participants. All estimates were calculated using nQuery Advisor® 3.0 for Windows.
Analysis plan
Basic descriptive statistics (means, medians, proportions, interquartile ranges [IQR], etc.) were calculated to assess the characteristics of the study sample. Fisher’s exact test was used to evaluate univariate association of categorical variables. Quantitative variables were compared using Mann-Whitney or Student’s t tests according to data distribution and variances. Ninety-five percent confidence intervals were constructed whenever appropriate. Missing data relevant to the primary and secondary outcomes were handled using multiple imputation techniques. In order to reduce sampling variability due to the imputation process, 20 datasets were generated for every variable with missing data. Predictor variables were included in this procedure using linear regression for data showing normal distributions. Predictive mean matching was preferred to impute data for variables with skewed distributions. All analyses were undertaken by a statistician who was unaware of participant allocation using Stata v12.0® (StataCorp LP, 1996–2016) under the intention-to-treat principle, but complementary complete-case analyses were conducted as part of multiple imputation techniques.
Results
Participant characteristics
A total of 80 interns were eligible for this study, and all volunteered to participate. Most were female (48 students, 60%); participants had a mean age of 25.3 ± 2.2 years and had spent a median of 6 years in medical school (IQR 6–7 years). Eighteen (22.5%) had repeated at least one course, and the median number of repetitions was 1 (IQR 1–3). The median time using smartphones was 4 years (IQR 3–6 years). Most interns reported routine use of smartphone applications in daily practice (67 students, 83.7%), but fewer than half acknowledged using them for academic purposes (31 students, 38.8%). The most common operating system was Android® (51 students, 63.8%). No relevant imbalances in study groups were seen at baseline. A detailed description of these contrasts and additional information regarding study participants is provided in Table 1.
Intervention effects
The mean score in the baseline test was 41.1 ± 11.1 points, and mean total time needed for completion of the latter review was 65.6 ± 27.0 min. Scores and completion times were similar between groups at baseline. Sixty-five interns (81.3%) sat the final test 4 weeks after randomisation. In both groups, a significant increase in overall scores was seen, which tended to be greater among interns allocated to receive the smartphone application. Participants allocated to no intervention showed an increase of 10.6 ± 11.7 points (p < 0.001) from baseline, while interns who received the smartphone application improved their scores by 16.2 ± 8.3 points (p < 0.001).
Intention-to-treat analyses using multiple imputation techniques showed significant differences between study groups. Missing scores were imputed using results from the baseline test and allocation as independent variables in linear regression analyses. On average, interns allocated to the smartphone application had an increase in scores that was 5 points (9%) higher than that observed in the no-intervention group (p = 0.03). Similar trends were seen when complete-case analyses were undertaken. When overall scores were analysed, an absolute difference of 3.5 points was observed between groups in favour of those allocated to the smartphone application, but statistical significance was not reached (p = 0.22). Study outcomes are briefly summarised in Table 2 and Fig. 2.
Students allocated to the smartphone application showed reductions in the total time needed to complete the final examination and the mean time spent per question. Intention-to-treat analyses showed a nonsignificant reduction of 8.5 min for the first outcome and 5.7 s for the latter (p = 0.08 for both). This estimate was calculated using predictive mean matching due to the skewed nature of time data, using allocation and both baseline performance and time required to complete the first examination as predictor variables. These differences were more conservative than the ones observed in complete-case analyses. Among participants who attended the second assessment, a 10-min reduction in overall time and a 6.7 s reduction in mean time per question were found, and both reached statistical significance (p = 0.04). Total times spent by participants answering both baseline and final questionnaires are shown in Fig. 3.
Adherence
The most popular mode amongst participants was study mode, which was used by 34 participants allocated to
the intervention (85%, 95% CI 70.2–94.2%). The median number of questions answered during the 4-week intervention period was 258 (IQR 66–415), and the median number of completed questionnaires per participant was 15 (IQR 14–21). Participants used the application’s training mode less frequently, with only 12 students (30%, 95% CI 16.6–46.5%) registering any activity during this trial. The median number of tests answered by these students was 2 (IQR 1–4), which translated into 90 (IQR 45–180) time-limited questions (Table 3).
Discussion
Smartphones are commonly used devices among medical trainees. In this study, every eligible student had at least one of these gadgets at their disposal, and most reported considerable experience using them in their
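The confidence intervals quoted for the adherence proportions in the Adherence subsection (34 of 40 and 12 of 40 participants) can be approximated with an exact binomial (Clopper-Pearson) interval. The report does not name the method used, so treating it as Clopper-Pearson is an assumption, and the function names below are illustrative. This is a sketch in standard-library Python, not the Stata routine used in the trial.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def _solve_increasing(f, target, tol=1e-10):
    """Bisection on [0, 1] for a function f that increases with p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for k successes out of n."""
    # Lower limit: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper limit: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper


if __name__ == "__main__":
    for used, total in [(34, 40), (12, 40)]:
        low, high = clopper_pearson(used, total)
        print(f"{used}/{total}: {100 * low:.1f}% to {100 * high:.1f}%")
```

If the exact method was indeed used, the printed limits should land close to the reported intervals; small differences in the last digit can arise from rounding conventions.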
Intention-to-treat analyses also showed a nonsignificant trend towards a reduction in total test times and mean time spent per question. A post-hoc power calculation showed that the estimated power for this contrast was only 45%, thus making insufficient power a reasonable explanation for this observed lack of statistical significance. Nonetheless, the observed reduction of 8.5 min is relevant for interns planning to undertake EUNACOM, and is likely to be the result of practice in answering multiple-choice questions. Clinical vignettes are constructed using certain features that are typical of certain conditions, thus leading to patterns that students exposed to the application might have been able to recognise faster than those allocated to the no-intervention group. It could also be argued that students allocated to the intervention had more experience answering questions on an electronic platform, thus resulting in familiarity with the interface that might have explained these findings. However, this explanation seems rather unlikely considering the vast experience with smartphone applications that participants had in this study.
Given that the intervention was devised to be self-administered by students, adherence was a key aspect to assess while conducting our study. Thirty-four out of 40 participants (85%) used the application’s study mode to review internal medicine in this trial, which was very satisfactory. Furthermore, the median number of questions and questionnaires completed was more than adequate considering the relatively brief timeframe in which this study was conducted. Only a minority of students allocated to the intervention (12 students, 30%) used the application’s training mode, the sole feature within the application in which a time limit for answering clinical vignettes was applied. This obvious contrast in use rates reached statistical significance (p < 0.001), and might be explained by performance pressure. It is possible that interns felt discouraged from undertaking activities that recorded results in a manner similar to the one used in the actual EUNACOM. Participants could have associated underperforming in these exercises with a potential for poor results in the exam, thus leading to the observed use rates. Feedback provided by this mode did not include a review of the key concept in internal medicine that was being assessed, thus possibly making pressure for delivering high scores more tangible. Furthermore, interns were warned that time-limited exercises were accessible only once during our trial, which might have led to lower use rates in order to save this component of the application until after the reviewable contents (study mode) were completed. Given these explanations and the fact that EUNACOM applies a time limit of 60 s per question, future interventions aimed at improving performance in this and/or similar tests should not disregard applying time limits as part of their strategies. Exploring motivations to use these types of applications should be considered in future qualitative research.
Strengths and limitations
Our study is strengthened by randomisation, which greatly helps in controlling biases due to selection and confounding. Contents within the application were designed by internists with experience in developing questions that resemble those used in EUNACOM. Previous data available at our centre had shown good correlations with overall scores and with those specific to internal medicine within the review, which has translated into excellent diagnostic accuracy in detecting students at risk of failing the examination. We also conducted active monitoring of the application’s use, which greatly helps in understanding our results and represents a key element when evaluating interventions that are self-delivered by students. These data are likely to be helpful for the design of future versions or similar applications.
Several limitations need to be taken into consideration when analysing our results. The first is that a significant proportion of students did not attend the final examination (18.7%), which resulted in the loss of key information regarding study outcomes. We chose to mitigate this event by using multiple imputation techniques, which have been established as one of the best methods available to handle missing data in randomised trials [18, 19]. Uncertainty always exists when estimates from multiple imputation are used to allow the conduct of intention-to-treat analyses. This stems from the fact that the “missing completely at random” assumption of missing data is hard to confirm in practice [18]. We did not find any contrasts between participants who completed our study and those who did not, and estimates from complete-case analyses were very similar to the ones obtained from multiple imputation. Both facts bring reassurance regarding the reliability of our imputed values. Another limitation stems from the impossibility of masking participants to the intervention, which could have resulted in the application’s contents being shared across study groups. This would result in a minimisation of the intervention’s effects between groups, and thus might explain the smaller-than-expected difference that was found in this trial. Costs are always a relevant concern when implementing interventions in medical education. In this case, an investment of USD 50,000 was required to develop the application and its key contents, which was covered entirely by the research team. Most expenses were incurred in human resource honoraria. Although this might be seen as a significant barrier to implementation, it should be considered that after this initial investment, the application was inexpensive to maintain, only requiring monthly payments for a server and a part-time engineer to oversee its