Extended Process Scheduler for Improving User Experience in Multi-core Mobile Systems
Giang Son Tran
ICTLab, University of Science and Technology of Hanoi, VAST∗
tran-giang.son@usth.edu.vn

Thi Phuong Nghiem
ICTLab, University of Science and Technology of Hanoi, VAST∗
nghiem-thi.phuong@usth.edu.vn

Tuong Vinh Ho
Institute Francophone International, Vietnam National University
IRD, UMI 209 UMMISCO
ho.tuong.vinh@ifi.edu.vn

Chi Mai Luong
Institute of Information Technology, VAST∗
lcmai@ioit.ac.vn

ABSTRACT
Mobile phones are well integrated into people's daily life. Due to the large amount of time spent with them, users expect a good experience for their daily tasks. The mobile operating system's scheduler is in charge of distributing CPU computational power among these tasks. However, it currently does not take into account the dynamic frequencies of CPU cores at runtime. This unawareness of the scheduler of the CPU frequency increases the unresponsiveness of the user interface to user interactions, and consequently degrades the user experience on mobile devices. In this paper, we propose an extension of the process scheduler which takes into account the dynamic CPU frequency when scheduling tasks. Our method increases the smoothness of the user interface in response to user interactions by lowering and stabilizing interface frame times. Experimental results show that our proposed scheduler reduces the number of frame time peaks by up to 40%, which helps greatly in improving user experience on mobile devices.
CCS Concepts
• Software and its engineering → Scheduling;
• Human-centered computing → Smartphones;
Keywords
Process Scheduler, CPU Frequency, User Experience, Mobile System, Operating System
∗Vietnam Academy of Science and Technology, 18, Hoang
Quoc Viet, Cau Giay, Hanoi, Vietnam
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
SoICT '16, December 08-09, 2016, Ho Chi Minh City, Viet Nam
© 2016 ACM.
DOI: http://dx.doi.org/10.1145/3011077.3011106
1. INTRODUCTION
Nowadays, advances in computing infrastructure and technology have made mobile phones a crucial part of our daily life. Almost everyone has their own mobile phone, used for daily activities such as organizing events with a calendar, browsing the web, sending and receiving emails, entertainment, etc.
In order to meet this enormous demand of the mobile market, manufacturers strive to produce mobile devices with capabilities as high as possible. For example, it is not uncommon nowadays to have mobile phones with 8 cores and a low power consumption of 0.3W in a System-on-Chip model [1]. This effort of manufacturers can be considered as a marketing strategy to improve user satisfaction when using mobile phones.
In their work, Ji et al., 2006 [2] show that user satisfaction on mobile devices depends not only on the technological capabilities of the phones, but also on the responsiveness of the mobile user interface to user interactions. Unfortunately, users often tend to make excessive use of their mobile devices, performing many tasks at the same time. For example, one may simultaneously check email, send messages to friends, download data from the internet, listen to music, and read news. These concurrent actions commonly result in high background load and unresponsiveness of the user interface to user interactions, and consequently reduce the user experience on mobile devices. Responsiveness is one of many non-functional requirements that affect the success of any mobile application [3].
One direction to overcome the problem of unresponsiveness of the user interface to user interactions on mobile devices is improving CPU allocation so that the CPU can process the mobile tasks required by users more efficiently. Following this direction, studies focus on a mechanism of the operating system kernel called the process scheduler [4]. In detail, the process scheduler is a component of the operating system kernel which shares the CPU resources among running tasks according to their types (classes), priorities and CPU usages. The main job of the process scheduler is to decide which tasks to execute and how long each task will be executed. The output decision of the process scheduler is one of the crucial criteria which affects CPU computational power as well as the overall performance of the mobile system [5].
Another important criterion which affects user experience on mobile phones is their energy consumption [6]. If mobile phones quickly deplete their battery power, users will easily get annoyed and consequently have a negative user experience. Due to this important role of energy consumption, operating systems running on mobile devices need to minimize power consumption. To reach this goal, one popular method is to dynamically adjust the CPU frequency to the demanded workload. Following this approach, a CPU governor [4] in the operating system kernel is responsible for this task: it increases the CPU frequency when the required workload is high to meet this demand, and vice versa.
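As a rough illustration of this ramp-up/ramp-down principle, the following sketch raises the frequency under high load and lowers it under low load. It is not the actual Linux governor logic; the thresholds, the frequency step and the function name are assumptions made only for illustration.

#include <stdio.h>

/* Illustrative governor principle: ramp the frequency up when the observed
 * load is high, back off when the load is low (NOT real kernel code). */
static unsigned long pick_frequency(unsigned long cur_khz,
                                    unsigned long min_khz,
                                    unsigned long max_khz,
                                    unsigned int load_percent)
{
    if (load_percent > 80)
        return max_khz;                        /* high demand: ramp up      */
    if (load_percent < 20 && cur_khz > min_khz)
        return (cur_khz + min_khz) / 2;        /* low demand: ramp down     */
    return cur_khz;                            /* otherwise: keep the speed */
}

int main(void)
{
    unsigned long f = 1134000;                 /* current frequency, in kHz */
    unsigned int loads[] = { 90, 15, 10, 85 }; /* sampled CPU loads, in %   */

    for (int i = 0; i < 4; i++) {
        f = pick_frequency(f, 384000, 1512000, loads[i]);
        printf("load=%u%% -> %lu kHz\n", loads[i], f);
    }
    return 0;
}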
In this research context, we follow the direction of improving CPU allocation so as to enhance the user experience on mobile phones. By analyzing the process scheduler, we observe that it currently does not take into account the CPU frequency as a criterion to schedule running tasks. As a consequence, this increases the unresponsiveness of the user interface to user interactions, since the same amount of rendering work (determined by the scheduler) has to be done in a longer duration (as the CPU frequency is controlled by the governor). Realizing this problem of the process scheduler, in this paper we propose an extension for Linux's default scheduler (named the Completely Fair Scheduler, or CFS). Our extended scheduler takes the CPU frequency into account in the scheduling decision when selecting the running tasks. We will show that our proposal helps in improving the smoothness of the user interface in response to user interactions on mobile systems in comparison with Linux's default CFS scheduler.
The remainder of this paper is organized as follows. Section 2 briefly reviews related works about CPU allocation. In Section 3, we present the concept of the CFS scheduler and point out its current limitation. Section 4 is devoted to introducing the principle and algorithm of our proposed frequency-aware scheduler. In Section 5, we describe our experiments and an analysis of our results. The paper ends with Section 6, which includes a general conclusion and possible future works.
2. RELATED WORK
There exist various works in the literature to improve the efficiency of CPU allocation. Yang et al., 2001 [7] proposed a divide-and-conquer algorithm for improving runtime flexibility and reducing computational complexity. The algorithm is divided into two scheduling phases: design-time scheduling and runtime scheduling. Besides, the algorithm proves that energy is an important criterion in scheduling embedded multiprocessor System-on-Chips.
Another work about an energy-aware scheduler was done by Rizvandi et al., 2010 [8]. In detail, the authors proposed a slack reclamation algorithm in the scheduler using a linear combination of the processor's maximum and minimum frequencies. The method helps in saving energy while still providing enough computational power for applications. Similarly, Mostafa et al., 2016 [9] proposed an energy-saving scheduler for high performance computing systems. The authors use a relocation of thread weights for each active process so as to decrease the number of context switches. Although these methods [8, 9] reduce the energy consumption of the scheduler, they currently target desktop or server systems with heavy workloads rather than mobile systems with much fewer active threads per process.
Another noticeable work, namely GRACE-OS, was proposed by Yuan et al., 2003 [5] in order to reduce CPU energy consumption on mobile devices using soft real-time scheduling. In detail, the method enhances the CPU scheduler by performing scheduling and speed scaling at the same time. Although applicable on mobile devices, this approach mainly targets multimedia applications, which require statistical performance guarantees (for example, 96% of deadlines are met [5]), and has not yet taken into account user interaction latency, which is one important aspect for ensuring user experience in mobile applications.
Concerning the limitation of the kernel scheduler not taking CPU frequency into account, operating system researchers have raised the research question of developing a new Linux kernel connecting the Linux scheduler and governor [10]. Some researchers discussed that it is possible to merge these two components into a single entity [11], proposing an optimization in using CPU power for scheduling tasks. Valente et al. [12] propose a discussion of future research works about improving the responsiveness of the user interface to user interactions. This is of importance for mobile devices, since responsiveness of the user interface and power consumption savings are two major criteria for ensuring mobile user experience.
In this work, we propose an extension for the Linux kernel scheduler (CFS) which takes into account the current frequency of CPU cores when making scheduling decisions for mobile systems. Our work focuses on the requirement of low latency for user interaction. Unlike the aforementioned works, we focus on improving the user experience when interacting with mobile devices rather than on saving energy. By lowering and stabilizing interface frame times, our work helps in increasing the responsiveness of the user interface to user interactions on mobile devices.
3. THE CFS SCHEDULER AND ITS LIMITATION
In this section, we present the internal concepts and algorithms of CFS, the standard scheduler of the Linux kernel. We then show a scenario in which CFS exhibits inefficiency in CPU allocation when the CPU frequency is not taken into account.
CFS is the Linux kernel scheduler which uses time slice estimation for selecting running tasks [13]. CFS was developed based on the Earliest Eligible Virtual Deadline First (EEVDF) scheduler [14]. To achieve high responsiveness for all tasks, CFS tries to divide a certain amount of time (called the period, usually a small value with a minimum of 20ms) among all runnable tasks. The time slice for task T_i is given by the following equation:

S_i = (ω_i / Ω_r) × P,    (1)

where
• S_i is the time slice length for the task T_i at the current decision time;
• ω_i is the calculated weight for T_i;
• Ω_r is the total weight of the whole run queue of the current CPU (each weight represents a given process's priority); and
• P is the target period in which the scheduler tries to execute all tasks.

Figure 1: Unresponsiveness of user interface (UI) to user interactions in the CFS scheduler.
When the number of tasks in the run queue increases, P is lengthened to reduce the performance overhead caused by too many context switches in a short amount of time.
CFS uses an important term called vruntime (virtual runtime) to track the performance and scheduling status of each active thread over its whole lifecycle. The virtual runtime υ_i of task T_i is updated after each calculated time slice:

υ_i = υ_i + t_i × N_0 / ω_i,    (2)

where t_i is the execution time of task T_i in the last execution period and N_0 is a constant (N_0 = 1024) corresponding to the weight of a task with the default nice value. Nice is a parameter of each task representing its priority. These vruntime values and other scheduling information of all tasks are stored in CFS using a self-balancing binary tree named the "Red-Black tree". CFS tries to put the task T_i with the lowest υ_i at the left-most node of the tree, so that it can be retrieved instantly in the next scheduling period.
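To make the bookkeeping of equations (1) and (2) concrete, the following user-space sketch mimics the two formulas. It is an illustration only: the type name task_t, the constant NICE_0_WEIGHT and the example weights are our assumptions, not the actual kernel implementation.

#include <stdio.h>

#define NICE_0_WEIGHT 1024   /* N_0 in equation (2) */
#define MIN_PERIOD_MS 20.0   /* minimum scheduling period P used by CFS */

/* Illustrative task descriptor: weight derived from priority, and vruntime. */
typedef struct {
    double weight;    /* omega_i */
    double vruntime;  /* upsilon_i */
} task_t;

/* Equation (1): S_i = (omega_i / Omega_r) * P */
static double time_slice(const task_t *t, double total_weight, double period_ms)
{
    return t->weight / total_weight * period_ms;
}

/* Equation (2): upsilon_i += t_i * N_0 / omega_i, with t_i the time actually run */
static void update_vruntime(task_t *t, double ran_ms)
{
    t->vruntime += ran_ms * NICE_0_WEIGHT / t->weight;
}

int main(void)
{
    task_t ui = { .weight = 2048, .vruntime = 0 };   /* heavier task, e.g. UI  */
    task_t bg = { .weight = 1024, .vruntime = 0 };   /* background task        */
    double total = ui.weight + bg.weight;

    double s_ui = time_slice(&ui, total, MIN_PERIOD_MS);
    double s_bg = time_slice(&bg, total, MIN_PERIOD_MS);
    update_vruntime(&ui, s_ui);
    update_vruntime(&bg, s_bg);

    /* The heavier task gets a longer slice but its vruntime advances more
     * slowly, which is how CFS keeps the schedule fair. */
    printf("UI: slice=%.2fms vruntime=%.2f | BG: slice=%.2fms vruntime=%.2f\n",
           s_ui, ui.vruntime, s_bg, bg.vruntime);
    return 0;
}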
By looking into the internal CFS scheduling algorithm, we can see that the calculations of the time slice S_i and of υ_i in equations (1) and (2) do not take into account the frequency f_j of the target CPU core c_j in a multi-core or multi-processor system. When a running task is migrated at runtime from one core c_j to another core c_k with a different frequency (f_j ≠ f_k), or when the governor reduces the core frequency, CPU power may be greatly lost and consequently the system produces a very bad user experience. One example of this consequence is the case when the migrated task is responsible for rendering the user interface (UI): the system becomes very laggy when responding to user interaction on mobile devices.
In order to demonstrate the current limitation of CFS, we present a scenario where the user interface is unresponsive to user interaction under the CFS scheduler. The visualization of this scenario is illustrated in Figure 1. Grey bars represent the CPU load needed and the red solid line represents the corresponding user interface frame time at the same moment. The frame time is the duration which separates one fully rendered user interface frame from the next. A horizontal dotted line indicates the 16.6ms limit for each frame time, equivalent to the ability of rendering 60 frames per second (fps). If the frame time is below this limit, the user's eyes cannot perceive real differences between two consecutive frames [15]. As such, animations shown on the user interface appear smooth and fluid. Indeed, Claypool et al., 2006 [16] showed that users' perception performance improved sevenfold when increasing the frame rate from 3 to 60 fps.
Figure 1 indicates two situations in which interface frame times exceed the optimal 16.6ms limit: overload and underload. At the overload times a and b, with high CPU load, the UI thread is not provided enough CPU power to maintain the drawing process below 16.6ms. The system load reduces at times c and d, leaving more CPU to the rendering thread. This load reduction results in a lower interface frame time (in other words, it increases the interface frame rate). As the system load continues to decrease (to an underload point), the governor decides that the CPU frequency should be reduced to lower power consumption (between times d and e). The lower CPU frequency also leads to a reduction of the CPU power provided to the UI rendering thread. As a result, the UI thread struggles to maintain a good frame rate for the user interface, since an optimal user experience requires at least 60fps, or 16.6ms per frame.
Furthermore, the governor works with a larger interval than the scheduler. Not until time k does the governor notice that a high CPU load is present and bump the CPU frequency up. This results in a drop of the interface frame time (from time k to time m), bringing it back under the 16.6ms limit. As a result, the user interface is unresponsive during the period between times e and k. This is caused by the unawareness of the scheduler of the lowered CPU frequency. If the scheduler had been aware of this change, it would have re-prioritized the UI thread by increasing its time slice length and reducing the time slice lengths of the other running background threads. By reconsidering the time slices of all threads, the scheduler can potentially provide more CPU power and ensure its fairness under frequency changes.
4. FREQUENCY-AWARE SCHEDULER
In this section, we propose a frequency-aware scheduler (hereinafter called FA-CFS) as an extension of the CFS scheduler. The main idea of our proposed scheduler is to adjust the task weight ω_i and the time slice S_i of a target task T_i according to frequency changes.
We propose a scheduler that balances the workload against differences in frequencies. In detail, we model a workload with its parameters on a multi-core CPU whose dynamic frequency is managed by the governor and whose thread scheduling is managed by the scheduler. This workload is executed in a multi-tasking, time-sharing and preemptive operating system.
Let W be a workload that is performed in a single thread and can be considered as the number of CPU cycles required to perform a task. A workload is measured as a multiplication of speed and time. In the simplest case, if this workload is scheduled on a single core CPU with a constant frequency f (approximately proportional to the number of instructions per second), we have:

W = f × T,    (3)

where T is the total time (in seconds) of execution.
Generally, the scheduler spends a little CPU time (ζ_i) on accounting and selecting the next scheduled thread after each sampling time τ_i [4]. This time can be considered as the performance overhead of the process scheduler. Therefore, T in equation (3) becomes:

T = Σ_{i=1}^{n} (τ_i + ζ_i),    (4)

where n is the total number of sampling times during the whole execution duration. When taking ζ_i into account, our global workload in equation (3) becomes:

W = f × Σ_{i=1}^{n} (τ_i + ζ_i).    (5)
As previously discussed, since the CPU frequency is managed by the governor (in order to minimize power consumption), it fluctuates at runtime based on the total workload of the whole system. As a result, the CPU frequency f is not a constant:

W = Σ_{i=1}^{n} f_i × (τ_i + ζ_i),    (6)

where f_i is the CPU frequency at sampling time τ_i. Since we have a multi-core processor, equation (6) becomes:

W = Σ_{i=1}^{n} f_i^j × (τ_i + ζ_i),    (7)

where f_i^j is the frequency of CPU core c_j at sampling time τ_i.
Consider that our global workload W is split into n micro-workloads ω_i performed during the n sampling times: W = Σ_{i=1}^{n} ω_i. Each micro-workload at sampling time τ_i is therefore calculated as:

ω_i = f_i^j × (τ_i + ζ_i).    (8)
On the other hand, the governor's sampling time is usually configured as a multiple of the CFS scheduler's time slice in the Linux kernel: τ_i = π × S_i. In other words, CFS works with a smaller (and finer) time granularity than the governor. Thus, we have:

ω_i = f_i^j × (π × S_i + ζ_i).    (9)

Due to the large difference between the governor's sampling time and the scheduler's time slice, when the running thread of the workload W is migrated from one core c_j to another c_k with frequencies f_i^j ≥ f_i^k, the performance penalty δ_i (in terms of work) over a single sampling time slot τ_i for a lowered CPU speed can be estimated as:

δ_i = ω_i − ω'_i ≤ (f_i^j − f_i^k) × π × S_i + f_i^j × ζ_i − f_i^k × ζ'_i.    (10)

On the other hand, CFS has a scheduling complexity of O(log N), where N is the number of active tasks [17]. N is often unchanged unless a new thread or process is created. Therefore, the amount of work (frequency × time) spent on accounting and scheduling is generally constant between sampling intervals; in other words, f_i^j × ζ_i = f_i^k × ζ'_i. Inequality (10) can then be simplified as:

δ_i ≤ (f_i^j − f_i^k) × π × S_i.    (11)
During the workload duration, with m migrations or frequency changes, the total performance penalty (in terms of amount of work) of inequality (11) becomes:

Δ = Σ_{p=1}^{m} δ_p ≤ Σ_{p=1}^{m} (f_p^j − f_p^k) × π × S_p.    (12)

After defining the total performance penalty due to frequency changes in equation (12), we can state the main objective of our improvement in FA-CFS as:

Minimize Σ_{p=1}^{m} (f_p^j − f_p^k) × π × S_p.    (13)
To reach this goal, in each single time slice S_i, it is possible to counteract the change of frequency (i.e., to minimize the performance penalty) by providing more CPU computational power to this particular workload. The extra CPU power can be allocated to this task on core c_k by increasing the time slice length to S'_i (the previously allocated time slice being S_i on core c_j).
When applying this counterbalance, the performance penalty δ_i of inequality (11) becomes:

δ'_i ≤ f_i^j × π × S_i − f_i^k × π × S'_i.    (14)

In an ideal situation, this performance penalty is completely counterbalanced (δ'_i ≤ 0); from inequality (14), it suffices that:

f_i^k × π × S'_i − f_i^j × π × S_i ≥ 0.    (15)

Thus, we can proportionally resize the time slice:

S'_i ≥ (f_i^j / f_i^k) × S_i.    (16)
As in equations (1) and (2) of Section 3.1, the time slice estimation of CFS is also proportional to the task weight and the run queue weight:

(ω'_i / Ω_r) × P ≥ (f_i^j / f_i^k) × (ω_i / Ω_r) × P.    (17)

As a result, our scheduler can counteract frequency changes by proportionally redistributing the weights as follows:

ω'_i ≥ (f_i^j / f_i^k) × ω_i.    (18)
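For example, if a thread is moved from a core at f_i^j = 1.5 GHz to one at f_i^k = 1.0 GHz, inequality (18) requires its weight to grow by a factor of at least 1.5. A minimal sketch of this rescaling step is shown below; the helper name fa_cfs_scale_weight and the integer rounding are our illustrative assumptions, not the kernel's actual CFS symbols.

#include <stdio.h>

/* Sketch of the FA-CFS weight adjustment of inequality (18). The weight is
 * scaled up when the task's core frequency drops from f_old (f_i^j) to
 * f_new (f_i^k), so that the resulting time slice compensates the lost cycles. */
static unsigned long fa_cfs_scale_weight(unsigned long weight,
                                         unsigned long f_old_khz,
                                         unsigned long f_new_khz)
{
    if (f_new_khz == 0 || f_new_khz >= f_old_khz)
        return weight;                /* no slowdown: keep the plain CFS weight */

    /* omega'_i >= (f_i^j / f_i^k) * omega_i, rounded up in integer arithmetic */
    return (weight * f_old_khz + f_new_khz - 1) / f_new_khz;
}

int main(void)
{
    /* Thread of weight 1024 migrated from a 1512 MHz core to a 1026 MHz core. */
    printf("scaled weight = %lu\n", fa_cfs_scale_weight(1024, 1512000, 1026000));
    return 0;
}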
We implement our proposed frequency-aware scheduler in a Linux environment, where our model acts as a frequency-aware extension to the CFS scheduler. We use the CPUFreq interface to query the governor for the current CPU frequencies [18]. Having extracted the frequencies, we implement our proposed algorithm to balance the time slices in Linux's CFS. We use CPUFreq's userspace sysfs interface in order to gather statistical information.
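For reference, the sketch below reads the per-core frequency through the standard CPUFreq sysfs files from user space. It only illustrates the interface we rely on for gathering statistics, not our kernel-side implementation, and it assumes a quad-core device.

#include <stdio.h>

/* Read the current frequency (in kHz) of CPU core `cpu` from the CPUFreq
 * sysfs interface; returns 0 on failure. */
static unsigned long read_cur_freq_khz(int cpu)
{
    char path[128];
    unsigned long khz = 0;
    FILE *f;

    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_cur_freq", cpu);
    f = fopen(path, "r");
    if (!f)
        return 0;
    if (fscanf(f, "%lu", &khz) != 1)
        khz = 0;
    fclose(f);
    return khz;
}

int main(void)
{
    for (int cpu = 0; cpu < 4; cpu++)   /* assume a quad-core device */
        printf("cpu%d: %lu kHz\n", cpu, read_cur_freq_khz(cpu));
    return 0;
}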
5. EXPERIMENTS AND RESULTS
The goal of this section is to present the improvement in the responsiveness of the mobile user interface to user interaction provided by our FA-CFS scheduler in comparison with the CFS scheduler. We first present the settings of our experiments, and then provide an analysis of our experimental results.

Interface Frame Time Measurement:
In order to evaluate our proposed FA-CFS scheduler, we use the interface frame time as the main metric to measure the improvement in responsiveness of the mobile user interface to user interactions. We chose to measure the interface frame time since it plays an important role in ensuring user experience. A fully rendered frame passes through a set of steps in the Android rendering pipeline: execute the issued layout commands, process the swapping of buffers, prepare the texture and finally draw the content to the screen.
Evaluation Scenario:
In our experiment, we implement a popular scenario where users browse an online news website using smartphones and tablets on which CFS and FA-CFS are installed. Since we want to compare the efficiency of our FA-CFS with CFS, we divide our scenario into two main steps: in the first step, users were asked to browse the online news website (http://bbc.com in our experiments) on devices running CFS; in the second step, users were asked to browse the same online news website with FA-CFS installed. In both steps, we recorded the interface frame times created by user interactions during their browsing sessions.
A browsing session in our experiment includes: (1) the user starts the stock browser, (2) he types the URL http://bbc.com, (3) he waits for the page to load, and finally (4) he scrolls up and down as soon as one or more parts of the page content appear. In this scenario, there are three different types of workload, created by the UI thread, the background network threads (to fetch data from the remote server) and the browser engine (in charge of parsing HTML and processing JavaScript with Chromium's V8 JavaScript engine). In order to avoid preloaded images, we clear the browser cache before starting each experiment session.
We involved a total of 5 users in our experiment. We asked each user to perform 16 browsing sessions (8 on each of the two Android devices, described later). On each device, users performed 4 sessions with the CFS scheduler and 4 sessions with our FA-CFS scheduler. With each scheduler, 4 governors with different characteristics were used in order to manage rising and declining system load with frequency ramp up and ramp down. The governors included in our experiments are interactive (default, fastest ramp up with intermediate frequencies, best latency), conservative (slow ramp up), ondemand (fast ramp up, fast ramp down, almost between minimum and maximum frequencies), and performance (keeps the highest frequency, wastes energy) [18].

Technical Choices:
On the hardware side, our experiments are performed on two categories of Android devices: one LG Nexus 4 and one Asus Nexus 7 Wifi (2012), representing a phone and a tablet, respectively. The LG Nexus 4 has a better hardware configuration (doubled RAM and 30% higher CPU core frequency) than the Nexus 7 Wifi 2012.
On the software side, we build from source an aftermarket open-source operating system called CyanogenMod, based on the Android Open Source Project (AOSP). We use the latest version of CyanogenMod with its supported Linux kernel to implement our model. We decided to build CyanogenMod from source because of the ability to customize the Linux kernel and flash (or install) the kernel along with the whole operating system onto our devices.
We use an Android developer option called "Profile GPU rendering" to monitor and gather interface frame times during the experiments. We then use Android's integrated "dumpsys" tool on the mobile devices to collect, through a USB cable, various statistical information, including the monitored interface frame times.
Interface Frame Time Peaks:
Figure 2 shows a set of captured frame times from one user session on the LG Nexus 4 with CFS and the interactive governor. It can be seen from this figure that the frame times during this session are not stable, but are generally smaller than the optimal 16.6ms. In the first part of this session (frames 0 - 100), frame times were relatively high because the web browser needs to perform 3 tasks at the same time: fetching web content, parsing partial HTML contents as they arrive, and rendering them on the screen. The rendering thread is not provided with enough computational power because the background threads are overloading the CPU, thus the UI thread struggles to maintain an optimal frame time. From frame 125 onwards, the page fetching and HTML parsing tasks are finished, yet very high frame times still exist, some exceeding 40ms.
These peaks (or spikes) cause "micro stuttering", a term used to indicate irregular delays between frames being rendered [19]. Micro stuttering decreases the user experience, even though the average frame rate is high enough. These high frame time peaks can be explained as a consequence of CPU core frequency changes, from which the UI thread suffers, similar to the scenario that we previously discussed in Section 3 and Figure 1.
Figure 2: Interface frame time peaks on the LG Nexus 4 with CFS scheduler and Interactive governor (per-frame times in ms, broken down into Draw, Prepare, Process and Execute stages, against the 16.6ms / 60fps limit; the choppy frames with frame time peaks correspond to an unresponsive user interface).
Table 1: Average frame time percentile (ms) of CFS vs FA-CFS with 4 governors on Nexus 7 Wifi. Columns: percentile, followed by CFS, FA-CFS and relative difference for the interactive, ondemand, conservative and performance governors, respectively.
90 17.18 14.27 -16.9% 17.94 15.02 -16.3% 22.26 21.19 -4.8% 11.93 11.81 -1.0%
91 17.33 14.46 -16.6% 18.11 15.24 -15.8% 22.64 21.55 -4.8% 12.36 12.21 -1.2%
92 17.76 14.68 -17.3% 18.45 15.45 -16.3% 22.83 22.77 -0.3% 12.75 12.6 -1.2%
93 18.22 14.93 -18.1% 18.72 15.65 -16.4% 23.34 23.19 -0.6% 13.29 13.42 1.0%
94 18.89 15.45 -18.2% 19.96 15.88 -20.4% 23.87 23.56 -1.3% 13.98 14.05 0.5%
95 20.44 15.98 -21.8% 22.89 16.25 -29.0% 27.6 28.02 1.5% 14.73 14.61 -0.8%
96 23.13 16.08 -30.5% 23.49 16.73 -28.8% 30.03 31.62 5.3% 15.62 15.88 1.7%
97 27.25 17.21 -36.8% 29.83 21.29 -28.6% 34.53 33.24 -3.7% 16.79 16.47 -1.9%
98 31.29 23.16 -26.0% 32.83 25.57 -22.1% 45.68 43.92 -3.9% 19.03 19.49 2.4%
99 38.94 29.12 -25.2% 41.77 35.1 -16.0% 60.34 62.17 3.0% 23.68 25.23 6.5%
100 48.58 37.3 -23.2% 55.12 44.05 -20.1% 83.1 77.38 -6.9% 31.21 30.85 -1.2%
Table 2: Average frame time percentile (ms) of CFS vs FA-CFS with 4 governors on Nexus 4. Columns: percentile, followed by CFS, FA-CFS and relative difference for the interactive, ondemand, conservative and performance governors, respectively.
90 14.54 14.03 -3.5% 15.41 13.14 -14.7% 20.75 21.25 2.4% 11.58 12.32 6.4%
91 14.78 14.18 -4.1% 15.62 13.45 -13.9% 21.07 22.32 5.9% 11.86 12.53 5.6%
92 14.82 14.25 -3.8% 16.03 13.87 -13.5% 22.37 22.81 2.0% 12.15 12.85 5.8%
93 15.48 14.81 -4.3% 16.19 14.32 -11.6% 23.86 23.99 0.5% 12.51 13.28 6.2%
94 15.63 15.27 -2.3% 16.49 14.67 -11.0% 25.91 26.46 2.1% 12.99 13.75 5.9%
95 16.03 15.42 -3.8% 16.91 15.29 -9.6% 27.77 27.29 -1.7% 13.34 14.06 5.4%
96 16.49 15.86 -3.8% 17.5 15.83 -9.5% 31.46 31.86 1.3% 14.02 14.5 3.4%
97 17.58 16.56 -5.8% 18.04 16.75 -7.2% 36.17 35.82 -1.0% 15.16 16.42 8.3%
98 18.24 17.14 -6.0% 19.27 19.41 0.7% 40.34 41.19 2.1% 16.42 17.31 5.4%
99 24.52 20.36 -17.0% 23.83 22.01 -7.6% 50.48 52.16 3.3% 19.32 19.38 0.3%
100 35.36 30.31 -14.3% 36.65 33.41 -8.8% 73.55 71.81 -2.4% 24.98 25.21 0.9%
Trang 7differences, similar to the scenario that we previously
dis-cussed in section 3, figure 1
Frame Time Percentile:
In order to analyze the effectiveness of our FA-CFS scheduler, we use the statistical metric of frame time percentile. The metric is described as follows: an x-th frame time percentile at y milliseconds means that during the experiment, x% of all frame times are less than y milliseconds. The frame time percentile represents the stability of frame times and thus the "quality" of the user experience during interactions. In this part of the evaluation, we focus on analyzing the average frame time percentiles over all user sessions.
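As an illustration of how this metric can be computed from a list of recorded frame times, the following is a straightforward nearest-rank sketch under our own assumptions; the paper does not prescribe a particular computation rule.

#include <stdio.h>
#include <stdlib.h>

/* Comparison helper for qsort over double values. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Nearest-rank percentile: the value y such that roughly pct% of the n frame
 * times lie below y. `times` is sorted in place. */
static double frame_time_percentile(double *times, size_t n, double pct)
{
    qsort(times, n, sizeof(double), cmp_double);
    size_t rank = (size_t)(pct / 100.0 * (double)n + 0.5);
    if (rank < 1)
        rank = 1;
    if (rank > n)
        rank = n;
    return times[rank - 1];
}

int main(void)
{
    double frames[] = { 8.1, 9.3, 7.8, 15.9, 33.2, 12.4, 10.0, 8.8, 41.5, 9.9 };
    printf("90th percentile: %.1f ms\n", frame_time_percentile(frames, 10, 90.0));
    return 0;
}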
Tables 1 and 2 show the average frame time percentiles of all user sessions on both devices, the LG Nexus 4 and the Asus Nexus 7 Wifi 2012, with 4 different governors. It is expected that the frame time percentiles of the Nexus 7 are larger than those of the Nexus 4, because the Nexus 7 has a lower hardware configuration yet a higher screen resolution. It is worth recalling that interactive is the default governor on most mobile phones.
With the two highly dynamic governors, interactive and ondemand, these tables show the general observation that FA-CFS achieves a better frame time reduction on the Nexus 7 than on the Nexus 4. The Nexus 7 benefits greatly from our time slice optimization, with an average of 21.8% and 29% frame time reduction (with interactive and ondemand, respectively) for 95% of the total rendered frames. While showing less improvement regarding frame time percentiles, FA-CFS on the Nexus 4 still achieves 3.8% and 9.6% enhancements. These differences between the Nexus 7 and Nexus 4 can be interpreted as a difference in hardware configuration (30% faster CPU and 4% fewer screen pixels on the Nexus 4 than on the Nexus 7).
Not only does our frequency-aware FA-CFS scheduler reduce average frame times, but it also provides better frame time stabilization than the traditional CFS: the 97th, 98th and 99th frame time percentiles show big improvements on both devices. In particular, with a better 99th percentile (25.2% and 16% reduction for interactive and ondemand on the Nexus 7), the user has a smoother and more responsive interface and experiences fewer micro-stuttering frames during their interactions.
Furthermore, it can be seen from Table 1 that FA-CFS achieves a considerably lower average frame time than CFS with interactive. The lowest gain (from the lowest level, the 90th, to the 95th percentile) is 16.9%. The difference starts increasing at the 96th percentile (30%), reaches its peak at the 97th (36.8%) and still keeps a wide margin until the 100th (maximum frame time). Additionally, we can see an improvement in terms of frame time stabilization of FA-CFS through its ability to keep 97% of the frames under the 16.6ms limit (instead of 90% with the mainline CFS) on the Asus Nexus 7 Wifi.
On the other hand, the right halves of these tables exhibit smaller improvements for both devices with the conservative and performance governors. These are less dynamic governors than the previously discussed interactive and ondemand counterparts. We observed that with the completely static performance governor, our FA-CFS barely achieved improvements throughout all user sessions. This can be explained by the fact that performance always provides the maximum CPU computational power to all possible threads without frequency changes (f_i^j / f_i^k = 1).
Figure 3: Frame time distribution of CFS on Motorola Moto X 2nd edition (frame count, log scale, vs. frame time in ms).

Figure 4: Frame time distribution of FA-CFS on Motorola Moto X 2nd edition (frame count, log scale, vs. frame time in ms).
Its conservative sibling achieves as little improvement as 6.9% (on the Nexus 7) and 2.4% (on the Nexus 4) at the 100th percentile. General frame times did not earn much reduction because this governor tries to minimize the CPU frequency as much as possible, without many frequency changes.
The analyses of Tables 1 and 2 above show that our FA-CFS enhances frame time stabilization, increases the average frame rate and reduces frame time peaks (or spikes) with the widely used governors (interactive and ondemand). Thanks to this, our FA-CFS scheduler proves its efficiency in improving the user experience while interacting with mobile devices.

Frame Time Distribution:
In order to further analyze the effectiveness of our FA-CFS scheduler, we use another statistical metric, the frame time distribution. For this, we set up an additional user session on a higher-end mobile phone, a Motorola Moto X (2nd edition) with a quad-core CPU, each core running at 2.5GHz. During the user interactions, we gathered 1189 frame times (in approximately 19 seconds of browsing the BBC homepage) with CFS and FA-CFS under the interactive governor, and represent them as histograms in Figures 3 and 4, respectively.
It can be seen from Figure 3 that even on a high-end phone, CFS causes micro stuttering, with frames longer than 16.6ms. Some frames take even more than 32ms (double the 16.6ms limit). These peaks cause choppiness during web content scrolling in the browser. Applying our FA-CFS to CyanogenMod greatly reduces these peaks (Figure 4). The maximum frame time for FA-CFS is 24ms, compared to 37ms for CFS. Out of a total of 1189 frames, FA-CFS produced only 14 frames longer than the 16.6ms limit; in contrast, this number for the CFS counterpart is 24. From these results, our FA-CFS achieves a reduction of 40% in frame time peaks (from 24 frames down to 14 frames). Additionally, frame times are better packed around the mean 8ms range. These two figures clearly illustrate the benefit of our FA-CFS in improving user experience, even on a high-end mobile device.
6. CONCLUSION AND FUTURE WORK
This paper proposed a new frequency-aware process scheduler for improving user experience on multi-core mobile systems. We built a model which acts as an extension of the Linux default scheduler (the Completely Fair Scheduler - CFS) to take into account the dynamic CPU frequency when scheduling tasks. Our model helps in increasing the responsiveness of the mobile user interface to user interactions by lowering and stabilizing interface frame times. The experiments showed that our proposed FA-CFS scheduler reduces the number of frame time peaks by up to 40%, which brings great benefits to multi-core mobile systems, where user experience relies largely on the responsiveness of the user interface.
Several research directions can be considered to continue this work. First and foremost, since our work helps in improving user experience on mobile systems, it is worth investigating our model on various workloads to see if it can bring benefits to larger multi-core and multi-CPU platforms, i.e., desktops and virtualized servers [20]. Secondly, combining our frequency-aware scheduler with a performance-oriented scheduler (e.g., the BFS scheduler) is also an interesting research direction. Our FA-CFS scheduler can take the BFS scheduler's advancements into account to improve UI responsiveness and save CPU power. Last but not least, we wonder if our frequency-aware improvement can be applied to the Red-Black tree by restructuring it based on core frequencies at runtime.
REFERENCES
[1] C. Van Berkel. Multi-core for mobile phones. In Proceedings of the Conference on Design, Automation and Test in Europe, pages 1260–1265. European Design and Automation Association, 2009.
[2] Y. G. Ji, J. H. Park, C. Lee, and M. H. Yun. A usability checklist for the usability evaluation of mobile phone user interface. International Journal of Human-Computer Interaction, 20(3):207–231, 2006.
[3] A. I. Wasserman. Software engineering issues for mobile application development. In Proceedings of the FSE/SDP Workshop on Future of Software Engineering Research, pages 397–400. ACM, 2010.
[4] A. Silberschatz, P. Galvin, and G. Gagne. Applied Operating System Concepts. John Wiley & Sons, Inc., 2001.
[5] W. Yuan and K. Nahrstedt. Energy-efficient soft real-time CPU scheduling for mobile multimedia systems. In ACM SIGOPS Operating Systems Review, volume 37, pages 149–163. ACM, 2003.
[6] D. Ferreira, A. K. Dey, and V. Kostakos. Understanding human-smartphone concerns: a study of battery life. In Pervasive Computing, pages 19–33. Springer, 2011.
[7] P. Yang, C. Wong, P. Marchal, F. Catthoor, D. Desmet, D. Verkest, and R. Lauwereins. Energy-aware runtime scheduling for embedded-multiprocessor SOCs. IEEE Design & Test of Computers, (5):46–58, 2001.
[8] N. B. Rizvandi, J. Taheri, A. Y. Zomaya, and Y. C. Lee. Linear combinations of DVFS-enabled processor frequencies to modify the energy-aware scheduling algorithms. In Cluster, Cloud and Grid Computing (CCGrid), 2010 10th IEEE/ACM International Conference on, pages 388–397. IEEE, 2010.
[9] S. M. Mostafa and S. Kusakabe. Towards reducing energy consumption using inter-process scheduling in preemptive multitasking OS. In 2016 International Conference on Platform Technology and Service (PlatCon), pages 1–6, Feb 2016.
[10] V. Pallipadi and S. B. Siddha. Processor power management features and process scheduler: Do we need to tie them together? LinuxConf Europe, pages 1–8, 2007.
[11] J. H. Schönherr, J. Richling, M. Werner, and G. Mühl. Event-driven processor power management. In Proceedings of the 1st International Conference on Energy-Efficient Computing and Networking, pages 61–70. ACM, 2010.
[12] P. Valente and M. Andreolini. Improving application responsiveness with the BFQ disk I/O scheduler. In Proceedings of the 5th Annual International Systems and Storage Conference, page 6. ACM, 2012.
[13] C. S. Pabla. Completely fair scheduler. Linux Journal, 2009(184):4, 2009.
[14] I. Stoica, H. Abdel-Wahab, K. Jeffay, S. K. Baruah, J. E. Gehrke, and C. G. Plaxton. A proportional share resource allocation algorithm for real-time, time-shared systems. In Real-Time Systems Symposium, 1996, 17th IEEE. IEEE, 1996.
[15] C. McAnlis, P. Lubbers, B. Jones, D. Tebbs, A. Manzur, S. Bennett, F. d'Erfurth, B. Garcia, S. Lin, I. Popelyshev, et al. Applying old-school video game techniques in modern web games. In HTML5 Game Development Insights. Springer, 2014.
[16] M. Claypool, K. Claypool, and F. Damaa. The effects of frame rate and resolution on users playing first person shooter games. In Electronic Imaging 2006, pages 607101–607101. International Society for Optics and Photonics, 2006.
[17] P. Pawar, S. Dhotre, and S. Patil. CFS for addressing CPU resources in multi-core processors with AA tree. International Journal of Computer Science and Information Technologies, 2014.
[18] V. Pallipadi and A. Starikovskiy. The ondemand governor. In Proceedings of the Linux Symposium, volume 2, pages 215–230, 2006.
[19] J.-M. Arnau, J.-M. Parcerisa, and P. Xekalakis. Parallel frame rendering: trading responsiveness for energy on a mobile GPU. In Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques. IEEE Press, 2013.
[20] G. Von Laszewski, L. Wang, A. J. Younge, and X. He. Power-aware scheduling of virtual machines in DVFS-enabled clusters. In Cluster Computing and Workshops, 2009, CLUSTER'09, IEEE International Conference on, pages 1–10. IEEE, 2009.