Extended Process Scheduler for Improving User Experience in Multi-core Mobile Systems
Giang Son Tran
ICTLab, University of Science and Technology of Hanoi, VAST∗
tran-giang.son@usth.edu.vn

Thi Phuong Nghiem
ICTLab, University of Science and Technology of Hanoi, VAST∗
nghiem-thi.phuong@usth.edu.vn

Tuong Vinh Ho
Institute Francophone International, Vietnam National University
IRD, UMI 209 UMMISCO
ho.tuong.vinh@ifi.edu.vn

Chi Mai Luong
Institute of Information Technology, VAST∗
lcmai@ioit.ac.vn

ABSTRACT
Mobile phones are well integrated into people's daily life. Due to the large amount of time spent with them, users expect a good experience for their daily tasks. The mobile operating system's scheduler is in charge of distributing CPU computational power among these tasks. However, it currently does not take into account the dynamic frequencies of CPU cores at runtime. This unawareness of the scheduler of the CPU frequency increases the unresponsiveness of the user interface to user interactions, and consequently degrades the user experience on mobile devices. In this paper, we propose an extension of the process scheduler which takes into account the dynamic CPU frequency when scheduling tasks. Our method increases the smoothness of the user interface in response to user interactions by lowering and stabilizing interface frame times. Experimental results show that our proposed scheduler reduces the number of frame time peaks by up to 40%, which helps greatly in improving user experience on mobile devices.
CCS Concepts
• Software and its engineering → Scheduling;
• Human-centered computing → Smartphones;
Keywords
Process Scheduler, CPU Frequency, User Experience, Mobile System, Operating System
∗Vietnam Academy of Science and Technology, 18, Hoang
Quoc Viet, Cau Giay, Hanoi, Vietnam
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
SoICT '16, December 08-09, 2016, Ho Chi Minh City, Viet Nam
© 2016 ACM.
DOI: http://dx.doi.org/10.1145/3011077.3011106
1. INTRODUCTION
Nowadays, advances in computing infrastructure and technology have made mobile phones a crucial part of our daily life. Almost everyone has their own mobile phone, used for daily activities such as organizing events with a calendar, browsing the web, sending and receiving emails, entertainment, etc.
In order to meet this enormous demand of the mobile market, manufacturers strive to produce mobile devices with capabilities as high as possible. For example, it is not uncommon nowadays to have mobile phones with 8 cores and a low power consumption of 0.3W in a System-on-Chip model [1]. This effort of manufacturers can be considered as a marketing strategy to improve user satisfaction when using mobile phones.
In their work, Ji et al., 2006 [2] show that user satisfaction on mobile devices depends not only on the technological capabilities of the phones, but also on the responsiveness of the mobile user interface to user interactions. Unfortunately, users often tend to make excessive use of their mobile devices, performing many tasks at the same time. For example, one may simultaneously check email, send messages to friends, download data from the internet, listen to music, and read news. These concurrent actions commonly result in high background load and unresponsiveness of the user interface to user interactions, and consequently reduce the user experience on mobile devices. Responsiveness is one of many non-functional requirements that affect the success of any mobile application [3].
One direction to overcome the problem of unresponsiveness of the user interface to user interactions on mobile devices is improving CPU allocation so that the CPU can process the mobile tasks required by users more efficiently. Following this direction, studies focus on a mechanism of the operating system kernel called the process scheduler [4]. In detail, the process scheduler is a component of the operating system kernel which shares the CPU resources among running tasks according to their types (classes), priorities and CPU usages. The main job of the process scheduler is to decide which tasks to execute and how long each task will be executed. The output decision of the process scheduler is one of the crucial criteria which affects CPU computational power as well as the overall performance of the mobile system [5].
Another important criterion which affects user experience on mobile phones is their energy consumption [6]. If mobile phones quickly deplete their battery power, users will easily get annoyed and consequently have a negative user experience. Due to this important role of energy consumption, operating systems running on mobile devices need to minimize power consumption. To reach this goal, one popular method is to dynamically adjust the CPU frequency to the demanded workload. Following this approach, a CPU governor [4] in the operating system kernel is responsible for this task: it increases the CPU frequency when the required workload is high to meet this demand, and vice versa.
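As a rough illustration of this ramp-up/ramp-down principle, the following sketch raises the frequency under high load and lowers it under low load. It is not the actual Linux governor logic; the thresholds, the frequency step and the function name are assumptions made only for illustration.

#include <stdio.h>

/* Illustrative governor principle: ramp the frequency up when the observed
 * load is high, back off when the load is low (NOT real kernel code). */
static unsigned long pick_frequency(unsigned long cur_khz,
                                    unsigned long min_khz,
                                    unsigned long max_khz,
                                    unsigned int load_percent)
{
    if (load_percent > 80)
        return max_khz;                        /* high demand: ramp up      */
    if (load_percent < 20 && cur_khz > min_khz)
        return (cur_khz + min_khz) / 2;        /* low demand: ramp down     */
    return cur_khz;                            /* otherwise: keep the speed */
}

int main(void)
{
    unsigned long f = 1134000;                 /* current frequency, in kHz */
    unsigned int loads[] = { 90, 15, 10, 85 }; /* sampled CPU loads, in %   */

    for (int i = 0; i < 4; i++) {
        f = pick_frequency(f, 384000, 1512000, loads[i]);
        printf("load=%u%% -> %lu kHz\n", loads[i], f);
    }
    return 0;
}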
In this research context, we follow the direction of improving CPU allocation so as to enhance the user experience on mobile phones. By analyzing the process scheduler, we observe that it currently does not take into account the CPU frequency as a criterion to schedule running tasks. As a consequence, this increases the unresponsiveness of the user interface to user interactions, since the same amount of rendering work (determined by the scheduler) has to be done in a longer duration (as the CPU frequency is controlled by the governor). Realizing this problem of the process scheduler, in this paper we propose an extension for Linux's default scheduler (named the Completely Fair Scheduler, or CFS). Our extended scheduler takes the CPU frequency into account in the scheduling decision when selecting the running tasks. We will show that our proposal helps in improving the smoothness of the user interface in response to user interactions on mobile systems in comparison with Linux's default CFS scheduler.
The remainder of this paper is organized as follows. Section 2 briefly reviews related works about CPU allocation. In Section 3, we present the concept of the CFS scheduler and point out its current limitation. Section 4 is devoted to introducing the principle and algorithm of our proposed frequency-aware scheduler. In Section 5, we describe our experiments and an analysis of our results. The paper ends with Section 6, which includes a general conclusion and possible future works.
2. RELATED WORK
There exist various works in the literature to improve the efficiency of CPU allocation. Yang et al., 2001 [7] proposed a divide-and-conquer algorithm for improving runtime flexibility and reducing computational complexity. The algorithm is divided into two scheduling phases: design-time scheduling and runtime scheduling. Besides, the algorithm proves that energy is an important criterion in scheduling embedded multiprocessor System-on-Chips.
Another work about an energy-aware scheduler was done by Rizvandi et al., 2010 [8]. In detail, the authors proposed a slack reclamation algorithm in the scheduler using a linear combination of the processor's maximum and minimum frequencies. The method helps in saving energy while still providing enough computational power for applications. Similarly, Mostafa et al., 2016 [9] proposed an energy-saving scheduler for high performance computing systems. The authors use a relocation of thread weights for each active process so as to decrease the number of context switches. Although these methods [8, 9] reduce the energy consumption of the scheduler, they currently target desktop or server systems with heavy workloads rather than mobile systems with much fewer active threads per process.
Another noticeable work, namely GRACE-OS, was proposed by Yuan et al., 2003 [5] in order to reduce CPU energy consumption on mobile devices using soft real-time scheduling. In detail, the method enhances the CPU scheduler by performing scheduling and speed scaling at the same time. Although applicable on mobile devices, this approach mainly targets multimedia applications, which require statistical performance guarantees (for example, 96% of deadlines are met [5]), and has not yet taken into account user interaction latency, which is one important aspect for ensuring user experience in mobile applications.
Concerning the limitation of the kernel scheduler not taking CPU frequency into account, operating system researchers have raised the research question of developing a new Linux kernel connecting the Linux scheduler and governor [10]. Some researchers discussed that it is possible to merge these two components into a single entity [11], proposing an optimization in using CPU power for scheduling tasks. Valente et al. [12] propose a discussion of future research works about improving the responsiveness of the user interface to user interactions. This is of importance for mobile devices, since responsiveness of the user interface and power consumption savings are two major criteria for ensuring mobile user experience.
In this work, we propose an extension for the Linux kernel scheduler (CFS) which takes into account the current frequency of CPU cores when making scheduling decisions for mobile systems. Our work focuses on the requirement of low latency for user interaction. Unlike the aforementioned works, we focus on improving the user experience when interacting with mobile devices rather than on saving energy. By lowering and stabilizing interface frame times, our work helps in increasing the responsiveness of the user interface to user interactions on mobile devices.
3. THE CFS SCHEDULER AND ITS LIMITATION
In this section, we present the internal concepts and algorithms of CFS, the standard scheduler of the Linux kernel. We then show a scenario in which CFS exhibits inefficiency in CPU allocation when the CPU frequency is not taken into account.
CFS is the Linux kernel scheduler which uses time slice estimation for selecting running tasks [13]. CFS was developed based on the Earliest Eligible Virtual Deadline First (EEVDF) scheduler [14]. To achieve high responsiveness for all tasks, CFS tries to divide a certain amount of time (called the period, usually a small value with a minimum of 20ms) among all runnable tasks. The time slice for task T_i is given by the following equation:

S_i = (ω_i / Ω_r) × P,    (1)

where
• S_i is the time slice length for the task T_i at the current decision time;
• ω_i is the calculated weight for T_i;
• Ω_r is the total weight of the whole run queue of the current CPU (each weight represents a given process's priority); and
• P is the target period in which the scheduler tries to execute all tasks.

Figure 1: Unresponsiveness of user interface (UI) to user interactions in the CFS scheduler.
When the number of tasks in the run queue increases, P is lengthened to reduce the performance overhead caused by too many context switches in a short amount of time.
CFS uses an important term called vruntime (virtual runtime) to track the performance and scheduling status of each active thread over its whole lifecycle. The virtual runtime υ_i of task T_i is updated after each calculated time slice:

υ_i = υ_i + t_i × N_0 / ω_i,    (2)

where t_i is the execution time of task T_i in the last execution period and N_0 is a constant (N_0 = 1024) corresponding to the weight of a task with the default nice value. Nice is a parameter of each task representing its priority. These vruntime values and other scheduling information of all tasks are stored in CFS using a self-balancing binary tree named the "Red-Black tree". CFS tries to put the task T_i with the lowest υ_i at the left-most node of the tree, so that it can be retrieved instantly in the next scheduling period.
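To make the bookkeeping of equations (1) and (2) concrete, the following user-space sketch mimics the two formulas. It is an illustration only: the type name task_t, the constant NICE_0_WEIGHT and the example weights are our assumptions, not the actual kernel implementation.

#include <stdio.h>

#define NICE_0_WEIGHT 1024   /* N_0 in equation (2) */
#define MIN_PERIOD_MS 20.0   /* minimum scheduling period P used by CFS */

/* Illustrative task descriptor: weight derived from priority, and vruntime. */
typedef struct {
    double weight;    /* omega_i */
    double vruntime;  /* upsilon_i */
} task_t;

/* Equation (1): S_i = (omega_i / Omega_r) * P */
static double time_slice(const task_t *t, double total_weight, double period_ms)
{
    return t->weight / total_weight * period_ms;
}

/* Equation (2): upsilon_i += t_i * N_0 / omega_i, with t_i the time actually run */
static void update_vruntime(task_t *t, double ran_ms)
{
    t->vruntime += ran_ms * NICE_0_WEIGHT / t->weight;
}

int main(void)
{
    task_t ui = { .weight = 2048, .vruntime = 0 };   /* heavier task, e.g. UI  */
    task_t bg = { .weight = 1024, .vruntime = 0 };   /* background task        */
    double total = ui.weight + bg.weight;

    double s_ui = time_slice(&ui, total, MIN_PERIOD_MS);
    double s_bg = time_slice(&bg, total, MIN_PERIOD_MS);
    update_vruntime(&ui, s_ui);
    update_vruntime(&bg, s_bg);

    /* The heavier task gets a longer slice but its vruntime advances more
     * slowly, which is how CFS keeps the schedule fair. */
    printf("UI: slice=%.2fms vruntime=%.2f | BG: slice=%.2fms vruntime=%.2f\n",
           s_ui, ui.vruntime, s_bg, bg.vruntime);
    return 0;
}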
By looking into the internal CFS scheduling algorithm, we can see that the calculations of the time slice S_i and of υ_i in equations (1) and (2) do not take into account the frequency f_j of the target CPU core c_j in a multi-core or multi-processor system. When a running task is migrated at runtime from one core c_j to another core c_k with a different frequency (f_j ≠ f_k), or when the governor reduces the core frequency, CPU power may be greatly lost and consequently the system produces a very bad user experience. One example of this consequence is the case when the migrated task is responsible for rendering the user interface (UI): the system becomes very laggy when responding to user interaction on mobile devices.
In order to demonstrate the current limitation of CFS, we present a scenario where the user interface is unresponsive to user interaction under the CFS scheduler. The visualization of this scenario is illustrated in Figure 1. Grey bars represent the CPU load needed and the red solid line represents the corresponding user interface frame time at the same moment. The frame time is the duration which separates one fully rendered user interface frame from the next. A horizontal dotted line indicates the 16.6ms limit for each frame time, equivalent to the ability of rendering 60 frames per second (fps). If the frame time is below this limit, the user's eyes cannot perceive real differences between two consecutive frames [15]. As such, animations shown on the user interface appear smooth and fluid. Indeed, Claypool et al., 2006 [16] showed that users' perception performance improved sevenfold when increasing the frame rate from 3 to 60 fps.
Figure 1 indicates two situations in which interface frame times exceed the optimal 16.6ms limit: overload and underload. At the overload times a and b, with high CPU load, the UI thread is not provided enough CPU power to maintain the drawing process below 16.6ms. The system load reduces at times c and d, leaving more CPU to the rendering thread. This load reduction results in a lower interface frame time (in other words, it increases the interface frame rate). As the system load continues to decrease (to an underload point), the governor decides that the CPU frequency should be reduced to lower power consumption (between times d and e). The lower CPU frequency also leads to a reduction of the CPU power provided to the UI rendering thread. As a result, the UI thread struggles to maintain a good frame rate for the user interface, since an optimal user experience requires at least 60fps, or 16.6ms per frame.
Furthermore, the governor works with a larger interval than the scheduler. Not until time k does the governor notice that a high CPU load is present and bump the CPU frequency up. This results in a drop of the interface frame time (from time k to time m), bringing it back under the 16.6ms limit. As a result, the user interface is unresponsive during the period between times e and k. This is caused by the unawareness of the scheduler of the lowered CPU frequency. If the scheduler had been aware of this change, it would have re-prioritized the UI thread by increasing its time slice length and reducing the time slice lengths of the other running background threads. By reconsidering the time slices of all threads, the scheduler can potentially provide more CPU power and ensure its fairness under frequency changes.
4. FREQUENCY-AWARE SCHEDULER
In this section, we propose a frequency-aware scheduler (hereinafter called FA-CFS) as an extension of the CFS scheduler. The main idea of our proposed scheduler is to adjust the task weight ω_i and the time slice S_i of a target task T_i according to frequency changes.
We propose a scheduler that balances the workload against differences in frequencies. In detail, we model a workload with its parameters on a multi-core CPU whose dynamic frequency is managed by the governor and whose thread scheduling is managed by the scheduler. This workload is executed in a multi-tasking, time-sharing and preemptive operating system.
Let W be a workload that is performed in a single thread and can be considered as the number of CPU cycles required to perform a task. A workload is measured as a multiplication of speed and time. In the simplest case, if this workload is scheduled on a single core CPU with a constant frequency f (approximately proportional to the number of instructions per second), we have:

W = f × T,    (3)

where T is the total time (in seconds) of execution.
Generally, the scheduler spends a little CPU time (ζ_i) on accounting and selecting the next scheduled thread after each sampling time τ_i [4]. This time can be considered as the performance overhead of the process scheduler. Therefore, T in equation (3) becomes:

T = Σ_{i=1}^{n} (τ_i + ζ_i),    (4)

where n is the total number of sampling times during the whole execution duration. When taking ζ_i into account, our global workload in equation (3) becomes:

W = f × Σ_{i=1}^{n} (τ_i + ζ_i).    (5)
As previously discussed, since the CPU frequency is managed by the governor (in order to minimize power consumption), it fluctuates at runtime based on the total workload of the whole system. As a result, the CPU frequency f is not a constant:

W = Σ_{i=1}^{n} f_i × (τ_i + ζ_i),    (6)

where f_i is the CPU frequency at sampling time τ_i. Since we have a multi-core processor, equation (6) becomes:

W = Σ_{i=1}^{n} f_i^j × (τ_i + ζ_i),    (7)

where f_i^j is the frequency of CPU core c_j at sampling time τ_i.
Consider that our global workload W is split into n micro-workloads ω_i performed during the n sampling times: W = Σ_{i=1}^{n} ω_i. Each micro-workload at sampling time τ_i is therefore calculated as:

ω_i = f_i^j × (τ_i + ζ_i).    (8)
On the other hand, the governor's sampling time is usually configured as a multiple of the CFS scheduler's time slice in the Linux kernel: τ_i = π × S_i. In other words, CFS works with a smaller (and finer) time granularity than the governor. Thus, we have:

ω_i = f_i^j × (π × S_i + ζ_i).    (9)

Due to the large difference between the governor's sampling time and the scheduler's time slice, when the running thread of the workload W is migrated from one core c_j to another c_k with frequencies f_i^j ≥ f_i^k, the performance penalty δ_i (in terms of work) over a single sampling time slot τ_i for a lowered CPU speed can be estimated as:

δ_i = ω_i − ω'_i ≤ (f_i^j − f_i^k) × π × S_i + f_i^j × ζ_i − f_i^k × ζ'_i.    (10)

On the other hand, CFS has a scheduling complexity of O(log N), where N is the number of active tasks [17]. N is often unchanged unless a new thread or process is created. Therefore, the amount of work (frequency × time) spent on accounting and scheduling is generally constant between sampling intervals; in other words, f_i^j × ζ_i = f_i^k × ζ'_i. Inequality (10) can then be simplified as:

δ_i ≤ (f_i^j − f_i^k) × π × S_i.    (11)
During the workload duration, with m migrations or frequency changes, the total performance penalty (in terms of amount of work) of inequality (11) becomes:

Δ = Σ_{p=1}^{m} δ_p ≤ Σ_{p=1}^{m} (f_p^j − f_p^k) × π × S_p.    (12)

After defining the total performance penalty due to frequency changes in equation (12), we can state the main objective of our improvement in FA-CFS as:

Minimize Σ_{p=1}^{m} (f_p^j − f_p^k) × π × S_p.    (13)
To reach this goal, in each single time slice S_i, it is possible to counteract the change of frequency (i.e., to minimize the performance penalty) by providing more CPU computational power to this particular workload. The extra CPU power can be allocated to this task on core c_k by increasing the time slice length to S'_i (the previously allocated time slice being S_i on core c_j).
When applying this counterbalance, the performance penalty δ_i of inequality (11) becomes:

δ'_i ≤ f_i^j × π × S_i − f_i^k × π × S'_i.    (14)

In an ideal situation, this performance penalty is completely counterbalanced (δ'_i ≤ 0); from inequality (14), it suffices that:

f_i^k × π × S'_i − f_i^j × π × S_i ≥ 0.    (15)

Thus, we can proportionally resize the time slice:

S'_i ≥ (f_i^j / f_i^k) × S_i.    (16)
As in equations (1) and (2) of Section 3.1, the time slice estimation of CFS is also proportional to the task weight and the run queue weight:

(ω'_i / Ω_r) × P ≥ (f_i^j / f_i^k) × (ω_i / Ω_r) × P.    (17)

As a result, our scheduler can counteract frequency changes by proportionally redistributing the weights as follows:

ω'_i ≥ (f_i^j / f_i^k) × ω_i.    (18)
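For example, if a thread is moved from a core at f_i^j = 1.5 GHz to one at f_i^k = 1.0 GHz, inequality (18) requires its weight to grow by a factor of at least 1.5. A minimal sketch of this rescaling step is shown below; the helper name fa_cfs_scale_weight and the integer rounding are our illustrative assumptions, not the kernel's actual CFS symbols.

#include <stdio.h>

/* Sketch of the FA-CFS weight adjustment of inequality (18). The weight is
 * scaled up when the task's core frequency drops from f_old (f_i^j) to
 * f_new (f_i^k), so that the resulting time slice compensates the lost cycles. */
static unsigned long fa_cfs_scale_weight(unsigned long weight,
                                         unsigned long f_old_khz,
                                         unsigned long f_new_khz)
{
    if (f_new_khz == 0 || f_new_khz >= f_old_khz)
        return weight;                /* no slowdown: keep the plain CFS weight */

    /* omega'_i >= (f_i^j / f_i^k) * omega_i, rounded up in integer arithmetic */
    return (weight * f_old_khz + f_new_khz - 1) / f_new_khz;
}

int main(void)
{
    /* Thread of weight 1024 migrated from a 1512 MHz core to a 1026 MHz core. */
    printf("scaled weight = %lu\n", fa_cfs_scale_weight(1024, 1512000, 1026000));
    return 0;
}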
We implement our proposed frequency-aware scheduler in a Linux environment, where our model acts as a frequency-aware extension to the CFS scheduler. We use the CPUFreq interface to query the governor for the current CPU frequencies [18]. Having extracted the frequencies, we implement our proposed algorithm to balance the time slices in Linux's CFS. We use CPUFreq's userspace sysfs interface in order to gather statistical information.
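For reference, the sketch below reads the per-core frequency through the standard CPUFreq sysfs files from user space. It only illustrates the interface we rely on for gathering statistics, not our kernel-side implementation, and it assumes a quad-core device.

#include <stdio.h>

/* Read the current frequency (in kHz) of CPU core `cpu` from the CPUFreq
 * sysfs interface; returns 0 on failure. */
static unsigned long read_cur_freq_khz(int cpu)
{
    char path[128];
    unsigned long khz = 0;
    FILE *f;

    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_cur_freq", cpu);
    f = fopen(path, "r");
    if (!f)
        return 0;
    if (fscanf(f, "%lu", &khz) != 1)
        khz = 0;
    fclose(f);
    return khz;
}

int main(void)
{
    for (int cpu = 0; cpu < 4; cpu++)   /* assume a quad-core device */
        printf("cpu%d: %lu kHz\n", cpu, read_cur_freq_khz(cpu));
    return 0;
}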
5. EXPERIMENTS AND RESULTS
The goal of this section is to present the improvement in the responsiveness of the mobile user interface to user interaction provided by our FA-CFS scheduler in comparison with the CFS scheduler. We first present the settings of our experiments, and then provide an analysis of our experimental results.

Interface Frame Time Measurement:
In order to evaluate our proposed FA-CFS scheduler, we use the interface frame time as the main metric to measure the improvement in responsiveness of the mobile user interface to user interactions. We chose to measure the interface frame time since it plays an important role in ensuring user experience. A fully rendered frame passes through a set of steps in the Android rendering pipeline: execute the issued layout commands, process the swapping of buffers, prepare the texture and finally draw the content to the screen.
Evaluation Scenario:
In our experiment, we implement a popular scenario where users browse an online news website using smartphones and tablets on which CFS and FA-CFS are installed. Since we want to compare the efficiency of our FA-CFS with CFS, we divide our scenario into two main steps: in the first step, users were asked to browse the online news website (http://bbc.com in our experiments) on devices running CFS; in the second step, users were asked to browse the same online news website with FA-CFS installed. In both steps, we recorded the interface frame times created by user interactions during their browsing sessions.
A browsing session in our experiment includes: (1) the user starts the stock browser, (2) he types the URL http://bbc.com, (3) he waits for the page to load, and finally (4) he scrolls up and down as soon as one or more parts of the page content appear. In this scenario, there are three different types of workload, created by the UI thread, the background network threads (to fetch data from the remote server) and the browser engine (in charge of parsing HTML and processing JavaScript with Chromium's V8 JavaScript engine). In order to avoid preloaded images, we clear the browser cache before starting each experiment session.
We involved a total of 5 users in our experiment. We asked each user to perform 16 browsing sessions (8 on each of the two Android devices, described later). On each device, users performed 4 sessions with the CFS scheduler and 4 sessions with our FA-CFS scheduler. With each scheduler, 4 governors with different characteristics were used in order to manage rising and declining system load with frequency ramp up and ramp down. The governors included in our experiments are interactive (default, fastest ramp up with intermediate frequencies, best latency), conservative (slow ramp up), ondemand (fast ramp up, fast ramp down, almost between minimum and maximum frequencies), and performance (keeps the highest frequency, wastes energy) [18].

Technical Choices:
On the hardware side, our experiments are performed on two categories of Android devices: one LG Nexus 4 and one Asus Nexus 7 Wifi (2012), representing a phone and a tablet, respectively. The LG Nexus 4 has a better hardware configuration (doubled RAM and 30% higher CPU core frequency) than the Nexus 7 Wifi 2012.
On the software side, we build from source an aftermarket open-source operating system called CyanogenMod, based on the Android Open Source Project (AOSP). We use the latest version of CyanogenMod with its supported Linux kernel to implement our model. We decided to build CyanogenMod from source because of the ability to customize the Linux kernel and flash (or install) the kernel along with the whole operating system onto our devices.
We use an Android developer option called "Profile GPU rendering" to monitor and gather interface frame times during the experiments. We then use Android's integrated "dumpsys" tool on the mobile devices to collect, through a USB cable, various statistical information, including the monitored interface frame times.
Interface Frame Time Peaks:
Figure 2 shows a set of captured frame times from one user session on the LG Nexus 4 with CFS and the interactive governor. It can be seen from this figure that the frame times during this session are not stable, but are generally smaller than the optimal 16.6ms. In the first part of this session (frames 0 - 100), frame times were relatively high because the web browser needs to perform 3 tasks at the same time: fetching web content, parsing partial HTML contents as they arrive, and rendering them on the screen. The rendering thread is not provided with enough computational power because the background threads are overloading the CPU, thus the UI thread struggles to maintain an optimal frame time. From frame 125 onwards, the page fetching and HTML parsing tasks are finished, yet very high frame times still exist, some exceeding 40ms.
These peaks (or spikes) cause "micro stuttering", a term used to indicate irregular delays between frames being rendered [19]. Micro stuttering decreases the user experience, even though the average frame rate is high enough. These high frame time peaks can be explained as a consequence of CPU core frequency changes, from which the UI thread suffers, similar to the scenario that we previously discussed in Section 3 and Figure 1.
Figure 2: Interface frame time peaks on the LG Nexus 4 with CFS scheduler and Interactive governor (per-frame times in ms, broken down into Draw, Prepare, Process and Execute stages, against the 16.6ms / 60fps limit; the choppy frames with frame time peaks correspond to an unresponsive user interface).
Table 1: Average frame time percentile (ms) of CFS vs FA-CFS with 4 governors on Nexus 7 Wifi. Columns: percentile, followed by CFS, FA-CFS and relative difference for the interactive, ondemand, conservative and performance governors, respectively.
90 17.18 14.27 -16.9% 17.94 15.02 -16.3% 22.26 21.19 -4.8% 11.93 11.81 -1.0%
91 17.33 14.46 -16.6% 18.11 15.24 -15.8% 22.64 21.55 -4.8% 12.36 12.21 -1.2%
92 17.76 14.68 -17.3% 18.45 15.45 -16.3% 22.83 22.77 -0.3% 12.75 12.6 -1.2%
93 18.22 14.93 -18.1% 18.72 15.65 -16.4% 23.34 23.19 -0.6% 13.29 13.42 1.0%
94 18.89 15.45 -18.2% 19.96 15.88 -20.4% 23.87 23.56 -1.3% 13.98 14.05 0.5%
95 20.44 15.98 -21.8% 22.89 16.25 -29.0% 27.6 28.02 1.5% 14.73 14.61 -0.8%
96 23.13 16.08 -30.5% 23.49 16.73 -28.8% 30.03 31.62 5.3% 15.62 15.88 1.7%
97 27.25 17.21 -36.8% 29.83 21.29 -28.6% 34.53 33.24 -3.7% 16.79 16.47 -1.9%
98 31.29 23.16 -26.0% 32.83 25.57 -22.1% 45.68 43.92 -3.9% 19.03 19.49 2.4%
99 38.94 29.12 -25.2% 41.77 35.1 -16.0% 60.34 62.17 3.0% 23.68 25.23 6.5%
100 48.58 37.3 -23.2% 55.12 44.05 -20.1% 83.1 77.38 -6.9% 31.21 30.85 -1.2%
Table 2: Average frame time percentile (ms) of CFS vs FA-CFS with 4 governors on Nexus 4. Columns: percentile, followed by CFS, FA-CFS and relative difference for the interactive, ondemand, conservative and performance governors, respectively.
90 14.54 14.03 -3.5% 15.41 13.14 -14.7% 20.75 21.25 2.4% 11.58 12.32 6.4%
91 14.78 14.18 -4.1% 15.62 13.45 -13.9% 21.07 22.32 5.9% 11.86 12.53 5.6%
92 14.82 14.25 -3.8% 16.03 13.87 -13.5% 22.37 22.81 2.0% 12.15 12.85 5.8%
93 15.48 14.81 -4.3% 16.19 14.32 -11.6% 23.86 23.99 0.5% 12.51 13.28 6.2%
94 15.63 15.27 -2.3% 16.49 14.67 -11.0% 25.91 26.46 2.1% 12.99 13.75 5.9%
95 16.03 15.42 -3.8% 16.91 15.29 -9.6% 27.77 27.29 -1.7% 13.34 14.06 5.4%
96 16.49 15.86 -3.8% 17.5 15.83 -9.5% 31.46 31.86 1.3% 14.02 14.5 3.4%
97 17.58 16.56 -5.8% 18.04 16.75 -7.2% 36.17 35.82 -1.0% 15.16 16.42 8.3%
98 18.24 17.14 -6.0% 19.27 19.41 0.7% 40.34 41.19 2.1% 16.42 17.31 5.4%
99 24.52 20.36 -17.0% 23.83 22.01 -7.6% 50.48 52.16 3.3% 19.32 19.38 0.3%
100 35.36 30.31 -14.3% 36.65 33.41 -8.8% 73.55 71.81 -2.4% 24.98 25.21 0.9%
Trang 7differences, similar to the scenario that we previously
dis-cussed in section 3, figure 1
Frame Time Percentile:
In order to analyze the effectiveness of our FA-CFS scheduler, we use the statistical metric of frame time percentile. The metric is described as follows: an x-th frame time percentile at y milliseconds means that during the experiment, x% of all frame times are less than y milliseconds. The frame time percentile represents the stability of frame times and thus the "quality" of the user experience during interactions. In this part of the evaluation, we focus on analyzing the average frame time percentiles over all user sessions.
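As an illustration of how this metric can be computed from a list of recorded frame times, the following is a straightforward nearest-rank sketch under our own assumptions; the paper does not prescribe a particular computation rule.

#include <stdio.h>
#include <stdlib.h>

/* Comparison helper for qsort over double values. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Nearest-rank percentile: the value y such that roughly pct% of the n frame
 * times lie below y. `times` is sorted in place. */
static double frame_time_percentile(double *times, size_t n, double pct)
{
    qsort(times, n, sizeof(double), cmp_double);
    size_t rank = (size_t)(pct / 100.0 * (double)n + 0.5);
    if (rank < 1)
        rank = 1;
    if (rank > n)
        rank = n;
    return times[rank - 1];
}

int main(void)
{
    double frames[] = { 8.1, 9.3, 7.8, 15.9, 33.2, 12.4, 10.0, 8.8, 41.5, 9.9 };
    printf("90th percentile: %.1f ms\n", frame_time_percentile(frames, 10, 90.0));
    return 0;
}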
Tables 1 and 2 show the average frame time percentiles of all user sessions on both devices, the LG Nexus 4 and the Asus Nexus 7 Wifi 2012, with 4 different governors. It is expected that the frame time percentiles of the Nexus 7 are larger than those of the Nexus 4, because the Nexus 7 has a lower hardware configuration yet a higher screen resolution. It is worth recalling that interactive is the default governor on most mobile phones.
With the two highly dynamic governors, interactive and ondemand, these tables show the general observation that FA-CFS achieves a better frame time reduction on the Nexus 7 than on the Nexus 4. The Nexus 7 benefits greatly from our time slice optimization, with an average of 21.8% and 29% frame time reduction (with interactive and ondemand, respectively) for 95% of the total rendered frames. While showing less improvement regarding frame time percentiles, FA-CFS on the Nexus 4 still achieves 3.8% and 9.6% enhancements. These differences between the Nexus 7 and Nexus 4 can be interpreted as a difference in hardware configuration (30% faster CPU and 4% fewer screen pixels on the Nexus 4 than on the Nexus 7).
Not only does our frequency-aware FA-CFS scheduler reduce average frame times, but it also provides better frame time stabilization than the traditional CFS: the 97th, 98th and 99th frame time percentiles show big improvements on both devices. In particular, with a better 99th percentile (25.2% and 16% reduction for interactive and ondemand on the Nexus 7), the user has a smoother and more responsive interface and experiences fewer micro-stuttering frames during their interactions.
Furthermore, it can be seen from Table 1 that FA-CFS achieves a considerably lower average frame time than CFS with interactive. The lowest gain (from the lowest level, the 90th, to the 95th percentile) is 16.9%. The difference starts increasing at the 96th percentile (30%), reaches its peak at the 97th (36.8%) and still keeps a wide margin until the 100th (maximum frame time). Additionally, we can see an improvement in terms of frame time stabilization of FA-CFS through its ability to keep 97% of the frames under the 16.6ms limit (instead of 90% with the mainline CFS) on the Asus Nexus 7 Wifi.
On the other hand, the right halves of these tables exhibit smaller improvements for both devices with the conservative and performance governors. These are less dynamic governors than the previously discussed interactive and ondemand counterparts. We observed that with the completely static performance governor, our FA-CFS barely achieved improvements throughout all user sessions. This can be explained by the fact that performance always provides the maximum CPU computational power to all possible threads without frequency changes (f_i^j / f_i^k = 1).
Figure 3: Frame time distribution of CFS on Motorola Moto X 2nd edition (frame count, log scale, vs. frame time in ms).

Figure 4: Frame time distribution of FA-CFS on Motorola Moto X 2nd edition (frame count, log scale, vs. frame time in ms).
Its conservative sibling achieves as little improvement as 6.9% (on the Nexus 7) and 2.4% (on the Nexus 4) at the 100th percentile. General frame times did not earn much reduction because this governor tries to minimize the CPU frequency as much as possible, without many frequency changes.
The analyses of Tables 1 and 2 above show that our FA-CFS enhances frame time stabilization, increases the average frame rate and reduces frame time peaks (or spikes) with the widely used governors (interactive and ondemand). Thanks to this, our FA-CFS scheduler proves its efficiency in improving the user experience while interacting with mobile devices.

Frame Time Distribution:
In order to further analyze the effectiveness of our FA-CFS scheduler, we use another statistical metric, the frame time distribution. For this, we set up an additional user session on a higher-end mobile phone, a Motorola Moto X (2nd edition) with a quad-core CPU, each core running at 2.5GHz. During the user interactions, we gathered 1189 frame times (in approximately 19 seconds of browsing the BBC homepage) with CFS and FA-CFS under the interactive governor, and represent them as histograms in Figures 3 and 4, respectively.
It can be seen from Figure 3 that even on a high-end phone, CFS causes micro stuttering, with frames longer than 16.6ms. Some frames take even more than 32ms (double the 16.6ms limit). These peaks cause choppiness during web content scrolling in the browser. Applying our FA-CFS to CyanogenMod greatly reduces these peaks (Figure 4). The maximum frame time for FA-CFS is 24ms, compared to 37ms for CFS. Out of a total of 1189 frames, FA-CFS produced only 14 frames longer than the 16.6ms limit; in contrast, this number for the CFS counterpart is 24. From these results, our FA-CFS achieves a reduction of 40% in frame time peaks (from 24 frames down to 14 frames). Additionally, frame times are better packed around the mean 8ms range. These two figures clearly illustrate the benefit of our FA-CFS in improving user experience, even on a high-end mobile device.
6. CONCLUSION AND FUTURE WORK
This paper proposed a new frequency-aware process scheduler for improving user experience on multi-core mobile systems. We built a model which acts as an extension of the Linux default scheduler (the Completely Fair Scheduler - CFS) to take into account the dynamic CPU frequency when scheduling tasks. Our model helps in increasing the responsiveness of the mobile user interface to user interactions by lowering and stabilizing interface frame times. The experiments showed that our proposed FA-CFS scheduler reduces the number of frame time peaks by up to 40%, which brings great benefits to multi-core mobile systems, where user experience relies largely on the responsiveness of the user interface.
Several research directions can be considered to continue this work. First and foremost, since our work helps in improving user experience on mobile systems, it is worth investigating our model on various workloads to see if it can bring benefits to larger multi-core and multi-CPU platforms, i.e., desktops and virtualized servers [20]. Secondly, combining our frequency-aware scheduler with a performance-oriented scheduler (e.g., the BFS scheduler) is also an interesting research direction. Our FA-CFS scheduler can take the BFS scheduler's advancements into account to improve UI responsiveness and save CPU power. Last but not least, we wonder if our frequency-aware improvement can be applied to the Red-Black tree by restructuring it based on core frequencies at runtime.
REFERENCES
[1] C. Van Berkel. Multi-core for mobile phones. In Proceedings of the Conference on Design, Automation and Test in Europe, pages 1260–1265. European Design and Automation Association, 2009.
[2] Y. G. Ji, J. H. Park, C. Lee, and M. H. Yun. A usability checklist for the usability evaluation of mobile phone user interface. International Journal of Human-Computer Interaction, 20(3):207–231, 2006.
[3] A. I. Wasserman. Software engineering issues for mobile application development. In Proceedings of the FSE/SDP Workshop on Future of Software Engineering Research, pages 397–400. ACM, 2010.
[4] A. Silberschatz, P. Galvin, and G. Gagne. Applied Operating System Concepts. John Wiley & Sons, Inc., 2001.
[5] W. Yuan and K. Nahrstedt. Energy-efficient soft real-time CPU scheduling for mobile multimedia systems. In ACM SIGOPS Operating Systems Review, volume 37, pages 149–163. ACM, 2003.
[6] D. Ferreira, A. K. Dey, and V. Kostakos. Understanding human-smartphone concerns: a study of battery life. In Pervasive Computing, pages 19–33. Springer, 2011.
[7] P. Yang, C. Wong, P. Marchal, F. Catthoor, D. Desmet, D. Verkest, and R. Lauwereins. Energy-aware runtime scheduling for embedded-multiprocessor SOCs. IEEE Design & Test of Computers, (5):46–58, 2001.
[8] N. B. Rizvandi, J. Taheri, A. Y. Zomaya, and Y. C. Lee. Linear combinations of DVFS-enabled processor frequencies to modify the energy-aware scheduling algorithms. In Cluster, Cloud and Grid Computing (CCGrid), 2010 10th IEEE/ACM International Conference on, pages 388–397. IEEE, 2010.
[9] S. M. Mostafa and S. Kusakabe. Towards reducing energy consumption using inter-process scheduling in preemptive multitasking OS. In 2016 International Conference on Platform Technology and Service (PlatCon), pages 1–6, Feb 2016.
[10] V. Pallipadi and S. B. Siddha. Processor power management features and process scheduler: Do we need to tie them together? LinuxConf Europe, pages 1–8, 2007.
[11] J. H. Schönherr, J. Richling, M. Werner, and G. Mühl. Event-driven processor power management. In Proceedings of the 1st International Conference on Energy-Efficient Computing and Networking, pages 61–70. ACM, 2010.
[12] P. Valente and M. Andreolini. Improving application responsiveness with the BFQ disk I/O scheduler. In Proceedings of the 5th Annual International Systems and Storage Conference, page 6. ACM, 2012.
[13] C. S. Pabla. Completely fair scheduler. Linux Journal, 2009(184):4, 2009.
[14] I. Stoica, H. Abdel-Wahab, K. Jeffay, S. K. Baruah, J. E. Gehrke, and C. G. Plaxton. A proportional share resource allocation algorithm for real-time, time-shared systems. In Real-Time Systems Symposium, 1996, 17th IEEE. IEEE, 1996.
[15] C. McAnlis, P. Lubbers, B. Jones, D. Tebbs, A. Manzur, S. Bennett, F. d'Erfurth, B. Garcia, S. Lin, I. Popelyshev, et al. Applying old-school video game techniques in modern web games. In HTML5 Game Development Insights. Springer, 2014.
[16] M. Claypool, K. Claypool, and F. Damaa. The effects of frame rate and resolution on users playing first person shooter games. In Electronic Imaging 2006, pages 607101–607101. International Society for Optics and Photonics, 2006.
[17] P. Pawar, S. Dhotre, and S. Patil. CFS for addressing CPU resources in multi-core processors with AA tree. International Journal of Computer Science and Information Technologies, 2014.
[18] V. Pallipadi and A. Starikovskiy. The ondemand governor. In Proceedings of the Linux Symposium, volume 2, pages 215–230, 2006.
[19] J.-M. Arnau, J.-M. Parcerisa, and P. Xekalakis. Parallel frame rendering: trading responsiveness for energy on a mobile GPU. In Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques. IEEE Press, 2013.
[20] G. Von Laszewski, L. Wang, A. J. Younge, and X. He. Power-aware scheduling of virtual machines in DVFS-enabled clusters. In Cluster Computing and Workshops, 2009, CLUSTER'09, IEEE International Conference on, pages 1–10. IEEE, 2009.