(500 Kbytes are fixed for other process data) and passes 100 Kbytes of boundary data to its right neighbor. In the same way, when 25 processes are employed, each one computes 4 × 10^8 instructions and occupies 900 Kbytes in memory.
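To make the modeled workload concrete, the following sketch (in Python) estimates the duration of one superstep of this synthetic application from the figures given above; the function and parameter names, as well as the node speed and link bandwidth, are illustrative assumptions and are not part of MigBSP or of the simulator used in the experiments.

def superstep_time(instructions, node_mips, boundary_bytes, bandwidth_bps, barrier_cost=0.001):
    """Estimated time of one superstep: local computation, boundary transfer, barrier."""
    compute = instructions / (node_mips * 1e6)        # seconds of local computation
    communicate = boundary_bytes * 8 / bandwidth_bps  # seconds to ship the boundary data
    return compute + communicate + barrier_cost

if __name__ == "__main__":
    # 25-process configuration: 4 x 10^8 instructions and 100 Kbytes of boundary data.
    # A 1200-MIPS node and a 1 Gbit/s link are illustrative values, not testbed figures.
    t = superstep_time(instructions=4e8, node_mips=1200,
                       boundary_bytes=100 * 1024, bandwidth_bps=1e9)
    print(f"estimated superstep time: {t:.3f} s")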
5.1.2 Results and Discussions
Table 1 presents the times when testing 10 processes. Firstly, we can observe that MigBSP's intrusiveness on the application execution is low when comparing scenarios i and ii (overhead lower than 5%). The processes are balanced among themselves in this configuration, causing α to increase at each call for process rescheduling. This explains the low impact observed when comparing scenarios i and ii. Besides this, MigBSP decides that migrations are not viable at any moment, regardless of the number of executed supersteps. In this case, our model causes a loss of performance in application execution. We obtained negative values of PM when the rescheduling was tested, which resulted in an empty list of migration candidates.
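As a rough illustration of why an empty candidate list appears, the sketch below assumes only what the text suggests, namely that Computation and Communication act in favor of migration while Memory acts against it, and that a process is listed as a candidate only when its PM is positive; it is not MigBSP's actual formula, which is defined earlier in the chapter.

def potential_of_migration(computation, communication, memory_cost):
    # Assumption based on the text: forces in favor of migration minus the Memory cost.
    return computation + communication - memory_cost

def migration_candidates(processes):
    """Keep only processes whose PM is positive; when every PM is negative,
    the candidate list is empty, as in the balanced 10-process case above."""
    return [name for name, (comp, comm, mem) in processes.items()
            if potential_of_migration(comp, comm, mem) > 0]

if __name__ == "__main__":
    # Illustrative values only: both PMs are negative, so no candidates remain.
    procs = {"p1": (0.8, 0.1, 1.5), "p2": (0.7, 0.2, 1.4)}
    print(migration_candidates(procs))   # -> []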
Supersteps | Scenario i | Scen. ii (α=4) | Scen. iii (α=4) | Scen. ii (α=8) | Scen. iii (α=8) | Scen. ii (α=16) | Scen. iii (α=16)
2000 | 1344.09 | 1347.88 | 1347.88 | 1346.67 | 1346.67 | 1344.91 | 1344.91

Table 1. Evaluating 10 processes on three considered scenarios (time in seconds)
The results of the execution of 25 processes are presented in Table 2. In this context, the system remains stable and α grows at each rescheduling call. One migration occurred, {(p21,a1)}, when testing 10 supersteps and using α equal to 4. Our notation indicates that process p21 was reassigned to run on node a1. A second and a third migration happened when considering 50 supersteps: {(p22,a2), (p23,a3)}. They took place in the next two calls for process rescheduling (at supersteps 12 and 28). When evaluating 2000 supersteps and maintaining this value of α, eight migrations take place: {(p21,a1), (p22,a2), (p23,a3), (p24,a4), (p25,a5), (p18,a6), (p19,a7), (p20,a8)}. All migrations were directed to the fastest cluster (Aquario). The first five migrations moved processes from cluster Corisco to Aquario; after that, three processes from Labtec were chosen for migration. In conclusion, we obtained a gain of 14% after executing 2000 supersteps when α equal to 4 is used.
Table 2. Evaluating 25 processes on three considered scenarios (time in seconds)
Analyzing scenario iii with α equal to 16, we detected that the first migration is postponed, which results in a larger final time when compared with lower values of α. With α = 4, for instance, there are more calls for process rescheduling with migrations during the first supersteps. This causes a larger overhead to be paid during this period. These penalty costs are amortized as the number of executed supersteps increases. Thus, the configuration with α = 4 outperforms the other studied values of α when 2000 supersteps are evaluated. Figure 10 illustrates the frequency of process rescheduling calls when testing 25 processes and 2000 supersteps. We can observe that 6 calls are made with α = 16, while 8 are performed when the initial α is 4. Considering scenario ii, we conclude that the greater α is, the lower the model's impact when migrations are not applied (i.e., when migration viability is false).
Fig. 10. Number of rescheduling calls when 25 processes and 2000 supersteps are evaluated

Table 3 shows the results when the number of processes is increased to 50. The processes are considered balanced and α increases at each rescheduling call. In this manner, we have the same configuration of calls as when testing 25 processes (see Figure 10). We achieved 8 migrations when 2000 supersteps are evaluated: {(p38,a1), (p40,a2), (p42,a3), (p39,a4), (p41,a5), (p37,a6), (p22,a7), (p21,a8)}. MigBSP moves all processes from cluster Frontal to Aquario and transfers two processes from Corisco to the fastest cluster. Using α = 4, 430.95s and 408.25s were obtained for scenarios i and iii, respectively. Besides this 5% gain with α = 4, we also achieve a gain when α is equal to 8. However, the final result when changing the initial α to 16 in scenario iii is worse than that of scenario i, since the migrations are delayed and more supersteps are needed to achieve a gain in this situation. Table 4 presents the execution of 100 processes over the tested infrastructure. As in the situations with 25 and 50 processes, the environment with 100 processes is stable and the processes are balanced among the resources. Thus, α increases at each rescheduling call. The same migrations occurred when testing 50 and 100 processes, since the configuration with 100 processes just uses more nodes from cluster ICE. In general, the same percentage of gain was achieved with 50 and 100 processes.

The results of scenarios i, ii and iii with 200 processes are shown in Table 5. We have an unstable scenario in this situation, which explains the large overhead in scenario ii. Considering this scenario, α only begins to grow after ω calls for process rescheduling without migrations. Taking into account scenario iii and α equal to 4, 2 migrations are done when executing 10 supersteps: {(p195,a1), (p197,a2)}. Besides these, 10 migrations take place when 50 supersteps are tested: {(p196,a3), (p198,a4), (p199,a5), (p200,a6), (p38,a7), (p39,a8), (p37,a9), (p40,a10), (p41,a11), (p42,a12)}. Despite these migrations, the processes are still unbalanced with the adopted value of D and, therefore, α does not grow at each rescheduling call.
Table 3. Evaluating 50 processes on three considered scenarios (time in seconds)

Table 4. Evaluating 100 processes on three considered scenarios (time in seconds)

Table 5. Evaluating 200 processes on three considered scenarios (time in seconds)
After these migrations, MigBSP does not indicate the viability of any others. Thus, after ω calls without migrations, MigBSP enlarges the value of D and α begins to increase, following adaptation 2 (see Subsection 3.2 for details).
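A simplified reading of this frequency control is sketched below. The doubling of α is an assumption chosen only because it matches the call pattern reported in our experiments (supersteps 2, 6, 14, 30, 62, ...), and the growth factor applied to D is likewise illustrative; the precise rules are those of Subsection 3.2.

class ReschedulingControl:
    def __init__(self, alpha, omega, d):
        self.alpha = alpha                   # supersteps until the next rescheduling call
        self.omega = omega                   # tolerated calls without migrations
        self.d = d                           # balance threshold used by the model
        self.calls_without_migration = 0

    def after_call(self, processes_balanced, migrations_happened):
        if migrations_happened:
            self.calls_without_migration = 0
        else:
            self.calls_without_migration += 1
            if self.calls_without_migration == self.omega:
                self.d *= 2                  # adaptation 2: enlarge D (factor is illustrative)
                processes_balanced = True    # the relaxed threshold lets alpha grow again
        if processes_balanced:
            self.alpha *= 2                  # adaptation 1: space the calls out when stable
        return self.alpha

if __name__ == "__main__":
    ctrl = ReschedulingControl(alpha=2, omega=3, d=0.5)
    superstep, calls = 0, []
    for _ in range(5):
        superstep += ctrl.alpha
        calls.append(superstep)
        ctrl.after_call(processes_balanced=True, migrations_happened=True)
    print(calls)   # [2, 6, 14, 30, 62] -- the pattern observed in the experiments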
Processes | Scenario i (without process migration) | Scenario iii (with process migration)
Table 6. Barrier times in the two situations
Table 6 presents the barrier times captured when 2000 supersteps were tested. More specifically, the time is captured when the last superstep is executed. We implemented a centralized master-slave approach for the barrier, where process 1 receives a scheduling message from, and sends one to, every other BSP process. Thus, the barrier time is captured on process 1. The times shown in the third column of Table 6 include neither the scheduling messages nor the computation. Our idea is to demonstrate that the remapping of processes decreases the time to compute the BSP supersteps. Therefore, process 1 spends less time waiting at the barrier since the processes reach this point faster. Analyzing the table, we observed a gain of 22% in time when comparing the barrier times of scenarios i and iii with 50 processes. The gain was reduced when 100 processes were tested. This occurs because, compared with the execution of 50 processes, the 100-process configuration just includes more nodes from cluster ICE.
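The sketch below illustrates this centralized master-slave barrier and one way the waiting time could be measured on the master; it uses Python threads to stand in for BSP processes and illustrative sleep times, so it is a conceptual sketch rather than our actual implementation.

import threading, time, queue, random

N = 8
arrivals = queue.Queue()
release = threading.Event()

def worker(rank):
    time.sleep(random.uniform(0.01, 0.2))   # unequal superstep durations
    arrivals.put(rank)                       # tell the master we reached the barrier
    release.wait()                           # wait until the master releases the barrier

def master():
    start = time.time()
    for _ in range(N - 1):
        arrivals.get()                       # collect one arrival per worker
    print(f"barrier wait on the master: {time.time() - start:.3f} s")
    release.set()

threads = [threading.Thread(target=worker, args=(r,)) for r in range(1, N)]
for t in threads: t.start()
master()
for t in threads: t.join()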
5.2 Smith-Waterman Application
Our second application is based on dynamic programming (DP), which is a popular algorithm design technique for optimization problems (Low et al., 2007). DP algorithms can be classified according to the matrix size and the dependency relationship of each matrix cell: an algorithm for a problem of size n is called a tD/eD algorithm if its matrix size is O(n^t) and each matrix cell depends on O(n^e) other cells. 2D/1D algorithms are irregular, with the computation load density changing along the matrix cells. In particular, we considered the Smith-Waterman algorithm, a well-known 2D/1D algorithm for local sequence alignment (Smith, 1988).
5.2.1 Modeling the Problem
The Smith-Waterman algorithm proceeds in a series of wavefronts diagonally across the matrix. Figure 11(a) illustrates the concept of the algorithm for a 4×4 matrix with a column-based process allocation. The more intense the shading, the greater the computation load density of the cell. Each wavefront corresponds to a BSP superstep. For instance, Figure 11(b) shows a 4×4 matrix that presents 7 supersteps. The computation load is uniform inside a particular superstep and grows as the superstep number increases. The combination of diagonal-based superstep mapping and column-based process mapping leads to the following conclusions: (i) 2n − 1 supersteps are needed to compute a square matrix of order n; and (ii) each process is involved in n supersteps. Figures 11(b) and (c) show the communication actions among the processes. Considering that cell (x, y) (x denotes a matrix row, while y denotes a matrix column) needs data from cells (x, y − 1) and (x − 1, y), we have an interaction from process p_y to process p_{y+1}. There is no communication inside the same matrix column, since a column corresponds to a single process.
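The mapping just described can be summarized by the following sketch (Python; the 0-based cell indices and 1-based superstep and process identifiers are our own conventions here):

def superstep_of(x, y):
    """Cell (x, y), 0-based, is computed in wavefront/superstep x + y + 1."""
    return x + y + 1

def owner_of(y):
    """Column-based allocation: column y belongs to process p_{y+1}."""
    return y + 1

def cross_process_messages(n):
    """Messages that cross process boundaries: cell (x, y) feeds cell (x, y+1),
    i.e. process p_{y+1} sends to p_{y+2}; dependencies inside a column stay local."""
    return [((x, y), (x, y + 1)) for x in range(n) for y in range(n - 1)]

if __name__ == "__main__":
    n = 4
    print(max(superstep_of(x, y) for x in range(n) for y in range(n)))  # 2n - 1 = 7 supersteps
    print(len({owner_of(y) for y in range(n)}))                          # n processes involved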
The configuration of scenarios ii and iii depends on the Computation Pattern Pcomp(i) of each process i (see Subsection 3.3 for more details). Pcomp(i) increases or decreases depending on the prediction of the number of instructions performed at each superstep. We consider a specific process as regular if the forecast is within a margin of fluctuation δ around the number of instructions actually performed. In our experiments, we use 10^6 as the number of instructions for the first superstep and 10^9 for the last one. The increase of computation load density among the supersteps is uniform; in other words, we take the difference between 10^9 and 10^6 and divide it by the number of supersteps involved in a specific execution. Considering this, we applied δ equal to 0.01 (1%) and 0.50 (50%) to scenarios ii and iii, respectively. This last value was used because I2(1) is 565 × 10^5 and PI2(1) is 287 × 10^5 when a 10×10 matrix is tested (see details about the notation in Subsection 3.3). The value of 50% enforces instruction regularity in the system. Both values of δ influence the Computation metric and, consequently, the choice of candidates for migration. Scenario ii tends to obtain negative values for PM since the Computation metric will be close to 0. Consequently, no migrations happen in this scenario.
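The load growth and the regularity test can be sketched as follows; the interpolation mirrors the description above (10^6 instructions in the first superstep, 10^9 in the last, uniform increase), while the way MigBSP produces its predictions is described in Subsection 3.3 and is simply taken as an input here.

def load_of_superstep(s, total_supersteps, first=1e6, last=1e9):
    """Instructions executed in superstep s (1-based), growing uniformly."""
    step = (last - first) / (total_supersteps - 1)
    return first + (s - 1) * step

def is_regular(executed, predicted, delta):
    """A process is regular if the prediction stays within a delta margin of what ran."""
    return abs(executed - predicted) <= delta * executed

if __name__ == "__main__":
    # 10x10 matrix -> 2*10 - 1 = 19 supersteps; values below echo the ones in the text.
    print(load_of_superstep(1, 19), load_of_superstep(19, 19))   # 1e6 ... 1e9
    executed, predicted = 565e5, 287e5
    print(is_regular(executed, predicted, delta=0.01))   # False: scenario ii (1%)
    print(is_regular(executed, predicted, delta=0.50))   # True:  scenario iii (50%)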
Fig. 11. Different views of the Smith-Waterman irregular application

We tested the behavior of square matrices of order 10, 25, 50, 100 and 200. Each cell of a 10×10 matrix needs to communicate 500 Kbytes and each process occupies 1.2 Mbytes in memory (700 Kbytes comprise other application data). Each cell of a 25×25 matrix communicates 200 Kbytes and each process occupies 900 Kbytes in memory, and so on.
5.2.2 Results and Discussions
Table 7 presents the application evaluation. Nineteen supersteps were crossed when a 10×10 matrix was tested. Adopting this matrix size and α = 2, 13.34s and 14.15s were obtained for scenarios i and ii, which represents an overhead of 8%. The higher the value of α, the lower the MigBSP overhead on application execution. This occurs because the system is stable (processes are balanced) and α always increases at each rescheduling call. Three calls for process relocation were made when testing α = 2 (at supersteps 2, 6 and 14). The rescheduling call at superstep 2 does not produce migrations: at this step, the computation load density is not enough to outweigh the migration costs involved in transferring the processes. The same occurred in the next call, at superstep 6. The last call happened at superstep 14 and resulted in 6 migrations: {(p5,a1), (p6,a2), (p7,a3), (p8,a4), (p9,a5), (p10,a6)}. MigBSP indicated the migration of the processes that are responsible for computing the final supersteps. The execution with α equal to 4 implies a smaller overhead since only two calls were made (at supersteps 4 and 12). Observing scenario iii, there are no migrations in the first call, but eight occurred in the second one: processes 3 up to 10 migrated to cluster Aquario in this last call. α = 4 outperforms α = 2 for two reasons: (i) it makes fewer rescheduling calls; and (ii) the call that causes process migration happens at a superstep in which MigBSP takes better decisions.

The system stays stable when the 25×25 matrix is tested. α = 2 produces a gain of 11% in performance when considering the 25×25 matrix and scenario iii. This configuration presents four calls for process rescheduling, two of which produce migrations. No migrations are indicated at supersteps 2 and 6. Nevertheless, processes 1 up to 12 are migrated at superstep 14, while processes 21 up to 25 are transferred at superstep 30. These transfers target the fastest cluster. After this last call, the remaining execution presents 19 supersteps (from 31 to 49) to amortize the migration costs and to obtain better performance. The execution with α = 8 and scenario iii brings an overhead compared with scenario i. Two calls for migration were made, at supersteps 8 and 24.
Scenario | 10×10 | 25×25 | 50×50 | 100×100 | 200×200
Table 7. Evaluation of scenarios i, ii and iii when varying the matrix size
The first call causes the migration of just one process (number 1) to a1, and the second one produces three migrations: {(p21,a2), (p22,a3), (p23,a4)}. We observed that processes p24 and p25 stayed on cluster Corisco. Despite the performed migrations, these two processes compromise the supersteps that include them: both execute on a slower cluster and the barrier waits for the slowest process. Maintaining the matrix size and adopting α = 16, we have two calls: at supersteps 16 and 48. This last call migrates p24 and p25 to cluster Aquario. Although this movement is pertinent for performance, just one superstep is executed before the application ends.

Fifty processes were evaluated when the 50×50 matrix was considered. In this context, α also increases at each call for process rescheduling. We observed an overhead of 3% when scenarios i and ii were compared (using α = 2). In addition, we observed that all values of α achieved a performance gain in scenario iii. In particular, when α = 2 was used, five calls for process rescheduling were made (at supersteps 2, 6, 14, 30 and 62). No migrations are indicated in the first three calls. The greater the matrix size, the greater the number of supersteps needed to make migrations viable. This happens because our total load is fixed (independent of the matrix size) but the load partition increases uniformly along the supersteps (see Section 4 for details). Processes 21 up to 29 are migrated to cluster Aquario at superstep 30, while processes 37 up to 42 are migrated to this cluster at superstep 62. Using α equal to 4, 84.65s were obtained for scenario iii, which results in a gain of 9%. This gain is greater than that achieved with α = 2 because now the last rescheduling call is made at superstep 60. The same processes were migrated at this point; however, there are two more supersteps left to execute using α equal to 4. Three rescheduling calls were made with α = 8 (at supersteps 8, 24 and 56). Only the last two produce migrations. Three processes are migrated at superstep 24: {(p21,a1), (p22,a2), (p23,a3)}. Processes 37 up to 42 are migrated to cluster Aquario at superstep 56. This last call is efficient since it transfers all processes from cluster Frontal to Aquario.

The execution with a 100×100 matrix shows good results with process migration. Six rescheduling calls were made when using α = 2. Migrations did not occur at the first three calls (supersteps 2, 6 and 14). Processes 21 up to 29 are migrated to cluster Aquario after superstep 30. In addition, processes 37 to 42 are migrated to cluster Aquario at superstep 62. Finally, superstep 126 indicates 7 migrations (p30 up to p36), but just 5 actually occurred, all to cluster Aquario. These migrations complete one process per node on cluster Aquario. MigBSP selected for migration those processes that belonged to clusters Corisco and Frontal, which are the slowest clusters in our testbed infrastructure. α equal to 16 produced 3 attempts at migration when a 100×100 matrix is evaluated (at supersteps 16, 48 and 112). All of them triggered migrations.
In the first call, the first 11 processes are migrated to cluster Aquario. All processes from cluster Frontal are migrated to Aquario at superstep 48. Finally, 15 processes are selected as candidates for migration after crossing 112 supersteps: p21 to p36. This spectrum of candidates corresponds to the processes that are running on Frontal. Considering this, only 3 processes were actually migrated: {(p34,a18), (p35,a19), (p36,a20)}.

Fig. 12. Migration behavior when testing a 200×200 matrix with initial α equal to 2

Table 7 also shows the application performance when the 200×200 matrix was tested. Satisfactory results were obtained with process migration. The system stays stable during the whole application execution. Despite having more than one process mapped to each processor, sometimes just a portion of them is responsible for computation at a specific moment. This occurs because the processes are mapped to matrix columns, while the supersteps comprise the anti-diagonals of the matrix. Figure 12 illustrates the migration behavior along the execution with α = 2. Using α = 2 and considering scenario iii, 8 calls for process rescheduling were made. Migrations were not made at supersteps 2, 6 and 14. Processes 21 up to 31 are migrated to cluster Aquario at superstep 30. Moreover, all processes from cluster Frontal are migrated to Aquario at superstep 62. Six processes are candidates for migration at superstep 126: p30 to p36; however, only p31 up to p36 are migrated to cluster Aquario. These migrations happen because the processes initially mapped to cluster Aquario do not yet collaborate with the BSP computation. Migrations are not viable at superstep 254. Finally, 12 processes (p189 to p200) are migrated to cluster Aquario when superstep 388 is crossed. At this time, all the processes previously allocated to Aquario are inactive and the migrations are viable. However, just 10 remaining supersteps are executed to amortize the process migration costs.
5.3 LU Decomposition Application
Consider a system of linear equations Ax = b, where A is a given n × n non-singular matrix, b a given vector of length n, and x the unknown solution vector of length n. One method for solving this system is the LU decomposition technique. It comprises the decomposition of the matrix A into a lower triangular matrix L and an upper triangular matrix U such that A = LU. An n × n matrix L is called unit lower triangular if l_{i,i} = 1 for all i, 0 ≤ i < n, and l_{i,j} = 0 for all i, j with 0 ≤ i < j < n. An n × n matrix U is called upper triangular if u_{i,j} = 0 for all i, j with 0 ≤ j < i < n.

Fig. 13. L and U matrices stored in the same memory space as the original matrix A0
for k from 0 to n−1 do
    for j from k to n−1 do
        u_{k,j} = a^k_{k,j}
    endfor
    for i from k+1 to n−1 do
        l_{i,k} = a^k_{i,k} / a^k_{k,k}
    endfor
    for i from k+1 to n−1 do
        for j from k+1 to n−1 do
            a^{k+1}_{i,j} = a^k_{i,j} − l_{i,k} · u_{k,j}
        endfor
    endfor
endfor

Fig. 14. Two algorithms to solve the LU decomposition problem (the right-hand variant performs the same updates in place, using only the elements of matrix A)
On input, A contains the original matrix A0, whereas on output it contains the values of L below the diagonal and the values of U above and on the diagonal, such that LU = A0. Figure 13 illustrates the organization of the LU computation. The values of L and U computed so far and the computed sub-matrix A^k may be stored in the same memory space as A0. Figure 14 presents the sequential algorithm for producing L and U in stages. Stage k first computes the elements u_{k,j}, j ≥ k, of row k of U and the elements l_{i,k}, i > k, of column k of L. Then, it computes A^{k+1} in preparation for the next stage. Figure 14 also presents, on the right side, the functioning of the previous algorithm using just the elements of matrix A. Figure 13(b) presents the data that are necessary to compute a_{i,j}: besides its own value, a_{i,j} is updated using a value from the same row and another from the same column.
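For reference, a direct transcription of the in-place variant (the right-hand side of Figure 14) in Python could read as follows; it assumes, as the algorithm above does, that no pivoting is required.

def lu_in_place(a):
    """In-place LU decomposition without pivoting: after the call, a[i][j] holds
    l_{i,j} below the diagonal and u_{i,j} on and above the diagonal."""
    n = len(a)
    for k in range(n):                        # stage k
        for i in range(k + 1, n):
            a[i][k] = a[i][k] / a[k][k]       # column k of L (multipliers)
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]  # update the remaining sub-matrix
    return a

if __name__ == "__main__":
    A = [[4.0, 3.0], [6.0, 3.0]]
    print(lu_in_place([row[:] for row in A]))  # [[4.0, 3.0], [1.5, -1.5]]: L and U share A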
5.3.1 Modeling the Problem
This section explains how we modeled the sequential LU application as a BSP-based parallel one. Firstly, the bulk of the computational work in stage k of the sequential algorithm is the
modification of the matrix elements a_{i,j} with i, j ≥ k+1. Aiming to prevent communication of large amounts of data, the update a_{i,j} = a_{i,j} − a_{i,k}·a_{k,j} must be performed by the process that contains a_{i,j}. This implies that only elements of column k and row k of A need to be communicated in stage k in order to compute the new sub-matrix A^{k+1}. An important observation is that the modification of the elements in row A(i, k+1 : n−1) uses only one value of column k of A, namely a_{i,k}. The notation A(i, k+1 : n−1) denotes the cells of row i varying from column k+1 to n−1. If we distribute each matrix row over a limited set of N processes, then the communication of an element from column k can be restricted to a multicast to N processes. Similarly, the change of the elements in A(k+1 : n−1, j) uses only one value from row k of A, namely a_{k,j}. If we divide each column over a set of M processes, the communication of an element of row k can be restricted to a multicast to M processes.
We use a Cartesian scheme for the distribution of matrices. The square cyclic distribution is used since it is particularly suitable for matrix computations (Bisseling, 2004). Thus, it is natural to organize the processes by two-dimensional identifiers P(s, t) with 0 ≤ s < M and 0 ≤ t < N, where the number of processes is p = M·N. Figure 15 depicts a 6×6 matrix mapped to 6 processes, where M = 2 and N = 3. Assuming that M and N are factors of n, each process will store nc (number of cells) cells in memory (see Equation 10).
nc = (n/M) · (n/N) = n^2/(M·N)                                (10)

Fig. 15. Cartesian distribution of a matrix over 2×3 (M × N) processes
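The square cyclic distribution can be sketched as below; the owner mapping P(i mod M, j mod N) is the standard cyclic scheme of Bisseling (2004), and the helper names are ours.

def owner(i, j, M, N):
    """Cell (i, j) is owned by process P(i mod M, j mod N)."""
    return (i % M, j % N)

def row_multicast_targets(i, M, N):
    """Processes that may need an element of row i (at most N of them)."""
    return [(i % M, t) for t in range(N)]

def column_multicast_targets(j, M, N):
    """Processes that may need an element of column j (at most M of them)."""
    return [(s, j % N) for s in range(M)]

if __name__ == "__main__":
    M, N, n = 2, 3, 6                       # the 6x6 example of Figure 15
    cells_per_proc = (n // M) * (n // N)    # Equation 10: nc = (n/M)(n/N)
    print(cells_per_proc)                   # 6 cells per process
    print(owner(4, 5, M, N))                # cell (4, 5) lives on P(0, 2)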
A parallel algorithm uses data parallelism for the computations and the need-to-know principle to design the communication phase of each superstep. Following the concepts of BSP, all communication performed during a superstep is completed when the superstep finishes, and the data are available at the beginning of the next superstep (Bonorden, 2007). Considering this, we modeled our algorithm using three kinds of supersteps, which are explained in Table 8. In the first kind of superstep, the element a_{k,k} is passed to the processes that compute a_{i,k}. The computation of a_{i,k} is performed at the beginning of the second kind of superstep. This superstep is also responsible for sending the elements a_{i,k} and a_{k,j} to a_{i,j}: first, we pass the element a_{i,k}, k+1 ≤ i < n, to the N − 1 processes that execute on the respective row i; this kind of superstep also comprises the passing of a_{k,j}, k+1 ≤ j < n, to the M − 1 processes that execute on the respective column j. The third kind of superstep comprises the computation of a_{i,j}, the increment of k (next stage of the algorithm) and the transmission of a_{k,k} to the a_{i,k} elements (k+1 ≤ i < n). The application executes one superstep of type 1 and then interleaves supersteps of types 2 and 3. Thus, an n × n matrix triggers 2n+1 supersteps in our LU modeling.
Type of superstep and corresponding steps:

First:
    Step 1.1: k = 0
    Step 1.2: Pass the element a_{k,k} to the cells that will compute a_{i,k} (k+1 ≤ i < n)

Second:
    Step 2.1: Computation of a_{i,k} (k+1 ≤ i < n) by the cells that own them
    Step 2.2: For each i (k+1 ≤ i < n), pass the element a_{i,k} to the other a_{i,j} elements in the same row (k+1 ≤ j < n)
    Step 2.3: For each j (k+1 ≤ j < n), pass the element a_{k,j} to the other a_{i,j} elements in the same column (k+1 ≤ i < n)

Third:
    Step 3.1: For each i and j (k+1 ≤ i, j < n), calculate a_{i,j} as a_{i,j} − a_{i,k} · a_{k,j}
    Step 3.2: k = k + 1
    Step 3.3: Pass the element a_{k,k} to the cells that will compute a_{i,k} (k+1 ≤ i < n)

Table 8. Modeling three types of supersteps for the LU computation
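The resulting superstep structure can be sketched as follows; the bodies of the supersteps are only annotated, since the actual work and the multicasts are those described in Table 8 and in the text above.

def lu_superstep_schedule(n):
    """One superstep of type 1, then an alternation of types 2 and 3 for each stage k,
    giving 2n + 1 supersteps for an n x n matrix."""
    schedule = [("type 1", "k = 0; broadcast a_{k,k} to the owners of column k")]
    for k in range(n):
        schedule.append(("type 2", f"k={k}: compute column {k} of L, multicast row/column {k}"))
        schedule.append(("type 3", f"k={k}: update a_{{i,j}}, k <- k + 1, send new a_{{k,k}}"))
    return schedule

if __name__ == "__main__":
    sched = lu_superstep_schedule(4)
    print(len(sched))            # 2*4 + 1 = 9 supersteps
    for step in sched[:3]:
        print(step)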
We modeled the Cartesian distribution M × N in the following manner: 5×5, 10×5, 10×10 and 20×10 for 25, 50, 100 and 200 processes, respectively. Moreover, we applied the simulation over square matrices of orders 500, 1000, 2000 and 5000. Lastly, the tests were executed using α = 4, ω = 3, D = 0.5 and x = 80%.
5.3.2 Results and Discussions
Table 9 presents the results when evaluating the LU application. The tests with the first matrix size show the worst results. Firstly, the higher the number of processes, the worse the performance, as we can observe in scenario i. The reasons for the observed times are the overheads related to communication and synchronization. Secondly, MigBSP indicated that all migration attempts were not viable, owing to the low computation and communication loads when compared to the migration costs. Considering this, scenarios ii and iii have the same results.

Processes | 500×500 matrix: i, ii, iii | 1000×1000 matrix: i, ii, iii | 2000×2000 matrix: i, ii, iii
Table 9. First results when executing LU linked to MigBSP (time in seconds)

When testing a 1000×1000 matrix with 25 processes, the first rescheduling call does not cause migrations. After this call at superstep 4, the next one at superstep 11 indicates the migration of 5 processes from cluster Corisco. They were all transferred to cluster Aquario, which has the highest computation power. MigBSP does not indicate migrations in the subsequent calls. α always increases its value at each rescheduling call since the processes are balanced after the mentioned relocations. MigBSP obtained a performance gain of 12% with 25 processes when comparing scenarios i and iii. With the same matrix size and 50 processes, 6 processes from Frontal were migrated to Aquario at superstep 9. Although these migrations are profitable,
they do not provide stability to the system, and the processes remain unbalanced among the resources. Migrations are not viable in the next 3 calls, at supersteps 15, 21 and 27. After that, MigBSP launches our second adaptation of the rescheduling frequency in order to alleviate its impact, and α begins to grow until the end of the application. The tests with 50 processes obtained gains of just 5% with process migration. This is explained by the fact that the computational load per process is smaller in this configuration than in the one with 25 processes. In addition, the higher the superstep number, the smaller the computational load it requires. Therefore, the more advanced the execution, the smaller the gain from migrations. The tests with 100 and 200 processes do not present migrations because the forces that act in favor of migration are weaker than the Memory metric in all rescheduling calls.
The execution with a 2000×2000 matrix presents good results because the computational load is increased. We observed a gain of 15% with process relocation when testing 25 processes. All processes from cluster Corisco were migrated to Aquario in the first rescheduling call (at superstep 4). Thus, the application can profit from this relocation at its beginning, when it demands more computation. The time for concluding the LU application is reduced when passing from 25 to 50 processes, as we can see in scenario i. However, the use of MigBSP resulted in lower gains: scenario i presented 60.23s while scenario iii achieved 56.18s (a 9% profit). When considering 50 processes, 6 processes were transferred from cluster Frontal to Aquario at superstep 4. The next call occurs at superstep 9, where 16 processes from cluster Corisco were elected as migration candidates to Aquario. However, MigBSP indicated the migration of only 14 processes, since there were only 14 unoccupied processors in the target cluster.
Fig. 16. Performance graph with our three scenarios for a 5000×5000 matrix

We observed that the higher the matrix order, the better the results with process migration. Considering this, the evaluation of a 5000×5000 matrix can be seen in Figure 16. The simple movement of all processes from cluster Corisco to Aquario represented a gain of 19% when executing 25 processes. The tests with 50 processes obtained 852.31s and 723.64s for scenarios i and iii, respectively. The same migration behavior found in the tests with the 2000×2000 matrix was observed in scenario iii. However, the increase of matrix order turned this into a gain of 15% (order 5000) instead of 10% (order 2000). This analysis helps us to verify our previous hypothesis about performance gains when enlarging the matrix. Finally, the tests with 200 processes indicated the migration of 6 processes (p195 up to p200) from cluster Corisco to Aquario at superstep 4. Thus, the nodes that belong to Corisco execute just one BSP process each, while the nodes from Aquario begin to handle 2 processes. The remaining rescheduling calls indicate the processes from Labtec as those with the highest values of PM; however, their migrations are not considered profitable. The final execution with 200 processes achieved 460.85s and 450.33s for scenarios i and iii, respectively.
6 Conclusion
Scheduling schemes for multi-programmed parallel systems can be viewed at two levels (Frachtenberg & Schwiegelshohn, 2008). At the first level, processors are allocated to a job. At the second level, processes from a job are (re)scheduled using this pool of processors. MigBSP can be included in this last scheme, offering algorithms for rebalancing the load (BSP processes) among the resources during the application runtime. To the best of our knowledge, MigBSP is the first model to treat BSP process rescheduling with three metrics and with adaptations of the remapping frequency. These features are enabled by MigBSP at the middleware level, without changing the application code.
Considering the spectrum of the three tested applications, we can draw the following conclusions in a nutshell: (i) the larger the computing grain, the better the gain with process migration; (ii) MigBSP does not indicate the migration of those processes that have high migration costs when compared to their computation and communication loads; (iii) MigBSP presented a low overhead on application execution when migrations are not applied; (iv) our tests prioritize migrations to cluster Aquario, since it is the fastest one among the considered clusters and the tested applications are CPU-bound; and (v) MigBSP does not work with previous knowledge about the application. Considering this last topic, MigBSP indicates migrations even when the application is close to finishing. In this situation, these migrations bring an overhead, since the remaining time until application conclusion is too short to amortize their costs.
The results showed that MigBSP presented a low overhead on application execution. The computation of PM (Potential of Migration) as well as our efficient adaptations were responsible for this feature. PM considers processes and Sets (different sites), not performing all process-resource tests at the rescheduling moment. Meanwhile, our adaptations were crucial to make MigBSP a viable scheduler. Instead of performing the rescheduling call at fixed intervals, they manage a flexible interval between calls based on the behavior of the processes. The concepts of the adaptations are: (i) to postpone the rescheduling call if the system is stable (processes are balanced), or to make it more frequent otherwise; and (ii) to delay this call if a pattern of ω calls without migrations is observed.
7 References
Bhandarkar, M. A., Brunner, R. & Kale, L. V. (2000). Run-time support for adaptive load balancing, IPDPS '00: Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing, Springer-Verlag, London, UK, pp. 1152–1159.
Bisseling, R. H. (2004). Parallel Scientific Computation: A Structured Approach Using BSP and MPI, Oxford University Press.
Bonorden, O. (2007). Load balancing in the bulk-synchronous-parallel setting using process migrations, 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), IEEE, pp. 1–9.
Bonorden, O., Gehweiler, J. & auf der Heide, F. M. (2005). Load balancing strategies in a web computing environment, Proceedings of the International Conference on Parallel Processing and Applied Mathematics (PPAM), Poznan, Poland, pp. 839–846.
Casanova, H., Legrand, A. & Quinson, M. (2008). SimGrid: A generic framework for large-scale distributed experiments, Tenth International Conference on Computer Modeling and Simulation (UKSIM), IEEE Computer Society, Los Alamitos, CA, USA, pp. 126–131.
Casavant, T. L. & Kuhl, J. G. (1988). A taxonomy of scheduling in general-purpose distributed computing systems, IEEE Trans. Softw. Eng. 14(2): 141–154.