
Data Analysis Machine Learning and Applications Episode 2 Part 7 docx


Fig 1 Organization of the used multitree data structure

to find a node (sub-process) with a higher support in the branch below. This reduces the time needed to find the optimal solution significantly, as a good portion of the tree can be omitted from the traversal.

Algorithm 1 Branch & Bound algorithm for process optimization

procedure TRAVERSETREE(Ȳ)
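As an illustration of the pruning idea described above, the following is a minimal Python sketch and not a reconstruction of Algorithm 1. It assumes that a sub-process is a set of (influence variable, value) conditions, that its support is the number of matching measurements, and that a branch can be abandoned once its support drops below a threshold, because adding further conditions can only reduce the support. The names `best_sub_process`, `score`, `min_support` and `max_depth` are hypothetical.

```python
from statistics import pstdev


def best_sub_process(records, targets, variables, score, min_support, max_depth):
    """Depth-first search over condition sets with support-based pruning.

    records   : list of dicts mapping influence variable -> value
    targets   : list of measurement results, aligned with records
    variables : ordered list of influence variable names
    score     : function mapping a list of measurements to a quality value
                (higher is better); a stand-in for a capability index
    """
    best_score, best_conditions = float("-inf"), ()

    def visit(first_var, conditions, rows):
        nonlocal best_score, best_conditions
        if conditions:
            value = score([targets[i] for i in rows])
            if value > best_score:
                best_score, best_conditions = value, tuple(conditions)
        if len(conditions) == max_depth:
            return
        for index in range(first_var, len(variables)):
            variable = variables[index]
            for level in sorted({records[i][variable] for i in rows}):
                subset = [i for i in rows if records[i][variable] == level]
                # Bound: every node below this one has at most this support,
                # so the whole branch can be skipped.
                if len(subset) < min_support:
                    continue
                visit(index + 1, conditions + [(variable, level)], subset)

    visit(0, [], range(len(records)))
    return best_score, best_conditions
```

Here `score` could be, for instance, the negative standard deviation of the measurements; any index that rewards a more capable process would serve the same role.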

Verification

The optimum of the problem (3) can only be defined in statistical terms, as in practice the sample sets are small and the quality measures are only point estimators. Therefore, confidence intervals have to be used in order to get a more valid statement of the real value of the considered PCI. In the special case where the underlying data follow a normal distribution, it is straightforward to construct a confidence interval. As the distribution of C_p/Ĉ_p (Ĉ_p denotes the estimator of C_p) is known, a (1 − α)% confidence interval for C_p is given by

\[ C(X) = \left[ \hat{C}_p \sqrt{\frac{\chi^2_{n-1;\,\alpha/2}}{n-1}},\; \hat{C}_p \sqrt{\frac{\chi^2_{n-1;\,1-\alpha/2}}{n-1}} \right] \]

For the other parametric basic indices there exists, in general, no analytical solution, as they all have a non-central χ² distribution. Different numerical approximations can be found in the literature for C_pm, C_pk and C_pmk (see Balamurali and Kalyanasundaram (2002) and Bissell (1990)).

If no assumption can be made about the distribution of the data, computer-based statistical methods such as the Bootstrap method are used to calculate confidence intervals. In Balamurali and Kalyanasundaram (2002), the authors present three different methods for calculating confidence intervals and a simulation study. As a result, the method called BCa-Method outperformed the other two methods and is therefore used in our applications for assigning confidence intervals for the non-parametric basic PCIs, as described in (3). For the Empirical Capability Index E_ci, a simulation study showed that the Bootstrap-Standard-Method, as defined in Balamurali and Kalyanasundaram (2002), performed best. A (1 − α)% confidence interval for E_ci can be obtained by

\[ C(X) = \left[ \hat{E}_{ci} - \Phi^{-1}(1-\alpha)\,\sigma_B,\; \hat{E}_{ci} + \Phi^{-1}(1-\alpha)\,\sigma_B \right] \qquad (7) \]

where Ê_ci denotes an estimator for E_ci, σ_B the Bootstrap standard deviation and Φ^{-1} the inverse standard normal distribution function.

As the results of the introduced algorithm are based on sample sets, it is important to verify the soundness of the solutions found. Therefore, the sample set to analyze is randomly divided into two disjoint sets: a training set and a test set. A set of possible optimal sub-processes is generated by applying the described algorithm to the training set and using the referenced Bootstrap methods to calculate confidence intervals. In a second step, the root cause analysis algorithm is applied to the test set. The final output is a verified sub-process.
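The verification step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the interval follows the Bootstrap standard form of equation (7), the statistic is passed in as a function because the definition of E_ci is not given here, and `cp_hat` uses the textbook estimator C_p = (USL − LSL) / (6s) as a stand-in. The function names, the resample count and the seeds are hypothetical.

```python
import random
from statistics import NormalDist, stdev


def cp_hat(sample, lsl, usl):
    """Point estimate of C_p = (USL - LSL) / (6 * s)."""
    return (usl - lsl) / (6 * stdev(sample))


def split_train_test(sample, test_fraction=0.5, seed=0):
    """Randomly divide the sample into two disjoint sets."""
    rng = random.Random(seed)
    shuffled = list(sample)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def bootstrap_standard_ci(sample, statistic, alpha=0.05, n_boot=1000, seed=0):
    """Return (lower, estimate, upper) as estimate -/+ z * sigma_B.

    z = Phi^{-1}(1 - alpha), as written in equation (7);
    sigma_B is the standard deviation of the bootstrap replicates.
    """
    rng = random.Random(seed)
    estimate = statistic(sample)
    replicates = [
        statistic(rng.choices(sample, k=len(sample))) for _ in range(n_boot)
    ]
    sigma_b = stdev(replicates)
    z = NormalDist().inv_cdf(1 - alpha)
    return estimate - z * sigma_b, estimate, estimate + z * sigma_b
```

A sub-process found on the training set would then be re-evaluated on the test set with the same statistic, and accepted only if the two results are compatible.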

3 Computational results

A proof of concept was performed using data from a foundry plant and engine manufacturing in the premium automotive industry. The 32 analyzed sample sets comprised measurement results describing geometric characteristics, like the position of drill holes or the surface texture of the produced products, and the corresponding influence sets. The data sets consist of 4 to 14 different values, specifying for example a particular machine number or a worker's name. An additional data set, recording the results of a cylinder twist measurement with 76 influence variables, was used to evaluate the algorithm for numerical parameter sets. Each of the analyzed data sets has at least 500 and at most 1000 measurement results.

The evaluation was performed for the non-parametric C_p and the empirical capability index E_ci using the described Branch and Bound principle. Additionally, a combinatorial search for the optimal solution was carried out to demonstrate the efficiency of our approach. The reduction of computational time achieved by the Branch and Bound principle amounted to two orders of magnitude in comparison to the combinatorial search, as can be seen in Fig 2. On average, the Branch and Bound method outperformed the combinatorial search by a factor of 230. The latter took on average 23 minutes to evaluate the available data sets, whereas Branch and Bound reduced the computing time to an average of only 5.7 seconds for the non-parametric C_p and 7.2 seconds for E_ci. The search for an optimal solution was performed to a depth of 4, which means that no sub-process has more than 4 different influence variables. A higher depth level did not yield any other results, as the support of the sub-processes diminishes with an increasing number of influence variables. Obviously, the computational time for finding the optimal sub-process increases with the number of influence variables and their values. This explains the significant jump in the combinatorial computing time, as the first 12 sample sets are made up of only 4 influence variables, whereas the others consist of up to 17 different influence variables.

Fig 2 Computational time for combinatorial search vs Branch and Bound

As the number of influence parameters of the numerical data set was significantly larger than that of the other data sets, it took about 2 minutes to find the optimal solution. The combinatorial search was not performed, as 76 influence variables, each with 4 values, would have taken too long.
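To give a feeling for why the exhaustive search is infeasible here, the number of candidate sub-processes can be counted directly. The sketch below assumes that a sub-process fixes exactly one value for each of k chosen influence variables and that the search is limited to a depth of 4, as for the other data sets; the function name is hypothetical.

```python
from math import comb


def count_sub_processes(n_variables, n_values, max_depth):
    """Number of condition sets with 1..max_depth distinct variables,
    where each chosen variable is fixed to one of n_values values."""
    return sum(
        comb(n_variables, k) * n_values ** k for k in range(1, max_depth + 1)
    )


# 76 influence variables with 4 values each, searched to depth 4
print(count_sub_processes(76, 4, 4))  # 332986704
```

Under these assumptions the combinatorial search would have to evaluate roughly 3.3 × 10^8 sub-processes, compared with 624 for a data set with only 4 such variables.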

4 Conclusion

In this paper we have presented a root cause analysis algorithm for process optimization, with the goal of identifying those process parameters having a severe impact on the quality of a manufacturing process. The basic idea was to transform the search for those quality drivers into an optimization problem and to identify optimal parameter subsets using Branch and Bound techniques. This method significantly reduces the computational time needed to identify optimal solutions, as the computational results show. In addition, a new class of convex process indices was introduced and a particular specimen, the process capability index E_ci, was defined. Since the search for quality drivers in quality management is crucial to industrial practice, the presented algorithm and the new class of indices may be useful for a broad scope of quality and reliability problems.

References

BALAMURALI, S. and KALYANASUNDARAM, M. (2002): Bootstrap lower confidence limits for the process capability indices Cp, Cpk and Cpm. International Journal of Quality & Reliability Management, 19, 1088–1097.

BISSELL, A. (1990): How Reliable is Your Capability Index? Applied Statistics, 39, 331–340.

KOTZ, S. and JOHNSON, N. (2002): Process Capability Indices – A Review, 1992–2000. Journal of Quality Technology, 34, 2–53.

PEARN, W. and CHEN, K. (1997): Capability indices for non-normal distributions with an application in electrolytic capacitor manufacturing. Microelectronics Reliability, 37, 1853–1858.

VÄNNMANN, K. (1995): A Unified Approach to Capability Indices. Statistica Sinica, 5, 805–820.


Configurative Reference Modelling

Ralf Knackstedt and Armin Stein

European Research Center for Information Systems

{ralf.knackstedt, armin.stein}@ercis.uni-muenster.de

Abstract. The manual customisation of reference models to suit special purposes is an exhaustive task that has to be accomplished thoroughly to preserve, explicate and extend the inherent intention. This can be facilitated by the usage of automatisms like those provided by the Configurative Reference Modelling approach. To this end, the reference model has to be enriched by data describing for which scenario a certain element is relevant. By assigning this data to application contexts, it builds a taxonomy. This paper aims to illustrate the advantage of the usage of this taxonomy during three relevant phases of Configurative Reference Modelling: Project Aim Definition, Construction and Configuration of the configurable reference model.

1 Introduction

Reference information models – in this context simply called reference models – give recommendations for the structuring of information systems as best or common practices and can be used as a starting basis for the development of application-specific information system models. The better the reference models are matched with the special features of individual application contexts, the bigger the benefit of reference model use. Configurable reference models contain rules that describe how different application-specific variants are derived. Each of these rules consists of a condition and an implication. Each condition describes one application context of the reference model. The respective implication determines the relevant model variant. Configuration parameters are used for describing the application contexts. Their specification forms a taxonomy. Based upon a procedure model, this paper highlights the usefulness of taxonomies in the context of Configurative Reference Modelling. The paper is structured as follows: First, the Configurative Reference Modelling approach and its procedure model are described. Afterwards, the usefulness of the application of taxonomies is shown for the respective phases. An outlook on future research areas concludes the paper.


2 Configurative Reference Modelling and the application of taxonomies

2.1 Configurative Reference Modelling

Reference models are representations of knowledge recorded by domain experts to be used as guidelines for everyday business as well as for further research. Their purpose is to structure and store knowledge and give recommendations like best or common practices. They should be of general validity in terms of being applicable for more than one user (see Schuette (1998); vom Brocke (2003); Fettke, Loos (2004)). Currently 38 of them have been clustered and categorised, spanning domains like logistics, supply chain management, production planning and control or retail (see Braun, Esswein (2006)).

General applicability is a necessary requirement for a model to be characterised as a reference model, as it has to grant the possibility of being adopted by more than one user or company. Thus, the reference model has to include information about different business models, different functional areas or different purposes for its usage. A reference model for retail companies might have to cover economic levels like Retail or Wholesale, trading levels like Inland trade or Foreign trade as well as functional areas like Sales, Production Planning and Control or Human Resource Management. While this constitutes the general applicability for a certain domain, one particular company usually needs just one suitable instance of this reference model, for example Retail/Inland Trade, leaving the remaining information dispensable. This yields the problem that the perceived demand for information of each individual will hardly be met. The information delivered – in terms of models of different types which might consist of different element types and hold different element instances – might either be too little or too extensive, hence the addressee will be insufficiently supplied with information on the one hand or overburdened on the other hand. Consequently, a person requiring the model for the purpose of developing the database of a company might not want to be burdened with models of the technique Event-driven Process Chain (EPC), whose purpose is to describe processes, but rather with Entity Relationship Models (ERM), which are used to describe data structures. To compensate for this in a conventional manner, a complex manual customisation of the reference model is necessary to meet the addressee's demand. Another implication concerns the maintenance of the reference model. Every time changes are committed to the reference model, every instance has to be manually updated as well.

This is where Configurable Reference Models come into operation. The basic idea is to attach parameters to elements of the integrated reference model in advance, defining the contexts to which these elements are relevant (see e.g. Knackstedt (2006)). With reference to the example given above, this means that certain elements of the model might be relevant for just one of the economic levels – retail or wholesale – or for both of them. The user eventually selects the best suited parameters for his purpose and the respective configured model is generated automatically. This leads to the conclusion that the lifecycle of a configurable reference model can be divided into two parts called Development and Usage (see Schlagheck (2000)).


The first part – relevant for the reference model developer – consists of the phases Project Aim Definition, Model Technique Definition, Model Construction and Evaluation, whereas the second one – relevant for the user – includes the phases Project Aim Definition, Search and Selection of existing and suitable reference models and Model Configuration. The configured model can be further adapted to satisfy individual needs (see Becker et al. (2004)). Several phases can be identified in which the application of taxonomies can be of value, especially Project Aim Definition and Model Construction (for the developer) and Model Configuration (for the user). Fig 1 gives an overview of the phases, where the ones that will be discussed in detail are solid and the ones not relevant here are greyed out. The output of both Development and Usage is printed in italics.

Fig 1 Development and Usage of Configurable Reference Models

2.2 Project aim definition

During the first phase, Project Aim Definition, the developers have to agree on the purpose of the reference model to be built. They have to decide for which domain the model should be used, which business models should be supported, which functional areas should be integrated to support the distribution for different perspectives and so on. To structure these parameters, a morphological box has proven to be applicable. First, all instances for each possible characteristic have to be listed. By shading the relevant parameters for the reference model, the developers commit themselves to one common project aim and reduce the given complexity. Thus, the emerging morphological box constitutes a taxonomy, implying the variants included in the integrated configurative reference model (see fig 2; Mertens, Lohmann (2000)). By generating this taxonomy, the developers become aware of all possible included variants, thus getting a better overview of the to-be state of the model. One special variant of the model will later be generated when the user chooses one or a set of the parameters. The choice of parameters should be supported by an underlying ontology that can be used throughout both Development and Usage (see Knackstedt et al. (2006)). The developers have to decide whether or not dependencies between parameters exist. In some cases, the choice of one specific parameter within one specific characteristic determines the necessity of another parameter within another characteristic. For example, the developers might decide that the choice of ContactOrientation=MailOrder determines the choice of PurchaseInitiationThrough=AND(Internet;Letter/Fax).

Fig 2 Example of a morphological box, used as taxonomy (Becker et al. (2001))

2.3 Construction

During the Model Construction phase, the configurable reference model has to be developed with regard to the decisions made during the preceding phase, Project Aim Definition. The example in fig 3 illustrates an EPC regarding the payment of a bill, distinguishing whether the bill originates from a national or an international source. If the origin of the bill is national, it can be paid immediately; otherwise it has to be cross-checked by the international auditing. This scenario can only take place if both instances of the characteristic TradingLevel, namely InlandTrade and ForeignTrade, are chosen. If all clients of a company are settled abroad or (in the meaning of an exclusive or) all of them are inland, the check for the origin is not necessary. The cross-check with the international auditing only has to take place if the bill comes from abroad. To store this information in the model, the respective parameters are attached to the respective model elements in the form of a term that can later be evaluated to true or false. Only if the term evaluates to true, or if there is no term attached to an element, may the respective element remain in the configured model. Thus, for example, the function check for origin stays if the term TradingLevel=AND(Foreign;Inland) is true, which happens if both parameters are selected. If only one is selected, the term evaluates to false and the element will be removed from the model.
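The evaluation of attached terms can be illustrated with a small Python sketch. It is only a minimal model of the rule stated above (an element stays if its term is true or if it has no term) and does not reflect the data model of any actual tool. The element names follow the bill payment example; the term attached to the cross-check element and the helper names `and_term` and `configure` are assumptions made for illustration.

```python
def and_term(characteristic, *parameters):
    """Term that is true only if ALL listed parameters are selected
    for the given characteristic, e.g. TradingLevel=AND(Foreign;Inland)."""
    required = frozenset(parameters)
    return lambda selection: required <= selection.get(characteristic, frozenset())


def configure(elements, selection):
    """Keep the elements whose term is true or that carry no term at all."""
    return [name for name, term in elements if term is None or term(selection)]


# Hypothetical annotation of three EPC elements
ELEMENTS = [
    ("check for origin", and_term("TradingLevel", "Foreign", "Inland")),
    ("cross-check by international auditing", and_term("TradingLevel", "Foreign")),
    ("pay bill", None),  # no term attached: always kept
]
```

With both trading levels selected all three elements remain, whereas a selection of Inland alone leaves only the element without a term.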


Fig 3 Annotated parameters to elements, resulting model variants

To specify these terms, which can get complex if many characteristics are used, a term editor application has been developed, which enables the user to attach them to the relevant elements. Here again, the ontology can support the developer by automatically testing for correctness and reasonableness of dependent parameters (see Knackstedt et al. (2006)). In contrast to dependencies, exclusions take into account that under certain circumstances parameters may not be chosen together. This minimises the risk of defective modelling and raises the consistency level of the configurable reference model. In the example given above, if the developer selects SalesContactForm=VendingMachine, the parameter Beneficiary may not be InvestmentGoodsTrade, as investment goods can hardly be bought via a vending machine. Thus, the occurrence of both statements concatenated with a logical AND is not allowed. The same has to be taken into account when evaluating dependencies: If, as stated above, ContactOrientation=MailOrder determines the choice of PurchaseInitiationThrough=AND(Internet;Letter/Fax), the same statement may not occur with a preceding NOT. Again, the previously generated taxonomy can support the developer by structuring the included variants.
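Dependencies and exclusions of this kind can be checked mechanically. The following Python sketch assumes a very simple representation, namely a selection as a set of (characteristic, parameter) pairs, and is not meant to describe how the ontology is actually implemented. The function name `check_selection` and the rule encoding are hypothetical; the two rules are the examples from the text.

```python
def check_selection(selection, dependencies, exclusions):
    """Return a list of rule violations for a set of selected parameters.

    selection    : set of (characteristic, parameter) pairs
    dependencies : list of (premise, required) with required a set of pairs
    exclusions   : list of (first, second) pairs that may not occur together
    """
    problems = []
    for premise, required in dependencies:
        if premise in selection and not required <= selection:
            missing = sorted(required - selection)
            problems.append(f"{premise} requires {missing}")
    for first, second in exclusions:
        if first in selection and second in selection:
            problems.append(f"{first} excludes {second}")
    return problems


DEPENDENCIES = [
    (
        ("ContactOrientation", "MailOrder"),
        {
            ("PurchaseInitiationThrough", "Internet"),
            ("PurchaseInitiationThrough", "Letter/Fax"),
        },
    ),
]
EXCLUSIONS = [
    (("SalesContactForm", "VendingMachine"), ("Beneficiary", "InvestmentGoodsTrade")),
]
```

An empty result means that the selection respects all rules; each entry otherwise names one violated dependency or exclusion.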

2.4 Configuration

The Usage phase of a configurable reference model starts independently of its development. During the Project Aim Definition phase, the potential user defines the parameters to determine which reference model best meets his needs. He has to search for it during the Search and Selection phase. Once the user has selected a certain configurable reference model, he uses its taxonomy to pick the parameters relevant to his purpose. By automatically including dependent parameters, the ontology can be of assistance in the same way as before, ensuring that the mistakes made by the user are reduced to a minimum (see Knackstedt et al. (2006)). For each parameter – or set of parameters – a certain model variant is created. These variants have to be differentiated by the aim of the configuration. On the one hand, the user might want to configure a model that cannot be further adapted. This happens if a maximum of one parameter per characteristic is chosen. In this case, the ontology has to consider dependencies as well as exclusions. On the other hand, if the user decides to configure towards a model variant that should be configured again, exclusions may not be considered. Both possibilities have to be covered by the ontology. Furthermore, a validation should cross-check against the ontology that no terms exist that always evaluate to false. If an element is removed in every configuration scenario, it should not have been integrated into the reference model in the first place. Thus, the taxonomy can assist the user during the configuration phase by offering a set of parameters to choose from. Combined with an underlying ontology, the possibility of making mistakes when using the taxonomy during the model adaptation is reduced to a minimum.
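The validation that no term always evaluates to false can be sketched as a brute-force check over all possible selections. The Python example below is only an illustration under simplifying assumptions: every characteristic receives a non-empty subset of its parameters, terms are plain Python predicates, and dependencies and exclusions are ignored. The enumeration grows exponentially with the size of the taxonomy, so this approach is suitable for small taxonomies only. All names are hypothetical.

```python
from itertools import combinations, product


def non_empty_subsets(parameters):
    """Yield every non-empty subset of the given parameters."""
    ordered = sorted(parameters)
    for size in range(1, len(ordered) + 1):
        for combo in combinations(ordered, size):
            yield frozenset(combo)


def all_selections(taxonomy):
    """Yield every selection: one non-empty parameter subset per characteristic."""
    names = sorted(taxonomy)
    subsets = [list(non_empty_subsets(taxonomy[name])) for name in names]
    for choice in product(*subsets):
        yield dict(zip(names, choice))


def always_false_terms(terms, taxonomy):
    """Return the labels of terms that are false for every possible selection."""
    return [
        label
        for label, term in terms.items()
        if not any(term(selection) for selection in all_selections(taxonomy))
    ]
```

Any element whose term is reported by `always_false_terms` would be removed in every configuration and is therefore a candidate for removal from the reference model itself.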

3 Conclusion

Like the ontology, the taxonomy used as a basic element throughout the phases of Configurative Reference Modelling has to meet certain demands. Most importantly, the developers have to carefully select the constituting characteristics and associated parameters. It has to be possible for the user to distinguish between several options, so that he can make a clear decision to configure the model towards the variant relevant for his purpose. This means that each parameter has to be understandable and be delimited from the others, which – for example – can be arranged by supplying a manual or guide. Moreover, the parameters may be neither too abstract nor too detailed. The taxonomy can be of use during the three relevant phases. As mentioned before, the user has to be assisted in the usage of the taxonomy by automatically including or excluding parameters as defined by the ontology. Furthermore, only such parameters should be chosen that have an effect on the model that is commensurate with the effort necessary to identify them. Parameters that have no effect at all or are not used should be removed as well, to decrease the complexity for both the developer and the user. If the choice of a parameter results in the removal of only one element and its identification takes a very long time, it should be removed from the taxonomy because of its little effect at high cost. Thus, the way the adaptation process is supported by the taxonomy strongly depends on the associated ontology.


4 Outlook

The resulting effect of the selection of one parameter to configure the model shows its relevance and can be measured either by the quantity or by the importance of the elements that are removed. Each parameter can be associated with a certain cost that emerges due to the time it takes the user to identify it. Thus, cheap parameters are easy to identify and have a huge effect once selected. Expensive parameters, in contrast, are hard to identify and have little effect on the model. Further research should first try to benchmark which combinations of parameters of a certain reference model are chosen most often. In doing so, the developer has the chance to concentrate on the evolution of these parts of the reference model. Second, it should be possible to identify cheap parameters by either running simulations on reference models, measuring the effect a parameter has – even in combination with other parameters – or by auditing the behavior of reference model users, which is feasible only in a limited way due to the small distribution of configurable reference models. Third, configured models should be rated with costs, so that cheap variants can be identified and, the other way round, the responsible parameters can be identified. To sum up, an objective function should be developed that enables the calculation of the costs for the configuration of a certain model variant in advance, given the selected parameters as input. It should have the form

\[ C(MV) = \sum_{k=1}^{n} \frac{C(P_k)}{R(P_k)} \]

with C(MV) being the cost function of a certain model variant derived from the reference model by using n parameters, C(P_k) being the cost function of a single parameter and R(P_k) being a function weighting the relevance of a single parameter P_k, which is used for the configuration of the respective model variant. Furthermore, the usefulness of the application of the taxonomy has to be evaluated by empirical studies in everyday business. This will be realised for the configuration phase by integrating consultancies into our research and providing them with a taxonomy for a certain domain. With the application of supporting software tools, we hope that the adoption process of the reference model can be facilitated.
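The proposed objective function can be written down directly. The following Python sketch assumes that the cost and the relevance weight of every selected parameter are already known as positive numbers, and it reads the formula as a sum of cost-to-relevance ratios; how these values would be measured is exactly the open research question raised above. The function name and the example values are hypothetical.

```python
def configuration_cost(parameters):
    """Sum of C(P_k) / R(P_k) over all selected parameters.

    parameters : dict mapping parameter name -> (cost, relevance)
    """
    total = 0.0
    for name, (cost, relevance) in parameters.items():
        if relevance <= 0:
            raise ValueError(f"relevance of {name!r} must be positive")
        total += cost / relevance
    return total


# Hypothetical values: cost in minutes, relevance as a unitless weight
selected = {"Retail": (2.0, 4.0), "InlandTrade": (3.0, 1.5)}
print(configuration_cost(selected))  # 2.5
```

A parameter that is cheap to identify and highly relevant contributes little to the total, which matches the notion of cheap parameters used in the text.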

References

BECKER, J., DELFMANN, P. and KNACKSTEDT, R. (2004): Konstruktion von Referenzmodellierungssprachen – Ein Ordnungsrahmen zur Spezifikation von Adaptionsmechanismen fuer Informationsmodelle. Wirtschaftsinformatik, 46, 4, 251–264.

BECKER, J., UHR, W. and VERING, O. (2001): Retail Information Systems Based on SAP Products. Springer Verlag, Berlin, Heidelberg, New York.

BRAUN, R. and ESSWEIN, W. (2006): Classification of Reference Models. In: Advances in Data Analysis: Proceedings of the 30th Annual Conference of The Gesellschaft fuer Klassifikation e.V., Freie Universitaet Berlin, March 8–10, 2006.

DELFMANN, P., JANIESCH, C., KNACKSTEDT, R., RIEKE, T. and SEIDEL, S. (2006): Towards Tool Support for Configurative Reference Modelling – Experiences from a Meta Modeling Teaching Case. In: Proceedings of the 2nd Workshop on Meta-Modelling and Ontologies (WoMM 2006). Lecture Notes in Informatics. Karlsruhe, Germany, 61–83.

FETTKE, P. and LOOS, P. (2004): Referenzmodellierungsforschung. Wirtschaftsinformatik, 46, 5, 331–340.

KNACKSTEDT, R. (2006): Fachkonzeptionelle Referenzmodellierung einer Managementunterstuetzung mit quantitativen und qualitativen Daten. Methodische Konzepte zur Konstruktion und Anwendung. Logos-Verlag, Berlin.

KNACKSTEDT, R., SEIDEL, S. and JANIESCH, C. (2006): Konfigurative Referenzmodellierung zur Fachkonzeption von Data-Warehouse-Systemen mit dem H2-Toolset. In: J. Schelp, R. Winter, U. Frank, B. Rieger, K. Turowski (Hrsg.): Integration, Informationslogistik und Architektur. DW2006, 21–22 Sept 2006, Friedrichshafen. Lecture Notes in Informatics. Bonn, Germany, 61–81.

MERTENS, P. and LOHMANN, M. (2000): Branche oder Betriebstyp als Klassifikationskriterien fuer die Standardsoftware der Zukunft? Erste Ueberlegungen, wie kuenftig betriebswirtschaftliche Standardsoftware entstehen koennte. In: F. Bodendorf, M. Grauer (Hrsg.): Verbundtagung Wirtschaftsinformatik 2000. Shaker Verlag, Aachen, 110–135.

SCHLAGHECK, B. (2000): Objektorientierte Referenzmodelle fuer das Prozess- und Projektcontrolling. Grundlagen – Konstruktion – Anwendungsmoeglichkeiten. Deutscher
