Taxonomies in software engineering: A systematic mapping study and a revised taxonomy development method



ARTICLE IN PRESS

Information and Software Technology 000 (2017) 1–17

Contents lists available at ScienceDirect

journal homepage: www.elsevier.com/locate/infsof

Muhammad Usman a,∗, Ricardo Britto a, Jürgen Börstler a, Emilia Mendes b

a Department of Software Engineering (DIPT), Blekinge Institute of Technology (BTH), Karlskrona, 371 79, Sweden

b Department of Computer Science and Engineering (DIDD), Blekinge Institute of Technology (BTH), Karlskrona, 371 79, Sweden

Article info

Article history:

Received 29 September 2015

Revised 13 January 2017

Accepted 14 January 2017

Available online xxx

Keywords:

Taxonomy

Classiﬁcation

Software engineering

Systematic mapping study

Abstract

Context: Software Engineering (SE) is an evolving discipline with new subareas being continuously developed and added. To structure and better understand the SE body of knowledge, taxonomies have been proposed in all SE knowledge areas.

Objective: The objective of this paper is to characterize the state-of-the-art research on SE taxonomies.

Method: A systematic mapping study was conducted, based on 270 primary studies.

Results: An increasing number of SE taxonomies have been published since 2000 in a broad range of venues, including the top SE journals and conferences. The majority of taxonomies can be grouped into the following SWEBOK knowledge areas: construction (19.55%), design (19.55%), requirements (15.50%) and maintenance (11.81%). Illustration (45.76%) is the most frequently used approach for taxonomy validation. Hierarchy (53.14%) and faceted analysis (39.48%) are the most frequently used classification structures. Most taxonomies rely on qualitative procedures to classify subject matter instances, but in most cases (86.53%) these procedures are not described in sufficient detail. The majority of the taxonomies (97%) target unique subject matters, and many taxonomy papers are cited frequently. Most SE taxonomies are designed in an ad-hoc way. To address this issue, we have revised an existing method for developing taxonomies in a more systematic way.

Conclusion: There is a strong interest in taxonomies in SE, but few taxonomies are extended or revised. Taxonomy design decisions regarding the used classification structures, procedures and descriptive bases are usually not well described and motivated.

1 Introduction

In science and engineering, a systematic description and organization of the investigated subjects helps to advance the knowledge in this field [1]. This organization can be achieved through the classification of the existing knowledge. Knowledge classification has supported the maturation of different knowledge fields mainly in four ways:

• Classification of the objects of a knowledge field provides a common terminology, which eases the sharing of knowledge [1–3].
• Classification can provide a better understanding of the interrelationships between the objects of a knowledge field [1].
• Classification can help to identify gaps in a knowledge field [1–3].
• Classification can support decision-making processes [1].

Summarizing, classification can support researchers and practitioners in generalizing, communicating and applying the findings of a knowledge field [4].

Software Engineering (SE) is a comprehensive and diverse knowledge field that embraces a myriad of different research subareas. The knowledge within many subareas is already classified, in particular by means of taxonomies [5–9]. According to the Oxford English Dictionary [10], a taxonomy is “a scheme of classification”. A taxonomy allows for the description of terms and their relationships in the context of a knowledge area. The concept of taxonomy was originally proposed by Carolus Linnaeus [11] to group and classify organisms by using a fixed number of hierarchical levels. Nowadays, different classification structures (e.g., hierarchy, tree and faceted analysis [12]) have been used to construct taxonomies in different knowledge fields, such as Education [13], Psychology [14] and Computer Science [15].

∗ Corresponding author.
E-mail addresses: muhammad.usman@bth.se (M. Usman), ricardo.britto@bth.se (R. Britto), jurgen.borstler@bth.se (J. Börstler), emilia.mendes@bth.se (E. Mendes).

http://dx.doi.org/10.1016/j.infsof.2017.01.006

Please cite this article as: M. Usman et al., Taxonomies in software engineering: A Systematic mapping study and a revised taxonomy

Taxonomies have contributed to the maturation of the SE knowledge field. Nevertheless, just as the taxonomy proposed by Carolus Linnaeus keeps being extended [16], SE taxonomies are expected to evolve over time to incorporate new knowledge. In addition, due to the wide spectrum of SE knowledge, there is still a need to classify the knowledge in many SE subareas.

Although many SE taxonomies have been proposed in the literature, it appears that taxonomies have been designed or evolved without following particular patterns, guidelines or processes. A better understanding of how taxonomies have been designed and applied in SE could be very useful for the development of new taxonomies and the evolution of existing ones.

To the best of our knowledge, no systematic mapping study or systematic literature review has been conducted to identify and analyze the state-of-the-art of taxonomies in SE. In this paper, we describe a systematic mapping study [17,18] aiming to characterize the state-of-the-art research on SE taxonomies.

The main contribution of this paper is a characterization of the state-of-the-art of taxonomies in SE. Our results also show that most taxonomies are developed in an ad-hoc way. We therefore revised a taxonomy development method in the light of the findings of this mapping study, our own experience and literature from other research fields with more maturity regarding taxonomies (e.g., psychology and computer science).

The remainder of this paper is organized as follows: Section 2 describes the related background. Section 3 presents the employed research methodology. The current state-of-the-art on taxonomies in SE, as well as the validity threats associated with the mapping study, are presented in Section 4. In Section 5, we present a revised method for developing SE taxonomies, along with an illustration of the revised method and its limitations. Finally, our conclusions and view on future work are provided in Section 6.

2 Background

In this section, we discuss important aspects related to taxonomy design that serve as motivation for the research questions described in Section 3.

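Taxonomy is neither a trivial nor a commonly used term. According to the most cited English dictionaries, a taxonomy is mainly a classification mechanism:
• Cambridge dictionary¹: “A system for naming and organizing things, especially plants and animals, into groups that share similar qualities”.
• Merriam-Webster dictionary²: “Orderly classification of plants and animals according to their presumed natural relationships”.
• Oxford dictionaries³: “The classification of something, especially organisms” or “A scheme of classification”.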

Since taxonomy is mainly defined as a classification system, one of the main purposes of developing a taxonomy should be to classify something.

2.2 Subject matter

The first step in the design of a new taxonomy is to clearly define the units of classification. In software engineering, these could be requirements, design patterns, architectural views, methods and techniques, defects, etc. This requires a thorough understanding of the subject matter to be able to define clear taxonomy classes or categories that are commonly accepted within the field [19,20].

¹ www.dictionary.cambridge.org
² www.merriam-webster.com
³ www.oxforddictionaries.com

2.3 Descriptive bases / terminology

Once the subject matter is clearly defined or an existing definition is adopted, the descriptive terms, which can be used to describe and differentiate subject matter instances, must also be specified. An appropriate description of this basis for classification is important to perform the comparison of subject matter instances. Descriptive bases can also be viewed as a set of attributes that can be used for the classification of the subject matter instances [19,20].

2.4 Classiﬁcation procedure

Classification procedures define how subject matter instances (e.g., defects) are systematically assigned to classes or categories. A taxonomy's purpose, descriptive bases and classification procedures are related to and dependent on each other. Depending upon the measurement system used, the classification procedure can be qualitative or quantitative. Qualitative classification procedures are based on nominal scales. In qualitative classification systems, the relationship between the classes cannot be determined. Quantitative classification procedures, on the other hand, are based on numerical scales [20].
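To make the distinction concrete, the following Python sketch contrasts the two kinds of procedure for the defect example. The class names, the 0–10 severity scale and the band boundaries are hypothetical choices made for illustration only; they are not taken from [20].

```python
from enum import Enum


class DefectOrigin(Enum):
    """Nominal classes: no order or distance is defined between them."""
    REQUIREMENTS = "requirements"
    DESIGN = "design"
    CODE = "code"


def classify_qualitatively(origin_label: str) -> DefectOrigin:
    """Qualitative procedure: match a nominal attribute value to a class."""
    return DefectOrigin(origin_label.strip().lower())


# Hypothetical class boundaries on a numerical (0-10) severity scale.
SEVERITY_BANDS = [(3.0, "low"), (7.0, "medium"), (10.0, "high")]


def classify_quantitatively(severity_score: float) -> str:
    """Quantitative procedure: place a measured value into a numeric band."""
    if not 0.0 <= severity_score <= 10.0:
        raise ValueError("severity score must lie in [0, 10]")
    for upper_bound, label in SEVERITY_BANDS:
        if severity_score <= upper_bound:
            return label
    raise AssertionError("unreachable: the bands cover [0, 10]")
```

For example, `classify_qualitatively("Design")` yields `DefectOrigin.DESIGN`, whereas `classify_quantitatively(8.2)` yields `"high"`. Only the second procedure relies on the ordering of the underlying scale.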

2.5 Classiﬁcation structure

As aforementioned, a taxonomy is mainly a classification mechanism. According to Rowley and Farrow [21], there are two main approaches to classification: enumerative and faceted. In enumerative classification, all classes are fixed, making a classification scheme intuitive and easy to apply. It is, however, difficult to enumerate all classes in immature or evolving domains. In faceted classification, aspects of classes are described that can be combined and extended. Kwasnik [12] describes four main approaches to structure a classification scheme (classification structures): hierarchy, tree, paradigm and faceted analysis.

Hierarchy [12] leads to taxonomies with a single top class that “includes” all sub- and sub-sub-classes, i.e., a hierarchical relationship with inheritance (“is-a” relationship). Consider, for example, the hierarchy of students in an institution, wherein the top class “student” has two sub-classes: “graduate student” and “undergraduate student”. These sub-classes can further have sub-sub-classes and so forth. A true hierarchy ensures the mutual exclusivity property, i.e., an entity can only belong to one class. Mutual exclusivity makes hierarchies easy to represent and understand; however, hierarchies cannot represent multiple inheritance relationships. Hierarchy is also not suitable in situations where researchers have to include multiple and diverse criteria for differentiation. To define a hierarchical classification, it is mandatory to have good knowledge of the subject matter to be classified; the classes and the differentiating criteria between classes must be well defined early on.
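The “student” example can be sketched with Python's single inheritance, which mirrors the “is-a” relationship. `DoctoralStudent` is a hypothetical sub-sub-class added for illustration. Note that Python itself permits multiple inheritance; keeping to single inheritance is the modeling convention that makes the hierarchy “true” in the sense above.

```python
from __future__ import annotations


class Student:
    """Top class: every entity in this hierarchy "is-a" Student."""


class GraduateStudent(Student):
    """Sub-class of Student."""


class UndergraduateStudent(Student):
    """Sub-class of Student."""


class DoctoralStudent(GraduateStudent):
    """Hypothetical sub-sub-class, added only for illustration."""


def class_path(entity: Student) -> list[str]:
    """Return the chain of classes from the entity's own class up to Student.

    With single inheritance this chain is unique, which is what gives a true
    hierarchy its mutual exclusivity: an entity sits in exactly one leaf.
    """
    path = []
    for cls in type(entity).__mro__:
        path.append(cls.__name__)
        if cls is Student:
            break
    return path
```

For instance, `class_path(DoctoralStudent())` returns `["DoctoralStudent", "GraduateStudent", "Student"]`, and an `UndergraduateStudent` is never also a `GraduateStudent`.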

Tree [12] is similar to the hierarchy; however, there is no inheritance relationship between the classes of tree-based taxonomies. In this kind of classification structure, common types of relationships between classes are “part-whole”, “cause-effect” and “process-product”. An example is a tree representing a part-whole relationship between a country, its provinces and its cities. Tree and hierarchy share similar strengths and limitations.
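The country example can be sketched as a part-whole tree in Python. Unlike an is-a hierarchy, a city here is a part of a province, not a kind of province, so no inheritance is involved. The `Part` type and the `containment_path` helper are hypothetical names, and the Swedish places are merely sample data.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Part:
    """Node of a part-whole tree: children are parts of the node, not kinds of it."""
    name: str
    parts: List[Part] = field(default_factory=list)


def containment_path(node: Part, target: str) -> Optional[List[str]]:
    """Depth-first search for `target`; returns the chain of enclosing wholes."""
    if node.name == target:
        return [node.name]
    for child in node.parts:
        sub_path = containment_path(child, target)
        if sub_path is not None:
            return [node.name] + sub_path
    return None


# Sample data: a country, two of its provinces and some of their cities.
SWEDEN = Part("Sweden", [
    Part("Blekinge", [Part("Karlskrona"), Part("Ronneby")]),
    Part("Skåne", [Part("Malmö"), Part("Lund")]),
])
```

`containment_path(SWEDEN, "Karlskrona")` returns `["Sweden", "Blekinge", "Karlskrona"]`; a name that is not part of the tree yields `None`.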

Fig. 1. Employed systematic mapping process.

Table 1
Faceted analysis example
Tool name | Platform(s) | License type | SE area | Web support
Tool 2 | Windows | Proprietary | Construction | No

Paradigm [12] leads to taxonomies with two-way hierarchical relationships between classes. The classes are described by a combination of two attributes at a time. For example, paradigm would be suitable if we also had to represent gender in the “student” hierarchy example mentioned above. It can also be viewed as a two-dimensional matrix whose vertical and horizontal axes allow for the inclusion of two attributes of interest. This type of classification structure shares similar strengths and limitations with the hierarchy structure.
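The two-dimensional matrix view of a paradigm can be sketched in Python as one class per combination of the two attributes. The attribute values below are hypothetical and only extend the “student” example.

```python
from __future__ import annotations

from itertools import product

# Hypothetical values for the two attributes that span the matrix.
LEVELS = ["graduate", "undergraduate"]   # vertical axis
GENDERS = ["female", "male"]             # horizontal axis


def paradigm_cells(rows: list[str], cols: list[str]) -> dict[tuple[str, str], str]:
    """Create one class for every (row, column) combination of the two attributes."""
    return {(row, col): f"{col} {row} student" for row, col in product(rows, cols)}


def classify(level: str, gender: str) -> str:
    """Each entity falls into exactly one cell, selected by its two attribute values."""
    return paradigm_cells(LEVELS, GENDERS)[(level, gender)]
```

With two values per axis the matrix has four cells, and `classify("graduate", "female")` returns `"female graduate student"`.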

Faceted analysis [12,22] leads to taxonomies whose subject matters are classified using multiple perspectives (facets). The basic principle in faceted analysis is that there is more than one perspective from which to view and classify a complex entity. Each facet is independent and can have its own classes, which enables facet-based taxonomies to be easily adapted so that they can evolve smoothly over time. In order to properly classify CASE tools, for example, multiple facets need to be considered. These facets may include supported platform(s), license type, SE area, web support, etc. Table 1 depicts the application of these multiple facets to classify two hypothetical CASE tools. Faceted analysis is suitable for new and evolving fields, since complete knowledge of the selected subject matter is not required to design a facet-based taxonomy. However, it can be challenging to define an initial set of facets. In addition, although it is possible to define relationships between facets, in most cases the facets are independent and have no meaningful relationship with each other.
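A facet-based classification of CASE tools can be sketched in Python as a record with one field per facet. The facet values of “Tool 2” follow Table 1; “Tool X” and the `select` helper are hypothetical and serve only to show how independent facets are queried.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CaseTool:
    """A CASE tool described along independent facets (cf. Table 1)."""
    name: str
    platforms: frozenset[str]   # a facet may take several values
    license_type: str
    se_area: str
    web_support: bool


TOOLS = [
    # Facet values of "Tool 2" as listed in Table 1.
    CaseTool("Tool 2", frozenset({"Windows"}), "Proprietary", "Construction", False),
    # Hypothetical tool, invented for this illustration only.
    CaseTool("Tool X", frozenset({"Linux", "Windows"}), "Open source", "Testing", True),
]


def select(tools: list[CaseTool], **wanted: object) -> list[str]:
    """Return the names of tools that match every requested facet value."""

    def matches(tool: CaseTool, facet: str, value: object) -> bool:
        actual = getattr(tool, facet)
        # Set-valued facets match when they contain the requested value.
        return value in actual if isinstance(actual, frozenset) else actual == value

    return [t.name for t in tools if all(matches(t, f, v) for f, v in wanted.items())]
```

For example, `select(TOOLS, web_support=True)` returns `["Tool X"]`. Adding a new facet (say, a supported programming language) means adding one field; the existing facets and their classes remain untouched, which is the kind of evolvability described above.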

2.6 Validation

Validation strengthens the reliability and usefulness of taxonomies. Taxonomies can be validated in three ways:
• Orthogonality demonstration – The orthogonality of the taxonomy dimensions and categories is demonstrated [8,20].
• Benchmarking – The taxonomy is compared to similar classification schemes [8].
• Utility demonstration – The utility of a taxonomy is demonstrated by actually classifying subject matter examples [8,20].

The utility of a taxonomy can be demonstrated or exemplified by classifying existing literature or expert opinion, or by employing more rigorous validation approaches such as a case study or experiment.
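Both orthogonality and utility demonstration involve classifying concrete examples. One simple way to operationalize this, which is our assumption rather than a procedure prescribed by [8] or [20], is to check per dimension that every example falls into exactly one category. The sketch below reports examples that match no category or several; the rule representation is hypothetical.

```python
from __future__ import annotations

from typing import Callable, Dict, List

# A dimension maps each category name to a membership rule (hypothetical rules).
Dimension = Dict[str, Callable[[dict], bool]]


def check_orthogonality(dimension: Dimension, examples: List[dict]) -> Dict[str, List[str]]:
    """Classify each example and report those falling into zero or several categories.

    An empty report is evidence, not proof, that the categories of this
    dimension do not overlap and cover the examples.
    """
    report: Dict[str, List[str]] = {"unclassified": [], "overlapping": []}
    for example in examples:
        hits = [name for name, rule in dimension.items() if rule(example)]
        if not hits:
            report["unclassified"].append(example["id"])
        elif len(hits) > 1:
            report["overlapping"].append(example["id"])
    return report
```

With categories `"early": phase <= 2` and `"late": phase >= 2`, an example in phase 2 is reported as overlapping, which signals that the category boundaries need to be redefined.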

3 Research methodology

We chose the systematic mapping study (SMS) method to identify and analyze the state-of-the-art of taxonomies in SE, because this method works well for broad and weakly defined research areas [17,18]. We employed the guidelines by Kitchenham and Charters [17] and partly implemented the mapping process provided by Petersen et al. [18]. The employed mapping process is summarized in Fig. 1 and described further in Subsections 3.1–3.5.

3.1 Research questions

The following research questions were formulated to guide this SMS:
• Question 1 (RQ1) – What taxonomy definitions and purposes are provided by publications on SE taxonomies?
• Question 2 (RQ2) – Which subject matters are classified in SE taxonomies?
• Question 3 (RQ3) – How is the utility of SE taxonomies demonstrated?
• Question 4 (RQ4) – How are SE taxonomies structured?
• Question 5 (RQ5) – To what extent are SE taxonomies used?
• Question 6 (RQ6) – How are SE taxonomies developed?

The main idea behind RQ1 is to identify how and why the term “taxonomy” is used in primary studies that claim to present a taxonomy. RQ2 focuses on identifying the subject matters classified by means of taxonomies in SE. RQ3 focuses on identifying the approaches used to demonstrate the utility of SE taxonomies, which is one of the ways of validating a taxonomy (see Section 2). With RQ4, we intend to identify the classification structures, related descriptive bases and classification procedures employed to design SE taxonomies. RQ5 focuses on the extent to which proposed SE taxonomies are used. Finally, RQ6 addresses the ways in which SE taxonomies are developed, i.e., whether there are guidelines, methods and processes that guide the development of taxonomies in a systematic way.

3.2 Search process

The search process employed in this work is displayed in Fig. 2 and has six activities.

First, we defined the terms to be included in our search string. We selected all SWEBOK knowledge areas [7] to be included as terms, except for the three knowledge areas on related disciplines (Computing Foundations, Mathematical Foundations and Engineering Foundations). We also included the term “Software Engineering” to augment the comprehensiveness of the search string. Finally, to reduce the scope of the search string to studies that report SE taxonomies, we included the term “taxonomy”. Since some of the knowledge areas are referred to by the SE community through other terms (synonyms), we also included their synonyms. Specifically, the following synonyms were included in the search string:
• Requirements – requirements engineering
• Construction – software development
• Design – software architecture
• Management – software project management, software management
• Process – software process, software life cycle
• Models and methods – software model, software methods
• Economics – software economics

Fig. 2. Search process.

Table 2
Summary of search results

The selected SWEBOK knowledge areas and the term “Software Engineering” were all linked using the operator OR. The term “taxonomy” was linked with the other terms using the operator AND. The final search string is shown below.

(“software requirements” OR “requirements engineering” OR “software design”
OR “software architecture” OR “software construction” OR “software
development” OR “software testing” OR “software maintenance” OR “software
configuration management” OR “software engineering management” OR
“software project management” OR “software management” OR “software
engineering process” OR “software process” OR “software life cycle” OR
“software engineering models and methods” OR “software model” OR “software
methods” OR “software quality” OR “software engineering professional practice”
OR “software engineering economics” OR “software economics” OR “software
engineering”) AND (taxonomy OR taxonomies)
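The OR/AND composition of the search string can be reproduced programmatically, which helps when the term list is revised later. The Python sketch below rebuilds the string from its 23 SE phrases; the function and constant names are hypothetical, straight quotes replace typographic ones, and database-specific field restrictions (title, abstract and author keywords) are deliberately not modeled, since each data source uses its own syntax.

```python
from __future__ import annotations

# Phrases as they appear in the search string above (23 in total).
SE_TERMS = [
    "software requirements", "requirements engineering", "software design",
    "software architecture", "software construction", "software development",
    "software testing", "software maintenance", "software configuration management",
    "software engineering management", "software project management",
    "software management", "software engineering process", "software process",
    "software life cycle", "software engineering models and methods",
    "software model", "software methods", "software quality",
    "software engineering professional practice", "software engineering economics",
    "software economics", "software engineering",
]
TAXONOMY_TERMS = ["taxonomy", "taxonomies"]


def build_query(or_phrases: list[str], and_terms: list[str]) -> str:
    """OR-join the quoted SE phrases, then AND them with the OR-joined taxonomy terms."""
    left = " OR ".join(f'"{phrase}"' for phrase in or_phrases)
    right = " OR ".join(and_terms)
    return f"({left}) AND ({right})"
```

`build_query(SE_TERMS, TAXONOMY_TERMS)` yields a single line ending in `"software engineering") AND (taxonomy OR taxonomies)`.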

Although SE knowledge classification could be named in different ways, e.g., taxonomy, ontology, classification and classification scheme [1], we limited the scope of this paper to taxonomies. Extending our search string to include the terms “ontology” and “classification scheme” would have led to an excessive number of search results that would have been infeasible to handle⁴. Using alternative terms would also force the authors to interpret whether the primary studies' authors actually intended to present a taxonomy when they do not explicitly refer to taxonomies. To mitigate this threat to validity, we restricted the scope to taxonomies.

Once the search string was designed, we selected the primary sources to search for relevant studies. Scopus⁵, Compendex/Inspec⁶ and Web of Science⁷ were selected because they cover most of the important SE databases, such as IEEE, Springer, ACM and Elsevier. In addition, the selected primary sources are able to handle advanced queries. The search string was applied on metadata (i.e., title, abstract and author keywords) in August 2014 on the selected data sources. We later updated the search results by applying the search string again in February 2016, to fetch studies published between September 2014 and December 2015. Table 2 presents the number of search results for each data source.

⁴ Inclusion of the terms “ontolog*” and “classification” returned 10,474 hits in total just for Scopus.
⁵ www.scopus.com
⁶ www.engineeringvillage.com
⁷ apps.webofknowledge.com

3.3 Study selection process

The selection process employed in this work is displayed in Fig. 3 and detailed as follows.

First, the following inclusion and exclusion criteria were defined:
• Inclusion criteria
1. Studies that propose or extend a taxonomy AND
2. Studies that are within Software Engineering (SE), according to SWEBOK's KAs (see Subsection 3.2).
• Exclusion criteria
1. Studies whose full text is not accessible OR
2. Studies that do not propose or extend a SE taxonomy OR
3. Studies that are not written in English OR
4. Studies that are not reported in a peer-reviewed workshop, conference or journal.

The selection of primary studies was conducted using a two-stage screening procedure. In the first stage, only the titles and abstracts of the studies were considered. In the second stage, the full texts were read. Note that in both stages we used an inclusive approach to avoid the premature exclusion of studies, i.e., if there was doubt about a study, it was included.
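A single reviewer's decision under these criteria, including the inclusive treatment of doubtful cases, can be sketched in Python as follows. The names are hypothetical, and treating the exclusion criteria on language, accessibility and venue as checkable facts (rather than judgments) is our simplification.

```python
from enum import Enum


class Judgment(Enum):
    YES = "yes"
    NO = "no"
    UNSURE = "unsure"


def include_study(
    proposes_or_extends_taxonomy: Judgment,
    within_se: Judgment,
    full_text_accessible: bool = True,
    in_english: bool = True,
    peer_reviewed: bool = True,
) -> bool:
    """Apply the selection criteria with an inclusive bias.

    Any one of the factual exclusion criteria excludes the study. The two
    inclusion criteria are joined by AND, but a doubtful ("unsure") judgment
    does not count against the study, so it moves on to the next stage.
    """
    if not (full_text_accessible and in_english and peer_reviewed):
        return False
    return all(j is not Judgment.NO for j in (proposes_or_extends_taxonomy, within_se))
```

Under this sketch, a study whose abstract leaves it unclear whether a taxonomy is proposed (`Judgment.UNSURE`) is kept for full-text reading, while an explicit `Judgment.NO` on either inclusion criterion excludes it.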

For the first stage (level-1 screening), the total of 1517 studies was equally divided between the first two authors. As a result, 507 studies were judged as potentially relevant.

To increase the reliability of the level-1 screening result, the third author screened a random sample of 10.30% (78 studies) of the studies screened by the first author, and the fourth author screened a random sample of 10.28% (78 studies) of the studies screened by the second author. The first and third authors had the same judgment for 91% (71) of the studies. The second and fourth authors had the same judgment for 93.6% (73) of the studies.

To evaluate the reliability of the inter-rater agreement between the authors, we calculated Cohen's kappa coefficient [23]. The Cohen's kappa coefficient between the first and third authors was statistically significant (significance level = 0.05) and equal to 0.801. The Cohen's kappa coefficient between the second and fourth authors was also statistically significant (significance level = 0.05) and equal to 0.857. According to Fleiss et al. [23], Cohen's kappa coefficient values above 0.75 indicate an excellent level of agreement.
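Cohen's kappa corrects the observed agreement p_o for the agreement p_e expected by chance: κ = (p_o − p_e) / (1 − p_e). The Python sketch below computes it from two raters' labels and applies the 0.75 threshold quoted above. It does not include the significance test, and the per-study decisions behind the reported values of 0.801 and 0.857 are not given in this section, so those values cannot be reproduced here; the function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two raters judging the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must judge the same, non-empty list of items")
    n = len(rater_a)
    # Observed agreement: share of items that received the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of both raters' marginal shares, summed over labels.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


def is_excellent(kappa: float) -> bool:
    """Threshold quoted from Fleiss et al. [23]: kappa above 0.75 is excellent."""
    return kappa > 0.75
```

As a toy example, if one rater labels four studies `["in", "in", "out", "out"]` and the other `["in", "out", "out", "out"]`, then p_o = 0.75, p_e = 0.5 and κ = 0.5, which is below the excellent range.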

The level-2 screening (second stage), performed by the first and second authors, consisted of applying the selection criteria to the full text of the studies selected during the level-1 screening. The 507 studies were equally divided between the first two authors. As a result, 280 studies were judged as relevant.

Fig. 3. Selection process.

Table 3
Rationale for excluded studies
Not proposing or evolving a SE taxonomy | 1167
Total included after study selection | 280
Total included after data extraction | 270

To increase the reliability of the level-2 screening, a two-step validation was performed, as follows:
1. The first author screened 27.67% (70) of the studies deemed as relevant by the second author during the level-2 screening (randomly selected) and vice versa. No disagreements were found between the authors.
2. Nine studies were randomly selected from each of the two sets allocated to the first two authors for further validation. The third author applied the study selection process on these 18 studies (about 6.43% of 280) for validation purposes. No disagreements were found with respect to the study selection (i.e., include/exclude) decisions.

During the entire screening process (stages 1 and 2), we tracked the reason for each exclusion, as presented in Table 3.

3.4 Extraction process

The extraction process employed in this work is summarized in Fig. 4 and consists of four main steps: define a classification scheme, define an extraction form, extract data, and validate the extracted data.

We designed the classification scheme by following Petersen et al.'s guidelines [18]. It has the following facets:
• Research type – This facet is used to distinguish between different types of studies (adapted from Wieringa et al. [24]):
  ◦ Evaluation research – A study that reports a taxonomy that was implemented in practice, i.e., evaluated in a real environment, in general by means of the case study method.
  ◦ Validation research – A study that reports a taxonomy that was not yet implemented in practice, although it was validated in a laboratory environment, in general by means of an experiment.
  ◦ Solution proposal – A study that reports a taxonomy that was neither implemented in practice nor validated, although it is supported by a small example (illustration) or a good line of argumentation.
  ◦ Philosophical paper – A study that reports a taxonomy that has no type of evaluation, validation or illustration.
• SE knowledge area – This facet is used to distinguish between the SE knowledge areas in which taxonomies have been proposed. The categories of this facet follow the SWEBOK [7]: software requirements, software design, software construction, software testing, software maintenance, software configuration management, software engineering management, software engineering process, software engineering models and methods, software quality, software engineering professional practice and software engineering economics.
• Presentation approach – This facet is used to classify the studies according to the overall approach used to present a taxonomy: textual or graphical.
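The research type facet can be read as a sequence of checks from the strongest to the weakest form of evidence. The Python sketch below encodes that reading; the checking order and the boolean inputs are our assumptions for illustration, not a procedure stated by Wieringa et al. [24].

```python
from enum import Enum


class ResearchType(Enum):
    EVALUATION = "evaluation research"
    VALIDATION = "validation research"
    SOLUTION_PROPOSAL = "solution proposal"
    PHILOSOPHICAL = "philosophical paper"


def research_type(
    implemented_in_practice: bool,
    validated_in_laboratory: bool,
    illustrated_or_argued: bool,
) -> ResearchType:
    """Assign the strongest applicable research type (checked in this order)."""
    if implemented_in_practice:        # e.g. case study in a real environment
        return ResearchType.EVALUATION
    if validated_in_laboratory:        # e.g. experiment
        return ResearchType.VALIDATION
    if illustrated_or_argued:          # small example or a line of argumentation
        return ResearchType.SOLUTION_PROPOSAL
    return ResearchType.PHILOSOPHICAL  # no evaluation, validation or illustration
```

A study that only illustrates its taxonomy with a small example, for instance, maps to `ResearchType.SOLUTION_PROPOSAL`.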

For the data extraction, the relevant studies (280) were equally divided between the first and second authors. For each paper, data was collected and later stored in a spreadsheet using the data extraction form shown in Table 4.
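A spreadsheet row following the extraction form can be sketched as a Python record that rejects values outside the form's closed answer sets and exports to CSV. The field names are our own mapping of the Table 4 items and are hypothetical; the closed sets for classification structure, classification procedure and presentation approach are taken from Table 4.

```python
from __future__ import annotations

import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional

STRUCTURES = {"hierarchy", "tree", "paradigm", "faceted analysis"}  # Kwasnik [12]
PROCEDURES = {"qualitative", "quantitative", "no details provided"}
PRESENTATIONS = {"textual", "graphical"}


@dataclass
class ExtractionRecord:
    """One spreadsheet row following the items of the extraction form (Table 4)."""
    citation: str                         # title, author(s), year and venue
    taxonomy_definition: Optional[str]
    purpose: str
    purpose_keyword: str                  # e.g. classify, understand, describe
    subject_matter: str
    descriptive_bases_sufficient: bool
    classification_structure: str
    classification_procedure: str
    procedure_described: bool
    design_method: Optional[str]
    presentation_approach: str
    utility_demonstration: Optional[str]  # e.g. illustration, case study, experiment
    primary_ka: str                       # SWEBOK v3 knowledge area
    secondary_ka: Optional[str]
    citations: int                        # as per Google Scholar

    def __post_init__(self) -> None:
        # Reject values outside the closed answer sets of the form.
        for value, allowed in (
            (self.classification_structure, STRUCTURES),
            (self.classification_procedure, PROCEDURES),
            (self.presentation_approach, PRESENTATIONS),
        ):
            if value not in allowed:
                raise ValueError(f"{value!r} is not one of {sorted(allowed)}")


def to_csv(records: list[ExtractionRecord]) -> str:
    """Serialize records with one column per form item."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(ExtractionRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()
```

Validating closed answer sets at entry time would, for example, flag a misspelled classification structure immediately rather than during the later cross-checks.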

To increase the reliability of the extracted data, a two-step validation was performed, as follows:
1. The first author independently re-extracted the data of 50% (70) of the studies originally extracted by the second author (randomly selected) and vice versa. Five disagreements were identified, all of them related to the item “classification structure”.
2. Eighteen studies were randomly selected from the studies originally extracted by the first and second authors (9 studies from each author). Those studies were independently re-extracted by the third author. Twenty-three disagreements were identified: 2 on “taxonomy purpose”, 10 on “classification structure”, 2 on “classification procedure type”, 3 on “classification procedure description” and 6 on “validation approach”.

Fig. 4. Extraction process.

Table 4
Data extraction form
Data item(s) | Description
Citation data | Title, author(s), year and publication venue
Taxonomy definition | Definition of taxonomy that is used or referred to
Purpose | Text that states the purpose of the taxonomy
Purpose keyword | Keyword used in the paper to describe the purpose (e.g., classify, understand, describe)
Subject matter | The name of the thing/concept that is taxonomized
Descriptive bases | Is the subject matter defined in sufficient detail/clarity to enable classification? (Yes/No)
Classification structure | Hierarchy, tree, paradigm, or faceted analysis, according to Kwasnik [12]
Classification procedure | The criteria for putting items in different classes (qualitative, quantitative or no details provided)
Classification procedure description | Do the authors explicitly describe the classification procedure? (Yes/No)
Design method | Did the authors employ any systematic approach to design the reported taxonomy? If so, which approach?
Presentation approach | Textual or graphical
Utility demonstration | Is the utility of the taxonomy demonstrated? If so, how (e.g., illustration, case study, experiment)?
Primary knowledge area | Primary knowledge area as per SWEBOK v3 [7]
Secondary knowledge area | Secondary knowledge area as per SWEBOK v3 (if applicable)
Number of citations | Number of times a primary study is cited by other studies, as per Google Scholar

All disagreements except for those on “classification structure” were easily resolved. We believe that the high level of disagreement on the item “classification structure” was due to the fact that none of the studies explicitly stated and motivated the employed classification structure, which required us to infer such data from the text of each paper.

Fig. 5. Analysis process.

To improve the reliability of the extracted data, we decided to re-screen all 280 papers, focusing only on the item “classification structure”. First, we discussed classification structures in detail (based on Kwasnik [12]) to come to a common understanding of the terms. Second, three of us did an independent re-assessment of the classification structure of 52 papers. As a result, we reached full agreement on 50 papers (3 identical results) and partial agreement on 2 papers (2 reviewers agreeing). There were no primary studies without full or partial agreement. Third, the remaining 228 studies were re-assessed by the first and second authors, who reached agreement on 216 papers. The remaining 12 papers were independently re-assessed by the third author, who did not know the results from the other two reviewers. In the end, full agreement was achieved for 50 studies and partial agreement was achieved for 230 studies.
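The counting rule implied above can be sketched in Python: three identical assessments count as full agreement, and two matching assessments (whether from two reviewers or two of three) count as partial agreement. This reading is our interpretation, but it is consistent with the reported totals of 50 full and 2 + 216 + 12 = 230 partial agreements; the function name is hypothetical.

```python
from __future__ import annotations

from collections import Counter


def agreement_level(labels: list[str]) -> str:
    """Classify reviewer agreement on one paper's classification structure.

    "full": three or more reviewers, all identical; "partial": at least two
    reviewers give the same label; "none": every reviewer differs.
    """
    if not labels:
        raise ValueError("at least one label is required")
    top_count = Counter(labels).most_common(1)[0][1]
    if len(labels) >= 3 and top_count == len(labels):
        return "full"
    if top_count >= 2:
        return "partial"
    return "none"
```

For example, `["hierarchy", "hierarchy", "tree"]` yields `"partial"`, whereas three different labels would yield `"none"`, a case the study reports did not occur.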

During the re-assessment of the primary studies, 10 studies were excluded because they do not present taxonomies, reducing the final number of primary studies to 270⁸ (see Table 3).

3.5 Analysis process

Fig. 5 presents the analysis process conducted herein. First, we classified the extracted data using the scheme defined in Subsection 3.4. This led to the results detailed in Section 4. We also performed a quantitative analysis of the extracted data to answer the research questions of this paper. Finally, the overall result of the data analysis (see Section 4), along with information from additional literature ([12,19,20]), was used to revise an existing method previously proposed to design SE taxonomies [25], as detailed in Section 5.

⁸ The full list of the included 270 primary studies is available at http://tinyurl.com/jrdaxhh

Fig. 6. Year and venue wise distributions.

4 Results

In this section, we describe the results of the mapping study reported herein, which are based on the data extracted from 270 papers reporting 271 taxonomies (one paper presented two taxonomies). The percentages in Sections 4.1 and 4.7 reflect the number of papers (270), whereas the percentages in all other subsections reflect the number of taxonomies (271).

4.1 General results

Fig. 6 shows that SE taxonomies have been proposed since 1987, with an increasing number of them published after the year 2000, which suggests a growing interest in this research topic.

Table 5 shows that 53.7% (145) of the studies were published in conferences, including venues on maintenance (International Conference on Software Maintenance), requirements engineering (Requirements Engineering Conference) and general SE topics (e.g., International Conference on Software Engineering). Taxonomies were published at 99 unique conferences, 78 of which featured only a single SE taxonomy publication. These results further indicate a broad interest in SE taxonomies across a wide range of SE knowledge areas.

Table 5 also shows that 33.7% (91) of the primary studies were published as journal articles in 44 unique journals. Taxonomies have been published frequently in relevant SE journals (e.g., IEEE Transactions on Software Engineering and Information and Software Technology). We believe that this has been the case because the scope of these journals is not confined to a specific SE knowledge area.

Primary studies were also published in 28 unique workshops (34 – 12.6%). As for journals and conferences, the results indicate an increasing interest in SE taxonomies in a broad range of SE knowledge areas.

Table 5
Publication venues with more than two taxonomy papers
Venue | Papers
IEEE Intl Conference on Software Maintenance (ICSM) | 10
Intl Conference on Requirements Engineering (RE) | 6
Intl Conference on Software Engineering (ICSE) | 5
Hawaii Intl Conference on Systems Sciences (HICSS) | 4
Asia Pacific Software Engineering Conference (APSEC) | 4
European Conference on Software Maintenance and Reengineering (CSMR) | 4
Intl Conference on Software Engineering and Knowledge Engineering |
Intl Symposium on Empirical Software Engineering and Measurement (ESEM) | 4
Americas Conference on Information Systems (AMCIS) | 3
IEEE Transactions on Software Engineering (TSE) | 11
Information and Software Technology (IST) | 9
Journal of Software: Evolution and Process | 5

Fig. 7a–h depict the yearly distribution of SE taxonomies by knowledge area for the KAs with 10 or more taxonomies. Note that most knowledge areas follow an increasing trend after 2000, while many taxonomies for construction, design and quality were already proposed in the 1980s and 1990s.

Fig. 7. Yearly distribution of primary studies by KAs. Horizontal axes represent the years (starting 1987), while vertical axes denote the number of taxonomies.

4.2 Classiﬁcation scheme results

In this section, we present the results corresponding to the three facets of the classification scheme described in Section 3, i.e., SE knowledge area (KA), research type and presentation approach.

The vertical axis in Fig. 8 depicts the SE knowledge areas in which taxonomies have been proposed. Construction and design are the leading SE knowledge areas, each with 53 (19.55%) taxonomies. These are relatively mature SE fields with a large body of knowledge and a high number of subareas.

A high number of taxonomies have also been proposed in the requirements (42 – 15.50%), maintenance (32 – 11.81%) and testing (27 – 9.96%) knowledge areas. Few taxonomies have been proposed in economics (3 – 1.11%) and professional practice (3 – 1.11%), which are more recent knowledge areas.

The results show that most SE taxonomies (76.37%) are proposed in the requirements, design, construction, testing and maintenance knowledge areas, which correspond to the main activities in a typical software development process [26].
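The shares above can be recomputed from the reported counts and the total of 271 taxonomies, which is a quick consistency check for the figures. The Python sketch below uses only counts stated in this section; the names are hypothetical.

```python
from __future__ import annotations

TOTAL_TAXONOMIES = 271
# Taxonomies per SWEBOK knowledge area, as reported in this section.
COUNTS = {
    "construction": 53, "design": 53, "requirements": 42, "maintenance": 32,
    "testing": 27, "economics": 3, "professional practice": 3,
}
CORE_KAS = ["requirements", "design", "construction", "testing", "maintenance"]


def share(count: int, total: int = TOTAL_TAXONOMIES) -> float:
    """Percentage of all taxonomies, rounded to two decimals."""
    return round(100 * count / total, 2)


# Number of taxonomies in the five KAs tied to the main development activities.
core_count = sum(COUNTS[ka] for ka in CORE_KAS)
```

Recomputed this way, each share agrees with the reported value to within 0.01 percentage points: 53/271 rounds to 19.56% (reported: 19.55%), and the five core KAs account for 207 taxonomies, i.e., 76.38%, whereas the reported 76.37% equals the sum of the five individually reported shares.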

The horizontal axis in Fig. 8 shows the distribution of taxonomies by research type, according to Wieringa et al. [24]. Most taxonomies are reported in papers that are classified as “solution proposals” (135 – 49.82%), wherein the authors propose a taxonomy and explain or apply it with the help of an illustration. Ninety-one taxonomies (33.58%) are reported in “philosophical papers”, wherein authors propose a taxonomy but do not provide any kind of validation, evaluation or illustration. Relatively few taxonomies are reported in “evaluation papers” (34 – 12.54%) and “validation papers” (11 – 4.06%).

Fig. 8 also depicts the classification of the taxonomies using two aspects of the classification scheme, i.e., SE knowledge area and research type.

Taxonomies in the construction and design knowledge areas are mostly reported either as solution proposals (construction – 27; design – 31) or as philosophical papers (construction – 20; design – 17). Taxonomies in the requirements, maintenance and testing knowledge areas are more evenly distributed across research types: besides solution proposals and philosophical papers, a reasonable percentage of them are reported as evaluation or validation papers.
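A systematic map such as Fig. 8 is essentially a contingency table: each taxonomy takes one value per facet, and each bubble counts the taxonomies that share a pair of values. The Python sketch below builds such a table with `collections.Counter`. The `records` list is hypothetical and reproduces only the construction and design cells stated above, since the other cells of Fig. 8 are not given in the text.

```python
from collections import Counter

# Hypothetical per-taxonomy records (knowledge area, research type). Only the
# construction and design cells stated in the text are reproduced; the other
# cells of Fig. 8 are not given and are left out.
records = (
    [("construction", "solution proposal")] * 27
    + [("construction", "philosophical paper")] * 20
    + [("design", "solution proposal")] * 31
    + [("design", "philosophical paper")] * 17
)

# Each (knowledge area, research type) pair is one cell of the systematic map.
systematic_map = Counter(records)

# Row totals: how many taxonomies of each area fall into the cells shown.
row_totals = Counter(area for area, _ in records)

for (area, research_type), n in sorted(systematic_map.items()):
    print(f"{area:<13} {research_type:<20} {n:>3}")
print(dict(row_totals))
```

Since construction and design each account for 53 taxonomies, these two research types cover 47 and 48 of them, respectively; the remaining six and five taxonomies must therefore be evaluation or validation papers, the only other research types used in the study.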

The horizontal axis in Fig. 9 shows the distribution of taxonomies by presentation approach. Most taxonomies (57.93%) are presented purely as text or tables, while 42.07% of the taxonomies are presented through some graphical notation in combination with text.

Fig. 9 also displays the classification of the identified taxonomies in terms of SE knowledge area and presentation approach. The results show two different trends:


Fig. 8. Systematic map – knowledge area vs. research type.

• For knowledge areas such as design, quality, models and methods, and process, textual and graphical approaches are used an almost equal number of times. This suggests that taxonomies in KAs that involve a lot of modeling might be better presented using graphical modeling approaches.

• Most taxonomies in construction (35 out of 53), maintenance (23 out of 32), testing (15 out of 27) and software management (7 out of 10) are presented textually.
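The second trend compares presentation approaches within each knowledge area rather than across all 271 taxonomies, i.e., the size of the knowledge area is the denominator. A minimal Python sketch using the counts from the bullet above (the function name `textual_share` is ours):

```python
# (textually presented taxonomies, all taxonomies) per knowledge area, from the text.
presentation_counts = {
    "construction": (35, 53),
    "maintenance": (23, 32),
    "testing": (15, 27),
    "software management": (7, 10),
}


def textual_share(textual: int, total: int) -> float:
    """Percentage of a knowledge area's taxonomies presented as text or tables."""
    return 100 * textual / total


for area, (textual, total) in presentation_counts.items():
    print(f"{area:<20} {textual:>2}/{total:<2} = {textual_share(textual, total):5.1f}%")
```

All four areas are above 50%, which is what “most” means here; testing (15 of 27, about 56%) is the closest to an even split.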

We extracted data about the following two aspects to answer RQ1:

• Taxonomy definition: We investigated for each study whether or not the authors made any attempt to communicate their understanding of the concept of taxonomy by citing or presenting a definition of it.

• Taxonomy purpose: We identified from each study the stated main purpose (if any) for designing a taxonomy.

Fig. 9. Systematic map – knowledge area vs. presentation type.

As stated earlier, taxonomy is not a trivial concept. It has been defined in multiple ways (see Section 2 for some definitions). This RQ aims to identify whether authors make an explicit effort to share their perspective on taxonomy by adopting or using a definition. The results show that only 6.3% (17) of the taxonomies were reported with a definition of the term “taxonomy”. Of these 17 taxonomies, three use the Cambridge dictionary’s definition (see Section 2), eight do not provide an explicit source, and the remaining six have other unique references: The American Heritage Dictionary⁹, Carl Linnaeus [11], Whatis¹⁰, the IEEE standard taxonomy for SE standards, Doty and Glick [27] and Fleishman et al. [28]. For the remaining 93.7% (254) of the taxonomies, no definition of “taxonomy” was provided.

⁹ www.ahdictionary.com/

¹⁰ www.whatis.com

Table 6. Approaches for utility demonstration.

To identify the purpose of each taxonomy, we extracted the relevant text, referred to here as purpose descriptions, from each of the primary studies, using a process similar to open coding [29,30]. As codes, we used the keywords that the primary studies used to describe a taxonomy’s purpose.

For about 56% of the taxonomies, the authors used “classify” (48.80%) or “categorize” (7.74%) to describe the purpose of their taxonomy. For 5.9% of the taxonomies, it was not possible to identify a specific purpose. For the remaining taxonomies (38.37%), we found 41 different terms describing the purpose, e.g., “identify”, “understand” and “describe”.
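Open coding was carried out manually; the Python sketch below only illustrates the tallying step once purpose descriptions have been extracted, approximating each code by a keyword stem. The code book, the stems and the sample descriptions are hypothetical, and returning only the first matching code is a simplification of assigning one main purpose per taxonomy.

```python
import re
from collections import Counter

# Hypothetical code book: code -> word stem that signals it in a purpose description.
CODE_STEMS = {
    "classify": "classif",
    "categorize": "categori",
    "identify": "identif",
    "understand": "understand",
    "describe": "describ",
}


def code_purpose(description: str) -> str:
    """Return the first code whose stem starts a word in the description, else 'unspecified'."""
    words = re.findall(r"[a-z]+", description.lower())
    for code, stem in CODE_STEMS.items():
        if any(word.startswith(stem) for word in words):
            return code
    return "unspecified"


# Hypothetical purpose descriptions extracted from primary studies.
descriptions = [
    "We propose a taxonomy to classify unit testing techniques.",
    "The taxonomy categorizes architectural constraints.",
    "This helps practitioners understand technical debt.",
    "A taxonomy of process models.",
]

tally = Counter(code_purpose(d) for d in descriptions)
print(tally)
```

Stem matching lets “classification” and “classify” fall under the same code; a real coding pass would also have to resolve synonyms and context that keyword matching cannot see.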

4.4 RQ2 – Subject matters

In total, we identified 263 unique subject matters¹¹ for the 271 taxonomies, e.g., technical debt, architectural constraints, usability requirements, testing techniques and process models.
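Counting distinct subject matters requires some normalization, so that trivially different labels for the same subject are not counted twice. The study does not describe how labels were matched; the Python sketch below assumes a simple case- and whitespace-insensitive comparison on hypothetical labels.

```python
# Hypothetical subject-matter labels, one per taxonomy; the study has 271 of them.
subject_matters = [
    "Technical debt",
    "technical debt ",
    "Architectural constraints",
    "Usability requirements",
    "Testing techniques",
    "Process models",
]


def normalize(label: str) -> str:
    """Collapse case and surrounding/internal whitespace so trivial variants match."""
    return " ".join(label.lower().split())


distinct = {normalize(s) for s in subject_matters}
ratio = len(distinct) / len(subject_matters)
print(f"{len(distinct)} distinct subject matters for {len(subject_matters)} taxonomies ({ratio:.0%})")

# With the figures reported in the study, the same ratio is 263 / 271.
print(f"study: {263 / 271:.2%}")
```

For the study’s own figures, 263 distinct subject matters for 271 taxonomies is a ratio of about 97%.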

The high number of unique subject matters means that almost every taxonomy deals with a unique subject matter. This might be due to the following reasons:

• Existing taxonomies do not fit their purpose well. Therefore, there is a need to define new taxonomies.

• The subject matters of existing taxonomies are so narrowly defined that they are not suitable for use outside their original context. New taxonomies are therefore developed constantly.

• SE researchers do not reuse or extend existing taxonomies when there is a need for organizing SE knowledge, but rather propose new ones.

One indicator of taxonomy use is the number of times each primary study is cited. This analysis is detailed in Subsection 4.7.

The list of subject matters contains mainly technical aspects of SE. Only a few taxonomies deal with people-related subject matters, e.g., stakeholder-related and privacy-related issues.

The results for this research question suggest that taxonomies are rarely revisited, revised or extended. However, many taxonomy papers are highly cited, which shows that there is a strong interest in taxonomies in the SE field.

The mapping of SE taxonomies presented in this paper supports SE researchers in identifying and evolving existing taxonomies. This may lead to the development of a more consistent terminology.

4.5 RQ3 – Utility demonstration

Table 6 displays the approaches used to demonstrate the utility of SE taxonomies. Illustration is the most frequently used approach (124 – 45.76%). Illustration includes approaches such as example, scenario and case.

¹¹ For the full list, see: http://tinyurl.com/z4mqfnr

Table 7. Classification structure.

Table 8. Descriptive bases.

Table 9. Classification procedure types.

Table 10. Classification procedure descriptions.

Case studies have also been used to demonstrate the utility of 34 taxonomies (12.54%). Experiments have been used to demonstrate the utility of 11 taxonomies (4.06%), while the utility of a few taxonomies has also been demonstrated through expert opinion (7 – 2.58%) or survey (4 – 1.48%). Note that 33.58% (91) of the taxonomies did not have their utility demonstrated.

The results related to RQ3 show that only a few taxonomies have their utility demonstrated through methods like case studies or experiments, while the utility of a large number of taxonomies (33.58%) is not demonstrated by any means. We do not believe that one particular approach would be the best for all contexts; however, we believe that in most cases it would not be enough just to propose a taxonomy without demonstrating its utility.
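The share of taxonomies without any utility demonstration follows from the other counts, assuming each taxonomy is counted under at most one approach in Table 6 (the reported percentages, including the 33.58%, do sum to 100%). A short Python check with the counts given above:

```python
TOTAL_TAXONOMIES = 271

# Taxonomies per utility-demonstration approach, as reported in the text.
demonstrated = {
    "illustration": 124,
    "case study": 34,
    "experiment": 11,
    "expert opinion": 7,
    "survey": 4,
}

# Assumes the approaches are mutually exclusive, i.e. each taxonomy is counted once.
not_demonstrated = TOTAL_TAXONOMIES - sum(demonstrated.values())
print(not_demonstrated, f"{100 * not_demonstrated / TOTAL_TAXONOMIES:.2f}%")
```

The remainder, 91 taxonomies (33.58%), also coincides with the number of taxonomies reported in philosophical papers, i.e., papers that provide no validation, evaluation or illustration.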

To answer RQ4, the following data were gathered: classification structure, descriptive bases, classification procedure and classification procedure description.

Table 7 shows the classification structures of the identified taxonomies. Hierarchy was the most frequently used classification structure (144 – 53.14%), followed by facet-based structures (107 – 39.48%), tree (14 – 5.17%) and paradigm (6 – 2.21%).
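The difference between the two most common structures can be made concrete. In a hierarchy, each class has a single parent, so classifying an instance means placing it on one path from the root; in a faceted structure, an instance takes one value from each of several independent facets. The Python sketch below uses the three facets of this study’s own classification scheme (knowledge area, research type, presentation approach) as the faceted example; the hierarchy, the value sets and all function names are hypothetical, and tree and paradigm structures are not modeled.

```python
from __future__ import annotations

# Hypothetical hierarchy: each class maps to its single parent (None for the root).
HIERARCHY = {
    "software engineering": None,
    "testing": "software engineering",
    "testing techniques": "testing",
    "unit testing techniques": "testing techniques",
}


def hierarchy_path(cls: str) -> list[str]:
    """Follow parent links up to the root; a class in a hierarchy lies on one path."""
    path = []
    while cls is not None:
        path.append(cls)
        cls = HIERARCHY[cls]
    return list(reversed(path))


# Faceted scheme: independent facets, each with its own (partial, illustrative) values.
FACETS = {
    "knowledge area": {"requirements", "design", "construction", "testing", "maintenance"},
    "research type": {"solution proposal", "philosophical paper", "evaluation paper", "validation paper"},
    "presentation": {"textual", "graphical"},
}


def classify_faceted(**values: str) -> dict[str, str]:
    """Check that exactly one allowed value is given for every facet."""
    record = {facet.replace("_", " "): value for facet, value in values.items()}
    if set(record) != set(FACETS):
        raise ValueError(f"expected facets {sorted(FACETS)}, got {sorted(record)}")
    for facet, value in record.items():
        if value not in FACETS[facet]:
            raise ValueError(f"{value!r} is not a value of facet {facet!r}")
    return record


print(hierarchy_path("unit testing techniques"))
print(classify_faceted(knowledge_area="testing", research_type="evaluation paper", presentation="textual"))
```

Adding a new facet value in the faceted version does not disturb the other facets, whereas adding a class to the hierarchy requires deciding where on the single tree of parents it belongs.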

Table 8 displays the status of the taxonomies’ descriptive bases. The majority of the taxonomies have a sufficiently clear description of their elements (248 – 91.51%), while only 23 taxonomies (8.49%) lack a sufficient description.

Table 9 presents the classification procedure types of the identified taxonomies. The majority of the taxonomies employed a qualitative classification procedure (262 – 96.68%), followed by quantitative procedures (7 – 2.58%) and a combination of both (2 – 0.74%).

Table 10 displays the status of the taxonomies’ classification procedure descriptions. The majority of the taxonomies do not have an explicit description of the classification procedure (227

Title: Taxonomies in Software Engineering: A Systematic Mapping Study and a Revised Taxonomy Development Method
Authors: Muhammad Usman, Ricardo Britto, Jürgen Börstler, Emilia Mendes
Institution: Blekinge Institute of Technology
Field: Software Engineering
Document type: research paper
Year of publication: 2017
City: Karlskrona
Number of pages: 17

References
[1] S. Vegas, N. Juristo, V. Basili, Maturing software engineering knowledge through classifications: a case study on unit testing techniques, IEEE Trans. Softw. Eng. 35 (4) (2009) 551-565.
[2] I. Vessey, V. Ramesh, R.L. Glass, A unified classification system for research in the computing disciplines, Inf. Softw. Technol. 47 (4) (2005) 245-255.
[3] C. Wohlin, Writing for synthesis of evidence in empirical software engineering, in: Proceedings of the 8th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), ACM, New York, NY, USA, 2014, pp. 46:1-46:4.
[4] I. Vessey, V. Ramesh, R.L. Glass, A unified classification system for research in the computing disciplines, Inf. Softw. Technol. 47 (4) (2005) 245-255.
[5] IEEE, IEEE Standard Taxonomy for Software Engineering Standards, Technical Report, IEEE Std 1002-1987, IEEE, 1987.
[6] IEEE, Systems and software engineering – System life cycle processes, Technical Report, ISO/IEC 15288:2008(E) IEEE Std 15288-2008 (Revision of IEEE Std 15288-2004), IEEE, 2008.
[7] P. Bourque, R.E. Fairley (Eds.), Guide to the software engineering body of knowledge v3, IEEE Comput. Soc., 2013.
[8] D. Smite, C. Wohlin, Z. Galvina, R. Prikladnicki, An empirically based terminology and taxonomy for global software engineering, Empirical Softw. Eng. 19 (1) (2014) 105-153.
[9] M. Unterkalmsteiner, R. Feldt, T. Gorschek, A taxonomy for requirements engineering and software test alignment, ACM Trans. Softw. Eng. Methodol. 23 (2) (2014) 16:1-16:38.
[11] C. Linnaeus, System of nature through the three kingdoms of nature, according to classes, orders, genera and species, with characters, differences, synonyms, places (in Latin), 10th ed., Laurentius Salvius, 1758.
[12] B.H. Kwasnik, The role of classification in knowledge representation and discovery, Lib. Trends 48 (1) (1999) 22-47.
[13] B.S. Bloom, Taxonomy of Educational Objectives. Vol. 1: Cognitive Domain, McKay, 1956.
[14] T.E. Moffitt, Adolescence-limited and life-course-persistent antisocial behavior: a developmental taxonomy, Psychol. Rev. 100 (4) (1993) 674-701.
[15] D. Scharstein, R. Szeliski, A taxonomy and evaluation of dense two-frame stereo correspondence algorithms, Int. J. Comput. Vision 47 (1-3) (2002) 7-42.
[17] B. Kitchenham, S. Charters, Guidelines for performing Systematic Literature Reviews in Software Engineering, Technical Report, Keele University, 2007.
[18] K. Petersen, R. Feldt, S. Mujtaba, M. Mattsson, Systematic mapping studies in software engineering, in: Proceedings of the 12th International Conference on Evaluation and Assessment in Software Engineering, EASE’08, British Computer Society, Swinton, UK, 2008, pp. 68-77.
[19] R.L. Glass, I. Vessey, Contemporary application-domain taxonomies, IEEE Softw. 12 (4) (1995) 63-76.
[20] G.R. Wheaton, Development of a taxonomy of human performance: a review of classificatory systems relating to tasks and performance, Technical Report, American Institute for Research, Washington DC, 1968.
[22] R. Prieto-Diaz, Implementing faceted classification for software reuse, Commun. ACM 34 (5) (1991) 88-97.
[23] J.L. Fleiss, B. Levin, M.C. Paik, Statistical Methods for Rates and Proportions, John Wiley & Sons, 2013.