Information and Software Technology 0 0 0 (2017) 1–17
Contents lists available at ScienceDirect
journal homepage: www.elsevier.com/locate/infsof
Muhammad Usman a,∗, Ricardo Britto a, Jürgen Börstler a, Emilia Mendes b
a Department of Software Engineering (DIPT), Blekinge Institute of Technology (BTH), Karlskrona, 371 79, Sweden
b Department of Computer Science and Engineering (DIDD), Blekinge Institute of Technology (BTH), Karlskrona, 371 79, Sweden
Article info
Article history:
Received 29 September 2015
Revised 13 January 2017
Accepted 14 January 2017
Available online xxx
Keywords:
Taxonomy
Classification
Software engineering
Systematic mapping study
Abstract
Context: Software Engineering (SE) is an evolving discipline with new subareas being continuously developed and added. To structure and better understand the SE body of knowledge, taxonomies have been proposed in all SE knowledge areas.
Objective: The objective of this paper is to characterize the state-of-the-art research on SE taxonomies.
Method: A systematic mapping study was conducted, based on 270 primary studies.
Results: An increasing number of SE taxonomies have been published since 2000 in a broad range of venues, including the top SE journals and conferences. The majority of taxonomies can be grouped into the following SWEBOK knowledge areas: construction (19.55%), design (19.55%), requirements (15.50%) and maintenance (11.81%). Illustration (45.76%) is the most frequently used approach for taxonomy validation. Hierarchy (53.14%) and faceted analysis (39.48%) are the most frequently used classification structures. Most taxonomies rely on qualitative procedures to classify subject matter instances, but in most cases (86.53%) these procedures are not described in sufficient detail. The majority of the taxonomies (97%) target unique subject matters and many taxonomy papers are cited frequently. Most SE taxonomies are designed in an ad-hoc way. To address this issue, we have revised an existing method for developing taxonomies in a more systematic way.
Conclusion: There is a strong interest in taxonomies in SE, but few taxonomies are extended or revised. Taxonomy design decisions regarding the used classification structures, procedures and descriptive bases are usually not well described and motivated.
© 2017 Published by Elsevier B.V.
∗ Corresponding author.
E-mail addresses: muhammad.usman@bth.se (M. Usman), ricardo.britto@bth.se (R. Britto), jurgen.borstler@bth.se (J. Börstler), emilia.mendes@bth.se (E. Mendes).
http://dx.doi.org/10.1016/j.infsof.2017.01.006
0950-5849/© 2017 Published by Elsevier B.V.
Please cite this article as: M. Usman et al., Taxonomies in software engineering: A systematic mapping study and a revised taxonomy
1 Introduction
In science and engineering, a systematic description and organization of the investigated subjects helps to advance the knowledge in this field [1]. This organization can be achieved through the classification of the existing knowledge. Knowledge classification has supported the maturation of different knowledge fields mainly in four ways:
• Classification of the objects of a knowledge field provides a common terminology, which eases the sharing of knowledge [1–3].
• Classification can provide a better understanding of the interrelationships between the objects of a knowledge field [1].
• Classification can help to identify gaps in a knowledge field [1–3].
• Classification can support decision making processes [1].
Summarizing, classification can support researchers and practitioners in generalizing, communicating and applying the findings of a knowledge field [4].
Software Engineering (SE) is a comprehensive and diverse knowledge field that embraces a myriad of different research subareas. The knowledge within many subareas is already classified, in particular by means of taxonomies [5–9]. According to the Oxford English Dictionary [10], a taxonomy is “a scheme of classification”. A taxonomy allows for the description of terms and their relationships in the context of a knowledge area. The concept of taxonomy was originally proposed by Carolus Linnaeus [11] to group and classify organisms by using a fixed number of hierarchical levels. Nowadays, different classification structures (e.g. hierarchy, tree and faceted analysis [12]) have been used to construct taxonomies in different knowledge fields, such as Education [13], Psychology [14] and Computer Science [15].
Taxonomies have contributed to maturing the SE knowledge field. Nevertheless, like the taxonomy proposed by Carolus Linnaeus, which keeps being extended [16], SE taxonomies are expected to evolve over time, incorporating new knowledge. In addition, due to the wide spectrum of SE knowledge, there is still a need to classify the knowledge in many SE subareas.
Although many SE taxonomies have been proposed in the literature, it appears that taxonomies have been designed or evolved without following particular patterns, guidelines or processes. A better understanding of how taxonomies have been designed and applied in SE could be very useful for the development of new taxonomies and the evolution of existing ones.
To the best of our knowledge, no systematic mapping or systematic literature review has been conducted to identify and analyze the state-of-the-art of taxonomies in SE. In this paper, we describe a systematic mapping study [17,18] aiming to characterize the state-of-the-art research on SE taxonomies.
The main contribution of this paper is a characterization of the state-of-the-art of taxonomies in SE. Our results also show that most taxonomies are developed in an ad-hoc way. We therefore revised a taxonomy development method in the light of the findings of this mapping study, our own experience and literature from other research fields with more maturity regarding taxonomies (e.g., psychology and computer science).
The remainder of this paper is organized as follows: Section 2 describes related background. Section 3 presents the employed research methodology. The current state-of-the-art on taxonomies in SE, as well as the validity threats associated with the mapping study, are presented in Section 4. In Section 5, we present a revised method for developing SE taxonomies, along with an illustration of the revised method and its limitations. Finally, our conclusions and view on future work are provided in Section 6.
2 Background
In this section, we discuss important aspects related to taxonomy design that serve as motivation for the research questions described in Section 3.
Taxonomy is neither a trivial nor a commonly used term. According to the most cited English dictionaries, a taxonomy is mainly a classification mechanism:
• […] naming and organizing things, especially plants and animals, into groups that share similar qualities”.
• […] natural relationships”.
• […] of something, especially organisms” or “A scheme of classification”.
Since taxonomy is mainly defined as a classification system, one of the main purposes to develop a taxonomy should be to classify something.
2.2 Subject matter
The first step in the design of a new taxonomy is to clearly define the units of classification. In software engineering this could be requirements, design patterns, architectural views, methods and techniques, defects etc. This requires a thorough understanding of the subject matter to be able to define clear taxonomy classes or categories that are commonly accepted within the field [19,20].
1 www.dictionary.cambridge.org
2 www.merriam-webster.com
3 www.oxforddictionaries.com
2.3 Descriptive bases / terminology
Once the subject matter is clearly defined or an existing definition is adopted, the descriptive terms, which can be used to describe and differentiate subject matter instances, must also be specified. An appropriate description of these bases for classification is important to perform the comparison of subject matter instances. Descriptive bases can also be viewed as a set of attributes that can be used for the classification of the subject matter instances [19,20].
2.4 Classification procedure
Classification procedures define how subject matter instances (e.g., defects) are systematically assigned to classes or categories. A taxonomy’s purpose, descriptive bases and classification procedures are related and dependent on each other. Depending upon the measurement system used, the classification procedure can be qualitative or quantitative. Qualitative classification procedures are based on nominal scales. In qualitative classification systems, the relationship between the classes cannot be determined. Quantitative classification procedures, on the other hand, are based on numerical scales [20].
2.5 Classification structure
As aforementioned, a taxonomy is mainly a classification mechanism. According to Rowley and Farrow [21] there are two main approaches to classification: enumerative and faceted. In enumerative classification all classes are fixed, making a classification scheme intuitive and easy to apply. It is, however, difficult to enumerate all classes in immature or evolving domains. In faceted classification, aspects of classes are described that can be combined and extended. Kwasnik [12] describes four main approaches to structure a classification scheme (classification structures): hierarchy, tree, paradigm and faceted analysis.
Hierarchy [12] leads to taxonomies with a single top class that “includes” all sub- and sub-sub classes, i.e. a hierarchical relationship with inheritance (“is-a” relationship). Consider, for example, the hierarchy of students in an institution wherein the top class “student” has two sub-classes of “graduate student” and “undergraduate student”. These sub-classes can further have sub-sub classes and so forth. A true hierarchy ensures the mutual exclusivity property, i.e. an entity can only belong to one class. Mutual exclusivity makes hierarchies easy to represent and understand; however, a hierarchy cannot represent multiple inheritance relationships. Hierarchy is also not suitable in situations when researchers have to include multiple and diverse criteria for differentiation. To define a hierarchical classification, it is mandatory to have good knowledge of the subject matter to be classified; the classes and differentiating criteria between classes must be well defined early on.
Tree [12] is similar to the hierarchy; however, there is no inheritance relationship between the classes of tree-based taxonomies. In this kind of classification structure, common types of relationships between classes are “part-whole”, “cause-effect” and “process-product”. For example, a tree can represent a whole-part relationship between a country, its provinces and cities. Tree and hierarchy share similar strengths and limitations.
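The “student” hierarchy described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sub-sub classes (“master's student”, “doctoral student”) and the names `PARENT`, `ancestors` and `is_a` are hypothetical and are not taken from Kwasnik [12]. Storing exactly one parent per class makes the structure mutually exclusive by construction, which also shows why multiple inheritance cannot be expressed in it.

```python
# Hypothetical hierarchy: each class maps to exactly one parent class ("is-a").
# The two sub-sub classes are assumptions added for illustration only.
PARENT = {
    "graduate student": "student",
    "undergraduate student": "student",
    "master's student": "graduate student",
    "doctoral student": "graduate student",
}


def ancestors(cls, parent=PARENT):
    """Return the chain of classes from `cls` up to the top class."""
    chain = [cls]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def is_a(cls, other, parent=PARENT):
    """True if `cls` equals `other` or inherits from it."""
    return other in ancestors(cls, parent)


if __name__ == "__main__":
    print(ancestors("doctoral student"))
    print(is_a("undergraduate student", "graduate student"))
```

Because a dictionary key can hold only one value, a class such as “doctoral student” cannot also be placed under a second parent, which mirrors the limitation noted for true hierarchies.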
Paradigm [12] leads to taxonomies with two-way hierarchical relationships between classes. The classes are described by a combination of two attributes at a time. For example, paradigm would be suitable if we also had to represent gender in the “student” hierarchy example mentioned above. It can also be viewed as a two-dimensional matrix whose vertical and horizontal axes allow for the inclusion of two attributes of interest. This type of classification structure shares similar strengths and limitations with the hierarchy structure.
Fig. 1. Employed systematic mapping process.
Table 1
Faceted analysis example.
Tool name | Platform(s) | License type | SE area | Web support
Tool 2 | Windows | Proprietary | Construction | No
Faceted analysis [12,22] leads to taxonomies whose subject matters are classified using multiple perspectives (facets). The basic principle in faceted analysis is that there is more than one perspective to view and classify a complex entity. Each facet is independent and can have its own classes, which enables facet-based taxonomies to be easily adapted so they can evolve smoothly over time. In order to properly classify CASE tools, for example, multiple facets need to be considered. These facets may include supported platform(s), license type, SE area, web support etc. Table 1 depicts the application of these multiple facets to classify two hypothetical CASE tools. Faceted analysis is suitable for new and evolving fields, since it is not required to have the complete knowledge related to the selected subject matter to design a facet-based taxonomy. However, it can be challenging to define an initial set of facets. In addition, although it is possible to define relationships between facets, in most cases the facets are independent and have no meaningful relationship between each other.
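A faceted classification of CASE tools along the lines of Table 1 could be sketched as follows. This is a minimal sketch, not a definitive implementation: the facet names follow Table 1, but the sets of allowed values (other than those of “Tool 2”), the restriction to one platform per tool, and the names `FACETS`, `TOOL_2` and `classify` are assumptions made for illustration.

```python
# Facets follow Table 1; the allowed values are illustrative assumptions.
FACETS = {
    "platform": {"Windows", "Linux", "macOS"},
    "license_type": {"Proprietary", "Open source"},
    "se_area": {"Construction", "Design", "Testing"},
    "web_support": {"Yes", "No"},
}

# "Tool 2" as listed in Table 1.
TOOL_2 = {
    "platform": "Windows",
    "license_type": "Proprietary",
    "se_area": "Construction",
    "web_support": "No",
}


def classify(instance, facets=FACETS):
    """Check that `instance` has one allowed value per facet and return it."""
    missing = set(facets) - set(instance)
    if missing:
        raise ValueError(f"missing facets: {sorted(missing)}")
    for facet in facets:
        if instance[facet] not in facets[facet]:
            raise ValueError(f"{instance[facet]!r} is not a class of facet {facet!r}")
    return {facet: instance[facet] for facet in facets}


if __name__ == "__main__":
    print(classify(TOOL_2))
    # Facets are independent, so a new one can be added without touching the others.
    extended = {**FACETS, "pricing": {"Free", "Paid"}}  # hypothetical extra facet
    print(classify({**TOOL_2, "pricing": "Paid"}, extended))
```

The extension step at the end illustrates why facet-based taxonomies are considered easy to adapt: adding a facet leaves existing facets and their classes unchanged.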
2.6 Validation
Validation strengthens reliability and usefulness of taxonomies. Taxonomies can be validated in three ways:
• Orthogonality demonstration – The orthogonality of the taxonomy dimensions and categories is demonstrated [8,20].
• Benchmarking – The taxonomy is compared to similar classification schemes [8].
• Utility demonstration – The utility of a taxonomy is demonstrated by actually classifying subject matter examples [8,20]. The utility of a taxonomy can be demonstrated or exemplified by classifying existing literature or expert opinion, or by employing more rigorous validation approaches such as a case study or experiment.
3 Research methodology
We chose the systematic mapping study (SMS) method to identify and analyze the state-of-the-art of taxonomies in SE, because this method works well for broad and weakly defined research areas [17,18]. We employed the guidelines by Kitchenham and Charters [17] and partly implemented the mapping process provided by Petersen et al. [18]. The employed mapping process is summarized in Fig. 1 and described further in Subsections 3.1–3.5.
3.1 Research questions
The following research questions were formulated to guide this SMS:
• Question 1 (RQ1) – What taxonomy definitions and purposes are provided by publications on SE taxonomies?
• Question 2 (RQ2) – Which subject matters are classified in SE taxonomies?
• Question 3 (RQ3) – How is the utility of SE taxonomies demonstrated?
• Question 4 (RQ4) – How are SE taxonomies structured?
• Question 5 (RQ5) – To what extent are SE taxonomies used?
• Question 6 (RQ6) – How are SE taxonomies developed?
The main idea behind RQ1 is to identify how and why the term “taxonomy” is used in primary studies that claim to present a taxonomy. RQ2 focuses on identifying the subject matters classified by means of taxonomies in SE. RQ3 focuses on identifying the approaches used to demonstrate the utility of SE taxonomies, which is one of the ways of validating a taxonomy (see Section 2). With RQ4 we intend to identify the classification structures, related descriptive bases and classification procedures employed to design SE taxonomies. RQ5 focuses on the extent to which proposed SE taxonomies are used. Finally, RQ6 addresses in which ways SE taxonomies are developed, i.e. whether there are guidelines, methods, and processes that guide the development of taxonomies in a systematic way.
3.2 Search process
The search process employed in this work is displayed in Fig. 2 and has six activities.
Fig. 2. Search process.
Table 2
Summary of search results.
First, we defined the terms to be included in our search string. We selected all SWEBOK knowledge areas [7] to be included as terms, except for the three knowledge areas on related disciplines (Computing Foundations, Mathematical Foundations and Engineering Foundations). We also included the term “Software Engineering”, to augment the comprehensiveness of the search string. Finally, to reduce the scope of the search string to studies that report SE taxonomies, we included the term “taxonomy”.
Since some of the knowledge areas are referred to by the SE community through other terms (synonyms), we also included their synonyms. Specifically, the following synonyms were included in the search string:
• Requirements – requirements engineering
• Construction – software development
• Design – software architecture
• Management – software project management, software management
• Process – software process, software life cycle
• Models and methods – software model, software methods
• Economics – software economics
The selected SWEBOK knowledge areas and the term “Software Engineering” were all linked using the operator OR. The term “taxonomy” was linked with the other terms using the operator AND. The final search string is shown below.
(“software requirements” OR “requirements engineering” OR “software design” OR “software architecture” OR “software construction” OR “software development” OR “software testing” OR “software maintenance” OR “software configuration management” OR “software engineering management” OR “software project management” OR “software management” OR “software engineering process” OR “software process” OR “software life cycle” OR “software engineering models and methods” OR “software model” OR “software methods” OR “software quality” OR “software engineering professional practice” OR “software engineering economics” OR “software economics” OR “software engineering”) AND (taxonomy OR taxonomies)
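The way the search string above is assembled (knowledge area terms joined with OR, then combined with the taxonomy terms using AND) can be sketched in Python. The term lists are copied from the search string; the function name `build_search_string` and the use of straight double quotes are assumptions for illustration, and the query syntax actually accepted by each database may differ.

```python
# Terms copied from the search string reported in the text.
KNOWLEDGE_AREA_TERMS = [
    "software requirements", "requirements engineering",
    "software design", "software architecture",
    "software construction", "software development",
    "software testing", "software maintenance",
    "software configuration management",
    "software engineering management", "software project management",
    "software management",
    "software engineering process", "software process", "software life cycle",
    "software engineering models and methods", "software model",
    "software methods",
    "software quality", "software engineering professional practice",
    "software engineering economics", "software economics",
    "software engineering",
]
TAXONOMY_TERMS = ["taxonomy", "taxonomies"]


def build_search_string(area_terms, taxonomy_terms):
    """Join area terms with OR (quoted phrases) and AND them with taxonomy terms."""
    areas = " OR ".join(f'"{term}"' for term in area_terms)
    taxa = " OR ".join(taxonomy_terms)
    return f"({areas}) AND ({taxa})"


if __name__ == "__main__":
    print(build_search_string(KNOWLEDGE_AREA_TERMS, TAXONOMY_TERMS))
```

Keeping the terms in lists makes it straightforward to rerun the same query later, as was done for the search update described below.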
Although SE knowledge classification could be named in different ways, e.g., taxonomy, ontology, classification and classification scheme [1], we limited the scope of this paper to taxonomies. Extending our search string to include the terms “ontology” and “classification scheme” would have led to an excessive number of search results that would have been infeasible to handle⁴. Using alternative terms would also force the authors to interpret whether the primary studies’ authors actually intended to present a taxonomy when they do not explicitly refer to taxonomies. To mitigate this threat to validity, we restricted the scope to taxonomies.
Once the search string was designed, we selected the primary sources to search for relevant studies. Scopus⁵, Compendex/Inspec⁶ and Web of Science⁷ were selected because they cover most of the important SE databases, such as IEEE, Springer, ACM and Elsevier. In addition, the selected primary sources are able to handle advanced queries. The search string was applied on metadata (i.e. title, abstract and author keywords) in August 2014 on the selected data sources. We later updated the search results by applying the search string again in February 2016, to fetch studies published between September 2014 and December 2015. Table 2 presents the number of search results for each data source.
4 Inclusion of the terms “ontolog∗” and “classification” returned 10,474 hits in total just for Scopus.
5 www.scopus.com
6 www.engineeringvillage.com
7 apps.webofknowledge.com
3.3 Study selection process
The selection process employed in this work is displayed in Fig. 3 and detailed as follows.
First, the following inclusion and exclusion criteria were defined:
• Inclusion criteria
  1. Studies that propose or extend a taxonomy AND
  2. Studies that are within Software Engineering (SE), according to SWEBOK’s KAs (see Subsection 3.2).
• Exclusion criteria
  1. Studies where the full-text is not accessible OR;
  2. Studies that do not propose or extend a SE taxonomy OR;
  3. Studies that are not written in English OR;
  4. Studies that are not reported in a peer-reviewed workshop, conference, or journal.
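The logic of the criteria above (all inclusion criteria must hold, and no exclusion criterion may apply) can be written as a small predicate. This is a hedged sketch: the `Study` record, its field names and the function `is_included` are hypothetical and only encode the criteria as listed; the actual screening was a manual judgment by the authors.

```python
from dataclasses import dataclass


@dataclass
class Study:
    """Hypothetical screening record; one boolean per criterion."""
    proposes_or_extends_taxonomy: bool
    within_se: bool
    full_text_accessible: bool
    written_in_english: bool
    peer_reviewed: bool


def is_included(study: Study) -> bool:
    """Inclusion criteria are joined with AND, exclusion criteria with OR."""
    meets_inclusion = study.proposes_or_extends_taxonomy and study.within_se
    hits_exclusion = (
        not study.full_text_accessible
        or not (study.proposes_or_extends_taxonomy and study.within_se)
        or not study.written_in_english
        or not study.peer_reviewed
    )
    return meets_inclusion and not hits_exclusion


if __name__ == "__main__":
    print(is_included(Study(True, True, True, True, True)))
    print(is_included(Study(True, True, True, False, True)))
```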
The selection of primary studies was conducted using a two-stage screening procedure. In the first stage, only the abstracts and titles of the studies were considered. In the second stage, the full texts were read. Note that we used an inclusive approach in both stages to avoid premature exclusion of studies, i.e. if there was doubt about a study, it was included.
For the first stage (level-1 screening), the 1517 studies were divided equally between the first two authors. As a result, 507 studies were judged as potentially relevant.
To increase the reliability of the level-1 screening result, the third author screened a random sample of 10.30% (78 studies) of the studies screened by the first author and the fourth author screened a random sample of 10.28% (78 studies) of the studies screened by the second author. The first and third authors had the same judgment for 91% (71) of the studies. The second and fourth authors had the same judgment for 93.6% (73) of the studies.
To evaluate the reliability of the inter-rater agreement between the authors, we calculated Cohen’s kappa coefficient [23]. The Cohen’s kappa coefficient between the first and third authors was statistically significant (significance level = 0.05) and equal to 0.801. The Cohen’s kappa coefficient between the second and fourth authors was also statistically significant (significance level = 0.05) and equal to 0.857. According to Fleiss et al. [23], Cohen’s kappa coefficient values above 0.75 mean an excellent level of agreement.
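Cohen’s kappa corrects the observed agreement p_o for the agreement p_e expected by chance: kappa = (p_o − p_e) / (1 − p_e). The sketch below computes it for two raters making include/exclude decisions. The function name and parameters are illustrative, and the example counts are hypothetical: the text reports only the totals (71 agreements out of 78 studies), not the split between “both include” and “both exclude”, so the result is not expected to reproduce the reported 0.801.

```python
def cohens_kappa(both_include, only_a_includes, only_b_includes, both_exclude):
    """Cohen's kappa for two raters and two categories (include/exclude)."""
    n = both_include + only_a_includes + only_b_includes + both_exclude
    observed = (both_include + both_exclude) / n
    a_include = (both_include + only_a_includes) / n  # rater A's include rate
    b_include = (both_include + only_b_includes) / n  # rater B's include rate
    expected = a_include * b_include + (1 - a_include) * (1 - b_include)
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical breakdown of 78 studies with 71 agreements (25 + 46).
    kappa = cohens_kappa(both_include=25, only_a_includes=4,
                         only_b_includes=3, both_exclude=46)
    print(f"kappa = {kappa:.3f}")
    print("excellent agreement" if kappa > 0.75 else "below the 0.75 threshold")
```

The threshold of 0.75 used in the last line is the one attributed to Fleiss et al. [23] in the text.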
The level-2 screening (second stage), performed by the first and second authors, consisted of applying the selection criteria to the full text of the studies selected during the level-1 screening. The 507 studies were divided equally between the first two authors. As a result, 280 studies were judged as relevant.
To increase the reliability of the level-2 screening, a two-step validation was performed, as follows:
1. The first author screened 27.67% (70) of the studies deemed as relevant by the second author during the level-2 screening (randomly selected) and vice-versa. No disagreements were found between the authors.
2. Nine studies were randomly selected from each of the two sets allocated to the first two authors for further validation. The third author applied the study selection process on these 18 studies (about 6.43% of 280) for validation purposes. No disagreements were found with respect to the study selection (i.e. include/exclude) decisions.
During the entire screening process (stages 1 and 2), we tracked the reason for each exclusion, as presented in Table 3.
Fig. 3. Selection process.
Table 3
Rationale for excluded studies.
Not proposing or evolving a SE taxonomy | 1167
Total included after study selection | 280
Total included after data extraction | 270
3.4 Extraction process
The extraction process employed in this work is summarized in Fig. 4 and consists of four main steps: define a classification scheme, define an extraction form, extract data, and validate the extracted data.
We designed a classification scheme by following Petersen et al.’s guidelines [18]. It has the following facets:
• Research type – This facet is used to distinguish between different types of studies (adapted from Wieringa et al. [24]).
  • Evaluation research – A study that reports a taxonomy implemented in practice, i.e. evaluation in a real environment, in general by means of the case study method.
  • Validation research – A study that reports a taxonomy that was not implemented in practice yet, although it was validated in laboratory environment, in general by means of experiment.
  • Solution proposal – A study that reports a taxonomy that was neither implemented in practice nor validated although it is supported by a small example (illustration) or a good line of argumentation.
  • Philosophical paper – A study that reports a taxonomy that has no type of evaluation, validation or illustration.
• SE knowledge area – This facet is used to distinguish between the SE knowledge areas in which taxonomies have been proposed. The categories of this facet follow the SWEBOK [7]: software requirements, software design, software construction, software testing, software maintenance, software configuration management, software engineering management, software engineering process, software engineering models and methods, software quality, software engineering professional practice and software engineering economics.
• Presentation approach – This facet is used to classify the studies according to the overall approach used to present a taxonomy: textual or graphical.
For the data extraction, the relevant studies (280) were divided equally between the first and second authors. For each paper, data was collected and later stored in a spreadsheet using the data extraction form shown in Table 4.
To increase the reliability of the extracted data, a two-step validation was performed, as follows:
1. The first author independently re-extracted the data of 50% (70) of the studies originally extracted by the second author (randomly selected) and vice-versa. Five disagreements were identified and all of them were related to the item “classification structure”.
2. Eighteen studies were randomly selected from the studies originally extracted by the first and second authors (9 studies from each author). Those studies were independently re-extracted by the third author. Twenty-three disagreements were identified: 2 on “taxonomy purpose”, 10 on “classification structure”, 2 on “classification procedure type”, 3 on “classification procedure description” and 6 on “validation approach”.
Fig. 4. Extraction process.
Table 4
Data extraction form.
Data item(s) | Description
Citation data | Title, author(s), year and publication venue
Taxonomy definition | Definition of taxonomy that is used or referred to
Purpose | Text that states the purpose for the taxonomy
Purpose keyword | Key word used in the paper to describe the purpose (e.g. classify, understand, describe)
Subject matter | The name of the thing/concept area that is taxonomized
Descriptive bases | Is the subject matter defined in sufficient detail/clarity to enable classification (Yes/No)
Classification structure | Hierarchy, tree, paradigm, or faceted analysis, according to Kwasnik [12]
Classification procedure | The criteria for putting items in different classes (qualitative, quantitative or no details provided)
Classification procedure description | Do the authors explicitly describe the classification procedure (Yes/No)
Design method | Did the authors employ any systematic approach to design the reported taxonomy? If so, which approach?
Presentation approach | Textual or graphical
Utility demonstration | Is the utility of the taxonomy demonstrated? If so, how (e.g. illustration, case study, experiment)?
Primary knowledge area | Primary knowledge area as per SWEBOK v3 [7]
Secondary knowledge area | Secondary knowledge area as per SWEBOK v3 (if applicable)
Number of citations | Number of times a primary study is cited by other studies, as per Google Scholar
All disagreements except for “classification structure” were easily resolved. We believe that the high level of disagreement on the item “classification structure” was due to the fact that none of the studies explicitly stated and motivated the employed classification structure, which demanded the inference of such data from the text in each paper.
To improve the reliability of the extracted data, we decided to re-screen all 280 papers, focusing only on the item “classification structure”. First, we discussed classification structures in detail (based on Kwasnik [12]) to come to a common understanding of the terms. Second, three of us did an independent re-assessment of the classification structure of 52 papers. As a result, we reached full agreement on 50 papers (3 identical results) and partial agreement on 2 papers (2 reviewers agreeing). There were no primary studies without full or partial agreement. Third, the remaining 228 studies were re-assessed by the first and second authors and they reached agreement on 216 papers. The remaining 12 papers were independently re-assessed by the third author, who did not know the results from the other two reviewers. In the end, full agreement was achieved for 50 studies and partial agreement was achieved for 230 studies.
During the re-assessment of the primary studies, 10 studies were excluded because they do not present taxonomies, reducing the final number of primary studies to 270⁸ (see Table 3).
Fig. 5. Analysis process.
3.5 Analysis process
Fig. 5 presents the analysis process conducted herein. First, we classified the extracted data using the scheme defined in Subsection 3.4. This led to the results detailed in Section 4. We also performed a quantitative analysis of the extracted data to answer the research questions of this paper. Finally, the overall result of the data analysis (see Section 4), along with information from additional literature ([12,19,20]), was used to revise an existing method previously proposed to design SE taxonomies [25], as detailed in Section 5.
8 The full list of the included 270 primary studies is available at http://tinyurl.com/jrdaxhh
Fig. 6. Year and venue wise distributions.
4 Results
In this section, we describe the results of the mapping study reported herein, which are based on the data extracted from 270 papers reporting 271 taxonomies (one paper presented two taxonomies). The percentages in Sections 4.1 and 4.7 reflect the number of papers (270), whereas the percentages in all other subsections reflect the number of taxonomies (271).
4.1 General results
Fig. 6 shows that SE taxonomies have been proposed since 1987, with an increasing number of these published after the year 2000, which suggests a higher interest in this research topic.
Table 5 shows that 53.7% (145) of the studies were published in relevant conferences in the topics of maintenance (International Conference on Software Maintenance), requirements engineering (Requirements Engineering Conference) or general SE topics (e.g. International Conference on Software Engineering). Taxonomies were published at 99 unique conferences with 78 featuring only a single SE taxonomy publication. These results further indicate a broad interest in SE taxonomies in a wide range of SE knowledge areas.
Table 5 also shows that 33.7% (91) of the primary studies were published as journal articles in 44 unique journals. Taxonomies have been published frequently in relevant SE journals (e.g. IEEE Transactions on Software Engineering and Information and Software Technology). We believe that this has been the case because the scope of these journals is not confined to a specific SE knowledge area.
Primary studies were also published in 28 unique workshops (34 – 12.6%). As for journals and conferences, the results indicate an increasing interest in SE taxonomies in a broad range of SE knowledge areas.
Fig. 7a–h depict the yearly distribution of SE taxonomies by knowledge area for the KAs with 10 or more taxonomies. Note that most knowledge areas follow an increasing trend after 2000, with many taxonomies for construction, design, and quality in the 1980s and 1990s.
Table 5
Publication venues with more than two taxonomy papers.
IEEE Intl Conference on Software Maintenance (ICSM) | 10
Intl Conference on Requirements Engineering (RE) | 6
Intl Conference on Software Engineering (ICSE) | 5
Hawaii Intl Conference on Systems Sciences (HICSS) | 4
Asia Pacific Software Engineering Conference (APSEC) | 4
European Conference on Software Maintenance and Reengineering (CSMR) | 4
Intl Conference on Software Engineering and Knowledge Engineering |
Intl Symposium on Empirical Software Engineering and Measurement (ESEM) | 4
Americas Conference on Information Systems (AMCIS) | 3
IEEE Transactions on Software Engineering (TSE) | 11
Information and Software Technology (IST) | 9
Journal of Software: Evolution and Process | 5
Fig. 7. Yearly distribution of primary studies by KAs. Horizontal axes represent the years (starting 1987), while vertical axes denote the number of taxonomies.
4.2 Classification scheme results
In this section, we present the results corresponding to the three facets of the classification scheme described in Section 3, i.e. SE knowledge area (KA), research type and presentation approach.
The vertical axis in Fig. 8 depicts the SE knowledge areas in which taxonomies have been proposed. Construction and design are the leading SE knowledge areas, each with 53 (19.55%) taxonomies. These are relatively mature SE fields with a large body of knowledge and a high number of subareas.
A high number of taxonomies have also been proposed in the requirements (42 – 15.50%), maintenance (32 – 11.81%) and testing (27 – 9.96%) knowledge areas. Few taxonomies have been proposed in economics (3 – 1.11%) and professional practice (3 – 1.11%), which are more recent knowledge areas.
The results show that most SE taxonomies (76.37%) are proposed in the requirements, design, construction, testing and maintenance knowledge areas, which correspond to the main activities in a typical software development process [26].
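The shares above can be recomputed from the reported counts, using 271 taxonomies as the denominator. The following sketch uses only counts stated in the text; the names `TAXONOMIES_PER_KA`, `TOTAL_TAXONOMIES` and `share` are illustrative. It suggests that the combined figure of 76.37% matches the sum of the individually reported percentages, whereas dividing the combined count (207) by 271 gives about 76.38%; the small difference appears to come from how the individual values were rounded or truncated.

```python
# Counts per knowledge area as reported in the text.
TAXONOMIES_PER_KA = {
    "construction": 53,
    "design": 53,
    "requirements": 42,
    "maintenance": 32,
    "testing": 27,
}
TOTAL_TAXONOMIES = 271


def share(count, total=TOTAL_TAXONOMIES):
    """Percentage of `count` relative to `total`."""
    return 100 * count / total


if __name__ == "__main__":
    for area, count in TAXONOMIES_PER_KA.items():
        print(f"{area}: {count} ({share(count):.2f}%)")
    combined = sum(TAXONOMIES_PER_KA.values())
    print(f"combined: {combined} ({share(combined):.2f}%)")
    # Sum of the percentages as printed in the text.
    reported = [19.55, 19.55, 15.50, 11.81, 9.96]
    print(f"sum of reported percentages: {sum(reported):.2f}%")
```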
The horizontal axis in Fig 8 shows the distribution of
tax-onomies by research types , according to Wieringa et al [24]
Most taxonomies are reported in papers that are classified as
“solutionproposals” (135– 49.82%),whereintheauthorspropose
ataxonomyandexplainorapplyitwiththehelpofanillustration
Ninety one taxonomies (33.58%) are reported in “philosophical papers”,whereinauthorsproposea taxonomy,butdonotprovide anykind ofvalidation, evaluationor illustration Relatively fewer taxonomiesarereportedin“evaluationpapers” (34 – 12.54%)and
“validationpapers” (11– 4.06%)
Fig. 8 also depicts the classification of the taxonomies using two aspects of the classification scheme, i.e., SE knowledge area and research type.
Taxonomies in the knowledge areas construction and design are mostly reported either as solution proposals (construction – 27; design – 31) or philosophical papers (construction – 20; design – 17). Taxonomies in the knowledge areas requirements, maintenance and testing are better distributed across the different research types, wherein, besides the solution proposal and the philosophical research types, a reasonable percentage of taxonomies are reported as evaluation or validation papers.
The horizontal axis in Fig. 9 shows the distribution of taxonomies by presentation approach. Most taxonomies (57.93%) are presented purely as text or table, while 42.07% of the taxonomies are presented through some graphical notation in combination with text.
Fig. 9 also displays the classification of the identified taxonomies in terms of SE knowledge area and presentation approach. The results show two different trends:
Fig. 8. Systematic map – knowledge area vs. research type.
• For knowledge areas such as design, quality, models and methods, and process, both textual and graphical approaches are used an almost equal number of times. This suggests that the taxonomies in the KAs that involve a lot of modeling might be better presented using graphical modeling approaches.
• Most taxonomies in construction (35 out of 53), maintenance (23 out of 32), testing (15 out of 27) and software management (7 out of 10) are textually presented.
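The second trend can be checked directly from the counts given in the bullet above. The sketch below is a rough illustration in Python. It assumes that every taxonomy falls into exactly one of the two presentation categories (textual or graphical), so the graphical count is derived as the remainder; the names `presentation`, `textual_share` and `graphical_count` are hypothetical.

```python
# (textual count, total count) per knowledge area, as reported in the text.
presentation = {
    "construction": (35, 53),
    "maintenance": (23, 32),
    "testing": (15, 27),
    "software management": (7, 10),
}


def textual_share(ka):
    """Percentage of taxonomies in a knowledge area presented textually."""
    textual, total = presentation[ka]
    return round(100 * textual / total, 1)


def graphical_count(ka):
    """Derived count, assuming only two presentation categories exist."""
    textual, total = presentation[ka]
    return total - textual


# Knowledge areas in which more than half of the taxonomies are textual.
mostly_textual = [ka for ka in presentation if textual_share(ka) > 50]
```

With these counts, all four listed knowledge areas end up in `mostly_textual`, which matches the trend stated in the text.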
We extracted data about the following two aspects to answer RQ1:
• Taxonomy definition: We investigated from each study whether or not the authors made any attempt to communicate their understanding about the concept of taxonomy by citing or presenting any definition of it.
• Taxonomy purpose: We identified from each study the stated (if any) main purpose for designing a taxonomy.
Fig. 9. Systematic map – knowledge area vs. presentation type.
As stated earlier, taxonomy is not a trivial concept. It has been defined in multiple ways (see Section 2 for some definitions). This RQ aims to identify whether authors make an explicit effort to share their perspective on taxonomy by adopting/using a definition. The results show that only 6.3% (17) of the taxonomies were reported with a definition for the term “taxonomy”. Out of these 17 taxonomies, three use the Cambridge dictionary’s definition (see Section 2), eight studies do not provide an explicit source and the remaining six have other unique references: The American Heritage Dictionary⁹, Carl Linnaeus [11], Whatis¹⁰, the IEEE standard taxonomy for SE standards, Doty and Glick [27] and Fleishman et al. [28]. For the remaining 93.7% (254) taxonomies, no definition of “taxonomy” was provided.
⁹ www.ahdictionary.com/
¹⁰ www.whatis.com
Table 6. Approaches for utility demonstration.
To identify the purpose of each taxonomy, we extracted the relevant text, referred to here as purpose descriptions, from each of the primary studies, using a process similar to open coding [29,30]. As codes, we used the keywords used in the primary studies to describe a taxonomy’s purpose.
For about 56% of the taxonomies, the authors used “classify” (48.80%) or “categorize” (7.74%) to describe the purpose of their taxonomy. For 5.9% of the taxonomies, it was not possible to identify a specific purpose. For the remaining taxonomies (38.37%), we found 41 different terms for describing the purpose, e.g., “identify”, “understand”, and “describe”.
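The keyword-based coding described above can be illustrated with a small tally. The following Python sketch is only a rough approximation of such a process, not the procedure used in the study: the codebook `CODE_STEMS`, the function `code_purpose` and the sample purpose descriptions are all hypothetical, and simple substring matching stands in for the manual judgment that open coding involves.

```python
from collections import Counter

# Hypothetical codebook: keyword code -> stem searched for in the text.
CODE_STEMS = {
    "classify": "classif",
    "categorize": "categori",
    "identify": "identif",
    "understand": "understand",
    "describe": "describ",
}


def code_purpose(description):
    """Return the first code whose stem occurs in the description, else None."""
    lowered = description.lower()
    for code, stem in CODE_STEMS.items():
        if stem in lowered:
            return code
    return None  # no specific purpose could be identified


# Hypothetical purpose descriptions, invented for illustration only.
purpose_descriptions = [
    "This taxonomy classifies requirements defects.",
    "A scheme to categorise architectural risks.",
    "The goal is to classify testing techniques.",
    "We present a taxonomy of process models.",
]

# Frequency of each code across all purpose descriptions.
tally = Counter(code_purpose(d) for d in purpose_descriptions)
```

Descriptions that match no keyword are counted under `None`, mirroring the taxonomies for which no specific purpose could be identified.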
4.4 RQ2 – Subject matters
In total, we identified 263 unique subject matters¹¹ for the 271 taxonomies, e.g., technical debt, architectural constraints, usability requirements, testing techniques and process models.
The high number of unique subject matters means that almost every taxonomy dealt with a unique subject matter. This might be due to the following reasons:
• Existing taxonomies do not fit their purpose well. Therefore, there is a need to define new taxonomies.
• The subject matters for existing taxonomies are so narrowly defined that they are not suitable for usage outside their original context. New taxonomies are therefore developed constantly.
• SE researchers do not reuse or extend existing taxonomies when there is a need for organizing SE knowledge, but rather propose new ones.
One indicator for taxonomy use is the number of times each primary study is cited. This analysis is detailed in Subsection 4.7.
The list of subject matters contains mainly technical aspects of SE. Only a few taxonomies deal with people-related subject matters, e.g., stakeholder-related and privacy-related issues.
The results for this research question suggest that taxonomies are rarely revisited, revised or extended. However, many taxonomy papers are highly cited, which shows that there is a strong interest in taxonomies in the SE field.
The mapping of SE taxonomies presented in this paper supports SE researchers in identifying and evolving existing taxonomies. This may lead to the development of a more consistent terminology.
4.5 RQ3 – Utility demonstration
Table 6 displays the approaches used to demonstrate the utility of SE taxonomies. Illustration is the most frequently used approach (124 – 45.76%). Illustration includes approaches such as example, scenario and case.
¹¹ For the full list, see: http://tinyurl.com/z4mqfnr
Table 7. Classification structure.
Table 8. Descriptive bases.
Table 9. Classification procedure types.
Table 10. Classification procedure descriptions.
Case studies have also been used to demonstrate the utility of 34 taxonomies (12.54%). Experiments have been used to demonstrate the utility of 11 taxonomies (4.06%), while the utility of a few taxonomies has also been demonstrated through expert opinion (7 – 2.58%) or survey (4 – 1.48%). Note that 33.9% (83) of the taxonomies did not have their utility demonstrated.
The results related to RQ3 show that a few taxonomies have their utility demonstrated through methods like case study or experiment, while the utility of a large number of taxonomies (33.58%) is not demonstrated by any means. We do not believe that one particular approach would be the best for all contexts; however, we believe that in most cases it would not be enough just to propose a taxonomy.
To answer RQ4, the following data was gathered: classification structure, descriptive bases, classification procedure and classification procedure description.
Table 7 shows the classification structures of the identified taxonomies. Hierarchy was the most frequently used classification structure (144 – 53.14%), followed by faceted-based structures (107 – 39.48%), tree (14 – 5.17%) and paradigm (6 – 2.21%).
Table 8 displays the status of the taxonomies’ descriptive basis. The majority of the taxonomies have a sufficiently clear description of their elements (248 – 91.51%), while only 22 taxonomies (8.49%) lack a sufficient description.
Table 9 presents the classification procedure types for the identified taxonomies. The majority of the taxonomies employed a qualitative classification procedure (262 – 96.68%), followed by quantitative (7 – 2.58%) and both (2 – 0.74%).
Table 10 displays the status of the taxonomies’ classification procedure description. The majority of the taxonomies do not have an explicit description for the classification procedure (227