In the early 1980s, Tanaka proposed the first fuzzy linear regression model, building on fuzzy set theory and possibility theory (Tanaka et al., 1980). The functional relation between dependent and independent variables is represented as a fuzzy linear function whose parameters are given by fuzzy numbers. Tanaka proposed the first Fuzzy Possibilistic Regression (FPR) using the following fuzzy linear model with crisp input and fuzzy parameters:

$$\tilde{y}_n = \tilde{E}_0 + \tilde{E}_1 x_{n1} + \dots + \tilde{E}_p x_{np} + \dots + \tilde{E}_P x_{nP} \qquad (4)$$

where the parameters are symmetric triangular fuzzy numbers denoted by $\tilde{E}_p = (c_p; w_p)_L$, with $c_p$ and $w_p$ as the center and the spread, respectively.
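To make model (4) concrete, the following minimal sketch evaluates the fuzzy output for one crisp observation. It assumes the usual arithmetic for symmetric triangular fuzzy numbers (centers combine linearly, spreads combine through the absolute values of the inputs); the function names `fuzzy_output` and `membership` are illustrative and do not come from the paper.

```python
def fuzzy_output(c, w, x):
    """Center and spread of y~ = E~_0 + sum_p E~_p * x_p for a crisp input x.

    c[0], w[0] belong to the intercept; c[p], w[p] to the p-th coefficient.
    Assumes symmetric triangular coefficients (c_p; w_p) with w_p >= 0.
    """
    center = c[0] + sum(cp * xp for cp, xp in zip(c[1:], x))
    spread = w[0] + sum(wp * abs(xp) for wp, xp in zip(w[1:], x))
    return center, spread


def membership(y, center, spread):
    """Membership degree of a crisp value y in the triangular number (center; spread)."""
    if spread == 0:
        return 1.0 if y == center else 0.0
    return max(0.0, 1.0 - abs(y - center) / spread)


# One regressor, hypothetical coefficients E~_0 = (1; 0.5) and E~_1 = (2; 0.25)
center, spread = fuzzy_output(c=[1.0, 2.0], w=[0.5, 0.25], x=[-2.0])
print(center, spread)                     # center and spread of the fuzzy estimate
print(membership(-2.5, center, spread))   # membership of an observed value
```

Note that a negative input still widens the estimate, because the spread depends on the absolute value of the input.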
Unlike in statistical regression, the deviations between the data and the linear model are assumed to depend on the vagueness of the parameters and not on measurement errors. The basic idea of Tanaka's approach was to minimize the uncertainty of the estimates by minimizing the total spread of the fuzzy coefficients. Spread minimization must be pursued under the constraint that the whole given data set is included, to a degree of belief $D$ ($0 < D < 1$) defined by the decision maker. The estimation problem is solved via a mathematical programming approach, where the objective function minimizes the spread parameters and the constraints guarantee that the observed data fall inside the fuzzy interval.
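The two ingredients of this programming problem, the spread objective and the inclusion constraints, can be sketched as follows. This is only an illustration under stated assumptions: the exact constraints of the source did not survive extraction, so the sketch uses the common level-cut form in which each observation must lie within $(1 - D)$ times the spread of its fuzzy estimate. The names `total_spread` and `includes_all` and the data are hypothetical.

```python
def total_spread(w):
    """Objective: total spread of the fuzzy coefficients."""
    return sum(w)


def includes_all(c, w, X, y, D):
    """Check the inclusion constraints for every observation.

    Assumption: an observation y_n is 'included' at degree D when
    |y_n - center_n| <= (1 - D) * spread_n.
    """
    for x_n, y_n in zip(X, y):
        center = c[0] + sum(cp * xp for cp, xp in zip(c[1:], x_n))
        spread = w[0] + sum(wp * abs(xp) for wp, xp in zip(w[1:], x_n))
        if abs(y_n - center) > (1.0 - D) * spread + 1e-12:
            return False
    return True


# Hypothetical data with one regressor and fixed centers c = (0, 2)
X = [[1.0], [2.0], [3.0]]
y = [2.1, 3.9, 6.2]
c = [0.0, 2.0]

tight = [0.0, 0.1]   # small total spread, but violates inclusion at D = 0.5
wide = [0.5, 0.2]    # larger total spread, satisfies inclusion at D = 0.5

for w in (tight, wide):
    print(total_spread(w), includes_all(c, w, X, y, D=0.5))
```

A solver would search among the feasible spread vectors (those for which the check returns `True`) for the one with the smallest total spread.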
The F-PLSPM follows the component-based approach to SEM (SEM-PLS), alternatively called PLS Path Modeling (PLS-PM) (Tenenhaus et al., 2005). The reason is that fuzzy regression and PLS path modeling share several characteristics: both are soft modeling, data-oriented approaches.
Specifically, fuzzy regression joins PLS-PM in its final step, allowing for a fuzzy structural model (see Figure 1) but a still crisp measurement model. This connection implies a two-stage estimation procedure:
• stage 1: latent variables are estimated according to the PLS-PM estimation procedure (Wold, 1982);
Fuzzy PLS Path Modeling
Fig. 1. Fuzzy path model representation
• stage 2: FPR on the estimated latent variables is performed so that the following fuzzy structural model is obtained:

$$\xi_h = \tilde{E}_{h0} + \sum_{h'} \tilde{E}_{hh'}\, \xi_{h'}$$

where $\tilde{E}_{hh'}$ refers to the generic fuzzy path coefficient, $\xi_h$ and $\xi_{h'}$ are adjacent latent variables, and $h, h' \in [1, \dots, H]$ vary according to the model complexity.
It is worth noting that the structural model resulting from this procedure differs from the traditional structural model. Here the path coefficients are fuzzy numbers and there is no error term, as a natural consequence of FPR. In the analysis of a statistical model one should always, in one way or another, take into account the goodness of fit, above all when comparing different models. The proposal is then to use FPR. The estimation of fuzzy parameters, instead of single-valued (crisp) parameters, makes it possible to capture both the structural and the residual information. Embedding the residual in the model via fuzzy parameters (Tanaka and Guo, 1999) makes it possible to evaluate the differences between assessors (panel performance) as well as the reproducibility of each assessor (assessor performance) (Romano and Palumbo, 2006b).
3 Application
The data set comes from the sensory profiling of 14 cheese samples by a panel of 12 assessors on the basis of twelve attributes in two replicates.
The final data matrix consists of 336 rows (12 assessors × 14 samples × 2 replicates) and 12 columns (attributes: intensity odour, acidic odour, sun odour, rancid odour, intensity flavour, acidic flavour, sweet flavour, salty flavour, bitter flavour, sun flavour, metallic flavour, rancid flavour). Two blocks of variables describe the latent variables odour and flavour. First, the hierarchical PLS model proposed by Tenenhaus and Vinzi (2005) is used to estimate a global model after averaging over the assessors and the replicates (see Figure 2), thus collapsing the data structure into a two-way table (samples × attributes). Fuzzy PLS path modeling then provides two sets of synthesized assessments: the overall latent scores for each product and the partial latent scores for the different blocks of attributes. The synthesis of scores into a global assessment makes it possible to investigate differences between products. However, in this way we lose all the information on the individual differences between assessors. To recover it, as many path models as assessors are considered and compared in terms of fuzzy path coefficients, so as to detect possible heterogeneity in the panel. Figure 2 shows the global path model. As can be seen, the latent variable global depends on the two latent variables odour and flavour. The F-PLSPM algorithm is used to estimate the fuzzy path coefficients ($\tilde{E}_1$ and $\tilde{E}_2$). The crisp path coefficients in Table 1 show that the global quality of the products depends mostly on the flavour rather than on the odour. Furthermore, the fuzzy path coefficients describe a worse panel performance for the flavour, indicated by a more imprecise estimate (wider fuzzy interval). Therefore, the F-PLSPM algorithm enriches the results of the classical crisp PLSPM approach by providing information on the imprecision of the path coefficients. At the same time, the coherence of the results is guaranteed, as the crisp estimates lie within the fuzzy intervals.
Fig. 2. Global model
Table 1. Global model path coefficients (columns: latent variable, crisp path coefficients, fuzzy path coefficients)
The most interesting result of the proposed approach is shown in Figure 3, which compares the interval-valued estimates across the different assessors.
Figure 3 reports the fuzzy path coefficients for the 12 local models, one for each assessor. By looking at each plot (flavour and odour) separately, the assessor performance and the coherence between assessors can be evaluated: a) the wider the interval, the less consistent the assessor; b) the closer the intervals are to each other, the more coherent the assessors. In the example, for the odour, assessor 7 is the least consistent assessor, while assessor 12, being positioned far away from the rest of the assessors, is the least coherent with the panel. Finally, by comparing the two plots, differences in the way each assessor perceives flavour and odour may be detected: for instance, assessor 7 is the most imprecise for the odour while being extremely consistent for the flavour; assessor 12 is similarly consistent for both flavour and odour but, in both cases, is in clear disagreement with the panel (a much higher influence of the odour as opposed to a much lower influence of the flavour).
Fig. 3. Local fuzzy path coefficients
4 Conclusion
The joint use of the PLS component-based approach to structural equation modeling and fuzzy possibilistic regression has yielded promising results in the framework of sensory data analysis. Namely, while taking into account the multi-block feature of sensory data, the proposed Fuzzy-PLSPM leads to a fuzzy estimation of the path coefficients. Such an estimation provides information on the precision of the classical estimates and allows a thorough comparison of the sensory evaluations between assessors and within assessors for different products. Future research aims to extend the fuzzy approach to the measurement model as well, by introducing an appropriate fuzzy possibilistic regression in the external estimation phase of the PLSPM algorithm. This further development has a twofold interest: allowing for fuzzy input data, and yielding fuzzy estimates of the loadings, of the outer weights and, as a consequence, of the latent variable scores, thus embedding the measurement error that naturally affects sensory assessments.
ALEFELD, G. and HERZENBERGER, J. (1983): Introduction to Interval computation. Academic Press, New York.
BOLLEN, K.A. (1989): Structural equations with latent variables. Wiley, New York.
COPPI, R., GIL, M.A. and KIERS, H.L. (2006): The fuzzy approach to statistical analysis. Computational statistics & data analysis, 51 (1), 1–14.
JÖRESKOG, K. (1970): A general method for analysis of covariance structure. Biometrika, 57, 239–251.
ROMANO, R. (2006): Fuzzy Regression and PLS Path Modeling: a combined two-stage approach for multi-block analysis. Doctoral Thesis, Univ. of Naples, Italy.
ROMANO, R. and PALUMBO, F. (2006a): Fuzzy regression and least squares regression: the relationship between two different fitting criteria. Abstracts of the SIS2006 Conference, 2, 693–696.
ROMANO, R. and PALUMBO, F. (2006b): Classification of SEM based on fuzzy regression. In: Esposito-Vinzi et al. (Eds.): Knowledge Extraction and Modeling. Tilapia, Anacapri, 67–68.
TANAKA, H., UEIJIMA, S. and ASAI, K. (1980): Fuzzy linear regression model. IEEE Transactions Systems Man Cybernet, 10, 2933–2938.
TANAKA, H. and GUO, P. (1999): Possibilistic Data Analysis for Operations Research. Physica-Verlag, Wurzburg.
TENENHAUS, M. and ESPOSITO VINZI, V. (2005): PLS regression, PLS path modeling and generalized Procrustean analysis: a combined approach for multiblock analysis. Journal of Chemometrics, 19 (3), 145–153.
TENENHAUS, M., ESPOSITO VINZI, V., CHATELIN, Y.-M. and LAURO, C. (2005): PLS path modeling. Comp. Stat. and Data Anal., 48, 159–205.
WOLD, H. (1982): Soft modeling: the basic design and some extensions. In: K.G. Joreskog and H. Wold (Eds.): Systems under Indirect Observation, Vol. Part II. North-Holland, Amsterdam, 1–54.
ZADEH, L. (1965): Fuzzy Sets. Information and Control, 8, 338–353.
ZADEH, L. (1973): Outline of a new approach to the analysis of complex systems and decision processes. IEEE Trans. Systems Man and Cybernet, 1, 28–44.
Scenario Evaluation Using Two-mode Clustering Approaches in Higher Education
Matthias J. Kaiser, Daniel Baier
Institute of Business Administration and Economics,
Brandenburg University of Technology Cottbus,
Postbox 101344, 03013 Cottbus, Germany
{mjkaiser, daniel.baier}@tu-cottbus.de
Abstract. Scenario techniques have become popular tools for dealing with possible futures. Driving forces of the development (the so-called key factors) and their possible projections into the future are determined. After the possible combinations of projections have been reduced to a set of consistent and probable candidates for possible futures, one-mode cluster analysis is traditionally used for grouping them. In this paper, two-mode clustering approaches are proposed for this purpose and tested in an application to the future of eLearning in higher education. In this application area, scenario techniques are a very young and promising methodology.
1 Introduction: Scenario analysis
Since their first applications in business prognostication (e.g., Kahn, Wiener (1967), Meadows et al. (1972), Schwartz (1991)), scenario techniques have become popular tools for governmental and corporate planners to deal with possible futures (“scenarios”) and to support decisions in the face of uncertainty. Nowadays, scenario analysis is an attractive tool in many research areas, with a huge variety of applications (e.g., Götze (1993), Mißler-Behr (2002), Welfens et al. (2004), van der Heijden (2005), Pasternack (2006), Ringland (2006)). However, for higher education, the application of scenario analysis is new (e.g., Sprey (2003)). Different methodological approaches have been proposed, most of them using (roughly) four stages (e.g., Coates (2000), Phelps et al. (2001)):
• In a first stage, the scope of the scenario analysis has to be defined, including the focal issues (e.g. influence areas) and the driving forces for them (social, economic, political, environmental, technological factors). After a reduction of these driving forces with respect to relevance, importance, and inter-connection, a list of so-called key factors results (e.g., A, B, C).
• Then, in the second stage, alternative projections (possible levels) for these key factors (e.g., A1, A2, A3, B1, B2) have to be determined. By combining these projections, a database of candidates for possible futures (e.g., (A1,B1,C1,...), (A1,B2,C1,...)) is available. Additionally, the consistency of pairs of projections (e.g., (A1,B1), (A1,B2)) and the probability/realism of single projections within the time span under research have to be rated.
• Then, in a third stage, the candidates in the database have to be evaluated on the basis of their projections' pairwise consistency and probability. Using rankings and/or cut-off values or similar approaches, the database is reduced to a set of consistent and probable candidates. Finally, the reduced set of candidates (the so-called first mode), described by their projections w.r.t. the key factors (the so-called second mode), is grouped via cluster analysis into a small number of candidate groups, the so-called “scenarios”. In an unrelated second step these candidate groups have to be analyzed to find out which projections best characterize them. Recently, new fuzzy clustering approaches have been proposed for dealing with this identification problem (see e.g. Mißler-Behr (1993), (2002)).
• Finally, in a fourth stage, strategic options for dealing with the selected possible futures (“scenarios”) have to be developed.
In this paper we develop new two-mode clustering approaches for simultaneously grouping candidates and projections in the third stage. The new approach is based on the two-mode additive clustering procedure of Baier et al. (1997) for simultaneous market segmentation and structuring with overlapping and non-overlapping cases.
2 Two-Mode clustering (for scenario evaluation)
clusters (clusters of projections). $S = (s_{ij})_{I \times J}$ is a matrix of (observed) associations between first and second mode objects ($s_{ij} \in \mathbb{R}\ \forall i, j$). With association values of 1 (if the projection is part of the candidate) or 0 (if the projection is not part of the candidate), $S$ is a binary data matrix (see, e.g., Li (2005) for an analysis of binary data using two-mode clustering).
Model parameters are the following: $P = (p_{ik})_{I \times K}$ is a binary matrix describing first mode cluster membership, with $p_{ik} = 1$ if first mode object $i$ belongs to first mode cluster $k$ and $p_{ik} = 0$ otherwise. $Q = (q_{jl})_{J \times L}$ is a binary matrix describing second mode cluster membership, with $q_{jl} = 1$ if second mode object $j$ belongs to second mode cluster $l$ and $q_{jl} = 0$ otherwise. $W = (w_{kl})_{K \times L}$ is a matrix of weights ($w_{kl} \in \mathbb{R}\ \forall k, l$).
In order to provide results where candidates are members of one and only one scenario whereas projections are allowed to be members of none, one, or more than one scenario, additional assumptions are necessary: the first mode membership matrix $P$ is restricted to be non-overlapping (i.e. $\sum_{k=1}^{K} p_{ik} = 1\ \forall i$), whereas for the second mode membership matrix $Q$ no such restrictions hold. $Q$ is allowed to be
on the basis of the underlying model $S = PWQ' + \text{error}$.
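The model $S = PWQ' + \text{error}$ can be illustrated with a small sketch that builds the fitted matrix $PWQ'$ and measures the fit. Two assumptions are made here because the corresponding definitions did not survive extraction: the loss is taken to be the residual sum of squares, and VAF (variance accounted for) is taken to be one minus the ratio of that loss to the total sum of squares around the mean of $S$. The function names and the toy matrices are hypothetical.

```python
def fitted(P, W, Q):
    """S_hat = P W Q' for matrices given as lists of rows."""
    K, L = len(W), len(W[0])
    return [[sum(p_i[k] * W[k][l] * q_j[l] for k in range(K) for l in range(L))
             for q_j in Q] for p_i in P]


def sse(S, S_hat):
    """Residual sum of squares between observed and fitted associations."""
    return sum((s - h) ** 2 for rs, rh in zip(S, S_hat) for s, h in zip(rs, rh))


def vaf(S, S_hat):
    """Assumed definition: 1 - SSE / total sum of squares around the mean of S."""
    values = [s for row in S for s in row]
    mean = sum(values) / len(values)
    total = sum((s - mean) ** 2 for s in values)
    return 1.0 - sse(S, S_hat) / total


# Toy example: 3 candidates, 3 projections, K = L = 2
S = [[1, 1, 0], [1, 0, 0], [0, 1, 1]]
P = [[1, 0], [1, 0], [0, 1]]        # each candidate in exactly one cluster
Q = [[1, 0], [1, 1], [0, 1]]        # the second projection belongs to both clusters
W = [[1, 0], [0, 1]]                # identity weights

S_hat = fitted(P, W, Q)
print(S_hat, sse(S, S_hat), vaf(S, S_hat))
```

The second row of `Q` shows the overlapping case: one projection characterizes both clusters, while every row of `P` still sums to one.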
In our approach, an alternating least squares procedure is applied. The different sets of model parameters ($P$, $W$, and $Q$) are initialized and alternatingly improved w.r.t. $Z$. Alternatively, a Bayesian model formulation could be used (see DeSarbo et al. (2005) in a market structuring setting). For our approach, we first discuss the iterative steps for obtaining improved estimates for selected model parameters when estimates for the remaining sets of model parameters are given. Finally, the complete procedure is presented.
a) Estimation of $P$ for given $W$ and $Q$: Set
($s_{ijl}$ is constant w.r.t. $q_{1l}, \dots, q_{Jl}, w_{1l}, \dots, w_{Kl}$), estimates of $Q$ and $W$ can be obtained by starting from initial values and alternatingly improving the parameter estimates for second mode cluster $l = 1, \dots, L$ via
Thus, our estimation procedure can be described as follows:
1. Determine initial estimates of $P$, $W$, and $Q$. Compute $Z$.
2. Repeat
   Improve the estimates of $P$ using a).
   Improve the estimates of $Q$ and $W$ using b).
   Until $Z$ cannot be improved any more.
For applying the above model and algorithms to scenario evaluation, the first and second mode clusters can additionally be linked by setting $K=L$ and restricting $W$ to an identity matrix. This can be achieved by initialization and by omitting the corresponding algorithmic steps where $W$ is updated. In the following section, this approach (with $K=L$ and $W$ restricted to an identity matrix) is applied in stage three of a scenario analysis in higher education.
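The restricted case ($K=L$, $W$ an identity matrix) lends itself to a compact sketch of the alternating procedure. This is not the authors' exact update formulas, which did not survive extraction, but one plausible least-squares realization: with identity weights, the fitted value for candidate $i$ and projection $j$ is simply $q_{jk}$ for the cluster $k$ of candidate $i$, so each candidate is assigned to its best-fitting cluster, and each $q_{jk}$ is set to 1 exactly when the mean association within cluster $k$ exceeds 0.5. All names and data below are hypothetical.

```python
def loss(S, assign, Q):
    """Z: squared error when candidate i is fitted by column assign[i] of Q."""
    return sum((S[i][j] - Q[j][assign[i]]) ** 2
               for i in range(len(S)) for j in range(len(S[0])))


def update_P(S, Q, K):
    """Assign every candidate to the cluster whose projection profile fits best."""
    J = len(S[0])
    return [min(range(K), key=lambda k: sum((row[j] - Q[j][k]) ** 2 for j in range(J)))
            for row in S]


def update_Q(S, assign, Q, K):
    """Set q_jk = 1 iff the mean of s_ij over the members of cluster k exceeds 0.5."""
    new_Q = [row[:] for row in Q]
    for k in range(K):
        members = [i for i, a in enumerate(assign) if a == k]
        if not members:
            continue  # keep the old column for an empty cluster
        for j in range(len(S[0])):
            mean = sum(S[i][j] for i in members) / len(members)
            new_Q[j][k] = 1 if mean > 0.5 else 0
    return new_Q


def fit(S, assign, K, max_iter=100):
    """Alternate the two updates until Z cannot be improved any more."""
    Q = update_Q(S, assign, [[0] * K for _ in range(len(S[0]))], K)
    z = loss(S, assign, Q)
    for _ in range(max_iter):
        new_assign = update_P(S, Q, K)
        new_Q = update_Q(S, new_assign, Q, K)
        z_new = loss(S, new_assign, new_Q)
        if z_new >= z:
            break
        assign, Q, z = new_assign, new_Q, z_new
    return assign, Q, z


# Toy binary database: 5 candidates x 4 projections, deliberately poor start
S = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1], [0, 1, 1, 1]]
assign, Q, z = fit(S, assign=[0, 1, 0, 1, 0], K=2)
print(assign, Q, z)
```

Because both updates can only lower or keep the loss, the loop stops after finitely many steps; like any alternating scheme it may end in a local optimum, so several starting assignments are advisable.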
3 Example: Scenario evaluation in higher education
3.1 Stage One: Defining the scope of the analysis
Currently, at many universities, the concrete future of higher education and how to deal with this uncertainty is unclear. Whereas some developments like the demographics (older and fewer Germans), the continuation of the Bologna process (more standardization and Europe-wide exchange in higher education), the importance of better and life-long education, or the higher competition between universities for funds and talented students seem to be predictable, other developments are highly uncertain (see, e.g., Michel (2006), Opaschowski (2006), Schulmeister (2006)).
This is especially unbearable for universities that plan to invest in technical teaching and learning environments and/or plan to attract more students for distance learning. Therefore, our main research question deals with the future of higher education. As a focal time point we use the year 2020. This analysis also serves as an application example for our new two-mode clustering approach.
In the first stage of our scenario analysis, based on a Delphi study on the future of eLearning, acceptance and preference surveys, and other research projects at our institute (e.g. Göcks (2006)) as well as from other research institutes (e.g. Cuhls et al. (2002), Opaschowski (2006)), (university) internal as well as (university) external influencing factors on higher education were identified and possible projections for the near future were described.
Moreover, using expert workshops with teachers, students, and people from university administration and government, these lists and descriptions were extended and modified, resulting in six areas of influence and thirty influencing factors (see Figure 1) with a total of 73 projections, described in detail, w.r.t. these influencing factors.
Fig. 1. Influencing factors overview
3.2 Stage Two: Creating a database of candidates
In the second stage of the scenario analysis, these thirty influencing factors were reduced to 12 key factors for the ongoing analysis. We did this by filtering out redundant aspects and indirect dependencies. Additionally, we used scoring methods and evaluations from a group of scientific experts and analyzed relevant scientific sources (see, e.g., Kröhnert et al. (2004), Michel (2006)). Furthermore, the alternative projections for each key factor were reduced and specified in detail (resulting in one page of text for each projection). As a result, a database of $2^{11} \cdot 3^{1} = 6{,}144$ candidates (all possible combinations of the 2-3 projections for each of the 12 key factors) for possible futures was available.
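The size of this candidate database can be checked by enumeration. The sketch below assumes the split implied by the count of 6,144: eleven key factors with two projections and one key factor with three. Which factor has three projections is not stated, so the last one is chosen arbitrarily, and the labels `F1_1`, `F1_2`, ... are hypothetical.

```python
from itertools import product

# Assumed split: 11 key factors with 2 projections, 1 key factor with 3
levels = [2] * 11 + [3]
factors = [[f"F{f + 1}_{p + 1}" for p in range(n)] for f, n in enumerate(levels)]

# A candidate picks exactly one projection per key factor
candidates = list(product(*factors))

# One binary column per projection
columns = [projection for factor in factors for projection in factor]


def to_binary(candidate):
    """Binary description: 1 if the projection is part of the candidate, else 0."""
    chosen = set(candidate)
    return [1 if col in chosen else 0 for col in columns]


print(len(candidates))                # number of candidates
print(len(columns))                   # number of binary columns
print(sum(to_binary(candidates[0])))  # projections per candidate
```

Under this split there is one binary column per projection, 25 in total, and every candidate row contains exactly twelve ones.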
Additionally, the pairwise consistency of these projections was evaluated using values ranging from 1=“totally inconsistent” to 9=“totally consistent”. Consequently, as discussed in the theoretical introduction, a consistency value was calculated for each candidate (e.g. (A1,B2,C3,...)) as the mean pairwise consistency of its pairs of projections (e.g. (A1,B2), (A1,C3), (B2,C3),...).
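As a small worked example of this consistency value, the sketch below averages the ratings of all pairs of projections within one candidate. The ratings are invented for illustration and the function name is hypothetical.

```python
from itertools import combinations


def mean_pairwise_consistency(candidate, rating):
    """Mean of the pairwise consistency ratings over all pairs of projections."""
    pairs = list(combinations(candidate, 2))
    return sum(rating[frozenset(pair)] for pair in pairs) / len(pairs)


# Hypothetical ratings on the 1 (totally inconsistent) to 9 (totally consistent) scale
rating = {
    frozenset(("A1", "B2")): 9,
    frozenset(("A1", "C3")): 6,
    frozenset(("B2", "C3")): 3,
}

print(mean_pairwise_consistency(("A1", "B2", "C3"), rating))  # (9 + 6 + 3) / 3
```

Using `frozenset` keys makes the lookup independent of the order in which the two projections of a pair are listed.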
3.3 Stage Three: Evaluating, selecting, and clustering candidates
In a third stage the database was first reduced and then clustered. For the reduction, the so-called “complete combination scanning” was used, which means that for each pair of projections the candidate with the highest mean pairwise consistency was kept for further analysis. The reduction resulted in 286 candidates.
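The reduction step can be sketched on a toy problem with three key factors. The sketch follows the description literally: for every pair of projections from two different key factors, the candidate with the highest mean pairwise consistency among those containing the pair is kept. The ratings, the default value for unlisted pairs, and all names are hypothetical, and ties are broken by enumeration order.

```python
from itertools import combinations, product

factors = [["A1", "A2"], ["B1", "B2"], ["C1", "C2"]]

# Hypothetical ratings on the 1..9 scale; unlisted pairs default to 5
rating = {frozenset(pair): value for pair, value in [
    (("A1", "B1"), 9), (("A1", "C1"), 8), (("B1", "C1"), 9),
    (("A2", "B2"), 8), (("A2", "C2"), 7), (("B2", "C2"), 9),
    (("A1", "B2"), 2), (("A2", "B1"), 3),
]}


def consistency(candidate):
    """Mean pairwise consistency of a candidate."""
    pairs = list(combinations(candidate, 2))
    return sum(rating.get(frozenset(p), 5) for p in pairs) / len(pairs)


def complete_combination_scan(candidates, factors):
    """Keep, for every cross-factor pair of projections, its most consistent candidate."""
    kept = set()
    for factor_a, factor_b in combinations(factors, 2):
        for pair in product(factor_a, factor_b):
            containing = [c for c in candidates if set(pair) <= set(c)]
            kept.add(max(containing, key=consistency))
    return kept


candidates = list(product(*factors))
kept = complete_combination_scan(candidates, factors)
print(len(candidates), len(kept))
print(sorted(kept))
```

The reduced set can never contain more candidates than there are pairs of projections, and every pair remains represented by at least one kept candidate.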
The binary descriptions of these candidates resulted in a binary database $S$ with 286 rows and 25 columns. In the follow-up analysis, this database was subjected to the two-mode clustering approaches for scenario evaluation from section 2.2, with identical numbers $K$ and $L$ and $W$ restricted to an identity matrix (for linking first- and second-mode clusters).
The resulting VAF values from analyses with totals of $K=L=1$ to 8 clusters (VAF=0.056, 0.243, 0.325, 0.362, 0.363, 0.394, 0.448, 0.452) indicate via an elbow criterion that a two- or a four-class solution should be preferred. When focusing on the two-class solution, the first- and second-mode memberships of the results lead to two scenario interpretations, a scenario 1 “A Technology Based Future” and a scenario 2 “A Worse Perspective”. (Note that the follow-up discussion of the two scenarios is mainly based on the projections within the two derived two-mode clusters.)
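The elbow reading of these VAF values can be made explicit by looking at the gain obtained from each additional cluster. The sketch uses only the eight values reported above; how a drop in the gains is judged an elbow remains a matter of interpretation and is not automated here.

```python
# VAF values reported for K = L = 1, ..., 8
vaf = [0.056, 0.243, 0.325, 0.362, 0.363, 0.394, 0.448, 0.452]

# gains[m] is the VAF gained when moving from m + 1 to m + 2 clusters
gains = [round(b - a, 3) for a, b in zip(vaf, vaf[1:])]

for k, gain in enumerate(gains, start=2):
    print(f"{k - 1} -> {k} clusters: +{gain:.3f}")
```

The gain falls from 0.187 to 0.082 after two clusters and to 0.001 after four, which matches the two candidate solutions named in the text.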
3.4 Stage Four: Developing strategic options
Scenario 1: A Technology based future: This scenario presents a dilly future perspective for higher education. Students have a passion for technology in the sense of education technologies and learning software. They are motivated to learn like conscientious learners. The university lecturers see greater importance in giving lectures than in doing research.
The traditional lecture forms will be enhanced by eLearning components like online teaching and blended learning scenarios. There will be a unity of traditional and new lesson forms. The future will contain state universities as well as private ones in the education market.
The learning infrastructure and administration environment (technology, buildings, networks, etc.) will be excellent. Because of hard competition in the education market, the universities are very flexible and try to be better than their competitors. They are able to assimilate new aspects and trends in learning innovations (like eLearning) very quickly. The usage of information and communication technologies is established very well, and in higher education eLearning aspects are used very often.
eLearning aspects help to enforce individualised learning for better results in the studies of each student. These facts will additionally be supported by a high level of education awareness in the whole society. The importance of job market issues forces the students to acquire additional expertise in languages, soft skills, and other competences.
Scenario 2: A worse perspective: The second extreme scenario presents the complete opposite of scenario 1. The future of higher education is not very attractive: no interested and committed students in the study courses, lecturers with little interest in teaching, no changes in traditional ways of teaching, and no private education suppliers in the market. Universities have resources to offer an optimal learning environment and infrastructure (library, internal working places, etc.). No flexibility will prevail at the universities and no eLearning technologies will be used. The consequence is that no individualized learning will be offered. Education is no longer an emphasis from society's point of view.
When analyzing the four-class solution, the above results are supported: again the two extreme scenarios could be found, but now two additional in-between scenarios are available. These two scenarios mainly differ from the above two w.r.t. the university principle (state, private, or mixed) and the importance of job market issues for the teaching contents and environment (high or low influence).
4 Conclusions
In this paper, we have introduced new two-mode clustering approaches for scenario evaluation. They fit naturally into the traditional four-stage approach to scenario analysis as an alternative way of analyzing the database of consistent candidates for possible futures. In contrast to the traditional one-mode clustering approaches for this purpose, the two-mode approach quite naturally develops clusters of candidates and of the projections describing them. No follow-up decisions concerning fuzzy memberships of candidates or memberships of projections have to be made.
CUHLS, C., BLIND, K., and GRUPP, H. (2002): Innovations for our Future. Delphi ’98: New Foresight on Science and Technology. Physica-Verlag, Heidelberg.
DESARBO, W.S., FONG, D., and LIECHTY, J. (2005): Two-Mode Cluster Analysis via Hierarchical Bayes. In: Baier, D. and Wernecke, W. (Eds.), Innovations in Classification, Data Science, and Information Systems. Springer, Heidelberg, 19–29.
GÖCKS, M.S. (2006): Betriebswirtschaftliche eLearning-Anwendungen in der universitären Ausbildung. Shaker, Aachen.
GÖTZE, U. (1993): Szenario-Technik in der strategischen Unternehmensplanung. 2nd Edition, DUV, Wiesbaden.
KAHN, H. and WIENER, A.J. (1967): The Year 2000: A Framework for Speculation on the Next Thirty-Three Years. Macmillan, New York.
KRÖHNERT, S., VAN OLST, N., and KLINGHOLZ, R. (2004): Deutschland 2020: Die demographische Zukunft der Nation. Berlin-Institut für Bevölkerung und Entwicklung,
LI, T. (2005): A General Model for Clustering Binary Data. In: Conference on Knowledge Discovery and Data Mining (KDD) 2005. Chicago, 188–197.
MEADOWS, D., RANDERS, J. and BEHRENS, W. (1972): The Limits to Growth. Universe, New York.