Educational Data Mining
UNDERSTANDING COMPLEX DATASETS:
DATA MINING WITH MATRIX DECOMPOSITIONS
David Skillicorn
COMPUTATIONAL METHODS OF FEATURE
SELECTION
Huan Liu and Hiroshi Motoda
CONSTRAINED CLUSTERING: ADVANCES IN
ALGORITHMS, THEORY, AND APPLICATIONS
Sugato Basu, Ian Davidson, and Kiri L. Wagstaff
KNOWLEDGE DISCOVERY FOR
COUNTERTERRORISM AND LAW ENFORCEMENT
David Skillicorn
MULTIMEDIA DATA MINING: A SYSTEMATIC
INTRODUCTION TO CONCEPTS AND THEORY
Zhongfei Zhang and Ruofei Zhang
NEXT GENERATION OF DATA MINING
Hillol Kargupta, Jiawei Han, Philip S. Yu,
Rajeev Motwani, and Vipin Kumar
DATA MINING FOR DESIGN AND MARKETING
Yukio Ohsawa and Katsutoshi Yada
THE TOP TEN ALGORITHMS IN DATA MINING
Xindong Wu and Vipin Kumar
GEOGRAPHIC DATA MINING AND
KNOWLEDGE DISCOVERY, SECOND EDITION
Harvey J. Miller and Jiawei Han
TEXT MINING: CLASSIFICATION, CLUSTERING, AND APPLICATIONS
Ashok N. Srivastava and Mehran Sahami
BIOLOGICAL DATA MINING
Jake Y. Chen and Stefano Lonardi
INFORMATION DISCOVERY ON ELECTRONIC HEALTH RECORDS
Bo Long, Zhongfei Zhang, and Philip S. Yu
KNOWLEDGE DISCOVERY FROM DATA STREAMS
HANDBOOK OF EDUCATIONAL DATA MINING
Cristóbal Romero, Sebastián Ventura, Mykola Pechenizkiy, and Ryan S.J.d. Baker
PUBLISHED TITLES
SERIES EDITOR
Vipin Kumar
University of Minnesota
Department of Computer Science and Engineering
Minneapolis, Minnesota, U.S.A.
AIMS AND SCOPE
This series aims to capture new developments and applications in data mining and knowledge discovery, while summarizing the computational tools and techniques useful in data analysis. This series encourages the integration of mathematical, statistical, and computational methods and techniques through the publication of a broad range of textbooks, reference works, and handbooks. The inclusion of concrete examples and applications is highly encouraged. The scope of the series includes, but is not limited to, titles in the areas of data mining and knowledge discovery methods and applications, modeling, algorithms, theory and foundations, data and knowledge visualization, data mining systems and tools, and privacy and security issues.
Edited by
Cristóbal Romero, Sebastián Ventura,
Mykola Pechenizkiy, and Ryan S.J.d. Baker
Handbook of Educational Data Mining
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2011 by Taylor and Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
No claim to original U.S. Government works
Printed in the United States of America on acid-free paper
10 9 8 7 6 5 4 3 2 1
International Standard Book Number: 978-1-4398-0457-5 (Hardback)
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
To my wife, Inma, and my daughter, Marta
Editors
Contributors
1 Introduction
Cristóbal Romero, Sebastián Ventura, Mykola Pechenizkiy, and Ryan S. J. d. Baker
Part I Basic Techniques, Surveys and Tutorials
2 Visualization in Educational Environments
Riccardo Mazza
3 Basics of Statistical Analysis of Interactions Data from Web-Based Learning Environments
Judy Sheard
4 A Data Repository for the EDM Community: The PSLC DataShop
Kenneth R. Koedinger, Ryan S. J. d. Baker, Kyle Cunningham, Alida Skogsholm, Brett Leber, and John Stamper
5 Classifiers for Educational Data Mining
Wilhelmiina Hämäläinen and Mikko Vinni
6 Clustering Educational Data
Alfredo Vellido, Félix Castro, and Àngela Nebot
7 Association Rule Mining in Learning Management Systems
Enrique García, Cristóbal Romero, Sebastián Ventura, Carlos de Castro, and Toon Calders
8 Sequential Pattern Analysis of Learning Logs: Methodology and Applications
Mingming Zhou, Yabo Xu, John C. Nesbit, and Philip H. Winne
9 Process Mining from Educational Data
Nikola Trčka, Mykola Pechenizkiy, and Wil van der Aalst
10 Modeling Hierarchy and Dependence among Task Responses in Educational Data Mining
Brian W. Junker
Part II Case Studies
11 Novel Derivation and Application of Skill Matrices: The q-Matrix Method
Tiffany Barnes
12 Educational Data Mining to Support Group Work in Software Development Projects
Judy Kay, Irena Koprinska, and Kalina Yacef
13 Multi-Instance Learning versus Single-Instance Learning for Predicting the Student’s Performance
Amelia Zafra, Cristóbal Romero, and Sebastián Ventura
14 A Response-Time Model for Bottom-Out Hints as Worked Examples
Benjamin Shih, Kenneth R. Koedinger, and Richard Scheines
15 Automatic Recognition of Learner Types in Exploratory Learning Environments
Saleema Amershi and Cristina Conati
16 Modeling Affect by Mining Students’ Interactions within Learning […]
Agathe Merceron and Kalina Yacef
18 Data Mining for Contextual Educational Recommendation and Evaluation Strategies
Tiffany Y. Tang and Gordon G. McCalla
19 Link Recommendation in E-Learning Systems Based on Content-Based Student Profiles
Daniela Godoy and Analía Amandi
20 Log-Based Assessment of Motivation in Online Learning
Arnon Hershkovitz and Rafi Nachmias
21 Mining Student Discussions for Profiling Participation and Scaffolding Learning
Jihie Kim, Erin Shaw, and Sujith Ravi
22 Analysis of Log Data from a Web-Based Learning Environment: A Case Study
Judy Sheard
23 Bayesian Networks and Linear Regression Models of Students’ Goals, Moods, and Emotions
Ivon Arroyo, David G. Cooper, Winslow Burleson, and Beverly P. Woolf
24 Capturing and Analyzing Student Behavior in a Virtual Learning Environment: A Case Study on Usage of Library Resources
David Masip, Julià Minguillón, and Enric Mor
25 Anticipating Students’ Failure As Soon As Possible
Cláudia Antunes
26 Using Decision Trees for Improving AEH Courses
Javier Bravo, César Vialardi, and Alvaro Ortigosa
27 Validation Issues in Educational Data Mining: The Case of HTML-Tutor and iHelp
Mihaela Cocea and Stephan Weibelzahl
28 Lessons from Project LISTEN’s Session Browser
Jack Mostow, Joseph E. Beck, Andrew Cuneo, Evandro Gouvea, Cecily Heiner, and Octavio Juarez
29 Using Fine-Grained Skill Models to Fit Student Performance with Bayesian Networks
Zachary A. Pardos, Neil T. Heffernan, Brigham S. Anderson, and Cristina L. Heffernan
30 Mining for Patterns of Incorrect Response in Diagnostic Assessment Data
Tara M. Madhyastha and Earl Hunt
31 Machine-Learning Assessment of Students’ Behavior within Interactive Learning Environments
Manolis Mavrikis
32 Learning Procedural Knowledge from User Solutions to Ill-Defined Tasks in a Simulated Robotic Manipulator
Philippe Fournier-Viger, Roger Nkambou, and Engelbert Mephu Nguifo
33 Using Markov Decision Processes for Automatic Hint Generation
Tiffany Barnes, John Stamper, and Marvin Croy
34 Data Mining Learning Objects
Manuel E. Prieto, Alfredo Zapata, and Victor H. Menendez
35 An Adaptive Bayesian Student Model for Discovering the Student’s Learning Style and Preferences
Cristina Carmona, Gladys Castillo, and Eva Millán
Index
The Purpose of This Book
The goal of this book is to provide an overview of the current state of knowledge of educational data mining (EDM). The primary goal of EDM is to use large-scale educational data sets to better understand learning and to provide information about the learning process. Although researchers have been studying human learning for over a century, what is different about EDM is that it does not rely on experimental subjects learning a contrived task for 20 minutes in a lab setting; rather, it typically uses data from students learning school subjects, often over the course of an entire school year. For example, it is possible to observe students learning a skill over an eight-month interval and make discoveries about what types of activities result in better long-term learning, to learn about the impact that the time at which students start their homework has on classroom performance, or to understand how the length of time students spend reading feedback on their work impacts the quality of their later efforts.
In order to conduct EDM, researchers use a variety of sources of data such as intelligent computer tutors, classic computer-based educational systems, online class discussion forums, electronic teacher gradebooks, school-level data on student enrollment, and standardized tests. Many of these sources have existed for decades or, in the case of standardized testing, about 2000 years. What has recently changed is the rapid improvement in storage and communication provided by computers, which greatly simplifies the task of collecting and collating large data sets. This explosion of data has revolutionized the way we study the learning process.
In many ways, this change parallels that of bioinformatics 20 years earlier: an explosion of available data revolutionized how much research in biology was conducted. However, the larger volume of data was only part of the story. It was also necessary to discover, adapt, or invent computational techniques for analyzing and understanding this new, vast quantity of data. Bioinformatics did this by applying computer science techniques such as data mining and pattern recognition to the data, and the result has revolutionized research in biology. Similarly, EDM has the necessary sources of data. More and more schools are using educational software that is capable of recording for later analysis every action by the student and the computer. Within the United States, an emphasis on educational accountability and high-stakes standardized tests has resulted in large electronic databases of student performance. In addition to these data, we need the appropriate computational and statistical frameworks and techniques to make sense of the data, as well as researchers to ask the right questions of the data.
[…] community, as can be seen by the chapter authors of this book, is composed of people from multiple disciplines. Computer science provides expertise in working with large quantities of data, both in terms of machine learning and data-mining techniques that scale gracefully to data sets with millions of records, as well as address real-world concerns such as “scrubbing” data to ensure systematic errors in the source data do not lead to erroneous results. Statisticians and psychometricians provide expertise in understanding how to properly analyze complex study designs, and properly adjust for the fact that most educational data are not from a classic randomized controlled study. These two communities are strong in statistical and computational techniques, but techniques and data are not sufficient […]
Main Avenues of Research in Educational Data Mining
There are three major avenues of research in EDM. They nicely align with the classic who–what–where–when interrogatives.
The first avenue is work on developing computational tools and techniques, determining which ones are best suited to working with large educational data sets, and finding best practices for evaluation metrics and model fitting. Examples of such efforts include experimenting with different visualization techniques for how to look at and make sense of the data. Since educational data sets are often longitudinal, encompassing months and sometimes years, and rich interactions with the student can occur during that time, some means of making sense of the data is needed. Another common approach in EDM is using variants of learning curves to track changes in student knowledge. Learning curves are some of the oldest techniques in cognitive psychology, so EDM efforts focus on examining more flexible functional forms, and discovering what other factors, such as student engagement with the learning process, are important to include. One difficulty with complex modeling in EDM is that there is often no way of determining the best parameters for a particular model. Well-known techniques such as hill climbing can become trapped in local maxima. Thus, empirical work about which model-fitting techniques perform well for EDM tasks is necessary.
This work on extending and better understanding our computational toolkit is a necessary foundation to EDM. Work in this area focuses on how we can extract information from data. At present, although a majority of EDM research is in this avenue, the other two are no less important, just less explored.
The second avenue is determining what questions we should ask the data. There are several obvious candidates: Does the class understand the material well enough to go on? Do any students require remedial instruction? Which students are likely to need academic counseling to complete school successfully? These are questions that have been asked and answered by teachers for millennia. EDM certainly enables us to be data driven and to answer such questions more accurately; however, EDM’s potential is much greater. The enormous data and computational resources are a tremendous opportunity, and one of the hardest tasks is capitalizing on it: what are new and interesting questions we can answer by using EDM? For example, in educational settings there are many advantages of group projects. Drawbacks are that it can be hard to attribute credit and, perhaps more importantly, to determine which groups are having difficulties, perhaps even before the group itself realizes. A tool that is able to analyze student conversations and activity, and automatically highlight potential problems for the instructor would be powerful, and has no good analog in the days before computers and records of past student collaborations were easily available. Looking into the future, it would be useful if we could determine if a particular student would be better served by having a different classroom teacher, not because one teacher is overall a better choice, but because for this type of student the teacher is a better choice. The first example is at the edge of what EDM is capable of; the second […]
This job of expanding our horizons and determining what are new, exciting questions to ask the data is necessary for EDM to grow.
The third avenue of EDM is identifying which educational stakeholders could benefit from the richer reporting made possible with EDM. Obvious interested parties are students and teachers. However, what about the students’ parents? Would it make sense for them to receive reports? Aside from report cards and parent–teacher conferences, there is little communication to parents about their child’s performance. Most parents are too busy for a detailed report of their child’s school day, but what about some distilled information? A system that informed parents if their child did not complete the homework that was due that day could be beneficial. Similarly, if a student’s performance noticeably declines, such a change would be detectable using EDM and the parents could be informed. Other stakeholders include school principals, who could be informed of teachers who were struggling relative to peers, and areas in which the school was performing poorly. Finally, there are the students themselves. Although students currently receive an array of grades on homework, quizzes, and exams, they receive much less larger-grain information, such as using the student’s past performance to suggest which classes to take, or that the student’s homework scores are lower than expected based on exam performance. Note that such features also change the context of educational data from something that is used in the classroom, to something that is potentially used in a completely different place.
Research in this area focuses on expanding the list of stakeholders for whom we can provide information, and where this information is received. Although there is much potential work in this area that is not technically demanding (notifying parents of missed homework assignments is simple enough), such work has to integrate with a school’s IT infrastructure, and changes the ground rules. Previously, teachers and students controlled information flow to parents; now parents are getting information directly. Overcoming such issues is challenging. Therefore, this area has seen some attention, but is relatively unexplored by EDM researchers.
The field of EDM has grown substantially in the past five years, with the first workshop referred to as “Educational data mining” occurring in 2005. Since then, it has held its third international conference in 2010, had one book published, has its own online journal, and is now having this book published. This growth is exciting for multiple reasons. First, education is a fundamentally important topic, rivaled only by medicine and health, which cuts across countries and cultures. Being able to better answer age-old questions in education, as well as finding ways to answer questions that have not yet been asked, is an activity that will have a broad impact on humanity. Second, doing effective educational research is no longer about having a large team of graduate assistants to score and code data, and sufficient offices with filing cabinets to store the results. There are public repositories of educational data sets for others to try their hand at EDM, and anyone with a computer and Internet connection can join the community. Thus, a much larger and broader population can participate in helping improve the state of education.
This book is a good first step for anyone wishing to join the EDM community, or for active researchers wishing to keep abreast of the field. The chapters are written by key EDM researchers, and cover many of the field’s essential topics. Thus, the reader gets a broad treatment of the field by those on the front lines.
MATLAB® is a registered trademark of The MathWorks, Inc. For product information, please contact:
Dr. Cristóbal Romero is an associate professor in the Department of Computer Science at the University of Córdoba, Spain. His research interests include applying artificial intelligence and data-mining techniques in education and e-learning systems. He received his PhD in computer science from the University of Granada, Spain, in 2003. The title of his PhD thesis was “Applying data mining techniques for improving adaptive hypermedia web-based courses.” He has published several papers about educational data mining in international journals and conferences, and has served as a reviewer for journals and as a program committee (PC) member for conferences. He is a member of the International Working Group on Educational Data Mining and an organizer or PC member of conferences and workshops about EDM. He was conference chair (with Sebastián Ventura) of the Second International Conference on Educational Data Mining (EDM’09).
Dr. Sebastián Ventura is an associate professor in the Department of Computer Science at the University of Córdoba, Spain. He received his PhD in sciences from the University of Córdoba in 2003. His research interests include machine learning, data mining, and their applications, and, recently, the application of KDD techniques in e-learning. He has published several papers about educational data mining (EDM) in international journals and conferences. He has served as a reviewer for several journals such as User Modelling and User Adapted Interaction, Information Sciences, and Soft Computing. He has also served as a PC member in several research EDM forums, including as conference chair (with Cristóbal Romero) of the Second International Conference on Educational Data Mining (EDM’09).
Dr. Mykola Pechenizkiy is an assistant professor in the Department of Computer Science, Eindhoven University of Technology, the Netherlands. He received his PhD in computer science and information systems from the University of Jyväskylä, Finland, in 2005. His research interests include knowledge discovery, data mining, and machine learning, and their applications. One particular area of focus is applying machine learning for modeling changing user interests and characteristics in adaptive hypermedia applications including, but not limited to, e-learning and e-health. He has published several papers in these areas, and has been involved in the organization of conferences, workshops, and special tracks.
Dr. Ryan S. J. d. Baker is an assistant professor of psychology and the learning sciences in the Department of Social Science and Policy Studies at Worcester Polytechnic Institute, Massachusetts, with a collaborative appointment in computer science. He graduated from Carnegie Mellon University, Pittsburgh, Pennsylvania, in 2005, with a PhD in human–computer interaction. He was a program chair (with Joseph Beck) of the First International Conference on Educational Data Mining, and is an associate editor of the Journal of Educational Data Mining and a founder of the International Working Group on Educational Data Mining. His research is at the intersection of educational data mining, machine learning, human–computer interaction, and educational psychology, and he has received five best paper awards or nominations in these areas. He is the former technical director of the Pittsburgh Science of Learning DataShop, the world’s largest public repository for data on the interaction between students and educational software.
Wil van der Aalst
[…]
Charlotte
Charlotte, North Carolina

Joseph E. Beck
Computer Science Department
Worcester Polytechnic Institute
Worcester, Massachusetts
and
Machine Learning Department
Carnegie Mellon University
Pittsburgh, Pennsylvania

Javier Bravo
Escuela Politécnica Superior
Universidad Autónoma de Madrid
Madrid, Spain

Winslow Burleson
Department of Computer Science and Engineering
Arizona State University
Tempe, Arizona

Toon Calders
Department of Mathematics and Computer Science
Eindhoven University of Technology
Eindhoven, the Netherlands

Cristina Carmona
Departamento de Lenguajes y Ciencias de la Computación
Universidad de Málaga
Málaga, Spain

Kyle Cunningham
Human–Computer Interaction Institute
Carnegie Mellon University
Pittsburgh, Pennsylvania

Sidney D’Mello
Institute for Intelligent Systems
The University of Memphis
Memphis, Tennessee

Philippe Fournier-Viger
Department of Computer Science
University of Quebec in Montreal
Montreal, Quebec, Canada

Enrique García
Department of Computer Science and Numerical Analysis
University of Cordoba
Cordoba, Spain

Daniela Godoy
ISISTAN Research Institute
Universidad Nacional del Centro de la Provincia de Buenos Aires
Tandil, Argentina

Evandro Gouvea
European Media Laboratory GmbH
Heidelberg, Germany
and
Robotics Institute
Carnegie Mellon University
Pittsburgh, Pennsylvania

Art Graesser
Institute for Intelligent Systems
The University of Memphis
Memphis, Tennessee

Kenneth R. Koedinger
Human–Computer Interaction Institute
Carnegie Mellon University
Pittsburgh, Pennsylvania

Irena Koprinska
School of Information Technologies
University of Sydney
Sydney, New South Wales, Australia

Brett Leber
Human–Computer Interaction Institute
Carnegie Mellon University
Pittsburgh, Pennsylvania

Tara M. Madhyastha
Department of Psychology
University of Washington
Seattle, Washington

David Masip
Department of Computer Science, Multimedia and Telecommunications
Universitat Oberta de Catalunya
Barcelona, Spain

Manolis Mavrikis
London Knowledge Lab
The University of London
London, United Kingdom

Riccardo Mazza
Faculty of Communication Sciences
University of Lugano
Lugano, Switzerland
and
Department of Innovative Technologies
University of Applied Sciences of Southern Switzerland
Manno, Switzerland

John C. Nesbit
Faculty of Education
Simon Fraser University
Burnaby, British Columbia, Canada

Engelbert Mephu Nguifo
Department of Computer Sciences
Université Blaise-Pascal Clermont 2
Clermont-Ferrand, France

Roger Nkambou
Department of Computer Science
University of Quebec in Montreal
Montreal, Quebec, Canada

Alvaro Ortigosa
Escuela Politécnica Superior
Universidad Autónoma de Madrid
Madrid, Spain

Zachary A. Pardos
Department of Computer Science
Worcester Polytechnic Institute
Worcester, Massachusetts

Mykola Pechenizkiy
Department of Mathematics and Computer Science
Eindhoven University of Technology
Eindhoven, the Netherlands

Kaska Porayska-Pomsta
London Knowledge Lab
The University of London
London, United Kingdom

Manuel E. Prieto
Escuela Superior de Informática
Universidad de Castilla-La Mancha
Ciudad Real, Spain

Alfredo Vellido
Departament de Llenguatges i Sistemes Informàtics
Universitat Politècnica de Catalunya
Barcelona, Spain

Sebastián Ventura
Department of Computer Science and Numerical Analysis
University of Cordoba
Cordoba, Spain

César Vialardi
Facultad de Ingeniería de Sistemas
Universidad de Lima
Lima, Peru

Mikko Vinni
School of Computing
University of Eastern Finland
Joensuu, Finland

Stephan Weibelzahl
School of Computing
National College of Ireland
Dublin, Ireland

Philip H. Winne
Faculty of Education
Simon Fraser University
Burnaby, British Columbia, Canada

Beverly P. Woolf
Department of Computer Science
University of Massachusetts Amherst
Amherst, Massachusetts

Mingming Zhou
Faculty of Education
Simon Fraser University
Burnaby, British Columbia, Canada
CONTENTS
1.1 Background
1.2 Educational Applications
1.3 Objectives, Content, and How to Read This Book
References
[…] its third iteration), a journal (the Journal of Educational Data Mining), and a number of highly cited papers (see [2] for a review of some of the most highly cited EDM papers).
These contributions in education build off of data mining’s past impacts in other domains such as commerce and biology [11]. In some ways, the advent of EDM can be considered as education “catching up” to other areas, where improving methods for exploiting data have promoted transformative impacts in practice [4,7,12]. Although the discovery methods used across domains are similar (e.g., [3]), there are some important differences between them. For instance, in comparing the use of data mining within e-commerce and EDM, there are the following differences:
• Domain. The goal of data mining in e-commerce is to influence clients in purchasing, while the purpose of educational systems is to guide students in learning [10].
• Data. In e-commerce, the data used are typically limited to web server access logs, whereas in EDM there is much more information available about the student [9], allowing for richer user (student) modeling. These data may come from different sources, including field observations, motivational questionnaires, measurements collected from controlled experiments, and so on. Depending on the type of educational environment (traditional classroom education, computer-based or web-based education) and the information system that supports it (a learning management, an intelligent tutoring, or an adaptive hypermedia system), different kinds of data are collected, including but not limited to student profiles, (inter)activity data, interaction (with the system, with educators, and with peers), rich information about learning objects and tasks, and so on. Gathering and integrating these data, performing exploratory analysis, visualization, and preparation for mining are nontrivial tasks on their own.
• Objective. The objective of data mining in e-commerce is increasing profit. Profit is a tangible goal that can be measured in terms of amounts of money, and which leads to clear secondary measures such as the number of customers and customer loyalty. As the objective of data mining in education is largely to improve learning [10], measurements are more difficult to obtain, and must be estimated through proxies such as improved performance.
• Techniques. The majority of traditional data mining techniques, including but not limited to classification, clustering, and association analysis techniques, have already been applied successfully in the educational domain. The most popular approaches are covered by the introductory chapters of the book. Nevertheless, educational systems have special characteristics that require a different treatment of the mining problem. Data hierarchy and nonindependence become particularly important to account for, as individual students contribute large amounts of data while progressing through a learning trajectory, and those students are impacted by fellow classmates and teacher and school-level effects. As a consequence, some specific data mining techniques are needed to address learning [8] and other data about learners. Some traditional techniques can be adapted, some cannot. This trend has led to psychometric methods designed to address these issues of hierarchy and nonindependence being integrated into EDM, as can be seen in several chapters in this volume. However, EDM is still an emerging research area, and we can foresee that its further development will result in a better understanding of challenges peculiar to this field and will help researchers involved in EDM to see what techniques can be adopted and what new tailored techniques have to be developed.
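The nonindependence noted in the list above has a practical consequence for evaluation: when rows from the same student end up in both training and test data, accuracy estimates tend to be optimistic. The following is a minimal sketch, not a method taken from this book, of a split that holds out whole students rather than individual rows. The row layout `(student_id, payload)`, the function name `split_by_student`, and the sample log are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def split_by_student(rows, test_fraction=0.25, seed=0):
    """Split rows so each student's data lands entirely in train or in test.

    `rows` is an iterable of (student_id, payload) pairs; this layout is an
    assumption made for the illustration, not a format from the book.
    """
    by_student = defaultdict(list)
    for student_id, payload in rows:
        by_student[student_id].append((student_id, payload))

    students = sorted(by_student)            # sort so the shuffle is reproducible
    random.Random(seed).shuffle(students)

    # Hold out at least one student, otherwise roughly test_fraction of them.
    n_test = max(1, round(len(students) * test_fraction))
    test_students = set(students[:n_test])

    train = [r for s in students if s not in test_students for r in by_student[s]]
    test = [r for s in students if s in test_students for r in by_student[s]]
    return train, test


# Hypothetical interaction log: four students, several rows each.
log = [("s1", 0.9), ("s1", 0.8), ("s2", 0.4), ("s2", 0.5),
       ("s3", 0.7), ("s4", 0.2), ("s4", 0.3), ("s4", 0.1)]
train_rows, test_rows = split_by_student(log)
```

Splitting at the student level is only one way to respect the hierarchy; the same idea could be applied at the class or school level when those effects matter.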
The application of data mining techniques to educational systems in order to improve learning can be viewed as a formative evaluation technique. Formative evaluation [1] is the evaluation of an educational program while it is still in development, and with the purpose of continually improving the program. Examining how students use the system is one way to evaluate instructional design in a formative manner and may help educational designers to improve the instructional materials [5]. Data mining techniques can be used to gather information that can be used to assist educational designers to establish a pedagogical basis for decisions when designing or modifying an environment’s pedagogical […]
The application of data mining to the design of educational systems is an iterative cycle of hypothesis formation, testing, and refinement (see Figure 1.1). Mined knowledge should enter the design loop towards guiding, facilitating, and enhancing learning as a whole. In this process, the goal is not just to turn data into knowledge, but also to filter mined knowledge for decision making.
As we can see in Figure 1.1, educators and educational designers (whether in school districts, curriculum companies, or universities) design, plan, build, and maintain educational systems. Students use those educational systems to learn. Building off of the available information about courses, students, usage, and interaction, data mining techniques can be applied in order to discover useful knowledge that helps to improve educational designs. The discovered knowledge can be used not only by educational designers and teachers, but also by end users, that is, students. Hence, the application of data mining in educational systems can be oriented to supporting the specific needs of each of these categories of stakeholders.
1.2 Educational Applications
In the last several years, EDM has been applied to address a wide range of goals. In this book we can distinguish between the following general applications or tasks:
• Communicating to stakeholders. The objective is to help course administrators and educators analyze students’ activities and usage information in courses. The most frequently used techniques for this type of goal are exploratory data analysis through statistical analysis and visualizations or reports, and process mining.
FIGURE 1.1
Applying data mining to the design of educational systems. (Diagram elements: educational systems, such as traditional classrooms, e-learning systems, LMSs, web-based adaptive systems, intelligent tutoring systems, and questionnaires and quizzes, provide and store course information, contents, academic data, grades, and student usage and interaction data; data mining techniques, such as statistics, visualization, clustering, classification, association rule mining, sequence mining, and text mining, are used to model learners and learning, communicate findings, and make recommendations.)
• Maintaining and improving courses. The objective is to help course administrators and educators determine how to improve courses (contents, activities, links, etc.), using information about, in particular, student usage and learning. The most frequently used techniques for this type of goal are association, clustering, and classification. Chapters 7, 17, 26, and 34 discuss methods and case studies for this category of application.
• Generating recommendations. The objective is to recommend to students which content (or tasks or links) is most appropriate for them at the current time. The most frequently used techniques for this type of goal are association, sequencing, classification, and clustering. Chapters 6, 8, 12, 18, 19, and 32 discuss methods and case studies for this category of application.
• Predicting student grades and learning outcomes. The objective is to predict a student’s final grades or other types of learning outcomes (such as retention in a degree program or future ability to learn), based on data from course activities. The most frequently used techniques for this type of goal are classification, clustering, and association. Chapters 5 and 13 discuss methods and case studies for this category of application.
• Student modeling. User modeling in the educational domain has a number of applications, including, for example, the detection (often in real time) of student states and characteristics such as satisfaction, motivation, learning progress, or certain types of problems that negatively impact their learning outcomes (making too many errors, misusing or underusing help, gaming the system, inefficiently exploring learning resources, etc.), affect, learning styles, and preferences. The common objective here is to create a student model from usage information. The frequently used techniques for this type of goal are not only clustering, classification, and association analysis, but also statistical analyses, Bayesian networks (including Bayesian Knowledge-Tracing), psychometric models, and reinforcement learning. Chapters 6, 12, 14 through 16, 20, 21, 23, 25, 27, 31, 33, and 35 discuss methods and case studies for this category of application.
• Domain structure analysis. The objective is to determine domain structure, using the ability to predict student performance as a measure of the quality of a domain structure model. Performance on tests or within a learning environment is utilized for this goal. The most frequently used techniques for this type of goal are association rules, clustering methods, and space-searching algorithms. Chapters 10, 11, 29, and 30 discuss methods and case studies for this category of application.
1.3 Objectives, Content, and How to Read This Book
Our objective, in compiling this book, is to provide as complete a picture as possible of the current state of the art in the application of data mining techniques in education. Recent developments in technology-enhanced learning have resulted in a widespread use of e-learning environments and educational software, within many regular university [...]
This expansion of data has led to increasing interest among education researchers in a variety of disciplines, and among practitioners and educational administrators, in tools and techniques for analysis of the accumulated data to improve understanding of learners and the learning process, and to drive the development of more effective educational software and better educational decision-making. This interest has become a driving force for EDM. We believe that this book can support researchers and practitioners in integrating EDM into their research and practice, and in bringing the educational and data mining communities together, so that education experts understand what types of questions EDM can address, and data miners understand what types of questions are of importance to educational design and educational decision-making.
This volume, the Handbook of Educational Data Mining, consists of two parts. In the first part, we offer nine surveys and tutorials about the principal data mining techniques that have been applied in education. In the second part, we give a set of 25 case studies, offering readers a rich overview of the problems that EDM has produced leverage for.
The book is structured so that it can be read in its entirety, first introducing concepts and methods, and then showing their applications. However, readers can also focus on areas of specific interest, as have been outlined in the categorization of the educational applications. We welcome readers to the field of EDM and hope that it is of value to their research or practical goals. If you enjoy this book, we hope that you will join us at a future iteration of the Educational Data Mining conference; see www.educationaldatamining.org for the latest information, and to subscribe to our community mailing list, edm-announce.
future visions. Journal of Educational Data Mining, 1(1), 3–17.
3. Hanna, M. (2004). Data mining in the e-learning domain. Computers and Education Journal, 42(3),
7. Lewis, M. (2004). Moneyball: The Art of Winning an Unfair Game. New York: Norton.
8. Li, J. and Zaïane, O. (2004). Combining usage, content, and structure data to improve web site recommendation. In International Conference on Ecommerce and Web Technologies, Zaragoza, Spain, pp. 305–315.
9. Pahl, C. and Donnellan, C. (2003). Data mining technology for the evaluation of web-based teaching and learning systems. In Proceedings of the Congress e-Learning, Montreal, Canada.
10. Romero, C. and Ventura, S. (2007). Educational data mining: A survey from 1995 to 2005. Expert Systems with Applications, 33(1), 135–146.
Basic Techniques, Surveys and Tutorials
Riccardo Mazza
2.1 Introduction
This chapter presents an introduction to information visualization, a new discipline with origins in the late 1980s that is part of the field of human–computer interaction. We will illustrate the purposes of this discipline, its basic concepts, and some design principles that can be applied to graphically render students’ data from educational systems. The chapter starts with a description of information visualization followed by a discussion of some design principles, which are defined by outstanding scholars in the field. Finally, some systems in which visualizations have been used in learning environments to represent user models, discussions, and tracking data are described.
CONTENTS
2.1 Introduction
2.2 What Is Information Visualization?
2.2.1 Visual Representations
2.2.2 Interaction
2.2.3 Abstract Data
2.2.4 Cognitive Amplification
2.3 Design Principles
2.3.1 Spatial Clarity
2.3.2 Graphical Excellence
2.4 Visualizations in Educational Software
2.4.1 Visualizations of User Models
2.4.1.1 UM/QV
2.4.1.2 ViSMod
2.4.1.3 E-KERMIT
2.4.2 Visualizations of Online Communications
2.4.2.1 Simuligne
2.4.2.2 PeopleGarden
2.4.3 Visualizations of Student-Tracking Data
2.5 Conclusions
References
2.2 What Is Information Visualization?
Visualization, which may be defined as “the display of data with the aim of maximizing comprehension rather than photographic realism*”, has grown greatly in recent years thanks to the availability of increasingly powerful computers at low cost. The discipline of information visualization (IV) [2,16] originated in the late 1980s for the purpose of exploring the use of computers to generate interactive, visual representations to explain and understand specific features of data. The basic principle of IV is to present data in a visual form and use human perceptual abilities for their interpretation.
As in many other fields, several people have tried to give a rigorous, scientific definition of the discipline of IV. The definition that has received the most consensus from the research community seems to be the one given by Card et al. in their famous collection of papers on IV: the readings [2]. According to them, IV is “the use of computer-supported, interactive, visual representations of abstract data to amplify cognition.” By this definition, four terms are key to understanding this domain: visual representation, interaction, abstract data, and cognitive amplification. We will try to analyze each of them to clearly describe the field and its applications.
2.2.1 Visual Representations
[...] phenomena, data, and events using graphics. Some aspects, such as when people need to find a route in a city, the stock market trends over a certain period, and the weather forecast, may be understood better using graphics rather than text. Graphical representation of data, compared to textual or tabular ones (in the case of numbers), takes advantage of human visual perception. Perception is very powerful as it conveys a large amount of information to our mind, allowing us to recognize essential features and to make important inferences. This is possible thanks to the fact that there is a series of identification and recognition operations that our brain performs in an “automatic” way without the need to focus our attention or even be conscious of them. Perceptual tasks that can be performed in a very short time lapse (typically between 200 and 250 ms or less) are called pre-attentive, since they occur without the intervention of consciousness [20].
[...] bars whose length is proportional to the number on the left. Suppose we have to find the maximum and the minimum [...]
FIGURE 2.1
Comparing perception of lines with numbers.
Let us try to do the same operation, this time using the bars. Thanks to the pre-attentive property of length, the length of the bars allows us to identify the longest and the shortest almost immediately.
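The comparison between reading numbers and reading bar lengths can be tried out with a minimal sketch. The values below are hypothetical (the numbers behind Figure 2.1 are not given in the text), and the helper name render_bars is an assumption introduced only for illustration.

```python
# Hypothetical values; the numbers shown in Figure 2.1 are not given in the text.
values = [412, 87, 953, 265, 640, 31, 778]


def render_bars(numbers, width=40):
    """Return one text line per number: the number, then a bar of '#'
    characters whose length is proportional to the number.

    Assumes all numbers are positive.
    """
    largest = max(numbers)
    lines = []
    for n in numbers:
        bar_length = round(n / largest * width)
        lines.append(f"{n:>5} {'#' * bar_length}")
    return lines


if __name__ == "__main__":
    # Scanning the left column means reading every number; scanning the
    # right column only requires comparing bar lengths.
    print("\n".join(render_bars(values)))
```

Printing the result places the two encodings side by side, so the reader can compare how long it takes to spot the extremes in each column.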
Graphical representations are often associated with the term “visualization” (or “visualisation” in the British version of the term). It has been noted by Spence [16] that there is a diversity of uses of the term “visualization.” For instance, in a dictionary the following definitions can be found:
Visualize: form a mental image of…*
Visualization: The display of data with the aim of maximizing comprehension rather than photographic realism †
Visualization: the act or process of interpreting in visual terms or of putting into visible form ‡
These definitions reveal that visualization is an activity in which humans are engaged, as an internal construct of the mind [16,20]. It is something that cannot be printed on paper or displayed on a computer screen. With these considerations, we can summarize that visualization is a cognitive activity, facilitated by graphical external representations from which people construct internal mental representations of the world [16,20].
Computers may facilitate the visualization process with some visualization tools. This is especially true in recent years with the availability of powerful computers at low cost. However, the above definition is independent of computers: although computers can facilitate visualization, it still remains an activity that happens in the mind.
2.2.2 Interaction
Recently there has been great progress in high-performance, affordable computer graphics. The common personal computer has reached a graphic power that just 10 years ago was possible only with very expensive graphic workstations specifically built for graphics processing. At the same time, there has been a rapid expansion in the information that people have to process for their daily activities. This need led scientists to explore new ways to represent huge amounts of data with computers, taking advantage of the possibility of users interacting with the algorithms that create the graphical representation. Interactivity derives from people’s ability to identify interesting facts when the visual display changes, and it allows them to manipulate the visualization or the underlying data to explore such changes.
2.2.3 Abstract Data
IV definitions introduce the term “abstract data,” for which some clarification is needed. The data itself can have a wide variety of forms, but we can distinguish between data that have a physical correspondence and are closely related to mathematical structures and models (e.g., the airflow around the wing of an airplane, or the density of the ozone layer surrounding [...]
* The Concise Oxford Dictionary. Ed. Judy Pearsall. Oxford University Press, 2001. Oxford Reference Online. Oxford University Press.
† A Dictionary of Computing. Oxford University Press, 1996. Oxford Reference Online. Oxford University Press.
[...] in Figure 2.2), while IV is dealing with unstructured data [...]
[...] sets as a distinct flavor [4]. In Table 2.1 is reported a table [...]
2.2.4 Cognitive Amplification
[...] multiplication (a typical mental activity), e.g., 27 × 42 in our head, without having a pencil and paper. This calculation made in our mind will usually take at least five times longer than when using a pencil and paper [2]. The difficulty in doing this operation in the mind is holding the partial results of the multiplication in memory until they can be used:
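The worked layout that followed this sentence is not preserved here. As a sketch of the partial results involved, assuming the usual decomposition of the multiplier 42 into 2 and 40:

```latex
\begin{align*}
27 \times 42 &= 27 \times 2 + 27 \times 40 \\
             &= 54 + 1080 \\
             &= 1134
\end{align*}
```

The intermediate values 54 and 1080 are what must be held in memory when working mentally; written on paper, they stay visible until the final addition.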
FIGURE 2.2
Example of scientific visualization: The ozone hole over the South Pole on September 22, 2004. (Image from the NASA Goddard Space Center archives and reproduced with permission.)
[...] for patterns, recognize relationships between data, and perform some inferences more easily. Card et al. [2] propose six major ways in which visualizations can amplify cognition by
1. Increasing the memory and processing resources available to users
[...] is a well-known old adage that everybody knows. But why (and in which situations) are graphical representations effective?
FIGURE 2.3
Graphical representation of results of federal referendum in Switzerland on September 24, 2006. (Image from the Swiss Federal Statistical Office, http://www.bfs.admin.ch © Bundesamt für Statistik, ThemaKart 2009, reproduced with permission.)
2.3.1 Spatial Clarity
Graphical representations may facilitate the way we present and understand large complex datasets. As Larkin and Simon [7] argued in their seminal paper “Why a diagram is (sometimes) worth ten thousand words,” the effectiveness of graphical representations is due to their spatial clarity. Well-constructed graphical representations of data allow people to quickly gain insights that might lead to significant discoveries as a result of spatial clarity.
Larkin and Simon compared the computational efficiency of diagrams and sentences in solving physics problems, and concluded that diagrams helped in three basic ways:
Locality is enabled by grouping together information that is used together. This avoids large amounts of search and allows different information closely located to be processed simultaneously. For example, Figure 2.4 represents the map of the Madrid metro transport system. In this map the locality principle is applied by placing metro lines and zones in the same map. The traveler can find in the same place information about lines, connections, and stations.
Minimizing labeling is enabled by using location to group information about a single element, avoiding the need to match symbolic labels and leading to a reduced working memory load. For example, the Madrid transport map (Figure 2.4) uses visual entities such as lines depicted with different colors to denote different metro lines. Connections are clearly indicated by a white circle that connects the corresponding lines. There is no need to use textual representations because the connections are explicitly represented in the graphics.
Perceptual enhancement is enabled by supporting a large number of perceptual inferences that are easy for humans to perform. For example, in Figure 2.4, a traveler who has to travel from Nuevos Ministerios to Opera can see that there are different combinations of lines and connections that can be taken, and can probably decide which is the fastest way to reach the destination.
2.3.2 Graphical Excellence
Sometimes graphical representations of data have been used to distort the underlying data. Tufte [18] and Bertin [1] list a number of examples of graphics that distort the underlying data or communicate incorrect ideas. Tufte indicates some principles that should be followed to build effective, well-designed graphics. In particular, a graphical display should [18]
FIGURE 2.4
Map of the Madrid metro system. (Image © 2007, designed and drawn by Matthew McLauchlin, http://www.metrodemontreal.com/; released under the Creative Commons Share-Alike Attribution Licence, CC-SA-BY 2.5.)
A key question in IV is how we convert abstract data into a graphical representation, preserving the underlying meaning and, at the same time, providing new insight. There is no “magic formula” that helps researchers to systematically build a graphical representation starting from a raw set of data. It depends on the nature of the data, the type of information to be represented and its use, but more consistently, it depends on the creativity of the designer of the graphical representation. Some interesting ideas, even if innovative, have often failed in practice.
Graphics facilitate IV, but a number of issues must be considered [16,18]:
1. Data is nearly always multidimensional, while graphics represented on a computer screen or on paper are presented on a 2D surface.
2. Sometimes we need to represent a huge dataset, while the number of data viewable on a computer screen or on paper is limited.
3. Data may vary over time, while graphics are static.
4. Humans have remarkable abilities to select, manipulate, and rearrange data, so the graphical representations should provide users with these features.
2.4 Visualizations in Educational Software
In this section we will explore some graphical representations that have been adopted in educational contexts. We will concentrate our analysis on software applications that aim to provide learning to students and give instructors some feedback on the actions undertaken and improvements made by students with the subject. We will consider three types of applications: visualization of user models, visualization of online communications, and visualization of students’ tracking data.
2.4.1 Visualizations of User Models
A user model is a representation of a set of beliefs about the user, particularly their knowledge in various areas, and their goals and preferences. Student models are a key component of intelligent educational systems used to represent the student’s understanding of material taught. Methods for user modeling are often exploited in educational systems. These models are enabling the increasing personalization of software, particularly on the Internet, where the user model is the set of information and beliefs that is used to personalize the Web site [19].
2.4.1.1 UM/QV
QV [6] is an overview interface for UM [5], a toolkit for cooperative user modeling. A model is structured as a hierarchy of elements of the domain. QV uses a hierarchical representation of concepts to present the user model. For instance, Figure 2.5 gives a graphical representation of a model showing concepts of the SAM text editor. It gives a quick overview of whether the user appears to know each element of the domain. QV exploits different types of geometric forms and color to represent known/unknown concepts. A square indicates a knowledge component, a diamond a belief, a circle a nonleaf node, and crosses indicate other component types. The filling of the shape is used to indicate the component value. For instance, in the example, the white squares show that the user knows that element, while the dark squares indicate lack of knowledge. Nested shapes, such as default_size_k or undo_k, indicate that the system has not been able to determine whether the user knows it or not (e.g., if there is inconsistency in the information about the user). The view of the graph is manipulable; in particular, clicking on a nonleaf node causes the subtree to be displayed, which is useful for models with a large number of components.
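The encoding described above (shape for component type, filling for value, nesting for undetermined values, and unfolding of nonleaf nodes) can be sketched as a small data structure. This is not QV's implementation: the class, field, and function names are hypothetical, the sample model only borrows a few labels that appear in Figure 2.5, and applying the white/dark/nested filling to diamonds and crosses as well as squares is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Shape per component type, following the description of QV in the text.
SHAPES = {"knowledge": "square", "belief": "diamond",
          "nonleaf": "circle", "other": "cross"}


@dataclass
class Node:
    """One element of a hierarchical user model (hypothetical structure)."""
    name: str
    kind: str = "nonleaf"
    known: Optional[bool] = None   # None: the system could not determine it
    children: List["Node"] = field(default_factory=list)
    expanded: bool = False

    def glyph(self) -> str:
        """Describe how this node would be drawn."""
        shape = SHAPES[self.kind]
        if self.kind == "nonleaf":
            return shape
        if self.known is None:
            return f"nested {shape}"
        return f"{'white' if self.known else 'dark'} {shape}"

    def toggle(self) -> None:
        """Fold or unfold the subtree, as a click on a nonleaf node would."""
        self.expanded = not self.expanded


def visible(node: Node, depth: int = 0):
    """List (depth, name, glyph) for every node currently displayed."""
    rows = [(depth, node.name, node.glyph())]
    if node.expanded:
        for child in node.children:
            rows.extend(visible(child, depth + 1))
    return rows


def sample_model() -> Node:
    """A tiny illustrative model; the values are invented."""
    basics = Node("Basics", children=[Node("write_k", "knowledge", False)])
    return Node("Sam", children=[
        Node("quit_k", "knowledge", True),
        Node("undo_k", "knowledge", None),
        basics,
    ])


if __name__ == "__main__":
    root = sample_model()
    root.toggle()  # unfold the root, leaving "Basics" folded
    for depth, name, glyph in visible(root):
        print("  " * depth + f"{name}: {glyph}")
```

Keeping the folded state on each node means only the unfolded part of a large model is ever listed, which mirrors the reason the text gives for making the view manipulable.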
2.4.1.2 ViSMod
ViSMod [22] is an interactive visualization tool for the representation of Bayesian learner models. In ViSMod, learners and instructors can inspect the learner model using a graphical representation of the Bayesian network. ViSMod uses concept maps to render a [...]
FIGURE 2.5
The QV tool showing a user model. (Image courtesy of Judy Kay.)