MongoDB: The Definitive Guide, Second Edition

This book is for application developers and DBAs wanting to learn MongoDB from the ground up. If you’re new to MongoDB, you’ll find in this book a tutorial that moves at a comfortable pace. If you’re already a user, the more detailed reference sections in the book will come in handy and should fill any gaps in your knowledge. In terms of depth, the material should be suitable for all but the most advanced users. Although the book is about the latest MongoDB version, which at the time of writing is 3.0.x, it also covers the previous stable version, 2.6.


Kristina Chodorow

SECOND EDITION


MongoDB: The Definitive Guide, Second Edition

by Kristina Chodorow

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Editor: Ann Spencer

Production Editor: Kara Ebrahim

Proofreader: Amanda Kersey

Indexer: Stephen Ingle, WordCo Indexing

Cover Designer: Randy Comer

Interior Designer: David Futato

Illustrator: Rebecca Demarest

May 2013: Second Edition

Revision History for the Second Edition:

2013-05-08: First release

See http://oreilly.com/catalog/errata.csp?isbn=9781449344689 for release details.

Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc. MongoDB: The Definitive Guide, Second Edition, the image of a mongoose lemur, and related trade dress are trademarks of O’Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc., was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

ISBN: 978-1-449-34468-9

[LSI]


Table of Contents

Foreword xiii

Preface xv

Part I Introduction to MongoDB

1 Introduction 3

Ease of Use 3

Easy Scaling 3

Tons of Features… 4

…Without Sacrificing Speed 5

Let’s Get Started 5

2 Getting Started 7

Documents 7

Collections 8

Dynamic Schemas 8

Naming 9

Databases 10

Getting and Starting MongoDB 11

Introduction to the MongoDB Shell 12

Running the Shell 13

A MongoDB Client 13

Basic Operations with the Shell 14

Data Types 16

Basic Data Types 16

Dates 18

Arrays 18

Embedded Documents 19

_id and ObjectIds 20


Using the MongoDB Shell 21

Tips for Using the Shell 22

Running Scripts with the Shell 23

Creating a mongorc.js 25

Customizing Your Prompt 26

Editing Complex Variables 27

Inconvenient Collection Names 27

3 Creating, Updating, and Deleting Documents 29

Inserting and Saving Documents 29

Batch Insert 29

Insert Validation 30

Removing Documents 31

Remove Speed 31

Updating Documents 32

Document Replacement 32

Using Modifiers 34

Upserts 45

Updating Multiple Documents 47

Returning Updated Documents 48

Setting a Write Concern 51

4 Querying 53

Introduction to find 53

Specifying Which Keys to Return 54

Limitations 55

Query Criteria 55

Query Conditionals 55

OR Queries 56

$not 57

Conditional Semantics 57

Type-Specific Queries 58

null 58

Regular Expressions 58

Querying Arrays 59

Querying on Embedded Documents 63

$where Queries 65

Server-Side Scripting 66

Cursors 67

Limits, Skips, and Sorts 68

Avoiding Large Skips 70

Advanced Query Options 71


Getting Consistent Results 72

Immortal Cursors 75

Database Commands 75

How Commands Work 76

Part II Designing Your Application

5 Indexing 81

Introduction to Indexing 81

Introduction to Compound Indexes 84

Using Compound Indexes 89

How $-Operators Use Indexes 91

Indexing Objects and Arrays 95

Index Cardinality 98

Using explain() and hint() 98

The Query Optimizer 102

When Not to Index 102

Types of Indexes 104

Unique Indexes 104

Sparse Indexes 106

Index Administration 107

Identifying Indexes 108

Changing Indexes 108

6 Special Index and Collection Types 109

Capped Collections 109

Creating Capped Collections 111

Sorting Au Naturel 112

Tailable Cursors 113

No-_id Collections 114

Time-To-Live Indexes 114

Full-Text Indexes 115

Search Syntax 118

Full-Text Search Optimization 119

Searching in Other Languages 119

Geospatial Indexing 120

Types of Geospatial Queries 120

Compound Geospatial Indexes 121

2D Indexes 122

Storing Files with GridFS 123

Getting Started with GridFS: mongofiles 124


Working with GridFS from the MongoDB Drivers 124

Under the Hood 125

7 Aggregation 127

The Aggregation Framework 127

Pipeline Operations 129

$match 129

$project 130

$group 135

$unwind 137

$sort 139

$limit 139

$skip 139

Using Pipelines 140

MapReduce 140

Example 1: Finding All Keys in a Collection 140

Example 2: Categorizing Web Pages 143

MongoDB and MapReduce 143

Aggregation Commands 146

count 146

distinct 147

group 147

8 Application Design 153

Normalization versus Denormalization 153

Examples of Data Representations 154

Cardinality 157

Friends, Followers, and Other Inconveniences 158

Optimizations for Data Manipulation 160

Optimizing for Document Growth 160

Removing Old Data 162

Planning Out Databases and Collections 162

Managing Consistency 163

Migrating Schemas 164

When Not to Use MongoDB 165

Part III Replication

9 Setting Up a Replica Set 169

Introduction to Replication 169

A One-Minute Test Setup 170


Configuring a Replica Set 174

rs Helper Functions 175

Networking Considerations 176

Changing Your Replica Set Configuration 176

How to Design a Set 178

How Elections Work 180

Member Configuration Options 181

Creating Election Arbiters 182

Priority 183

Hidden 184

Slave Delay 185

Building Indexes 185

10 Components of a Replica Set 187

Syncing 187

Initial Sync 188

Handling Staleness 190

Heartbeats 191

Member States 191

Elections 192

Rollbacks 193

When Rollbacks Fail 197

11 Connecting to a Replica Set from Your Application 199

Client-to-Replica-Set Connection Behavior 199

Waiting for Replication on Writes 200

What Can Go Wrong? 201

Other Options for “w” 202

Custom Replication Guarantees 202

Guaranteeing One Server per Data Center 202

Guaranteeing a Majority of Nonhidden Members 204

Creating Other Guarantees 204

Sending Reads to Secondaries 205

Consistency Considerations 205

Load Considerations 205

Reasons to Read from Secondaries 206

12 Administration 209

Starting Members in Standalone Mode 209

Replica Set Configuration 210

Creating a Replica Set 210

Changing Set Members 211


Creating Larger Sets 211

Forcing Reconfiguration 212

Manipulating Member State 213

Turning Primaries into Secondaries 213

Preventing Elections 213

Using Maintenance Mode 213

Monitoring Replication 214

Getting the Status 214

Visualizing the Replication Graph 216

Replication Loops 218

Disabling Chaining 218

Calculating Lag 219

Resizing the Oplog 220

Restoring from a Delayed Secondary 221

Building Indexes 222

Replication on a Budget 223

How the Primary Tracks Lag 224

Master-Slave 225

Converting Master-Slave to a Replica Set 226

Mimicking Master-Slave Behavior with Replica Sets 226

Part IV Sharding

13 Introduction to Sharding 231

Introduction to Sharding 231

Understanding the Components of a Cluster 232

A One-Minute Test Setup 232

14 Configuring Sharding 241

When to Shard 241

Starting the Servers 242

Config Servers 242

The mongos Processes 243

Adding a Shard from a Replica Set 244

Adding Capacity 245

Sharding Data 245

How MongoDB Tracks Cluster Data 246

Chunk Ranges 247

Splitting Chunks 249


The Balancer 253

15 Choosing a Shard Key 257

Taking Stock of Your Usage 257

Picturing Distributions 258

Ascending Shard Keys 258

Randomly Distributed Shard Keys 261

Location-Based Shard Keys 263

Shard Key Strategies 264

Hashed Shard Key 264

Hashed Shard Keys for GridFS 266

The Firehose Strategy 267

Multi-Hotspot 268

Shard Key Rules and Guidelines 271

Shard Key Limitations 271

Shard Key Cardinality 271

Controlling Data Distribution 271

Using a Cluster for Multiple Databases and Collections 272

Manual Sharding 273

16 Sharding Administration 275

Seeing the Current State 275

Getting a Summary with sh.status 275

Seeing Configuration Information 277

Tracking Network Connections 283

Getting Connection Statistics 283

Limiting the Number of Connections 284

Server Administration 285

Adding Servers 285

Changing Servers in a Shard 285

Removing a Shard 286

Changing Config Servers 288

Balancing Data 289

The Balancer 289

Changing Chunk Size 290

Moving Chunks 291

Jumbo Chunks 292

Refreshing Configurations 295

Part V Application Administration


17 Seeing What Your Application Is Doing 299

Seeing the Current Operations 299

Finding Problematic Operations 301

Killing Operations 301

False Positives 302

Preventing Phantom Operations 302

Using the System Profiler 302

Calculating Sizes 305

Documents 305

Collections 305

Databases 306

Using mongotop and mongostat 307

18 Data Administration 311

Setting Up Authentication 311

Authentication Basics 312

Setting Up Authentication 314

How Authentication Works 314

Creating and Deleting Indexes 315

Creating an Index on a Standalone Server 315

Creating an Index on a Replica Set 315

Creating an Index on a Sharded Cluster 316

Removing Indexes 316

Beware of the OOM Killer 317

Preheating Data 317

Moving Databases into RAM 317

Moving Collections into RAM 318

Custom-Preheating 318

Compacting Data 320

Moving Collections 321

Preallocating Data Files 322

19 Durability 323

What Journaling Does 323

Planning Commit Batches 324

Setting Commit Intervals 325

Turning Off Journaling 325

Replacing Data Files 325

Repairing Data Files 325

The mongod.lock File 326

Sneaky Unclean Shutdowns 327

What MongoDB Does Not Guarantee 327


Checking for Corruption 327

Durability with Replication 329

Part VI Server Administration

20 Starting and Stopping MongoDB 333

Starting from the Command Line 333

File-Based Configuration 336

Stopping MongoDB 336

Security 337

Data Encryption 338

SSL Connections 338

Logging 338

21 Monitoring MongoDB 341

Monitoring Memory Usage 341

Introduction to Computer Memory 341

Tracking Memory Usage 342

Tracking Page Faults 343

Minimizing Btree Misses 345

IO Wait 346

Tracking Background Flush Averages 346

Calculating the Working Set 348

Some Working Set Examples 350

Tracking Performance 350

Tracking Free Space 352

Monitoring Replication 353

22 Making Backups 357

Backing Up a Server 357

Filesystem Snapshot 357

Copying Data Files 358

Using mongodump 359

Backing Up a Replica Set 361

Backing Up a Sharded Cluster 362

Backing Up and Restoring an Entire Cluster 362

Backing Up and Restoring a Single Shard 362

Creating Incremental Backups with mongooplog 363

23 Deploying MongoDB 365

Designing the System 365


Choosing a Storage Medium 365

Recommended RAID Configurations 369

CPU 370

Choosing an Operating System 370

Swap Space 371

Filesystem 371

Virtualization 372

Turn Off Memory Overcommitting 372

Mystery Memory 372

Handling Network Disk IO Issues 373

Using Non-Networked Disks 374

Configuring System Settings 374

Turning Off NUMA 374

Setting a Sane Readahead 377

Disabling Hugepages 378

Choosing a Disk Scheduling Algorithm 379

Don’t Track Access Time 380

Modifying Limits 380

Configuring Your Network 382

System Housekeeping 383

Synchronizing Clocks 383

The OOM Killer 383

Turn Off Periodic Tasks 384

A Installing MongoDB 385

B MongoDB Internals 389

Index 393

Foreword

In the last 10 years, the Internet has challenged relational databases in ways nobody could have foreseen. Having used MySQL at large and growing Internet companies during this time, I’ve seen this happen firsthand. First you have a single server with a small data set. Then you find yourself setting up replication so you can scale out reads and deal with potential failures. And, before too long, you’ve added a caching layer, tuned all the queries, and thrown even more hardware at the problem.

Eventually you arrive at the point when you need to shard the data across multiple clusters and rebuild a ton of application logic to deal with it. And soon after that you realize that you’re locked into the schema you modeled so many months before. Why? Because there’s so much data in your clusters now that altering the schema will take a long time and involve a lot of precious DBA time. It’s easier just to work around it in code. This can keep a small team of developers busy for many months. In the end, you’ll always find yourself wondering if there’s a better way, or why more of these features are not built into the core database server.

Keeping with tradition, the Open Source community has created a plethora of “better ways” in response to the ballooning data needs of modern web applications. They span the spectrum from simple in-memory key/value stores to complicated SQL-speaking MySQL/InnoDB derivatives. But the sheer number of choices has made finding the right solution more difficult. I’ve looked at many of them.

I was drawn to MongoDB by its pragmatic approach. MongoDB doesn’t try to be everything to everyone. Instead it strikes the right balance between features and complexity, with a clear bias toward making previously difficult tasks far easier. In other words, it has the features that really matter to the vast majority of today’s web applications: indexes, replication, sharding, a rich query syntax, and a very flexible data model. All of this comes without sacrificing speed.

Like MongoDB itself, this book is very straightforward and approachable. New MongoDB users can start with Chapter 1 and be up and running in no time. Experienced users will appreciate this book’s breadth and authority. It’s a solid reference for advanced administrative topics such as replication, backups, and sharding, as well as popular client APIs.

Having recently started to use MongoDB in my day job, I have no doubt that this book will be at my side for the entire journey, from the first install to production deployment of a sharded and replicated cluster. It’s an essential reference to anyone seriously looking at using MongoDB.

—Jeremy Zawodny

Craigslist Software Engineer

August 2010

Preface

How This Book Is Organized

This book is split up into six sections, covering development, administration, and deployment information.

Getting Started with MongoDB

is trying to accomplish, and why you might choose to use it for a project. We go into more detail in Chapter 2, which provides an introduction to the core concepts and vocabulary of MongoDB. Chapter 2 also provides a first look at working with MongoDB, getting you started with the database and the shell. The next two chapters cover the basic material that developers need to know to work with MongoDB. In Chapter 3, we describe how to perform those basic write operations, including how to do them with different levels of safety and speed. Chapter 4 explains how to find documents and create complex queries. This chapter also covers how to iterate through results and gives options for limiting, skipping, and sorting results.

Developing with MongoDB

covers a number of techniques for aggregating data with MongoDB, including counting, finding distinct values, grouping documents, the aggregation framework, and using MapReduce. Finally, this section finishes with a chapter on designing your application:


The sharding section starts in Chapter 13 with a quick local setup. Chapter 14 then gives an overview of the components of the cluster and how to set them up. Chapter 15 has advice on choosing a shard key for a variety of applications. Finally, Chapter 16 covers administering a sharded cluster.

Application Administration

The next three chapters cover many aspects of MongoDB administration from the perspective of your application. Chapter 17 discusses how to introspect what MongoDB is doing. Chapter 18 covers administrative tasks such as building indexes, and moving and compacting data. Chapter 19 explains how MongoDB stores data durably.

Server Administration

The final section is focused on server administration. Chapter 20 covers common options when starting and stopping MongoDB. Chapter 21 discusses what to look for and how to read stats when monitoring. Chapter 22 describes how to take and restore backups for each type of deployment. Finally, Chapter 23 discusses a number of system settings to keep in mind when deploying MongoDB.

Appendixes

OS X, and Linux. Appendix B details how MongoDB works internally: its storage engine, data format, and wire protocol.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic

Indicates new terms, URLs, email addresses, collection names, database names, filenames, and file extensions.


Constant width

Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, command-line utilities, environment variables, statements, and keywords.

Constant width bold

Shows commands or other text that should be typed literally by the user.

Constant width italic

Shows text that should be replaced with user-supplied values or by values determined by context.

This icon signifies a tip, suggestion, or general note.

This icon indicates a warning or caution.

Using Code Examples

This book can help you get your job done. In general, you may use the code in this book in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “MongoDB: The Definitive Guide, Second Edition … 978-1-449-34468-9.”

If you feel your use of code examples falls outside fair use or the permission given here, feel free to contact us at permissions@oreilly.com.


Safari® Books Online

Safari Books Online (www.safaribooksonline.com) is an on-demand digital library that delivers expert content in both book and video form from the world’s leading authors in technology and business. Technology professionals, software developers, web designers, and business and creative professionals use Safari Books Online as their primary resource for research, problem solving, learning, and certification training.

Safari Books Online offers a range of product mixes and pricing programs for organizations, government agencies, and individuals. Subscribers have access to thousands of books, training videos, and prepublication manuscripts in one fully searchable database from publishers like O’Reilly Media, Prentice Hall Professional, Addison-Wesley Professional, Microsoft Press, Sams, Que, Peachpit Press, Focal Press, Cisco Press, John Wiley & Sons, Syngress, Morgan Kaufmann, IBM Redbooks, Packt, Adobe Press, FT Press, Apress, Manning, New Riders, McGraw-Hill, Jones & Bartlett, Course Technology, and dozens more. For more information about Safari Books Online, please visit us online.


I would like to thank my tech reviewers, Adam Comerford, Eric Milke, and Greg Studer. You guys made this book immeasurably better (and more correct). Thank you, Ann Spencer, for being such a terrific editor and for helping me every step of the way. Thanks to all of my coworkers at 10gen for sharing your knowledge and advice on MongoDB, as well as Eliot Horowitz and Dwight Merriman, for starting the MongoDB project. And thank you, Andrew, for all of your support and suggestions.


PART I Introduction to MongoDB


CHAPTER 1

Introduction

MongoDB is a powerful, flexible, and scalable general-purpose database. It combines the ability to scale out with features such as secondary indexes, range queries, sorting, aggregations, and geospatial indexes. This chapter covers the major design decisions that made MongoDB what it is.

Ease of Use

MongoDB is a document-oriented database, not a relational one. The primary reason for moving away from the relational model is to make scaling out easier, but there are some other advantages as well.

A document-oriented database replaces the concept of a “row” with a more flexible model, the “document.” By allowing embedded documents and arrays, the document-oriented approach makes it possible to represent complex hierarchical relationships with a single record. This fits naturally into the way developers in modern object-oriented languages think about their data.
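To picture what “a single record” means here, the sketch below writes one blog post as a plain JavaScript object, the same notation the MongoDB shell uses for documents. The field names (`author`, `tags`, `comments`), their values, and the `commentUsers` helper are hypothetical, chosen only for illustration; no database is involved.

```javascript
// A blog post modeled as one document: the author is an embedded
// document and the comments are an array of embedded documents.
// All field names and values are hypothetical.
const post = {
  title: "Hello, world!",
  author: { name: "joe", email: "joe@example.com" },
  tags: ["intro", "mongodb"],
  comments: [
    { user: "alice", text: "Nice post" },
    { user: "bob", text: "Thanks!" },
  ],
};

// Everything needed to render the post is reachable from this one
// record, so no second lookup is needed for the author or comments.
function commentUsers(doc) {
  return doc.comments.map((c) => c.user);
}

console.log(post.author.name);   // prints: joe
console.log(commentUsers(post)); // prints: [ 'alice', 'bob' ]
```

In a relational design the author and comments would typically live in separate tables and be joined back together; here the hierarchy is simply nested.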

There are also no predefined schemas: a document’s keys and values are not of fixed types or sizes. Without a fixed schema, adding or removing fields as needed becomes easier. Generally, this makes development faster as developers can quickly iterate. It is also easier to experiment. Developers can try dozens of models for the data and then choose the best one to pursue.

Easy Scaling

Data set sizes for applications are growing at an incredible pace. Increases in available bandwidth and cheap storage have created an environment where even small-scale applications need to store more data than many databases were meant to handle. A terabyte of data, once an unheard-of amount of information, is now commonplace.


As the amount of data that developers need to store grows, developers face a difficult decision: how should they scale their databases? Scaling a database comes down to the choice between scaling up (getting a bigger machine) or scaling out (partitioning data across more machines). Scaling up is often the path of least resistance, but it has drawbacks: large machines are often very expensive, and eventually a physical limit is reached where a more powerful machine cannot be purchased at any cost. The alternative is to scale out: to add storage space or increase performance, buy another commodity server and add it to your cluster. This is both cheaper and more scalable; however, it is more difficult to administer a thousand machines than it is to care for one.

MongoDB was designed to scale out. Its document-oriented data model makes it easier for it to split up data across multiple servers. MongoDB automatically takes care of balancing data and load across a cluster, redistributing documents automatically and routing user requests to the correct machines. This allows developers to focus on programming the application, not scaling it. When a cluster needs more capacity, new machines can be added and MongoDB will figure out how the existing data should be spread to them.
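The routing and rebalancing described above can be illustrated, very loosely, with range-based partitioning. The sketch below assumes a single numeric key divided into ranges, each owned by one server. The chunk table, the server names, and the `route` and `splitChunk` helpers are all hypothetical and do not reflect MongoDB’s actual router or balancer code (sharding proper is covered in Part IV).

```javascript
// Hypothetical chunk table: each entry owns keys in [min, max).
const chunks = [
  { min: -Infinity, max: 100, server: "server-a" },
  { min: 100, max: 500, server: "server-b" },
  { min: 500, max: Infinity, server: "server-c" },
];

// Return the server responsible for a given key value.
function route(key, table) {
  for (const chunk of table) {
    if (key >= chunk.min && key < chunk.max) {
      return chunk.server;
    }
  }
  throw new Error(`no chunk owns key ${key}`);
}

// Adding capacity: split one chunk at `at` and hand the upper half to a
// new server. Keys below `at` stay on the original server.
function splitChunk(table, index, at, newServer) {
  const chunk = table[index];
  if (!(at > chunk.min && at < chunk.max)) {
    throw new Error("split point must fall strictly inside the chunk");
  }
  return [
    ...table.slice(0, index),
    { min: chunk.min, max: at, server: chunk.server },
    { min: at, max: chunk.max, server: newServer },
    ...table.slice(index + 1),
  ];
}

console.log(route(42, chunks)); // prints: server-a
const grown = splitChunk(chunks, 2, 1000, "server-d");
console.log(route(750, grown));  // prints: server-c
console.log(route(5000, grown)); // prints: server-d
```

The point of the sketch is that the application asks only for a key; which machine holds it is decided by a lookup that can change as servers are added.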

Tons of Features…

Aggregation

MongoDB supports an “aggregation pipeline” that allows you to build complex aggregations from simple pieces and allows the database to optimize it.

Special collection types

MongoDB supports time-to-live collections for data that should expire at a certain time, such as sessions. It also supports fixed-size collections, which are useful for holding recent data, such as logs.

File storage

MongoDB supports an easy-to-use protocol for storing large files and file metadata.

Some features common to relational databases are not present in MongoDB, notably joins and complex multirow transactions. Omitting these was an architectural decision to allow for greater scalability, as both of those features are difficult to provide efficiently in a distributed system.
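A rough mental model for the fixed-size collections mentioned above is a buffer that keeps only the most recent entries and discards the oldest when full. This in-memory sketch assumes a cap on the number of documents for simplicity. `CappedBuffer` is a hypothetical name, and MongoDB’s real capped collections (Chapter 6) are sized and stored differently.

```javascript
// Hypothetical in-memory model of a fixed-size collection: once `max`
// entries are stored, inserting a new one evicts the oldest.
class CappedBuffer {
  constructor(max) {
    if (!Number.isInteger(max) || max <= 0) {
      throw new Error("max must be a positive integer");
    }
    this.max = max;
    this.items = [];
  }

  insert(doc) {
    this.items.push(doc);
    if (this.items.length > this.max) {
      this.items.shift(); // drop the oldest entry (insertion order)
    }
  }

  // Entries in insertion order, oldest first.
  toArray() {
    return [...this.items];
  }
}

const log = new CappedBuffer(3);
for (let i = 1; i <= 5; i++) {
  log.insert({ seq: i, msg: `event ${i}` });
}
console.log(log.toArray().map((d) => d.seq)); // prints: [ 3, 4, 5 ]
```

This matches the “recent data, such as logs” use case: the newest entries are always available and old ones age out without any explicit delete.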


…Without Sacrificing Speed

Incredible performance is a major goal for MongoDB and has shaped much of its design. MongoDB adds dynamic padding to documents and preallocates data files to trade extra space usage for consistent performance. It uses as much RAM as it can for its cache and attempts to automatically choose the correct indexes for queries. In short, almost every aspect of MongoDB was designed to maintain high performance.

Although MongoDB is powerful and attempts to keep many features from relational systems, it is not intended to do everything that a relational database does. Whenever possible, the database server offloads processing and logic to the client side (handled either by the drivers or by a user’s application code). Maintaining this streamlined design is one of the reasons MongoDB can achieve such high performance.

Let’s Get Started

Throughout the course of the book, we will take the time to note the reasoning or motivation behind particular decisions made in the development of MongoDB. Through those notes we hope to share the philosophy behind MongoDB. The best way to summarize the MongoDB project, however, is through its main focus: to create a full-featured data store that is scalable, flexible, and fast.


CHAPTER 2

Getting Started

MongoDB is powerful but easy to get started with. In this chapter we’ll introduce some of the basic concepts of MongoDB:

• A document is the basic unit of data for MongoDB and is roughly equivalent to a row in a relational database management system (but much more expressive).

• Similarly, a collection can be thought of as a table with a dynamic schema.

• A single instance of MongoDB can host multiple independent databases, each of which can have its own collections.

• Every document has a special key, "_id", that is unique within a collection.

• MongoDB comes with a simple but powerful JavaScript shell, which is useful for the administration of MongoDB instances and data manipulation.

Documents

At the heart of MongoDB is the document: an ordered set of keys with associated values.

The representation of a document varies by programming language, but most languages have a data structure that is a natural fit, such as a map, hash, or dictionary. In JavaScript, for example, documents are represented as objects:

{"greeting" : "Hello, world!"}

This simple document contains a single key, "greeting", with a value of "Hello, world!". Most documents will be more complex than this simple one and often will contain multiple key/value pairs:

{"greeting" : "Hello, world!", "foo" : 3}

As you can see from the example above, values in documents are not just “blobs.” They can be one of several different data types (or even an entire embedded document; see “Embedded Documents” on page 19). In this example the value for "greeting" is a string, whereas the value for "foo" is an integer.

The keys in a document are strings. Any UTF-8 character is allowed in a key, with a few notable exceptions:

• Keys must not contain the character \0 (the null character). This character is used to signify the end of a key.

• The . and $ characters have some special properties and should be used only in certain circumstances, as described in later chapters. In general, they should be considered reserved, and drivers will complain if they are used inappropriately.

MongoDB is type-sensitive and case-sensitive. For example, {"foo" : 3} and {"foo" : "3"} are distinct documents, as are {"foo" : 3} and {"Foo" : 3}.

Documents in MongoDB also cannot contain duplicate keys. For example, the following is not a legal document:

{"greeting" : "Hello, world!", "greeting" : "Hello, MongoDB!"}
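The key restrictions above can be expressed as a small checker. `keyProblems` is a hypothetical helper, not part of MongoDB or any driver. It encodes only the rules stated in this section (no null character, with . and $ treated as reserved anywhere in a key), whereas real drivers differ in exactly where they accept those two characters.

```javascript
// Hypothetical helper: list the problems with a document key according
// to the rules in this section; an empty list means the key is fine.
function keyProblems(key) {
  const problems = [];
  if (typeof key !== "string") {
    problems.push("keys must be strings");
    return problems;
  }
  if (key.includes("\0")) {
    problems.push("contains the null character \\0");
  }
  if (key.includes("$")) {
    problems.push("contains the reserved character $");
  }
  if (key.includes(".")) {
    problems.push("contains the reserved character .");
  }
  return problems;
}

console.log(keyProblems("greeting"));      // prints: []
console.log(keyProblems("$set"));          // prints: [ 'contains the reserved character $' ]
console.log(keyProblems("a.b\0c").length); // prints: 2
```

Treating both characters as reserved everywhere is stricter than necessary in some contexts, but it is the conservative reading of “should be considered reserved.”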

Key/value pairs in documents are ordered: {"x" : 1, "y" : 2} is not the same as {"y" : 2, "x" : 1}. Field order does not usually matter and you should not design your schema to depend on a certain ordering of fields (MongoDB may reorder them). This text will note the special cases where field order is important.

In some programming languages the default representation of a document does not even maintain ordering (e.g., dictionaries in Python before version 3.7 and hashes in Perl or Ruby 1.8). Drivers for those languages usually have some mechanism for specifying documents with ordering, when necessary.
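Because field order is part of a document, whether two documents count as “the same” depends on whether order is taken into account. The sketch below uses JavaScript objects, which keep string keys (other than integer-like ones) in insertion order. `sameOrdered` and `sameUnordered` are hypothetical, shallow helpers for illustration, not how MongoDB compares documents. Their use of strict equality also shows the type and case sensitivity described earlier.

```javascript
// Compare two flat documents, requiring the same keys in the same order
// and strictly equal values (so 3 and "3" differ).
function sameOrdered(a, b) {
  const ea = Object.entries(a);
  const eb = Object.entries(b);
  return (
    ea.length === eb.length &&
    ea.every(([k, v], i) => k === eb[i][0] && v === eb[i][1])
  );
}

// Same comparison, but ignoring field order.
function sameUnordered(a, b) {
  const ka = Object.keys(a);
  return (
    ka.length === Object.keys(b).length &&
    ka.every((k) => Object.prototype.hasOwnProperty.call(b, k) && a[k] === b[k])
  );
}

console.log(sameOrdered({ x: 1, y: 2 }, { y: 2, x: 1 }));   // prints: false
console.log(sameUnordered({ x: 1, y: 2 }, { y: 2, x: 1 })); // prints: true
console.log(sameUnordered({ foo: 3 }, { foo: "3" }));       // prints: false (type-sensitive)
console.log(sameUnordered({ foo: 3 }, { Foo: 3 }));         // prints: false (case-sensitive)
```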

Collections

A collection is a group of documents. If a document is the MongoDB analog of a row in a relational database, then a collection can be thought of as the analog to a table.

Dynamic Schemas

Collections have dynamic schemas. This means that the documents within a single collection can have any number of different “shapes.” For example, both of the following documents could be stored in a single collection:


{"greeting" : "Hello, world!"}

{"foo" : 5}

Note that the previous documents not only have different types for their values (string versus integer) but also have entirely different keys. Because any document can be put into any collection, the question often arises: “Why do we need separate collections at all?” It’s a good question: with no need for separate schemas for different kinds of documents, why should we use more than one collection? There are several good reasons:

• Keeping different kinds of documents in the same collection can be a nightmare for developers and admins. Developers need to make sure that each query is only returning documents of a certain type or that the application code performing a query can handle documents of different shapes. If we’re querying for blog posts, it’s a hassle to weed out documents containing author data.

• It is much faster to get a list of collections than to extract a list of the types in a collection. For example, if we had a "type" field in each document that specified whether the document was a “skim,” “whole,” or “chunky monkey,” it would be much slower to find those three values in a single collection than to have three separate collections and query the correct collection.

• Grouping documents of the same kind together in the same collection allows for data locality. Getting several blog posts from a collection containing only posts will likely require fewer disk seeks than getting the same posts from a collection containing posts and author data.

• We begin to impose some structure on our documents when we create indexes. (This is especially true in the case of unique indexes.) These indexes are defined per collection. By putting only documents of a single type into the same collection, we can index our collections more efficiently.

As you can see, there are sound reasons for creating a schema and for grouping related types of documents together, even though MongoDB does not enforce it.
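The first reason in the list above can be seen in a few lines: once two kinds of documents share a collection, every read has to filter by kind. In this in-memory sketch, plain arrays stand in for collections and a hypothetical `type` field acts as the discriminator. It illustrates the application-side burden only and is not MongoDB query code.

```javascript
// One "collection" holding two kinds of documents.
const mixed = [
  { type: "post", title: "Hello", author: "joe" },
  { type: "author", name: "joe", email: "joe@example.com" },
  { type: "post", title: "Second post", author: "joe" },
];

// Every read against the mixed collection must remember to filter...
const postsFromMixed = mixed.filter((d) => d.type === "post");

// ...whereas separate collections need no filter at all.
const posts = [
  { title: "Hello", author: "joe" },
  { title: "Second post", author: "joe" },
];
const authors = [{ name: "joe", email: "joe@example.com" }];

console.log(postsFromMixed.map((p) => p.title)); // prints: [ 'Hello', 'Second post' ]
console.log(posts.length, authors.length);       // prints: 2 1
```

Forgetting the filter in even one code path would hand author documents to code that expects posts, which is exactly the hazard the first bullet describes.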

Naming

A collection is identified by its name. Collection names can be any UTF-8 string, with a few restrictions:

• The empty string ("") is not a valid collection name.

• Collection names may not contain the character \0 (the null character) because this delineates the end of a collection name.

• You should not create any collections that start with system., a prefix reserved for internal collections. For example, the system.users collection contains the database’s users, and the system.namespaces collection contains information about all of the database’s collections.

• User-created collections should not contain the reserved character $ in the name. The various drivers available for the database do support using $ in collection names because some system-generated collections contain it. You should not use $ in a name unless you are accessing one of these collections.
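The collection-name rules above can be gathered into one checker. `collectionNameProblems` is a hypothetical helper that sketches the listed restrictions for user-created collections. It is not the validation MongoDB or its drivers actually perform.

```javascript
// Hypothetical checker for user-created collection names, following the
// restrictions listed above. Returns an empty list for an acceptable name.
function collectionNameProblems(name) {
  const problems = [];
  if (name === "") {
    problems.push("name is empty");
  }
  if (name.includes("\0")) {
    problems.push("contains the null character \\0");
  }
  if (name.startsWith("system.")) {
    problems.push('starts with the reserved prefix "system."');
  }
  if (name.includes("$")) {
    problems.push("contains the reserved character $");
  }
  return problems;
}

console.log(collectionNameProblems("blog.posts"));          // prints: []
console.log(collectionNameProblems(""));                    // prints: [ 'name is empty' ]
console.log(collectionNameProblems("system.users").length); // prints: 1
```

Note that a dotted name such as blog.posts passes: the dot is allowed in collection names and is what subcollections rely on.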

doesn’t even have to exist) and its “children.”

Although subcollections do not have any special properties, they are useful and incorporated into many MongoDB tools:

• GridFS, a protocol for storing large files, uses subcollections to store file metadata separately from content chunks (see Chapter 6 for more information about GridFS).

• Most drivers provide some syntactic sugar for accessing a subcollection of a given collection. For example, in the database shell, db.blog will give you the blog collection, and db.blog.posts will give you the blog.posts collection.

Subcollections are a great way to organize data in MongoDB, and their use is highly recommended.
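The shell's db.blog.posts sugar can be pictured as property access that keeps appending "." plus a name. The following sketch uses a JavaScript Proxy to illustrate that idea. collectionHandle, database, and the fullName property are hypothetical names, and this is not how the shell or any driver is actually implemented.

```javascript
// Hypothetical sketch: every property access on a handle names a subcollection.
function collectionHandle(fullName) {
  return new Proxy({ fullName }, {
    get(target, prop) {
      if (typeof prop === "symbol") {
        return undefined; // ignore symbols such as Symbol.toPrimitive
      }
      if (prop === "fullName") {
        return target.fullName;
      }
      // "blog" + "." + "posts" -> "blog.posts"
      return collectionHandle(`${target.fullName}.${prop}`);
    },
  });
}

// Hypothetical stand-in for the shell's db object: any property is a top-level collection.
function database() {
  return new Proxy({}, {
    get(_target, prop) {
      if (typeof prop === "symbol") return undefined;
      return collectionHandle(prop);
    },
  });
}

const db = database();
console.log(db.blog.fullName);       // blog
console.log(db.blog.posts.fullName); // blog.posts
```

Note that nothing about blog has to exist for db.blog.posts to resolve, which mirrors the point that a parent collection and its "children" are unrelated.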

Databases

In addition to grouping documents by collection, MongoDB groups collections into databases. A single instance of MongoDB can host several databases, each grouping together zero or more collections. A database has its own permissions, and each database is stored in separate files on disk. A good rule of thumb is to store all data for a single application in the same database. Separate databases are useful when storing data for several applications or users on the same MongoDB server.

Like collections, databases are identified by name. Database names can be any UTF-8 string, with the following restrictions:

• The empty string ("") is not a valid database name.

• A database name cannot contain any of these characters: /, \, ., ", *, <, >, :, |, ?, $, (a single space), or \0 (the null character). Basically, stick with alphanumeric ASCII.

• Database names are case-sensitive, even on non-case-sensitive filesystems. To keep things simple, try to just use lowercase characters.

• Database names are limited to a maximum of 64 bytes.
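As with collections, the database-name rules can be expressed as a small check. This sketch (isValidDatabaseName is a hypothetical helper, not a MongoDB API) measures the 64-byte limit on the UTF-8 encoding rather than on the string's length, because non-ASCII characters take more than one byte.

```javascript
// Characters the rules above forbid (" " is the single space, "\0" the null character).
const FORBIDDEN_DB_NAME_CHARS = ["/", "\\", ".", '"', "*", "<", ">", ":", "|", "?", "$", " ", "\0"];
const MAX_DB_NAME_BYTES = 64;

// Hypothetical helper: returns true if `name` passes the database-naming rules above.
function isValidDatabaseName(name) {
  if (typeof name !== "string" || name === "") {
    return false; // the empty string is not a valid database name
  }
  if (FORBIDDEN_DB_NAME_CHARS.some((ch) => name.includes(ch))) {
    return false;
  }
  // The limit is in bytes, so measure the UTF-8 encoding, not name.length.
  return new TextEncoder().encode(name).length <= MAX_DB_NAME_BYTES;
}

console.log(isValidDatabaseName("cms"));               // true
console.log(isValidDatabaseName("my.app"));            // false
console.log(isValidDatabaseName("a".repeat(64)));      // true
console.log(isValidDatabaseName("\u00e9".repeat(33))); // false: 33 characters, 66 bytes
```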

One thing to remember about database names is that they will actually end up as files on your filesystem. This explains why many of the previous restrictions exist in the first place.

There are also several reserved database names, which you can access but which have special semantics. These are as follows:

admin

This is the “root” database, in terms of authentication. If a user is added to the admin database, the user automatically inherits permissions for all databases. There are also certain server-wide commands that can be run only from the admin database, such as listing all of the databases or shutting down the server.

local

This database will never be replicated and can be used to store any collections that should be local to a single server (see Chapter 9 for more information about replication and the local database).

config

When MongoDB is being used in a sharded setup (see Chapter 13), it uses the config database to store information about the shards.

By concatenating a database name with a collection in that database, you can get a fully qualified collection name, called a namespace. For instance, if you are using the blog.posts collection in the cms database, the namespace of that collection would be cms.blog.posts. Namespaces are limited to 121 bytes in length and, in practice, should be fewer than 100 bytes long. For more on namespaces and the internal representation of collections in MongoDB, see Appendix B.
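Building a namespace is plain string concatenation with a "." separator. The sketch below assumes the 121-byte limit quoted above (which applies to the MongoDB versions this text describes) and counts UTF-8 bytes. namespaceFor is a hypothetical helper.

```javascript
const MAX_NAMESPACE_BYTES = 121; // limit quoted in the text above

// Hypothetical helper: join a database name and a collection name into a namespace.
function namespaceFor(dbName, collectionName) {
  const ns = `${dbName}.${collectionName}`;
  const bytes = new TextEncoder().encode(ns).length;
  if (bytes > MAX_NAMESPACE_BYTES) {
    throw new RangeError(`namespace "${ns}" is ${bytes} bytes; the limit is ${MAX_NAMESPACE_BYTES}`);
  }
  return ns;
}

console.log(namespaceFor("cms", "blog.posts")); // cms.blog.posts
```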

Getting and Starting MongoDB

MongoDB is almost always run as a network server that clients can connect to and perform operations on. Download MongoDB and decompress it. To start the server, run the mongod executable:

$ mongod

mongod --help for help and startup options
Thu Oct 11 12:36:48 [initandlisten] MongoDB starting : pid=2425 port=27017 dbpath=/data/db/ 64-bit host=spock
Thu Oct 11 12:36:48 [initandlisten] db version v2.4.0, pdfile version 4.5
Thu Oct 11 12:36:48 [initandlisten] git version: 3aaea5262d761e0bb6bfef5351cfbfca7af06ec2
Thu Oct 11 12:36:48 [initandlisten] build info: Darwin spock 11.2.0 Darwin Kernel Version 11.2.0: Tue Aug 9 20:54:00 PDT 2011; root:xnu-1699.24.8~1/RELEASE_X86_64 x86_64 BOOST_LIB_VERSION=1_48
Thu Oct 11 12:36:48 [initandlisten] options: {}
Thu Oct 11 12:36:48 [initandlisten] journal dir=/data/db/journal
Thu Oct 11 12:36:48 [initandlisten] recover : no journal files present, no recovery needed
Thu Oct 11 12:36:48 [websvr] admin web console waiting for connections on port 28017
Thu Oct 11 12:36:48 [initandlisten] waiting for connections on port 27017

Or if you’re on Windows, run this:

$ mongod.exe

For detailed information on installing MongoDB on your system, see Appendix A.

When run with no arguments, mongod will use the default data directory, /data/db/ (or \data\db\ on the current volume on Windows). If the data directory does not already exist or is not writable, the server will fail to start. It is important to create the data directory (e.g., mkdir -p /data/db/) and to make sure your user has permission to write to the directory before starting MongoDB.
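If you prefer to prepare the data directory from a script rather than by hand, the following Node.js sketch does the equivalent of mkdir -p and then checks that the current user can write to the directory. ensureDataDirectory is a hypothetical helper that uses only Node's built-in fs module.

```javascript
const fs = require("fs");

// Hypothetical helper: the equivalent of `mkdir -p dir` followed by a write-permission check.
// Returns true if `dir` exists (or was created) and the current user can write to it.
function ensureDataDirectory(dir) {
  try {
    fs.mkdirSync(dir, { recursive: true }); // creates missing parents; no error if dir exists
    fs.accessSync(dir, fs.constants.W_OK);  // throws if the directory is not writable
    return true;
  } catch (err) {
    return false; // e.g., EACCES when creating /data/db without sufficient privileges
  }
}
```

For example, ensureDataDirectory("/data/db") returns false if you lack permission to create /data/db. In that case, create the directory with elevated privileges and give your user ownership of it, or point mongod at a directory you own with the --dbpath option.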

On startup, the server will print some version and system information and then begin waiting for connections. By default, MongoDB listens for socket connections on port 27017. The server will fail to start if the port is not available; the most common cause of this is another instance of MongoDB that is already running.

mongod also sets up a very basic HTTP server that listens on a port 1,000 higher than the main port, in this case 28017. This means that you can get some administrative information about your database by opening a web browser and going to http://localhost:28017.

You can safely stop mongod by typing Ctrl-C in the shell that is running the server.

For more information on starting or stopping MongoDB, see Chapter 20.

Introduction to the MongoDB Shell

MongoDB comes with a JavaScript shell that allows interaction with a MongoDB instance from the command line. The shell is useful for performing administrative functions, inspecting a running instance, or just playing around. The mongo shell is a crucial tool for using MongoDB and is used extensively throughout the rest of the text.

Running the Shell

To start the shell, run the mongo executable:

$ mongo

We can also leverage all of the standard JavaScript libraries:

> Math.sin(Math.PI / 2);
1
> new Date("2010/1/1");
"Fri Jan 01 2010 00:00:00 GMT-0500 (EST)"
> "Hello, World!".replace("World", "MongoDB");
Hello, MongoDB!

a row will cancel the half-formed command and get you back to the >-prompt

When it starts, the shell connects to a database (test, by default) and assigns that connection to the global variable db. This variable is the primary access point to your MongoDB server through the shell.

To see the database db is currently assigned to, type in db and hit Enter:

> db

test

The shell contains some add-ons that are not valid JavaScript syntax but were implemented because of their familiarity to users of SQL shells. The add-ons do not provide any extra functionality, but they are nice syntactic sugar. For instance, one of the most important operations is selecting which database to use:

> use foobar
switched to db foobar

Collections can be accessed from the db variable. For example, db.baz returns the baz collection in the current database. Now that we can access a collection in the shell, we can perform almost any database operation.

Basic Operations with the Shell

We can use the four basic operations, create, read, update, and delete (CRUD), to manipulate and view data in the shell.

Create

The insert function adds a document to a collection. For example, suppose we want to store a blog post. First, we’ll create a local variable called post that is a JavaScript object representing our document. It will have the keys "title", "content", and "date" (the date that it was published):

> post = {"title" : "My Blog Post",
          "content" : "Here's my blog post.",
          "date" : new Date()}
{
    "title" : "My Blog Post",
    "content" : "Here's my blog post.",
    "date" : ISODate("2012-08-24T21:12:09.982Z")
}

This object is a valid MongoDB document, so we can save it to the blog collection using the insert method:

> db.blog.insert(post)

The blog post has been saved to the database. We can see it by calling find on the collection:

> db.blog.find()
{
    "_id" : ObjectId("5037ee4a1084eb3ffeef7228"),
    "title" : "My Blog Post",
    "content" : "Here's my blog post.",
    "date" : ISODate("2012-08-24T21:12:09.982Z")
}

You can see that an "_id" key was added and that the other key/value pairs were saved as we entered them. The reason for the sudden appearance of the "_id" field is explained at the end of this chapter.

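Read

find and findOne can be used to query a collection. If we just want to see one document from a collection, we can use findOne:

> db.blog.findOne()
{
    "_id" : ObjectId("5037ee4a1084eb3ffeef7228"),
    "title" : "My Blog Post",
    "content" : "Here's my blog post.",
    "date" : ISODate("2012-08-24T21:12:09.982Z")
}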

find and findOne can also be passed criteria in the form of a query document. This will restrict the documents matched by the query. The shell will automatically display up to 20 documents matching a find, but more can be fetched. See Chapter 4 for more information on querying.
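Conceptually, a simple query document lists key/value pairs that a document must contain in order to match. The in-memory sketch below handles only exact equality on top-level keys (real MongoDB queries support much more, as Chapter 4 explains) and returns at most 20 matches to mirror how many documents the shell prints at once. matches and findSketch are hypothetical helpers, and the posts data is made up for illustration.

```javascript
// Hypothetical helper: exact equality on top-level keys only.
function matches(doc, query) {
  return Object.keys(query).every((key) => doc[key] === query[key]);
}

// Hypothetical helper: the first `batchSize` matches, like the first screenful the shell shows.
function findSketch(docs, query = {}, batchSize = 20) {
  return docs.filter((doc) => matches(doc, query)).slice(0, batchSize);
}

// 25 made-up posts; every even-numbered one is published.
const posts = [];
for (let i = 1; i <= 25; i++) {
  posts.push({ title: `Post ${i}`, published: i % 2 === 0 });
}

console.log(findSketch(posts).length);                            // 20 (all 25 match; 20 shown)
console.log(findSketch(posts, { published: true }).length);       // 12
console.log(findSketch(posts, { title: "Post 7" })[0].published); // false
```

An empty query document matches everything, which is why find() with no arguments returns every document in the collection.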

Update

If we would like to modify our post, we can use update. update takes (at least) two parameters: the first is the criteria to find which document to update, and the second is the new document. Suppose we decide to enable comments on the blog post we created earlier. We’ll need to add an array of comments as the value for a new key in our document.

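The first step is to modify the variable post and add a "comments" key:

> post.comments = []
[ ]

Then we perform the update, replacing the post titled "My Blog Post" with our new version of the document:

> db.blog.update({title : "My Blog Post"}, post)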

Now the document has a "comments" key. If we call find again, we can see the new key:

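> db.blog.find()
{
    "_id" : ObjectId("5037ee4a1084eb3ffeef7228"),
    "title" : "My Blog Post",
    "content" : "Here's my blog post.",
    "date" : ISODate("2012-08-24T21:12:09.982Z"),
    "comments" : [ ]
}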
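The update in this example replaces the whole stored document with the new one while keeping its "_id". Here is a minimal in-memory sketch of that replacement behavior, assuming exact-match criteria and a plain number standing in for an ObjectId. replaceFirstMatch is a hypothetical helper, not a driver method.

```javascript
// Hypothetical helper: replace the first document matching `criteria`
// (exact equality on top-level keys). The stored _id is kept; every other
// field comes from `replacement`.
function replaceFirstMatch(docs, criteria, replacement) {
  const index = docs.findIndex((doc) =>
    Object.keys(criteria).every((key) => doc[key] === criteria[key])
  );
  if (index === -1) {
    return 0; // nothing matched, so nothing was replaced
  }
  const { _id: ignoredId, ...fields } = replacement; // never overwrite the stored _id
  docs[index] = { _id: docs[index]._id, ...fields };
  return 1; // number of documents replaced
}

// A made-up one-document collection; 1 stands in for an ObjectId.
const blog = [{ _id: 1, title: "My Blog Post", content: "Here's my blog post." }];
const post = { title: "My Blog Post", content: "Here's my blog post.", comments: [] };

console.log(replaceFirstMatch(blog, { title: "My Blog Post" }, post)); // 1
console.log(blog[0]._id, blog[0].comments);                           // 1 []
```

Because the whole document is replaced, any field missing from the new version would disappear from the stored one; that is why the example first adds "comments" to the complete post variable rather than sending only the new key.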

Delete

remove permanently deletes documents from the database. Called with no parameters, it removes all documents from a collection. It can also take a document specifying criteria for removal. For example, this would remove the post we just created:

> db.blog.remove({title : "My Blog Post"})

Now the collection will be empty again.

Data Types

The beginning of this chapter covered the basics of what a document is. Now that you are up and running with MongoDB and can try things in the shell, this section will dive a little deeper. MongoDB supports a wide range of data types as values in documents. In this section, we’ll outline all the supported types.

Basic Data Types

Documents in MongoDB can be thought of as “JSON-like” in that they are conceptually similar to objects in JavaScript. JSON is a simple representation of data: the specification can be described in about one paragraph (their website proves it) and lists only six data types. This is a good thing in many ways: it’s easy to understand, parse, and remember.

On the other hand, JSON’s expressive capabilities are limited because the only types are null, boolean, numeric, string, array, and object.

Although these types allow for an impressive amount of expressivity, there are a couple of additional types that are crucial for most applications, especially when working with a database. For example, JSON has no date type, which makes working with dates even more annoying than it usually is. There is a number type, but only one: there is no way to differentiate floats and integers, never mind any distinction between 32-bit and 64-bit numbers. There is no way to represent other commonly used types, either, such as regular expressions or functions.

MongoDB adds support for a number of additional data types while keeping JSON’s essential key/value pair nature. Exactly how values of each type are represented varies by language, but this is a list of the commonly supported types and how they are represented as part of a document in the shell. The most common types are:

The shell defaults to using 64-bit floating point numbers. Thus, these numbers look “normal” in the shell:

{"x" : 3.14}

or:

{"x" : 3}
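Because the shell is a JavaScript interpreter, its numbers are JavaScript numbers, which are 64-bit IEEE 754 doubles. The following lines, runnable in any JavaScript engine, show two consequences discussed above: integers and floats are indistinguishable, and integers larger than 2^53 cannot be represented exactly.

```javascript
// Every JavaScript number is a 64-bit IEEE 754 double; there is no separate integer type.
console.log(3 === 3.0);             // true
console.log(typeof 3, typeof 3.14); // number number

// Doubles represent every integer exactly only up to 2^53 - 1.
console.log(Number.MAX_SAFE_INTEGER); // 9007199254740991
console.log(2 ** 53 + 1 === 2 ** 53); // true: 2^53 + 1 rounds to 2^53

// Fractions are binary approximations.
console.log(0.1 + 0.2); // 0.30000000000000004
```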


For a full explanation of JavaScript’s Date class and acceptable formats for the constructor, see ECMAScript specification section 15.9.

Dates in the shell are displayed using local time zone settings. However, dates in the database are just stored as milliseconds since the epoch, so they have no time zone information associated with them. (Time zone information could, of course, be stored as the value for another key.)
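The epoch-milliseconds representation is easy to see with a plain JavaScript Date. This sketch reuses the timestamp from the blog post earlier in the chapter. The timeZone key is a hypothetical name showing how time zone information could be kept alongside the date, as the parenthetical above suggests.

```javascript
// Date.UTC takes a 0-based month, so 7 means August.
const published = new Date(Date.UTC(2012, 7, 24, 21, 12, 9, 982));

// The value itself is just milliseconds since 1970-01-01T00:00:00Z.
console.log(published.getTime());     // 1345842729982
console.log(published.toISOString()); // 2012-08-24T21:12:09.982Z

// Rebuilding a Date from that number gives the same instant back.
const restored = new Date(published.getTime());
console.log(restored.getTime() === published.getTime()); // true

// To remember a time zone, store it explicitly under another key (hypothetical key name).
const post = { date: published, timeZone: "America/New_York" };
```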

Arrays

Arrays are values that can be interchangeably used for both ordered operations (as though they were lists, stacks, or queues) and unordered operations (as though they were sets).

In the following document, the key "things" has an array value:

{"things" : ["pie", 3.14]}
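In the shell these are ordinary JavaScript arrays, so both kinds of use can be tried locally. The sketch below is client-side JavaScript only. MongoDB provides its own update operators for doing similar things to arrays stored in documents (for example, $push, $pop, and $addToSet), and addToSet here is a hypothetical helper named after the last of those, not the operator itself.

```javascript
const things = ["pie", 3.14];

// Ordered uses: the array as a stack or a queue.
things.push("cake");          // append to the end: ["pie", 3.14, "cake"]
const last = things.pop();    // stack: remove from the end -> "cake"
const first = things.shift(); // queue: remove from the front -> "pie"

// Unordered use: add a value only if it is not already present.
function addToSet(arr, value) {
  if (!arr.includes(value)) {
    arr.push(value);
  }
  return arr;
}

addToSet(things, 3.14);  // already present, so unchanged
addToSet(things, "pie"); // not present, so appended
console.log(things);     // [ 3.14, 'pie' ]
```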
