in a format suitable for analysis.
In an Excel® file, which might still need to be rearranged to get it into the form of a rectangular data set.
In a text file (or ASCII file), which is any file that can be opened and read in a text editor such as Notepad; it can be imported into Excel using Excel’s text import wizard.
In a relational database (such as Access, SQL Server, or Oracle), which can be imported into Excel by forming a query with the Microsoft Query package. A query specifies exactly which data you want to import.
On the Web, which can be imported into Excel by creating a Web query and then running it in Excel.
values.
Rearranging Excel Data
in the form of a data set—a rectangular array of data with
observations in rows, variables in columns, and variable names in the top row.
Sometimes simple cutting and pasting works
In other cases, advanced Excel functions are required
In all cases, it is best to map out a plan and then decide how to implement it
Example 18.1:
Baseball Salaries Original.xlsx
Objective: To rearrange the data from the
baseball Web queries into a single data
set
Solution: Data on baseball salaries was
imported into Excel from a Web site, with
a separate Web query for each of the 30
teams
to the right, with only a few players listed
To rearrange all of the data into four long columns with the headings Player, Team, Salary, and Position, follow
these steps:
1. Insert a blank column before column B, and enter the label Team in cell B2.
2. Cut the Arizona Diamondbacks team name from cell A1 and paste it next to the first Arizona player in cell B3. Then copy it down for the other Arizona players.
3. Repeat step 2 for each of the other teams.
4. Delete unnecessary rows of labels for the other teams.
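The same rearrangement can be sketched outside Excel. The Python below is only an illustration, and it assumes the team blocks are stacked one under another: a row holding just the team name, then a label row, then the player rows. The function name and the sample rows are hypothetical, not the actual file contents.

```python
def flatten_team_blocks(rows):
    """Turn stacked per-team blocks into one long data set."""
    result = [["Player", "Team", "Salary", "Position"]]
    team = None
    for row in rows:
        if len(row) == 1:
            # A lone cell is a team name (steps 2 and 3: carry it down).
            team = row[0]
        elif row[0] == "Player":
            # Repeated label row for a team (step 4: drop it).
            continue
        else:
            player, salary, position = row
            result.append([player, team, salary, position])
    return result


# Made-up rows that mimic the assumed layout.
raw = [
    ["Arizona Diamondbacks"],
    ["Player", "Salary", "Position"],
    ["Player A", 1000000, "Pitcher"],
    ["Player B", 500000, "Catcher"],
    ["Atlanta Braves"],
    ["Player", "Salary", "Position"],
    ["Player C", 750000, "Outfielder"],
]
long_rows = flatten_team_blocks(raw)
```

Each player row ends up carrying its own team name, which is what the copy-down in step 2 achieves by hand.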
Example 18.2:
CPI.xlsx (slide 1 of 2)
Objective: To rearrange the monthly data into two long columns, one
with month-year and one with the CPI.
Solution: Monthly data on the Consumer Price Index (CPI) was
imported from the Web using a Web query. A few rows appear below.
The desired results after rearranging are shown to the
right.
Example 18.2:
CPI.xlsx (slide 2 of 2)
Create the range name Data (for all the CPI values, not the headings in row
1 or column A)
Add a new worksheet for the rearranged data, create the column headings
in row 1, enter 1 in cells A2 and B2, and enter 1913 in cell C2
To generate the recurring pattern of 1 to 12 in column B, enter the formula
=IF(B2<12,B2+1,1) in cell B3 and copy this down as far as necessary.
To generate the pattern in column A, enter the formula
=IF(B3=1,A2+1,A2) in cell A3 and copy it down.
To generate the years in column C, enter the formula =IF(B3=1,C2+1,C2)
in cell C3 and copy it down
To generate the month-year values in column D, enter the formula
=DATE(C2,B2,1) in cell D2 and copy it down.
To generate the CPI values in column E, enter the formula
=INDEX(Data,A2,B2) in cell E2 and copy it down.
Copy columns D and E and paste them over themselves as values. Then delete columns A-C.
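The logic behind these formulas is a row counter, a month counter that cycles from 1 to 12, and a lookup of the value at (row, month). A minimal Python sketch of the same idea follows; it assumes the Data range holds one row per year with 12 monthly values, starting in 1913. The function name and the placeholder numbers are hypothetical and are not real CPI values.

```python
from datetime import date


def wide_to_long(data, first_year):
    """Convert one-row-per-year monthly data into (month-year, value) pairs."""
    long_rows = []
    for row_index, year_values in enumerate(data):
        year = first_year + row_index              # plays the role of column C
        for month, value in enumerate(year_values, start=1):   # column B
            # date(...) mirrors =DATE(C2,B2,1); value mirrors =INDEX(Data,A2,B2)
            long_rows.append((date(year, month, 1), value))
    return long_rows


# Placeholder values only: two years of 12 numbers each.
data = [
    [float(m) for m in range(1, 13)],
    [float(m) for m in range(13, 25)],
]
long_rows = wide_to_long(data, first_year=1913)
```

The result is the two-column layout the example asks for: one entry per month, in date order.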
Importing Text Data
format that is readable only by that package.
file, usually with a txt extension.
With fixed width, each variable’s value starts and stops at fixed positions (columns) in the line
Each line of data has the same length, and the columns line up.
With delimited data, there is a delimiter character, usually a tab, comma,
semicolon, or space, that separates the values in a line
The lines are typically of different lengths and do not line up nicely.
import wizard.
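The difference between the two layouts is easy to see in code. The Python sketch below is illustrative only: the column break positions, the sample lines, and the function names are assumptions made up for the example, not taken from any of the files discussed here.

```python
import csv
import io


def parse_fixed_width(line, breaks):
    """Split a line at fixed column positions and strip the padding."""
    edges = [0] + list(breaks) + [len(line)]
    return [line[start:end].strip() for start, end in zip(edges, edges[1:])]


def parse_delimited(text, delimiter="\t"):
    """Split each line of text on a delimiter character."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# Fixed width: every value starts and stops at a known position.
fixed_fields = parse_fixed_width("AL  1895  63.5", breaks=[4, 10])

# Delimited: values are separated by a tab, so lines can differ in length.
delimited_rows = parse_delimited("Country\tYear\tSubscribers\nA\t2002\t10\n")
```

For fixed-width data the break positions must come from somewhere, typically a data dictionary; for delimited data only the delimiter character is needed.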
Example 18.3:
srn_tmp.txt (slide 1 of 2)
Objective: To import the fixed-width text file data into Excel by using Excel’s
text import wizard
Solution: The text file srn_tmp.txt was downloaded from the Web. It contains
state, regional, and national (srn) annual data (1895-2005) on temperature. A small portion of the data is shown below.
The Web site from which the text file was downloaded
also has a data dictionary, srn_data.txt, that
indicates what the variables are and how they are
stored in columns Part of this data dictionary is shown
to the right.
Example 18.3:
srn_tmp.txt (slide 2 of 2)
Open the srn_tmp.txt file within Excel, and the first step of the text import wizard appears. Select Fixed width.
The second step of the wizard allows you to separate (or parse) the columns
as listed in the data dictionary. Click on the third, fourth, and fifth positions
on the ruler.
The last step of the wizard allows you to fine-tune the import, column by column, but bypass this step and simply click Finish
The data is imported into Excel, as shown below
Create column headings in row 1, using the data dictionary as a guide
Use “Save As” to save the txt file as an xlsx (or xls) file
Example 18.4:
Objective: To see how delimited text data can be imported into Excel
with the text import wizard.
Solution: Annual data by country on the number of mobile
subscribers during 2002-2009 was downloaded from the Web into the
This time there are column headings in row 1, but the ragged lines indicate that this file must be delimited, not fixed width
Example 18.4:
Delimited option in step 1 of the wizard.
In step 2, Excel guesses that the file is tab-delimited, which is correct,
so click on Finish to accept this
The Flags column contains no data, so it can be deleted
Column A can also be deleted because it includes a constant value, Mobile Subscribers
The numbers in column D can be reformatted to Numeric with 0 decimals
Comments About Importing Text Data
open it directly into Excel, without the import text wizard
Excel automatically parses the values between the commas into columns
option to save it in some type of format, text or otherwise.
You can try copying and pasting the data into Excel, but it is possible that everything will be pasted into a single column
If this happens, highlight the data in this column and click on the Text to Columns button on Excel’s Data ribbon
The purpose of this button is to parse delimited data in a single column into
several columns.
chance that the data will not line up properly—that is, the data will get into the wrong columns
Look closely at the parsed data before proceeding
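One way to "look closely" is to count the fields in each parsed row. The Python sketch below imitates Text to Columns by splitting each pasted line on a comma, then reports rows whose field count differs from the first row. The function names and sample lines are hypothetical, and a matching count does not prove that every value landed in the right column.

```python
def text_to_columns(lines, delimiter=","):
    """Split each single-column entry into several columns."""
    return [line.split(delimiter) for line in lines]


def find_misaligned_rows(rows):
    """Return the indexes of rows whose field count differs from row 0."""
    expected = len(rows[0])
    return [i for i, row in enumerate(rows) if len(row) != expected]


# Made-up pasted data: the third line is missing a value.
pasted = ["a,b,c", "1,2,3", "4,5", "6,7,8"]
columns = text_to_columns(pasted)
misaligned = find_misaligned_rows(columns)
```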
Importing Relational Database Data
others are extremely complex and powerful packages.
For database creation, querying, manipulation, and reporting, they have many advantages over spreadsheets
However, they are not nearly as powerful as spreadsheets for statistical analysis
Excel, where the statistical analysis can be performed.
Microsoft includes software called Microsoft Query in its Office suite that makes the importing relatively easy
Introduction to Relational Databases
tables.
They are also called single-table databases, where table is the database
term for a rectangular range of data, with rows corresponding to records
and columns corresponding to fields
Flat files are fine for relatively simple database applications, but they are not powerful enough for more complex applications
A relational database is a set of related tables, where each table is
a rectangular arrangement of fields and records, and the tables are linked explicitly.
The linked fields are called keys
A primary key must contain unique values, whereas a foreign key can contain duplicate values.
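The difference between the two kinds of keys can be sketched with Python's built-in sqlite3 module, used here only as a stand-in for a database package such as Access. The table layout is an assumption for illustration: the Name and Units fields and all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# CustomerID is the primary key of Customers and a foreign key in Orders.
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute(
    """CREATE TABLE Orders (
           OrderID INTEGER PRIMARY KEY,
           CustomerID INTEGER REFERENCES Customers(CustomerID),
           Units INTEGER)"""
)

conn.execute("INSERT INTO Customers VALUES (1, 'Customer A')")
# The same CustomerID may appear in many orders: foreign keys can repeat.
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", [(1, 1, 50), (2, 1, 80)])

# A second customer with the same primary key value is rejected.
try:
    conn.execute("INSERT INTO Customers VALUES (1, 'Customer B')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

orders_for_customer = conn.execute(
    "SELECT COUNT(*) FROM Orders WHERE CustomerID = 1"
).fetchone()[0]
```

Here the duplicate primary key raises an error, while two orders share the same foreign key value without complaint.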
Using Microsoft Query
The first method uses the From Access button in the Get External Data
group on the Data ribbon
It is limited to importing whole tables or saved queries.
The second method employs the Microsoft Query software, which allows you to import all or part of the data from many database packages into Excel
It comes with the Office package, but may need to be installed.
Once Microsoft Query is installed, importing data from Access (or any other
supported database package) is essentially a three-step process:
1. Define the source, so that Excel knows what type of database the data is in and where the data is located.
2. Use Microsoft Query to define a query.
3. Return the data to Excel.
Example 18.5:
Objective: To illustrate how Microsoft Query can be used to import the
results of queries on the Shirt Orders database into Excel.
Solution: Fine Shirt Company has created an Access database file
that has information on its sales to its customers during the period of
2005 through 2009.
and Orders, with a link between the CustomerID fields in the
Customers and Orders tables and a link between the ProductID fields
in the Products and Orders tables, as shown in the diagram below.
Example 18.5:
not Access.
First, define a data source
Open a blank spreadsheet in Excel and select From Microsoft Query from the From Other Sources dropdown menu on the Data ribbon.
Select the top <New Data Source> item in the Choose Data Source dialog box and then click OK.
Fill in the Create New Data Source dialog box, and then the ODBC Microsoft Access Setup dialog box to indicate which database file you want to use. Click OK to
return to the Choose Data Source dialog box.
In the Choose Data Source dialog box, make sure the Shirt Orders item is selected and the bottom checkbox is unchecked, and click OK. This brings
up the Add Tables dialog box and begins the second step, where you define the query.
Specify which tables are relevant for the query, which fields you want to return to
Example 18.5:
The results of one query (Find all of the
records in the Orders table that correspond
to orders for at least 80 units made by the
customer Shirts R Us for the product
Long-sleeve Tunic, and return the dates and
units ordered for these orders) are shown
to the right.
Once the results of the query data are
returned to Excel, you can then begin the
statistical analysis of the data.
means that you can refresh the data in Excel
if the Access data change
that you can edit your query
If your ultimate goal is to create a pivot
table based on the database data, you can
do this directly, as shown in the next
Example 18.5 (Continued):
Objective: To illustrate how Microsoft Query can be used to import
data directly into a pivot table.
Solution: Fine Shirt Company would like to break down revenue from
its various customers and products by using pivot tables.
through the usual PivotTable button on the Insert menu.
Get into Microsoft Query and define a query
When you select the Return Data to Microsoft Excel menu item from the File menu, you see a dialog box where you can specify the type of report you want and where you want it. Select PivotTable Report (or PivotChart and PivotTable Report).
From here, you can create any pivot tables in the usual way
Example 18.5 (Continued):
product, customer (using the Report Filter area at the top), and
quarter of year, is shown below.
You have the option of obtaining corresponding pivot charts automatically
The pivot table is linked to the query. This means that you can go back to Microsoft Query, edit the query, and return to Excel to update the pivot table.
SQL Statements
databases
language) was developed several decades ago.
It is often called the “language of databases.”
Behind each query developed in Microsoft Query is an SQL statement
The statements can be viewed by clicking the SQL button in the Query toolbar once you have created a query.
The statements include keywords such as SELECT, FROM, WHERE, and AND, as shown in the example below.
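To give a feel for how these keywords fit together, here is a hedged sketch using Python's built-in sqlite3 module in place of Access and Microsoft Query. It expresses the query from Example 18.5 (orders of at least 80 units by Shirts R Us for the Long-sleeve Tunic). Apart from the table names and the CustomerID and ProductID links, every field name and data row is a made-up assumption, and the SQL that Microsoft Query actually generates may be written differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Products  (ProductID INTEGER PRIMARY KEY, Description TEXT);
    CREATE TABLE Orders    (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER,
                            ProductID INTEGER, OrderDate TEXT,
                            UnitsOrdered INTEGER);
    INSERT INTO Customers VALUES (1, 'Shirts R Us'), (2, 'Other Customer');
    INSERT INTO Products  VALUES (1, 'Long-sleeve Tunic'), (2, 'Other Product');
    INSERT INTO Orders VALUES
        (1, 1, 1, '2005-03-01', 90),
        (2, 1, 1, '2006-07-15', 40),
        (3, 2, 1, '2007-01-10', 120),
        (4, 1, 2, '2008-05-20', 100);
    """
)

# SELECT names the fields to return, FROM names the tables, and the
# WHERE ... AND conditions both link the tables and filter the records.
query = """
SELECT Orders.OrderDate, Orders.UnitsOrdered
FROM Orders, Customers, Products
WHERE Orders.CustomerID = Customers.CustomerID
  AND Orders.ProductID = Products.ProductID
  AND Customers.Name = 'Shirts R Us'
  AND Products.Description = 'Long-sleeve Tunic'
  AND Orders.UnitsOrdered >= 80
"""
rows = conn.execute(query).fetchall()
```

Only the first made-up order satisfies all the conditions, so a single (date, units) row comes back.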
Web Queries
(slide 1 of 2)
steps required to import the data into Excel for analysis vary greatly.
Many Web sites provide buttons that allow you to download the data
directly into Excel
On some Web sites, the only way to get the data into Excel is to cut and paste
A Web query is an Excel tool that lies between these two extremes.
Web queries search for HTML <Table> tags, find the corresponding data, and bring them into Excel in the usual row and column format.
Many data sets on the Web import beautifully with Web queries, but some return virtually nothing.
Sometimes you need to run several Web queries on the same basic site to get all
of the data you want.
In a URL, the part to the right of the question mark (if any) is called the query
string, and it specifies exactly which data you want.
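The query string can be pulled apart, or built, with Python's standard urllib.parse module. The address and parameter names below are hypothetical placeholders; a real site defines its own.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical URL: everything after the question mark is the query string.
url = "https://www.example.com/data?team=ARI&year=2009"
parts = urlsplit(url)
query_string = parts.query
params = parse_qs(query_string)

# Running several queries against the same basic site usually means
# changing only the query string.
base = "https://www.example.com/data"
urls = [base + "?" + urlencode({"team": team, "year": 2009})
        for team in ("ARI", "ATL")]
```

This is why several Web queries against one site often differ only in the part after the question mark.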
Web Queries
(slide 2 of 2)
Make sure you have an active connection to the Web, and open a new
workbook in Excel
Click the From Web button on the Data ribbon
Fill in the dialog box, the most important part of which is the URL (the
address of the page) at the top
Once you enter the URL, click Go. You will see the Web page with yellow arrows next to all of the tables.
Click any of these yellow arrows to change them to green checkmarks. The selected tables will then be imported into Excel.
After you click Import, specify where to place the results
A link to the Web page remains, so you can refresh to obtain the latest data
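The idea of searching a page for table tags and returning rows and columns can be sketched with Python's standard html.parser module. This is not how Excel implements Web queries; it is a simplified illustration that ignores nested tables, and the class name and sample page are hypothetical.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of every table on a page, row by row."""

    def __init__(self):
        super().__init__()
        self.tables = []      # one list of rows per table found
        self._row = None      # row being built, if inside <tr>
        self._cell = None     # text pieces of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None


# Made-up page: text outside the table is ignored.
page = """
<html><body>
<p>Not part of any table</p>
<table>
  <tr><th>Player</th><th>Salary</th></tr>
  <tr><td>Player A</td><td>1000000</td></tr>
</table>
</body></html>
"""
extractor = TableExtractor()
extractor.feed(page)
tables = extractor.tables
```

A page that lays out its data without table tags would give this sketch nothing to find, which matches the remark that some Web queries return virtually nothing.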
Cleansing Data
This is especially the case when you obtain data from external sources such
as the Web
It is your responsibility to correct any problems before you do any serious analysis
Cleansing data requires careful detective work to uncover all
possible errors that might be present
Once an error is found, it is not always clear how to correct it (for example, missing data)
Some subjectivity and common sense must be used when cleansing data sets
Example 18.6:
Objective: To find and fix errors in this company’s data set.
Solution: The data file has data on 1500 customers of a particular
company. A portion of these data appears below.
Example 18.6:
The data set has a number of problems, all of which you might encounter
in real data sets:
and then formatted as a date)
text field)
trailing zeroes)
the total amount spent)
Use Excel tools to search for the suspicious data values.
Then use other Excel tools, such as Find and Replace, to fix the errors.
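The search for suspicious values can also be automated. The Python sketch below flags blank cells, non-numeric entries, and negative amounts in a numeric field. The field names, the sample records, and the particular checks are all assumptions for illustration; they are not the actual contents or problems of the example file, and deciding how to fix each flagged value still takes judgment.

```python
def find_suspicious(records, numeric_field):
    """Return (row index, field, problem) for values that look wrong."""
    problems = []
    for i, record in enumerate(records):
        # Any blank cell is reported as missing.
        for field, value in record.items():
            if value is None or str(value).strip() == "":
                problems.append((i, field, "missing"))
        raw = record.get(numeric_field)
        if raw is None or str(raw).strip() == "":
            continue  # already reported as missing, or field absent
        try:
            amount = float(raw)
        except ValueError:
            problems.append((i, numeric_field, "not numeric"))
            continue
        if amount < 0:
            problems.append((i, numeric_field, "negative"))
    return problems


# Made-up customer records with one problem each in rows 1 to 3.
customers = [
    {"CustomerID": "1", "Region": "East", "AmountSpent": "250.00"},
    {"CustomerID": "2", "Region": "", "AmountSpent": "180.50"},
    {"CustomerID": "3", "Region": "West", "AmountSpent": "n/a"},
    {"CustomerID": "4", "Region": "South", "AmountSpent": "-40"},
]
issues = find_suspicious(customers, "AmountSpent")
```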