The data came from four companies and, within each company, some data resided in different information systems. The challenge at this stage was to consolidate all of the data into a single database. This task required the consideration of the following:
• Importing files and structuring data.
• Standardizing data formats.
• Building in traceability of transactions.
• Normalizing the database.
• Detecting errors and improving data quality.
• Documenting the database.
• Communicating with data owners.
Each of these considerations is described next.
Importing Files and Structuring Data
In all, 43 files were received from the four supply chain members. The majority of these files shared the same data structure; however, the import process required the development of 11 queries to automate it. Missing data or data errors were found in some files; thus, automating the import process reduced time when files had to be imported again. The import of a file had to be repeated, for example, because the data set was found to be incomplete and had to be sent again, because transactions not included in the file had to be added, or because the fields requested or sent were incorrect.
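As a rough illustration of what such an automated, repeatable import can look like, the sketch below uses Python with SQLite. The delimiter, column layout, staging table, and file names are assumptions for illustration only, not the actual queries developed for the study.

    import csv
    import sqlite3
    from pathlib import Path

    # Hypothetical column layout of a shipment file; the real files differed by firm
    # and are assumed here to be tab-delimited with no header row.
    COLUMNS = ["origin_id", "destination_id", "ship_date", "product_id", "quantity"]

    def import_shipment_file(db_path, file_path, firm):
        """Load one delimited text file into a staging table, tagging every row
        with the providing firm and the source file so that a corrected file can
        simply be re-imported."""
        conn = sqlite3.connect(db_path)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS stgShipments ("
            "OriginID TEXT, DestinationID TEXT, ShipDate TEXT, "
            "ProductID TEXT, Quantity REAL, Firm TEXT, SourceFile TEXT)"
        )
        # Re-importing replaces any rows loaded earlier from the same file.
        conn.execute("DELETE FROM stgShipments WHERE SourceFile = ?", (str(file_path),))
        with open(file_path, newline="") as f:
            reader = csv.DictReader(f, fieldnames=COLUMNS, delimiter="\t")
            rows = [(r["origin_id"], r["destination_id"], r["ship_date"],
                     r["product_id"], float(r["quantity"]), firm, str(file_path))
                    for r in reader]
        conn.executemany("INSERT INTO stgShipments VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
        conn.commit()
        conn.close()
        return len(rows)

    # Example: re-run the whole import for one company after a corrected resend.
    for path in sorted(Path("incoming").glob("manufacturer1_*.txt")):
        print(path.name, import_shipment_file("lto_study.db", path, "Manufacturer 1"))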
The import process was used for more than just importing text files into a relational database. It was used to structure data to perform the analyses. Since the purpose of this study was to analyze inventory locations and levels, the supply chain was viewed as a series of nodes, where inventory can be held, and arcs (or links) between the nodes. In order to analyze the dynamics of the supply chain, all product flow activities were viewed as shipments between two nodes. The transportation of product between a distribution center and a restaurant was considered a shipment between two stocking locations of finished goods. A manufacturing run was viewed as a shipment from the raw materials inventory to the finished goods inventory. A sale, an end-customer ordering a meal at a restaurant, was viewed as a shipment between a restaurant and an end-customer (which is the demand generation process and does not hold inventory).
The import process, illustrated in Figure 4.1, served to structure the data in four tables in a relational database. The data structure of each of the four is shown in Figure 4.2. These four tables are Shipments, Nodes, Arcs, and Products. Figure 4.1 depicts the import process. The top section of the figure represents the files provided by the four companies participating in the study. Figure 4.1 also shows that a cross-validation process was implemented in order to identify any inconsistencies in the data. For example, the franchisor identified stores in one way and the distributor used a different way of identifying the stores. The cross-validation process served to ensure that identifiers matched among the various data sources. In addition, one of the companies had two headquarters locations, and each of these two offices had its own computer server running the same information system. Nonetheless, the data formats used in these two information systems were not the same. The cross-validation process was used to identify the right data to use and to assure that the data used were correct. This step was particularly important to verify that what was shipped from one node was received by the correct destination.
Figure 4.1 also portrays the way in which the nodes and the arcs in the supply chain network were identified from the transactions in the table Shipments. This process was required because data availability depended on the firm. That is, both manufacturers provided manufacturing runs and shipments of all ingredients manufactured. Part of this product was shipped to the distribution centers included in the analysis, and the rest was shipped to other distribution centers.
[Figure 4.1: Creating the database of past limited-time offers. The original files from the four companies (manufacturing runs and shipments from the manufacturers, distributor invoices to the restaurants, and restaurant sales) pass through a cross-validation step and append queries that structure the data into the tables Shipments, Products, Nodes, and Arcs, after which the nodes to include in the analysis are identified and the database is normalized.]
[Figure 4.2: Data Structure and Building Traceability of Transactions in the Database. The four tables and their fields:
tblShipments: LTO, OriginID, DestinationID, Date, Quantity, ProductID, Firm, Transaction Type, Transaction ID, Landed Cost, Additional Cost, Selling Price, Transportation Cost.
tblProducts: ProductID, Description, Portioning, Type of Product, Source of Reference, Comments, ProdPublishName.
tblArcs: ArcID, OriginNodeID, DestinationNodeID, ArcTranspCost, ArcMgntTimeCost, ArcPenaltyCost, LeadTime, Preferred, UnitOfMeasure, StartDate, EndDate.
tblNodes: NodeID, HoldingCostRate, Description, Source of Reference, InDP, NodeType, NodeFirm, NodeLocalID, PublishName, NodesWithGoodData, NodesInModel, ScalingFactor.]
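The field lists in Figure 4.2 translate into a relational schema along the following lines. This is a sketch in Python with SQLite; the data types, the key choices, and the removal of spaces from field names are assumptions, since the figure records only field names.

    import sqlite3

    conn = sqlite3.connect("lto_study.db")
    conn.executescript("""
    CREATE TABLE IF NOT EXISTS tblProducts (
        ProductID         TEXT PRIMARY KEY,
        Description       TEXT,
        Portioning        TEXT,
        TypeOfProduct     TEXT,
        SourceOfReference TEXT,
        Comments          TEXT,
        ProdPublishName   TEXT);

    CREATE TABLE IF NOT EXISTS tblNodes (
        NodeID            TEXT PRIMARY KEY,
        HoldingCostRate   REAL,
        Description       TEXT,
        SourceOfReference TEXT,
        InDP              INTEGER,
        NodeType          TEXT,
        NodeFirm          TEXT,
        NodeLocalID       TEXT,
        PublishName       TEXT,
        NodesWithGoodData INTEGER,
        NodesInModel      INTEGER,
        ScalingFactor     REAL);

    CREATE TABLE IF NOT EXISTS tblArcs (
        ArcID             INTEGER PRIMARY KEY,
        OriginNodeID      TEXT REFERENCES tblNodes(NodeID),
        DestinationNodeID TEXT REFERENCES tblNodes(NodeID),
        ArcTranspCost     REAL,
        ArcMgntTimeCost   REAL,
        ArcPenaltyCost    REAL,
        LeadTime          REAL,
        Preferred         INTEGER,
        UnitOfMeasure     TEXT,
        StartDate         TEXT,
        EndDate           TEXT);

    CREATE TABLE IF NOT EXISTS tblShipments (
        LTO               TEXT,
        OriginID          TEXT REFERENCES tblNodes(NodeID),
        DestinationID     TEXT REFERENCES tblNodes(NodeID),
        Date              TEXT,
        Quantity          REAL,
        ProductID         TEXT REFERENCES tblProducts(ProductID),
        Firm              TEXT,   -- traceability: company that provided the record
        TransactionType   TEXT,   -- traceability: invoice, order delivery, transshipment, ...
        TransactionID     TEXT,   -- traceability: identifier used in the source system
        LandedCost        REAL,
        AdditionalCost    REAL,
        SellingPrice      REAL,
        TransportationCost REAL);
    """)
    conn.close()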
The distributor provided data of shipments to all restaurants served by each of the nine distribution centers included in this research. However, there are restaurants that were served by other distributors, which were not included. Furthermore, point-of-sale data were available for the 1,200 franchisor-owned restaurants that participated in the LTO used for the analysis. Some of these restaurants were served by the distribution centers included in the research; others were served by other distribution centers.
Therefore, the nodes of the network to include were identified from the transactions available in the database (see Figure 4.1). In short, the process for identifying the nodes of the supply chain that could participate in this study was a discovery process based on the availability and quality of data.
All data were consolidated in a table called Shipments. Table Shipments recorded how much of which product was shipped from where to where, and when this happened. Therefore, the table Shipments had a reference to each location in the supply chain. Transactions of table Shipments were grouped by location (both origins and destinations) and then used to create table Nodes (see Figure 4.1). The table Nodes was used to add all data necessary to describe each of the nodes, such as the value that the node adds to the product and the inventory holding cost rate used at the node.
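A minimal sketch of this grouping step, using the hypothetical schema above (only the node identifiers are derived here; the descriptive fields were added afterwards):

    import sqlite3

    conn = sqlite3.connect("lto_study.db")
    # Every identifier that ever appears as an origin or a destination of a
    # shipment becomes a candidate node of the supply chain network.
    conn.executescript("""
    INSERT OR IGNORE INTO tblNodes (NodeID)
        SELECT OriginID      FROM tblShipments GROUP BY OriginID
        UNION
        SELECT DestinationID FROM tblShipments GROUP BY DestinationID;
    """)
    conn.commit()
    conn.close()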
After verifying and validating the table Nodes, the table Arcs was produced from the relationship between table Shipments and table Nodes (see Figure 4.1). Table Arcs was used to include all data that characterized each arc, such as the lead time.
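A comparable sketch for deriving the arcs, again under the hypothetical schema: each distinct origin-destination pair observed in the shipments becomes an arc, and the cost and lead time fields are filled in later.

    import sqlite3

    conn = sqlite3.connect("lto_study.db")
    # An arc exists wherever product was observed to move between two nodes.
    conn.executescript("""
    INSERT INTO tblArcs (OriginNodeID, DestinationNodeID)
        SELECT DISTINCT s.OriginID, s.DestinationID
        FROM tblShipments AS s
        JOIN tblNodes AS o ON o.NodeID = s.OriginID
        JOIN tblNodes AS d ON d.NodeID = s.DestinationID;
    """)
    conn.commit()
    conn.close()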
At this point of the import process, tables Nodes and Arcs had all the nodes and all the arcs of the supply chain network, and table Shipments had all transactions provided by each of the four companies participating in the study. But not all of these data were needed for the analysis. The manufacturer, for instance, provided data of shipments to all distributors, both the ones participating in this study and others. This is also the case with the distributor and the restaurants. Therefore, the nodes included in the study were identified in table Nodes. This somewhat tedious but critical process helped to identify missing transactions and data-quality problems in the source files.
Standardizing Data Formats
Standardizing data formats means that the same identifiers are used in all files. Each company referred to the next-tier customer in a different way, and product identification was different for each information system. The process of standardizing data formats makes it possible to establish relationships among data from different sources as well as to perform calculations. Usually this required the study of the data structure used by each firm and the development of standards.
Standardization involves not only standardizing identifiers but also the standardization of other aspects such as units of measure. For example, product quantity might be stored in cases, pounds, tons, or pallets. Having disparate data formats does not only occur when integrating data from different companies. Frequently, different standards are found within one company. This is particularly true when there are legacy systems. The import process was used to standardize other formats such as units of measure.
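The sketch below illustrates the idea of such standardization during import. The identifier cross-reference and the conversion factors shown are hypothetical examples, not values from the study.

    # Hypothetical cross-reference of store identifiers: the franchisor and the
    # distributor referred to the same restaurant with different codes.
    STORE_ID_MAP = {
        ("Distributor", "DC09-4417"): "REST-4417",
        ("Franchisor", "STORE 4417"): "REST-4417",
    }

    # Hypothetical factors for converting reported quantities to cases.
    CASES_PER_UNIT = {
        "case": 1.0,
        "pallet": 48.0,     # assumed pallet configuration
        "pound": 1 / 30.0,  # assumed net weight of a case
    }

    def standardize(firm, local_store_id, quantity, unit):
        """Return the standard node identifier and the quantity expressed in cases."""
        node_id = STORE_ID_MAP[(firm, local_store_id)]
        return node_id, quantity * CASES_PER_UNIT[unit]

    print(standardize("Distributor", "DC09-4417", 2, "pallet"))  # ('REST-4417', 96.0)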
Building in Traceability of Transactions
Because data were manipulated and sometimes converted, it was necessary to identify the source of each transaction in case a problem was encountered later. Each transaction needed to include the name of the company that provided it; the information system from which the transaction came; whether it was an invoice, the delivery of an order, a transshipment, et cetera; and the identification of the transaction used at the source. Figure 4.2 indicates the fields that were included in the table Shipments for traceability. Traceability is critical to be able to talk to each individual firm participating in the research in “their own language”.
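With those fields in place, a suspect record can be traced back to its source along the following lines. This is a sketch under the hypothetical schema; the product identifier, the date, and the negative-quantity test are illustrative only.

    import sqlite3

    conn = sqlite3.connect("lto_study.db")
    # For a shipment that looks wrong, report which company provided it, which
    # kind of transaction it was, and the identifier used in the source system,
    # so the question can be raised with the data owner in their own terms.
    row = conn.execute("""
        SELECT Firm, TransactionType, TransactionID,
               OriginID, DestinationID, Date, Quantity
        FROM tblShipments
        WHERE ProductID = ? AND Date = ? AND Quantity < 0
    """, ("PROD-001", "2004-03-08")).fetchone()
    print(row)
    conn.close()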
Normalizing the Database
Database normalization is used to reduce redundancy. The database used in this study contained more than 1.5 million records, and removing redundancy expedited queries to the database. A detailed explanation of database normalization [1] is beyond the scope of this study. In short, normalization is used to achieve functional dependency; that is, to identify the data attributes that completely determine all other data. Having redundancy in the data is not necessarily bad, but it slows down queries to the database and increases the chances of data inconsistency.
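As a concrete illustration under the hypothetical schema above: if every shipment row also carried the full product description, that text would be repeated across more than a million records and could drift out of sync. Keeping only ProductID in the shipment table and the description once in the product table removes the redundancy, and a view can reassemble the detail when needed.

    import sqlite3

    conn = sqlite3.connect("lto_study.db")
    conn.executescript("""
    -- Un-normalized alternative (not used): the description would be repeated on
    -- every shipment row even though it depends on ProductID alone.
    -- CREATE TABLE tblShipmentsWide (..., ProductID TEXT, ProductDescription TEXT, ...);

    -- Normalized: the description lives once in tblProducts; queries join when needed.
    CREATE VIEW IF NOT EXISTS vwShipmentDetail AS
        SELECT s.Date, s.OriginID, s.DestinationID, s.Quantity,
               p.ProductID, p.Description
        FROM tblShipments AS s
        JOIN tblProducts AS p ON p.ProductID = s.ProductID;
    """)
    conn.close()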
Detecting Errors and Improving Data Quality
Detecting errors is not a one-time task, nor does it have a definite end, unfortunately. The errors that were detected and corrected ranged from missing data to incorrect data. An example of missing data is when the point-of-sale data provided did not include one of the menu items; a promotional meal comes in three menu items, and only two were provided. An example of receiving incorrect data happened when the dates for all manufacturing runs by one of the manufacturers were represented as only occurring on Mondays, when in actuality they occurred throughout the week.
The majority of data quality issues were identified by matching transactions during the cross-validation process. An example of cross-validation is to sum the amount of product delivered to the restaurants and, from this summation, subtract all sales. If the calculation produces suspiciously high numbers for product leftover, then there could be a problem with the programming or in the data. In the case of the missing menu item, when management of the firm that provided sales data were consulted, they quickly identified that a menu item was missing in the source files and corrected the problem.
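The leftover check described above can be expressed, under the hypothetical schema, roughly as follows; the restaurant node type and the limit of twenty rows are assumptions for illustration. Sales are treated as outbound shipments from a restaurant to the end-customer, so inbound minus outbound approximates leftover product.

    import sqlite3

    conn = sqlite3.connect("lto_study.db")
    # Product delivered to each restaurant minus product sold there; a large
    # positive remainder points either to a programming error or to a data
    # problem (for example, a missing menu item in the point-of-sale files).
    for node_id, leftover in conn.execute("""
        SELECT n.NodeID,
               SUM(CASE WHEN s.DestinationID = n.NodeID THEN s.Quantity ELSE 0 END) -
               SUM(CASE WHEN s.OriginID      = n.NodeID THEN s.Quantity ELSE 0 END) AS Leftover
        FROM tblNodes AS n
        JOIN tblShipments AS s
          ON s.DestinationID = n.NodeID OR s.OriginID = n.NodeID
        WHERE n.NodeType = 'Restaurant'
        GROUP BY n.NodeID
        ORDER BY Leftover DESC
        LIMIT 20
    """):
        print(node_id, leftover)
    conn.close()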
Documenting the Database
Documentation is another key task that helps not only to keep track of assumptions and estimations, but also to identify programming errors quickly. Manipulating large amounts of data with a relational database is challenging because building relationships among tables of data seldom produces an error message. Instead, a query might produce wrong output that goes unnoticed. Having a detailed description of the programming and the relationships among the tables of data facilitates the identification of errors when they occur.
Another aspect of documentation is the identification of who provided what estimation, what assumptions were made, and what was the rationale behind the calculated data. Some data were provided by managers from the participant firms, but other data had to be calculated or estimated.
Communicating with Data Owners
Detecting errors and identifying the data needed in order to improve the quality of the database took about three months of one person working full-time. A central issue related to improving data quality and validating the process by which the database was created was being able to communicate with the owners of the data in which there appeared to be a problem. Initially, tables were used to summarize and present data. Next, static graphics (produced in presentation software such as Microsoft PowerPoint®) were developed. But as data were improved, these graphics proved difficult to maintain because updating them was a manual process.
The best option identified to maintain the map of the supply chain representing the product flow during an LTO was to use Microsoft Visio and link the supply chain map to the relational database. In other words, a graphical representation of the supply chain was drawn in Microsoft Visio, and text boxes were linked to database records to show data that resided in the database. Doing so enabled updating the map automatically when data were modified in the database. An example of the supply chain map used is shown in Figure 4.3.
In the figure, the numbers in bold below each node indicate the amount of product leftover at that location after the LTO was over. These numbers were calculated as the difference between inbound and outbound shipments at each location. The numbers on the lines connecting two nodes, the arcs, indicate the number of cases shipped between each pair of locations and the number of transactions (or shipments) that were used to ship that many cases of product.
Linking a database to graphical representations of a supply chain as shown here has potential in supply chain mapping applications. At an operational level, this kind of supply chain map might provide an online graphical view of the status of the product flow in the supply chain. Transactional data coupled with user-maintained data such as forecasts and capacity availability of resources could provide valuable information at a tactical or strategic level.