Internet / Intranet
CIS-536
Class 4 Web Server Technology
HTTP Protocol
Log Files
Web Servers
A Basic Web Server is Just a File Server
Client Requests a File via HTTP Protocol
Server Delivers the File via HTTP Protocol
Server Maps URL to a Subdirectory
Web Server Needs Appropriate Permissions to Access Files/Directories
Supports Non-HTTP Protocols
FTP, Gopher, etc.
A Web Server is Not HTML Specific
Typically Identifies a Filetype by Extension
Or Directory Where File Exists
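The URL-to-file mapping and the extension-based file typing described above can be sketched roughly as follows. This is a minimal illustration, not how any particular server implements it; `DOCUMENT_ROOT` and `map_url_to_file` are hypothetical names chosen for the example.

```python
import mimetypes
import posixpath
from urllib.parse import unquote, urlsplit

# Hypothetical document root; a real server reads this from its configuration.
DOCUMENT_ROOT = "/var/www/html"


def map_url_to_file(url_path, root=DOCUMENT_ROOT):
    """Return (file_path, mime_type) for a requested URL path."""
    # Keep only the path part (drop any query string) and decode %xx escapes.
    path = unquote(urlsplit(url_path).path)
    # Collapse "." and ".." so a request cannot climb above the root.
    safe = posixpath.normpath("/" + path.lstrip("/")).lstrip("/")
    file_path = posixpath.join(root, safe) if safe else root
    # Identify the file type by its extension.
    mime_type, _ = mimetypes.guess_type(file_path)
    return file_path, mime_type or "application/octet-stream"
```

For example, `map_url_to_file("/docs/index.html")` maps the request into the `docs` subdirectory of the document root and reports the type from the `.html` extension.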
Additional Common Web Server Features
Additional Security Beyond That Provided by O/S
Scripting
Ability to Dynamically Create a Web Page
Run a Program Instead of Returning a File
Advanced Web Server Features
Proxy Servers (For Security and Performance)
Fetch Documents That are on Other Computers
Cache Them Locally
Allows for Easy Scalability
Multiple Proxy Servers Can Cache Documents From One Source Computer
Embedded Scripting
Server Side Includes
Custom Scripting Languages
Server API
Web Servers – Added Functionality
Database Connectivity
SQL, MySQL
Directory Listings
Icons, etc.
Built-In Search Engines
Built-In ImageMap Handling
SSL (Secure Sockets Layer) - Netscape
Web Server “Add-Ons”
CGI Substitutes / CGI Optimizations
Web Server History
All Web Servers Have a Common Root
Netscape Enterprise Server
Microsoft Internet Information Server
A Slew of Others
Apache
UNIX Origins – Now Ported to NT
Evolved From NCSA httpd
Freeware
Typical UNIX Application
Public Source Code
Many Defaults, Conventions
BUT: All is Configurable
IIS / Netscape
Microsoft IIS
Not Strictly Derived From httpd/Apache
Windows NT
However: Functionally Very Similar to Apache
Emulates Many UNIX Conventions
E.g., Forward Slashes
Configuration via GUI
Personal Web Server
Peer Web Server
Netscape
Multi-Platform
UNIX is Preferred Platform
Less “Open” Than Apache
UNIX File Structure
Forward Slashes (/) to Separate Filenames, Directories
Case Sensitive File Names
Windows is Not
No Limit on Filename Size / Extensions
Extensions are by Convention
Root is “/”
User Home Directory is: “~/”
Symbolic Links / Aliases
Directories Can Be Spread Over Multiple Drives
Can Create Non-Hierarchical Structure
File Permissions
Read, Write, Execute
Separate Permissions for Owner, Group, All
Directories are Special Cases of Files
Execute Permissions = Able to Browse Directory
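The owner / group / all permission split can be inspected with Python's standard `stat` module. The sketch below is only an illustration; `describe_mode` is a hypothetical helper, and the mode values in the usage note are made-up examples rather than values taken from a real server.

```python
import stat


def describe_mode(mode):
    """Split a numeric UNIX file mode into owner/group/other permissions."""
    text = stat.filemode(mode)  # ten characters: file type, then three rwx triplets
    return {
        "is_directory": stat.S_ISDIR(mode),
        "owner": text[1:4],
        "group": text[4:7],
        "other": text[7:10],
    }
```

Calling `describe_mode(0o040751)` describes a directory whose owner has full access, whose group can read and enter it, and which everyone else can only enter. A web server process gets whichever of the three triplets applies to the user that started it.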
Web Server Configuration
Directory Structure
Virtual Document Tree
Access to User Directories
Server is a Process Started by a User
Has the Permissions of the User Who Started It
Default Documents
Allow Directory Browsing
Scripting
Who is Allowed to Run Scripts?
How are Scripts Identified?
Web Server File Access Control / Security
Directory
O/S Level Security
IP, Domain Level Security
Uses Port 443
Microsoft PCT
Response to Holes in SSL 2.0
Server Administration
Need Sysadmin and O/S Expertise
Lots of “Holes” and Gotchas Whenever Scripts are Allowed
FTP
Who is Allowed to Change Documents?
Who is Allowed to Change Server
HTTP
The Protocol For Requesting and Delivering Web Pages
Not Restricted to Returning HTML Files
Client Server Model
Request / Response
TCP/IP Protocol Using Port 80
Supports Other Ports, Can Be Run Over Other Protocols
“Replaced” FTP as the Primary Method For Internet File Transfer
Stateless
Uses MIME Format to Encapsulate Data
Message Structure Similar to SMTP Mail Messages
Message Header (metadata)
Message Body (data)
Separated From Header by a Blank Line
Browser Only Displays Body, Not Header
No Restrictions on Message Size / Format (As There Are With SMTP)
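The header / blank line / body layout can be shown with a small parser. This is a simplified sketch (it ignores repeated and folded header lines); `split_http_message` and `SAMPLE` are hypothetical names, and the sample response is invented for illustration.

```python
def split_http_message(raw):
    """Split raw HTTP message bytes into (start_line, headers, body)."""
    # The first blank line (CRLF CRLF) separates the header from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


# Invented sample response, not captured from a real server.
SAMPLE = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<p>Hello</p>\n"
)
```

`split_http_message(SAMPLE)` returns the status line, a dictionary of header fields, and the body bytes; a browser would display only the body.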
HTTP Versions
HTTP 1.0 - Commonly Used Version
HTTP 1.1
Formalizes Many Extensions to Version 1.0
Supports Persistent Connections
Supports Compression/Decompression
Supports Virtual Hosting
Single Server With Multiple IP Addresses
Supports Multiple Languages
Supports Byte Range Transfers
Useful For Re-Sending Interrupted Data Transfers
Similar to Process Used By XMODEM, etc.
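A byte range transfer can be sketched from both sides: the client asks for everything after the bytes it already has, and the server returns only that slice. The helper names below are hypothetical, and only the simple `bytes=start-end` form of the `Range` header is handled; anything else falls back to a full response.

```python
def resume_header(bytes_already_received):
    """Client side: request headers asking for the rest of an interrupted file."""
    return {"Range": f"bytes={bytes_already_received}-"}


def apply_range(data, range_value):
    """Server side: return (status_code, body) for a single 'bytes=start-end' range."""
    unit, _, spec = range_value.partition("=")
    start_text, _, end_text = spec.partition("-")
    if unit != "bytes" or not start_text.isdigit():
        return 200, data  # unsupported form: send the whole file
    start = int(start_text)
    end = int(end_text) if end_text.isdigit() else len(data) - 1
    if start >= len(data):
        return 416, b""  # Range Not Satisfiable
    if end < start:
        return 200, data  # invalid range: ignore it
    end = min(end, len(data) - 1)
    return 206, data[start:end + 1]  # Partial Content
```

A client that received 4 bytes before the connection dropped would send `resume_header(4)` and append the returned body to what it already has.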
HTTP Overview
[Diagram: Client (Browser) Sends an HTTP Request to the Web Server; Web Server Returns an HTTP Response (HTML), Taken From the File System or Generated by a Server Application via CGI]
HEAD
Request the HTTP Header Information Only
Don’t Return the File Itself
POST
Sends Data to The Server
Typically Data From a Form
Defined, But Not Widely Implemented
PUT
DELETE
LINK
UNLINK
Common HTTP Header Fields
Additional “Parameters” to the HTTP Commands Used in HTTP Requests:
Accept
Lists the MIME Types That Client Can Accept
E.g., Accept: text/plain, text/html or Accept: */*
Accept-Charset
Lists the Character Sets That Client Can Accept
ASCII, ISO-8859-1 Are Assumed
From
E-mail Address of Requesting User
Not Typically Used For Privacy Reasons
Common HTTP Header Fields (2)
Common HTTP Header Fields (3)
Common HTTP Header Fields (4)
Title
Descriptive Title of the File
WWW-Authenticate
When Authorization Denied, Tells Client Which Methods of Authentication are Supported
Common Status Values
200 – OK
201 – Created (POST Request Was Fulfilled)
204 – No Content (OK, Nothing For Client to Display)
300 – Multiple Choices
Requested Resource Available From Multiple Locations
List of Locations Returned in the Response
500 – Internal Server Error
501 – Not Implemented (Server Does Not Support This Request)
502 – Bad Gateway (Invalid Response From Upstream Server)
503 – Service Unavailable
Cookies Are Name Value Pairs
Stored by the Client
Passed in the HTTP Header
Cookies Have Associated Expiration
Session (Default)
Date / Time
Associated With a URL Path, Not a Page!
Allows Passing Parameters Between Web Pages
Thus Cookies are Used to Provide State Information to a Stateless Protocol
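The cookie round trip can be sketched with Python's standard `http.cookies` module: the server builds a `Set-Cookie` header value, and on later requests it parses the `Cookie` header the client sends back. The function names, cookie names, and the `/shop` path are assumptions made for the example.

```python
from http.cookies import SimpleCookie


def build_set_cookie(name, value, path="/", expires_in_seconds=None):
    """Server side: build the value of a Set-Cookie response header."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = path  # the cookie is tied to a URL path, not a page
    if expires_in_seconds is not None:
        # An integer is converted to a date/time that many seconds from now.
        # Without an expiration, the cookie lasts for the browser session.
        cookie[name]["expires"] = expires_in_seconds
    return cookie[name].OutputString()


def parse_cookie_header(header_value):
    """Server side: turn the client's Cookie request header into a dict."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {key: morsel.value for key, morsel in cookie.items()}
```

A page under `/shop` could set `build_set_cookie("cart", "42", path="/shop")`, and any later page under the same path could read the value back with `parse_cookie_header`, which is how state is carried across otherwise independent requests.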
Web Server HTTP Functionality
Server Doesn’t Add HTTP Headers
Allows You to Create Specific Behavior
Redirect to Another Site
Never Saved in Browser’s Cache
Some Definitions
Hits
Each HTTP Request is a Hit
Accessing a Web Page May Result in Multiple Hits
E.g., Each Graphic is a Hit
Page Views
Accessing a Single Web Page is a Page View
E.g., Typing in a URL or Clicking on a Link
Visits
A Single Client’s Visit to Your Entire Site (Session)
May Include Multiple Page Views
What Constitutes a Second Visit From the Same Client?
Why is This Important?
Terms are Sometimes Used Interchangeably and Improperly
Compare Apples to Apples
Important for Commercial Web Sites
Advertising is Based on Site Access
Typically Sold on Page View Basis
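The difference between hits, page views, and visits can be made concrete with a small counter. Two rules in this sketch are assumptions, not definitions from the slides: a request counts as a page view unless its extension marks it as an embedded resource, and a client's request starts a new visit when more than 30 minutes have passed since its previous request.

```python
import posixpath

# Assumption: these extensions mark embedded resources rather than pages.
EMBEDDED_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png", ".css", ".js"}
# Assumption: 30 minutes of inactivity ends a visit.
VISIT_TIMEOUT_SECONDS = 30 * 60


def summarize(requests):
    """Count hits, page views, and visits.

    requests: iterable of (client, unix_time, path) tuples.
    """
    hits = page_views = visits = 0
    last_seen = {}
    for client, when, path in sorted(requests, key=lambda r: r[1]):
        hits += 1  # every HTTP request is a hit
        extension = posixpath.splitext(path)[1].lower()
        if extension not in EMBEDDED_EXTENSIONS:
            page_views += 1
        previous = last_seen.get(client)
        if previous is None or when - previous > VISIT_TIMEOUT_SECONDS:
            visits += 1
        last_seen[client] = when
    return {"hits": hits, "page_views": page_views, "visits": visits}
```

A single page with two graphics produces three hits but one page view, which is why the three numbers cannot be compared with one another.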
Server Log Files
Many Variations to Web Server Log File Formats
Four Log Files
Access (Transfer) Log
Each Hit is Recorded
User, Date/Time, HTTP Request, etc.
Error Log
Date/Time, Error
Referrer Log
Referring Page, Destination Page
Agent (User) Log
Client’s Browser
Clearly a Need for Standardization
Linking the Four Log Files Together
Common Log Format
Host
IP Address (or Hostname) of Client
Some Servers Perform Lookup of IP Address
The Actual HTTP Request
E.g., GET /index.htm HTTP/1.1
Common Log Format (2)
Status
The HTTP Response Status Code
Transfer Volume
HTTP Response: Content-Length
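A Common Log Format line can be parsed with a regular expression. The sketch below assumes the usual seven-field layout (host, identity, user, date/time, request, status, bytes); `parse_clf_line` is a hypothetical helper and the sample line in the usage note is invented, not taken from a real log.

```python
import re

# host ident user [date/time] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_clf_line(line):
    """Return a dict of the seven fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # A "-" in the last field means no body was transferred.
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry
```

For a line such as `192.0.2.7 - - [10/Oct/1999:13:55:36 -0700] "GET /index.htm HTTP/1.1" 200 2326`, the result carries the client address in `host`, the request line in `request`, and the status and transfer volume as integers.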
Extended Log File Format
Seven Common Log Format Fields Plus
Client vs User
Typically Don’t Have User Level Information
Only Record IP Address of Computer Used For Access
If Fixed IP Address For a Single User’s Machine
This Can Identify the User
Dynamically Assigned IP Addresses
Identifies the Overall Domain (e.g., AOL.com)
Proxy Servers
All Clients Have IP Address of Proxy Server
Multiple “Sessions” at Same Time
Impossible to Have Truly Accurate Information
Log File Analysis Software Has Algorithms to Identify Page Views, Visits
Client Level Caching Affects Logs
“ISP” Level Caching Affects Logs
E.g., AOL Maintains a Cache
No Requirement for Clients, ISPs to Follow Expiration Info
Log File Maintenance on Server
Log Files Grow Rapidly
Log Files Compress Very Nicely
Server Configurable
Generate Daily/Weekly/Monthly Logs
Maintenance Scripts to Clean Up Log Files
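A maintenance script of this kind might compress a finished log and remove the original. This is only a sketch of one possible cleanup step; `compress_log` is a hypothetical helper, and it assumes the server has already rotated to a new log file, so the one being compressed is no longer written to.

```python
import gzip
import shutil
from pathlib import Path


def compress_log(log_path):
    """Gzip a closed log file next to the original, then delete the original."""
    source = Path(log_path)
    target = source.with_name(source.name + ".gz")
    with source.open("rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    source.unlink()
    return target
```

Because log lines are highly repetitive, the compressed file is typically a small fraction of the original size.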
Log File Analysis
Big Business
Bread and Butter of Sites Driven By Advertising Revenue
Evaluation Factors
Log File Formats Supported
Ability to Link Multiple Logs
How Log Files are Accessed (e.g via FTP)
Display Methodology
E.g., Available Via Web Pages
Lookup Capabilities
E.g., Map User-Agent to Browser
E.g., Resolve IP Addresses to Domains, Regions
Log File Analysis Options
Important to Understand the Core Log Files
Log File Analysis Programs Make Some Assumptions
Freeware
Commercial
Service Bureaus