Chapter1

Chapter 1 Web Essentials: Clients, Servers, and Communication : 

CSI3140 WWW Structures, Techniques, and Standards

The Internet : 

- Technical origin: ARPANET (late 1960s)
  - One of the earliest attempts to network heterogeneous, geographically dispersed computers
  - Email first available on ARPANET in 1972 (and quickly very popular!)
  - ARPANET access was limited to select DoD-funded organizations

The Internet : 

- Open-access networks
  - Regional university networks (e.g., SURAnet)
  - CSNET for CS departments not on ARPANET
- NSFNET (1985-1995)
  - Primary purpose: connect supercomputer centers
  - Secondary purpose: provide backbone to connect regional networks

The Internet : 

- Internet: the network of networks connected via the public backbone and communicating using the TCP/IP communication protocol
- Backbone initially supplied by NSFNET; privately funded (ISP fees) beginning in 1995

Internet Protocols : 

- Communication protocol: a detailed specification of how communication between two computers will be carried out to serve some purpose
  - E.g., the telephone “protocol”: how you answer and end a call, what language you speak, etc.
- Internet protocols were developed as part of ARPANET research
  - ARPANET began using TCP/IP in 1982
  - Designed for use both within local area networks (LANs) and between networks

Internet Protocol (IP) : 

- IP is the fundamental protocol defining the Internet (as the name implies!)
- IP address: 32-bit number (in IPv4)
  - Associated with at most one device at a time (although a device may have more than one address)
  - Written as four dot-separated bytes, e.g., 192.0.34.166
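
As a rough illustration of the dotted notation, the sketch below converts between the four-byte written form and the underlying 32-bit number. The function names `dotted_to_int` and `int_to_dotted` are made up for this example; Python's standard `ipaddress` module is used only as a cross-check.

```python
import ipaddress


def dotted_to_int(address: str) -> int:
    """Pack four dot-separated bytes into one 32-bit integer."""
    parts = [int(p) for p in address.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"not a valid IPv4 address: {address!r}")
    value = 0
    for byte in parts:
        value = (value << 8) | byte  # shift earlier bytes left, append the next one
    return value


def int_to_dotted(value: int) -> str:
    """Unpack a 32-bit integer into dotted notation."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


n = dotted_to_int("192.0.34.166")
print(n, int_to_dotted(n))
# Cross-check against the standard library.
print(n == int(ipaddress.IPv4Address("192.0.34.166")))
```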

IP : 

- IP function: transfer data from source device to destination device
- IP source software creates a packet representing the data
  - Header: source and destination IP addresses, length of data, etc.
  - Data itself
- If the destination computer is on the same local network, the IP software sends the packet to the destination directly via this network
- If the destination is on another LAN, the packet is sent to a gateway that connects to more than one network
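
The direct-versus-gateway decision can be sketched in a few lines. This is a simplification under stated assumptions: the local network is described with prefix notation (192.168.1.0/24), which the slides do not cover, and the addresses and the helper name `next_hop` are hypothetical.

```python
import ipaddress


def next_hop(local_network: str, destination: str, gateway: str) -> str:
    """Return the address the packet is handed to first."""
    network = ipaddress.ip_network(local_network)
    if ipaddress.ip_address(destination) in network:
        return destination  # same local network: deliver directly
    return gateway  # another network: let the gateway forward it


print(next_hop("192.168.1.0/24", "192.168.1.42", "192.168.1.1"))
print(next_hop("192.168.1.0/24", "192.0.34.166", "192.168.1.1"))
```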

IP : 

[Figure: packet path from Source across Network 1, a Gateway, Network 2, a second Gateway, and Network 3 to Destination]

IP : 

[Figure: packet path from Source on LAN 1 through a Gateway, the Internet Backbone, and a second Gateway to Destination on LAN 2]

Transmission Control Protocol (TCP) : 

- Limitations of IP:
  - No guarantee of packet delivery (packets can be dropped)
  - Communication is one-way (source to destination)
- TCP adds the concept of a connection on top of IP
  - Provides a guarantee that packets are delivered
  - Provides two-way (full duplex) communication

TCP : 

[Figure: message exchange between Source and Destination]
- Establish connection: “Can I talk to you?” / “OK.” (once in each direction)
- Send packet with acknowledgment: “Here’s a packet.” / “Got it.”
- Resend packet if no (or delayed) acknowledgment: “Here’s a packet.” / “Here’s a resent packet.” / “Got it.”
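
The send, acknowledge, and resend pattern can be imitated with a toy simulation. This is only a sketch: real TCP uses sequence numbers, timers, and windows, none of which appear here, and the function `send_reliably` and its `drops` parameter are invented for illustration.

```python
from typing import Iterable, List, Tuple


def send_reliably(packets: List[str], drops: Iterable[int],
                  max_tries: int = 5) -> Tuple[List[str], int]:
    """Deliver packets in order over a lossy channel, resending unacknowledged ones.

    `drops` lists the transmission attempts (counted from 0) that the
    simulated channel loses. Returns (delivered packets, total transmissions).
    """
    lost = set(drops)
    delivered: List[str] = []
    attempt = 0
    for packet in packets:
        for _ in range(max_tries):
            dropped = attempt in lost
            attempt += 1
            if not dropped:
                delivered.append(packet)  # destination answers "Got it."
                break
            # No acknowledgment arrived: loop around and resend.
        else:
            raise ConnectionError(f"gave up on {packet!r}")
    return delivered, attempt


received, transmissions = send_reliably(["a", "b", "c"], drops=[1])
print(received, transmissions)
```

With the second transmission dropped, three packets need four transmissions in this simulation.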

TCP : 

- TCP also adds the concept of a port, which allows TCP to communicate with many different applications on a machine
- TCP header contains a port number representing an application program on the destination computer
- Some port numbers have standard meanings
  - Example: port 25 is normally used for email transmitted using the Simple Mail Transfer Protocol (SMTP)
- Other port numbers are available first-come, first-served to any application

User Datagram Protocol (UDP) : 

- Alternative protocol to TCP
- Like TCP in that it:
  - Builds on IP
  - Provides the port concept
- Unlike TCP in that it:
  - Does not provide a two-way connection
  - Does not provide guaranteed delivery
- Advantage of UDP vs. TCP: lightweight, so faster for one-time messages
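
A minimal sketch of a one-time UDP message, using Python's standard `socket` module on the loopback interface (so it assumes loopback networking is available). Note that the sender never establishes a connection; it simply addresses a datagram to an IP address and port.

```python
import socket

# Receiver: bind to any free port on the loopback interface.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)  # UDP gives no delivery guarantee, so do not wait forever
address = receiver.getsockname()

# Sender: no connection set-up, just address the datagram and send it.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello", address)

message, sender_address = receiver.recvfrom(1024)
print(message, sender_address)

sender.close()
receiver.close()
```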

Domain Name Service (DNS) : 

- DNS is the “phone book” for the Internet
  - Mapping between host names and IP addresses
  - DNS often uses UDP for communication
- Host names consist of a sequence of labels separated by dots, e.g., www.example.org
- Final label is the top-level domain, with 2 standard types:
  - Generic: .com, .edu, .org, etc.
  - Country-code: .us, .ca, etc.

DNS : 

- Domains are divided into second-level domains, which can be further divided into subdomains, etc.
  - E.g., in www.example.com, example is a second-level domain
- A host name plus domain name information is called the fully qualified domain name (FQDN) of the computer
  - Above, www is the host name and www.example.com is the FQDN

DNS : 

- The nslookup program provides command-line access to DNS (on most systems)
- Looking up a host name given an IP address is known as a reverse lookup
- Note that a single host may have multiple IP addresses (and a single IP address can be associated with multiple domain names)
  - The address (or name) returned is the canonical IP address (or name) specified in the DNS system
  - Other addresses (or names) are considered aliases
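
The same lookups can be made from a program. The sketch below wraps two functions from Python's standard `socket` module; the wrapper names `forward_lookup` and `reverse_lookup` are made up here. Both calls return a triple of canonical name, list of aliases, and list of addresses.

```python
import socket
from typing import List, Tuple


def forward_lookup(host_name: str) -> Tuple[str, List[str], List[str]]:
    """Return (canonical name, aliases, IPv4 addresses) for a host name."""
    return socket.gethostbyname_ex(host_name)


def reverse_lookup(ip_address: str) -> Tuple[str, List[str], List[str]]:
    """Return (canonical name, aliases, addresses) for an IP address."""
    return socket.gethostbyaddr(ip_address)
```

Calling `forward_lookup("www.example.org")` requires a working resolver, and the names and addresses returned depend on the DNS data at the time of the query.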

Analogy to Telephone Network : 

- IP ~ the telephone network
- TCP ~ calling someone who answers, having a conversation, and hanging up
- UDP ~ calling someone and leaving a message
- DNS ~ directory assistance

Higher-level Protocols : 

- Many protocols build on TCP
  - Telephone analogy: TCP specifies how we initiate and terminate the phone call, but some other protocol specifies how we carry on the actual conversation
- Some examples:
  - SMTP (email)
  - FTP (file transfer)
  - HTTP (transfer of Web documents)

World Wide Web : 

- Originally, one of several technologies for organizing and managing Internet-based information
  - Competitors: WAIS, Gopher, ARCHIE
- Distinctive feature of the Web: support for hypertext (text containing links)
  - Communication via Hypertext Transfer Protocol (HTTP): a rather generic protocol in which a client requests a document from a server and the server returns the requested document
  - Document representation using Hypertext Markup Language (HTML), which supports hyperlinks and inline graphics

World Wide Web : 

- The Web is the collection of machines (Web servers) on the Internet that provide information, particularly HTML documents, via HTTP
- Machines that access information on the Web are known as Web clients
- A Web browser is software used by an end user to access the Web

Hypertext Transfer Protocol (HTTP) : 

- HTTP is based on the request-response communication model:
  - Client sends a request
  - Server sends a response
- HTTP is a stateless protocol:
  - The protocol does not require the server to remember anything about the client between requests

HTTP : 

- Normally implemented over a TCP connection (80 is the standard port number for HTTP)
- Typical browser-server interaction:
  - User enters Web address in browser
  - Browser uses DNS to locate IP address
  - Browser opens TCP connection to server
  - Browser sends HTTP request over connection
  - Server sends HTTP response to browser over connection
  - Browser displays body of response in the client area of the browser window

HTTP : 

- The information transmitted using HTTP is often entirely text
- Can use the Internet’s Telnet protocol to simulate a browser request and view the server response

HTTP : 

Connect:
    $ telnet www.example.org 80
    Trying 192.0.34.166...
    Connected to www.example.com (192.0.34.166).
    Escape character is '^]'.
Send request:
    GET / HTTP/1.1
    Host: www.example.org
    (blank line)
Receive response:
    HTTP/1.1 200 OK
    Date: Thu, 09 Oct 2003 20:30:49 GMT…

HTTP Request : 

- Structure of the request:
  - start line
  - header field(s) (one or more)
  - blank line
  - message body (optional)

HTTP Request : 

- Start line
  - Example: GET / HTTP/1.1
  - Every start line consists of three parts, with a single space used to separate adjacent parts:
    - HTTP request method
    - Request-URI
    - HTTP version
  - We will cover HTTP 1.1, in which the version part of the start line must be exactly as shown
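
To make the request structure concrete, here is a small sketch that assembles a request as bytes: start line, header fields, blank line, optional body. The helper `build_request` is hypothetical. HTTP/1.1 ends each line with a carriage return and line feed, which is why `\r\n` appears below.

```python
from typing import Dict, Optional


def build_request(method: str, request_uri: str, host: str,
                  headers: Optional[Dict[str, str]] = None,
                  body: bytes = b"") -> bytes:
    """Assemble start line, header fields, blank line, and optional body."""
    lines = [f"{method} {request_uri} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    head = "\r\n".join(lines) + "\r\n\r\n"  # the empty line ends the header section
    return head.encode("ascii") + body


print(build_request("GET", "/", "www.example.org").decode("ascii"))
```

The bytes produced for `GET /` match what was typed by hand in the Telnet session.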

HTTP Request : 

- Uniform Resource Identifier (URI)
  - Every URI consists of 2 parts: the scheme, which appears before the colon (:), and another part that depends on the scheme
  - Syntax: scheme : scheme-dependent-part
  - Ex: in http://www.example.com/ the scheme is http
- Request-URI is the portion of the requested URI that follows the host name (which is supplied by the required Host header field)
  - Ex: / is the Request-URI portion of http://www.example.com/

URI : 

- URIs are of two types:
  - Uniform Resource Name (URN)
    - Can be used to identify resources with unique names, such as books (which have unique ISBNs)
    - Scheme is urn
  - Uniform Resource Locator (URL)
    - Specifies the location at which a resource can be found
    - In addition to http, some other URL schemes are https, ftp, mailto, and file

HTTP Request : 

- Common request methods:
  - GET
    - Used if a link is clicked or an address is typed in the browser
    - There is no message body in a request with the GET method
    - Requests server to return the resource specified by the Request-URI as the body of a response message
  - POST
    - Used when the submit button is clicked on a form
    - Form information is contained in the body of the request
    - Requests server to pass the body of this request message as data to be processed by the resource specified by the Request-URI
  - HEAD
    - Requests server to return the same HTTP header fields that would be returned if a GET method were used, but not to return the message body

HTTP Request : 

- Header field structure: field name : field value
- Syntax
  - Field name is not case sensitive
  - Field value may continue on multiple lines by starting continuation lines with white space
  - Field values may contain MIME types, quality values, and wildcard characters (*’s)

Multipurpose Internet Mail Extensions (MIME) : 

- Convention for specifying the content type of a message
  - In HTTP, MIME types are typically used to specify the content type of the body of the response
- MIME content type syntax: top-level type / subtype
- Examples: text/html, image/jpeg

HTTP Quality Values and Wildcards : 

- Example of a header field with quality values to indicate preferences:
  accept: text/xml,text/html;q=0.9, text/plain;q=0.8, image/jpeg, image/gif;q=0.2,*/*;q=0.1
- Quality values are decimal numbers between 0 and 1
- A quality value applies to the single item it follows; an item with no quality value defaults to 1
- The higher the value, the higher the preference
- Note the use of wildcards to specify quality 0.1 for any MIME type not specified earlier
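
A server could rank such a header roughly as sketched below. The parser follows the HTTP specification's rule that a q parameter attaches to the one media range it follows, with 1 as the default. The function `parse_accept` is a hypothetical helper and ignores parameters other than q.

```python
from typing import List, Tuple


def parse_accept(value: str) -> List[Tuple[str, float]]:
    """Return (media range, quality) pairs, most preferred first."""
    preferences = []
    for item in value.split(","):
        media_range, *params = [part.strip() for part in item.split(";")]
        quality = 1.0  # default when no q parameter is attached
        for param in params:
            name, _, number = param.partition("=")
            if name.strip().lower() == "q":
                quality = float(number)
        preferences.append((media_range, quality))
    # sorted() is stable, so items with equal quality keep their header order.
    return sorted(preferences, key=lambda pair: pair[1], reverse=True)


header = ("text/xml,text/html;q=0.9, text/plain;q=0.8, "
          "image/jpeg, image/gif;q=0.2,*/*;q=0.1")
for media_range, quality in parse_accept(header):
    print(media_range, quality)
```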

HTTP Request : 

- Common header fields:
  - Host: host name from URL (required)
  - User-Agent: type of browser sending the request
  - Accept: MIME types of acceptable documents
  - Connection: value close tells the server to close the connection after a single request/response is sent
  - Content-Type: MIME type of (POST) body, normally application/x-www-form-urlencoded
  - Content-Length: bytes in body
  - Referer: URI of the resource from which the browser obtained the URI for this HTTP request
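
As an illustration of how several of these fields fit together, the sketch below builds a POST request whose body is form data. The form fields and the Request-URI `/submit` are invented for the example; `urlencode` from Python's standard library produces the application/x-www-form-urlencoded representation.

```python
from urllib.parse import urlencode

form_fields = {"name": "Ada Lovelace", "topic": "web & http"}  # made-up form data
body = urlencode(form_fields).encode("ascii")

request = (
    "POST /submit HTTP/1.1\r\n"  # /submit is a hypothetical Request-URI
    "Host: www.example.org\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(body)}\r\n"  # number of bytes in the body
    "Connection: close\r\n"
    "\r\n"
).encode("ascii") + body

print(request.decode("ascii"))
```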

HTTP Response : 

- Structure of the response:
  - status line
  - header field(s)
  - blank line
  - optional body

HTTP Response : 

- Status line
  - Example: HTTP/1.1 200 OK
  - Three space-separated parts:
    - HTTP version
    - status code
    - reason phrase (intended for human use)

HTTP Response : 

- Status code
  - Three-digit number
  - First digit is the class of the status code:
    - 1 = Informational (used to provide information to client before request processing has been completed)
    - 2 = Success (used to indicate that the request has been successfully processed)
    - 3 = Redirection (client needs to use a different resource to fulfill the request; alternate URL is supplied)
    - 4 = Client Error (client request is not valid)
    - 5 = Server Error (an error occurred during server processing of a valid client request)
  - Other two digits provide additional information
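
A short sketch of how a client might split a status line and classify the code by its first digit. The names `STATUS_CLASSES` and `status_class` are made up for this example.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_class(status_line: str) -> str:
    """Name the class of the status code found in an HTTP status line."""
    # Split at most twice: the reason phrase may itself contain spaces.
    version, code, reason = status_line.split(" ", 2)
    return STATUS_CLASSES[int(code) // 100]


print(status_class("HTTP/1.1 200 OK"))
print(status_class("HTTP/1.1 404 Not Found"))
```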

HTTP Response : 

- Common header fields:
  - Connection, Content-Type, Content-Length
  - Date: date and time at which response was generated (required)
  - Location: alternate URI if status is redirection
  - Last-Modified: date and time the requested resource was last modified on the server
  - Expires: date and time after which the client’s copy of the resource will be out-of-date
  - ETag: a unique identifier for this version of the requested resource (changes if resource changes)

Client Caching : 

- A cache is a local copy of information obtained from some other source
- Most web browsers use a cache to store requested resources so that subsequent requests for the same resource will not necessarily require an HTTP request/response
  - Ex: an icon appearing multiple times in a Web page

Client Caching : 

[Figure: (1) the browser on the client sends an HTTP request for an image to the web server; (2) the server returns an HTTP response containing the image; (3) the browser stores the image in its cache]

Client Caching : 

[Figure: when the browser needs the image again, it either repeats the HTTP request and response with the web server, or gets the image from its cache]

Client Caching : 

- Cache advantages
  - (Much) faster than HTTP request/response
  - Less network traffic
  - Less load on server
- Cache disadvantage
  - Cached copy of resource may be invalid (inconsistent with remote version)

Client Caching : 

- Validating a cached resource:
  - Send HTTP HEAD request and check Last-Modified or ETag header in response
  - Compare current date/time with Expires header sent in response containing resource
  - If no Expires header was sent, use a heuristic algorithm to estimate a value for Expires
    - Ex: Expires = 0.01 * (Date - Last-Modified) + Date
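
The example heuristic can be worked through in code. In this sketch the two header values are invented for illustration, and `estimate_expires` is a hypothetical helper; `parsedate_to_datetime` from Python's standard library parses the HTTP date format.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime


def estimate_expires(date: datetime, last_modified: datetime) -> datetime:
    """Apply Expires = 0.01 * (Date - Last-Modified) + Date."""
    return date + 0.01 * (date - last_modified)


# Header values below are made up for illustration.
date = parsedate_to_datetime("Thu, 09 Oct 2003 20:30:49 GMT")
last_modified = parsedate_to_datetime("Mon, 01 Sep 2003 20:30:49 GMT")
expires = estimate_expires(date, last_modified)
print(expires)
```

Here the resource was unchanged for 38 days, so the estimate treats the cached copy as fresh for 1% of that time, a little over 9 hours after the response date.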

Character Sets : 

- Every document is represented by a string of integer values (code points)
- The mapping from code points to characters is defined by a character set
- Some header fields have character set values:
  - Accept-Charset: allows the client to express preferences to the server about the character sets that the client can recognize
    - Ex: accept-charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
    - This header field says that the client would prefer to receive documents using the ISO-8859-1 character set or the UTF-8 encoding
  - Content-Type: allows for the indication of the character set used to represent the body of the HTTP message
    - Ex: Content-Type: text/html; charset=UTF-8
    - This header field indicates that the body of the message is an HTML document written using the UTF-8 character encoding

Character Sets : 

- Technically, many “character sets” are actually character encodings
  - Character encodings represent code points using variable-length byte strings
  - An encoding must be decoded into a code point integer that is then mapped to a character
  - Common characters are represented using shorter strings, and less common characters using longer strings
  - Most common examples are the Unicode-based encodings UTF-8 and UTF-16
- IANA (Internet Assigned Numbers Authority) maintains the complete list of Internet-recognized character sets/encodings

Character Sets : 

- A typical US PC produces ASCII documents
- The US-ASCII character set can be used for such documents, but is not recommended
- UTF-8 and ISO-8859-1 are supersets of US-ASCII and provide international compatibility
  - UTF-8 can represent all ASCII characters using a single byte each, and arbitrary Unicode characters using up to 4 bytes each
  - ISO-8859-1 is a 1-byte code that has many characters common in Western European languages, such as é
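
The difference between a code point and its encoded bytes is easy to see interactively. The sketch below encodes three sample characters (chosen for this example, written as escapes to avoid ambiguity) and shows that UTF-8 uses 1, 2, and 3 bytes for them, while ISO-8859-1 uses one byte for é and cannot represent the euro sign at all.

```python
characters = ("A", "\u00e9", "\u20ac")  # A, e with acute accent, euro sign

for character in characters:
    code_point = ord(character)
    utf8 = character.encode("utf-8")
    print(f"U+{code_point:04X}", character, len(utf8), utf8)

# ISO-8859-1 is a 1-byte code: the accented e fits, the euro sign does not.
latin1 = "\u00e9".encode("iso-8859-1")
print(latin1)
try:
    "\u20ac".encode("iso-8859-1")
    euro_fits = True
except UnicodeEncodeError:
    euro_fits = False
print("euro sign representable in ISO-8859-1:", euro_fits)
```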

Web Clients : 

- A web client is software that accesses a web server by sending an HTTP request message and processing the resulting HTTP response
- Most common form: Web browsers running on desktop or laptop computers
- Also: text-only “browser” (lynx), browsers running on cell phones, robots (software-only clients, e.g., search engine “crawlers”), etc.
- We will focus on traditional web browsers

Web Browsers : 

- The Mosaic browser, developed at the National Center for Supercomputing Applications (NCSA) in 1993, was the starting point for bringing graphical web browsers to the general public

Web Browsers : 

- Primary tasks:
  - Reformat the URL entered as a valid HTTP request message
  - If the server is specified using a host name (rather than an IP address), use DNS to convert this name to the appropriate IP address
  - Establish a TCP connection using the IP address of the specified web server
  - Send HTTP request over the TCP connection and wait for the server’s response
  - Appropriately display (render) the document returned by the server

HTTP URL’s : 

- Browser uses the authority to connect via TCP
- Request-URI is included in the start line of the HTTP request (/, known as the root path, is used for the path if none is supplied)
- Fragment identifier is not sent to the server (used by browsers to scroll HTML documents in the browser client area)
- Example: http://www.example.org:56789/a/b/c.txt?t=win&s=chess#para5
  - host (FQDN): www.example.org
  - port: 56789
  - authority: www.example.org:56789
  - path: /a/b/c.txt
  - query: t=win&s=chess
  - fragment: para5
  - Request-URI: /a/b/c.txt?t=win&s=chess
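
The same decomposition can be reproduced with `urlsplit` from Python's standard library, as a rough check on the parts named above. The Request-URI is not a field that `urlsplit` returns, so this sketch rebuilds it from the path and query.

```python
from urllib.parse import urlsplit

url = "http://www.example.org:56789/a/b/c.txt?t=win&s=chess#para5"
parts = urlsplit(url)

path = parts.path or "/"  # the root path is used when none is supplied
request_uri = path + ("?" + parts.query if parts.query else "")

print("scheme:     ", parts.scheme)
print("authority:  ", parts.netloc)
print("host:       ", parts.hostname)
print("port:       ", parts.port)
print("path:       ", parts.path)
print("query:      ", parts.query)
print("fragment:   ", parts.fragment)  # kept by the browser, not sent to the server
print("Request-URI:", request_uri)
```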

Web Browsers : 

- Standard features
  - Save web page to disk
  - Find string in page
  - Fill forms automatically (browser can remember information entered in certain forms, such as passwords, phone number, etc.)
  - Set preferences (language, character set, cache and HTTP parameters)
  - Modify display style (e.g., increase font sizes)
  - Display raw HTML source and HTTP header info (e.g., Last-Modified, etc.)
  - Choose browser themes (sometimes called “skins”), i.e., the look of one or more of the browser bars
  - View history of web addresses visited
  - Bookmark favorite pages for easy return

Web Browsers : 

- Additional functionality:
  - Execution of scripts (running programs to perform a variety of tasks)
  - Event handling (performing a variety of actions in response to an event such as a mouse click)
  - GUI for controls (allowing the user to perform text-editing functions if a web page contains a form with fill-in fields)
  - Secure communication with servers (encoding sensitive information sent to server)
  - Display of non-HTML documents (e.g., PDF) via plug-ins

Web Servers : 

- Basic functionality:
  - Receive HTTP request via TCP
  - Map Host header to specific virtual host (one of many host names sharing an IP address)
  - Map Request-URI to specific resource associated with the virtual host
    - File: return file in HTTP response
    - Program: run program and return output in HTTP response
  - Map type of resource to appropriate MIME type and set Content-Type header in HTTP response
  - Log information about the request and response

Web Servers : 

- NCSA’s httpd web server was also the starting point for server development
  - NCSA discontinued development of the server in the mid-1990s
- Apache: a “patchy” version of httpd, now the most popular server (esp. on Linux platforms)
- IIS: Microsoft Internet Information Server
- Tomcat: popular, free, and Java-based
  - Can run as a servlet container called on by web servers (such as Apache or IIS) for executing Java servlets (HTML-generating programs)
  - Can also run as a stand-alone web server that communicates directly with web clients
  - Configuration can be broken into 2 areas:
    - External communication: corresponds to the Coyote Java package, which provides HTTP/1.1 communication
    - Internal processing: corresponds to the Catalina Java package, which is the actual servlet container

Web Servers : 

- Some Coyote communication parameters:
  - Allowed or blocked IP addresses
  - Maximum number of simultaneous active TCP connections
  - Maximum number of queued TCP connection requests
  - “Keep-alive” time for inactive TCP connections
- The setting of these parameters can have a significant influence on the performance of the server
- Changing the values of these parameters in order to optimize performance is referred to as tuning the server

Web Servers : 

- Some Catalina container parameters:
  - Virtual host names and associated ports
  - Logging preferences
  - Mapping from Request-URIs to server resources
  - Password protection of resources
  - Use of server-side caching

Tomcat Web Server : 

- Suppose a Tomcat 5.0 server has been installed at the default port 8080 (see Appendix A for installation instructions)
- Open a browser on the machine running the server, browse to http://localhost:8080 and click on the Server Administration link
  - This should cause a log-in page to be displayed
  - Enter the user name and password that you specified when you installed Tomcat
- Note that localhost is a special host name that means “this machine”

Tomcat Web Server : 

- Some Connector fields:
  - Port Number: port “owned” by this connector
  - Max Threads: max connections processed simultaneously
  - Connection Timeout: keep-alive time

Tomcat Web Server : 

- Each Host is a virtual host (can have multiple per Connector)
- Some fields:
  - Host: localhost or a fully qualified domain name
  - Application Base: directory (may be a path relative to the JWSDP installation directory) containing resources associated with this Host

Tomcat Web Server : 

- Context provides a mapping from Request-URI path to a web application
- Document Base field is the directory (possibly relative to Application Base) that contains resources for this web application
- For this example, browsing to http://localhost:8080/ returns a resource from c:\jwsdp-1.3\webapps\ROOT
  - Returns index.html (standard welcome file)

Tomcat Web Server : 

- Web server logs record information about server activity
- The primary web server log recording normal activity is the access log, a file that records information about HTTP requests processed by the server
  - Parameters set using AccessLogValve
  - Default location: logs/access_log.* under the JWSDP installation directory
  - Example “common” log format entry (one line):
    www.example.org - admin [20/Jul/2005:08:03:22 -0500] "GET /admin/frameset.jsp HTTP/1.1" 200 920
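
An access log entry in the common log format can be pulled apart with a regular expression. The pattern below is a sketch written for this example rather than anything supplied by Tomcat. The field names follow the usual reading of the format: client host, identity, authenticated user, timestamp, request start line, status code, and response size in bytes.

```python
import re

COMMON_LOG = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

entry = ('www.example.org - admin [20/Jul/2005:08:03:22 -0500] '
         '"GET /admin/frameset.jsp HTTP/1.1" 200 920')

match = COMMON_LOG.fullmatch(entry)
fields = match.groupdict()
# The quoted part is the request start line: method, Request-URI, version.
method, request_uri, version = fields["request"].split(" ")

print(fields["host"], fields["user"], fields["time"])
print(method, request_uri, version, fields["status"], fields["size"])
```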

Tomcat Web Server : 

- Other logs provided by default in JWSDP:
  - Message log: messages sent to the log service by web applications or Tomcat itself
    - logs/jwsdp_log.*: default message log
    - logs/localhost_admin_log.*: message log for web applications within /admin context
  - System.out and System.err output (exception traces often found here): logs/launcher.server.log

Tomcat Web Server : 

- Access control:
  - Provide password protection for resources that it serves
    - Users and roles defined in conf/tomcat-users.xml
  - Deny access to machines
    - Useful for denying access to certain users by denying access from the machines they use
    - List of denied machines maintained in RemoteHostValve (deny by host name) or RemoteAddressValve (deny by IP address)

Secure Servers : 

- Since HTTP messages typically travel over a public network, private information (such as credit card numbers) should be encrypted to prevent eavesdropping
- The https URL scheme tells the browser to use encryption
- Common encryption standards:
  - Secure Sockets Layer (SSL) protocol
  - Transport Layer Security (TLS) protocol

Secure Servers : 

[Figure: message exchange between Browser and Web Server, with a TLS/SSL layer beneath the HTTP requests and responses on each side]
- Browser: “I’d like to talk securely to you (over port 443)”
- Web Server: “Here’s my certificate and encryption data”
- Browser: “Here’s an encrypted HTTP request”
- Web Server: “Here’s an encrypted HTTP response”
- The encrypted request and response exchange then repeats

Secure Servers: Man-in-the-Middle Attack : 

[Figure: man-in-the-middle attack]
- Browser asks a fake DNS server: “What’s the IP address for www.example.org?”
- Fake DNS server answers 100.1.1.1, the address of a fake www.example.org rather than the real www.example.org
- Browser tells the fake server: “My credit card number is…”

Secure Servers: Preventing Man-in-the-Middle : 

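[Figure: preventing a man-in-the-middle attack]
- Browser asks a fake DNS server: “What’s the IP address for www.example.org?”
- Fake DNS server answers 100.1.1.1, the address of a fake www.example.org rather than the real www.example.org
- Browser asks the server at 100.1.1.1: “Send me a certificate of identity”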
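
A client program can request the same certificate check. The sketch below uses Python's standard `ssl` module and only inspects the default client settings; it does not open a connection.

```python
import ssl

# Default client-side context; it loads the system's trusted CA certificates.
context = ssl.create_default_context()

# Both checks are enabled by default for client-side contexts.
certificate_required = context.verify_mode == ssl.CERT_REQUIRED
host_name_checked = context.check_hostname

print("certificate must chain to a trusted authority:", certificate_required)
print("certificate must name the requested host:", host_name_checked)
```

With these settings, wrapping a TCP socket via `context.wrap_socket(sock, server_hostname="www.example.org")` raises `ssl.SSLCertVerificationError` when the server cannot present a valid certificate for that host name, which is the situation a fake server would be in.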

Case Study : 

- What web server will we use?
  - Tomcat
- What web browsers will we support?
  - IE6, Mozilla
- What level of security will we implement?
  - Non-secure (http scheme)
  - Password required to add to blog