Application Layer – Hypertext Transfer Protocol (HTTP) and HTTP Proxy Server (Caching)

Howdy fellas !!

In the previous post, we looked at the most frequently used terms on the internet. By now it should be clear that the World Wide Web (WWW) provides a standard framework for accessing documents and resources located on computers connected to the internet.

In this post, we will understand what the term ‘HTTP’ at the start of a URL actually means.

The information offered to the user (client) is stored on web servers in the form of web pages. These web pages are written in a language called HyperText Markup Language (HTML) and are interconnected by hyperlinks (or simply links).

Any user can view these pages with a web browser (the client program) such as Google Chrome, IE, Firefox or Safari. Each link gives the browser a uniform resource locator (URL) that specifies the name of the machine where the web page is located as well as the name of the file that contains the requested document. The communication between the web browser (client) and the web server is carried by the Hypertext Transfer Protocol (HTTP).

The main purpose of HTTP is to access information on the WWW. The protocol can transfer data in various forms such as plain text, hypertext, audio and video.

The function of HTTP resembles a combination of FTP and SMTP (discussed in the previous posts). It uses the services of TCP, over a single TCP connection on the well-known port 80.

Unlike FTP, there is no separate control connection. Only data is transferred between the client and the server, so there is a single connection and it is the data connection.

The data transfer in HTTP is similar to SMTP: the format of the messages is controlled by MIME-like headers.

Working of HTTP in simple terms :

HTTP mainly defines how the client asks the web server for a service (fetching a piece of information or a document). Let us go through the steps involved in this communication:

  • The client (end user) clicks on a link or enters a URL.
  • The browser program (Chrome, Firefox, Safari etc.) determines the URL to request.
  • It then gets the IP address of the server using DNS (Domain Name System).
  • The browser opens a TCP connection (port 80) to the server and sends its HTTP request.
  • The document is transferred from the server to the client.
  • The browser displays the document and the user starts to view it.
  • The established TCP connection is closed after a certain timeout period.
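The steps above can be sketched in a few lines of Python. This is only a rough illustration and not how a real browser works: the helper names `split_url` and `fetch` are made up for this post, it handles only plain `http://` URLs, and it asks the server to close the connection after one response.

```python
import socket
from urllib.parse import urlsplit


def split_url(url):
    """Break a URL into the host, port and path the browser needs."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80          # plain HTTP defaults to TCP port 80
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return host, port, path


def fetch(url):
    """Hypothetical helper: fetch one document over a fresh TCP connection."""
    host, port, path = split_url(url)
    ip_address = socket.gethostbyname(host)            # DNS lookup
    host_header = host if port == 80 else f"{host}:{port}"
    with socket.create_connection((ip_address, port), timeout=10) as sock:
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host_header}\r\n"
            "Connection: close\r\n"                    # one request, then close
            "\r\n"
        )
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:                               # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)                            # status line, headers and body
```

Calling `fetch("http://example.com/")` (a placeholder address) would return the raw bytes of the response, which a browser would then parse and display.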

Principle of HTTP Operation

The principle of HTTP is straightforward. A client sends a request, then the server sends a response. The request and response messages carry data in a letter-like form with a MIME-like format.

The client starts the transaction by sending a request message, and the server replies by sending a response message.

HTTP is the web’s application layer protocol and an integral part of the Web. It is defined in [RFC 1945] and [RFC 2616]. It is implemented in two programs:

  • Client program   
  • Server program

The HTTP client first initiates a TCP connection with the server. Once the connection is established, the browser and server processes access TCP through their socket interfaces.

Unlike SMTP, HTTP messages are not destined to be read by humans; they are read and interpreted by the HTTP server and the HTTP client (browser).

SMTP messages are stored and forwarded, but HTTP messages are delivered immediately. The commands from the client to the server are embedded in a request message.

Statelessness Associated With HTTP

In HTTP, the server sends the requested files to the client without storing any state information about the client. So if the same client asks for the same information again and again, the server does not recognize the repetition and simply resends the files each time.

Because HTTP servers do not maintain any information about the state of the client, HTTP is called a stateless protocol. (The concept of cookies was already explained in the previous post.)
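A toy illustration of statelessness, with no networking at all: `DOCUMENTS` and `handle_request` are invented names, and the "server" here is just a function that answers each request from the request alone.

```python
# Hypothetical document store for this illustration.
DOCUMENTS = {"/index.html": "<html>Hello</html>"}


def handle_request(path):
    """Answer a request using only what is in the request itself.

    Nothing about the caller is remembered between calls, so the
    server cannot tell a repeated request from a first one.
    """
    if path in DOCUMENTS:
        return 200, DOCUMENTS[path]
    return 404, ""


first = handle_request("/index.html")
second = handle_request("/index.html")   # the same client asks again
```

Both calls do the full work and return the same result, because the function keeps no record of who asked before.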

Persistent and Non-Persistent Connections in HTTP

HTTP can use both non-persistent and persistent connections. HTTP version 1.1 uses persistent connections by default, but HTTP clients and servers can be configured to use non-persistent connections as well.

Non-Persistent Connection

In a non-persistent connection, one TCP connection is made for each request/response. The steps in this strategy are:

  • The client first opens a TCP connection and sends a request.
  • The server then sends the response and closes the connection.
  • The client reads the data until it encounters an end-of-file marker, then closes the connection.

In this strategy, for N different pictures in N different files, the connection must be opened and closed N times. The non-persistent strategy also imposes a high overhead on the server, because the server needs N different buffers and goes through a slow start procedure each time a connection is opened.

Thus a non-persistent connection means more delay and more wasted memory.

Persistent Connection

HTTP version 1.1 specifies a persistent connection by default. In a persistent connection, the server leaves the connection open for more requests after sending a response. The server can close the connection at the request of the client or when a time-out has been reached.

The sender usually sends the length of the data with each response. However, there are some occasions when the sender does not know the length of the data, for example when a document is created dynamically or actively.

In those cases, the server informs the client that the length is not known and closes the connection after sending the data, so the client knows that the end of the data has been reached.
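To get a feel for the difference in delay, here is a back-of-the-envelope calculation. It rests on a simplified model that is an assumption of this sketch, not something measured: opening a TCP connection costs one round trip, each request/response costs one more, objects are fetched one after another, and transmission time and slow start are ignored. The function names are made up.

```python
def round_trips(num_objects, persistent):
    """Round trips needed to fetch num_objects under the simplified model."""
    if num_objects == 0:
        return 0
    if persistent:
        return 1 + num_objects       # one handshake, then one trip per object
    return 2 * num_objects           # handshake plus request for every object


def total_delay_ms(num_objects, rtt_ms, persistent):
    """Total waiting time in milliseconds for a given round-trip time."""
    return round_trips(num_objects, persistent) * rtt_ms
```

For 10 pictures and a 50 ms round trip, this model gives 1000 ms with non-persistent connections and 550 ms with one persistent connection.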

HTTP Transaction (message formats)

The first line in a request message is called the request line, and the first line in a response message is called the status line. A request message normally consists of a request line, headers, and sometimes a body. Likewise, a response message consists of a status line, headers, and sometimes a body.

The request line format is:

Request type (space) URL (space) HTTP version

The status line format is:

HTTP version (space) Status code (space) Status phrase

HTTP messages are written in ASCII text. The HTTP version is the only field common to the request line and the status line.
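The two line formats are simple enough to build and take apart by hand. A small sketch, with helper names invented for this post:

```python
def build_request_line(method, url, version="HTTP/1.1"):
    """Request type, URL and HTTP version separated by single spaces."""
    return f"{method} {url} {version}"


def parse_status_line(line):
    """Split a status line into version, numeric code and phrase."""
    # Split at most twice, because the status phrase may itself contain spaces.
    version, code, phrase = line.strip().split(" ", 2)
    return version, int(code), phrase
```

For example, `build_request_line("GET", "/index.html")` gives `GET /index.html HTTP/1.1`, and parsing `HTTP/1.1 404 Not Found` gives the version, the number 404 and the phrase `Not Found`.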

Request type: This field is used in the request message. Version 1.1 of HTTP defines several request types, called methods, some of which are shown below:

Method   Action
GET      Requests a document from the server
POST     Sends some information from the client to the server
PUT      Sends a document from the client to the server
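To see how the method fits into a whole message, here is a sketch that assembles a request as text: request line, headers, a blank line, and an optional body. `build_request` is a hypothetical helper and `example.com` is a placeholder; a real POST would usually also carry a Content-Type header, which is left out here to keep things short.

```python
def build_request(method, path, host, body=""):
    """Assemble a minimal HTTP/1.1 request message as a string."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    if body:
        # Tell the server how many bytes of body follow the blank line.
        lines.append(f"Content-Length: {len(body.encode('ascii'))}")
    # Header lines end with CRLF; an empty line separates headers from body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


get_message = build_request("GET", "/index.html", "example.com")
post_message = build_request("POST", "/form", "example.com", "name=aric")
```

The GET message has no body at all, while the POST message carries `name=aric` after the blank line.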

URL: The address of the machine and the file.

Version: The HTTP version.

Status code: This field is used in the response message. The status code field is similar to those in the FTP and SMTP protocols. It consists of three digits (see the table below).

Status phrase: This field is used in the response message. It explains the status code in text form.

Status code   Status phrase            Description
100           Continue                 The initial part of the request has been received, and the client may continue with its request.
200           OK                       The request is successful.
301           Moved permanently        The requested URL is no longer used by the server.
302           Moved temporarily        The requested URL has moved temporarily.
400           Bad request              There is a syntax error in the request.
403           Forbidden                Service is denied.
404           Not found                The document is not found.
500           Internal server error    There is an error, such as a crash, at the server site.
501           Not implemented          The action requested cannot be performed.
503           Service unavailable      The service is temporarily unavailable, but may be requested in the future.
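Python's standard library already knows these codes through `http.HTTPStatus`, so a quick lookup can be sketched as below. The helper names `describe` and `category` are made up. Note that the library follows the newer naming, so it reports 302 as "Found" rather than "Moved temporarily".

```python
from http import HTTPStatus

# The first digit of a status code tells the general kind of response.
CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code):
    """Return the code together with its standard status phrase."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"


def category(code):
    """Classify a three-digit status code by its first digit."""
    return CATEGORIES[code // 100]
```

So `describe(404)` gives `404 Not Found`, and `category(404)` gives `client error`.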

HTTP Header Format

Headers in request and response messages are used for exchanging additional information between the client and the server. The header section can be a single line or multiple lines, each in the form:

Header name: (space) Header value

Now let us look at some header names and their descriptions:

Header           Description
Date             Shows the current date
Content-Length   Shows the length of the document
MIME-Version     Shows the MIME version used
Accept           Shows the media formats the client can accept
Server           Shows the server name and version number
HTTP Proxy Server and Caching

When the volume of requests for data from popular sites becomes sufficiently large, it makes sense to cache web information (keep extra copies of it) on servers closer to the user. A web proxy server can be deployed by any ISP (internet service provider) to control the traffic. HTTP supports proxy servers.

A proxy server is a computer that keeps copies of responses to recent requests. The HTTP client sends its request to the proxy server, and the proxy server checks its cache. If the response is stored in the cache, the proxy answers the client directly.

If the response is not stored in the cache, the proxy server sends the request on to the corresponding server. The response comes back to the proxy server, which passes it to the client and stores it for future requests from other clients.

The proxy server reduces the load on the original server by decreasing the traffic, and also improves latency. However, to use the proxy server, the client must be configured to access the proxy instead of the target server.
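The cache-or-forward decision can be sketched as a tiny class. Everything here is hypothetical: `CachingProxy` and `fake_origin` are names made up for this post, the "origin server" is just a function, and real proxies also deal with expiry and freshness rules that this sketch ignores.

```python
class CachingProxy:
    """Toy proxy: answer from the cache when possible, else ask the origin."""

    def __init__(self, fetch_from_origin):
        self._fetch_from_origin = fetch_from_origin   # callable taking a URL
        self._cache = {}
        self.origin_requests = 0                      # how often the origin was contacted

    def get(self, url):
        if url in self._cache:                        # cache hit: origin is not bothered
            return self._cache[url]
        self.origin_requests += 1                     # cache miss: forward the request
        response = self._fetch_from_origin(url)
        self._cache[url] = response                   # keep a copy for later clients
        return response


def fake_origin(url):
    """Stand-in for the real web server."""
    return f"contents of {url}"


proxy = CachingProxy(fake_origin)
first = proxy.get("http://example.com/a")
second = proxy.get("http://example.com/a")            # served from the cache
```

After the two requests, `proxy.origin_requests` is still 1, which is exactly the load reduction described above.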

A proxy server can also act as a gateway that speaks HTTP to the browser while using another protocol, such as FTP, to talk to the server.

HTTP Secure (HTTPS)

HTTPS adds a layer of security by running HTTP over TLS (Transport Layer Security, which provides encryption) or its predecessor SSL (Secure Sockets Layer). It uses TCP port 443 (unlike plain HTTP, which uses TCP port 80) and is much more secure than normal HTTP.
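In Python the two default ports are available as constants in `http.client`, and the extra security layer amounts to wrapping the TCP socket with the `ssl` module before any HTTP is sent. A minimal sketch, where `default_port` and `open_secure_connection` are invented helper names:

```python
import http.client
import socket
import ssl


def default_port(scheme):
    """Return the well-known TCP port for 'http' or 'https'."""
    if scheme == "https":
        return http.client.HTTPS_PORT     # 443
    return http.client.HTTP_PORT          # 80


def open_secure_connection(host):
    """Hypothetical helper: open a TCP connection and wrap it in TLS."""
    context = ssl.create_default_context()            # verifies the server certificate
    raw_socket = socket.create_connection((host, default_port("https")), timeout=10)
    # After wrapping, everything written to the socket is encrypted.
    return context.wrap_socket(raw_socket, server_hostname=host)
```

The HTTP messages themselves stay the same; only the channel they travel over changes.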

Here comes the end of this long post (hope you enjoyed it :-p). In the next post, we will look into other important features of the application layer.

Spread the Wisdom !!

Techie Aric

Aric is a tech enthusiast who loves to write about tech-related products and 'How To' blogs. An IT engineer by profession, he currently works in the automation field at a software product company. His other hobbies include singing, trekking and writing blogs.
