
Understanding The Basics of HTTP Protocol

Before starting to talk about HTTP, I begin with a definition of socket interfaces. A socket interface is the “door” between the application layer (here HTTP) and the transport layer (here TCP). So if two hosts communicate at the application layer (for instance over HTTP), then each host has a socket interface.

http-socket-interface
Figure: Socket interfaces in the interaction between the application layer and the transport layer – © homepage.smc.edu

Sending requests

The client process at the application layer sends HTTP requests down to the Transport layer through the socket interface. At the server side, the Transport layer receives the HTTP request and hands it to the server process through the socket interface.

Receiving responses

The server process at the application layer sends an HTTP reply down to the transport layer through the socket interface. When the client’s TCP connection receives the HTTP response, it forwards it to the client process through the socket interface.

Note here that HTTP is not concerned with the integrity or reliability of the data; that is the job of the underlying TCP connection.
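To make the socket interface concrete, here is a minimal sketch in Python. It uses a local stand-in server (not a real web server), and the request and reply are made up for illustration: the client process writes an HTTP request into its socket “door”, TCP carries it across, and the server process reads it from its own socket.

```python
import socket
import threading

# Canned reply our stand-in "server process" hands down through its socket
CANNED_REPLY = b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok"

def serve_once(listener):
    conn, _ = listener.accept()   # server-side socket interface
    conn.recv(4096)               # TCP hands the HTTP request up to the server process
    conn.sendall(CANNED_REPLY)    # the reply goes back down through the socket
    conn.close()

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))   # ephemeral local port, no real web server involved
listener.listen(1)
threading.Thread(target=serve_once, args=(listener,), daemon=True).start()

# Client process: the socket is its "door" to the transport layer
client = socket.create_connection(listener.getsockname())
client.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")
reply = client.recv(4096)
client.close()
listener.close()
print(reply.decode().split("\r\n")[0])   # the status line of the reply
```

Neither process ever touches TCP directly; each only reads and writes through its socket, exactly as in the figure above.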

What does a web page consist of?

A web page consists of a base HTML file and a set of referenced objects. An object can be text, an image, a video, a script,…

The objects are referenced by their URLs.  Let’s step back and recall the format of a URL:

Protocol://Hostname/pathname

Here is an example of a URL of a referenced object:

HTTP-url-host-name-path-name
Anatomy of a URL © www.slideshare.net
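Python’s standard library splits a URL into exactly these parts with urllib.parse (the URL below is a made-up example):

```python
from urllib.parse import urlparse

url = "http://www.example.com/somedir/page.html"  # hypothetical URL
parts = urlparse(url)

protocol = parts.scheme   # "http"
hostname = parts.netloc   # "www.example.com"
pathname = parts.path     # "/somedir/page.html"
print(protocol, hostname, pathname)
```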

There are two types of HTTP messages: request messages and response messages.

Both types are written in plain ASCII text. And both types use the following characters:

sp: “space”.

cr: carriage return. It moves the cursor back to the beginning of the line.

lf: line feed. It moves down to the next line. Together, cr and lf mark the end of each line in an HTTP message.

HTTP Request message format

The most popular HTTP methods are: GET, POST, HEAD.

The GET method is used to request a web page. The POST method is used when a user fills an online form or does an Internet search (Google, Yahoo!…). The HEAD method asks the server for the response headers only, without the entity body; it is used especially by developers for debugging.

The entity body is empty with the GET method. But with the POST method, it contains data (the user-filled form values, the search string, and so on).

 

http-request-message-format
Figure: HTTP Request message format © Computer Networking: A Top Down Approach
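As a sketch, the format above (request line, header lines, blank line, entity body, all separated by cr lf) can be assembled by hand. The helper below is hypothetical, written only to illustrate the message layout:

```python
def build_request(method, path, host, body=b""):
    # request line: method sp URL sp version, terminated by cr lf
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    if body:
        # POST carries data in the entity body, so announce its length
        lines.append(f"Content-Length: {len(body)}")
    # a blank line (cr lf on its own) separates the headers from the entity body
    return ("\r\n".join(lines) + "\r\n\r\n").encode() + body

get_msg = build_request("GET", "/index.html", "www.example.com")        # empty body
post_msg = build_request("POST", "/search", "www.example.com", b"q=http")
print(get_msg.decode(), end="")
```

Printing get_msg shows the same layout as the figure: the request line, the Host header, then the empty line that ends the headers.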

HTTP Response message format

http-response-message-format
Figure: HTTP Response message format © Computer Networking: A Top Down Approach

HTTP status codes

The status codes can be classified as follows:

1xx: informational

2xx: Success

3xx: Redirect

4xx: Client error

5xx: Server error

A real HTTP error message example is the CUCM 404 ccmadmin error.
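Since the first digit alone determines the class, classifying a status code is a one-liner. The helper below is a small illustrative function, not a standard API:

```python
def status_class(code):
    # the leading digit of an HTTP status code names its class
    classes = {1: "informational", 2: "success", 3: "redirect",
               4: "client error", 5: "server error"}
    return classes[code // 100]

print(status_class(200), status_class(404), status_class(503))
```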

Other notes

An HTTP server does not maintain the state of the connection. This means that if a web client requests page 1 at time “t”, then requests the same page at time “t+x”, the server will simply answer the second request as if it came from a new client. That’s why HTTP is described as a stateless protocol.

What about merchant websites like Amazon? How can the Amazon website remember my shopping cart? Well, it’s true that web servers are stateless, but they can maintain some state if the client accepts it. That’s the concept of cookies.
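A rough sketch of the mechanism with Python’s http.cookies (the session id value is made up): the server sends a Set-Cookie header once, and the client replays the stored value in a Cookie header on every later request, which is all the “state” the stateless server needs.

```python
from http.cookies import SimpleCookie

# What a server's first response might carry (hypothetical session id):
set_cookie_header = "session-id=1678; Path=/"

# The client stores it in its cookie jar...
jar = SimpleCookie()
jar.load(set_cookie_header)

# ...and sends it back on every subsequent request to the same site,
# letting the server recognize the same shopping cart each time
cookie_header = "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())
print("Cookie:", cookie_header)
```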

HTTP version 1.0

With HTTP 1.0, when a client wants to open a web page that has one image, it creates one TCP connection to request the page, then a second TCP connection to request the image. In fact, version 1.0 creates a TCP connection for each HTTP object (web page, script, image, and so on).

When calculating web page loading time, we can make the following assumptions:

  • usually the packetization time of the SYN, SYN/ACK and ACK segments is negligible
  • since the ACK packetization time is negligible, the client can send its request to the server immediately after the handshake
  • if the web page has many images to load (5 images, for example), each image requires a TCP connection by itself. Pay attention here: you cannot open them all simultaneously unless the browser’s limit on concurrent TCP connections allows it
  • in an HTTP file transfer, if the RTT is constant and the packetization time is negligible, the transfer time of 1 segment (from server to client) is the same as the transfer time of N segments
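Under these assumptions, the classic back-of-the-envelope model looks like this. It is a deliberate simplification (transmission time and TCP slow start are ignored), and the numbers are hypothetical:

```python
def nonpersistent_load_time(rtt, n_images):
    # HTTP 1.0: every object (base page + each image) needs its own
    # TCP handshake (1 RTT) plus one request/response round trip (1 RTT)
    return 2 * rtt * (1 + n_images)

def persistent_load_time(rtt, n_images):
    # HTTP 1.1: one handshake, then one round trip per object
    return rtt + rtt * (1 + n_images)

rtt = 0.1  # hypothetical 100 ms round-trip time
print(nonpersistent_load_time(rtt, 5))  # HTTP 1.0: 12 RTTs -> 1.2 s
print(persistent_load_time(rtt, 5))     # HTTP 1.1:  7 RTTs -> 0.7 s
```

Even in this crude model, the page with 5 images loads almost twice as fast once the handshakes are no longer repeated per object.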

It was observed that with version 1.0, rich web pages take longer to load, which has a negative impact on the user experience, according to research (1) by Google and Microsoft. So we need mechanisms to reduce the page loading time. One basic mechanism is to reduce the number of TCP connections. And that’s what HTTP 1.1 brings to the world.

HTTP version 1.1

  • Web browsers began using it in 1998
  • The client can request many files over the same connection: when a web page contains many images, the client requests the page and the images over a single TCP connection (back-to-back requests and responses)
  • It can be up to four times faster than HTTP 1.0
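The back-to-back behavior can be sketched with the same kind of local stand-in server as before: one TCP connection, several request/response exchanges in a row (the paths and reply are made up):

```python
import socket
import threading

REPLY = b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: keep-alive\r\n\r\nok"

def serve(listener, n_requests):
    conn, _ = listener.accept()
    for _ in range(n_requests):   # answer several requests on the SAME connection
        conn.recv(4096)
        conn.sendall(REPLY)
    conn.close()

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
threading.Thread(target=serve, args=(listener, 2), daemon=True).start()

client = socket.create_connection(listener.getsockname())
replies = []
for path in ("/index.html", "/logo.png"):   # hypothetical page + one image
    client.sendall(f"GET {path} HTTP/1.1\r\nHost: localhost\r\n\r\n".encode())
    replies.append(client.recv(4096))       # one reply per request, same connection
client.close()
listener.close()
print(len(replies), "responses over one TCP connection")
```

With HTTP 1.0, the same two objects would have cost two separate connections and two handshakes.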

There are other techniques to speed up the web page loading speed, such as:

  • reducing the number of cookies
  • caching
  • using Content Delivery Networks
  • leveraging web browser capabilities for stylesheets, etc.

Methods to improve HTTP performance

  • parallel TCP sessions
  • persistent HTTP sessions
  • worker processes problem -> no longer a problem with newer web servers such as nginx
    • keepalive
  • HTTP pipelining
    • head-of-line blocking: possible if response packets are big
  • Increase initial congestion window to 10
    • Google’s proposal
    • easily done on Linux servers with a simple CLI command
    • some known load balancers (like F5) provide the possibility to modify the initial congestion window size
  • TCP Fast Open
    • a Google proposal, still in the experimental phase
    • request sent with SYN
    • HTTP answer sent before the client sends back ACK
    • gain 1 RTT (the initial RTT of the 3 way handshake)
    • firewalls in the network path may drop such packets
    • danger: TCP SYN floods are more harmful with TCP Fast Open -> solution: implement TCP Fast Open cookies
    • cookies sent and received between the client and the server with each packet
  • Minification
  • Compression
  • Caching: the best method to improve web performance
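To illustrate the compression item, here is a quick sketch with Python’s gzip module; the page body below is synthetic, standing in for a real text-heavy HTML response:

```python
import gzip

# A synthetic, highly repetitive HTML body, standing in for a real page
body = b"<html><body>" + b"<p>hello http</p>" * 200 + b"</body></html>"

compressed = gzip.compress(body)
print(f"{len(body)} bytes -> {len(compressed)} bytes "
      f"({100 * len(compressed) / len(body):.0f}% of original)")

# The client transparently inflates it back (Content-Encoding: gzip)
restored = gzip.decompress(compressed)
```

Fewer bytes on the wire means fewer segments, and therefore fewer round trips before the page renders.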

SPDY

  • This is a protocol introduced and used by Google. It leverages pipelining: instead of loading a page and then waiting for the images to load, it loads the page and the images in parallel.
  • SPDY is the basis of HTTP 2.0
  • it is a framing layer that sits above TCP (or, optionally, TLS)
  • allows parallel HTTP sessions (requests and responses) or “SPDY sessions” over a single TCP connection
  • sessions that are opened over the TCP connection use IP addresses and not hostnames
  • is not a Session layer protocol
  • in theory, SPDY can be used with other application protocols. But initially it was made for HTTP
    • already supported by F5 WebAccelerators
  • SYN_STREAM packets create a new SPDY stream. Many SYN_STREAM packets can be created over a single TCP connection. Of course, before that, a TCP connection is established
  • to end a SPDY stream, for instance when there is no more data to send, a FIN packet is sent.
  • communication is bidirectional (request and response) or server-initiated (server PUSH)
  • in theory, it should be faster than HTTP and HTTPS. But, in reality, it is almost as fast as HTTPS, because of the way web pages and servers are coded
  • real-life implementations of SPDY run with TLS.

Testing websites and web applications

  • test on 3G networks
  • test on mobile devices to detect potential JavaScript issues

References

  • (1) http://radar.oreilly.com/2009/06/bing-and-google-agree-slow-pag.html
  • Computer Networking: A Top Down Approach
  • http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-170-software-studio-spring-2013/lecture-notes/MIT6_170S13_07-http-prtcol.pdf
  • https://www.udemy.com/tcp-http-spdy-deep-dive
