HTTP Protocol

In web applications, when a server delivers a webpage to the browser, it essentially sends the HTML code of the webpage for the browser to display. The transmission protocol used between the browser and the server is HTTP. Therefore:

  • HTML is a text-based language used to define webpages. Knowing HTML allows you to create webpages.
  • HTTP is the protocol used for transmitting HTML over the network, facilitating communication between the browser and the server.

Before diving into examples, we need to install Google Chrome.

Why use Chrome instead of IE? Because we need a browser that allows us to debug our web applications easily, and Chrome provides a complete set of debugging tools, making it ideal for web development.

After installing Chrome, open it and select "View," then "Developer," and "Developer Tools" from the menu to display the developer tools:

dev-tools.webp

The "Elements" tab shows the structure of the webpage, while "Network" displays the communication between the browser and the server. Click "Network" and make sure the red record button at the left of its toolbar is on. Chrome will then record all communication between the browser and the server:

record.jpg

When we enter www.sina.com.cn in the address bar, the browser displays the Sina homepage. During this process, what exactly does the browser do? The "Network" records provide the answer. In the "Network" section, locate the first record, click it, and on the right side, you will see "Request Headers." Click "view source" to see the request sent from the browser to the Sina server:

http-request.webp

The analysis of the first two lines is as follows:

First Line:

GET / HTTP/1.1
  • GET represents a read request that retrieves webpage data from the server. The / is the URL path; a path always starts with /, and / on its own represents the homepage. The final HTTP/1.1 indicates that the HTTP protocol version is 1.1. Version 1.1 is still widely used, and most servers also accept version 1.0 (newer versions, HTTP/2 and HTTP/3, exist as well). The main difference is that version 1.1 keeps the TCP connection open by default so that multiple HTTP requests can reuse it, speeding up transmission.

Second Line:

Host: www.sina.com.cn
  • This indicates that the requested domain is www.sina.com.cn. If a server hosts multiple websites, it uses the "Host" header to differentiate which site the browser is requesting.
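
To make the role of the "Host" header concrete, here is a minimal sketch of how a server hosting several sites might pick one. The VIRTUAL_HOSTS table, the site names, the select_site function, and www.example.com are all hypothetical and chosen only for illustration; a real server is configured differently.

```python
# Hypothetical table: Host header value -> name of the site to serve.
VIRTUAL_HOSTS = {
    "www.sina.com.cn": "sina-homepage",
    "www.example.com": "example-site",
}


def select_site(request_head: str) -> str:
    """Return the site chosen by the Host header, or 'default' if none matches."""
    lines = request_head.split("\r\n")
    for line in lines[1:]:  # lines[0] is the request line, e.g. "GET / HTTP/1.1"
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host":
            return VIRTUAL_HOSTS.get(value.strip().lower(), "default")
    return "default"


request_head = "GET / HTTP/1.1\r\nHost: www.sina.com.cn\r\n"
print(select_site(request_head))  # sina-homepage
```

Header names are compared case-insensitively here because HTTP header names are not case-sensitive.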

Scroll down to "Response Headers" and click "view source" to display the original response data from the server:

http-response.webp

An HTTP response is divided into two parts: the Header and the Body (the Body is optional). The most important lines in the Header are:

HTTP/1.1 200 OK

  • This is the status line. HTTP/1.1 is the protocol version, 200 indicates a successful response, and "OK" is a short explanation. Failure responses include 404 Not Found (page not found), 500 Internal Server Error (server error), etc.
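
As a small illustration, the standard explanations that go with these codes can be looked up in Python's standard library. This sketch uses http.HTTPStatus; the describe function is a hypothetical helper name.

```python
from http import HTTPStatus


def describe(code: int) -> str:
    """Return the status code followed by its standard explanation."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"


for code in (200, 404, 500):
    print(describe(code))
# 200 OK
# 404 Not Found
# 500 Internal Server Error
```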

Content-Type: text/html

  • Content-Type specifies the type of response content. Here, text/html indicates an HTML webpage. The browser relies on Content-Type to determine whether the content is a webpage, image, video, or audio. The browser does not use the URL to determine the content type; even if the URL is http://example.com/abc.jpg, it may not necessarily be an image.
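
The rule that the header decides and the URL does not can be sketched as follows. The classify_content function is hypothetical and far simpler than what a real browser does; note that the url argument is deliberately never inspected.

```python
def classify_content(url: str, content_type: str) -> str:
    """Decide how to treat a response using only its Content-Type header."""
    # Drop parameters such as ";charset=utf-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    major = media_type.split("/", 1)[0]
    if media_type == "text/html":
        return "webpage"
    if major in ("image", "video", "audio"):
        return major
    return "other"


# The URL ends in .jpg, but the header says the content is HTML.
print(classify_content("http://example.com/abc.jpg", "text/html"))  # webpage
```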

The Body of the HTTP response contains the HTML source code. In the menu, select "View," "Developer," and "View Page Source" to view the HTML source code in the browser:

source.webp

After the browser reads the HTML source code of the Sina homepage, it parses the HTML, displays the page, and then sends additional HTTP requests to the Sina server for images, videos, Flash, JavaScript scripts, CSS, and other resources. Eventually, a complete page is displayed. Hence, we see many additional HTTP requests in the "Network" section.
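
To see where those additional requests come from, here is a rough sketch that scans HTML for resources a browser would have to fetch separately. It uses Python's html.parser; the ResourceCollector class and the sample HTML are made up for illustration, and a real browser recognizes many more kinds of resources than these three tags.

```python
from html.parser import HTMLParser


class ResourceCollector(HTMLParser):
    """Collect URLs of images, scripts, and stylesheets referenced by a page."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            self.resources.append(attrs["src"])
        elif tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            self.resources.append(attrs["href"])


html = """<html><head>
<link rel="stylesheet" href="/style.css">
<script src="/app.js"></script>
</head><body><img src="/logo.png"></body></html>"""

collector = ResourceCollector()
collector.feed(html)
print(collector.resources)  # ['/style.css', '/app.js', '/logo.png']
```

Each collected URL corresponds to one more entry that would appear in the "Network" list.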

HTTP Requests

After tracing the Sina homepage, let's summarize the HTTP request process:

Step 1: The browser sends an HTTP request to the server, which includes:

  • Method: usually GET or POST (other methods, such as PUT and DELETE, also exist). GET only requests a resource, while POST submits user data.
  • Path: /full/url/path.
  • Domain: Specified by the "Host" header: Host: www.sina.com.cn.
  • Other relevant headers.
  • If it's a POST request, the request also includes a Body containing user data.

Step 2: The server returns an HTTP response to the browser, which includes:

  • Response Code: 2xx indicates success (most commonly 200), 3xx indicates redirection, 4xx indicates client-side errors, and 5xx indicates server-side errors.
  • Response Type: Specified by Content-Type, e.g., Content-Type: text/html;charset=utf-8 for an HTML document encoded in UTF-8, or Content-Type: image/jpeg for a JPEG image.
  • Other relevant headers.
  • The HTTP response usually carries its content in the Body; for a webpage, the Body is the HTML source code.
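
The two key pieces of a response, the code and the Content-Type, can be interpreted with a few lines of Python. This is a sketch under stated assumptions: status_class and parse_content_type are hypothetical helper names, and the header parsing borrows email.message.Message from the standard library, which understands the same "type/subtype; parameter=value" syntax.

```python
from email.message import Message
from typing import Optional, Tuple


def status_class(code: int) -> str:
    """Map a response code to its class based on the first digit."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return classes.get(code // 100, "other")


def parse_content_type(value: str) -> Tuple[str, Optional[str]]:
    """Split a Content-Type value into (media type, charset or None)."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_type(), msg.get_content_charset()


print(status_class(200), status_class(404))
print(parse_content_type("text/html;charset=utf-8"))  # ('text/html', 'utf-8')
print(parse_content_type("image/jpeg"))               # ('image/jpeg', None)
```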

Step 3: If the browser needs to request additional resources, such as images, it sends another HTTP request, repeating steps 1 and 2.

The HTTP protocol used by the web adopts a simple request-response model, simplifying development. When we create a webpage, we only need to send the HTML in the HTTP response without worrying about including images or videos. The browser will send another HTTP request for those resources. Thus, each HTTP request handles one resource.
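
The whole request-response cycle can be tried locally without a real website. The following sketch starts a tiny server on 127.0.0.1 using Python's http.server, sends it one GET request with http.client, and shuts it down again. The Handler class, the fetch_once function, and the HTML it returns are all made up for illustration; the server sends only HTML and leaves any further resources to later requests.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The response carries only the HTML; images would be separate requests.
        body = b"<html><body><h1>Hello</h1></body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_once():
    """Run one request-response round trip against a throwaway local server."""
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", "/")
        resp = conn.getresponse()
        result = (resp.status, resp.getheader("Content-Type"), resp.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


status, content_type, body = fetch_once()
print(status, content_type)
print(body.decode("utf-8"))
```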

HTTP Format

Each HTTP request and response follows the same format, consisting of Header and Body parts, where the Body is optional.

The HTTP protocol (in versions 1.0 and 1.1) is a text-based protocol, making its format very simple.

HTTP GET Request Format:

GET /path HTTP/1.1
Header1: Value1
Header2: Value2
Header3: Value3

Each header is on a separate line, ending with \r\n.
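
As a sketch of this format, the function below assembles the raw bytes of a GET request. The name build_get_request and the extra Connection header are illustrative choices; the empty line appended at the end marks the end of the header section.

```python
def build_get_request(host: str, path: str = "/", extra_headers=None) -> bytes:
    """Assemble a raw HTTP/1.1 GET request as bytes."""
    headers = {"Host": host}
    headers.update(extra_headers or {})
    lines = [f"GET {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends with \r\n; one more \r\n (an empty line) ends the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


raw = build_get_request("www.sina.com.cn", "/", {"Connection": "close"})
print(raw)
# b'GET / HTTP/1.1\r\nHost: www.sina.com.cn\r\nConnection: close\r\n\r\n'
```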

HTTP POST Request Format:

POST /path HTTP/1.1
Header1: Value1
Header2: Value2
Header3: Value3

body data goes here...

When two consecutive \r\n are encountered, the Header section ends, and all subsequent data is considered the Body.
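
The sketch below builds a POST request carrying form data and then splits it back into Header and Body at the first \r\n\r\n. The function names, the /login path, www.example.com, and the form fields are hypothetical; the Content-Type and Content-Length headers are what a browser typically sends with a form submission.

```python
from urllib.parse import parse_qs, urlencode


def build_post_request(host: str, path: str, form: dict) -> bytes:
    """Assemble a raw POST request whose Body is URL-encoded form data."""
    body = urlencode(form).encode("ascii")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
    )
    # The extra \r\n produces the empty line that separates Header and Body.
    return head.encode("ascii") + b"\r\n" + body


def split_head_body(raw: bytes):
    """Split a raw message at the first empty line."""
    head, _, body = raw.partition(b"\r\n\r\n")
    return head, body


raw = build_post_request(
    "www.example.com", "/login", {"username": "alice", "password": "secret"}
)
head, body = split_head_body(raw)
print(head.decode("ascii").split("\r\n")[0])  # POST /login HTTP/1.1
print(parse_qs(body.decode("ascii")))
# {'username': ['alice'], 'password': ['secret']}
```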

HTTP Response Format:

HTTP/1.1 200 OK
Header1: Value1
Header2: Value2
Header3: Value3

body data goes here...

If an HTTP response includes a Body, it is also separated by \r\n\r\n. Note that the data type in the Body is determined by the Content-Type header. If it’s a webpage, the Body is text; if it’s an image, the Body is binary data.
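
A response can be taken apart the same way. This is a minimal sketch, assuming the first line has the form that servers send on the wire, with the protocol version before the code (for example HTTP/1.1 200 OK), and assuming text bodies are UTF-8. The parse_response function and both sample responses are made up for illustration.

```python
def parse_response(raw: bytes):
    """Parse a raw HTTP/1.x response into (code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # Content-Type decides how to treat the Body: text is decoded, the rest stays bytes.
    if headers.get("content-type", "").lower().startswith("text/"):
        body = body.decode("utf-8")  # assumption: text bodies are UTF-8
    return int(code), reason, headers, body


html_raw = b"HTTP/1.1 200 OK\r\nContent-Type: text/html; charset=utf-8\r\n\r\n<h1>Hi</h1>"
image_raw = b"HTTP/1.1 200 OK\r\nContent-Type: image/png\r\n\r\n\x89PNG\r\n\x1a\n"

print(parse_response(html_raw)[3])   # <h1>Hi</h1>
print(parse_response(image_raw)[3])  # b'\x89PNG\r\n\x1a\n'
```

Only the first \r\n\r\n is treated as the separator, so a binary Body that happens to contain those bytes is left intact.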

When Content-Encoding is present, the Body data is compressed, with the most common compression method being gzip. When you see Content-Encoding: gzip, you need to decompress the Body data to get the actual content. Compression reduces the size of the Body, speeding up network transmission.
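
Decompression can be sketched with Python's gzip module. The decode_body function and the sample data are hypothetical; the headers are passed in as a plain dictionary for simplicity, and only gzip is handled.

```python
import gzip


def decode_body(headers: dict, body: bytes) -> bytes:
    """Return the actual content, decompressing it if Content-Encoding is gzip."""
    encoding = headers.get("Content-Encoding", "").lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    return body  # no Content-Encoding: the Body is already the content


original = b"<html><body>Hello</body></html>" * 50
compressed = gzip.compress(original)  # what a server would send

restored = decode_body({"Content-Encoding": "gzip"}, compressed)
print(restored == original)
print(len(compressed) < len(original))
```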

For a detailed understanding of the HTTP protocol, the book "HTTP: The Definitive Guide" is highly recommended, and there is a Chinese translation titled "HTTP权威指南".
