HTTP Programming
What is HTTP? HTTP is the foundational protocol behind today's most widely used web applications. For example, when a browser accesses a website or a mobile app communicates with a backend server, it does so over HTTP.
HTTP stands for HyperText Transfer Protocol, and it is a request-response protocol built on top of the TCP protocol.
Let's look at an HTTP request-response cycle when a browser accesses a website. The browser first establishes a TCP connection with the website's server, which by default listens on port 80, or on port 443 for encrypted HTTPS. Then the browser sends an HTTP request to the server. Upon receiving the request, the server returns an HTTP response containing the HTML content of the webpage. The browser parses the HTML and displays the webpage to the user. A complete HTTP request-response cycle is as follows:
Browser to server (request):
GET / HTTP/1.1
Host: www.sina.com.cn
User-Agent: Mozilla/5 MSIE
Accept: */*

Server to browser (response):
HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 133251

<!DOCTYPE html>
<html><body>
<h1>Hello</h1>
...
The format of an HTTP request is fixed, consisting of two parts: the HTTP Header and the HTTP Body. The first line always contains the request method, path, and HTTP version. For example, GET / HTTP/1.1 indicates a GET request to the path / using HTTP version 1.1.
Each subsequent line follows the Header: Value format; these lines are known as HTTP Headers. Servers rely on certain headers to interpret client requests. For example:
- Host: Indicates the domain name of the request. Since a single server can host multiple websites, the Host header is necessary to determine which website the request is targeting.
- User-Agent: Identifies the client. Different browsers have different identifiers, allowing servers to determine if the client is IE, Chrome, Firefox, or a Python crawler.
- Accept: Specifies the HTTP response formats the client can handle. */* means any format, text/* means any text format, and image/png specifies the PNG image format.
- Accept-Language: Indicates the languages the client can accept, sorted by priority. Servers use this field to return webpages in specific languages for the user.
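The request format described above (a request line followed by Header: Value lines) can be sketched as a small parser. This is a minimal illustration, not a complete HTTP parser: the class name RequestHeadParser and its methods are hypothetical, lines are assumed to be separated by CRLF, and a repeated header name simply overwrites the earlier value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RequestHeadParser {
    /** Splits the first line into method, path and version. */
    static String[] parseRequestLine(String raw) {
        String firstLine = raw.split("\r\n", 2)[0];
        return firstLine.split(" ", 3);
    }

    /** Collects the "Header: Value" lines that follow the first line. */
    static Map<String, String> parseHeaders(String raw) {
        Map<String, String> headers = new LinkedHashMap<>();
        String[] lines = raw.split("\r\n");
        // Stop at the first empty line, which marks the end of the headers.
        for (int i = 1; i < lines.length && !lines[i].isEmpty(); i++) {
            int colon = lines[i].indexOf(':');
            if (colon < 0) {
                continue; // skip malformed lines in this sketch
            }
            headers.put(lines[i].substring(0, colon).trim(),
                        lines[i].substring(colon + 1).trim());
        }
        return headers;
    }

    public static void main(String[] args) {
        String raw = "GET / HTTP/1.1\r\n"
                + "Host: www.sina.com.cn\r\n"
                + "User-Agent: Mozilla/5 MSIE\r\n"
                + "Accept: */*\r\n"
                + "\r\n";
        String[] start = parseRequestLine(raw);
        System.out.println("method=" + start[0] + ", path=" + start[1] + ", version=" + start[2]);
        parseHeaders(raw).forEach((name, value) -> System.out.println(name + " -> " + value));
    }
}
```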
If it is a GET request, the HTTP request only contains HTTP Headers without an HTTP Body. If it is a POST request, the HTTP request includes a Body, separated from the Headers by an empty line. A typical HTTP request with a Body looks like this:
POST /login HTTP/1.1
Host: www.example.com
Content-Type: application/x-www-form-urlencoded
Content-Length: 30

username=hello&password=123456
POST requests usually set the Content-Type to indicate the type of the Body and Content-Length to specify the length of the Body. This allows the server to correctly respond based on the request's Headers and Body.
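As a rough sketch of how Content-Length relates to the Body, the following assembles the raw text of a form POST and measures the Body in bytes. The class name FormRequestText and its build method are hypothetical, and UTF-8 is assumed as the Body encoding.

```java
import java.nio.charset.StandardCharsets;

class FormRequestText {
    /** Builds the raw text of a form POST; path and host are supplied by the caller. */
    static String build(String path, String host, String body) {
        // Content-Length counts bytes, not characters, so encode before measuring.
        int length = body.getBytes(StandardCharsets.UTF_8).length;
        return "POST " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Content-Type: application/x-www-form-urlencoded\r\n"
                + "Content-Length: " + length + "\r\n"
                + "\r\n" // the empty line that separates Headers from Body
                + body;
    }

    public static void main(String[] args) {
        System.out.println(build("/login", "www.example.com", "username=hello&password=123456"));
    }
}
```

For the Body username=hello&password=123456 the computed length is 30, matching the example request.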
Additionally, GET request parameters must be appended to the URL and URL-encoded. For example, in http://www.example.com/?a=1&b=K%26R the parameters are a=1 and b=K&R. Because browsers and servers limit URL length in practice, GET request parameters cannot be too large. POST request parameters, in contrast, are placed in the Body and are not subject to URL length limits. Moreover, POST request parameters do not have to be URL-encoded and can be encoded in any format, as long as the Content-Type is correctly set. A common POST request sending JSON data looks like this:
POST /login HTTP/1.1
Host: www.example.com
Content-Type: application/json
Content-Length: 38

{"username":"bob","password":"123456"}
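The URL encoding of GET parameters mentioned earlier can be sketched with the standard URLEncoder class. The helper name QueryStringDemo.toQuery is hypothetical; note that URLEncoder implements form encoding (a space becomes +), which is a close but not exact match for every part of a URL.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class QueryStringDemo {
    /** Joins parameters as name=value pairs separated by '&', encoding each part. */
    static String toQuery(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("a", "1");
        params.put("b", "K&R"); // '&' must be escaped so it is not read as a separator
        System.out.println("http://www.example.com/?" + toQuery(params));
        // prints http://www.example.com/?a=1&b=K%26R
    }
}
```

The same string could also serve as the Body of an application/x-www-form-urlencoded POST request.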
HTTP responses also consist of Headers and a Body. A typical HTTP response looks like this:
HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 133251

<!DOCTYPE html>
<html><body>
<h1>Hello</h1>
</body></html>
The first line of the response always contains the HTTP version, response code, and response message. For example, HTTP/1.1 200 OK indicates HTTP version 1.1, response code 200, and response message OK. Clients rely solely on the response code to determine if the HTTP response was successful. HTTP defines fixed response codes:
- 1xx: Informational responses, such as 101 Switching Protocols, commonly used in WebSocket connections.
- 2xx: Successful responses, such as 200 OK for success and 206 Partial Content for partial content.
- 3xx: Redirection responses, such as 301 Moved Permanently and 303 See Other, indicating that the client should resend the request to a specified path.
- 4xx: Client error responses, such as 400 Bad Request due to an invalid Content-Type or other reasons, and 404 Not Found indicating that the specified path does not exist.
- 5xx: Server error responses, such as 500 Internal Server Error and 503 Service Unavailable.
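Since the class of a response is determined by the first digit of its code, a client can classify responses with simple integer division. The sketch below assumes a well-formed status line; the class name StatusLineDemo and the category labels are illustrative only.

```java
class StatusLineDemo {
    /** Extracts the numeric code from a status line such as "HTTP/1.1 200 OK". */
    static int statusCode(String statusLine) {
        // Split into at most three parts: version, code, message.
        String[] parts = statusLine.split(" ", 3);
        return Integer.parseInt(parts[1]);
    }

    /** Maps a code to its class using the first digit. */
    static String category(int code) {
        switch (code / 100) {
            case 1: return "informational";
            case 2: return "success";
            case 3: return "redirection";
            case 4: return "client error";
            case 5: return "server error";
            default: return "unknown";
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "HTTP/1.1 200 OK",
            "HTTP/1.1 301 Moved Permanently",
            "HTTP/1.1 404 Not Found",
            "HTTP/1.1 503 Service Unavailable"
        };
        for (String line : samples) {
            int code = statusCode(line);
            System.out.println(code + " -> " + category(code));
        }
    }
}
```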
When a browser receives the first HTTP response, it parses the HTML and sends a series of additional HTTP requests, such as GET /logo.jpg HTTP/1.1 to request an image. When the server responds to the image request, it sends the binary content of the image directly to the browser:
HTTP/1.1 200 OK
Content-Type: image/jpeg
Content-Length: 18391

????JFIFHH??XExifMM?i&??X?...(binary JPEG image)
Therefore, servers passively receive HTTP requests from clients and respond to them. Clients send multiple HTTP requests as needed.
In the early HTTP/1.0 protocol, each HTTP request required the client to establish a new TCP connection, which was closed after the server's response was received. Establishing TCP connections is relatively time-consuming. To improve efficiency, the HTTP/1.1 protocol allows multiple request-response cycles within a single TCP connection:
 Browser                   Server
    │       request 1        │
    │───────────────────────▶│
    │       response 1       │
    │◀───────────────────────│
    │       request 2        │
    │───────────────────────▶│
    │       response 2       │
    │◀───────────────────────│
    │       request 3        │
    │───────────────────────▶│
    │       response 3       │
    │◀───────────────────────│
    ▼                        ▼
Because the HTTP protocol is a request-response protocol, the client must wait for the server's response after sending an HTTP request before sending the next request. If a response is slow, it can block subsequent requests.
To further improve speed, HTTP/2 allows clients to send multiple HTTP requests without waiting for responses. Servers can return responses out of order as long as both sides can identify which response corresponds to which request, enabling parallel sending and receiving:
 Browser                   Server
    │       request 1        │
    │───────────────────────▶│
    │       request 2        │
    │───────────────────────▶│
    │       response 1       │
    │◀───────────────────────│
    │       request 3        │
    │───────────────────────▶│
    │       response 3       │
    │◀───────────────────────│
    │       response 2       │
    │◀───────────────────────│
    ▼                        ▼
As seen, HTTP/2 further improves efficiency.
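The idea that responses may arrive out of order, provided each one can be matched to its request, can be illustrated with a toy model. This is not an HTTP/2 implementation: it only shows matching by an identifier, a role that stream identifiers play in HTTP/2. The class name StreamMatchingDemo, the request strings, and the IDs 1, 3, 5 are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class StreamMatchingDemo {
    /** Pairs each arriving response with its request by ID, in arrival order. */
    static Map<String, String> match(Map<Integer, String> requests,
                                     List<Map.Entry<Integer, String>> arrivals) {
        Map<Integer, String> pending = new HashMap<>(requests); // work on a copy
        Map<String, String> matched = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> arrival : arrivals) {
            String request = pending.remove(arrival.getKey());
            if (request != null) {
                matched.put(request, arrival.getValue());
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        Map<Integer, String> requests = new LinkedHashMap<>();
        requests.put(1, "request 1");
        requests.put(3, "request 2");
        requests.put(5, "request 3");
        // Responses arrive in the order 1, 3, 2, as in the diagram above.
        List<Map.Entry<Integer, String>> arrivals = List.of(
                Map.entry(1, "response 1"),
                Map.entry(5, "response 3"),
                Map.entry(3, "response 2"));
        match(requests, arrivals)
                .forEach((request, response) -> System.out.println(request + " <- " + response));
    }
}
```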
HTTP Programming
Since HTTP involves both client and server sides, similar to TCP, we need to perform both client-side and server-side programming.
In this section, we will not discuss server-side HTTP programming because it essentially involves writing a web server, which is a very complex system and the core content of Java EE development. We will explore it in detail in later chapters.
This section will only discuss client-side HTTP programming.
Since browsers are also a type of HTTP client, client-side HTTP programming behaves similarly to browsers: sending an HTTP request and receiving the server's response to obtain the response content. However, while browsers further parse and render the response content for the user, using Java for HTTP client programming is limited to obtaining the response content.
Let's look at how Java performs HTTP client programming.
Java's standard library provides built-in support for HTTP. Note, however, that early JDK versions access HTTP through HttpURLConnection. A typical example is as follows:
java
// Required imports:
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;
import java.util.Map;

URL url = new URL("http://www.example.com/path/to/target?a=1&b=2");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestMethod("GET");
conn.setUseCaches(false);
conn.setConnectTimeout(5000); // Connection timeout of 5 seconds
// Set HTTP headers:
conn.setRequestProperty("Accept", "*/*");
conn.setRequestProperty("User-Agent", "Mozilla/5.0 (compatible; MSIE 11; Windows NT 5.1)");
// Connect and send HTTP request:
conn.connect();
// Check if HTTP response is 200:
if (conn.getResponseCode() != 200) {
    throw new RuntimeException("bad response");
}
// Get all response headers:
Map<String, List<String>> map = conn.getHeaderFields();
for (String key : map.keySet()) {
    System.out.println(key + ": " + map.get(key));
}
// Get response content:
InputStream input = conn.getInputStream();
...
The above code is quite cumbersome to write and requires manual handling of the InputStream, making it difficult to use.
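To give a sense of the manual InputStream handling involved, here is one possible helper that reads a stream fully into a String. It is only a sketch: the class name StreamText is hypothetical, UTF-8 is assumed (a real client would check the charset in the Content-Type header), and a ByteArrayInputStream stands in for the stream returned by conn.getInputStream().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class StreamText {
    /** Reads the whole stream as UTF-8 text and closes it afterwards. */
    static String readAll(InputStream input) {
        try (input) { // try-with-resources on an existing variable (Java 9+)
            return new String(input.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for conn.getInputStream():
        InputStream input = new ByteArrayInputStream(
                "<h1>Hello</h1>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(input));
    }
}
```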
Starting with Java 11, a new HttpClient was introduced, which uses a fluent API and greatly simplifies HTTP handling.
Let's see how to use the new HttpClient. First, create a global HttpClient instance, because HttpClient internally uses a thread pool and can reuse connections across multiple HTTP requests:
java
static HttpClient httpClient = HttpClient.newBuilder().build();
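The builder also accepts client-wide settings. The following sketch shows a few that are commonly configured; the class name ClientConfig and the chosen values (5-second connect timeout, following normal redirects) are assumptions for illustration, not recommendations from the original text.

```java
import java.net.http.HttpClient;
import java.time.Duration;

class ClientConfig {
    // One shared instance, configured once and reused for all requests.
    static final HttpClient CLIENT = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_2)          // preferred protocol version
            .connectTimeout(Duration.ofSeconds(5))       // limit for establishing a connection
            .followRedirects(HttpClient.Redirect.NORMAL) // follow redirects, except HTTPS to HTTP
            .build();

    public static void main(String[] args) {
        System.out.println(CLIENT.version());
        System.out.println(CLIENT.connectTimeout());
        System.out.println(CLIENT.followRedirects());
    }
}
```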
Using GET Requests to Retrieve Text Content
Here is how to use the new HttpClient to perform a GET request and retrieve text content:
java
import java.net.URI;
import java.net.http.*;
import java.net.http.HttpClient.Version;
import java.time.Duration;
import java.util.*;

public class Main {
    // Global HttpClient:
    static HttpClient httpClient = HttpClient.newBuilder().build();

    public static void main(String[] args) throws Exception {
        String url = "https://www.sina.com.cn/";
        HttpRequest request = HttpRequest.newBuilder(new URI(url))
                // Set Headers:
                .header("User-Agent", "Java HttpClient")
                .header("Accept", "*/*")
                // Set timeout:
                .timeout(Duration.ofSeconds(5))
                // Set HTTP version:
                .version(Version.HTTP_2)
                .build();
        HttpResponse<String> response = httpClient.send(request, HttpResponse.BodyHandlers.ofString());
        // HTTP allows duplicate Headers, so one Header can correspond to multiple Values.
        // Only the first value of each Header is printed here:
        Map<String, List<String>> headers = response.headers().map();
        for (String header : headers.keySet()) {
            System.out.println(header + ": " + headers.get(header).get(0));
        }
        // Print at most the first 1024 characters; the body may be shorter:
        String body = response.body();
        System.out.println(body.substring(0, Math.min(1024, body.length())) + "...");
    }
}
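Because one Header can carry several Values, HttpHeaders offers firstValue and allValues in addition to map(). The sketch below builds an HttpHeaders object by hand, so it runs without a network call; the class name HeaderValuesDemo and the sample header values are made up for the example.

```java
import java.net.http.HttpHeaders;
import java.util.List;
import java.util.Map;

class HeaderValuesDemo {
    /** Builds sample headers; in real code they come from response.headers(). */
    static HttpHeaders sample() {
        Map<String, List<String>> raw = Map.of(
                "Content-Type", List.of("text/html"),
                "Set-Cookie", List.of("a=1", "b=2"));
        // The second argument is a filter; here every header is accepted.
        return HttpHeaders.of(raw, (name, value) -> true);
    }

    public static void main(String[] args) {
        HttpHeaders headers = sample();
        // Header names are matched case-insensitively.
        System.out.println(headers.firstValue("content-type").orElse("(none)"));
        // allValues returns every value of a repeated Header.
        System.out.println(headers.allValues("Set-Cookie"));
    }
}
```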
If you want to retrieve binary content like images, simply replace HttpResponse.BodyHandlers.ofString() with HttpResponse.BodyHandlers.ofByteArray(), which will give you an HttpResponse<byte[]> object. If the response content is large and you do not want to load it entirely into memory at once, you can use HttpResponse.BodyHandlers.ofInputStream() to obtain an InputStream.
Using POST Requests
To send a POST request, prepare the Body data to be sent and correctly set the Content-Type:
java
// Required imports:
import java.net.URI;
import java.net.http.HttpClient.Version;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

String url = "http://www.example.com/login";
String body = "username=bob&password=123456";
HttpRequest request = HttpRequest.newBuilder(new URI(url))
        // Set Headers:
        .header("Accept", "*/*")
        .header("Content-Type", "application/x-www-form-urlencoded")
        // Set timeout:
        .timeout(Duration.ofSeconds(5))
        // Set HTTP version:
        .version(Version.HTTP_2)
        // Use POST and set Body:
        .POST(BodyPublishers.ofString(body, StandardCharsets.UTF_8))
        .build();
HttpResponse<String> response = httpClient.send(request, HttpResponse.BodyHandlers.ofString());
String s = response.body();
As seen, sending POST data is also straightforward.
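A JSON Body works the same way, with only the Content-Type and the Body string changed. The sketch below builds such a request and inspects it without sending it; the class name JsonPostDemo is hypothetical, and the URL and JSON payload reuse the illustrative values from the earlier examples.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

class JsonPostDemo {
    /** Builds (but does not send) a POST request carrying a JSON Body. */
    static HttpRequest build(String url, String json) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "*/*")
                .header("Content-Type", "application/json")
                .timeout(Duration.ofSeconds(5))
                .POST(BodyPublishers.ofString(json, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("http://www.example.com/login",
                "{\"username\":\"bob\",\"password\":\"123456\"}");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().firstValue("Content-Type").orElse("(none)"));
        // The publisher knows the Body length in bytes, which becomes Content-Length:
        System.out.println(request.bodyPublisher().map(p -> p.contentLength()).orElse(0L));
    }
}
```

Passing the built request to httpClient.send(...) would then send it exactly as in the form example above.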
Exercise
Use HttpClient.
Summary
- Java provides HttpClient as a new HTTP client programming interface to replace the old HttpURLConnection interface.
- HttpClient uses a fluent API and leverages built-in BodyPublishers and BodyHandlers to handle data more conveniently.