Encryption and Security
What are encryption and security in computer systems?
Let’s take an example: Suppose Bob wants to send an email to Alice. During the transmission of the email, a hacker might intercept the email's content, so eavesdropping prevention is necessary. The hacker might also tamper with the email's content, so Alice must be able to detect if the email has been altered. Finally, the hacker could impersonate Bob to send a fake email to Alice, so Alice must be able to identify forged emails.
Therefore, to counter potential security threats, three defenses are required:
- Prevent Eavesdropping
- Prevent Tampering
- Prevent Forgery
Computer encryption technology is designed to achieve these goals. Modern computer cryptography is built on rigorous mathematical theories, and cryptography has gradually developed into a scientific discipline. For the vast majority of developers, designing a secure encryption algorithm is extremely difficult, and verifying whether an encryption algorithm is secure is even more challenging. Currently, encryption algorithms considered secure are those that have not yet been broken. Therefore, to write secure computer programs, we must adhere to the following principles:
- Do not design your own homemade encryption algorithms.
- Do not implement existing encryption algorithms yourself.
- Do not modify existing encryption algorithms yourself.
In this chapter, we will introduce the most commonly used encryption algorithms and how to implement them using Java code.
Encoding Algorithms
To learn encoding algorithms, let's first understand what encoding is.
ASCII is one such encoding: the letter `A` is encoded as hexadecimal `0x41`, `B` as `0x42`, and so on:
Letter | ASCII Encoding |
---|---|
A | 0x41 |
B | 0x42 |
C | 0x43 |
D | 0x44 |
… | … |
ASCII can represent at most 128 characters, so encoding more characters requires Unicode. For example, the Chinese character "中" has the Unicode code point `0x4e2d`, and its UTF-8 encoding takes 3 bytes:
Chinese Character | Unicode Encoding | UTF-8 Encoding |
---|---|---|
中 | 0x4e2d | 0xe4b8ad |
文 | 0x6587 | 0xe69687 |
编 | 0x7f16 | 0xe7bc96 |
码 | 0x7801 | 0xe7a081 |
… | … | … |
The simplest encodings therefore map each character directly to an integer stored in a fixed number of bytes. More complex encodings are derived from an existing scheme: UTF-8, for example, is a variable-length encoding computed from a character's Unicode code point.
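To see how UTF-8 is derived from the code point, the sketch below hand-encodes characters in the 3-byte UTF-8 range (U+0800 to U+FFFF) using the bit pattern `1110xxxx 10xxxxxx 10xxxxxx`, then compares the result with `String.getBytes(StandardCharsets.UTF_8)`. The helpers `utf8ThreeBytes` and `toHex` are hypothetical names used only for illustration; they cover just this one range, and real code should rely on the standard library.

```java
import java.nio.charset.StandardCharsets;

public class Main {
    // Hypothetical helper: encodes a code point in the 3-byte UTF-8 range
    // (U+0800..U+FFFF) as 1110xxxx 10xxxxxx 10xxxxxx.
    // Surrogates (U+D800..U+DFFF) are not valid here and are not checked.
    static byte[] utf8ThreeBytes(int codePoint) {
        if (codePoint < 0x0800 || codePoint > 0xFFFF) {
            throw new IllegalArgumentException("not a 3-byte UTF-8 code point");
        }
        return new byte[] {
            (byte) (0xE0 | (codePoint >> 12)),          // top 4 bits
            (byte) (0x80 | ((codePoint >> 6) & 0x3F)),  // middle 6 bits
            (byte) (0x80 | (codePoint & 0x3F))          // low 6 bits
        };
    }

    // Hypothetical helper: formats bytes as lowercase hex, e.g. "e4b8ad".
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (String s : new String[] { "中", "文", "编", "码" }) {
            int cp = s.codePointAt(0);
            String manual = toHex(utf8ThreeBytes(cp));
            String library = toHex(s.getBytes(StandardCharsets.UTF_8));
            // Both hex strings should match the UTF-8 column of the table above.
            System.out.printf("%s U+%04X %s %s%n", s, cp, manual, library);
        }
    }
}
```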
URL Encoding
URL encoding is used when a browser sends data to a server and is typically appended to the URL's parameter section, for example:
https://www.baidu.com/s?wd=%E4%B8%AD%E6%96%87
URL encoding is necessary because, for compatibility reasons, many servers only recognize ASCII characters. But what if the URL contains non-ASCII characters like Chinese or Japanese? No problem; URL encoding follows a set of rules:
- If the character is `A`-`Z`, `a`-`z`, `0`-`9`, or one of `-`, `_`, `.`, `*`, it remains unchanged.
- Any other character is first converted to its UTF-8 bytes, and each byte is then written as `%XX`.
For example, the UTF-8 encoding of the character "中" is `0xe4b8ad`, so its URL encoding is `%E4%B8%AD`. By convention, the hexadecimal digits in URL encoding are written in uppercase.
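The two rules can be sketched directly in code. The hypothetical `percentEncode` helper below is for illustration only, not a replacement for `URLEncoder`. It works byte by byte on the UTF-8 form of the string, which is safe because every byte of a multi-byte UTF-8 sequence is `0x80` or higher and therefore can never be mistaken for one of the unchanged ASCII characters. As an assumption of this sketch, a space is encoded as `%20`, since the rules above do not list it as unchanged.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class Main {
    // Hypothetical helper following the two URL-encoding rules.
    static String percentEncode(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF; // treat the byte as an unsigned value 0..255
            boolean unchanged = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9')
                    || c == '-' || c == '_' || c == '.' || c == '*';
            if (unchanged) {
                sb.append((char) c);
            } else {
                sb.append(String.format("%%%02X", c)); // '%' plus two uppercase hex digits
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(percentEncode("中文!"));
        System.out.println(URLEncoder.encode("中文!", StandardCharsets.UTF_8));
    }
}
```

Both lines should print the same string; the two approaches differ only for spaces, which `URLEncoder` turns into `+`.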
The Java standard library provides a `URLEncoder` class to perform URL encoding on any string:
```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class Main {
    public static void main(String[] args) {
        // URLEncoder.encode(String, Charset) requires Java 10 or later
        String encoded = URLEncoder.encode("中文!", StandardCharsets.UTF_8);
        System.out.println(encoded);
    }
}
```
The above code outputs `%E4%B8%AD%E6%96%87%21`. Here, the URL encoding for "中" is `%E4%B8%AD`, for "文" is `%E6%96%87`, and "!", although an ASCII character, is also encoded as `%21`.
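`URLEncoder` differs slightly from standard URL encoding: it implements HTML form encoding (`application/x-www-form-urlencoded`) and therefore encodes a space as `+`, whereas the generic URI syntax (RFC 3986) percent-encodes a space as `%20`. Most servers accept both forms in query parameters, but `+` means a space only in query strings and form data, not in the URL path.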
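If a component expects `%20` rather than `+`, one common workaround is to post-process the output of `URLEncoder`. This works because a literal `+` in the input is always escaped as `%2B`, so every `+` left in the output stands for a space. The helper name `encodeSpacesAsPercent20` is hypothetical; this is a sketch, not an API provided by the standard library.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class Main {
    // Hypothetical helper: form-encode, then rewrite each "+" (a space) as "%20".
    static String encodeSpacesAsPercent20(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        String input = "a b+c";
        System.out.println(URLEncoder.encode(input, StandardCharsets.UTF_8)); // a+b%2Bc
        System.out.println(encodeSpacesAsPercent20(input));                  // a%20b%2Bc
    }
}
```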
If the server receives a URL-encoded string, it can decode it back to the original string with the Java standard library's `URLDecoder`:
```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class Main {
    public static void main(String[] args) {
        // URLDecoder.decode(String, Charset) requires Java 10 or later
        String decoded = URLDecoder.decode("%E4%B8%AD%E6%96%87%21", StandardCharsets.UTF_8);
        System.out.println(decoded); // 中文!
    }
}
```
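As a quick sanity check, encoding and then decoding any string should give back the original text. The hypothetical `roundTrip` helper below shows this, and also that `URLDecoder` turns both `+` and `%20` into a space, which is why a server can accept either form in a query string.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class Main {
    // Hypothetical helper: URL-encodes a string and decodes it again.
    static String roundTrip(String s) {
        String encoded = URLEncoder.encode(s, StandardCharsets.UTF_8);
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("中文 & 日本語?")); // 中文 & 日本語?
        // Both "+" and "%20" decode to a space.
        System.out.println(URLDecoder.decode("a+b%20c", StandardCharsets.UTF_8)); // a b c
    }
}
```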
Important: URL encoding is an encoding algorithm, not an encryption algorithm. Its purpose is to turn arbitrary text into `%`-escaped text that contains only `A`-`Z`, `a`-`z`, `0`-`9`, `-`, `_`, `.`, `*`, and `%` (plus `+` for spaces when using `URLEncoder`), so that browsers and servers can handle it easily.
Base64 Encoding
While URL encoding encodes characters in the `%XX` format, Base64 encoding encodes binary data as text.
Base64 encoding can convert binary data of any length into plain text that contains only the characters `A`-`Z`, `a`-`z`, `0`-`9`, `+`, `/`, and `=`. The principle behind Base64 is to split every 3 bytes (24 bits) of binary data into 4 groups of 6 bits, treat each group as an integer, and map each integer to a character through a lookup table, producing the encoded string.
For example, the 3 bytes `e4`, `b8`, `ad` (hexadecimal) are split into the 6-bit values `39`, `0b`, `22`, and `2d` (also hexadecimal):
┌───────────────┬───────────────┬───────────────┐
│ e4 │ b8 │ ad │
└───────────────┴───────────────┴───────────────┘
┌─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┐
│1│1│1│0│0│1│0│0│1│0│1│1│1│0│0│0│1│0│1│0│1│1│0│1│
└─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┘
┌───────────┬───────────┬───────────┬───────────┐
│ 39 │ 0b │ 22 │ 2d │
└───────────┴───────────┴───────────┴───────────┘
Since a 6-bit integer always lies in the range 0-63, it can be represented by one of 64 characters: `A`-`Z` for indices 0-25, `a`-`z` for 26-51, `0`-`9` for 52-61, and `+` and `/` for the last two indices, 62 and 63, respectively.
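The grouping and table lookup just described can be sketched in a few lines. The hypothetical `encodeGroups` helper below handles only input whose length is a multiple of 3 (so no padding is involved) and is meant purely to illustrate the principle; in practice, use `java.util.Base64`.

```java
import java.util.Base64;

public class Main {
    // The 64-character table: A-Z (0-25), a-z (26-51), 0-9 (52-61), + (62), / (63).
    static final String TABLE =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Hypothetical helper: encodes complete 3-byte groups only (no padding).
    static String encodeGroups(byte[] data) {
        if (data.length % 3 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 3");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i += 3) {
            // Pack 3 unsigned bytes into one 24-bit integer.
            int n = ((data[i] & 0xFF) << 16) | ((data[i + 1] & 0xFF) << 8) | (data[i + 2] & 0xFF);
            // Split the 24 bits into four 6-bit indices and look each one up.
            sb.append(TABLE.charAt((n >> 18) & 0x3F));
            sb.append(TABLE.charAt((n >> 12) & 0x3F));
            sb.append(TABLE.charAt((n >> 6) & 0x3F));
            sb.append(TABLE.charAt(n & 0x3F));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] input = { (byte) 0xe4, (byte) 0xb8, (byte) 0xad };
        System.out.println(encodeGroups(input));                       // 5Lit
        System.out.println(Base64.getEncoder().encodeToString(input)); // 5Lit
    }
}
```

For the bytes `e4 b8 ad`, the indices `0x39`, `0x0b`, `0x22`, `0x2d` are 57, 11, 34, 45, which map to `5`, `L`, `i`, `t`.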
In Java, binary data is represented as a `byte[]` array. The Java standard library provides the `Base64` class to encode and decode `byte[]` arrays:
```java
import java.util.*;

public class Main {
    public static void main(String[] args) {
        byte[] input = new byte[] { (byte) 0xe4, (byte) 0xb8, (byte) 0xad };
        String b64encoded = Base64.getEncoder().encodeToString(input);
        System.out.println(b64encoded);
    }
}
```
The encoded result is `5Lit`. Decoding also uses the `Base64` class:
```java
import java.util.*;

public class Main {
    public static void main(String[] args) {
        byte[] output = Base64.getDecoder().decode("5Lit");
        System.out.println(Arrays.toString(output)); // [-28, -72, -83]
    }
}
```
What if the length of the input `byte[]` array is not a multiple of 3? In that case the encoder pads the input with one or two `0x00` bytes at the end, encodes it as usual, and then replaces the last one or two characters of the output with `=`: one `=` for one padding byte, two `=` for two. When decoding, the `=` characters tell the decoder how many padding bytes to drop.
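The classic inputs "M", "Ma", and "Man" show all three cases. For "M", padding with two `0x00` bytes would encode to `TQAA`; the last two characters are then replaced, giving `TQ==`. The helper name `encodeAscii` below is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class Main {
    // Hypothetical helper: Base64-encodes the ASCII bytes of a string.
    static String encodeAscii(String s) {
        return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        for (String s : new String[] { "M", "Ma", "Man" }) {
            String encoded = encodeAscii(s);
            byte[] decoded = Base64.getDecoder().decode(encoded);
            boolean same = Arrays.equals(s.getBytes(StandardCharsets.US_ASCII), decoded);
            // 1 byte -> two '=', 2 bytes -> one '=', 3 bytes -> no padding
            System.out.println(s + " -> " + encoded + " (round trip ok: " + same + ")");
        }
    }
}
```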
In fact, because a padded Base64 string always has a length that is a multiple of 4, the decoder can work out the original `byte[]` length even when the `=` padding is omitted. To omit it when encoding, call `withoutPadding()`; the decoded result is the same:
```java
import java.util.*;

public class Main {
    public static void main(String[] args) {
        byte[] input = new byte[] { (byte) 0xe4, (byte) 0xb8, (byte) 0xad, 0x21 };
        String b64encoded = Base64.getEncoder().encodeToString(input);
        String b64encoded2 = Base64.getEncoder().withoutPadding().encodeToString(input);
        System.out.println(b64encoded);  // 5LitIQ==
        System.out.println(b64encoded2); // 5LitIQ
        // The decoder accepts input without '=' padding
        byte[] output = Base64.getDecoder().decode(b64encoded2);
        System.out.println(Arrays.toString(output)); // [-28, -72, -83, 33]
    }
}
```
Since standard Base64 output can contain `+`, `/`, and `=`, it is not suitable for placing in URLs as-is. URL-safe Base64 replaces `+` with `-` and `/` with `_`:
```java
import java.util.*;

public class Main {
    public static void main(String[] args) {
        byte[] input = new byte[] { 0x01, 0x02, 0x7f, 0x00 };
        String b64encoded = Base64.getUrlEncoder().encodeToString(input);
        System.out.println(b64encoded); // AQJ_AA== ('=' padding is still added)
        byte[] output = Base64.getUrlDecoder().decode(b64encoded);
        System.out.println(Arrays.toString(output)); // [1, 2, 127, 0]
    }
}
```
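To see the alphabet difference directly, the input below is chosen (purely for illustration) so that its 6-bit indices include 62 and 63. Note that `Base64.getUrlEncoder()` still appends `=` padding; chaining `withoutPadding()` removes it as well, which avoids having to escape `=` in a query string.

```java
import java.util.Base64;

public class Main {
    // 0xfb 0xff -> 6-bit indices 62, 63, 60 (plus one padding character)
    static final byte[] INPUT = { (byte) 0xfb, (byte) 0xff };

    public static void main(String[] args) {
        System.out.println(Base64.getEncoder().encodeToString(INPUT));                     // +/8=
        System.out.println(Base64.getUrlEncoder().encodeToString(INPUT));                  // -_8=
        System.out.println(Base64.getUrlEncoder().withoutPadding().encodeToString(INPUT)); // -_8
    }
}
```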
The purpose of Base64 encoding is to convert binary data into a text format, making it easier to handle binary data in many text-based protocols. For example, the email protocol is a text-based protocol; if you want to attach a binary file to an email, you can use Base64 encoding and transmit it as text.
Disadvantage of Base64 Encoding: Transmission efficiency decreases because it increases the original data length by one-third.
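More precisely, with padding, every started group of 3 input bytes becomes 4 output characters, so `n` bytes encode to `4 * ceil(n / 3)` characters; the one-third figure is exact when `n` is a multiple of 3. The sketch below checks this formula, implemented in a hypothetical `paddedLength` helper, against the standard encoder.

```java
import java.util.Base64;

public class Main {
    // Hypothetical helper: padded Base64 length for n input bytes, 4 * ceil(n / 3).
    static int paddedLength(int n) {
        return 4 * ((n + 2) / 3);
    }

    public static void main(String[] args) {
        for (int n : new int[] { 1, 2, 3, 100, 3000 }) {
            int actual = Base64.getEncoder().encodeToString(new byte[n]).length();
            System.out.println(n + " bytes -> " + actual + " chars (formula: " + paddedLength(n) + ")");
        }
    }
}
```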
Like URL encoding, Base64 encoding is an encoding algorithm, not an encryption algorithm.
Replacing the 64-character Base64 table with a table of 32, 48, or 58 characters yields Base32, Base48, or Base58 encoding, respectively. The fewer characters in the table, the fewer bits each output character carries, so the encoding is less efficient and the output is longer.
Summary
- URL Encoding and Base64 Encoding are both encoding algorithms; they are not encryption algorithms.
- URL encoding turns arbitrary text into `%`-escaped text that browsers and servers can handle easily.
- Base64 encoding turns arbitrary binary data into text, at the cost of increasing the data size by about one-third.