translation: Add the translation of the data structure chapter (#1007)
* Add the translation of the data structure chapter. Synchronize the headings in mkdocs-en.yml
* Fix a typo

pull/1008/head^2
parent
1ee0a7a7bf
commit
42523b8879
@ -0,0 +1,164 @@
|
||||
# Fundamental Data Types
|
||||
|
||||
When we think of data in computers, we imagine various forms like text, images, videos, voice, 3D models, etc. Despite their different organizational forms, they are all composed of various fundamental data types.
|
||||
|
||||
**Fundamental data types are those that the CPU can directly operate on** and are directly used in algorithms, mainly including the following.
|
||||
|
||||
- Integer types: `byte`, `short`, `int`, `long`.
|
||||
- Floating-point types: `float`, `double`, used to represent decimals.
|
||||
- Character type: `char`, used to represent letters, punctuation, and even emojis in various languages.
|
||||
- Boolean type: `bool`, used for "yes" or "no" decisions.
|
||||
|
||||
**Fundamental data types are stored in computers in binary form**. One binary digit is 1 bit. On virtually all modern computers, 1 byte consists of 8 bits.
|
||||
|
||||
The range of values for fundamental data types depends on the size of the space they occupy. Below, we take Java as an example.
|
||||
|
||||
- The integer type `byte` occupies 1 byte = 8 bits and can represent \(2^8\) numbers.
|
||||
- The integer type `int` occupies 4 bytes = 32 bits and can represent \(2^{32}\) numbers.
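The relationship between size and range can be checked with a few lines of Python. This is a minimal sketch: the helper names `value_count` and `signed_range` are made up for illustration, and the range assumes the signed two's complement layout that Java's integer types use.

```python
def value_count(num_bytes: int) -> int:
    """Number of distinct bit patterns that fit in num_bytes bytes."""
    return 2 ** (8 * num_bytes)


def signed_range(num_bytes: int) -> tuple[int, int]:
    """Inclusive (min, max) of a signed two's complement integer of this width."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


print(value_count(1))   # 256
print(signed_range(1))  # (-128, 127)
print(signed_range(4))  # (-2147483648, 2147483647)
```

The count of values depends only on the number of bits; how those values are split between negative and positive numbers depends on the encoding.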
|
||||
|
||||
The following table lists the space occupied, value range, and default value of each fundamental data type in Java. You do not need to memorize it; a rough understanding is enough, and you can refer back to it when needed.
|
||||
|
||||
<p align="center"> Table <id> Space Occupied and Value Range of Fundamental Data Types </p>
|
||||
|
||||
| Type | Symbol | Space Occupied | Minimum Value | Maximum Value | Default Value |
|
||||
| ------- | -------- | -------------- | -------------------------- | ------------------------- | ---------------- |
|
||||
| Integer | `byte` | 1 byte | \(-2^7\) (\(-128\)) | \(2^7 - 1\) (\(127\)) | 0 |
|
||||
| | `short` | 2 bytes | \(-2^{15}\) | \(2^{15} - 1\) | 0 |
|
||||
| | `int` | 4 bytes | \(-2^{31}\) | \(2^{31} - 1\) | 0 |
|
||||
| | `long` | 8 bytes | \(-2^{63}\) | \(2^{63} - 1\) | 0 |
|
||||
| Float | `float` | 4 bytes | \(1.175 \times 10^{-38}\) | \(3.403 \times 10^{38}\) | \(0.0\text{f}\) |
|
||||
| | `double` | 8 bytes | \(2.225 \times 10^{-308}\) | \(1.798 \times 10^{308}\) | 0.0 |
|
||||
| Char | `char` | 2 bytes | 0 | \(2^{16} - 1\) | 0 |
|
||||
| Boolean | `bool` | 1 byte | \(\text{false}\) | \(\text{true}\) | \(\text{false}\) |
|
||||
|
||||
Please note that the above table is specific to Java's fundamental data types. Each programming language has its own data type definitions, and their space occupied, value ranges, and default values may differ.
|
||||
|
||||
- In Python, the integer type `int` can be of any size, limited only by available memory; the floating-point `float` is double precision 64-bit; there is no `char` type, as a single character is actually a string `str` of length 1.
|
||||
- C and C++ do not specify the size of fundamental data types, which varies with implementation and platform. The above table follows the LP64 [data model](https://en.cppreference.com/w/cpp/language/types#Properties), used for Unix 64-bit operating systems including Linux and macOS.
|
||||
- The size of `char` in C and C++ is 1 byte, while in most programming languages, it depends on the specific character encoding method, as detailed in the "Character Encoding" chapter.
|
||||
- Even though representing a boolean only requires 1 bit (0 or 1), it is usually stored in memory as 1 byte. This is because modern computer CPUs typically use 1 byte as the smallest addressable memory unit.
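The Python-specific points above can be observed directly. The snippet below is a small sketch using only the standard library; it assumes CPython on a typical platform, where `float` is an IEEE 754 double.

```python
import struct
import sys

# int has no fixed width: this value needs far more than 64 bits.
big = 2 ** 100
print(big.bit_length())  # 101

# float is a 64-bit double: 8 bytes when packed in the standard format.
print(struct.calcsize("<d"))  # 8
print(sys.float_info.max)     # 1.7976931348623157e+308

# There is no char type: indexing a string yields another str of length 1.
ch = "hello"[0]
print(type(ch).__name__, len(ch))  # str 1
```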
|
||||
|
||||
So, what is the connection between fundamental data types and data structures? We know that data structures are ways to organize and store data in computers. The focus here is on "structure" rather than "data".
|
||||
|
||||
If we want to represent "a row of numbers", we naturally think of using an array. This is because the linear structure of an array can represent the adjacency and order of numbers, but whether the stored content is an integer `int`, a decimal `float`, or a character `char`, is irrelevant to the "data structure".
|
||||
|
||||
In other words, **fundamental data types provide the "content type" of data, while data structures provide the "way of organizing" data**. For example, in the following code, we use the same data structure (array) to store and represent different fundamental data types, including `int`, `float`, `char`, `bool`, etc.
|
||||
|
||||
=== "Python"
|
||||
|
||||
```python title=""
|
||||
# Using various fundamental data types to initialize arrays
|
||||
numbers: list[int] = [0] * 5
|
||||
decimals: list[float] = [0.0] * 5
|
||||
# Python's characters are actually strings of length 1
|
||||
characters: list[str] = ['0'] * 5
|
||||
bools: list[bool] = [False] * 5
|
||||
# Python's lists can freely store various fundamental data types and object references
|
||||
data = [0, 0.0, 'a', False, ListNode(0)]
|
||||
```
|
||||
|
||||
=== "C++"
|
||||
|
||||
```cpp title=""
|
||||
// Using various fundamental data types to initialize arrays
|
||||
int numbers[5];
|
||||
float decimals[5];
|
||||
char characters[5];
|
||||
bool bools[5];
|
||||
```
|
||||
|
||||
=== "Java"
|
||||
|
||||
```java title=""
|
||||
// Using various fundamental data types to initialize arrays
|
||||
int[] numbers = new int[5];
|
||||
float[] decimals = new float[5];
|
||||
char[] characters = new char[5];
|
||||
boolean[] bools = new boolean[5];
|
||||
```
|
||||
|
||||
=== "C#"
|
||||
|
||||
```csharp title=""
|
||||
// Using various fundamental data types to initialize arrays
|
||||
int[] numbers = new int[5];
|
||||
float[] decimals = new float[5];
|
||||
char[] characters = new char[5];
|
||||
bool[] bools = new bool[5];
|
||||
```
|
||||
|
||||
=== "Go"
|
||||
|
||||
```go title=""
|
||||
// Using various fundamental data types to initialize arrays
|
||||
var numbers = [5]int{}
|
||||
var decimals = [5]float64{}
|
||||
var characters = [5]byte{}
|
||||
var bools = [5]bool{}
|
||||
```
|
||||
|
||||
=== "Swift"
|
||||
|
||||
```swift title=""
|
||||
// Using various fundamental data types to initialize arrays
|
||||
let numbers = Array(repeating: Int(), count: 5)
|
||||
let decimals = Array(repeating: Double(), count: 5)
|
||||
let characters = Array(repeating: Character("a"), count: 5)
|
||||
let bools = Array(repeating: Bool(), count: 5)
|
||||
```
|
||||
|
||||
=== "JS"
|
||||
|
||||
```javascript title=""
|
||||
// JavaScript's arrays can freely store various fundamental data types and objects
|
||||
const array = [0, 0.0, 'a', false];
|
||||
```
|
||||
|
||||
=== "TS"
|
||||
|
||||
```typescript title=""
|
||||
// Using various fundamental data types to initialize arrays
|
||||
const numbers: number[] = [];
|
||||
const characters: string[] = [];
|
||||
const bools: boolean[] = [];
|
||||
```
|
||||
|
||||
=== "Dart"
|
||||
|
||||
```dart title=""
|
||||
// Using various fundamental data types to initialize arrays
|
||||
List<int> numbers = List.filled(5, 0);
|
||||
List<double> decimals = List.filled(5, 0.0);
|
||||
List<String> characters = List.filled(5, 'a');
|
||||
List<bool> bools = List.filled(5, false);
|
||||
```
|
||||
|
||||
=== "Rust"
|
||||
|
||||
```rust title=""
|
||||
// Using various fundamental data types to initialize arrays
|
||||
let numbers: Vec<i32> = vec![0; 5];
|
||||
let decimals: Vec<f32> = vec![0.0; 5];
|
||||
let characters: Vec<char> = vec!['0'; 5];
|
||||
let bools: Vec<bool> = vec![false; 5];
|
||||
```
|
||||
|
||||
=== "C"
|
||||
|
||||
```c title=""
|
||||
// Using various fundamental data types to initialize arrays
|
||||
int numbers[10];
|
||||
float decimals[10];
|
||||
char characters[10];
|
||||
bool bools[10]; // needs #include <stdbool.h> before C23
|
||||
```
|
||||
|
||||
=== "Zig"
|
||||
|
||||
```zig title=""
|
||||
// Using various fundamental data types to initialize arrays
|
||||
var numbers: [5]i32 = undefined;
|
||||
var decimals: [5]f32 = undefined;
|
||||
var characters: [5]u8 = undefined;
|
||||
var bools: [5]bool = undefined;
|
||||
```
|
@ -1,49 +1,48 @@
|
||||
# Classification Of Data Structures
|
||||
# Classification of Data Structures
|
||||
|
||||
Common data structures include arrays, linked lists, stacks, queues, hash tables, trees, heaps, and graphs. They can be divided into two categories: logical structure and physical structure.
|
||||
Common data structures include arrays, linked lists, stacks, queues, hash tables, trees, heaps, and graphs. They can be classified into two dimensions: "Logical Structure" and "Physical Structure".
|
||||
|
||||
## Logical Structures: Linear And Non-linear
|
||||
## Logical Structure: Linear and Non-Linear
|
||||
|
||||
**Logical structures reveal logical relationships between data elements**. In arrays and linked lists, data are arranged in sequential order, reflecting the linear relationship between data; while in trees, data are arranged hierarchically from the top down, showing the derived relationship between ancestors and descendants; and graphs are composed of nodes and edges, reflecting the complex network relationship.
|
||||
**The logical structure reveals the logical relationships between data elements**. In arrays and linked lists, data is arranged in a certain order, reflecting a linear relationship between them. In trees, data is arranged from top to bottom in layers, showing an "ancestor-descendant" hierarchical relationship. Graphs, consisting of nodes and edges, represent complex network relationships.
|
||||
|
||||
As shown in the figure below, logical structures can further be divided into "linear data structure" and "non-linear data structure". Linear data structures are more intuitive, meaning that the data are arranged linearly in terms of logical relationships; non-linear data structures, on the other hand, are arranged non-linearly.
|
||||
As shown in the figure below, logical structures can be divided into two major categories: "Linear" and "Non-linear". Linear structures are more intuitive, indicating data is arranged linearly in logical relationships; non-linear structures, conversely, are arranged non-linearly.
|
||||
|
||||
- **Linear data structures**: arrays, linked lists, stacks, queues, hash tables.
|
||||
- **Nonlinear data structures**: trees, heaps, graphs, hash tables.
|
||||
- **Linear Data Structures**: Arrays, Linked Lists, Stacks, Queues, Hash Tables.
|
||||
- **Non-Linear Data Structures**: Trees, Heaps, Graphs, Hash Tables.
|
||||
|
||||
![Linear and nonlinear data structures](classification_of_data_structure.assets/classification_logic_structure.png)
|
||||
![Linear and Non-Linear Data Structures](classification_of_data_structure.assets/classification_logic_structure.png)
|
||||
|
||||
Non-linear data structures can be further divided into tree and graph structures.
|
||||
Non-linear data structures can be further divided into tree structures and network structures.
|
||||
|
||||
- **Linear structures**: arrays, linked lists, queues, stacks, hash tables, with one-to-one sequential relationship between elements.
|
||||
- **Tree structure**: tree, heap, hash table, with one-to-many relationship between elements.
|
||||
- **Graph**: graph with many-to-many relationship between elements.
|
||||
- **Tree Structures**: Trees, Heaps, Hash Tables, where elements have one-to-many relationships.
|
||||
- **Network Structures**: Graphs, where elements have many-to-many relationships.
|
||||
|
||||
## Physical Structure: Continuous vs. Dispersed
|
||||
## Physical Structure: Contiguous and Dispersed
|
||||
|
||||
**When an algorithm is running, the data being processed is stored in memory**. The figure below shows a computer memory module where each black square represents a memory space. We can think of the memory as a giant Excel sheet in which each cell can store data of a certain size.
|
||||
**When an algorithm program runs, the data being processed is mainly stored in memory**. The following figure shows a computer memory stick, each black block containing a memory space. We can imagine memory as a huge Excel spreadsheet, where each cell can store a certain amount of data.
|
||||
|
||||
**The system accesses the data at the target location by means of a memory address**. As shown in the figure below, the computer assigns a unique identifier to each cell in the table according to specific rules, ensuring that each memory space has a unique memory address. With these addresses, the program can access the data in memory.
|
||||
**The system accesses data at the target location through memory addresses**. As shown in the figure below, the computer allocates numbers to each cell in the table according to specific rules, ensuring each memory space has a unique memory address. With these addresses, programs can access data in memory.
|
||||
|
||||
![memory_strip, memory_space, memory_address](classification_of_data_structure.assets/computer_memory_location.png)
|
||||
![Memory Stick, Memory Spaces, Memory Addresses](classification_of_data_structure.assets/computer_memory_location.png)
|
||||
|
||||
!!! tip
|
||||
|
||||
It is worth noting that comparing memory to the Excel sheet is a simplified analogy. The actual memory working mechanism is more complicated, involving the concepts of address, space, memory management, cache mechanism, virtual and physical memory.
|
||||
It's worth noting that comparing memory to an Excel spreadsheet is a simplified analogy. The actual working mechanism of memory is more complex, involving concepts like address space, memory management, cache mechanisms, virtual memory, and physical memory.
|
||||
|
||||
Memory is a shared resource for all programs, and when a block of memory is occupied by one program, it cannot be used by other programs at the same time. **Therefore, considering memory resources is crucial in designing data structures and algorithms**. For example, the algorithm's peak memory usage should not exceed the remaining free memory of the system; if there is a lack of contiguous memory blocks, then the data structure chosen must be able to be stored in non-contiguous memory blocks.
|
||||
Memory is a shared resource for all programs. When a block of memory is occupied by one program, it cannot be used by others simultaneously. **Therefore, memory resources are an important consideration in the design of data structures and algorithms**. For example, the peak memory usage of an algorithm should not exceed the system's remaining free memory. If there is a lack of contiguous large memory spaces, the chosen data structure must be able to store data in dispersed memory spaces.
|
||||
|
||||
As shown in the figure below, **Physical structure reflects the way data is stored in computer memory and it can be divided into consecutive space storage (arrays) and distributed space storage (linked lists)**. The physical structure determines how data is accessed, updated, added, deleted, etc. Logical and physical structure complement each other in terms of time efficiency and space efficiency.
|
||||
As shown in the figure below, **the physical structure reflects how data is stored in computer memory**, which can be divided into contiguous space storage (arrays) and dispersed space storage (linked lists). The physical structure determines from the bottom level how data is accessed, updated, added, or deleted. Both types of physical structures exhibit complementary characteristics in terms of time efficiency and space efficiency.
|
||||
|
||||
![continuous vs. decentralized spatial storage](classification_of_data_structure.assets/classification_phisical_structure.png)
|
||||
![Contiguous Space Storage and Dispersed Space Storage](classification_of_data_structure.assets/classification_phisical_structure.png)
|
||||
|
||||
**It is worth stating that all data structures are implemented based on arrays, linked lists, or a combination of the two**. For example, stacks and queues can be implemented using both arrays and linked lists; and implementations of hash tables may contain both arrays and linked lists.
|
||||
It's important to note that **all data structures are implemented based on arrays, linked lists, or a combination of both**. For example, stacks and queues can be implemented using either arrays or linked lists; while hash tables may include both arrays and linked lists.
|
||||
|
||||
- **Array-based structures**: stacks, queues, hash tables, trees, heaps, graphs, matrices, tensors (arrays of dimension $\geq 3$), and so on.
|
||||
- **Linked list-based structures**: stacks, queues, hash tables, trees, heaps, graphs, etc.
|
||||
- **Array-based Implementations**: Stacks, Queues, Hash Tables, Trees, Heaps, Graphs, Matrices, Tensors (arrays with dimensions $\geq 3$).
|
||||
- **Linked List-based Implementations**: Stacks, Queues, Hash Tables, Trees, Heaps, Graphs, etc.
|
||||
|
||||
Data structures based on arrays are also known as "static data structures", which means that such structures' length remains constant after initialization. In contrast, data structures based on linked lists are called "dynamic data structures", meaning that their length can be adjusted during program execution after initialization.
|
||||
Data structures implemented based on arrays are also called “Static Data Structures,” meaning their length cannot be changed after initialization. Conversely, those based on linked lists are called “Dynamic Data Structures,” which can still adjust their size during program execution.
|
||||
|
||||
!!! tip
|
||||
|
||||
If you find it difficult to understand the physical structure, it is recommended that you read the next chapter, "Arrays and Linked Lists," before reviewing this section.
|
||||
If you find it difficult to understand the physical structure, it's recommended to read the next chapter first and then revisit this section.
|
||||
|
@ -1,13 +1,13 @@
|
||||
# Data Structure
|
||||
# Data Structures
|
||||
|
||||
<div class="center-table" markdown>
|
||||
|
||||
![data structure](../assets/covers/chapter_data_structure.jpg)
|
||||
![Data Structures](../assets/covers/chapter_data_structure.jpg)
|
||||
|
||||
</div>
|
||||
|
||||
!!! abstract
|
||||
|
||||
Data structures resemble a stable and diverse framework.
|
||||
Data structures serve as a robust and diverse framework.
|
||||
|
||||
They serve as a blueprint for organizing data orderly, enabling algorithms to come to life upon this foundation.
|
||||
They offer a blueprint for the orderly organization of data, upon which algorithms come to life.
|
||||
|
@ -0,0 +1,150 @@
|
||||
# Number Encoding *
|
||||
|
||||
!!! note
|
||||
|
||||
In this book, chapters marked with an * symbol are optional reads. If you are short on time or find them challenging, you may skip these initially and return to them after completing the essential chapters.
|
||||
|
||||
## Integer Encoding
|
||||
|
||||
In the table from the previous section, we noticed that all integer types can represent one more negative number than positive numbers, such as the `byte` range of $[-128, 127]$. This phenomenon, somewhat counterintuitive, is rooted in the concepts of sign-magnitude, one's complement, and two's complement encoding.
|
||||
|
||||
First, note that **integers are stored in computers in two's complement form**. Before analyzing why this is the case, let's define these three encoding methods:
|
||||
|
||||
- **Sign-magnitude**: The highest bit of a binary representation of a number is considered the sign bit, where $0$ represents a positive number and $1$ represents a negative number. The remaining bits represent the value of the number.
|
||||
- **One's complement**: The one's complement of a positive number is the same as its sign-magnitude. For negative numbers, it's obtained by inverting all bits except the sign bit.
|
||||
- **Two's complement**: The two's complement of a positive number is the same as its sign-magnitude. For negative numbers, it's obtained by adding $1$ to their one's complement.
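The three definitions above translate almost word for word into code. The following is an illustrative sketch for 8-bit values only; the function names are hypothetical and the results are returned as bit strings for readability.

```python
BITS = 8  # width of a byte, the type used in this section's examples


def sign_magnitude(x: int) -> str:
    """Sign bit followed by the magnitude |x| in the remaining 7 bits."""
    assert -127 <= x <= 127
    sign = "1" if x < 0 else "0"
    return sign + format(abs(x), f"0{BITS - 1}b")


def ones_complement(x: int) -> str:
    """Same as sign-magnitude for x >= 0; otherwise invert all bits but the sign."""
    sm = sign_magnitude(x)
    if x >= 0:
        return sm
    return sm[0] + "".join("1" if b == "0" else "0" for b in sm[1:])


def twos_complement(x: int) -> str:
    """Same as sign-magnitude for x >= 0; otherwise one's complement plus 1."""
    if x >= 0:
        return sign_magnitude(x)
    value = (int(ones_complement(x), 2) + 1) % (1 << BITS)
    return format(value, f"0{BITS}b")


print(sign_magnitude(-2))   # 10000010
print(ones_complement(-2))  # 11111101
print(twos_complement(-2))  # 11111110
```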
|
||||
|
||||
The following diagram illustrates the conversions among sign-magnitude, one's complement, and two's complement:
|
||||
|
||||
![Conversions between Sign-Magnitude, One's Complement, and Two's Complement](number_encoding.assets/1s_2s_complement.png)
|
||||
|
||||
Although sign-magnitude is the most intuitive, it has limitations. For one, **negative numbers in sign-magnitude cannot be directly used in calculations**. For example, in sign-magnitude, calculating $1 + (-2)$ results in $-3$, which is incorrect.
|
||||
|
||||
$$
|
||||
\begin{aligned}
|
||||
& 1 + (-2) \newline
|
||||
& \rightarrow 0000 \; 0001 + 1000 \; 0010 \newline
|
||||
& = 1000 \; 0011 \newline
|
||||
& \rightarrow -3
|
||||
\end{aligned}
|
||||
$$
|
||||
|
||||
To address this, computers introduced the **one's complement**. If we convert to one's complement and calculate $1 + (-2)$, then convert the result back to sign-magnitude, we get the correct result of $-1$.
|
||||
|
||||
$$
|
||||
\begin{aligned}
|
||||
& 1 + (-2) \newline
|
||||
& \rightarrow 0000 \; 0001 \; \text{(Sign-magnitude)} + 1000 \; 0010 \; \text{(Sign-magnitude)} \newline
|
||||
& = 0000 \; 0001 \; \text{(One's complement)} + 1111 \; 1101 \; \text{(One's complement)} \newline
|
||||
& = 1111 \; 1110 \; \text{(One's complement)} \newline
|
||||
& = 1000 \; 0001 \; \text{(Sign-magnitude)} \newline
|
||||
& \rightarrow -1
|
||||
\end{aligned}
|
||||
$$
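Both derivations above can be replayed on 8-bit patterns. This is a sketch with hypothetical helper names; it does not implement the end-around carry that general one's complement addition requires, which this particular example happens not to need.

```python
def decode_sign_magnitude(bits: int) -> int:
    """Interpret an 8-bit pattern as sign-magnitude."""
    magnitude = bits & 0b0111_1111
    return -magnitude if bits & 0b1000_0000 else magnitude


def to_ones_complement(bits: int) -> int:
    """Convert between 8-bit sign-magnitude and one's complement (same step both ways)."""
    if bits & 0b1000_0000:         # negative: flip the 7 value bits
        return bits ^ 0b0111_1111
    return bits                    # non-negative: unchanged


one, minus_two = 0b0000_0001, 0b1000_0010  # sign-magnitude patterns

# Adding the raw sign-magnitude patterns gives the wrong answer.
naive = (one + minus_two) & 0xFF
print(decode_sign_magnitude(naive))  # -3

# Adding in one's complement, then converting back, gives the right one.
total = (to_ones_complement(one) + to_ones_complement(minus_two)) & 0xFF
print(decode_sign_magnitude(to_ones_complement(total)))  # -1
```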
|
||||
|
||||
Additionally, **there are two representations of zero in sign-magnitude**: $+0$ and $-0$. This means two different binary encodings for zero, which could lead to ambiguity. For example, in conditional checks, not differentiating between positive and negative zero might result in incorrect outcomes. Addressing this ambiguity would require additional checks, potentially reducing computational efficiency.
|
||||
|
||||
$$
|
||||
\begin{aligned}
|
||||
+0 & \rightarrow 0000 \; 0000 \newline
|
||||
-0 & \rightarrow 1000 \; 0000
|
||||
\end{aligned}
|
||||
$$
|
||||
|
||||
Like sign-magnitude, one's complement also suffers from the positive and negative zero ambiguity. Therefore, computers further introduced the **two's complement**. Let's observe the conversion process for negative zero in sign-magnitude, one's complement, and two's complement:
|
||||
|
||||
$$
|
||||
\begin{aligned}
|
||||
-0 \rightarrow \; & 1000 \; 0000 \; \text{(Sign-magnitude)} \newline
|
||||
= \; & 1111 \; 1111 \; \text{(One's complement)} \newline
|
||||
= 1 \; & 0000 \; 0000 \; \text{(Two's complement)} \newline
|
||||
\end{aligned}
|
||||
$$
|
||||
|
||||
Adding $1$ to the one's complement of negative zero produces a carry, but with `byte` length being only 8 bits, the carried-over $1$ to the 9th bit is discarded. Therefore, **the two's complement of negative zero is $0000 \; 0000$**, the same as positive zero, thus resolving the ambiguity.
|
||||
|
||||
One last puzzle is the $[-128, 127]$ range for `byte`, with an additional negative number, $-128$. We observe that for the interval $[-127, +127]$, all integers have corresponding sign-magnitude, one's complement, and two's complement, and these can be converted between each other.
|
||||
|
||||
However, **the two's complement $1000 \; 0000$ is an exception without a corresponding sign-magnitude**. According to the conversion method, its sign-magnitude would be $0000 \; 0000$. This is a contradiction: that pattern represents zero, and the two's complement of zero is $0000 \; 0000$ itself. Computers therefore designate the special two's complement $1000 \; 0000$ as representing $-128$. Indeed, calculating $(-127) + (-1)$ in two's complement yields $-128$.
|
||||
|
||||
$$
|
||||
\begin{aligned}
|
||||
& (-127) + (-1) \newline
|
||||
& \rightarrow 1111 \; 1111 \; \text{(Sign-magnitude)} + 1000 \; 0001 \; \text{(Sign-magnitude)} \newline
|
||||
& = 1000 \; 0000 \; \text{(One's complement)} + 1111 \; 1110 \; \text{(One's complement)} \newline
|
||||
& = 1000 \; 0001 \; \text{(Two's complement)} + 1111 \; 1111 \; \text{(Two's complement)} \newline
|
||||
& = 1000 \; 0000 \; \text{(Two's complement)} \newline
|
||||
& \rightarrow -128
|
||||
\end{aligned}
|
||||
$$
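The same addition can be reproduced in Python by masking every result to 8 bits, which mimics the discarded carry. This is a sketch; `to_twos` and `from_twos` are names chosen here for illustration.

```python
MASK = 0xFF  # keep only the low 8 bits, as a byte would


def to_twos(x: int) -> int:
    """8-bit two's complement pattern of x, for -128 <= x <= 127."""
    return x & MASK


def from_twos(bits: int) -> int:
    """Signed value of an 8-bit two's complement pattern."""
    return bits - 256 if bits & 0x80 else bits


total = (to_twos(-127) + to_twos(-1)) & MASK  # the carry into the 9th bit is dropped
print(format(total, "08b"))  # 10000000
print(from_twos(total))      # -128
```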
|
||||
|
||||
As you might have noticed, all these calculations are additions, hinting at an important fact: **computers' internal hardware circuits are primarily designed around addition operations**. This is because addition is simpler to implement in hardware compared to other operations like multiplication, division, and subtraction, allowing for easier parallelization and faster computation.
|
||||
|
||||
It's important to note that this doesn't mean computers can only perform addition. **By combining addition with basic logical operations, computers can execute a variety of other mathematical operations**. For example, the subtraction $a - b$ can be translated into $a + (-b)$; multiplication and division can be translated into multiple additions or subtractions.
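As a rough illustration of this idea, the sketch below builds subtraction and multiplication out of bit inversion and 8-bit addition only. The function names are hypothetical, and repeated addition is used purely for clarity; it is not a claim about how real hardware multiplies.

```python
MASK = 0xFF  # work on 8-bit patterns


def negate(bits: int) -> int:
    """Two's complement negation: invert every bit, then add 1."""
    return ((bits ^ MASK) + 1) & MASK


def subtract(a: int, b: int) -> int:
    """Compute a - b as a + (-b), using only inversion and addition."""
    return (a + negate(b)) & MASK


def multiply(a: int, b: int) -> int:
    """Compute a * b for a small non-negative b by repeated addition."""
    total = 0
    for _ in range(b):
        total = (total + a) & MASK
    return total


print(subtract(5, 7))  # 254, the 8-bit two's complement pattern of -2
print(multiply(6, 7))  # 42
```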
|
||||
|
||||
We can now summarize the reason for using two's complement in computers: with two's complement representation, computers can use the same circuits and operations to handle both positive and negative number addition, eliminating the need for special hardware circuits for subtraction and avoiding the ambiguity of positive and negative zero. This greatly simplifies hardware design and enhances computational efficiency.
|
||||
|
||||
The design of two's complement is quite ingenious, and due to space constraints, we'll stop here. Interested readers are encouraged to explore further.
|
||||
|
||||
## Floating-Point Number Encoding
|
||||
|
||||
You might have noticed something intriguing: despite having the same length of 4 bytes, why does a `float` have a much larger range of values compared to an `int`? This seems counterintuitive, as one would expect the range to shrink for `float` since it needs to represent fractions.
|
||||
|
||||
In fact, **this is due to the different representation method used by floating-point numbers (`float`)**. Let's consider a 32-bit binary number as:
|
||||
|
||||
$$
|
||||
b_{31} b_{30} b_{29} \ldots b_2 b_1 b_0
|
||||
$$
|
||||
|
||||
According to the IEEE 754 standard, a 32-bit `float` consists of the following three parts:
|
||||
|
||||
- Sign bit $\mathrm{S}$: Occupies 1 bit, corresponding to $b_{31}$.
|
||||
- Exponent bit $\mathrm{E}$: Occupies 8 bits, corresponding to $b_{30} b_{29} \ldots b_{23}$.
|
||||
- Fraction bit $\mathrm{N}$: Occupies 23 bits, corresponding to $b_{22} b_{21} \ldots b_0$.
|
||||
|
||||
The value of a binary `float` number is calculated as:
|
||||
|
||||
$$
|
||||
\text{val} = (-1)^{b_{31}} \times 2^{\left(b_{30} b_{29} \ldots b_{23}\right)_2 - 127} \times \left(1 . b_{22} b_{21} \ldots b_0\right)_2
|
||||
$$
|
||||
|
||||
Converted to a decimal formula, this becomes:
|
||||
|
||||
$$
|
||||
\text{val} = (-1)^{\mathrm{S}} \times 2^{\mathrm{E} - 127} \times (1 + \mathrm{N})
|
||||
$$
|
||||
|
||||
The range of each component is:
|
||||
|
||||
$$
|
||||
\begin{aligned}
|
||||
\mathrm{S} \in & \{ 0, 1\}, \quad \mathrm{E} \in \{ 1, 2, \dots, 254 \} \newline
|
||||
(1 + \mathrm{N}) = & (1 + \sum_{i=1}^{23} b_{23-i} \times 2^{-i}) \in [1, 2 - 2^{-23}]
|
||||
\end{aligned}
|
||||
$$
|
||||
|
||||
![Example Calculation of a float in IEEE 754 Standard](number_encoding.assets/ieee_754_float.png)
|
||||
|
||||
Observing the diagram, given an example data $\mathrm{S} = 0$, $\mathrm{E} = 124$, $\mathrm{N} = 2^{-2} + 2^{-3} = 0.375$, we have:
|
||||
|
||||
$$
|
||||
\text{val} = (-1)^0 \times 2^{124 - 127} \times (1 + 0.375) = 0.171875
|
||||
$$
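The worked example can be verified by assembling the 32 bits and evaluating the formula. The sketch below uses a hypothetical helper `decode_float32` that handles normal numbers only, and cross-checks the result with the standard `struct` module.

```python
import struct


def decode_float32(bits: int) -> float:
    """Evaluate a normal float from its fields: (-1)^S * 2^(E - 127) * (1 + N)."""
    s = bits >> 31                 # sign bit b31
    e = (bits >> 23) & 0xFF        # exponent bits b30..b23
    n = (bits & 0x7FFFFF) / 2**23  # fraction bits b22..b0 as a value in [0, 1)
    assert 1 <= e <= 254, "this sketch handles normal numbers only"
    return (-1) ** s * 2.0 ** (e - 127) * (1 + n)


# S = 0, E = 124, N = 2^-2 + 2^-3 = 0.375 (bits b21 and b20 set)
bits = (0 << 31) | (124 << 23) | (0b011 << 20)
print(decode_float32(bits))  # 0.171875

# Cross-check against the platform's own reading of the same 4 bytes.
print(struct.unpack(">f", struct.pack(">I", bits))[0])  # 0.171875
```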
|
||||
|
||||
Now we can answer the initial question: **The representation of `float` includes an exponent bit, leading to a much larger range than `int`**. Based on the above calculation, the maximum positive number representable by `float` is approximately $2^{254 - 127} \times (2 - 2^{-23}) \approx 3.4 \times 10^{38}$, and the minimum negative number is obtained by switching the sign bit.
|
||||
|
||||
**However, the trade-off for `float`'s expanded range is a sacrifice in precision**. The integer type `int` uses all 32 bits to represent the number, with values evenly distributed; but due to the exponent bit, the larger the value of a `float`, the greater the difference between adjacent numbers.
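The widening gap between adjacent values can be measured by incrementing the raw bit pattern of a `float` by one. This is a sketch under the assumption that the input is positive, finite, and exactly representable as a 32-bit float; `next_float32` is a name invented here.

```python
import struct


def next_float32(x: float) -> float:
    """Next representable 32-bit float above a positive finite x."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits + 1))[0]


for x in (1.0, 1024.0, 16777216.0):
    print(x, next_float32(x) - x)
# 1.0 1.1920928955078125e-07
# 1024.0 0.0001220703125
# 16777216.0 2.0
```

Above $2^{24} = 16777216$ the gap is already 2, so a `float` can no longer represent every integer, whereas an `int` keeps a constant spacing of 1 across its whole range.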
|
||||
|
||||
As shown in the table below, exponent bits $\mathrm{E} = 0$ and $\mathrm{E} = 255$ have special meanings, **used to represent zero, infinity, $\mathrm{NaN}$, etc.**
|
||||
|
||||
<p align="center"> Table <id> Meaning of Exponent Bits </p>
|
||||
|
||||
| Exponent Bit E | Fraction Bit $\mathrm{N} = 0$ | Fraction Bit $\mathrm{N} \ne 0$ | Calculation Formula |
|
||||
| ------------------ | ----------------------------- | ------------------------------- | ---------------------------------------------------------------------- |
|
||||
| $0$ | $\pm 0$ | Subnormal Numbers | $(-1)^{\mathrm{S}} \times 2^{-126} \times (0.\mathrm{N})$ |
|
||||
| $1, 2, \dots, 254$ | Normal Numbers | Normal Numbers | $(-1)^{\mathrm{S}} \times 2^{(\mathrm{E} -127)} \times (1.\mathrm{N})$ |
|
||||
| $255$ | $\pm \infty$ | $\mathrm{NaN}$ | |
|
||||
|
||||
It's worth noting that subnormal numbers significantly improve the precision of floating-point numbers near zero. The smallest positive normal number is $2^{-126}$, and the smallest positive subnormal number is $2^{-126} \times 2^{-23} = 2^{-149}$.
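The special cases in the table can be inspected by reinterpreting hand-written bit patterns as 32-bit floats. The helper `float32_from_bits` below is a hypothetical name; the sketch relies only on the standard `struct` and `math` modules.

```python
import math
import struct


def float32_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


print(float32_from_bits(0x00800000) == 2.0 ** -126)  # True: smallest positive normal (E = 1, N = 0)
print(float32_from_bits(0x00000001) == 2.0 ** -149)  # True: smallest positive subnormal (E = 0)
print(float32_from_bits(0x7F800000))                 # inf: E = 255, N = 0
print(math.isnan(float32_from_bits(0x7FC00000)))     # True: E = 255, N != 0
print(float32_from_bits(0x80000000))                 # -0.0: S = 1, E = 0, N = 0
```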
|
||||
|
||||
Double-precision `double` also uses a similar representation method to `float`, which is not elaborated here for brevity.
|
@ -0,0 +1,33 @@
|
||||
# Summary
|
||||
|
||||
### Key Review
|
||||
|
||||
- Data structures can be categorized from two perspectives: logical structure and physical structure. Logical structure describes the logical relationships between data elements, while physical structure describes how data is stored in computer memory.
|
||||
- Common logical structures include linear, tree-like, and network structures. We generally classify data structures into linear (arrays, linked lists, stacks, queues) and non-linear (trees, graphs, heaps) based on their logical structure. The implementation of hash tables may involve both linear and non-linear data structures.
|
||||
- When a program runs, data is stored in computer memory. Each memory space has a corresponding memory address, and the program accesses data through these addresses.
|
||||
- Physical structures are primarily divided into contiguous space storage (arrays) and dispersed space storage (linked lists). All data structures are implemented using arrays, linked lists, or a combination of both.
|
||||
- Basic data types in computers include integers (`byte`, `short`, `int`, `long`), floating-point numbers (`float`, `double`), characters (`char`), and booleans (`boolean`). Their range depends on the size of the space occupied and the representation method.
|
||||
- Sign-magnitude, one's complement, and two's complement are three methods of encoding numbers in computers, and they can be converted into each other. In the sign-magnitude of an integer, the highest bit is the sign bit, and the remaining bits represent the value of the number.
|
||||
- Integers are stored in computers in the form of two's complement. In this representation, the computer can treat the addition of positive and negative numbers uniformly, without the need for special hardware circuits for subtraction, and there is no ambiguity of positive and negative zero.
|
||||
- A 32-bit `float` is encoded with 1 sign bit, 8 exponent bits, and 23 fraction bits. Because of the exponent bits, the range of floating-point numbers is much greater than that of integers of the same length, at the cost of precision.
|
||||
- ASCII is the earliest English character set, 1 byte in length, and includes 128 characters. The GBK character set is a commonly used Chinese character set, including more than 20,000 Chinese characters. Unicode strives to provide a complete character set standard, including characters from various languages worldwide, thus solving the problem of garbled characters caused by inconsistent character encoding methods.
|
||||
- UTF-8 is the most popular Unicode encoding method, with excellent universality. It is a variable-length encoding with good scalability that uses space efficiently. UTF-16 is also variable-length (2 or 4 bytes per character), while UTF-32 is fixed-length (4 bytes per character). When encoding Chinese characters, UTF-16 usually occupies less space than UTF-8. Programming languages like Java and C# use UTF-16 encoding by default.
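The space trade-offs between the three Unicode encodings can be measured with Python's built-in codecs. This is a small sketch; `byte_lengths` is a hypothetical helper, and the little-endian codec variants are used only so that no byte-order mark is counted.

```python
def byte_lengths(ch: str) -> tuple[int, int, int]:
    """Bytes needed for ch in UTF-8, UTF-16, and UTF-32 (no byte-order mark)."""
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),
        len(ch.encode("utf-32-le")),
    )


for ch in ("A", "中", "😀"):
    print(ch, byte_lengths(ch))
# A (1, 2, 4)
# 中 (3, 2, 4)
# 😀 (4, 4, 4)
```

A common Chinese character takes 3 bytes in UTF-8 but 2 in UTF-16, while a character outside the Basic Multilingual Plane, such as the emoji, takes 4 bytes in both.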
|
||||
|
||||
### Q & A
|
||||
|
||||
!!! question "Why does a hash table contain both linear and non-linear data structures?"
|
||||
|
||||
The underlying structure of a hash table is an array. To resolve hash collisions, we may use "chaining": each bucket in the array points to a linked list, which, when exceeding a certain threshold, might be transformed into a tree (usually a red-black tree).
|
||||
From a storage perspective, the foundation of a hash table is an array, where each bucket slot might contain a value, a linked list, or a tree. Therefore, hash tables may contain both linear data structures (arrays, linked lists) and non-linear data structures (trees).
|
||||
|
||||
!!! question "Is the length of the `char` type 1 byte?"
|
||||
|
||||
The length of the `char` type is determined by the encoding method used by the programming language. For example, Java, JavaScript, TypeScript, and C# all use UTF-16 encoding (to store Unicode code points), so the length of the `char` type is 2 bytes.
|
||||
|
||||
!!! question "Is there ambiguity in calling data structures based on arrays 'static data structures'? Because operations like push and pop on stacks are 'dynamic.'"
|
||||
|
||||
While stacks indeed allow for dynamic data operations, the data structure itself remains "static" (with unchangeable length). Even though data structures based on arrays can dynamically add or remove elements, their capacity is fixed. If the data volume exceeds the pre-allocated size, a new, larger array needs to be created, and the contents of the old array copied into it.
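The answer above can be made concrete with a toy array-backed stack. This is a sketch, not how any particular library implements it: the class name `ArrayStack`, the starting capacity, and the doubling growth factor are all assumptions chosen for illustration, and a Python list of fixed length stands in for a real fixed-size array.

```python
class ArrayStack:
    """Stack backed by a fixed-capacity buffer that is replaced when it fills up."""

    def __init__(self, capacity: int = 2) -> None:
        self._buffer = [None] * capacity  # stands in for a fixed-length array
        self._size = 0

    def capacity(self) -> int:
        return len(self._buffer)

    def push(self, item: int) -> None:
        if self._size == len(self._buffer):
            # Full: allocate a larger array and copy the old contents over.
            bigger = [None] * (2 * len(self._buffer))
            bigger[: self._size] = self._buffer
            self._buffer = bigger
        self._buffer[self._size] = item
        self._size += 1

    def pop(self) -> int:
        if self._size == 0:
            raise IndexError("pop from empty stack")
        self._size -= 1
        item = self._buffer[self._size]
        self._buffer[self._size] = None
        return item


stack = ArrayStack(capacity=2)
for value in (1, 2, 3):
    stack.push(value)
print(stack.capacity())  # 4: the buffer was replaced once
print(stack.pop())       # 3
```

Pushing and popping change the number of stored elements, but the length of the underlying array only changes when a new array is allocated and the old contents are copied.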
|
||||
|
||||
!!! question "When building stacks (queues) without specifying their size, why are they considered 'static data structures'?"
|
||||
|
||||
In high-level programming languages, we don't need to manually specify the initial capacity of stacks (queues); the class handles this internally. For example, the default initial capacity of Java's `ArrayList` is 10. The expansion operation is also performed automatically. See the subsequent "List" chapter for details.
|