An "array" is a linear data structure that operates as a lineup of similar items, stored together in a computer's memory in contiguous spaces. It's like a sequence that maintains organized storage. Each item in this lineup has its unique 'spot' known as an "index". Please refer to the figure below to observe how arrays work and grasp these key terms.
An "array" is a linear data structure that operates as a lineup of similar items, stored together in a computer's memory in contiguous spaces. It's like a sequence that maintains organized storage. Each item in this lineup has its unique 'spot' known as an "index". Please refer to the figure below to observe how arrays work and grasp these key terms.
![Array definition and storage method](array.assets/array_definition.png)

## Common operations on arrays

### Initializing arrays

Arrays can be initialized in two ways depending on the needs: either without initial values or with specified initial values. When initial values are not specified, most programming languages will set the array elements to $0$:
=== "Zig"

    ```zig title=""
    // Initialize an array with specified initial values
    var nums = [_]i32{ 1, 3, 2, 5, 4 };
    ```
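For instance, in Python (using a list as a stand-in for a fixed-length array), the two approaches might look like this:

```python
# Initialize an array without initial values (all elements are 0)
arr: list[int] = [0] * 5  # [0, 0, 0, 0, 0]
# Initialize an array with specified initial values
nums: list[int] = [1, 3, 2, 5, 4]
```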
### Accessing elements
Elements in an array are stored in contiguous memory spaces, making it simpler to compute each element's memory address. The formula shown in the figure below aids in determining an element's memory address, utilizing the array's memory address (specifically, the first element's address) and the element's index. This computation streamlines direct access to the desired element.
![Memory address calculation for array elements](array.assets/array_memory_location_calculation.png)

As observed in the above illustration, array indexing conventionally begins at $0$. While this might appear counterintuitive, considering counting usually starts at $1$, within the address calculation formula, **an index is essentially an offset from the memory address**. For the first element's address, this offset is $0$, validating its index as $0$.
Accessing elements in an array is highly efficient, allowing us to randomly access any element in $O(1)$ time:
```src
[file]{array}-[class]{}-[func]{random_access}
```
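As a rough illustration of what `random_access` does, here is a minimal Python sketch (a plain list stands in for a fixed-length array):

```python
import random

def random_access(nums: list[int]) -> int:
    """Return one element of the array chosen at random, in O(1) time"""
    # Pick a random index in [0, len(nums) - 1]
    random_index = random.randint(0, len(nums) - 1)
    # Index access itself takes constant time
    return nums[random_index]
```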
### Inserting elements
Array elements are tightly packed in memory, with no space available to accommodate additional data between them. As illustrated in the figure below, inserting an element in the middle of an array requires shifting all subsequent elements back by one position to create room for the new element.
![Array element insertion example](array.assets/array_insert_element.png)

It's important to note that due to the fixed length of an array, inserting an element will unavoidably result in the loss of the last element in the array. Solutions to address this issue will be explored in the "List" chapter.
```src
[file]{array}-[class]{}-[func]{insert}
```
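A minimal Python sketch of such an `insert` function; note how the former last element is overwritten, i.e., lost:

```python
def insert(nums: list[int], num: int, index: int):
    """Insert num at index, shifting all later elements back by one"""
    # Move elements from the end down to index back by one position
    for i in range(len(nums) - 1, index, -1):
        nums[i] = nums[i - 1]
    # Place the new element; the former last element has been overwritten
    nums[index] = num
```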
### Deleting elements

Similarly, as depicted in the figure below, to delete an element at index $i$, all elements following index $i$ must be moved forward by one position.

![Array element deletion example](array.assets/array_remove_element.png)

Please note that after deletion, the former last element becomes "meaningless," hence requiring no specific modification.
```src
[file]{array}-[class]{}-[func]{remove}
```
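For illustration, a Python sketch of the deletion (again with a list standing in for a fixed-length array):

```python
def remove(nums: list[int], index: int):
    """Delete the element at index by moving all later elements forward"""
    for i in range(index, len(nums) - 1):
        nums[i] = nums[i + 1]
    # The last slot still holds its old value but is now "meaningless"
```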
In summary, the insertion and deletion operations in arrays present the following disadvantages:

- **High time complexity**: Both insertion and deletion in an array have an average time complexity of $O(n)$, where $n$ is the length of the array.
- **Loss of elements**: Due to the fixed length of arrays, elements that exceed the array's capacity are lost during insertion.
- **Waste of memory**: Initializing a longer array and utilizing only the front part results in "meaningless" end elements during insertion, leading to some wasted memory space.
### Traversing arrays

In most programming languages, we can traverse an array either by using indices or by directly iterating over each element:
```src
[file]{array}-[class]{}-[func]{traverse}
```
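Both traversal styles, as a Python sketch:

```python
def traverse(nums: list[int]):
    """Traverse the array, first by index, then element by element"""
    count = 0
    # Traverse the array by index
    for i in range(len(nums)):
        count += nums[i]
    # Traverse the elements directly
    for num in nums:
        count += num
```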
### Finding elements

Locating a specific element within an array involves iterating through the array, checking each element to determine if it matches the desired value.
Because arrays are linear data structures, this operation is commonly referred to as "linear search".
```src
[file]{array}-[class]{}-[func]{find}
```
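A minimal Python sketch of this linear search:

```python
def find(nums: list[int], target: int) -> int:
    """Linear search: return the index of target, or -1 if it is absent"""
    for i in range(len(nums)):
        if nums[i] == target:
            return i
    return -1
```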
### Expanding arrays

In complex system environments, ensuring the availability of memory space after an array for safe capacity extension becomes challenging. Consequently, in most programming languages, **the length of an array is immutable**.
To expand an array, it's necessary to create a larger array and then copy the elements from the original array. This operation has a time complexity of $O(n)$, which can be time-consuming for large arrays:
```src
[file]{array}-[class]{}-[func]{extend}
```
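A Python sketch of such an `extend` function (the `enlarge` parameter, the number of extra slots, is illustrative):

```python
def extend(nums: list[int], enlarge: int) -> list[int]:
    """Return a new, larger array containing all elements of nums"""
    res = [0] * (len(nums) + enlarge)  # allocate the larger array
    for i in range(len(nums)):
        res[i] = nums[i]  # copy every element over: O(n)
    return res
```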
## Advantages and limitations of arrays

Arrays are stored in contiguous memory spaces and consist of elements of the same type. This approach provides substantial prior information that systems can leverage to optimize the efficiency of data structure operations.
- **High space efficiency**: Arrays allocate a contiguous block of memory for data, eliminating the need for additional structural overhead.
- **Support for random access**: Arrays allow $O(1)$ time access to any element.
- **Cache locality**: When accessing array elements, the computer not only loads them but also caches the surrounding data, utilizing high-speed cache to enhance subsequent operation speeds.
However, continuous space storage is a double-edged sword, with the following limitations:
- **Low efficiency in insertion and deletion**: As arrays accumulate many elements, inserting or deleting elements requires shifting a large number of elements.
- **Fixed length**: The length of an array is fixed after initialization. Expanding an array requires copying all data to a new array, incurring significant costs.
- **Space wastage**: If the allocated array size exceeds what is necessary, the extra space is wasted.
## Typical applications of arrays

Arrays are fundamental and widely used data structures. They find frequent application in various algorithms and serve in the implementation of complex data structures.
- **Random access**: Arrays are ideal for storing data when random sampling is required. By generating a random sequence based on indices, we can achieve random sampling efficiently.
- **Sorting and searching**: Arrays are the most commonly used data structure for sorting and searching algorithms. Techniques like quick sort, merge sort, binary search, etc., primarily operate on arrays.
- **Lookup tables**: Arrays serve as efficient lookup tables for quick element or relationship retrieval. For instance, mapping characters to ASCII codes becomes seamless by using the ASCII code values as indices and storing corresponding elements in the array.
- **Machine learning**: Within the domain of neural networks, arrays play a pivotal role in executing crucial linear algebra operations involving vectors, matrices, and tensors. Arrays serve as the primary and most extensively used data structure in neural network programming.
- **Data structure implementation**: Arrays serve as the building blocks for implementing various data structures like stacks, queues, hash tables, heaps, graphs, etc. For instance, the adjacency matrix representation of a graph is essentially a two-dimensional array.
Memory space is a shared resource among all programs. In a complex system environment, available memory can be dispersed throughout the memory space. We understand that the memory allocated for an array must be continuous. However, for very large arrays, finding a sufficiently large contiguous memory space might be challenging. This is where the flexible advantage of linked lists becomes evident.
A "linked list" is a linear data structure in which each element is a node object, and the nodes are interconnected through "references". A reference records the memory address of the next node, enabling access to the next node from the current one.
The design of linked lists allows for their nodes to be distributed across memory locations without requiring contiguous memory addresses.

![Linked list definition and storage method](linked_list.assets/linkedlist_definition.png)

As shown in the figure, we see that the basic building block of a linked list is the "node" object. Each node comprises two key components: the node's "value" and a "reference" to the next node.
As the code below illustrates, a `ListNode` in a linked list, besides holding a value, must also maintain an additional reference (pointer). Therefore, **a linked list occupies more memory space than an array when storing the same quantity of data**.

=== "Python"

    ```python title=""
    class ListNode:
        """Linked list node class"""

        def __init__(self, val: int):
            self.val: int = val                # Node value
            self.next: ListNode | None = None  # Reference to the next node
    ```
=== "C++"

    ```cpp title=""
    /* Linked list node structure */
    struct ListNode {
        int val;        // Node value
        ListNode *next; // Pointer to the next node
        ListNode(int x) : val(x), next(nullptr) {} // Constructor
    };
    ```
=== "Java"

    ```java title=""
    /* Linked list node class */
    class ListNode {
        int val;       // Node value
        ListNode next; // Reference to the next node
        ListNode(int x) { val = x; } // Constructor
    }
    ```
=== "C#"

    ```csharp title=""
    /* Linked list node class */
    class ListNode(int x) { // Constructor
        int val = x;    // Node value
        ListNode? next; // Reference to the next node
    }
    ```
=== "Go"

    ```go title=""
    /* Linked list node structure */
    type ListNode struct {
        Val  int       // Node value
        Next *ListNode // Pointer to the next node
    }
    ```
=== "Swift"

    ```swift title=""
    /* Linked list node class */
    class ListNode {
        var val: Int        // Node value
        var next: ListNode? // Reference to the next node

        init(x: Int) { // Constructor
            val = x
        }
    }
    ```
=== "TS"
=== "TS"
```typescript title=""
```typescript title=""
/* Linked List Node Class */
/* Linked list node class */
class ListNode {
class ListNode {
val: number;
val: number;
next: ListNode | null;
next: ListNode | null;
=== "Dart"

    ```dart title=""
    /* Linked list node class */
    class ListNode {
      int val;        // Node value
      ListNode? next; // Reference to the next node
      ListNode(this.val, [this.next]); // Constructor
    }
    ```
=== "Rust"

    ```rust title=""
    use std::rc::Rc;
    use std::cell::RefCell;

    /* Linked list node class */
    #[derive(Debug)]
    struct ListNode {
        val: i32,                            // Node value
        next: Option<Rc<RefCell<ListNode>>>, // Pointer to the next node
    }
    ```
=== "C"

    ```c title=""
    /* Linked list node structure */
    typedef struct ListNode {
        int val;               // Node value
        struct ListNode *next; // Pointer to the next node
    } ListNode;
    ```
=== "Zig"

    ```zig title=""
    // Linked list node class
    pub fn ListNode(comptime T: type) type {
        return struct {
            const Self = @This();

            val: T = 0,          // Node value
            next: ?*Self = null, // Pointer to the next node
        };
    }
    ```
## Common operations on linked lists

### Initializing a linked list

Constructing a linked list is a two-step process: first, initializing each node object, and second, forming the reference links between the nodes. After initialization, we can traverse all nodes sequentially from the head node by following the `next` reference.
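For example, using the `ListNode` class defined above, the linked list $1 \rightarrow 3 \rightarrow 2 \rightarrow 5 \rightarrow 4$ can be built as follows:

```python
# Initialize the linked list 1 -> 3 -> 2 -> 5 -> 4
# Step 1: initialize each node object
n0 = ListNode(1)
n1 = ListNode(3)
n2 = ListNode(2)
n3 = ListNode(5)
n4 = ListNode(4)
# Step 2: build the references between the nodes
n0.next = n1
n1.next = n2
n2.next = n3
n3.next = n4
```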
The array as a whole is a variable, for instance, the array `nums` includes elements like `nums[0]`, `nums[1]`, and so on, whereas a linked list is made up of several distinct node objects. **We typically refer to a linked list by its head node**, for example, the linked list in the previous code snippet is referred to as `n0`.

### Inserting nodes

Inserting a node into a linked list is very easy. As shown in the figure, let's assume we aim to insert a new node `P` between two adjacent nodes `n0` and `n1`. **This can be achieved by simply modifying two node references (pointers)**, with a time complexity of $O(1)$.

By comparison, inserting an element into an array has a time complexity of $O(n)$, which becomes less efficient when dealing with large data volumes.
![Linked list node insertion example](linked_list.assets/linkedlist_insert_node.png)

```src
[file]{linked_list}-[class]{}-[func]{insert}
```
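A minimal Python sketch of this insertion:

```python
def insert(n0: ListNode, P: ListNode):
    """Insert node P right after node n0"""
    n1 = n0.next
    P.next = n1  # P now points to n1
    n0.next = P  # n0 now points to P
```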
### Deleting nodes

As shown in the figure, deleting a node from a linked list is also very easy, **involving only the modification of a single node's reference (pointer)**.

It's important to note that even though node `P` continues to point to `n1` after being deleted, it becomes inaccessible during linked list traversal. This effectively means that `P` is no longer a part of the linked list.

![Linked list node deletion](linked_list.assets/linkedlist_remove_node.png)
```src
[file]{linked_list}-[class]{}-[func]{remove}
```
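A minimal Python sketch of the deletion:

```python
def remove(n0: ListNode):
    """Delete the first node after n0"""
    if not n0.next:
        return
    P = n0.next   # P is the node to delete
    n1 = P.next
    n0.next = n1  # bypass P; it is no longer reachable from the list
```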
### Accessing nodes
**Accessing nodes in a linked list is less efficient**. As previously mentioned, any element in an array can be accessed in $O(1)$ time. In contrast, with a linked list, the program starts from the head node and traverses sequentially through the nodes until the desired node is found. In other words, to access the $i$-th node in a linked list, the program must iterate through $i - 1$ nodes, resulting in a time complexity of $O(n)$.
```src
[file]{linked_list}-[class]{}-[func]{access}
```
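A Python sketch of this access operation:

```python
def access(head: ListNode, index: int) -> ListNode | None:
    """Return the index-th node of the list, or None if index is out of range"""
    for _ in range(index):
        if not head:
            return None
        head = head.next  # step to the next node
    return head
```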
### Finding nodes

Traverse the linked list to locate a node whose value matches `target`, and then output the index of that node within the linked list. This procedure is also an example of linear search. The corresponding code is provided below:
```src
[file]{linked_list}-[class]{}-[func]{find}
```
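A Python sketch of this linear search over a linked list:

```python
def find(head: ListNode, target: int) -> int:
    """Return the index of the first node whose value equals target, or -1"""
    index = 0
    while head:
        if head.val == target:
            return index
        head = head.next
        index += 1
    return -1
```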
## Arrays vs. linked lists

The table below summarizes the characteristics of arrays and linked lists, and it also compares their efficiencies in various operations. Because they utilize opposing storage strategies, their respective properties and operational efficiencies exhibit distinct contrasts.
<p align="center"> Table <id> Efficiency comparison of arrays and linked lists </p>
|                   | Arrays | Linked lists |
| ----------------- | ------ | ------------ |
| Adding elements   | $O(n)$ | $O(1)$       |
| Deleting elements | $O(n)$ | $O(1)$       |
## Common types of linked lists

As shown in the figure, there are three common types of linked lists.
- **Singly linked list**: This is the standard linked list described earlier. Nodes in a singly linked list include a value and a reference to the next node. The first node is known as the head node, and the last node, which points to null (`None`), is the tail node.
- **Circular linked list**: This is formed when the tail node of a singly linked list points back to the head node, creating a loop. In a circular linked list, any node can function as the head node.
- **Doubly linked list**: In contrast to a singly linked list, a doubly linked list maintains references in two directions. Each node contains references (pointers) to both its successor (the next node) and predecessor (the previous node). Although doubly linked lists offer more flexibility for traversing in either direction, they also consume more memory space.
=== "Python"
=== "Python"
```python title=""
```python title=""
class ListNode:
class ListNode:
"""Bidirectional linked list node class""""
"""Bidirectional linked list node class"""
def __init__(self, val: int):
def __init__(self, val: int):
self.val: int = val # Node value
self.val: int = val # Node value
self.next: ListNode | None = None # Reference to the successor node
self.next: ListNode | None = None # Reference to the successor node
@ -664,23 +664,23 @@ As shown in the figure, there are three common types of linked lists.
}
}
```
```
![Common types of linked lists](linked_list.assets/linkedlist_common_types.png)

## Typical applications of linked lists

Singly linked lists are frequently utilized in implementing stacks, queues, hash tables, and graphs.
- **Stacks and queues**: In singly linked lists, if insertions and deletions occur at the same end, it behaves like a stack (last-in-first-out). Conversely, if insertions are at one end and deletions at the other, it functions like a queue (first-in-first-out).
- **Hash tables**: Linked lists are used in chaining, a popular method for resolving hash collisions. Here, all collided elements are grouped into a linked list.
- **Graphs**: Adjacency lists, a standard method for graph representation, associate each graph vertex with a linked list. This list contains elements that represent vertices connected to the corresponding vertex.
Doubly linked lists are ideal for scenarios requiring rapid access to preceding and succeeding elements.

- **Advanced data structures**: In structures like red-black trees and B-trees, accessing a node's parent is essential. This is achieved by incorporating a reference to the parent node in each node, akin to a doubly linked list.
- **Browser history**: In web browsers, doubly linked lists facilitate navigating the history of visited pages when users click forward or back.
- **LRU algorithm**: Doubly linked lists are apt for Least Recently Used (LRU) cache eviction algorithms, enabling swift identification of the least recently used data and facilitating fast node addition and removal.
Circular linked lists are ideal for applications that require periodic operations, such as resource scheduling in operating systems.

- **Round-robin scheduling algorithm**: In operating systems, the round-robin scheduling algorithm is a common CPU scheduling method, requiring cycling through a group of processes. Each process is assigned a time slice, and upon expiration, the CPU rotates to the next process. This cyclical operation can be efficiently realized using a circular linked list, allowing for a fair and time-shared system among all processes.
- **Data buffers**: Circular linked lists are also used in data buffers, like in audio and video players, where the data stream is divided into multiple buffer blocks arranged in a circular fashion for seamless playback.
To solve this problem, we can implement lists using a "dynamic array." It inherits the advantages of arrays and can be dynamically expanded during program execution.
In fact, **many programming languages' standard libraries implement lists using dynamic arrays**, such as Python's `list`, Java's `ArrayList`, C++'s `vector`, and C#'s `List`. In the following discussion, we will consider "list" and "dynamic array" as synonymous concepts.

## Common list operations

### Initializing a list

We typically use two initialization methods: "without initial values" and "with initial values".
=== "Zig"

    ```zig title=""
    try nums.appendSlice(&[_]i32{ 1, 3, 2, 5, 4 }); // Initialize the list with initial values
    ```
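In Python, for example, the two methods look like this:

```python
# Initialize a list without initial values
nums1: list[int] = []
# Initialize a list with initial values
nums: list[int] = [1, 3, 2, 5, 4]
```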
### Accessing elements

Lists are essentially arrays, thus they can access and update elements in $O(1)$ time, which is very efficient.
=== "Zig"

    ```zig title=""
    nums.items[1] = 0; // Update the element at index 1 to 0
    ```
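In Python, for example:

```python
num: int = nums[1]  # Access the element at index 1
nums[1] = 0         # Update the element at index 1 to 0
```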
### Inserting and removing elements

Compared to arrays, lists offer more flexibility in adding and removing elements. While adding elements to the end of a list is an $O(1)$ operation, the efficiency of inserting and removing elements elsewhere in the list remains the same as in arrays, with a time complexity of $O(n)$.
=== "Zig"

    ```zig title=""
    _ = nums.orderedRemove(3); // Remove the element at index 3
    ```
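In Python, for example:

```python
nums.append(6)     # Append 6 at the end: O(1)
nums.insert(3, 6)  # Insert 6 at index 3: O(n), later elements shift back
nums.pop(3)        # Remove the element at index 3: O(n), later elements shift forward
```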
### Iterating the list

Similar to arrays, lists can be iterated either by using indices or by directly iterating through each element.
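In Python, for example:

```python
count = 0
# Iterate by index
for i in range(len(nums)):
    count += nums[i]
# Iterate over the elements directly
for num in nums:
    count += num
```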
### Concatenating lists

Given a new list `nums1`, we can append it to the end of the original list.
=== "Zig"

    ```zig title=""
    try nums.insertSlice(nums.items.len, nums1.items); // Concatenate nums1 to the end of nums
    ```
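In Python, for example (the values in `nums1` are illustrative):

```python
nums1: list[int] = [6, 8, 7, 10, 9]
nums += nums1  # Concatenate nums1 to the end of nums
```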
### Sorting the list

Once the list is sorted, we can employ algorithms commonly used in array-related algorithm problems, such as "binary search" and "two-pointer" algorithms.
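In Python, for example:

```python
nums.sort()  # After sorting, the list elements are in ascending order
```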
Many programming languages come with built-in lists, including Java, C++, Python, etc. Their implementations tend to be intricate, featuring carefully considered settings for various parameters, like initial capacity and expansion factors. Readers who are curious can delve into the source code for further learning.

To enhance our understanding of how lists work, we will attempt to implement a simplified version of a list, focusing on three crucial design aspects (a sketch follows this list):
- **Initial capacity**: Choose a reasonable initial capacity for the array. In this example, we choose 10 as the initial capacity.
- **Size recording**: Declare a variable `size` to record the current number of elements in the list, updating in real-time with element insertion and deletion. With this variable, we can locate the end of the list and determine whether expansion is needed.
- **Expansion mechanism**: If the list reaches full capacity upon an element insertion, an expansion process is required. This involves creating a larger array based on the expansion factor, and then transferring all elements from the current array to the new one. In this example, we stipulate that the array size should double with each expansion.
In the first two sections of this chapter, we explored arrays and linked lists, two fundamental and important data structures, representing "continuous storage" and "dispersed storage" respectively.

In fact, **the physical structure largely determines the efficiency of a program's use of memory and cache**, which in turn affects the overall performance of the algorithm.

## Computer storage devices

There are three types of storage devices in computers: "hard disk," "random-access memory (RAM)," and "cache memory." The following table shows their different roles and performance characteristics in computer systems.
We can imagine the computer storage system as a pyramid structure shown in the figure below. The storage devices closer to the top of the pyramid are faster, have smaller capacities, and are more costly.
- **Hard disks are difficult to replace with memory**. Firstly, data in memory is lost after power off, making it unsuitable for long-term data storage; secondly, the cost of memory is dozens of times that of hard disks, making it difficult to popularize in the consumer market.
- **It is difficult for caches to have both large capacity and high speed**. As the capacity of L1, L2, L3 caches gradually increases, their physical size becomes larger, increasing the physical distance from the CPU core, leading to increased data transfer time and higher element access latency. Under current technology, a multi-level cache structure is the best balance between capacity, speed, and cost.
Overall, **hard disks are used for long-term storage of large amounts of data, memory is used for temporary storage of data being processed during program execution, and cache is used to store frequently accessed data and instructions**.
As shown in the figure below, during program execution, data is read from the hard disk into memory for CPU computation. The cache can be considered a part of the CPU, **smartly loading data from memory** to provide fast data access to the CPU, significantly enhancing program execution efficiency and reducing reliance on slower memory.

![Data flow between hard disk, memory, and cache](ram_and_cache.assets/computer_storage_devices.png)

## Memory efficiency of data structures
In terms of memory space utilization, arrays and linked lists have their advantages and limitations.
On one hand, **memory is limited and cannot be shared by multiple programs**, so we want our data structures to use space as efficiently as possible. Array elements are tightly packed, requiring no extra space for references (pointers) between elements, which generally makes arrays more space-efficient than linked lists.
On the other hand, during program execution, **as memory is repeatedly allocated and released, the degree of fragmentation of free memory becomes higher**, leading to reduced memory utilization efficiency. Arrays, due to their continuous storage method, are relatively less likely to cause memory fragmentation. In contrast, the elements of a linked list are dispersedly stored, and frequent insertion and deletion operations make memory fragmentation more likely.

## Cache efficiency of data structures

Although caches are much smaller in space capacity than memory, they are much faster and play a crucial role in program execution speed. Since the cache's capacity is limited and can only store a small part of frequently accessed data, when the CPU tries to access data not in the cache, a "cache miss" occurs, forcing the CPU to load the needed data from slower memory.
Clearly, **the fewer the cache misses, the higher the CPU's data read-write efficiency**, and the better the program's performance. The proportion of successful data retrievals from the cache is called the "cache hit rate," a metric often used to measure cache efficiency.
To achieve higher efficiency, caches adopt the following data loading mechanisms.

- **Cache lines**: Caches don't store and load data byte by byte but in units of cache lines. Compared to byte-by-byte transfer, the transmission of cache lines is more efficient.
- **Prefetch mechanism**: Processors try to predict data access patterns (such as sequential access, fixed stride jumping access, etc.) and load data into the cache according to specific patterns to improve the hit rate.
- **Spatial locality**: If data is accessed, data nearby is likely to be accessed in the near future. Therefore, when loading certain data, the cache also loads nearby data to improve the hit rate.
- **Temporal locality**: If data is accessed, it's likely to be accessed again in the near future. Caches use this principle to retain recently accessed data to improve the hit rate.
In fact, **arrays and linked lists have different cache utilization efficiencies**, mainly reflected in the following aspects.

- **Occupied space**: Linked list elements occupy more space than array elements, resulting in less effective data volume in the cache.
- **Cache lines**: Linked list data is scattered throughout memory, and since caches load "by line," the proportion of loading invalid data is higher.
- **Prefetch mechanism**: The data access pattern of arrays is more "predictable" than that of linked lists, meaning the system is more likely to guess which data will be loaded next.
- **Spatial locality**: Arrays are stored in concentrated memory spaces, so the data near the loaded data is more likely to be accessed next.
Overall, **arrays have a higher cache hit rate and are generally more efficient in operation than linked lists**. This makes data structures based on arrays more popular in solving algorithmic problems.

- Arrays and linked lists are two basic data structures, representing two storage methods in computer memory: contiguous space storage and non-contiguous space storage. Their characteristics complement each other.
- Arrays support random access and use less memory; however, they are inefficient in inserting and deleting elements and have a fixed length after initialization.
Linked lists consist of nodes connected by references (pointers), and each node can store a value of a different type, such as int, double, string, or object.
In contrast, array elements must be of the same type, allowing the calculation of offsets to access the corresponding element positions. For example, an array containing both int and long types, with single elements occupying 4 bytes and 8 bytes respectively, cannot use the following formula to calculate offsets, as the array contains elements of two different lengths.
```shell
# Element memory address = array memory address + element length * element index
```
**Q**: After deleting a node, is it necessary to set `P.next` to `None`?
In algorithms, the repeated execution of a task is quite common and is closely related to the analysis of complexity. Therefore, before delving into the concepts of time complexity and space complexity, let's first explore how to implement repetitive tasks in programming. This involves understanding two fundamental programming control structures: iteration and recursion.
"Iteration" is a control structure for repeatedly performing a task. In iteration, a program repeats a block of code as long as a certain condition is met until this condition is no longer satisfied.
"Iteration" is a control structure for repeatedly performing a task. In iteration, a program repeats a block of code as long as a certain condition is met until this condition is no longer satisfied.
### For Loops
### For loops
The `for` loop is one of the most common forms of iteration, and **it's particularly suitable when the number of iterations is known in advance**.
The `for` loop is one of the most common forms of iteration, and **it's particularly suitable when the number of iterations is known in advance**.
The following function uses a `for` loop to perform a summation of $1 + 2 + \dots + n$, with the sum stored in the variable `res`:
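A Python sketch of this function:

```python
def for_loop(n: int) -> int:
    """Sum 1 + 2 + ... + n with a for loop"""
    res = 0
    # range(1, n + 1) yields 1, 2, ..., n
    for i in range(1, n + 1):
        res += i
    return res
```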
The flowchart below represents this sum function.

![Flowchart of the sum function](iteration_and_recursion.assets/iteration.png)

The number of operations in this summation function is proportional to the size of the input data $n$, or in other words, it has a "linear relationship." This "linear relationship" is what time complexity describes. This topic will be discussed in more detail in the next section.
### While loops

Similar to `for` loops, `while` loops are another approach for implementing iteration. In a `while` loop, the program checks a condition at the beginning of each iteration; if the condition is true, the execution continues, otherwise, the loop ends.
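For instance, the same summation $1 + 2 + \dots + n$ as a `while` loop, in a Python sketch:

```python
def while_loop(n: int) -> int:
    """Sum 1 + 2 + ... + n with a while loop"""
    res = 0
    i = 1  # Initialize the condition variable
    while i <= n:
        res += i
        i += 1  # Update the condition variable
    return res
```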
`while` loops are more flexible than `for` loops: we are free to design the initialization and update steps of the condition variable. For example, in the following code, the condition variable $i$ is updated twice each round, which would be inconvenient to implement with a `for` loop:
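A Python sketch of such a loop:

```python
def while_loop_ii(n: int) -> int:
    """A while loop whose condition variable is updated twice per round"""
    res = 0
    i = 1  # Initialize the condition variable
    # Sum the terms 1, 4, 10, ...
    while i <= n:
        res += i
        # Update the condition variable twice
        i += 1
        i *= 2
    return res
```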
Overall, **`for` loops are more concise, while `while` loops are more flexible**. Both can implement iterative structures. Which one to use should be determined based on the specific requirements of the problem.

### Nested loops

We can nest one loop structure within another. Below is an example using `for` loops:
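A Python sketch:

```python
def nested_for_loop(n: int) -> str:
    """Collect every ordered pair (i, j) with 1 <= i, j <= n"""
    res = ""
    for i in range(1, n + 1):      # Outer loop
        for j in range(1, n + 1):  # Inner loop
            res += f"({i}, {j}), "
    return res
```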
The flowchart below represents this nested loop.

![Flowchart of the nested loop](iteration_and_recursion.assets/nested_iteration.png)

In such cases, the number of operations of the function is proportional to $n^2$, meaning the algorithm's runtime and the size of the input data $n$ have a "quadratic relationship."
Observe the following code, where simply calling the function `recur(n)` can compute the sum of $1 + 2 + \dots + n$:
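A Python sketch:

```python
def recur(n: int) -> int:
    """Compute 1 + 2 + ... + n recursively"""
    if n == 1:  # Termination condition (base case)
        return 1
    # Recurse on the sub-problem, then add n on the way back
    return n + recur(n - 1)
```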
The figure below shows the recursive process of this function.

![Recursive process of the sum function](iteration_and_recursion.assets/recursion_sum.png)

Although iteration and recursion can achieve the same results from a computational standpoint, **they represent two entirely different paradigms of thinking and problem-solving**.
Let's take the earlier example of the summation function, defined as $f(n) = 1 + 2 + \dots + n$:
- **Iteration**: In this approach, we simulate the summation process within a loop. Starting from $1$ and traversing to $n$, we perform the summation operation in each iteration to eventually compute $f(n)$.
- **Recursion**: Here, the problem is broken down into a sub-problem: $f(n) = n + f(n-1)$. This decomposition continues recursively until reaching the base case, $f(1) = 1$, at which point the recursion terminates.
### Call stack

Every time a recursive function calls itself, the system allocates memory for the newly initiated function to store local variables, the return address, and other relevant information. This leads to two primary outcomes.
As shown in the figure below, there are $n$ unreturned recursive functions before triggering the termination condition, indicating a **recursion depth of $n$**.

In practice, the depth of recursion allowed by programming languages is usually limited, and excessively deep recursion can lead to stack overflow errors.
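As a hedged illustration in Python (whose recursion cap is an interpreter setting, typically around 1000 frames; the exact numbers are platform defaults, not something specified by this text):

```python title=""
import sys

print(sys.getrecursionlimit())  # usually 1000 by default

def recur(n: int) -> int:
    if n == 1:
        return 1
    return n + recur(n - 1)

# recur(100000)  # would raise RecursionError: maximum recursion depth exceeded
sys.setrecursionlimit(10_000)   # the cap can be raised, at the risk of a real stack overflow
```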
### Tail recursion

Interestingly, **if a function performs its recursive call as the very last step before returning,** it can be optimized by the compiler or interpreter to be as space-efficient as iteration. This scenario is known as "tail recursion."

- **Regular recursion**: In standard recursion, when the function returns to the previous level, it continues to execute more code, requiring the system to save the context of the previous call.
- **Tail recursion**: Here, the recursive call is the final operation before the function returns. This means that upon returning to the previous level, no further actions are needed, so the system does not need to save the context of the previous level.

For example, in calculating $1 + 2 + \dots + n$, we can make the result variable `res` a parameter of the function, thereby achieving tail recursion:
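The tail-recursive code is elided in this excerpt; a minimal Python sketch (the default argument is an illustrative choice, and Python will not actually optimize it, as noted below):

```python title=""
def tail_recur(n: int, res: int = 0) -> int:
    """Tail-recursive summation: the recursive call is the last operation (sketch)"""
    # termination condition: return the accumulated result directly
    if n == 0:
        return res
    # the summation happens before the call, so nothing is left to do on return
    return tail_recur(n - 1, res + n)
```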
The execution process of tail recursion is shown in the following figure. Comparing regular recursion and tail recursion, the point of the summation operation is different.

- **Regular recursion**: The summation operation occurs during the "returning" phase, requiring another summation after each layer returns.
- **Tail recursion**: The summation operation occurs during the "calling" phase, and the "returning" phase only involves returning through each layer.

Note that many compilers or interpreters do not support tail recursion optimization. For example, Python does not support tail recursion optimization by default, so even if the function is in the form of tail recursion, it may still encounter stack overflow issues.
### Recursion tree

When dealing with algorithms related to "divide and conquer", recursion often offers a more intuitive approach and more readable code than iteration. Take the "Fibonacci sequence" as an example.
Using the recursive relation, and considering the first two numbers as termination conditions, we can write the recursive code:
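The code is elided here; a minimal Python sketch, assuming the standard relation $f(n) = f(n - 1) + f(n - 2)$ with termination conditions $f(1) = 0$ and $f(2) = 1$:

```python title=""
def fib(n: int) -> int:
    """Fibonacci: each call branches into two further calls (sketch)"""
    # termination conditions f(1) = 0, f(2) = 1
    if n == 1 or n == 2:
        return n - 1
    # recursive relation f(n) = f(n-1) + f(n-2)
    return fib(n - 1) + fib(n - 2)
```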
Observing the above code, we see that it recursively calls two functions within itself, **meaning that one call generates two branching calls**. As illustrated below, this continuous recursive calling eventually creates a "recursion tree" with a depth of $n$.

Fundamentally, recursion embodies the paradigm of "breaking down a problem into smaller sub-problems." This divide-and-conquer strategy is crucial.
Summarizing the above content, the following table shows the differences between iteration and recursion in terms of implementation, performance, and applicability.

<p align="center"> Table: Comparison of iteration and recursion characteristics </p>
In algorithm design, we pursue the following two objectives in sequence.
In other words, under the premise of being able to solve the problem, algorithm efficiency has become the main criterion for evaluating the merits of an algorithm, which includes the following two dimensions.

- **Time efficiency**: The speed at which an algorithm runs.
- **Space efficiency**: The size of the memory space occupied by an algorithm.

In short, **our goal is to design data structures and algorithms that are both fast and memory-efficient**. Effectively assessing algorithm efficiency is crucial because only then can we compare various algorithms and guide the process of algorithm design and optimization.
There are mainly two methods of efficiency assessment: actual testing and theoretical estimation.

## Actual testing

Suppose we have algorithms `A` and `B`, both capable of solving the same problem, and we need to compare their efficiencies. The most direct method is to use a computer to run these two algorithms and monitor and record their runtime and memory usage. This assessment method reflects the actual situation but has significant limitations.
On one hand, **it's difficult to eliminate interference from the testing environment**, as hardware configurations can affect an algorithm's performance.
On the other hand, **conducting a full test is very resource-intensive**. As the volume of input data changes, the efficiency of the algorithms may vary. For example, with smaller data volumes, algorithm `A` might run faster than `B`, but the opposite might be true with larger data volumes. Therefore, to draw convincing conclusions, we need to test a wide range of input data sizes, which requires significant computational resources.

## Theoretical estimation

Due to the significant limitations of actual testing, we can consider evaluating algorithm efficiency solely through calculations. This estimation method is known as "asymptotic complexity analysis," or simply "complexity analysis."
"Space complexity" is used to measure the growth trend of the memory space occupied by an algorithm as the amount of data increases. This concept is very similar to time complexity, except that "running time" is replaced with "occupied memory space".
"Space complexity" is used to measure the growth trend of the memory space occupied by an algorithm as the amount of data increases. This concept is very similar to time complexity, except that "running time" is replaced with "occupied memory space".
## Space Related to Algorithms
## Space related to algorithms
The memory space used by an algorithm during its execution mainly includes the following types.
The memory space used by an algorithm during its execution mainly includes the following types.
- **Input space**: Used to store the input data of the algorithm.
- **Temporary space**: Used to store variables, objects, function contexts, and other data during the algorithm's execution.
- **Output space**: Used to store the output data of the algorithm.

Generally, the scope of space complexity statistics includes both the "temporary space" and the "output space".
Temporary space can be further divided into three parts.

- **Temporary data**: Used to save various constants, variables, objects, etc., during the algorithm's execution.
- **Stack frame space**: Used to save the context data of the called function. The system creates a stack frame at the top of the stack each time a function is called, and the stack frame space is released after the function returns.
- **Instruction space**: Used to store compiled program instructions, which is usually negligible in actual statistics.

When analyzing the space complexity of a program, **we typically count the temporary data, stack frame space, and output data**, as shown in the figure below.
![Space types used in algorithms](space_complexity.assets/space_types.png)

The relevant code is as follows:
```python title=""
class Node:
    """Classes"""
    def __init__(self, x: int):
        self.val: int = x              # node value
        self.next: Node | None = None  # reference to the next node

def function() -> int:
    """Functions"""
    # Perform certain operations...
    return 0
```
```rust title=""
use std::cell::RefCell;
use std::rc::Rc;

/* Structures */
struct Node {
    val: i32,
    next: Option<Rc<RefCell<Node>>>,
}

/* Constructor */
impl Node {
    fn new(val: i32) -> Self {
        Self { val: val, next: None }
    }
}
```
## Calculation method

The method for calculating space complexity is roughly similar to that of time complexity, with the only change being the shift of the statistical object from "number of operations" to "size of used space".
Consider the following code; the term "worst-case" in worst-case space complexity has two meanings:
```rust title=""
fn algorithm(n: i32) {
    let a = 0;          // O(1)
    let b = [0; 10000]; // O(1)
    if n > 10 {
        let nums = vec![0; n as usize]; // O(n)
    }
}
```
```python title=""
def function() -> int:
    """Functions"""
    # Perform certain operations...
    return 0

def loop(n: int):
    """Loop O(1)"""
    for _ in range(n):
        function()

def recur(n: int):
    """Recursion O(n)"""
    if n == 1:
        return
    return recur(n - 1)
```
The time complexity of both the `loop()` and `recur()` functions is $O(n)$, but their space complexities differ:
- The `loop()` function calls `function()` $n$ times in a loop, where each iteration's `function()` returns and releases its stack frame space, so the space complexity remains $O(1)$.
- The recursive function `recur()` will have $n$ instances of unreturned `recur()` existing simultaneously during its execution, thus occupying $O(n)$ stack frame space.
## Common types

Let the size of the input data be $n$; the following chart displays common types of space complexities (arranged from low to high).
![Recursive function generating quadratic order space complexity](space_complexity.assets/space_complexity_recursive_quadratic.png)

### Exponential order $O(2^n)$

Exponential order is common in binary trees. Observe the image below: a "full binary tree" with $n$ levels has $2^n - 1$ nodes, occupying $O(2^n)$ space:
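The tree-building code is elided after this point; a minimal Python sketch, with a `TreeNode` class assumed here for illustration:

```python title=""
class TreeNode:
    """Minimal binary tree node (assumed for this sketch)"""
    def __init__(self, val: int = 0):
        self.val = val
        self.left: TreeNode | None = None
        self.right: TreeNode | None = None

def build_tree(n: int) -> TreeNode | None:
    """Build a full binary tree of n levels, occupying O(2^n) space (sketch)"""
    if n == 0:
        return None
    root = TreeNode(0)
    root.left = build_tree(n - 1)
    root.right = build_tree(n - 1)
    return root
```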
![Full binary tree generating exponential order space complexity](space_complexity.assets/space_complexity_exponential.png)

### Logarithmic order $O(\log n)$

Logarithmic order is common in divide-and-conquer algorithms. For example, in merge sort, an array of length $n$ is recursively divided in half each round, forming a recursion tree of height $\log n$, using $O(\log n)$ stack frame space.

Another example is converting a number to a string. Given a positive integer $n$, its number of digits is $\lfloor \log_{10} n \rfloor + 1$, corresponding to the length of the string, thus the space complexity is $O(\log_{10} n + 1) = O(\log n)$.
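A quick Python check of the digit-count formula (a worked example, not from the original):

```python title=""
import math

n = 123456
digits = math.floor(math.log10(n)) + 1  # floor(5.09...) + 1 = 6
assert digits == len(str(n))            # "123456" has 6 characters
```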
## Balancing time and space

Ideally, we aim for both time complexity and space complexity to be optimal. However, in practice, optimizing both simultaneously is often difficult.

Time complexity is a concept used to measure how the run time of an algorithm increases with the size of the input data. Understanding time complexity is crucial for accurately assessing the efficiency of an algorithm.
However, in practice, **counting the run time of an algorithm is neither practical nor reasonable**. First, we don't want to tie the estimated time to the running platform, as algorithms need to run on various platforms. Second, it's challenging to know the run time for each type of operation, making the estimation process difficult.

## Assessing time growth trend

Time complexity analysis does not count the algorithm's run time, **but rather the growth trend of the run time as the data volume increases**.
The following figure shows the time complexities of these three algorithms.
- Algorithm `B` involves a print operation looping $n$ times, and its run time grows linearly with $n$. Its time complexity is "linear order."
- Algorithm `C` has a print operation looping 1,000,000 times. Although it takes a long time, it is independent of the input data size $n$. Therefore, the time complexity of `C` is the same as `A`, which is "constant order."
![Time growth trend of algorithms A, B, and C](time_complexity.assets/time_complexity_simple_example.png)
Compared to directly counting the run time of an algorithm, what are the characteristics of time complexity analysis?
- **Time complexity analysis is more straightforward**. Obviously, the running platform and the types of computational operations are irrelevant to the trend of run time growth. Therefore, in time complexity analysis, we can simply treat the execution time of all computational operations as the same "unit time," simplifying the "computational operation run time count" to a "computational operation count." This significantly reduces the complexity of estimation.
- **Time complexity has its limitations**. For example, although algorithms `A` and `C` have the same time complexity, their actual run times can be quite different. Similarly, even though algorithm `B` has a higher time complexity than `C`, it is clearly superior when the input data size $n$ is small. In these cases, it's difficult to judge the efficiency of algorithms based solely on time complexity. Nonetheless, despite these issues, complexity analysis remains the most effective and commonly used method for evaluating algorithm efficiency.
## Asymptotic upper bound

Consider a function with an input size of $n$:
In essence, time complexity analysis is about finding the asymptotic upper bound of the operation count $T(n)$.
As illustrated below, calculating the asymptotic upper bound involves finding a function $f(n)$ such that, as $n$ approaches infinity, $T(n)$ and $f(n)$ have the same growth order, differing only by a constant factor $c$.

![Asymptotic upper bound of a function](time_complexity.assets/asymptotic_upper_bound.png)
## Calculation method

While the concept of asymptotic upper bound might seem mathematically dense, you don't need to fully grasp it right away. Let's first understand the method of calculation, which can be practiced and comprehended over time.

Once $f(n)$ is determined, we obtain the time complexity $O(f(n))$. But how do we determine the asymptotic upper bound $f(n)$? This process generally involves two steps: counting the number of operations and determining the asymptotic upper bound.

### Step 1: counting the number of operations

This step involves going through the code line by line. However, due to the presence of the constant $c$ in $c \cdot f(n)$, **all coefficients and constant terms in $T(n)$ can be ignored**. This principle allows for simplification techniques in counting operations.
### Step 2: determining the asymptotic upper bound

**The time complexity is determined by the highest order term in $T(n)$**. This is because, as $n$ approaches infinity, the highest order term dominates, rendering the influence of other terms negligible.
The following table illustrates examples of different operation counts and their corresponding time complexities. Some exaggerated values are used to emphasize that coefficients cannot alter the order of growth. When $n$ becomes very large, these constants become insignificant.

<p align="center"> Table: Time complexity for different operation counts </p>

| Operation Count $T(n)$ | Time Complexity $O(f(n))$ |
| ---------------------- | ------------------------- |
![Common types of time complexity](time_complexity.assets/time_complexity_common_types.png)

### Constant order $O(1)$

Constant order means the number of operations is independent of the input data size $n$. In the following function, although the number of operations `size` might be large, the time complexity remains $O(1)$ as it's unrelated to $n$:
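The function is elided here; a minimal Python sketch consistent with the description:

```python title=""
def constant(n: int) -> int:
    """Constant order O(1): the work is independent of n (sketch)"""
    count = 0
    size = 100000  # large, but fixed
    for _ in range(size):
        count += 1
    return count
```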
### Linear order $O(n)$

Linear order indicates the number of operations grows linearly with the input data size $n$. Linear order commonly appears in single-loop structures:
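A minimal Python sketch of such a single loop (illustrative, not from the original):

```python title=""
def linear(n: int) -> int:
    """Linear order O(n): a single loop over n (sketch)"""
    count = 0
    for _ in range(n):
        count += 1
    return count
```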
Operations like array traversal and linked list traversal have a time complexity of $O(n)$, where $n$ is the length of the array or linked list.
It's important to note that **the input data size $n$ should be determined based on the type of input data**. For example, in the first example, $n$ represents the input data size, while in the second example, the length of the array $n$ is the data size.

### Quadratic order $O(n^2)$

Quadratic order means the number of operations grows quadratically with the input data size $n$. Quadratic order typically appears in nested loops, where both the outer and inner loops have a time complexity of $O(n)$, resulting in an overall complexity of $O(n^2)$:
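A minimal Python sketch of the nested loops (illustrative):

```python title=""
def quadratic(n: int) -> int:
    """Quadratic order O(n^2): two nested loops over n (sketch)"""
    count = 0
    for _ in range(n):      # outer loop: O(n)
        for _ in range(n):  # inner loop: O(n)
            count += 1
    return count
```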
The following image compares constant order, linear order, and quadratic order time complexities.

![Constant, linear, and quadratic order time complexities](time_complexity.assets/time_complexity_constant_linear_quadratic.png)

For instance, in bubble sort, the outer loop runs $n - 1$ times, and the inner loop runs $n-1$, $n-2$, ..., $2$, $1$ times, averaging $n / 2$ times, resulting in a time complexity of $O((n - 1) n / 2) = O(n^2)$:
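The sorting code is elided; a standard bubble sort sketch in Python consistent with that count (counting a swap as three unit operations is an illustrative convention):

```python title=""
def bubble_sort(nums: list[int]) -> int:
    """Bubble sort: O(n^2) time (sketch)"""
    count = 0
    # outer loop: the unsorted range is [0, i]
    for i in range(len(nums) - 1, 0, -1):
        # inner loop: bubble the largest element of [0, i] to position i
        for j in range(i):
            if nums[j] > nums[j + 1]:
                nums[j], nums[j + 1] = nums[j + 1], nums[j]
                count += 3  # a swap counted as 3 unit operations
    return count
```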
### Exponential order $O(2^n)$

Biological "cell division" is a classic example of exponential order growth: starting with one cell, it becomes two after one division, four after two divisions, and so on, resulting in $2^n$ cells after $n$ divisions.
The following image and code simulate the cell division process, with a time complexity of $O(2^n)$:
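The simulation code is elided; a minimal Python sketch:

```python title=""
def exponential(n: int) -> int:
    """Exponential order O(2^n): simulated cell division (sketch)"""
    count, base = 0, 1
    # each round, every existing cell divides in two: 1 -> 2 -> 4 -> 8 -> ...
    for _ in range(n):
        for _ in range(base):
            count += 1
        base *= 2
    # count = 1 + 2 + 4 + ... + 2^(n-1) = 2^n - 1
    return count
```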
![Exponential order time complexity](time_complexity.assets/time_complexity_exponential.png)

In practice, exponential order often appears in recursive functions. For example, in the code below, it recursively splits into two halves, stopping after $n$ divisions:
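The recursive code is elided; a minimal Python sketch in which each call branches twice:

```python title=""
def exp_recur(n: int) -> int:
    """Exponential order via recursion (sketch)"""
    if n == 1:
        return 1
    return exp_recur(n - 1) + exp_recur(n - 1) + 1
```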
Exponential order growth is extremely rapid and is commonly seen in exhaustive search methods (brute force, backtracking, etc.). For large-scale problems, exponential order is unacceptable, often requiring dynamic programming or greedy algorithms as solutions.

### Logarithmic order $O(\log n)$

In contrast to exponential order, logarithmic order reflects situations where "the size is halved each round." Given an input data size $n$, since the size is halved each round, the number of iterations is $\log_2 n$, the inverse function of $2^n$.
The following image and code simulate the "halving each round" process, with a time complexity of $O(\log_2 n) = O(\log n)$:
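The simulation code is elided; a minimal Python sketch of "halving each round":

```python title=""
def logarithmic(n: int) -> int:
    """Logarithmic order O(log n): halve the size each round (sketch)"""
    count = 0
    while n > 1:
        n //= 2
        count += 1
    return count
```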
![Logarithmic order time complexity](time_complexity.assets/time_complexity_logarithmic.png)

Like exponential order, logarithmic order also frequently appears in recursive functions. The code below forms a recursive tree of height $\log_2 n$:
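The code is elided; a minimal Python sketch, one branch on half the size per call:

```python title=""
def log_recur(n: int) -> int:
    """Logarithmic order via recursion (sketch)"""
    if n <= 1:
        return 0
    return log_recur(n // 2) + 1
```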
Logarithmic order is typical in algorithms based on the divide-and-conquer strategy.
This means the base $m$ can be changed without affecting the complexity. Therefore, we often omit the base $m$ and simply denote logarithmic order as $O(\log n)$.

### Linear-logarithmic order $O(n \log n)$

Linear-logarithmic order often appears in nested loops, with the complexities of the two loops being $O(\log n)$ and $O(n)$ respectively. The related code is as follows:
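The related code is elided in this excerpt; a minimal Python sketch with an $O(n)$ loop nested inside an $O(\log n)$ loop (illustrative):

```python title=""
def linear_log(n: int) -> int:
    """Linear-logarithmic order O(n log n) (sketch)"""
    count = 0
    m = n
    while m > 1:            # outer loop: about log2(n) rounds
        for _ in range(n):  # inner loop: n operations per round
            count += 1
        m //= 2
    return count
```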
The image below demonstrates how linear-logarithmic order is generated. Each level of a binary tree has $n$ operations, and the tree has $\log_2 n + 1$ levels, resulting in a time complexity of $O(n \log n)$.

![Linear-logarithmic order time complexity](time_complexity.assets/time_complexity_logarithmic_linear.png)

Mainstream sorting algorithms typically have a time complexity of $O(n \log n)$, such as quicksort, mergesort, and heapsort.
### Factorial order $O(n!)$

Factorial order corresponds to the mathematical problem of "full permutation." Given $n$ distinct elements, the total number of possible permutations is:

$$
n! = n \times (n - 1) \times (n - 2) \times \dots \times 2 \times 1
$$
Factorials are typically implemented using recursion. As shown in the image and code below, the first level splits into $n$ branches, the second level into $n - 1$ branches, and so on, stopping after the $n$th level:
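The code is elided; a minimal Python sketch in which each call fans out into $n$ branches:

```python title=""
def factorial_recur(n: int) -> int:
    """Factorial order O(n!) (sketch)"""
    if n == 0:
        return 1
    count = 0
    # one call at size n spawns n calls at size n - 1
    for _ in range(n):
        count += factorial_recur(n - 1)
    return count  # equals n!
```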
![Factorial order time complexity](time_complexity.assets/time_complexity_factorial.png)

Note that factorial order grows even faster than exponential order; it's unacceptable for larger $n$ values.
## Worst, best, and average time complexities

**The time efficiency of an algorithm is often not fixed but depends on the distribution of the input data**. Assume we have an array `nums` of length $n$, consisting of numbers from $1$ to $n$, each appearing only once, but in a randomly shuffled order. The task is to return the index of the element $1$. We can draw the following conclusions:
When discussing data in computers, various forms like text, images, videos, voice, and 3D models come to mind. Despite their different organizational forms, they are all composed of various basic data types.
The range of values for basic data types depends on the size of the space they occupy.
The following table lists the space occupied, value range, and default values of various basic data types in Java. While memorizing this table isn't necessary, having a general understanding of it and referencing it when required is recommended.

<p align="center"> Table <id> Space occupied and value range of basic data types </p>

| Type | Symbol | Space Occupied | Minimum Value | Maximum Value | Default Value |
| ---- | ------ | -------------- | ------------- | ------------- | ------------- |
In the computer system, all data is stored in binary form, and characters (represented by `char`) are no exception. To represent characters, we need to develop a "character set" that defines a one-to-one mapping between each character and binary numbers. With the character set, computers can convert binary numbers to characters by looking up the table.

## ASCII character set

The "ASCII code" is one of the earliest character sets, officially known as the American Standard Code for Information Interchange. It uses 7 binary digits (the lower 7 bits of a byte) to represent a character, allowing for a maximum of 128 different characters. As shown in the figure below, ASCII includes uppercase and lowercase English letters, numbers 0 ~ 9, various punctuation marks, and certain control characters (such as newline and tab).
However, **ASCII can only represent English characters**. With the globalization of computers, a character set called "EASCII" was developed to represent more languages. It expands from the 7-bit structure of ASCII to 8 bits, enabling the representation of 256 characters.

Globally, various region-specific EASCII character sets have been introduced. The first 128 characters of these sets are consistent with ASCII, while the remaining 128 characters are defined differently to accommodate the requirements of different languages.

## GBK character set

Later, it was found that **EASCII still could not meet the character requirements of many languages**. For instance, there are nearly a hundred thousand Chinese characters, with several thousand used regularly. In 1980, the Standardization Administration of China released the "GB2312" character set, which included 6763 Chinese characters, essentially fulfilling the computer processing needs for the Chinese language.

However, GB2312 could not handle some rare and traditional characters. The "GBK" character set expands GB2312 and includes 21886 Chinese characters. In the GBK encoding scheme, ASCII characters are represented with one byte, while Chinese characters use two bytes.
## Unicode character set

With the rapid evolution of computer technology and a plethora of character sets and encoding standards, numerous problems arose. On the one hand, these character sets generally only defined characters for specific languages and could not function properly in multilingual environments. On the other hand, the existence of multiple character set standards for the same language caused garbled text when information was exchanged between computers using different encoding standards.
Unicode is a universal character set that assigns a number (called a "code point") to each character.
A straightforward solution to this problem is to store all characters as equal-length encodings. As shown in the figure below, each character in "Hello" occupies 1 byte, while each character in "算法" (algorithm) occupies 2 bytes. We could encode all characters in "Hello 算法" as 2 bytes by padding the higher bits with zeros. This method would enable the system to interpret a character every 2 bytes, recovering the content of the phrase.

However, as ASCII has shown us, encoding English only requires 1 byte. Using the above approach would double the space occupied by English text compared to ASCII encoding, which is a waste of memory space. Therefore, a more efficient Unicode encoding method is needed.

## UTF-8 encoding

Currently, UTF-8 has become the most widely used Unicode encoding method internationally. **It is a variable-length encoding**, using 1 to 4 bytes to represent a character, depending on the complexity of the character. ASCII characters need only 1 byte, Latin and Greek letters require 2 bytes, commonly used Chinese characters need 3 bytes, and some other rare characters need 4 bytes.
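A quick Python check of these byte lengths (a worked example, not from the original):

```python title=""
for ch in ["A", "é", "算"]:
    encoded = ch.encode("utf-8")
    print(ch, len(encoded), encoded.hex())
# A 1 41      -> ASCII character: 1 byte
# é 2 c3a9    -> Latin letter: 2 bytes
# 算 3 e7ae97 -> common Chinese character: 3 bytes
```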
But why set the highest 2 bits of the remaining bytes to $10$? Actually, this $10$ can serve as a kind of checksum.
The reason for using $10$ as a checksum is that, under UTF-8 encoding rules, it's impossible for the highest two bits of a character to be $10$. This can be proven by contradiction: if the highest two bits of a character are $10$, it indicates that the character's length is $1$, corresponding to ASCII. However, the highest bit of an ASCII character should be $0$, which contradicts the assumption.

Apart from UTF-8, other common encoding methods include:
- **UTF-16 encoding**: Uses 2 or 4 bytes to represent a character. All ASCII characters and commonly used non-English characters are represented with 2 bytes; a few characters require 4 bytes. For 2-byte characters, the UTF-16 encoding equals the Unicode code point.
- **UTF-32 encoding**: Every character uses 4 bytes. This means UTF-32 occupies more space than UTF-8 and UTF-16, especially for texts with a high proportion of ASCII characters.

From the perspective of storage space, using UTF-8 to represent English characters is very efficient because it only requires 1 byte; using UTF-16 to encode some non-English characters (such as Chinese) can be more efficient because it only requires 2 bytes, while UTF-8 might need 3 bytes.

From a compatibility perspective, UTF-8 is the most versatile, with many tools and libraries supporting UTF-8 as a priority.
## Character encoding in programming languages

Historically, many programming languages utilized fixed-length encodings such as UTF-16 or UTF-32 for processing strings during program execution. This allows strings to be handled as arrays, offering several advantages:

- **Random access**: Strings encoded in UTF-16 can be accessed randomly with ease. For UTF-8, which is a variable-length encoding, locating the $i^{th}$ character requires traversing the string from the start to the $i^{th}$ position, taking $O(n)$ time.
- **Character counting**: Similar to random access, counting the number of characters in a UTF-16 encoded string is an $O(1)$ operation. However, counting characters in a UTF-8 encoded string requires traversing the entire string.
- **String operations**: Many string operations like splitting, concatenating, inserting, and deleting are easier on UTF-16 encoded strings. These operations generally require additional computation on UTF-8 encoded strings to ensure the validity of the UTF-8 encoding.
The design of character encoding schemes in programming languages is an interesting topic involving various factors:

Common data structures include arrays, linked lists, stacks, queues, hash tables, trees, heaps, and graphs. They can be classified into "logical structure" and "physical structure".
## Logical structure: linear and non-linear

**The logical structures reveal the logical relationships between data elements**. In arrays and linked lists, data are arranged in a specific sequence, demonstrating the linear relationship between data; while in trees, data are arranged hierarchically from the top down, showing the derived relationship between "ancestors" and "descendants"; and graphs are composed of nodes and edges, reflecting the intricate network relationship.

As shown in the figure below, logical structures can be divided into two major categories: "linear" and "non-linear". Linear structures are more intuitive, indicating data is arranged linearly in logical relationships; non-linear structures, conversely, are arranged non-linearly.
- **Linear data structures**: Arrays, Linked Lists, Stacks, Queues, Hash Tables.
- **Non-linear data structures**: Trees, Heaps, Graphs, Hash Tables.

![Linear and non-linear data structures](classification_of_data_structure.assets/classification_logic_structure.png)
Non-linear data structures can be further divided into tree structures and network structures.

- **Linear structures**: Arrays, linked lists, queues, stacks, and hash tables, where elements have a one-to-one sequential relationship.
- **Tree structures**: Trees, Heaps, Hash Tables, where elements have a one-to-many relationship.
- **Network structures**: Graphs, where elements have a many-to-many relationship.
## Physical structure: contiguous and dispersed

**During the execution of an algorithm, the data being processed is stored in memory**. The figure below shows a computer memory stick where each black square is a physical memory space. We can think of memory as a vast Excel spreadsheet, with each cell capable of storing a certain amount of data.

**The system accesses the data at the target location by means of a memory address**. As shown in the figure below, the computer assigns a unique identifier to each cell in the table according to specific rules, ensuring that each memory space has a unique memory address. With these addresses, the program can access the data stored in memory.
Memory is a shared resource for all programs. When a block of memory is occupied by one program, it cannot be simultaneously used by other programs.
As illustrated in the figure below, **the physical structure reflects the way data is stored in computer memory** and it can be divided into contiguous space storage (arrays) and non-contiguous space storage (linked lists). The two types of physical structures exhibit complementary characteristics in terms of time efficiency and space efficiency.

![Contiguous space storage and dispersed space storage](classification_of_data_structure.assets/classification_phisical_structure.png)

**It is worth noting that all data structures are implemented based on arrays, linked lists, or a combination of both**. For example, stacks and queues can be implemented using either arrays or linked lists; while implementations of hash tables may involve both arrays and linked lists.
In this book, chapters marked with an asterisk '*' are optional readings. If you are short on time or find them challenging, you may skip these initially and return to them after completing the essential chapters.
In this book, chapters marked with an asterisk '*' are optional readings. If you are short on time or find them challenging, you may skip these initially and return to them after completing the essential chapters.
## Integer encoding
In the table from the previous section, we observed that all integer types can represent one more negative number than positive numbers, such as the `byte` range of $[-128, 127]$. This phenomenon seems counterintuitive, and its underlying reason involves knowledge of sign-magnitude, one's complement, and two's complement encoding.
The following diagram illustrates the conversions among sign-magnitude, one's complement, and two's complement:
![Conversions between sign-magnitude, one's complement, and two's complement](number_encoding.assets/1s_2s_complement.png)
Although sign-magnitude is the most intuitive, it has limitations. For one, **negative numbers in sign-magnitude cannot be directly used in calculations**. For example, in sign-magnitude, calculating $1 + (-2)$ results in $-3$, which is incorrect.
The design of two's complement is quite ingenious, and due to space constraints, we'll stop here. Interested readers are encouraged to explore further.
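To make the encoding concrete, here is a small Python sketch (an illustration added for this write-up, not one of the book's listings) that converts between signed integers and 8-bit two's complement bit strings:

```python
def to_twos_complement(value: int, bits: int = 8) -> str:
    """Render a signed integer as a two's complement bit string."""
    return format(value & (2**bits - 1), f"0{bits}b")

def from_twos_complement(bit_string: str) -> int:
    """Decode a two's complement bit string back into a signed integer."""
    bits = len(bit_string)
    raw = int(bit_string, 2)
    return raw - 2**bits if raw >= 2 ** (bits - 1) else raw

print(to_twos_complement(-2))            # 11111110
print(from_twos_complement("11111110"))  # -2
# Plain binary addition now works for negative operands:
# 1 + (-2) -> 00000001 + 11111110 = 11111111, which decodes to -1
print(from_twos_complement(to_twos_complement(1 + (-2))))  # -1
```

The extra negative value also falls out of this encoding: in 8 bits, `10000000` decodes to $-128$, which has no positive counterpart, giving `byte` its asymmetric range of $[-128, 127]$.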
## Floating-point number encoding
You might have noticed something intriguing: despite having the same length of 4 bytes, why does a `float` have a much larger range of values compared to an `int`? This seems counterintuitive, as one would expect the range to shrink for `float` since it needs to represent fractions.
$$
\begin{aligned}
\text{val} = (-1)^{\mathrm{S}} \times 2^{\mathrm{E} - 127} \times (1 + \mathrm{N})
\end{aligned}
$$
![Example calculation of a float in IEEE 754 standard](number_encoding.assets/ieee_754_float.png)
Observing the diagram, given an example data $\mathrm{S} = 0$, $\mathrm{E} = 124$, $\mathrm{N} = 2^{-2} + 2^{-3} = 0.375$, we have:
$$
\text{val} = (-1)^0 \times 2^{124 - 127} \times (1 + 0.375) = 0.171875
$$
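You can verify this decomposition with a short Python sketch (added here for illustration) that unpacks the bit fields of a 32-bit float via the standard `struct` module and re-applies the formula:

```python
import struct

def float_fields(x: float) -> tuple[int, int, float]:
    """Return the sign bit S, exponent bits E, and fraction N of a 32-bit float."""
    (b,) = struct.unpack(">I", struct.pack(">f", x))
    s = b >> 31                  # bit 31: sign
    e = (b >> 23) & 0xFF         # bits 30-23: exponent
    n = (b & 0x7FFFFF) / 2**23   # bits 22-0: fraction
    return s, e, n

s, e, n = float_fields(0.171875)
print(s, e, n)  # 0 124 0.375
print((-1) ** s * 2 ** (e - 127) * (1 + n))  # 0.171875
```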
Now we can answer the initial question: **the representation of `float` includes exponent bits, which give it a much larger range of values than an `int` of the same length**, at the cost of precision.
As shown in the table below, exponent bits $E = 0$ and $E = 255$ have special meanings, **used to represent zero, infinity, $\mathrm{NaN}$, etc.**
<palign="center"> Table <id> Meaning of Exponent Bits </p>
<palign="center"> Table <id> Meaning of exponent bits </p>
| Exponent Bit E | Fraction Bit $\mathrm{N} = 0$ | Fraction Bit $\mathrm{N} \ne 0$ | Calculation Formula |
- Data structures can be categorized from two perspectives: logical structure and physical structure. Logical structure describes the logical relationships between data elements, while physical structure describes how data is stored in computer memory.
- Common logical structures include linear, tree-like, and network structures. We generally classify data structures into linear (arrays, linked lists, stacks, queues) and non-linear (trees, graphs, heaps) based on their logical structure. The implementation of hash tables may involve both linear and non-linear data structures.
The previous two sections introduced the working principle of hash tables and the methods to handle hash collisions. However, both open addressing and chaining can **only ensure that the hash table functions normally when collisions occur, but cannot reduce the frequency of hash collisions**.
If hash collisions occur too frequently, the performance of the hash table will deteriorate drastically. As shown in the figure below, for a chaining hash table, in the ideal case, the key-value pairs are evenly distributed across the buckets, achieving optimal query efficiency; in the worst case, all key-value pairs are stored in the same bucket, degrading the time complexity to $O(n)$.
![Ideal and worst cases of hash collisions](hash_algorithm.assets/hash_collision_best_worst_condition.png)
**The distribution of key-value pairs is determined by the hash function**. Recall the steps for computing a bucket index: first compute the hash value, then take it modulo the array length:
`index = hash(key) % capacity`
Observing the above formula, when the hash table capacity `capacity` is fixed, **the hash algorithm `hash()` determines the output value**, thereby determining the distribution of key-value pairs in the hash table.
This means that, to reduce the probability of hash collisions, we should focus on the design of the hash algorithm `hash()`.
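In Python, the two steps might look as follows (a sketch: `hash_func` is a stand-in for whichever hash algorithm is chosen, and `capacity = 100` is an arbitrary example value):

```python
def hash_func(key: str) -> int:
    """Stand-in hash algorithm: sum of the character codes."""
    return sum(ord(c) for c in key)

capacity = 100                        # example table capacity
index = hash_func("algo") % capacity  # hash value first, then modulo
```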
## Goals of hash algorithms
To achieve a "fast and stable" hash table data structure, hash algorithms should have the following characteristics:
- **Determinism**: For the same input, the hash algorithm should always produce the same output. Only then can the hash table be reliable.
- **High efficiency**: The process of computing the hash value should be fast enough. The smaller the computational overhead, the more practical the hash table.
- **Uniform distribution**: The hash algorithm should ensure that key-value pairs are evenly distributed in the hash table. The more uniform the distribution, the lower the probability of hash collisions.
In fact, hash algorithms are not only used to implement hash tables but are also widely applied in other fields.
- **Password storage**: To protect the security of user passwords, systems usually do not store the plaintext passwords but rather the hash values of the passwords. When a user enters a password, the system calculates the hash value of the input and compares it with the stored hash value. If they match, the password is considered correct.
- **Data integrity check**: The data sender can calculate the hash value of the data and send it along; the receiver can recalculate the hash value of the received data and compare it with the received hash value. If they match, the data is considered intact.
For cryptographic applications, to prevent reverse engineering such as deducing the original password from the hash value, hash algorithms need higher-level security features.
- **Unidirectionality**: It should be impossible to deduce any information about the input data from the hash value.
- **Collision resistance**: It should be extremely difficult to find two different inputs that produce the same hash value.
- **Avalanche effect**: Minor changes in the input should lead to significant and unpredictable changes in the output.
Note that **"Uniform Distribution" and "Collision Resistance" are two separate concepts**. Satisfying uniform distribution does not necessarily mean collision resistance. For example, under random input `key`, the hash function `key % 100` can produce a uniformly distributed output. However, this hash algorithm is too simple, and all `key` with the same last two digits will have the same output, making it easy to deduce a usable `key` from the hash value, thereby cracking the password.
## Design of hash algorithms
The design of hash algorithms is a complex issue that requires consideration of many factors. However, for some less demanding scenarios, we can also design some simple hash algorithms.
- **Additive hash**: Add up the ASCII codes of each character in the input and use the total sum as the hash value.
- **Multiplicative hash**: Utilize the non-correlation of multiplication, multiplying each round by a constant, accumulating the ASCII codes of each character into the hash value.
- **XOR hash**: Accumulate the hash value by XORing each element of the input data.
- **Rotating hash**: Accumulate the ASCII code of each character into a hash value, performing a rotation operation on the hash value before each accumulation.
```src
[file]{simple_hash}-[class]{}-[func]{rot_hash}
```
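For reference, here is one possible Python rendering of the four simple hash functions above (a sketch: the modulus 1,000,000,007 and the multiplier 31 are common choices, not values mandated by the text):

```python
MODULUS = 1_000_000_007  # a large prime; assumed, not fixed by the text

def add_hash(key: str) -> int:
    """Additive hash: sum the character codes."""
    h = 0
    for c in key:
        h = (h + ord(c)) % MODULUS
    return h

def mul_hash(key: str) -> int:
    """Multiplicative hash: multiply by a constant each round before adding."""
    h = 0
    for c in key:
        h = (31 * h + ord(c)) % MODULUS
    return h

def xor_hash(key: str) -> int:
    """XOR hash: fold each character code in with XOR."""
    h = 0
    for c in key:
        h ^= ord(c)
    return h % MODULUS

def rot_hash(key: str) -> int:
    """Rotating hash: rotate the accumulator before folding in each character."""
    h = 0
    for c in key:
        h = ((h << 4) ^ (h >> 28) ^ ord(c)) % MODULUS
    return h
```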
In summary, we usually choose a prime number as the modulus, and this prime number should be large enough to eliminate periodic patterns as much as possible, enhancing the robustness of the hash algorithm.
## Common hash algorithms
It is not hard to see that the simple hash algorithms mentioned above are quite "fragile" and far from reaching the design goals of hash algorithms. For example, since addition and XOR obey the commutative law, additive hash and XOR hash cannot distinguish strings with the same content but in different order, which may exacerbate hash collisions and cause security issues.
Over the past century, hash algorithms have been in a continuous process of upgrading and optimization.
- SHA-2 series, especially SHA-256, is one of the most secure hash algorithms to date, with no successful attacks reported, hence commonly used in various security applications and protocols.
- SHA-3 has lower implementation costs and higher computational efficiency compared to SHA-2, but its current usage coverage is not as extensive as the SHA-2 series.
<palign="center"> Table <id> Common Hash Algorithms </p>
<palign="center"> Table <id> Common hash algorithms </p>
| | MD5 | SHA-1 | SHA-2 | SHA-3 |
| --- | --- | --- | --- | --- |
| Security Level | Low, has been successfully attacked | Low, has been successfully attacked | High | High |
| Applications | Abandoned, still used for data integrity checks | Abandoned | Cryptocurrency transaction verification, digital signatures, etc. | Can be used to replace SHA-2 |
# Hash values in data structures
We know that the keys in a hash table can be of various data types such as integers, decimals, or strings. Programming languages usually provide built-in hash algorithms for these data types to calculate the bucket indices in the hash table. Taking Python as an example, we can use the `hash()` function to compute the hash values for various data types.
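For instance (results for strings vary between runs, because Python salts string hashes per process):

```python
num_hash = hash(12836)           # small integers hash to themselves in CPython
str_hash = hash("Hello algorithm")
tup_hash = hash((12836, "Ha"))   # immutable containers are hashable

# Mutable objects are not hashable:
# hash([12836, "Ha"])  # raises TypeError: unhashable type: 'list'
```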
As mentioned in the previous section, **usually the input space of a hash function is much larger than its output space**, making hash collisions theoretically inevitable. For example, if the input space consists of all integers and the output space is the size of the array capacity, multiple integers will inevitably map to the same bucket index.
Hash collisions can lead to incorrect query results, severely affecting the usability of the hash table.
There are mainly two methods for improving the structure of hash tables: "Separate Chaining" and "Open Addressing".
## Separate chaining
In the original hash table, each bucket can store only one key-value pair. "Separate chaining" transforms individual elements into a linked list, with key-value pairs as list nodes, storing all colliding key-value pairs in the same list. The figure below shows an example of a hash table with separate chaining.
The operations of a hash table implemented with separate chaining have changed as follows:
- **Querying elements**: Input `key`, pass through the hash function to obtain the bucket index, access the head node of the list, then traverse the list and compare `key` to find the target key-value pair.
- **Adding elements**: First access the list head node via the hash function, then add the node (key-value pair) to the list.
- **Deleting elements**: Access the list head based on the hash function's result, then traverse the list to find and remove the target node.
Separate chaining has the following limitations:
- **Increased space usage**: The linked list contains node pointers, which consume more memory space than arrays.
- **Reduced query efficiency**: Due to the need for linear traversal of the list to find the corresponding element.
The code below provides a simple implementation of a separate chaining hash table:
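The full listing is not reproduced in this excerpt; the minimal Python sketch below conveys the idea (fixed capacity, no load-factor tracking or resizing, unlike a complete implementation):

```python
class Pair:
    """A key-value pair stored as a list node."""
    def __init__(self, key: int, val: str):
        self.key = key
        self.val = val

class HashMapChaining:
    """Minimal separate-chaining hash table."""
    def __init__(self, capacity: int = 13):
        self.capacity = capacity
        self.buckets: list[list[Pair]] = [[] for _ in range(capacity)]

    def hash_func(self, key: int) -> int:
        return key % self.capacity

    def get(self, key: int) -> str | None:
        """Traverse the bucket's list and compare keys."""
        for pair in self.buckets[self.hash_func(key)]:
            if pair.key == key:
                return pair.val
        return None

    def put(self, key: int, val: str):
        """Update the key if present, otherwise append a new node."""
        bucket = self.buckets[self.hash_func(key)]
        for pair in bucket:
            if pair.key == key:
                pair.val = val
                return
        bucket.append(Pair(key, val))

    def remove(self, key: int):
        """Find and unlink the target node, if any."""
        bucket = self.buckets[self.hash_func(key)]
        for pair in bucket:
            if pair.key == key:
                bucket.remove(pair)
                return
```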
It's worth noting that when the list is very long, the query efficiency $O(n)$ is poor. **At this point, the list can be converted to an "AVL tree" or "Red-Black tree"** to optimize the time complexity of the query operation to $O(\log n)$.
## Open addressing
"Open addressing" does not introduce additional data structures but uses "multiple probes" to handle hash collisions. The probing methods mainly include linear probing, quadratic probing, and double hashing.
Let's use linear probing as an example to introduce the mechanism of open addressing hash tables.
### Linear probing
Linear probing uses a fixed-step linear search to resolve collisions; its operations differ from those of an ordinary hash table as follows:
- **Inserting elements**: Calculate the bucket index using the hash function. If the bucket already contains an element, linearly traverse forward from the conflict position (usually with a step size of $1$) until an empty bucket is found, then insert the element.
- **Searching for elements**: If a hash collision is found, use the same step size to linearly traverse forward until the corresponding element is found and return `value`; if an empty bucket is encountered, it means the target element is not in the hash table, so return `None`.
The figure below shows the distribution of key-value pairs in an open addressing (linear probing) hash table. According to this hash function, keys with the same last two digits will be mapped to the same bucket. Through linear probing, they are stored consecutively in that bucket and the buckets below it.
![Distribution of key-value pairs in open addressing (linear probing) hash table](hash_collision.assets/hash_table_linear_probing.png)
However, **linear probing tends to create "clustering"**. Specifically, the longer a continuous position in the array is occupied, the more likely these positions are to encounter hash collisions, further promoting the growth of these clusters and eventually leading to deterioration in the efficiency of operations.
It's important to note that **we cannot directly delete elements in an open addressing hash table**. Deleting an element creates an empty bucket `None` in the array. When searching for elements, if linear probing encounters this empty bucket, it will return, making the elements below this bucket inaccessible. The program may incorrectly assume these elements do not exist, as shown in the figure below.
![Query issues caused by deletion in open addressing](hash_collision.assets/hash_table_open_addressing_deletion.png)
To solve this problem, we can use a "lazy deletion" mechanism: instead of directly removing elements from the hash table, **use a constant `TOMBSTONE` to mark the bucket**. In this mechanism, both `None` and `TOMBSTONE` represent empty buckets and can hold key-value pairs. However, when linear probing encounters `TOMBSTONE`, it should continue traversing since there may still be key-value pairs below it.
### Quadratic probing
Quadratic probing is similar to linear probing and is one of the common strategies of open addressing. When a collision occurs, quadratic probing does not simply skip a fixed number of steps but skips "the square of the number of probes," i.e., $1, 4, 9, \dots$ steps.
However, quadratic probing is not perfect:
- Clustering still exists, i.e., some positions are more likely to be occupied than others.
- Due to the growth of squares, quadratic probing may not probe the entire hash table, meaning it might not access empty buckets even if they exist in the hash table.
### Double hashing
As the name suggests, the double hashing method uses multiple hash functions $f_1(x)$, $f_2(x)$, $f_3(x)$, $\dots$ for probing.
- **Inserting elements**: If hash function $f_1(x)$ encounters a conflict, try $f_2(x)$, and so on, until an empty position is found and the element is inserted.
- **Searching for elements**: Search in the same order of hash functions until the target element is found and returned; if an empty position is encountered or all hash functions have been tried, it indicates the element is not in the hash table, then return `None`.
Compared to linear probing, double hashing is less prone to clustering but involves additional computation for multiple hash functions.
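One classic way to realize this idea, sketched below, is to let a second hash function determine the probe step (a common formulation; the text's description of trying $f_1, f_2, f_3, \dots$ in turn is more general):

```python
def probe_sequence(key: int, capacity: int):
    """Yield bucket indices: start at f1(key), advance by f2(key) each probe."""
    start = key % capacity            # f1: determines the starting bucket
    step = 1 + key % (capacity - 1)   # f2: determines the step; must be non-zero
    for i in range(capacity):
        yield (start + i * step) % capacity

# With a prime capacity, the sequence visits every bucket exactly once:
print(list(probe_sequence(38, 7)))  # [3, 6, 2, 5, 1, 4, 0]
```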
Please note that open addressing (linear probing, quadratic probing, and double hashing) hash tables all have the issue of "not being able to directly delete elements."
## Choice of programming languages
Various programming languages have adopted different hash table implementation strategies.
A "hash table", also known as a "hash map", achieves efficient element querying by establishing a mapping between keys and values. Specifically, when we input a `key` into the hash table, we can retrieve the corresponding `value` in $O(1)$ time.
A "hash table", also known as a "hash map", achieves efficient element querying by establishing a mapping between keys and values. Specifically, when we input a `key` into the hash table, we can retrieve the corresponding `value` in $O(1)$ time.
@ -8,11 +8,11 @@ As shown in the figure below, given $n$ students, each with two pieces of data:
Apart from hash tables, arrays and linked lists can also be used to implement querying functions. Their efficiency is compared in the table below.
Apart from hash tables, arrays and linked lists can also be used to implement querying functions. Their efficiency is compared in the table below.
- **Adding elements**: Simply add the element to the end of the array (or linked list), using $O(1)$ time.
- **Querying elements**: Since the array (or linked list) is unordered, it requires traversing all the elements, using $O(n)$ time.
- **Deleting elements**: First, locate the element, then delete it from the array (or linked list), using $O(n)$ time.
<palign="center"> Table <id> Comparison of Element Query Efficiency </p>
<palign="center"> Table <id> Comparison of element query efficiency </p>
Observations reveal that **the time complexity for adding, deleting, and querying in a hash table is $O(1)$**, which is highly efficient.
## Common operations of a hash table
Common operations of a hash table include initialization, querying, adding key-value pairs, and deleting key-value pairs, etc. Example code is as follows:
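The multi-language listings are omitted in this excerpt; in Python the operations map onto the built-in `dict` (the key-value pairs below are arbitrary examples):

```python
# Initialize a hash table
hmap: dict[int, str] = {}

# Add: insert key-value pairs
hmap[12836] = "Ha"
hmap[15937] = "Luo"

# Query: retrieve the value for a key
name = hmap[15937]

# Update: assigning to an existing key overwrites its value
hmap[12836] = "Xiao Ha"

# Delete: remove a key-value pair
hmap.pop(12836)
```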
First, let's consider the simplest case: **implementing a hash table using just an array**. In the hash table, each empty slot in the array is called a "bucket", and each bucket can store one key-value pair. Therefore, the query operation involves finding the bucket corresponding to the `key` and retrieving the `value` from it.
Fundamentally, the role of the hash function is to map the entire input space of all keys to the output space of all array indices. However, the input space is often much larger than the output space. Therefore, **theoretically, there must be situations where "multiple inputs correspond to the same output"**.
When we hear the word "algorithm," we naturally think of mathematics. However, many algorithms do not involve complex mathematics but rely more on basic logic, which can be seen everywhere in our daily lives.
When we hear the word "algorithm," we naturally think of mathematics. However, many algorithms do not involve complex mathematics but rely more on basic logic, which can be seen everywhere in our daily lives.
@ -33,7 +33,7 @@ This essential skill for elementary students, looking up a dictionary, is actual
2. Take out a card from the unordered section and insert it into the correct position in the ordered section; after this, the leftmost two cards are in order.
3. Continue to repeat step `2.` until all cards are in order.
The above method of organizing playing cards is essentially the "Insertion Sort" algorithm, which is very efficient for small datasets. The sorting functions of many programming languages incorporate insertion sort.
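Expressed in Python, the card-organizing procedure becomes the following compact sketch of insertion sort:

```python
def insertion_sort(nums: list[int]):
    """Sort in place by growing an ordered prefix one element at a time."""
    for i in range(1, len(nums)):
        base = nums[i]          # the "card" taken from the unordered section
        j = i - 1
        while j >= 0 and nums[j] > base:
            nums[j + 1] = nums[j]  # shift larger elements one position right
            j -= 1
        nums[j + 1] = base      # insert into its correct position
```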
This open-source project aims to create a free and beginner-friendly crash course on data structures and algorithms.
- Run code with just one click, supporting Java, C++, Python, Go, JS, TS, C#, Swift, Rust, Dart, Zig and other languages.
- Readers are encouraged to engage with each other in the discussion area for each section; questions and comments are usually answered within two days.
## Target audience
If you are new to algorithms, or if you have some experience but only a vague understanding of data structures and algorithms and find yourself constantly jumping between "yep" and "hmm", then this book is for you!
If you are an algorithm expert, we look forward to receiving your valuable suggestions.
You should know how to write and read simple code in at least one programming language.
## Content structure
The main content of the book is shown in the following figure.
- **Complexity analysis**: explores aspects and methods for evaluating data structures and algorithms. Covers methods of deriving time complexity and space complexity, along with common types and examples.
- **Data structures**: focuses on fundamental data types, classification methods, definitions, pros and cons, common operations, types, applications, and implementation methods of data structures such as array, linked list, stack, queue, hash table, tree, heap, graph, etc.
- **Algorithms**: defines algorithms, discusses their pros and cons, efficiency, application scenarios, problem-solving steps, and includes sample questions for various algorithms such as search, sorting, divide and conquer, backtracking, dynamic programming, greedy algorithms, and more.
![Main content of the book](about_the_book.assets/hello_algo_mindmap.png)
For the best reading experience, it is recommended that you read through this section.
## Writing conventions
- Chapters marked with '*' after the title are optional and contain relatively challenging content. If you are short on time, it is advisable to skip them.
- Technical terms will be in boldface (in the print and PDF versions) or underlined (in the web version), for instance, <u>array</u>. It's advisable to familiarize yourself with these for better comprehension of technical texts.
=== "Python"

    ```python title=""
    """Header comments for labeling functions, classes, test samples, etc"""
    # Comments for explaining details
    ```
## Efficient learning via animated illustrations
Compared with text, videos and pictures have a higher density of information and are more structured, making them easier to understand. In this book, **key and difficult concepts are mainly presented through animations and illustrations**, with text serving as explanations and supplements.
When encountering content with animations or illustrations as shown in the figure below, **prioritize understanding the figure, with text as supplementary**, integrating both for a comprehensive understanding.
The source code of this book is hosted on the [GitHub Repository](https://github.com/krahets/hello-algo). As shown in the figure below, **the source code comes with test examples and can be executed with just a single click**.
If time permits, **it's recommended to type out the code yourself**. If pressed for time, at least read and run the code.
Compared to just reading code, writing code often yields more learning. **Learning by doing is the real way to learn.**
Alternatively, you can also click the "Download ZIP" button at the location shown in the figure below to directly download the code as a compressed ZIP file. Then, you can simply extract it locally.
![Cloning repository and downloading code](suggestions.assets/download_code.png)
**Step 3: Run the source code**. As shown in the figure below, for the code block labeled with the file name at the top, we can find the corresponding source code file in the `codes` folder of the repository. These files can be executed with a single click, which will help you save unnecessary debugging time and allow you to focus on learning.
![Code block and corresponding source code file](suggestions.assets/code_md_to_repo.png)
## Learning together in discussion
While reading this book, please don't skip over points you haven't fully understood. **Feel free to post your questions in the comment section**. We will be happy to answer them and can usually respond within two days.
As illustrated in the figure below, each chapter features a comment section at the bottom. I encourage you to pay attention to these comments. They expose you to the problems other readers encounter, helping you identify knowledge gaps and spark deeper reflection; you are also warmly invited to answer fellow readers' questions, share your insights, and help each other improve.
In a queue, we can only delete elements from the head or add elements to the tail. As shown in the following diagram, a "double-ended queue (deque)" offers more flexibility, allowing the addition or removal of elements at both the head and the tail.
![Operations in double-ended queue](deque.assets/deque_operations.png)
## Common operations in a double-ended queue
The common operations in a double-ended queue are listed below, and the names of specific methods depend on the programming language used.
<palign="center"> Table <id> Efficiency of Double-Ended Queue Operations </p>
<palign="center"> Table <id> Efficiency of double-ended queue operations </p>
The implementation of a double-ended queue is similar to that of a regular queue: it can be based on either a linked list or an array as the underlying data structure.
### Implementation based on doubly linked list
Recall from the previous section that we used a regular singly linked list to implement a queue, as it conveniently allows for deleting from the head (corresponding to the dequeue operation) and adding new elements after the tail (corresponding to the enqueue operation).
### Implementation based on array
As shown in the figure below, similar to implementing a queue with an array, we can also use a circular array to implement a double-ended queue.
The implementation only needs to add methods for "front enqueue" and "rear dequeue":
```src
[file]{array_deque}-[class]{array_deque}-[func]{}
```
## Applications of the double-ended queue
The double-ended queue combines the logic of both stacks and queues, **so it can implement all of their respective use cases while offering greater flexibility**.
As shown in the figure below, we call the front of the queue the "head" and the back the "tail." The operation of adding elements to the rear of the queue is termed "enqueue," and the operation of removing elements from the front is termed "dequeue."
The common operations on a queue are shown in the table below. Note that method names may vary across different programming languages. Here, we use the same naming convention as that used for stacks.
<palign="center"> Table <id> Efficiency of Queue Operations </p>
<palign="center"> Table <id> Efficiency of queue operations </p>
To implement a queue, we need a data structure that allows adding elements at one end and removing them at the other. Both linked lists and arrays meet this requirement.
### Implementation based on a linked list
As shown in the figure below, we can consider the "head node" and "tail node" of a linked list as the "front" and "rear" of the queue, respectively. It is stipulated that nodes can only be added at the rear and removed at the front.
### Implementation based on an array
Deleting the first element in an array has a time complexity of $O(n)$, which would make the dequeue operation inefficient. However, this problem can be cleverly avoided as follows.
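The trick, sketched below, is to keep a `front` index and treat the array as circular with the modulo operator, so dequeuing merely advances `front` in $O(1)$ (a minimal fixed-capacity sketch):

```python
class ArrayQueue:
    """Queue on a circular array: a moving front index avoids O(n) deletions."""
    def __init__(self, capacity: int):
        self._nums = [0] * capacity
        self._front = 0   # index of the front element
        self._size = 0

    def push(self, num: int):
        if self._size == len(self._nums):
            raise IndexError("queue is full")
        rear = (self._front + self._size) % len(self._nums)  # wrap around
        self._nums[rear] = num
        self._size += 1

    def pop(self) -> int:
        num = self.peek()
        self._front = (self._front + 1) % len(self._nums)  # advance the front
        self._size -= 1
        return num

    def peek(self) -> int:
        if self._size == 0:
            raise IndexError("queue is empty")
        return self._nums[self._front]
```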
The above implementation of the queue still has its limitations: its length is fixed.
The comparison of the two implementations is consistent with that of the stack and is not repeated here.
## Typical applications of queues
- **Amazon orders**: After shoppers place orders, these orders join a queue, and the system processes them in order. During events like Singles' Day, a massive number of orders are generated in a short time, making high concurrency a key challenge for engineers.
- **Various to-do lists**: Any scenario requiring a "first-come, first-served" functionality, such as a printer's task queue or a restaurant's food delivery queue, can effectively maintain the order of processing with a queue.
We can compare a stack to a pile of plates on a table. To access the bottom plate, one must first remove the plates on top of it.
As shown in the figure below, we refer to the top of the pile of elements as the "top of the stack" and the bottom as the "bottom of the stack." The operation of adding elements to the top of the stack is called "push," and the operation of removing the top element is called "pop."
The common operations on a stack are shown in the table below. The specific method names depend on the programming language used. Here, we use `push()`, `pop()`, and `peek()` as examples.
<palign="center"> Table <id> Efficiency of Stack Operations </p>
<palign="center"> Table <id> Efficiency of stack operations </p>
To gain a deeper understanding of how a stack operates, let's try implementing a stack class ourselves.
A stack follows the principle of Last-In-First-Out, which means we can only add or remove elements at the top of the stack. However, both arrays and linked lists allow adding and removing elements at any position, **therefore a stack can be seen as a restricted array or linked list**. In other words, we can "shield" certain irrelevant operations of an array or linked list, aligning their external behavior with the characteristics of a stack.
### Implementation based on a linked list
When implementing a stack using a linked list, we can consider the head node of the list as the top of the stack and the tail node as the bottom of the stack.
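The book's listing is elided in this excerpt; a minimal Python sketch of the idea (head node as the stack top) looks like this:

```python
class ListNode:
    def __init__(self, val: int):
        self.val = val
        self.next: "ListNode | None" = None

class LinkedListStack:
    """Stack on a singly linked list: the head node is the top of the stack."""
    def __init__(self):
        self._peek: ListNode | None = None
        self._size = 0

    def push(self, val: int):
        node = ListNode(val)
        node.next = self._peek  # the new node becomes the head (top)
        self._peek = node
        self._size += 1

    def pop(self) -> int:
        val = self.peek()
        self._peek = self._peek.next
        self._size -= 1
        return val

    def peek(self) -> int:
        if self._peek is None:
            raise IndexError("stack is empty")
        return self._peek.val
```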
### Implementation based on an array
When implementing a stack using an array, we can consider the end of the array as the top of the stack. As shown in the figure below, push and pop operations correspond to adding and removing elements at the end of the array, respectively, both with a time complexity of $O(1)$.
Since the elements to be pushed onto the stack may continuously increase, we can use a dynamic array as the underlying data structure, avoiding the need to handle expansion ourselves.
```src
[file]{array_stack}-[class]{array_stack}-[func]{}
```
## Comparison of the two implementations
**Supported operations**
However, since linked list nodes require extra space for storing pointers, **the nodes of a linked-list stack take up more space per element**.
In summary, we cannot simply determine which implementation is more memory-efficient. It requires analysis based on specific circumstances.
## Typical applications of stacks
- **Back and forward in browsers, undo and redo in software**. Every time we open a new webpage, the browser pushes the previous page onto the stack, allowing us to go back to the previous page through the back operation, which is essentially a pop operation. To support both back and forward, two stacks are needed to work together.
- **Memory management in programs**. Each time a function is called, the system adds a stack frame at the top of the stack to record the function's context information. In recursive functions, the downward recursion phase keeps pushing onto the stack, while the upward backtracking phase keeps popping from the stack.
- A stack is a data structure that follows the Last-In-First-Out (LIFO) principle and can be implemented using arrays or linked lists.
- In terms of time efficiency, the array implementation of the stack has a higher average efficiency. However, during expansion, the time complexity for a single push operation can degrade to $O(n)$. In contrast, the linked list implementation of a stack offers more stable efficiency.