Add the initial translation for the Heap chapter (#1210)
# Heap construction operation

In some cases, we want to build a heap from all elements of a list, and this process is known as the "heap construction operation".

## Implementing with heap insertion operation

First, we create an empty heap and then iterate through the list, performing the "heap insertion operation" on each element in turn. That is, each element is added to the end of the heap and then "heapified" from bottom to top.

Each time an element is added to the heap, the length of the heap increases by one. Since nodes are added to the binary tree from top to bottom, the heap is constructed "from top to bottom".

Let the number of elements be $n$. Each element's insertion operation takes $O(\log{n})$ time, so the time complexity of this heap construction method is $O(n \log n)$.

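
To make this concrete, here is a minimal sketch of the insertion-based construction. It uses Python's built-in `heapq` min-heap rather than the book's `max_heap` class (an assumption for brevity; the top-down construction idea is the same):

```python
import heapq

def build_heap_by_insertion(nums: list[int]) -> list[int]:
    """Build a min-heap by inserting elements one at a time: O(n log n)."""
    heap: list[int] = []
    for num in nums:
        # Append to the end of the heap, then sift up from the bottom: O(log n)
        heapq.heappush(heap, num)
    return heap

heap = build_heap_by_insertion([9, 3, 7, 1, 5])
print(heap[0])  # the top of a min-heap is the smallest element: 1
```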
## Implementing by heapifying through traversal

In fact, we can implement a more efficient heap construction method in two steps.

1. Add all elements of the list as they are into the heap; at this point the heap property is not yet satisfied.
2. Traverse the heap in reverse level order and perform "top-to-bottom heapify" on each non-leaf node.

**After a node is heapified, the subtree rooted at that node becomes a valid sub-heap.** Since the traversal is in reverse order, the heap is built "from bottom to top".

The reason for choosing reverse traversal is that it ensures the subtrees below the current node are already valid sub-heaps, making the heapification of the current node effective.

It's worth mentioning that **since leaf nodes have no children, they are naturally valid sub-heaps and do not need to be heapified**. As shown in the following code, the last non-leaf node is the parent of the last node; we start from it and traverse in reverse order, heapifying each node:

```src
[file]{my_heap}-[class]{max_heap}-[func]{__init__}
```

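
A condensed sketch of this bottom-up construction for a max-heap follows. The helper name `sift_down` and the standalone-function form are illustrative choices, not the book's class-based implementation:

```python
def sift_down(heap: list[int], n: int, i: int) -> None:
    """Heapify node i from top to bottom in a max-heap of size n."""
    while True:
        l, r, largest = 2 * i + 1, 2 * i + 2, i
        if l < n and heap[l] > heap[largest]:
            largest = l
        if r < n and heap[r] > heap[largest]:
            largest = r
        if largest == i:
            return  # node i already satisfies the heap property
        heap[i], heap[largest] = heap[largest], heap[i]
        i = largest  # continue sifting down from the swapped child

def build_max_heap(nums: list[int]) -> list[int]:
    heap = list(nums)  # step 1: copy all elements as they are
    # step 2: heapify non-leaf nodes in reverse level order,
    # starting from the parent of the last node
    for i in range(len(heap) // 2 - 1, -1, -1):
        sift_down(heap, len(heap), i)
    return heap

print(build_max_heap([1, 7, 3, 9, 5])[0])  # the top is the maximum: 9
```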
## Complexity analysis

Next, let's calculate the time complexity of this second heap construction method.

- Assuming the number of nodes in the complete binary tree is $n$, the number of leaf nodes is $(n + 1) / 2$, where $/$ denotes integer division. Therefore, the number of non-leaf nodes that need to be heapified is $n / 2$.
- During "top-to-bottom heapification", each node is heapified down to the leaves at most, so the maximum number of iterations is the height of the binary tree, $\log n$.

Multiplying the two gives a time complexity of $O(n \log n)$ for the heap construction process. **But this estimate is not tight, because it ignores the fact that a binary tree has far more nodes in the lower levels than near the top.**

Let's perform a more accurate calculation. To simplify it, assume a "perfect binary tree" with $n$ nodes and height $h$; this assumption does not affect the correctness of the result.

![Node counts at each level of a perfect binary tree](build_heap.assets/heapify_operations_count.png)

As shown in the figure above, the maximum number of iterations for a node "to be heapified from top to bottom" equals the distance from that node to the leaf nodes, which is exactly the "node height". Therefore, we can sum "number of nodes $\times$ node height" over each level **to get the total number of heapification iterations for all nodes**:

$$
T(h) = 2^0h + 2^1(h-1) + 2^2(h-2) + \dots + 2^{h-1}\times1
$$

To simplify this sum, we can use a standard technique for such series: multiply $T(h)$ by $2$ to get

$$
\begin{aligned}
T(h) & = 2^0h + 2^1(h-1) + 2^2(h-2) + \dots + 2^{h-1}\times1 \newline
2T(h) & = 2^1h + 2^2(h-1) + 2^3(h-2) + \dots + 2^h\times1 \newline
\end{aligned}
$$

Subtracting $T(h)$ from $2T(h)$ term by term (staggered subtraction), we get:

$$
2T(h) - T(h) = T(h) = -2^0h + 2^1 + 2^2 + \dots + 2^{h-1} + 2^h
$$

Observing this equation, apart from the $-h$ term, $T(h)$ is a geometric series, which can be summed directly with the closed-form formula, giving:

$$
\begin{aligned}
T(h) & = 2 \frac{1 - 2^h}{1 - 2} - h \newline
& = 2^{h+1} - h - 2 \newline
& = O(2^h)
\end{aligned}
$$

Further, a perfect binary tree with height $h$ has $n = 2^{h+1} - 1$ nodes, so the complexity is $O(2^h) = O(n)$. This calculation shows that **building a heap from an input list takes $O(n)$ time, which is very efficient**.

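
As a quick numeric sanity check (a sketch, not part of the original derivation), we can compare the direct sum against the closed form $2^{h+1} - h - 2$:

```python
def t_direct(h: int) -> int:
    # T(h) computed term by term: sum of 2^i * (h - i) for i = 0 .. h-1
    return sum(2**i * (h - i) for i in range(h))

def t_closed(h: int) -> int:
    # closed form obtained from the staggered subtraction above
    return 2 ** (h + 1) - h - 2

# the two agree for every height
for h in range(1, 16):
    assert t_direct(h) == t_closed(h)
print(t_direct(10))  # 2^11 - 10 - 2 = 2036
```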
# Heap

![Heap](../assets/covers/chapter_heap.jpg)

!!! abstract

    The heap is like mountain peaks, stacked and undulating, each with its unique shape.

    Among these peaks, the highest one always catches the eye first.

# Summary

### Key review

- A heap is a complete binary tree that can be divided into a max heap and a min heap based on its property. The top element of a max (min) heap is the largest (smallest) element.
- A priority queue is a queue in which elements are dequeued by priority, and it is usually implemented with a heap.
- Common heap operations and their time complexities: inserting an element $O(\log n)$, removing the top element $O(\log n)$, and accessing the top element $O(1)$.
- A complete binary tree is well-suited to array representation, so heaps are commonly stored in arrays.
- Heapify operations maintain the heap property and are used in both heap insertion and removal.
- Building a heap from $n$ elements can be optimized to $O(n)$ time, which is highly efficient.
- Top-k is a classic algorithmic problem that can be solved efficiently with a heap in $O(n \log k)$ time.

### Q & A

**Q**: Is the "heap" in data structures the same concept as the "heap" in memory management?

The two are not the same concept, even though both are called "heap". The heap in computer system memory is part of dynamic memory allocation: a program can use it to store data during execution, requesting heap memory to hold complex structures such as objects and arrays. When this data is no longer needed, the program must release the memory to prevent memory leaks. Compared with stack memory, heap memory must be managed and used more cautiously, as improper use may lead to memory leaks and dangling pointers.

# Top-k problem

!!! question

    Given an unordered array `nums` of length $n$, return the largest $k$ elements in the array.

For this problem, we will first introduce two straightforward solutions, then explain a more efficient heap-based method.

## Method 1: Iterative selection

We can perform $k$ rounds of iterations as shown in the figure below, extracting the $1^{st}$, $2^{nd}$, $\dots$, $k^{th}$ largest elements in each round, with a time complexity of $O(nk)$.

This method is only suitable when $k \ll n$: as $k$ approaches $n$, the time complexity approaches $O(n^2)$, which is very time-consuming.

![Iteratively finding the largest k elements](top_k.assets/top_k_traversal.png)

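
A minimal sketch of this $O(nk)$ approach (the function name `top_k_selection` is an illustrative choice, not from the book):

```python
def top_k_selection(nums: list[int], k: int) -> list[int]:
    """Find the largest k elements by k rounds of selection: O(nk)."""
    nums = list(nums)  # work on a copy so the input stays intact
    res: list[int] = []
    for _ in range(k):
        # find and remove the current maximum: O(n) per round
        max_idx = max(range(len(nums)), key=lambda i: nums[i])
        res.append(nums.pop(max_idx))
    return res

print(top_k_selection([3, 8, 1, 5, 9], 2))  # [9, 8]
```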
!!! tip

    When $k = n$, we obtain a complete ordered sequence, which is equivalent to the "selection sort" algorithm.

## Method 2: Sorting

As shown in the figure below, we can first sort the array `nums` and then return the last $k$ elements, with a time complexity of $O(n \log n)$.

Clearly, this method does more work than necessary: we only need to find the largest $k$ elements and do not need to sort the rest.

![Sorting to find the largest k elements](top_k.assets/top_k_sorting.png)

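
The sorting approach is a one-liner; here is a hedged sketch (the name `top_k_sorting` is illustrative):

```python
def top_k_sorting(nums: list[int], k: int) -> list[int]:
    """Sort the array, then take the last k elements: O(n log n)."""
    return sorted(nums)[-k:]

print(top_k_sorting([3, 8, 1, 5, 9], 2))  # [8, 9]
```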
## Method 3: Heap

We can solve the top-k problem more efficiently with a heap, following the process below.

1. Initialize a min heap, whose top element is the smallest.
2. First, insert the first $k$ elements of the array into the heap.
3. Starting from the $(k + 1)^{th}$ element, if the current element is greater than the heap's top element, remove the top element and insert the current element into the heap.
4. After the traversal, the heap contains the largest $k$ elements.

=== "<1>"
    ![Find the largest k elements based on heap](top_k.assets/top_k_heap_step1.png)

=== "<2>"
    ![top_k_heap_step2](top_k.assets/top_k_heap_step2.png)

=== "<3>"
    ![top_k_heap_step3](top_k.assets/top_k_heap_step3.png)

=== "<4>"
    ![top_k_heap_step4](top_k.assets/top_k_heap_step4.png)

=== "<5>"
    ![top_k_heap_step5](top_k.assets/top_k_heap_step5.png)

=== "<6>"
    ![top_k_heap_step6](top_k.assets/top_k_heap_step6.png)

=== "<7>"
    ![top_k_heap_step7](top_k.assets/top_k_heap_step7.png)

=== "<8>"
    ![top_k_heap_step8](top_k.assets/top_k_heap_step8.png)

=== "<9>"
    ![top_k_heap_step9](top_k.assets/top_k_heap_step9.png)

Example code is as follows:

```src
[file]{top_k}-[class]{}-[func]{top_k_heap}
```

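
A condensed sketch of the four steps using Python's built-in `heapq` min-heap (the book's reference implementation may differ in detail):

```python
import heapq

def top_k_heap(nums: list[int], k: int) -> list[int]:
    # steps 1-2: initialize a min-heap with the first k elements
    heap = nums[:k]
    heapq.heapify(heap)
    # step 3: from the (k+1)-th element on, replace the heap top when larger
    for num in nums[k:]:
        if num > heap[0]:
            heapq.heapreplace(heap, num)  # pop the top, then push num: O(log k)
    # step 4: the heap now holds the largest k elements
    return heap

print(sorted(top_k_heap([1, 7, 6, 3, 2], 3)))  # [3, 6, 7]
```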
A total of $n$ rounds of heap insertions and deletions are performed, with a maximum heap size of $k$, so the time complexity is $O(n \log k)$. This method is very efficient: when $k$ is small, the time complexity tends toward $O(n)$; when $k$ is large, it will not exceed $O(n \log n)$.

Additionally, this method suits dynamic data streams: by continuously feeding incoming data through the heap, we can maintain a dynamically updated set of the largest $k$ elements.