Add the initial translation for the Heap chapter (#1210)

pull/1211/head^2
Yudong Jin 8 months ago committed by GitHub
parent 3b797d56af
commit 04ebee0308

@ -0,0 +1,74 @@
# Heap construction operation
In some cases, we want to build a heap from all the elements of a list; this process is known as the "heap construction operation."
## Implementing with heap insertion operation
First, we create an empty heap and then iterate through the list, performing the "heap insertion operation" on each element in turn. This means adding the element to the end of the heap and then "heapifying" it from bottom to top.
Each time an element is added to the heap, the length of the heap increases by one. Since nodes are added to the binary tree from top to bottom, the heap is constructed "from top to bottom."
Let the number of elements be $n$. Each insertion takes $O(\log n)$ time, so the time complexity of this heap construction method is $O(n \log n)$.
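The insertion-based construction can be sketched as follows. This is a minimal illustration that assumes Python's standard `heapq` module (which maintains a min heap); the function name `build_heap_by_insertion` is hypothetical and not part of the book's source code.

```python
import heapq


def build_heap_by_insertion(nums: list[int]) -> list[int]:
    """Build a min heap by pushing elements one at a time (O(n log n) overall)."""
    heap: list[int] = []
    for num in nums:
        # Append num to the end of the heap, then heapify it from bottom to top
        heapq.heappush(heap, num)
    return heap


heap = build_heap_by_insertion([5, 3, 4, 1, 2])
print(heap[0])  # the smallest element sits at the top of the heap
```

Each `heappush` call costs at most $O(\log n)$, which is where the $O(n \log n)$ bound comes from.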
## Implementing by heapifying through traversal
In fact, we can implement a more efficient method of heap construction in two steps.
1. Add all elements of the list to the heap as they are. At this point, the heap property is not yet satisfied.
2. Traverse the heap in reverse order (reverse of level-order traversal), and perform "top to bottom heapify" on each non-leaf node.
**After heapifying a node, the subtree with that node as the root becomes a valid sub-heap**. Since the traversal is in reverse order, the heap is built "from bottom to top."
The reason for choosing reverse traversal is that it ensures the subtree below the current node is already a valid sub-heap, making the heapification of the current node effective.
It's worth mentioning that **since leaf nodes have no children, they naturally form valid sub-heaps and do not need to be heapified**. As shown in the following code, the last non-leaf node is the parent of the last node; we start from it and traverse in reverse order to perform heapification:
```src
[file]{my_heap}-[class]{max_heap}-[func]{__init__}
```
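As a rough, self-contained sketch of the two steps above (not necessarily the repository's `max_heap.__init__`), the following builds a max heap stored in a plain Python list. The helper names `sift_down` and `build_max_heap` are assumptions chosen for illustration.

```python
def sift_down(heap: list[int], i: int) -> None:
    """Heapify node i from top to bottom in a max heap stored as a list."""
    n = len(heap)
    while True:
        left, right, largest = 2 * i + 1, 2 * i + 2, i
        if left < n and heap[left] > heap[largest]:
            largest = left
        if right < n and heap[right] > heap[largest]:
            largest = right
        if largest == i:
            # Node i is already no smaller than its children
            break
        heap[i], heap[largest] = heap[largest], heap[i]
        i = largest


def build_max_heap(nums: list[int]) -> list[int]:
    """Copy the list as is, then heapify every non-leaf node in reverse order."""
    heap = list(nums)
    # The last non-leaf node is the parent of the last node
    for i in range((len(heap) - 2) // 2, -1, -1):
        sift_down(heap, i)
    return heap


max_heap = build_max_heap([1, 3, 2, 5, 4])
print(max_heap[0])  # the largest element sits at the top of the heap
```

Leaf nodes are skipped entirely because the loop starts at the parent of the last node.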
## Complexity analysis
Next, let's attempt to calculate the time complexity of this second method of heap construction.
- Assuming the number of nodes in the complete binary tree is $n$, then the number of leaf nodes is $(n + 1) / 2$, where $/$ is integer division. Therefore, the number of nodes that need to be heapified is $n - (n + 1) / 2 = n / 2$ (again with integer division).
- In the process of "top to bottom heapification," each node is heapified to the leaf nodes at most, so the maximum number of iterations is the height of the binary tree $\log n$.
Multiplying the two, we get the time complexity of the heap construction process as $O(n \log n)$. **But this estimate is not accurate, because it does not take into account the nature of the binary tree having far more nodes at the lower levels than at the top.**
Let's perform a more accurate calculation. To simplify the calculation, assume a "perfect binary tree" with $n$ nodes and height $h$; this assumption does not affect the correctness of the result.
![Node counts at each level of a perfect binary tree](build_heap.assets/heapify_operations_count.png)
As shown in the figure above, the maximum number of iterations for a node "to be heapified from top to bottom" is equal to the distance from that node to the leaf nodes, which is precisely "node height." Therefore, we can sum the "number of nodes $\times$ node height" at each level, **to get the total number of heapification iterations for all nodes**.
$$
T(h) = 2^0h + 2^1(h-1) + 2^2(h-2) + \dots + 2^{h-1}\times1
$$
To simplify the above equation, we use a technique for summing sequences from high-school mathematics. First, multiply $T(h)$ by $2$ to get:
$$
\begin{aligned}
T(h) & = 2^0h + 2^1(h-1) + 2^2(h-2) + \dots + 2^{h-1}\times1 \newline
2T(h) & = 2^1h + 2^2(h-1) + 2^3(h-2) + \dots + 2^h\times1 \newline
\end{aligned}
$$
By subtracting $T(h)$ from $2T(h)$ with the terms shifted by one position, so that terms with the same power of $2$ are paired, we get:
$$
2T(h) - T(h) = T(h) = -2^0h + 2^1 + 2^2 + \dots + 2^{h-1} + 2^h
$$
Observing the equation, apart from the $-h$ term, $T(h)$ is a geometric series, which can be calculated directly using the sum formula, resulting in a time complexity of:
$$
\begin{aligned}
T(h) & = 2 \frac{1 - 2^h}{1 - 2} - h \newline
& = 2^{h+1} - h - 2 \newline
& = O(2^h)
\end{aligned}
$$
Further, a perfect binary tree with height $h$ has $n = 2^{h+1} - 1$ nodes, thus the complexity is $O(2^h) = O(n)$. This calculation shows that **the time complexity of inputting a list and constructing a heap is $O(n)$, which is very efficient**.
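The closed form can be sanity-checked numerically. The short sketch below sums "number of nodes $\times$ node height" level by level for a perfect binary tree and compares the result with $2^{h+1} - h - 2$; the function name `total_heapify_iterations` is hypothetical.

```python
def total_heapify_iterations(h: int) -> int:
    """Sum 'number of nodes x node height' over a perfect binary tree of height h."""
    # Level d (root at d = 0) holds 2**d nodes, each with height h - d;
    # leaves (d = h) have height 0 and contribute nothing
    return sum(2**d * (h - d) for d in range(h))


for h in range(1, 11):
    n = 2 ** (h + 1) - 1  # number of nodes in the perfect binary tree
    t = total_heapify_iterations(h)
    assert t == 2 ** (h + 1) - h - 2  # matches the closed form
    assert t < n  # the total work is bounded by the number of nodes
```

The second assertion reflects the conclusion of the derivation: the total number of iterations never exceeds $n$, hence $O(n)$.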

@ -0,0 +1,538 @@
# Heap
A "heap" is a complete binary tree that satisfies specific conditions and can be mainly divided into two types, as shown in the figure below.
- "Min heap": The value of any node $\leq$ the values of its child nodes.
- "Max heap": The value of any node $\geq$ the values of its child nodes.
![Min heap and max heap](heap.assets/min_heap_and_max_heap.png)
As a special case of a complete binary tree, heaps have the following characteristics:
- The bottom layer nodes are filled from left to right, and nodes in other layers are fully filled.
- The root node of the binary tree is called the "heap top," and the bottom-rightmost node is called the "heap bottom."
- For max heaps (min heaps), the value of the heap top element (root node) is the largest (smallest).
## Common operations on heaps
It should be noted that many programming languages provide a "priority queue," which is an abstract data structure defined as a queue with priority sorting.
In fact, **heaps are often used to implement priority queues, with max heaps equivalent to priority queues where elements are dequeued in descending order**. From a usage perspective, we can consider "priority queue" and "heap" as equivalent data structures. Therefore, this book does not make a special distinction between the two, uniformly referring to them as "heap."
Common operations on heaps are shown in the table below, and the method names depend on the programming language.
<p align="center"> Table <id> &nbsp; Efficiency of Heap Operations </p>
| Method name | Description | Time complexity |
| ----------- | ------------------------------------------------------------ | --------------- |
| `push()` | Add an element to the heap | $O(\log n)$ |
| `pop()` | Remove the top element from the heap | $O(\log n)$ |
| `peek()` | Access the top element (for max/min heap, the max/min value) | $O(1)$ |
| `size()` | Get the number of elements in the heap | $O(1)$ |
| `isEmpty()` | Check if the heap is empty | $O(1)$ |
In practice, we can directly use the heap class (or priority queue class) provided by programming languages.
Similar to sorting algorithms where we have "ascending order" and "descending order," we can switch between "min heap" and "max heap" by setting a `flag` or modifying the `Comparator`. The code is as follows:
=== "Python"
```python title="heap.py"
import heapq
# Initialize a min heap
min_heap, flag = [], 1
# Initialize a max heap
max_heap, flag = [], -1
# Python's heapq module implements a min heap by default
# Negating elements before pushing them reverses the ordering, which yields a max heap
# In this example, flag = 1 corresponds to a min heap and flag = -1 to a max heap
# Push elements into the heap
heapq.heappush(max_heap, flag * 1)
heapq.heappush(max_heap, flag * 3)
heapq.heappush(max_heap, flag * 2)
heapq.heappush(max_heap, flag * 5)
heapq.heappush(max_heap, flag * 4)
# Access the top element of the heap
peek: int = flag * max_heap[0] # 5
# Pop the top element of the heap
# The popped elements form a descending sequence
val = flag * heapq.heappop(max_heap) # 5
val = flag * heapq.heappop(max_heap) # 4
val = flag * heapq.heappop(max_heap) # 3
val = flag * heapq.heappop(max_heap) # 2
val = flag * heapq.heappop(max_heap) # 1
# Get the size of the heap
size: int = len(max_heap)
# Check if the heap is empty
is_empty: bool = not max_heap
# Build a heap from an input list
min_heap: list[int] = [1, 3, 2, 5, 4]
heapq.heapify(min_heap)
```
=== "C++"
```cpp title="heap.cpp"
/* Initialize a heap */
// Initialize a min heap
priority_queue<int, vector<int>, greater<int>> minHeap;
// Initialize a max heap
priority_queue<int, vector<int>, less<int>> maxHeap;
/* Push elements into the heap */
maxHeap.push(1);
maxHeap.push(3);
maxHeap.push(2);
maxHeap.push(5);
maxHeap.push(4);
/* Access the top element of the heap */
int peek = maxHeap.top(); // 5
/* Pop the top element of the heap */
// The popped elements form a descending sequence
maxHeap.pop(); // 5
maxHeap.pop(); // 4
maxHeap.pop(); // 3
maxHeap.pop(); // 2
maxHeap.pop(); // 1
/* Get the size of the heap */
int size = maxHeap.size();
/* Check if the heap is empty */
bool isEmpty = maxHeap.empty();
/* Build a heap from an input list */
vector<int> input{1, 3, 2, 5, 4};
// Assign to the existing minHeap rather than redeclaring it in the same scope
minHeap = priority_queue<int, vector<int>, greater<int>>(input.begin(), input.end());
```
=== "Java"
```java title="heap.java"
/* Initialize a heap */
// Initialize a min heap
Queue<Integer> minHeap = new PriorityQueue<>();
// Initialize a max heap (simply modify the Comparator with a lambda expression)
Queue<Integer> maxHeap = new PriorityQueue<>((a, b) -> b - a);
/* Push elements into the heap */
maxHeap.offer(1);
maxHeap.offer(3);
maxHeap.offer(2);
maxHeap.offer(5);
maxHeap.offer(4);
/* Access the top element of the heap */
int peek = maxHeap.peek(); // 5
/* Pop the top element of the heap */
// The popped elements form a descending sequence
peek = maxHeap.poll(); // 5
peek = maxHeap.poll(); // 4
peek = maxHeap.poll(); // 3
peek = maxHeap.poll(); // 2
peek = maxHeap.poll(); // 1
/* Get the size of the heap */
int size = maxHeap.size();
/* Check if the heap is empty */
boolean isEmpty = maxHeap.isEmpty();
/* Build a heap from an input list */
minHeap = new PriorityQueue<>(Arrays.asList(1, 3, 2, 5, 4));
```
=== "C#"
```csharp title="heap.cs"
/* Initialize a heap */
// Initialize a min heap
PriorityQueue<int, int> minHeap = new();
// Initialize a max heap (simply modify the Comparator with a lambda expression)
PriorityQueue<int, int> maxHeap = new(Comparer<int>.Create((x, y) => y - x));
/* Push elements into the heap */
maxHeap.Enqueue(1, 1);
maxHeap.Enqueue(3, 3);
maxHeap.Enqueue(2, 2);
maxHeap.Enqueue(5, 5);
maxHeap.Enqueue(4, 4);
/* Access the top element of the heap */
int peek = maxHeap.Peek(); // 5
/* Pop the top element of the heap */
// The popped elements form a descending sequence
peek = maxHeap.Dequeue(); // 5
peek = maxHeap.Dequeue(); // 4
peek = maxHeap.Dequeue(); // 3
peek = maxHeap.Dequeue(); // 2
peek = maxHeap.Dequeue(); // 1
/* Get the size of the heap */
int size = maxHeap.Count;
/* Check if the heap is empty */
bool isEmpty = maxHeap.Count == 0;
/* Build a heap from an input list */
minHeap = new PriorityQueue<int, int>([(1, 1), (3, 3), (2, 2), (5, 5), (4, 4)]);
```
=== "Go"
```go title="heap.go"
// In Go, an integer max heap can be built by implementing heap.Interface
// Implementing heap.Interface also requires implementing sort.Interface
type intHeap []any
// Push is a heap.Interface method that pushes an element onto the heap
func (h *intHeap) Push(x any) {
    // Push and Pop use pointer receivers
    // because they modify the length of the slice, not just its contents
    *h = append(*h, x.(int))
}
// Pop is a heap.Interface method that pops the top element of the heap
func (h *intHeap) Pop() any {
    // The element to be popped is stored at the end
    last := (*h)[len(*h)-1]
    *h = (*h)[:len(*h)-1]
    return last
}
// Len is a sort.Interface method
func (h *intHeap) Len() int {
    return len(*h)
}
// Less is a sort.Interface method
func (h *intHeap) Less(i, j int) bool {
    // To implement a min heap, change this to a less-than sign
    return (*h)[i].(int) > (*h)[j].(int)
}
// Swap is a sort.Interface method
func (h *intHeap) Swap(i, j int) {
    (*h)[i], (*h)[j] = (*h)[j], (*h)[i]
}
// Top returns the top element of the heap
func (h *intHeap) Top() any {
    return (*h)[0]
}
/* Driver Code */
func TestHeap(t *testing.T) {
    /* Initialize a heap */
    // Initialize a max heap
    maxHeap := &intHeap{}
    heap.Init(maxHeap)
    /* Push elements into the heap */
    // Call the heap package functions to add elements
    heap.Push(maxHeap, 1)
    heap.Push(maxHeap, 3)
    heap.Push(maxHeap, 2)
    heap.Push(maxHeap, 4)
    heap.Push(maxHeap, 5)
    /* Access the top element of the heap */
    top := maxHeap.Top()
    fmt.Printf("Top element of the heap: %d\n", top)
    /* Pop the top element of the heap */
    // Call the heap package functions to remove elements
    heap.Pop(maxHeap) // 5
    heap.Pop(maxHeap) // 4
    heap.Pop(maxHeap) // 3
    heap.Pop(maxHeap) // 2
    heap.Pop(maxHeap) // 1
    /* Get the size of the heap */
    size := len(*maxHeap)
    fmt.Printf("Number of elements in the heap: %d\n", size)
    /* Check if the heap is empty */
    isEmpty := len(*maxHeap) == 0
    fmt.Printf("Is the heap empty: %t\n", isEmpty)
}
```
=== "Swift"
```swift title="heap.swift"
/* Initialize a heap */
// Swift's Heap type supports both max heaps and min heaps, and requires importing swift-collections
var heap = Heap<Int>()
/* Push elements into the heap */
heap.insert(1)
heap.insert(3)
heap.insert(2)
heap.insert(5)
heap.insert(4)
/* Access the top element of the heap */
var peek = heap.max()!
/* Pop the top element of the heap */
peek = heap.removeMax() // 5
peek = heap.removeMax() // 4
peek = heap.removeMax() // 3
peek = heap.removeMax() // 2
peek = heap.removeMax() // 1
/* Get the size of the heap */
let size = heap.count
/* Check if the heap is empty */
let isEmpty = heap.isEmpty
/* Build a heap from an input list */
let heap2 = Heap([1, 3, 2, 5, 4])
```
=== "JS"
```javascript title="heap.js"
// JavaScript does not provide a built-in Heap class
```
=== "TS"
```typescript title="heap.ts"
// TypeScript does not provide a built-in Heap class
```
=== "Dart"
```dart title="heap.dart"
// Dart does not provide a built-in Heap class
```
=== "Rust"
```rust title="heap.rs"
use std::collections::BinaryHeap;
use std::cmp::Reverse;
/* Initialize a heap */
// Initialize a min heap
let mut min_heap = BinaryHeap::<Reverse<i32>>::new();
// Initialize a max heap
let mut max_heap = BinaryHeap::new();
/* Push elements into the heap */
max_heap.push(1);
max_heap.push(3);
max_heap.push(2);
max_heap.push(5);
max_heap.push(4);
/* Access the top element of the heap */
let peek = max_heap.peek().unwrap(); // 5
/* Pop the top element of the heap */
// The popped elements form a descending sequence
let peek = max_heap.pop().unwrap(); // 5
let peek = max_heap.pop().unwrap(); // 4
let peek = max_heap.pop().unwrap(); // 3
let peek = max_heap.pop().unwrap(); // 2
let peek = max_heap.pop().unwrap(); // 1
/* Get the size of the heap */
let size = max_heap.len();
/* Check if the heap is empty */
let is_empty = max_heap.is_empty();
/* Build a heap from an input list */
let min_heap = BinaryHeap::from(vec![Reverse(1), Reverse(3), Reverse(2), Reverse(5), Reverse(4)]);
```
=== "C"
```c title="heap.c"
// C does not provide a built-in Heap class
```
=== "Kotlin"
```kotlin title="heap.kt"
/* Initialize a heap */
// Initialize a min heap
var minHeap = PriorityQueue<Int>()
// Initialize a max heap (simply modify the Comparator with a lambda expression)
val maxHeap = PriorityQueue { a: Int, b: Int -> b - a }
/* Push elements into the heap */
maxHeap.offer(1)
maxHeap.offer(3)
maxHeap.offer(2)
maxHeap.offer(5)
maxHeap.offer(4)
/* Access the top element of the heap */
var peek = maxHeap.peek() // 5
/* Pop the top element of the heap */
// The popped elements form a descending sequence
peek = maxHeap.poll() // 5
peek = maxHeap.poll() // 4
peek = maxHeap.poll() // 3
peek = maxHeap.poll() // 2
peek = maxHeap.poll() // 1
/* Get the size of the heap */
val size = maxHeap.size
/* Check if the heap is empty */
val isEmpty = maxHeap.isEmpty()
/* Build a heap from an input list */
minHeap = PriorityQueue(mutableListOf(1, 3, 2, 5, 4))
```
=== "Ruby"
```ruby title="heap.rb"
```
=== "Zig"
```zig title="heap.zig"
```
??? pythontutor "Code Visualization"
https://pythontutor.com/render.html#code=import%20heapq%0A%0A%22%22%22Driver%20Code%22%22%22%0Aif%20__name__%20%3D%3D%20%22__main__%22%3A%0A%20%20%20%20%23%20%E5%88%9D%E5%A7%8B%E5%8C%96%E5%B0%8F%E9%A1%B6%E5%A0%86%0A%20%20%20%20min_heap,%20flag%20%3D%20%5B%5D,%201%0A%20%20%20%20%23%20%E5%88%9D%E5%A7%8B%E5%8C%96%E5%A4%A7%E9%A1%B6%E5%A0%86%0A%20%20%20%20max_heap,%20flag%20%3D%20%5B%5D,%20-1%0A%20%20%20%20%0A%20%20%20%20%23%20Python%20%E7%9A%84%20heapq%20%E6%A8%A1%E5%9D%97%E9%BB%98%E8%AE%A4%E5%AE%9E%E7%8E%B0%E5%B0%8F%E9%A1%B6%E5%A0%86%0A%20%20%20%20%23%20%E8%80%83%E8%99%91%E5%B0%86%E2%80%9C%E5%85%83%E7%B4%A0%E5%8F%96%E8%B4%9F%E2%80%9D%E5%90%8E%E5%86%8D%E5%85%A5%E5%A0%86%EF%BC%8C%E8%BF%99%E6%A0%B7%E5%B0%B1%E5%8F%AF%E4%BB%A5%E5%B0%86%E5%A4%A7%E5%B0%8F%E5%85%B3%E7%B3%BB%E9%A2%A0%E5%80%92%EF%BC%8C%E4%BB%8E%E8%80%8C%E5%AE%9E%E7%8E%B0%E5%A4%A7%E9%A1%B6%E5%A0%86%0A%20%20%20%20%23%20%E5%9C%A8%E6%9C%AC%E7%A4%BA%E4%BE%8B%E4%B8%AD%EF%BC%8Cflag%20%3D%201%20%E6%97%B6%E5%AF%B9%E5%BA%94%E5%B0%8F%E9%A1%B6%E5%A0%86%EF%BC%8Cflag%20%3D%20-1%20%E6%97%B6%E5%AF%B9%E5%BA%94%E5%A4%A7%E9%A1%B6%E5%A0%86%0A%20%20%20%20%0A%20%20%20%20%23%20%E5%85%83%E7%B4%A0%E5%85%A5%E5%A0%86%0A%20%20%20%20heapq.heappush%28max_heap,%20flag%20*%201%29%0A%20%20%20%20heapq.heappush%28max_heap,%20flag%20*%203%29%0A%20%20%20%20heapq.heappush%28max_heap,%20flag%20*%202%29%0A%20%20%20%20heapq.heappush%28max_heap,%20flag%20*%205%29%0A%20%20%20%20heapq.heappush%28max_heap,%20flag%20*%204%29%0A%20%20%20%20%0A%20%20%20%20%23%20%E8%8E%B7%E5%8F%96%E5%A0%86%E9%A1%B6%E5%85%83%E7%B4%A0%0A%20%20%20%20peek%20%3D%20flag%20*%20max_heap%5B0%5D%20%23%205%0A%20%20%20%20%0A%20%20%20%20%23%20%E5%A0%86%E9%A1%B6%E5%85%83%E7%B4%A0%E5%87%BA%E5%A0%86%0A%20%20%20%20%23%20%E5%87%BA%E5%A0%86%E5%85%83%E7%B4%A0%E4%BC%9A%E5%BD%A2%E6%88%90%E4%B8%80%E4%B8%AA%E4%BB%8E%E5%A4%A7%E5%88%B0%E5%B0%8F%E7%9A%84%E5%BA%8F%E5%88%97%0A%20%20%20%20val%20%3D%20flag%20*%20heapq.heappop%28max_heap%29%20%23%205%0A%20%20%20%20val%20%3D%20flag%20*%20heapq.heappop%28max_h
eap%29%20%23%204%0A%20%20%20%20val%20%3D%20flag%20*%20heapq.heappop%28max_heap%29%20%23%203%0A%20%20%20%20val%20%3D%20flag%20*%20heapq.heappop%28max_heap%29%20%23%202%0A%20%20%20%20val%20%3D%20flag%20*%20heapq.heappop%28max_heap%29%20%23%201%0A%20%20%20%20%0A%20%20%20%20%23%20%E8%8E%B7%E5%8F%96%E5%A0%86%E5%A4%A7%E5%B0%8F%0A%20%20%20%20size%20%3D%20len%28max_heap%29%0A%20%20%20%20%0A%20%20%20%20%23%20%E5%88%A4%E6%96%AD%E5%A0%86%E6%98%AF%E5%90%A6%E4%B8%BA%E7%A9%BA%0A%20%20%20%20is_empty%20%3D%20not%20max_heap%0A%20%20%20%20%0A%20%20%20%20%23%20%E8%BE%93%E5%85%A5%E5%88%97%E8%A1%A8%E5%B9%B6%E5%BB%BA%E5%A0%86%0A%20%20%20%20min_heap%20%3D%20%5B1,%203,%202,%205,%204%5D%0A%20%20%20%20heapq.heapify%28min_heap%29&cumulative=false&curInstr=3&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=311&rawInputLstJSON=%5B%5D&textReferences=false
## Implementation of heaps
The following implementation is of a max heap. To convert it into a min heap, simply invert all comparison logic (for example, replace $\geq$ with $\leq$). Interested readers are encouraged to implement it on their own.
### Storage and representation of heaps
As mentioned in the "Binary Trees" section, complete binary trees are well-suited for array representation. Since heaps are a type of complete binary tree, **we will use arrays to store heaps**.
When using an array to represent a binary tree, elements represent node values, and indexes represent node positions in the binary tree. **Node pointers are implemented through an index mapping formula**.
As shown in the figure below, given an index $i$, the index of its left child is $2i + 1$, the index of its right child is $2i + 2$, and the index of its parent is $(i - 1) / 2$ (floor division). When the index is out of bounds, it signifies a null node or the node does not exist.
![Representation and storage of heaps](heap.assets/representation_of_heap.png)
We can encapsulate the index mapping formula into functions for convenient later use:
```src
[file]{my_heap}-[class]{max_heap}-[func]{parent}
```
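For reference, the index mapping described above can be written as three small functions. This is a minimal sketch using free functions rather than the methods of the book's `max_heap` class; the names `left`, `right`, and `parent` follow the text, but their exact form here is an assumption.

```python
def left(i: int) -> int:
    """Index of the left child of node i."""
    return 2 * i + 1


def right(i: int) -> int:
    """Index of the right child of node i."""
    return 2 * i + 2


def parent(i: int) -> int:
    """Index of the parent of node i (floor division)."""
    return (i - 1) // 2


heap = [9, 8, 6, 6, 7, 5, 2]
i = 1
print(heap[left(i)], heap[right(i)], heap[parent(i)])  # children and parent of node 1
```

Note that in Python `parent(0)` evaluates to `-1` because `//` rounds toward negative infinity; callers should treat any index outside `0 <= i < len(heap)` as "no such node".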
### Accessing the top element of the heap
The top element of the heap is the root node of the binary tree, which is also the first element of the list:
```src
[file]{my_heap}-[class]{max_heap}-[func]{peek}
```
### Inserting an element into the heap
Given an element `val`, we first add it to the bottom of the heap. Since `val` may be larger than other elements in the heap, the heap property may be violated, **so the nodes on the path from the inserted node to the root node need to be repaired**. This operation is called "heapify".
Starting from the inserted node, **we perform heapify from bottom to top**. As shown in the figure below, we compare the value of the inserted node with that of its parent node, and if the inserted node is larger, we swap them. We then repeat this operation, repairing each node from bottom to top, until we pass the root node or reach a node that does not need to be swapped.
=== "<1>"
![Steps of element insertion into the heap](heap.assets/heap_push_step1.png)
=== "<2>"
![heap_push_step2](heap.assets/heap_push_step2.png)
=== "<3>"
![heap_push_step3](heap.assets/heap_push_step3.png)
=== "<4>"
![heap_push_step4](heap.assets/heap_push_step4.png)
=== "<5>"
![heap_push_step5](heap.assets/heap_push_step5.png)
=== "<6>"
![heap_push_step6](heap.assets/heap_push_step6.png)
=== "<7>"
![heap_push_step7](heap.assets/heap_push_step7.png)
=== "<8>"
![heap_push_step8](heap.assets/heap_push_step8.png)
=== "<9>"
![heap_push_step9](heap.assets/heap_push_step9.png)
Given a total of $n$ nodes, the height of the tree is $O(\log n)$. Hence, the loop iterations for the heapify operation are at most $O(\log n)$, **making the time complexity of the element insertion operation $O(\log n)$**. The code is as shown:
```src
[file]{my_heap}-[class]{max_heap}-[func]{sift_up}
```
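A minimal sketch of insertion with bottom-to-top heapify is shown here, assuming a max heap stored in a plain Python list. The free functions `sift_up` and `push` are illustrative stand-ins and may differ from the repository's `max_heap` methods.

```python
def sift_up(heap: list[int], i: int) -> None:
    """Heapify node i from bottom to top in a max heap stored as a list."""
    while True:
        p = (i - 1) // 2  # index of the parent node
        # Stop after passing the root or when no swap is needed
        if p < 0 or heap[i] <= heap[p]:
            break
        heap[i], heap[p] = heap[p], heap[i]
        i = p


def push(heap: list[int], val: int) -> None:
    """Add val to the bottom of the heap, then repair the path to the root."""
    heap.append(val)
    sift_up(heap, len(heap) - 1)


heap: list[int] = []
for val in [1, 3, 2, 5, 4]:
    push(heap, val)
print(heap[0])  # the largest element sits at the top of the heap
```

The loop body runs at most once per level of the tree, which matches the $O(\log n)$ bound stated above.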
### Removing the top element from the heap
The top element of the heap is the root node of the binary tree, that is, the first element of the list. If we directly remove the first element from the list, all node indexes in the binary tree would change, making it difficult to use heapify for repairs subsequently. To minimize changes in element indexes, we use the following steps.
1. Swap the top element with the bottom element of the heap (swap the root node with the rightmost leaf node).
2. After swapping, remove the bottom of the heap from the list (note, since it has been swapped, what is actually being removed is the original top element).
3. Starting from the root node, **perform heapify from top to bottom**.
As shown in the figure below, **the direction of "heapify from top to bottom" is opposite to "heapify from bottom to top"**. We compare the value of the root node with the values of its two children and, if a child is larger, swap the root with the larger child. We then repeat this operation until we pass a leaf node or reach a node that does not need to be swapped.
=== "<1>"
![Steps of removing the top element from the heap](heap.assets/heap_pop_step1.png)
=== "<2>"
![heap_pop_step2](heap.assets/heap_pop_step2.png)
=== "<3>"
![heap_pop_step3](heap.assets/heap_pop_step3.png)
=== "<4>"
![heap_pop_step4](heap.assets/heap_pop_step4.png)
=== "<5>"
![heap_pop_step5](heap.assets/heap_pop_step5.png)
=== "<6>"
![heap_pop_step6](heap.assets/heap_pop_step6.png)
=== "<7>"
![heap_pop_step7](heap.assets/heap_pop_step7.png)
=== "<8>"
![heap_pop_step8](heap.assets/heap_pop_step8.png)
=== "<9>"
![heap_pop_step9](heap.assets/heap_pop_step9.png)
=== "<10>"
![heap_pop_step10](heap.assets/heap_pop_step10.png)
Similar to the element insertion operation, the time complexity of the top element removal operation is also $O(\log n)$. The code is as follows:
```src
[file]{my_heap}-[class]{max_heap}-[func]{sift_down}
```
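The three removal steps can be sketched as follows, again assuming a max heap stored in a plain Python list. The free functions `sift_down` and `pop` are hypothetical illustrations, not the repository's exact methods.

```python
def sift_down(heap: list[int], i: int) -> None:
    """Heapify node i from top to bottom in a max heap stored as a list."""
    n = len(heap)
    while True:
        left, right, largest = 2 * i + 1, 2 * i + 2, i
        if left < n and heap[left] > heap[largest]:
            largest = left
        if right < n and heap[right] > heap[largest]:
            largest = right
        if largest == i:
            # No swap needed, or the children are out of bounds
            break
        heap[i], heap[largest] = heap[largest], heap[i]
        i = largest


def pop(heap: list[int]) -> int:
    """Remove and return the top element of the max heap."""
    if not heap:
        raise IndexError("heap is empty")
    # 1. Swap the top element with the bottom element
    heap[0], heap[-1] = heap[-1], heap[0]
    # 2. Remove the bottom element (the original top)
    val = heap.pop()
    # 3. Heapify from top to bottom, starting at the root
    sift_down(heap, 0)
    return val


heap = [5, 4, 2, 3, 1]  # a valid max heap
popped = [pop(heap) for _ in range(5)]
print(popped)  # elements come out in descending order
```

Removing from the end of the list avoids shifting every element, so only the path from the root downward needs repair.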
## Common applications of heaps
- **Priority Queue**: Heaps are often the preferred data structure for implementing priority queues, with both enqueue and dequeue operations having a time complexity of $O(\log n)$, and building a queue having a time complexity of $O(n)$, all of which are very efficient.
- **Heap Sort**: Given a set of data, we can create a heap from them and then continually perform element removal operations to obtain ordered data. However, we usually use a more elegant method to implement heap sort, as detailed in the "Heap Sort" section.
- **Finding the Largest $k$ Elements**: This is a classic algorithm problem and also a typical application of heaps, for example selecting the top 10 trending news items for Weibo hot search or the top 10 best-selling products.

@ -0,0 +1,9 @@
# Heap
![Heap](../assets/covers/chapter_heap.jpg)
!!! abstract
The heap is like mountain peaks, stacked and undulating, each with its unique shape.
Among these peaks, the highest one always catches the eye first.

@ -0,0 +1,17 @@
# Summary
### Key review
- A heap is a complete binary tree, which can be divided into a max heap and a min heap based on its property. The top element of a max (min) heap is the largest (smallest).
- A priority queue is defined as a queue with dequeue priority, usually implemented using a heap.
- Common operations of a heap and their corresponding time complexities include: element insertion into the heap $O(\log n)$, removing the top element from the heap $O(\log n)$, and accessing the top element of the heap $O(1)$.
- A complete binary tree is well-suited to be represented by an array, thus heaps are commonly stored using arrays.
- Heapify operations are used to maintain the properties of the heap and are used in both heap insertion and removal operations.
- The time complexity of building a heap from $n$ input elements can be optimized to $O(n)$, which is highly efficient.
- Top-k is a classic algorithm problem that can be efficiently solved using the heap data structure, with a time complexity of $O(n \log k)$.
### Q & A
**Q**: Is the "heap" in data structures the same concept as the "heap" in memory management?
The two are not the same concept, even though they are both referred to as "heap". The heap in computer system memory is part of dynamic memory allocation, where the program can use it to store data during execution. The program can request a certain amount of heap memory to store complex structures like objects and arrays. When these data are no longer needed, the program needs to release this memory to prevent memory leaks. Compared to stack memory, the management and usage of heap memory need to be more cautious, as improper use may lead to memory leaks and dangling pointers.

@ -0,0 +1,73 @@
# Top-k problem
!!! question
Given an unordered array `nums` of length $n$, return the largest $k$ elements in the array.
For this problem, we will first introduce two straightforward solutions, then explain a more efficient heap-based method.
## Method 1: Iterative selection
We can perform $k$ rounds of iterations as shown in the figure below, extracting the $1^{st}$, $2^{nd}$, $\dots$, $k^{th}$ largest elements in successive rounds, with a time complexity of $O(nk)$.
This method is only suitable when $k \ll n$, as the time complexity approaches $O(n^2)$ when $k$ is close to $n$, which is very time-consuming.
![Iteratively finding the largest k elements](top_k.assets/top_k_traversal.png)
!!! tip
When $k = n$, we can obtain a complete ordered sequence, which is equivalent to the "selection sort" algorithm.
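Method 1 can be sketched as below. This is an illustrative implementation only; the function name `top_k_by_selection` is hypothetical, and the input list is copied so that the caller's data is left untouched.

```python
def top_k_by_selection(nums: list[int], k: int) -> list[int]:
    """Extract the largest remaining element in each of k rounds, O(nk) overall."""
    remaining = list(nums)
    result: list[int] = []
    for _ in range(min(k, len(remaining))):
        # One linear scan finds the index of the current maximum
        max_idx = max(range(len(remaining)), key=remaining.__getitem__)
        result.append(remaining.pop(max_idx))
    return result


print(top_k_by_selection([1, 7, 6, 3, 2], 3))  # largest elements first
```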
## Method 2: Sorting
As shown in the figure below, we can first sort the array `nums` and then return the last $k$ elements, with a time complexity of $O(n \log n)$.
Clearly, this method "overachieves" the task, as we only need to find the largest $k$ elements, without the need to sort the other elements.
![Sorting to find the largest k elements](top_k.assets/top_k_sorting.png)
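A sketch of Method 2, assuming Python's built-in `sorted`; the function name `top_k_by_sorting` is hypothetical.

```python
def top_k_by_sorting(nums: list[int], k: int) -> list[int]:
    """Sort the whole array, then return the last k elements, O(n log n)."""
    if k <= 0:
        # Guard: nums[-0:] would otherwise return the entire list
        return []
    return sorted(nums)[-k:]


print(top_k_by_sorting([1, 7, 6, 3, 2], 3))  # the k largest, in ascending order
```

The whole array is sorted even though only $k$ elements are needed, which is the extra work the text describes.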
## Method 3: Heap
We can solve the Top-k problem more efficiently based on heaps, as shown in the following process.
1. Initialize a min heap, where the top element is the smallest.
2. First, insert the first $k$ elements of the array into the heap.
3. Starting from the $(k + 1)^{th}$ element, if the current element is greater than the top element of the heap, remove the top element of the heap and insert the current element into the heap.
4. After completing the traversal, the heap contains the largest $k$ elements.
=== "<1>"
![Find the largest k elements based on heap](top_k.assets/top_k_heap_step1.png)
=== "<2>"
![top_k_heap_step2](top_k.assets/top_k_heap_step2.png)
=== "<3>"
![top_k_heap_step3](top_k.assets/top_k_heap_step3.png)
=== "<4>"
![top_k_heap_step4](top_k.assets/top_k_heap_step4.png)
=== "<5>"
![top_k_heap_step5](top_k.assets/top_k_heap_step5.png)
=== "<6>"
![top_k_heap_step6](top_k.assets/top_k_heap_step6.png)
=== "<7>"
![top_k_heap_step7](top_k.assets/top_k_heap_step7.png)
=== "<8>"
![top_k_heap_step8](top_k.assets/top_k_heap_step8.png)
=== "<9>"
![top_k_heap_step9](top_k.assets/top_k_heap_step9.png)
Example code is as follows:
```src
[file]{top_k}-[class]{}-[func]{top_k_heap}
```
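The four steps above can be sketched with Python's standard `heapq` module, which maintains a min heap. The function reuses the name `top_k_heap` from the source reference, but the body below is an illustrative assumption and may differ from the repository's code.

```python
import heapq


def top_k_heap(nums: list[int], k: int) -> list[int]:
    """Return the largest k elements of nums using a min heap of size at most k."""
    if k <= 0:
        return []
    heap: list[int] = []
    # Insert the first k elements into the heap
    for num in nums[:k]:
        heapq.heappush(heap, num)
    # From the (k + 1)-th element on, keep only elements larger than the heap top
    for num in nums[k:]:
        if num > heap[0]:
            heapq.heappop(heap)
            heapq.heappush(heap, num)
    return heap


res = top_k_heap([1, 7, 6, 3, 2], 3)
print(sorted(res))  # the heap itself is not sorted, so sort it for display
```

The heap top is always the smallest of the current candidates, so a single comparison decides whether a new element belongs among the largest $k$.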
A total of $n$ rounds of heap insertions and deletions are performed, with the maximum heap size being $k$, hence the time complexity is $O(n \log k)$. This method is very efficient; when $k$ is small, the time complexity tends towards $O(n)$; when $k$ is large, the time complexity will not exceed $O(n \log n)$.
Additionally, this method is suitable for scenarios with dynamic data streams. By continuously adding data, we can maintain the elements within the heap, thereby achieving dynamic updates of the largest $k$ elements.
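For the data-stream scenario, one possible arrangement is a small wrapper that keeps a min heap of at most $k$ elements and updates it as values arrive. The class name `TopKStream` and its methods are hypothetical, chosen only for this sketch.

```python
import heapq


class TopKStream:
    """Maintain the largest k elements seen so far in a stream."""

    def __init__(self, k: int):
        self.k = k
        self.heap: list[int] = []  # min heap holding at most k elements

    def add(self, val: int) -> None:
        """Process one incoming value in O(log k) time."""
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, val)
        elif self.k > 0 and val > self.heap[0]:
            # Pop the smallest candidate and push val in a single step
            heapq.heapreplace(self.heap, val)

    def largest(self) -> list[int]:
        """Return the current largest k elements in descending order."""
        return sorted(self.heap, reverse=True)


stream = TopKStream(3)
for val in [1, 7, 6, 3, 2, 9]:
    stream.add(val)
print(stream.largest())
```

Because the heap never grows beyond $k$ elements, memory use stays at $O(k)$ regardless of how long the stream runs.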

@ -97,13 +97,13 @@ nav:
- 7.4 Binary Search Tree: chapter_tree/binary_search_tree.md
- 7.5 AVL Tree *: chapter_tree/avl_tree.md
- 7.6 Summary: chapter_tree/summary.md
# - Chapter 8. Heap:
# # [icon: material/family-tree]
# - chapter_heap/index.md
# - 8.1 Heap: chapter_heap/heap.md
# - 8.2 Building a Heap: chapter_heap/build_heap.md
# - 8.3 Top-k Problem: chapter_heap/top_k.md
# - 8.4 Summary: chapter_heap/summary.md
- Chapter 8. Heap:
# [icon: material/family-tree]
- chapter_heap/index.md
- 8.1 Heap: chapter_heap/heap.md
- 8.2 Building a Heap: chapter_heap/build_heap.md
- 8.3 Top-k Problem: chapter_heap/top_k.md
- 8.4 Summary: chapter_heap/summary.md
# - Chapter 9. Graph:
# # [icon: material/graphql]
# - chapter_graph/index.md
