krahets 10 months ago
parent 84520801e2
commit 22974527a5

@ -4,39 +4,39 @@ comments: true
# 3.4   Character Encoding *
In computers, all data is stored in binary form, and the character `char` is no exception. To represent characters, we need to establish a "character set" that defines a one-to-one correspondence between each character and binary numbers. With a character set, computers can convert binary numbers to characters by looking up a table.
In the computer system, all data is stored in binary form, and characters (represented by char) are no exception. To represent characters, we need to develop a "character set" that defines a one-to-one mapping between each character and binary numbers. With the character set, computers can convert binary numbers to characters by looking up the table.
## 3.4.1   ASCII Character Set
The "ASCII code" is one of the earliest character sets, officially known as the American Standard Code for Information Interchange. It uses 7 binary digits (the lower 7 bits of a byte) to represent a character, allowing for a maximum of 128 different characters. As shown in Figure 3-6, ASCII includes uppercase and lowercase English letters, numbers 0 ~ 9, some punctuation marks, and some control characters (such as newline and tab).
The "ASCII code" is one of the earliest character sets, officially known as the American Standard Code for Information Interchange. It uses 7 binary digits (the lower 7 bits of a byte) to represent a character, allowing for a maximum of 128 different characters. As shown in Figure 3-6, ASCII includes uppercase and lowercase English letters, numbers 0 ~ 9, various punctuation marks, and certain control characters (such as newline and tab).
![ASCII Code](character_encoding.assets/ascii_table.png){ class="animation-figure" }
<p align="center"> Figure 3-6 &nbsp; ASCII Code </p>
However, **ASCII can only represent English characters**. With the globalization of computers, a character set called "EASCII" was developed to represent more languages. It expands on the 7-bit basis of ASCII to 8 bits, enabling the representation of 256 different characters.
However, **ASCII can only represent English characters**. With the globalization of computers, a character set called "EASCII" was developed to represent more languages. It expands from the 7-bit structure of ASCII to 8 bits, enabling the representation of 256 characters.
Globally, a series of EASCII character sets for different regions emerged. The first 128 characters of these sets are uniformly ASCII, while the remaining 128 characters are defined differently to cater to various language requirements.
Globally, various region-specific EASCII character sets have been introduced. The first 128 characters of these sets are consistent with ASCII, while the remaining 128 characters are defined differently to accommodate the requirements of different languages.
## 3.4.2 &nbsp; GBK Character Set
Later, it was found that **EASCII still could not meet the character requirements of many languages**. For instance, there are nearly a hundred thousand Chinese characters, with several thousand used in everyday life. In 1980, China's National Standards Bureau released the "GB2312" character set, which included 6763 Chinese characters, essentially meeting the computer processing needs for Chinese.
Later, it was found that **EASCII still could not meet the character requirements of many languages**. For instance, there are nearly a hundred thousand Chinese characters, with several thousand used regularly. In 1980, the Standardization Administration of China released the "GB2312" character set, which included 6763 Chinese characters, essentially fulfilling the computer processing needs for the Chinese language.
However, GB2312 could not handle some rare and traditional characters. The "GBK" character set, an expansion of GB2312, includes a total of 21886 Chinese characters. In the GBK encoding scheme, ASCII characters are represented with one byte, while Chinese characters use two bytes.
However, GB2312 could not handle some rare and traditional characters. The "GBK" character set expands GB2312 and includes 21886 Chinese characters. In the GBK encoding scheme, ASCII characters are represented with one byte, while Chinese characters use two bytes.
## 3.4.3 &nbsp; Unicode Character Set
With the rapid development of computer technology and a plethora of character sets and encoding standards, numerous problems arose. On one hand, these character sets generally only defined characters for specific languages and could not function properly in multilingual environments. On the other hand, the existence of multiple character set standards for the same language caused garbled text when information was exchanged between computers using different encoding standards.
With the rapid evolution of computer technology and a plethora of character sets and encoding standards, numerous problems arose. On the one hand, these character sets generally only defined characters for specific languages and could not function properly in multilingual environments. On the other hand, the existence of multiple character set standards for the same language caused garbled text when information was exchanged between computers using different encoding standards.
Researchers of that era thought: **What if we introduced a comprehensive character set that included all languages and symbols worldwide, wouldn't that solve the problems of cross-language environments and garbled text?** Driven by this idea, the extensive character set, Unicode, was born.
Researchers of that era thought: **What if a comprehensive character set encompassing all global languages and symbols was developed? Wouldn't this resolve the issues associated with cross-linguistic environments and garbled text?** Inspired by this idea, the extensive character set, Unicode, was born.
The Chinese name for "Unicode" is "统一码" (Unified Code), theoretically capable of accommodating over a million characters. It aims to incorporate characters from all over the world into a single set, providing a universal character set for processing and displaying various languages and reducing the issues of garbled text due to different encoding standards.
"Unicode" is referred to as "统一码" (Unified Code) in Chinese, theoretically capable of accommodating over a million characters. It aims to incorporate characters from all over the world into a single set, providing a universal character set for processing and displaying various languages and reducing the issues of garbled text due to different encoding standards.
Since its release in 1991, Unicode has continually expanded to include new languages and characters. As of September 2022, Unicode contains 149,186 characters, including characters, symbols, and even emojis from various languages. In the vast Unicode character set, commonly used characters occupy 2 bytes, while some rare characters take up 3 or even 4 bytes.
Since its release in 1991, Unicode has continually expanded to include new languages and characters. As of September 2022, Unicode contains 149,186 characters, including characters, symbols, and even emojis from various languages. In the vast Unicode character set, commonly used characters occupy 2 bytes, while some rare characters may occupy 3 or even 4 bytes.
Unicode is a universal character set that assigns a number (called a "code point") to each character, **but it does not specify how these character code points should be stored in a computer**. One might ask: When Unicode code points of varying lengths appear in a text, how does the system parse the characters? For example, given a 2-byte code, how does the system determine if it represents a single 2-byte character or two 1-byte characters?
Unicode is a universal character set that assigns a number (called a "code point") to each character, **but it does not specify how these character code points should be stored in a computer system**. One might ask: How does a system interpret Unicode code points of varying lengths within a text? For example, given a 2-byte code, how does the system determine if it represents a single 2-byte character or two 1-byte characters?
A straightforward solution to this problem is to store all characters as equal-length encodings. As shown in Figure 3-7, each character in "Hello" occupies 1 byte, while each character in "算法" (algorithm) occupies 2 bytes. We could encode all characters in "Hello 算法" as 2 bytes by padding the higher bits with zeros. This way, the system can parse a character every 2 bytes, recovering the content of the phrase.
A straightforward solution to this problem is to store all characters as equal-length encodings. As shown in Figure 3-7, each character in "Hello" occupies 1 byte, while each character in "算法" (algorithm) occupies 2 bytes. We could encode all characters in "Hello 算法" as 2 bytes by padding the higher bits with zeros. This method would enable the system to interpret a character every 2 bytes, recovering the content of the phrase.
![Unicode Encoding Example](character_encoding.assets/unicode_hello_algo.png){ class="animation-figure" }
@ -55,9 +55,9 @@ The encoding rules for UTF-8 are not complex and can be divided into two cases:
Figure 3-8 shows the UTF-8 encoding for "Hello算法". It can be observed that because the highest $n$ bits of a multi-byte character's first byte are set to $1$, the system can determine that the character is $n$ bytes long by counting the number of leading bits set to $1$.
But why set the highest 2 bits of the remaining bytes to $10$? Actually, this $10$ serves as a kind of checksum. If the system starts parsing text from an incorrect byte, the $10$ at the beginning of the byte can help the system quickly detect an anomaly.
But why set the highest 2 bits of the remaining bytes to $10$? Actually, this $10$ serves as a kind of checksum. If the system starts parsing text from an incorrect byte, the $10$ at the beginning of the byte can help the system quickly detect anomalies.
The reason for using $10$ as a checksum is that, under UTF-8 encoding rules, it's impossible for the highest two bits of a character's first byte to be $10$. This can be proven by contradiction: If the highest two bits of a character's first byte are $10$, it indicates that the character's length is $1$, corresponding to ASCII. However, the highest bit of an ASCII character should be $0$, contradicting the assumption.
The reason for using $10$ as a checksum is that, under UTF-8 encoding rules, it's impossible for the highest two bits of a character's first byte to be $10$. This can be proven by contradiction: If the highest two bits of a character's first byte are $10$, it indicates that the character's length is $1$, corresponding to ASCII. However, the highest bit of an ASCII character should be $0$, which contradicts the assumption.
![UTF-8 Encoding Example](character_encoding.assets/utf-8_hello_algo.png){ class="animation-figure" }
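The leading-bit rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not how any particular decoder is implemented; the helper names `utf8_char_length` and `split_utf8` are made up for this example.

```python
def utf8_char_length(first_byte):
    """Return how many bytes a UTF-8 character occupies, judged by its first byte."""
    if first_byte & 0b10000000 == 0:
        return 1  # 0xxxxxxx: a 1-byte (ASCII) character
    if first_byte & 0b11000000 == 0b10000000:
        # 10xxxxxx can only be a continuation byte, so parsing started mid-character
        raise ValueError("unexpected continuation byte")
    n = 0
    while n < 8 and first_byte & (0b10000000 >> n):
        n += 1  # count the leading 1 bits
    if n > 4:
        raise ValueError("invalid UTF-8 lead byte")
    return n


def split_utf8(data):
    """Split UTF-8 bytes into characters using only the leading bits (hypothetical helper)."""
    chars = []
    i = 0
    while i < len(data):
        n = utf8_char_length(data[i])
        chars.append(data[i:i + n].decode("utf-8"))
        i += n
    return chars


data = "Hello算法".encode("utf-8")
print([format(b, "08b") for b in data])  # each byte as 8 binary digits
print(split_utf8(data))
```

If parsing starts in the middle of "算" or "法", the first byte seen begins with $10$, and the sketch raises an error instead of silently producing garbled text.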
@ -65,16 +65,16 @@ The reason for using $10$ as a checksum is that, under UTF-8 encoding rules, it'
Apart from UTF-8, other common encoding methods include:
- **UTF-16 Encoding**: Uses 2 or 4 bytes to represent a character. All ASCII characters and commonly used non-English characters are represented with 2 bytes; a few characters require 4 bytes. For 2-byte characters, the UTF-16 encoding is equal to the Unicode code point.
- **UTF-16 Encoding**: Uses 2 or 4 bytes to represent a character. All ASCII characters and commonly used non-English characters are represented with 2 bytes; a few characters require 4 bytes. For 2-byte characters, the UTF-16 encoding equals the Unicode code point.
- **UTF-32 Encoding**: Every character uses 4 bytes. This means UTF-32 occupies more space than UTF-8 and UTF-16, especially for texts with a high proportion of ASCII characters.
From the perspective of storage space, UTF-8 is highly efficient for representing English characters, requiring only 1 byte; UTF-16 might be more efficient for encoding some non-English characters (like Chinese), as it requires only 2 bytes, while UTF-8 might need 3 bytes.
From the perspective of storage space, using UTF-8 to represent English characters is very efficient because it only requires 1 byte; using UTF-16 to encode some non-English characters (such as Chinese) can be more efficient because it only requires 2 bytes, while UTF-8 might need 3 bytes.
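The storage trade-off can be checked directly with Python's built-in codecs. The sketch below is only an illustration; `encoded_sizes` is a hypothetical helper, and the `-le` codec variants are chosen here so that no byte order mark is added to the counts.

```python
def encoded_sizes(text):
    """Return the number of bytes `text` occupies under each encoding."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(text.encode(enc)) for enc in encodings}


print(encoded_sizes("Hello"))  # {'utf-8': 5, 'utf-16-le': 10, 'utf-32-le': 20}
print(encoded_sizes("算法"))    # {'utf-8': 6, 'utf-16-le': 4, 'utf-32-le': 8}
```

For the English text UTF-8 is the smallest, while for the two Chinese characters UTF-16 needs 2 bytes each against 3 bytes each in UTF-8.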
From a compatibility standpoint, UTF-8 is the most versatile, with many tools and libraries supporting UTF-8 as a priority.
From a compatibility perspective, UTF-8 is the most versatile, with many tools and libraries supporting UTF-8 as a priority.
## 3.4.5 &nbsp; Character Encoding in Programming Languages
In many classic programming languages, strings during program execution are encoded using fixed-length encodings like UTF-16 or UTF-32. This allows strings to be treated as arrays, offering several advantages:
Historically, many programming languages utilized fixed-length encodings such as UTF-16 or UTF-32 for processing strings during program execution. This allows strings to be handled as arrays, offering several advantages:
- **Random Access**: Strings encoded in UTF-16 can be accessed randomly with ease. For UTF-8, which is a variable-length encoding, locating the $i^{th}$ character requires traversing the string from the start to the $i^{th}$ position, taking $O(n)$ time.
- **Character Counting**: Similar to random access, counting the number of characters in a UTF-16 encoded string is an $O(1)$ operation. However, counting characters in a UTF-8 encoded string requires traversing the entire string.
@ -82,16 +82,16 @@ In many classic programming languages, strings during program execution are enco
The design of character encoding schemes in programming languages is an interesting topic involving various factors:
- Java's `String` type uses UTF-16 encoding, with each character occupying 2 bytes. This was based on the initial belief that 16 bits were sufficient to represent all possible characters, a judgment later proven incorrect. As the Unicode standard expanded beyond 16 bits, characters in Java may now be represented by a pair of 16-bit values, known as “surrogate pairs.”
- Java's `String` type uses UTF-16 encoding, with each character occupying 2 bytes. This was based on the initial belief that 16 bits were sufficient to represent all possible characters, a belief that was later proven incorrect. As the Unicode standard expanded beyond 16 bits, characters in Java may now be represented by a pair of 16-bit values, known as “surrogate pairs.”
- JavaScript and TypeScript use UTF-16 encoding for similar reasons as Java. When JavaScript was first introduced by Netscape in 1995, Unicode was still in its early stages, and 16-bit encoding was sufficient to represent all Unicode characters.
- C# uses UTF-16 encoding, largely because the .NET platform, designed by Microsoft, and many Microsoft technologies, including the Windows operating system, extensively use UTF-16 encoding.
Due to the underestimation of character counts, these languages had to resort to using "surrogate pairs" to represent Unicode characters exceeding 16 bits. This approach has its drawbacks: strings containing surrogate pairs may have characters occupying 2 or 4 bytes, losing the advantage of fixed-length encoding, and handling surrogate pairs adds to the complexity and debugging difficulty of programming.
Due to the underestimation of character counts, these languages had to use "surrogate pairs" to represent Unicode characters exceeding 16 bits. This approach has its drawbacks: strings containing surrogate pairs may have characters occupying 2 or 4 bytes, losing the advantage of fixed-length encoding. Additionally, handling surrogate pairs adds complexity and debugging difficulty to programming.
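To see what a surrogate pair looks like, the following Python sketch splits a character's UTF-16 encoding into 16-bit units. It is only an illustration; `utf16_code_units` is a hypothetical helper, and big-endian byte order is an arbitrary choice made here for readability.

```python
def utf16_code_units(ch):
    """Return the 16-bit code units of `ch` when encoded as UTF-16."""
    data = ch.encode("utf-16-be")  # big-endian, without a byte order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


for ch in ("A", "算", "😀"):
    units = utf16_code_units(ch)
    print(ch, [hex(u) for u in units], f"{2 * len(units)} bytes")
```

"A" and "算" each fit in a single 16-bit unit, whereas "😀" (code point U+1F600, beyond 16 bits) needs the surrogate pair `0xd83d`, `0xde00`, so the characters of one string no longer all have the same length.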
Owing to these reasons, some programming languages have adopted different encoding schemes:
Addressing these challenges, some languages have adopted alternative encoding strategies:
- Python's `str` type uses Unicode encoding with a flexible representation where the storage length of characters depends on the largest Unicode code point in the string. If all characters are ASCII, each character occupies 1 byte; if characters exceed ASCII but are within the Basic Multilingual Plane (BMP), each occupies 2 bytes; if characters exceed the BMP, each occupies 4 bytes.
- Python's `str` type uses Unicode encoding with a flexible representation where the storage length of characters depends on the largest Unicode code point in the string. Each character occupies 1 byte if all characters are ASCII, 2 bytes if some characters exceed ASCII but all are within the Basic Multilingual Plane (BMP), and 4 bytes if any character lies beyond the BMP.
- Go's `string` type internally uses UTF-8 encoding. Go also provides the `rune` type for representing individual Unicode code points.
- Rust's `str` and `String` types use UTF-8 encoding internally. Rust also offers the `char` type for individual Unicode code points.
It's important to note that the above discussion pertains to how strings are stored in programming languages, **which is a different issue from how strings are stored in files or transmitted over networks**. For file storage or network transmission, strings are usually encoded in UTF-8 format for optimal compatibility and space efficiency.
It's important to note that the above discussion pertains to how strings are stored in programming languages, **which is different from how strings are stored in files or transmitted over networks**. For file storage or network transmission, strings are usually encoded in UTF-8 format for optimal compatibility and space efficiency.

@ -4,11 +4,11 @@ comments: true
# 5.1 &nbsp; Stack
"Stack" is a linear data structure that follows the principle of Last-In-First-Out (LIFO).
A "Stack" is a linear data structure that follows the principle of Last-In-First-Out (LIFO).
We can compare a stack to a pile of plates on a table. To access the bottom plate, one must remove the plates on top. If we replace the plates with various types of elements (such as integers, characters, objects, etc.), we obtain the data structure known as a stack.
We can compare a stack to a pile of plates on a table. To access the bottom plate, one must first remove the plates on top. By replacing the plates with various types of elements (such as integers, characters, objects, etc.), we obtain the data structure known as a stack.
As shown in the following figure, we refer to the top of the pile of elements as the "top of the stack" and the bottom as the "bottom of the stack." The operation of adding elements to the top of the stack is called "push," and the operation of removing the top element is called "pop."
As shown in Figure 5-1, we refer to the top of the pile of elements as the "top of the stack" and the bottom as the "bottom of the stack." The operation of adding elements to the top of the stack is called "push," and the operation of removing the top element is called "pop."
![Stack's Last-In-First-Out Rule](stack.assets/stack_operations.png){ class="animation-figure" }
@ -319,9 +319,9 @@ Typically, we can directly use the stack class built into the programming langua
## 5.1.2 &nbsp; Implementing a Stack
To understand the mechanics of a stack more deeply, let's try implementing a stack class ourselves.
To gain a deeper understanding of how a stack operates, let's try implementing a stack class ourselves.
A stack follows the principle of Last-In-First-Out, which means we can only add or remove elements at the top of the stack. However, both arrays and linked lists allow adding and removing elements at any position, **therefore a stack can be seen as a restricted array or linked list**. In other words, we can "mask" some unrelated operations of arrays or linked lists to make their logic conform to the characteristics of a stack.
A stack follows the principle of Last-In-First-Out, which means we can only add or remove elements at the top of the stack. However, both arrays and linked lists allow adding and removing elements at any position, **therefore a stack can be seen as a restricted array or linked list**. In other words, we can "shield" certain irrelevant operations of an array or linked list, aligning their external behavior with the characteristics of a stack.
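As a rough illustration of this "restricted array" idea, the Python sketch below wraps a list and exposes only the stack operations. The class name `RestrictedStack` and its method names are assumptions made for this example and are not taken from the implementations that follow.

```python
class RestrictedStack:
    """A list that only exposes stack operations (illustrative sketch)."""

    def __init__(self):
        self._items = []  # the underlying array stays hidden from callers

    def push(self, item):
        """Add an element to the top of the stack."""
        self._items.append(item)

    def pop(self):
        """Remove and return the top element."""
        if self.is_empty():
            raise IndexError("pop from an empty stack")
        return self._items.pop()

    def peek(self):
        """Return the top element without removing it."""
        if self.is_empty():
            raise IndexError("peek at an empty stack")
        return self._items[-1]

    def is_empty(self):
        return len(self._items) == 0

    def size(self):
        return len(self._items)


stack = RestrictedStack()
for value in (1, 3, 2):
    stack.push(value)
print(stack.pop(), stack.peek(), stack.size())  # 2 3 2
```

Inserting or deleting in the middle is still possible on the underlying list, but the wrapper offers no method for it, so callers see only Last-In-First-Out behavior.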
### 1. &nbsp; Implementation Based on Linked List
@ -1708,7 +1708,7 @@ Both implementations support all the operations defined in a stack. The array im
**Time Efficiency**
In the array-based implementation, both push and pop operations occur in pre-allocated continuous memory, which has good cache locality and therefore higher efficiency. However, if the push operation exceeds the array capacity, it triggers a resizing mechanism, making the time complexity of that push operation $O(n)$.
In the array-based implementation, both push and pop operations occur in pre-allocated contiguous memory, which has good cache locality and therefore higher efficiency. However, if the push operation exceeds the array capacity, it triggers a resizing mechanism, making the time complexity of that push operation $O(n)$.
In the linked list implementation, list expansion is very flexible, and there is no efficiency decrease issue as in array expansion. However, the push operation requires initializing a node object and modifying pointers, so its efficiency is relatively lower. If the elements being pushed are already node objects, then the initialization step can be skipped, improving efficiency.

@ -6,29 +6,29 @@ comments: true
### 1. &nbsp; Key Review
- A stack is a data structure that follows the Last-In-First-Out (LIFO) principle and can be implemented using either arrays or linked lists.
- In terms of time efficiency, the array implementation of a stack has higher average efficiency, but during expansion, the time complexity for a single push operation can degrade to $O(n)$. In contrast, the linked list implementation of a stack offers more stable efficiency.
- Regarding space efficiency, the array implementation of a stack may lead to some level of space wastage. However, it's important to note that the memory space occupied by nodes in a linked list is generally larger than that for elements in an array.
- A queue is a data structure that follows the First-In-First-Out (FIFO) principle, and it can also be implemented using either arrays or linked lists. The conclusions regarding time and space efficiency for queues are similar to those for stacks.
- A double-ended queue is a more flexible type of queue that allows adding and removing elements from both ends.
- A stack is a data structure that follows the Last-In-First-Out (LIFO) principle and can be implemented using arrays or linked lists.
- In terms of time efficiency, the array implementation of the stack has a higher average efficiency. However, during expansion, the time complexity for a single push operation can degrade to $O(n)$. In contrast, the linked list implementation of a stack offers more stable efficiency.
- Regarding space efficiency, the array implementation of the stack may lead to a certain degree of space wastage. However, it's important to note that the memory space occupied by nodes in a linked list is generally larger than that for elements in an array.
- A queue is a data structure that follows the First-In-First-Out (FIFO) principle, and it can also be implemented using arrays or linked lists. The conclusions regarding time and space efficiency for queues are similar to those for stacks.
- A double-ended queue (deque) is a more flexible type of queue that allows adding and removing elements at both ends.
### 2. &nbsp; Q & A
**Q**: Is the browser's forward and backward functionality implemented with a doubly linked list?
The forward and backward functionality of a browser fundamentally represents the "stack" concept. When a user visits a new page, it is added to the top of the stack; when they click the back button, the page is popped from the top. A double-ended queue can conveniently implement some additional operations, as mentioned in the "Double-Ended Queue" section.
A browser's forward and backward navigation is essentially a manifestation of the "stack" concept. When a user visits a new page, the page is added to the top of the stack; when they click the back button, the page is popped from the top of the stack. A double-ended queue (deque) can conveniently implement some additional operations, as mentioned in the "Double-Ended Queue" section.
**Q**: After popping from a stack, is it necessary to free the memory of the popped node?
If the popped node will still be used later, it's not necessary to free its memory. In languages like Java and Python that have automatic garbage collection, manual memory release isn't required; in C and C++, manual memory release is necessary if the node will no longer be used.
If the popped node will still be used later, it's not necessary to free its memory. In languages like Java and Python that have automatic garbage collection, manual memory release is not necessary; in C and C++, manual memory release is required.
**Q**: A double-ended queue seems like two stacks joined together. What are its uses?
A double-ended queue is essentially a combination of a stack and a queue, or like two stacks joined together. It exhibits both stack and queue logic, therefore enabling the implementation of all applications of stacks and queues with added flexibility.
A double-ended queue, which is a combination of a stack and a queue or two stacks joined together, exhibits both stack and queue logic. Thus, it can implement all applications of stacks and queues while offering more flexibility.
**Q**: How exactly are undo and redo implemented?
Undo and redo are implemented using two stacks: Stack A for undo and Stack B for redo.
Undo and redo operations are implemented using two stacks: Stack A for undo and Stack B for redo.
1. Each time a user performs an operation, it is pushed onto Stack A, and Stack B is cleared.
2. When the user executes an "undo", the most recent operation is popped from Stack A and pushed onto Stack B.
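The two-stack scheme can be sketched in Python as follows. The class name `History` and its methods are hypothetical, and the `redo` method is an assumption that simply mirrors `undo` (pop from Stack B, push back onto Stack A), since the steps shown here stop at "undo".

```python
class History:
    """Undo/redo bookkeeping with two stacks (illustrative sketch)."""

    def __init__(self):
        self.undo_stack = []  # Stack A: operations that can be undone
        self.redo_stack = []  # Stack B: operations that can be redone

    def do(self, operation):
        """Step 1: record a new operation and clear Stack B."""
        self.undo_stack.append(operation)
        self.redo_stack.clear()

    def undo(self):
        """Step 2: move the most recent operation from Stack A to Stack B."""
        if not self.undo_stack:
            return None
        operation = self.undo_stack.pop()
        self.redo_stack.append(operation)
        return operation

    def redo(self):
        """Assumed mirror of undo: move an operation from Stack B back to Stack A."""
        if not self.redo_stack:
            return None
        operation = self.redo_stack.pop()
        self.undo_stack.append(operation)
        return operation


history = History()
history.do("type A")
history.do("type B")
print(history.undo())  # type B
print(history.redo())  # type B
history.undo()
history.do("type C")   # a new operation clears the redo stack
print(history.redo())  # None
```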

@ -13,62 +13,66 @@ status: new
| 中文 | English | 中文 | English |
| -------------- | ------------------------------ | -------------- | --------------------------- |
| 算法 | algorithm | 层序遍历 | level-order traversal |
| 数据结构 | data structure | 广度优先遍历 | breadth-first traversal |
| 渐近复杂度分析 | asymptotic complexity analysis | 深度优先遍历 | depth-first traversal |
| 时间复杂度 | time complexity | 二叉搜索树 | binary search tree |
| 空间复杂度 | space complexity | 平衡二叉搜索树 | balanced binary search tree |
| 迭代 | iteration | 平衡因子 | balance factor |
| 递归 | recursion | 堆 | heap |
| 尾递归 | tail recursion | 大顶堆 | max heap |
| 递归树 | recursion tree | 小顶堆 | min heap |
| 大 $O$ 记号 | big-$O$ notation | 优先队列 | priority queue |
| 渐近上界 | asymptotic upper bound | 堆化 | heapify |
| 原码 | sign-magnitude | 图 | graph |
| 反码 | 1's complement | 顶点 | vertex |
| 补码 | 2's complement | 无向图 | undirected graph |
| 数组 | array | 有向图 | directed graph |
| 索引 | index | 连通图 | connected graph |
| 链表 | linked list | 非连通图 | disconnected graph |
| 链表节点 | linked list node, list node | 有权图 | weighted graph |
| 列表 | list | 邻接 | adjacency |
| 动态数组 | dynamic array | 路径 | path |
| 硬盘 | hard disk | 入度 | in-degree |
| 内存 | random-access memory (RAM) | 出度 | out-degree |
| 缓存 | cache memory | 邻接矩阵 | adjacency matrix |
| 缓存未命中 | cache miss | 邻接表 | adjacency list |
| 缓存命中率 | cache hit rate | 广度优先搜索 | breadth-first search |
| 栈 | stack | 深度优先搜索 | depth-first search |
| 队列 | queue | 二分查找 | binary search |
| 双向队列 | double-ended queue | 搜索算法 | searching algorithm |
| 哈希表 | hash table | 排序算法 | sorting algorithm |
| 桶 | bucket | 选择排序 | selection sort |
| 哈希函数 | hash function | 冒泡排序 | bubble sort |
| 哈希冲突 | hash collision | 插入排序 | insertion sort |
| 负载因子 | load factor | 快速排序 | quick sort |
| 链式地址 | separate chaining | 归并排序 | merge sort |
| 开放寻址 | open addressing | 堆排序 | heap sort |
| 线性探测 | linear probing | 桶排序 | bucket sort |
| 懒删除 | lazy deletion | 计数排序 | counting sort |
| 二叉树 | binary tree | 基数排序 | radix sort |
| 树节点 | tree node | 分治 | divide and conquer |
| 左子节点 | left-child node | 汉诺塔问题 | Tower of Hanoi problem |
| 右子节点 | right-child node | 回溯算法 | backtracking algorithm |
| 父节点 | parent node | 约束 | constraint |
| 左子树 | left subtree | 解 | solution |
| 右子树 | right subtree | 状态 | state |
| 根节点 | root node | 剪枝 | pruning |
| 叶节点 | leaf node | 全排列问题 | permutations problem |
| 边 | edge | 子集和问题 | subset-sum problem |
| 层 | level | n 皇后问题 | n-queens problem |
| 度 | degree | 动态规划 | dynamic programming |
| 高度 | height | 初始状态 | initial state |
| 深度 | depth | 状态转移方程 | state-transition equation |
| 完美二叉树 | perfect binary tree | 背包问题 | knapsack problem |
| 完全二叉树 | complete binary tree | 编辑距离问题 | edit distance problem |
| 完满二叉树 | full binary tree | 贪心算法 | greedy algorithm |
| 算法 | algorithm | AVL 树 | AVL tree |
| 数据结构 | data structure | 红黑树 | red-black tree |
| 渐近复杂度分析 | asymptotic complexity analysis | 层序遍历 | level-order traversal |
| 时间复杂度 | time complexity | 广度优先遍历 | breadth-first traversal |
| 空间复杂度 | space complexity | 深度优先遍历 | depth-first traversal |
| 迭代 | iteration | 二叉搜索树 | binary search tree |
| 递归 | recursion | 平衡二叉搜索树 | balanced binary search tree |
| 尾递归 | tail recursion | 平衡因子 | balance factor |
| 递归树 | recursion tree | 堆 | heap |
| 大 $O$ 记号 | big-$O$ notation | 大顶堆 | max heap |
| 渐近上界 | asymptotic upper bound | 小顶堆 | min heap |
| 原码 | sign-magnitude | 优先队列 | priority queue |
| 反码 | 1's complement | 堆化 | heapify |
| 补码 | 2's complement | Top-$k$ 问题 | Top-$k$ problem |
| 数组 | array | 图 | graph |
| 索引 | index | 顶点 | vertex |
| 链表 | linked list | 无向图 | undirected graph |
| 链表节点 | linked list node, list node | 有向图 | directed graph |
| 头节点 | head node | 连通图 | connected graph |
| 尾节点 | tail node | 非连通图 | disconnected graph |
| 列表 | list | 有权图 | weighted graph |
| 动态数组 | dynamic array | 邻接 | adjacency |
| 硬盘 | hard disk | 路径 | path |
| 内存 | random-access memory (RAM) | 入度 | in-degree |
| 缓存 | cache memory | 出度 | out-degree |
| 缓存未命中 | cache miss | 邻接矩阵 | adjacency matrix |
| 缓存命中率 | cache hit rate | 邻接表 | adjacency list |
| 栈 | stack | 广度优先搜索 | breadth-first search |
| 栈顶 | top of the stack | 深度优先搜索 | depth-first search |
| 栈底 | bottom of the stack | 二分查找 | binary search |
| 队列 | queue | 搜索算法 | searching algorithm |
| 双向队列 | double-ended queue | 排序算法 | sorting algorithm |
| 队首 | front of the queue | 选择排序 | selection sort |
| 队尾 | rear of the queue | 冒泡排序 | bubble sort |
| 哈希表 | hash table | 插入排序 | insertion sort |
| 桶 | bucket | 快速排序 | quick sort |
| 哈希函数 | hash function | 归并排序 | merge sort |
| 哈希冲突 | hash collision | 堆排序 | heap sort |
| 负载因子 | load factor | 桶排序 | bucket sort |
| 链式地址 | separate chaining | 计数排序 | counting sort |
| 开放寻址 | open addressing | 基数排序 | radix sort |
| 线性探测 | linear probing | 分治 | divide and conquer |
| 懒删除 | lazy deletion | 汉诺塔问题 | Tower of Hanoi problem |
| 二叉树 | binary tree | 回溯算法 | backtracking algorithm |
| 树节点 | tree node | 约束 | constraint |
| 左子节点 | left-child node | 解 | solution |
| 右子节点 | right-child node | 状态 | state |
| 父节点 | parent node | 剪枝 | pruning |
| 左子树 | left subtree | 全排列问题 | permutations problem |
| 右子树 | right subtree | 子集和问题 | subset-sum problem |
| 根节点 | root node | n 皇后问题 | n-queens problem |
| 叶节点 | leaf node | 动态规划 | dynamic programming |
| 边 | edge | 初始状态 | initial state |
| 层 | level | 状态转移方程 | state-transition equation |
| 度 | degree | 背包问题 | knapsack problem |
| 高度 | height | 编辑距离问题 | edit distance problem |
| 深度 | depth | 贪心算法 | greedy algorithm |
| 完美二叉树 | perfect binary tree | | |
| 完全二叉树 | complete binary tree | | |
| 完满二叉树 | full binary tree | | |
| 平衡二叉树 | balanced binary tree | | |
| AVL 树 | AVL tree | | |
| 红黑树 | red-black tree | | |
</div>

@ -19,11 +19,11 @@ icon: material/book-open-outline
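Two years ago, I shared the “剑指 Offer” series of problem solutions on LeetCode and received encouragement and support from many readers. While talking with readers, the question I was asked most often was “how do I get started with algorithms?” Gradually, I developed a keen interest in this question.
Diving blindly into problem solving seems to be the most popular approach: simple, direct, and effective. However, solving problems is like playing “Minesweeper”: those with strong self-study skills can clear the mines one by one, while those with a weak foundation may well get badly battered and retreat step by step in frustration. Reading a textbook cover to cover is another common practice, but for job seekers, the graduation thesis, résumé submissions, and preparation for written tests and interviews already consume most of their energy, so working through a thick book often becomes a daunting challenge.
Diving blindly into problem solving seems to be the most popular approach: simple, direct, and effective. However, solving problems is like playing “Minesweeper”: those with strong self-study skills can clear the mines one by one, while those with a weak foundation may well get badly battered and retreat step by step in frustration. Reading a textbook cover to cover is another common practice, but for job seekers, the graduation thesis, résumé submissions, and preparation for written tests and interviews already consume most of their energy, so working through a thick book often becomes a daunting challenge.
If you face similar difficulties, then luckily this book has “found” you. This book is my answer to that question; even if it is not the optimal solution, it is at least a positive attempt. Although this book is not enough to land you an offer directly, it will guide you in exploring the “knowledge map” of data structures and algorithms, show you the shapes, sizes, and locations of the different “mines”, and help you master various “mine-clearing methods”. With these skills, I believe you can solve problems and read the literature with greater ease and gradually build a complete body of knowledge.
I deeply agree with Professor Feynman's words: “Knowledge isn’t free. You have to pay attention.” In this sense, this book is not entirely “free”. To live up to the precious “attention” you devote to this book, I will do my utmost and invest my greatest “attention” in writing it.
I deeply agree with Professor Feynman's words: “Knowledge isn't free. You have to pay attention.” In this sense, this book is not entirely “free”. To live up to the precious “attention” you devote to this book, I will do my utmost and invest my greatest “attention” in writing it.
## Contents of This Chapter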

@ -4,16 +4,18 @@
[2] Aditya Bhargava. Grokking Algorithms: An Illustrated Guide for Programmers and Other Curious People (1st Edition).
[3] 严蔚敏. 数据结构(C 语言版).
[3] Robert Sedgewick, et al. Algorithms (4th Edition).
[4] 邓俊辉. 数据结构(C++ 语言版,第三版).
[4] 严蔚敏. 数据结构(C 语言版).
[5] 马克 艾伦 维斯著,陈越译. 数据结构与算法分析Java语言描述(第三版).
[5] 邓俊辉. 数据结构(C++ 语言版,第三版).
[6] 程杰. 大话数据结构.
[6] 马克 艾伦 维斯著,陈越译. 数据结构与算法分析Java语言描述第三版.
[7] 王争. 数据结构与算法之美.
[7] 程杰. 大话数据结构.
[8] Gayle Laakmann McDowell. Cracking the Coding Interview: 189 Programming Questions and Solutions (6th Edition).
[8] 王争. 数据结构与算法之美.
[9] Aston Zhang, et al. Dive into Deep Learning.
[9] Gayle Laakmann McDowell. Cracking the Coding Interview: 189 Programming Questions and Solutions (6th Edition).
[10] Aston Zhang, et al. Dive into Deep Learning.

@ -1,5 +1,5 @@
---
comments: true
comments: false
glightbox: false
hide:
- footer
@ -266,7 +266,7 @@ hide:
<div class="profile-div">
<div class="profile-cell">
<a href="https://github.com/krahets">
<img class="profile-img" src="https://avatars.githubusercontent.com/u/26993056?v=4" alt="krahets" />
<img class="profile-img" src="assets/avatar/avatar_yudongjin.jpg" alt="yudongjin" />
<br><b>靳宇栋(@krahets)</b>
</a>
</div>
@ -279,63 +279,63 @@ hide:
<div class="profile-div">
<div class="profile-cell">
<a href="https://github.com/codingonion">
<img class="profile-img" src="https://avatars.githubusercontent.com/u/99076655?v=4" alt="codingonion" />
<img class="profile-img" src="assets/avatar/avatar_codingonion.jpg" alt="codingonion" />
<br><b>codingonion</b>
<br><sub>Zig, Rust</sub>
</a>
</div>
<div class="profile-cell">
<a href="https://github.com/Gonglja">
<img class="profile-img" src="https://avatars.githubusercontent.com/u/39959756?v=4" alt="Gonglja" />
<img class="profile-img" src="assets/avatar/avatar_Gonglja.jpg" alt="Gonglja" />
<br><b>Gonglja</b>
<br><sub>C, C++</sub>
</a>
</div>
<div class="profile-cell">
<a href="https://github.com/gvenusleo">
<img class="profile-img" src="https://avatars.githubusercontent.com/u/79075347?v=4" alt="gvenusleo" />
<img class="profile-img" src="assets/avatar/avatar_gvenusleo.jpg" alt="gvenusleo" />
<br><b>gvenusleo</b>
<br><sub>Dart</sub>
</a>
</div>
<div class="profile-cell">
<a href="https://github.com/hpstory">
<img class="profile-img" src="https://avatars.githubusercontent.com/u/33348162?v=4" alt="hpstory" />
<img class="profile-img" src="assets/avatar/avatar_hpstory.jpg" alt="hpstory" />
<br><b>hpstory</b>
<br><sub>C#</sub>
</a>
</div>
<div class="profile-cell">
<a href="https://github.com/justin-tse">
<img class="profile-img" src="https://avatars.githubusercontent.com/u/24556310?v=4" alt="justin-tse" />
<img class="profile-img" src="assets/avatar/avatar_justin-tse.jpg" alt="justin-tse" />
<br><b>justin-tse</b>
<br><sub>JS, TS</sub>
</a>
</div>
<div class="profile-cell">
<a href="https://github.com/krahets">
<img class="profile-img" src="https://avatars.githubusercontent.com/u/26993056?v=4" alt="krahets" />
<img class="profile-img" src="assets/avatar/avatar_krahets.jpg" alt="krahets" />
<br><b>krahets</b>
<br><sub>Python, Java</sub>
</a>
</div>
<div class="profile-cell">
<a href="https://github.com/night-cruise">
<img class="profile-img" src="https://avatars.githubusercontent.com/u/77157236?v=4" alt="night-cruise" />
<img class="profile-img" src="assets/avatar/avatar_night-cruise.jpg" alt="night-cruise" />
<br><b>night-cruise</b>
<br><sub>Rust</sub>
</a>
</div>
<div class="profile-cell">
<a href="https://github.com/nuomi1">
<img class="profile-img" src="https://avatars.githubusercontent.com/u/3739017?v=4" alt="nuomi1" />
<img class="profile-img" src="assets/avatar/avatar_nuomi1.jpg" alt="nuomi1" />
<br><b>nuomi1</b>
<br><sub>Swift</sub>
</a>
</div>
<div class="profile-cell">
<a href="https://github.com/Reanon">
<img class="profile-img" src="https://avatars.githubusercontent.com/u/22005836?v=4" alt="Reanon" />
<img class="profile-img" src="assets/avatar/avatar_Reanon.jpg" alt="Reanon" />
<br><b>Reanon</b>
<br><sub>Go, C</sub>
</a>
@ -352,4 +352,28 @@ hide:
</a>
</div>
</div>
</section>
<section data-md-color-scheme="default" data-md-color-primary="white" class="home-div">
<div class="section-content giscus-container">
<p>Feel free to leave your insights, questions, or suggestions in the comments</p>
<!-- Insert generated snippet here -->
<script
src="https://giscus.app/client.js"
data-repo="krahets/hello-algo"
data-repo-id="R_kgDOIXtSqw"
data-category="Announcements"
data-category-id="DIC_kwDOIXtSq84CSZk_"
data-mapping="pathname"
data-strict="1"
data-reactions-enabled="1"
data-emit-metadata="0"
data-input-position="top"
data-theme="light"
data-lang="zh-CN"
crossorigin="anonymous"
async
>
</script>
</div>
</section>

@ -417,7 +417,7 @@ a:hover .hero-caption {
}
.giscus-container {
width: 50em;
width: 40em;
max-width: 100%;
margin: 0 auto;
}
