Why This Matters
Many systems only need the best few results: top scores, nearest candidates, highest frequencies, next jobs, or best recommendations. Sorting everything can waste work.
Decision Rule
Use sorting when you need the whole ordered list. Use top-k selection when you only need the best k items and k is much smaller than n.
Comparison Table
| Need | Full sorting | Top-k selection |
|---|---|---|
| Output | all items ordered | best k items |
| Typical cost | O(n log n) | O(n log k) with size-k heap |
| Order among top k | naturally sorted | may need final sort of k items |
| Good for | ranking every item | leaderboards, candidate generation, frequent elements |
| Python tools | sorted, .sort() | heapq.nlargest, heapq.nsmallest, custom heap |
Worked Example
If n = 1,000,000 and k = 10, sorting ranks all one million items. A min-heap of size 10 keeps only the current best candidates. Each new item either gets ignored or replaces the smallest top candidate.
The heap does less ordering work because it never fully sorts the losing items.
Common Mistakes
| Mistake | Correction |
|---|---|
| Sorting everything for top 10 by habit. | Consider a heap or selection algorithm. |
| Forgetting that heap output may not be sorted. | Sort the final k items if display order matters. |
| Using top-k when every rank matters. | Full sorting is the correct interface when all order positions are needed. |
Diagnostic Questions
| Question | Answer signal |
|---|---|
| What is the typical heap cost for top-k? | O(n log k). |
| When is full sorting appropriate? | When the entire ordered list is needed. |
| Why does a size-k heap save work? | It keeps only candidates that can still be in the answer. |
Exercises
- Beginner: Find the three largest numbers by hand without sorting the whole list.
- Beginner: Use Python
heapq.nlargeston a list of scores. - Beginner: Explain whether top-k output must be sorted.
- Intermediate: Implement top-k frequent words with
Counterandheapq. - Intermediate: Compare
sorted(nums)[-k:]with a size-k heap for largen. - Challenge: Explain quickselect and why its average complexity is linear.
References
- MIT OpenCourseWare. 6.006 Introduction to Algorithms, Spring 2020: https://ocw.mit.edu/courses/6-006-introduction-to-algorithms-spring-2020/
- MIT OpenCourseWare. 6.046J Design and Analysis of Algorithms, Spring 2015: https://ocw.mit.edu/courses/6-046j-design-and-analysis-of-algorithms-spring-2015/