Skip to main content

DSAPath12 minintermediate

Top-K Selection vs Sorting

Sorting orders every item; top-k selection only finds the best k items, often with a heap or selection algorithm.

DifficultyIntermediate
TierTier 1
ModuleComparisons
LanguagesPython

Why This Matters

Many systems only need the best few results: top scores, nearest candidates, highest frequencies, next jobs, or best recommendations. Sorting everything can waste work.

Decision Rule

Use sorting when you need the whole ordered list. Use top-k selection when you only need the best k items and k is much smaller than n.

Comparison Table

NeedFull sortingTop-k selection
Outputall items orderedbest k items
Typical costO(n log n)O(n log k) with size-k heap
Order among top knaturally sortedmay need final sort of k items
Good forranking every itemleaderboards, candidate generation, frequent elements
Python toolssorted, .sort()heapq.nlargest, heapq.nsmallest, custom heap

Worked Example

If n = 1,000,000 and k = 10, sorting ranks all one million items. A min-heap of size 10 keeps only the current best candidates. Each new item either gets ignored or replaces the smallest top candidate.

The heap does less ordering work because it never fully sorts the losing items.

Common Mistakes

MistakeCorrection
Sorting everything for top 10 by habit.Consider a heap or selection algorithm.
Forgetting that heap output may not be sorted.Sort the final k items if display order matters.
Using top-k when every rank matters.Full sorting is the correct interface when all order positions are needed.

Diagnostic Questions

QuestionAnswer signal
What is the typical heap cost for top-k?O(n log k).
When is full sorting appropriate?When the entire ordered list is needed.
Why does a size-k heap save work?It keeps only candidates that can still be in the answer.

Exercises

  • Beginner: Find the three largest numbers by hand without sorting the whole list.
  • Beginner: Use Python heapq.nlargest on a list of scores.
  • Beginner: Explain whether top-k output must be sorted.
  • Intermediate: Implement top-k frequent words with Counter and heapq.
  • Intermediate: Compare sorted(nums)[-k:] with a size-k heap for large n.
  • Challenge: Explain quickselect and why its average complexity is linear.

References