Arrays & Hashing · LC #347

Top K Frequent Elements

Return the k most frequent integers from an array. Teaches bucket-sort as an O(n) alternative to a min-heap (O(n log k)) for frequency-rank problems.

mediumLC #347AMZ★★★GOO★★META★★MSFT★★AAPLNFLX
Mark as done
Confidence

Statement

Problem & Approach

Problem statement, sample I/O, and key reasoning.

Given an integer array `nums` and an integer `k`, return the `k` most frequent elements. The answer can be in any order.

Sample I/O

Input:  nums = [1,1,1,2,2,3], k = 2
Output: [1,2]

Input:  nums = [1], k = 1
Output: [1]

Intuition

How to Think About It

Intuition

After counting frequencies, we know each number's frequency is an integer in [1, n]. We can use that bounded range as an index into a "bucket" array. The highest-frequency elements sit in the last buckets, so scanning from the back gives us the top-k in O(n) total.

Core Concept

Build a frequency map counting occurrences of each number (O(n)). Create a buckets array of length n+1 where buckets[f] holds all numbers with frequency f. Scan from index n down to 1, collecting elements until k elements are gathered. Time: O(n). Space: O(n). Compare with min-heap: O(n log k) time, O(k) extra space - simpler but slower.

When to use

Finding top-k by integer frequency when the range of possible frequencies is bounded by n. Anywhere you need O(n) over O(n log n) and frequencies are discrete integers. Streaming top-k with a sliding window.

When NOT to use

When k is very small and n is huge - a min-heap of size k uses O(k) space and may be preferred. When frequencies are not integers or are floating-point weights. When you need the k-th largest value, not frequency (use quickselect).

Key Insights

What to Know Cold

  • Maximum possible frequency is n (all elements same), so the bucket array only needs n+1 slots.
  • Bucket sort achieves O(n) by exploiting the bounded integer domain of frequencies - standard comparison sorts can't do better than O(n log n).
  • A min-heap approach (Python heapq.nlargest) is O(n log k) - acceptable for interviews but sub-optimal.
  • You can use Counter.most_common(k) in Python for a concise one-liner - still O(n log k) internally.
  • The bucket scan must continue past partially-filled buckets to guarantee exactly k elements when multiple numbers share a frequency.

1 example

Worked Examples

Bucket Sort - O(n)

nums = [1,1,1,2,2,3], k=2. Return the 2 most frequent: [1, 2].

freq={1:3, 2:2, 3:1}. Bucket[3]=[1], bucket[2]=[2], bucket[1]=[3]. Scan from end: collect 1, then 2. Done at k=2.

from collections import Counter
nums, k = [1,1,1,2,2,3], 2
count = Counter(nums)
buckets = [[] for _ in range(len(nums) + 1)]
for num, freq in count.items():
    buckets[freq].append(num)
res = []
for i in range(len(buckets)-1, 0, -1):
    res.extend(buckets[i])
    if len(res) >= k:
        print(res[:k])  # [1, 2]
        break

This problem appears constantly in Amazon and Google phone screens. The bucket-sort O(n) trick over the naive heap O(n log k) is a concrete demonstration of "know your constraints" - a key senior-engineer signal.