
All about Greedy Algorithms | Beginners Guide

Last Updated on November 13, 2023 by Editorial Team

Author(s): Muttineni Sai Rohith

Originally published on Towards AI.

Imagine you are on a journey to a new destination; you probably use GPS navigation to find the quickest route. Just as a navigator looks for a time-efficient path through unfamiliar roads, a greedy algorithm always selects the next step that offers the most obvious and immediate benefit. By choosing the best available option at each step, greedy algorithms gradually build up a solution in a time-efficient way.

Now let's deal with an example to learn more about Greedy Algorithms — the coin change problem.

Suppose we need to give 31 coins as change, and the denominations we have are 20, 10, 5, 2, and 1.

The Coin Change problem uses a greedy algorithmic approach —

  • First, find the biggest denomination that is less than or equal to the remaining amount.
  • Add that denomination to the result and subtract it from the remaining amount.
  • If the remaining amount is now zero, print the chosen denominations.
  • Else, repeat steps 1 and 2.
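The steps above can be sketched as a short function (a minimal illustration, not production change-making code):

```python
def greedy_change(amount, denominations):
    """Return the coins chosen by the greedy strategy: largest denomination first."""
    coins = []
    for d in sorted(denominations, reverse=True):
        # Take as many of this denomination as still fit in the remaining amount
        while amount >= d:
            coins.append(d)
            amount -= d
    if amount != 0:
        raise ValueError("greedy choice cannot make exact change")
    return coins

print(greedy_change(31, [20, 10, 5, 2, 1]))  # [20, 10, 1]
```

For 31, the greedy choice picks 20 (leaving 11), then 10 (leaving 1), then 1, so the change is three coins: 20, 10, and 1.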

Greedy Algorithms: Definition

Greedy Algorithms are a family of optimization algorithms used in algorithmic problem solving. The core idea behind the greedy approach is to make a locally optimal choice at each stage in the hope of finding the global optimum. Unlike dynamic programming, where we solve subproblems and store their solutions to avoid redundant calculations, greedy algorithms make decisions by selecting the best option available at each point, without considering future steps.

When should one choose Greedy Algorithms?

Greedy Algorithms are suitable for problems that exhibit the following characteristics:

  1. Greedy Choice: Greedy algorithms make the best decision at each step in the hope of reaching a globally optimal solution, and they never revisit a decision once it is made. If the problem does not require reconsidering or changing earlier choices, a greedy approach is a good fit.
  2. Optimal Substructure: If the overall problem can be broken into subproblems such that an optimal solution to the whole is built from optimal solutions to the parts, a greedy algorithm can apply.
  3. Time Efficiency: Greedy algorithms are simple and fast. They move toward the global optimum directly, precisely because they never revisit past decisions.

Now, let’s work through three problems that use a greedy algorithmic approach.

Activity Selection Problem — Non-overlapping Intervals

Given an array of intervals intervals where intervals[i] = [starti, endi], return the minimum number of intervals you need to remove to make the rest of the intervals non-overlapping.

Example 1:

Input: intervals = [[1,2],[2,3],[3,4],[1,3]]
Output: 1
Explanation: [1,3] can be removed and the rest of the intervals are non-overlapping.

Example 2:

Input: intervals = [[1,2],[1,2],[1,2]]
Output: 2
Explanation: You need to remove two [1,2] to make the rest of the intervals non-overlapping.

Example 3:

Input: intervals = [[1,2],[2,3]]
Output: 0
Explanation: You don't need to remove any of the intervals since they're already non-overlapping.

Code:

from typing import List

class Solution:
    def eraseOverlapIntervals(self, intervals: List[List[int]]) -> int:
        # Sort by end time: the interval that ends earliest leaves the most room
        intervals.sort(key=lambda x: x[1])
        last_end = intervals[0][1]
        count = 0
        for s, e in intervals[1:]:
            if s < last_end:
                # Overlaps the last kept interval: remove it
                count += 1
            else:
                # No overlap: keep it and advance the boundary
                last_end = e
        return count

The above code follows a greedy approach. Sorting the intervals by end time gives us the optimal-substructure property: at each step, keeping the interval that ends earliest leaves the most room for the intervals that follow. We then check whether each interval overlaps the last one kept, and increment the answer whenever it does.
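As a quick check, the same logic can be packaged as a standalone function and run against the three examples above:

```python
from typing import List

def erase_overlap_intervals(intervals: List[List[int]]) -> int:
    """Minimum number of intervals to remove so the rest are non-overlapping."""
    if not intervals:
        return 0
    intervals.sort(key=lambda x: x[1])  # sort by end time
    last_end = intervals[0][1]
    count = 0
    for s, e in intervals[1:]:
        if s < last_end:      # overlaps the last kept interval: remove it
            count += 1
        else:                 # no overlap: keep it
            last_end = e
    return count

print(erase_overlap_intervals([[1, 2], [2, 3], [3, 4], [1, 3]]))  # 1
print(erase_overlap_intervals([[1, 2], [1, 2], [1, 2]]))          # 2
print(erase_overlap_intervals([[1, 2], [2, 3]]))                  # 0
```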

Huffman Coding

Huffman Coding is an algorithm for lossless data compression. The goal of Huffman Coding is to represent the data by assigning shorter binary codes to the most frequently occurring characters and longer binary codes to less frequent characters.

Approach:

  • Calculate the frequency of each character in the input data.
  • Build a binary tree such that more frequent symbols sit closer to the root (shorter codes) and less frequent symbols sit deeper (longer codes).
  • Assign a binary code to each character: the path from the root of the tree to that character’s leaf (left = 0, right = 1) uniquely defines its code.
  • These binary codes form the Huffman codes, and the original data is replaced with them.
  • The same Huffman tree is used to decompress the data: starting from the root, the bits are followed down the tree and the original symbols are reconstructed.

Example — Tree Construction:

Let’s consider the text ABRACADABRA. Here A occurs 5 times, B and R occur twice each, and C and D occur once each. The Huffman tree is constructed from these frequencies.

In this example, ‘A’ might be assigned the code ‘0’, ‘C’ the code ‘100’, ‘D’ the code ‘101’, ‘B’ the code ‘110’, and ‘R’ the code ‘111’. Note that no code is a prefix of another, which is what makes the encoding unambiguous. Huffman coding is widely used in various applications, including file compression (ZIP files), image compression (JPEG), and network protocols. Its efficiency comes from the fact that more frequent symbols get shorter codes, resulting in overall compression of the data.

Code:

import heapq
from collections import defaultdict

class Node:
    def __init__(self, symbol=None, frequency=None):
        self.symbol = symbol
        self.frequency = frequency
        self.left = None
        self.right = None

    def __lt__(self, other):
        # Needed so nodes can be ordered by frequency inside the min-heap
        return self.frequency < other.frequency

def build_huffman_tree(frequencies):
    heap = [Node(symbol=s, frequency=f) for s, f in frequencies.items()]
    heapq.heapify(heap)

    while len(heap) > 1:
        # Greedy step: repeatedly merge the two least frequent nodes
        left = heapq.heappop(heap)
        right = heapq.heappop(heap)
        merged = Node(frequency=left.frequency + right.frequency)
        merged.left = left
        merged.right = right
        heapq.heappush(heap, merged)

    return heap[0]

def generate_huffman_codes(node, code="", mapping=None):
    if mapping is None:
        mapping = {}

    if node.symbol is not None:
        # Leaf node: the path taken so far is this symbol's code
        mapping[node.symbol] = code
    if node.left:
        generate_huffman_codes(node.left, code + "0", mapping)
    if node.right:
        generate_huffman_codes(node.right, code + "1", mapping)

    return mapping

def huffman_encode(text):
    frequencies = defaultdict(int)
    for symbol in text:
        frequencies[symbol] += 1

    root = build_huffman_tree(frequencies)
    codes = generate_huffman_codes(root)

    encoded_text = "".join(codes[symbol] for symbol in text)

    return encoded_text, root

text_to_encode = "ABRACADABRA"
encoded_text, huffman_tree_root = huffman_encode(text_to_encode)

print("Original Text:", text_to_encode)
print("Encoded Text:", encoded_text)
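The approach above also mentions decompression, which the encoder does not show. Because Huffman codes are prefix-free, decoding can be sketched with just the symbol-to-code table rather than walking the tree (the `codes` table below is the hypothetical assignment from the ABRACADABRA example):

```python
def huffman_decode(encoded, codes):
    """Decode a Huffman bitstring using a symbol -> code table.

    Works because Huffman codes are prefix-free: reading bits left to
    right, at most one code can ever match the buffer.
    """
    reverse = {code: symbol for symbol, code in codes.items()}
    decoded, buffer = [], ""
    for bit in encoded:
        buffer += bit
        if buffer in reverse:        # a complete code word has been read
            decoded.append(reverse[buffer])
            buffer = ""
    return "".join(decoded)

# Codes matching the example tree for ABRACADABRA
codes = {"A": "0", "B": "110", "R": "111", "C": "100", "D": "101"}
encoded = "".join(codes[s] for s in "ABRACADABRA")
print(huffman_decode(encoded, codes))  # ABRACADABRA
```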

To know more about the collections used in the above code, refer to my previous article — All about Collections Module in Python.

Dijkstra Algorithm

Dijkstra’s algorithm is used in graph theory to find the shortest paths between the nodes of a graph, and it applies to graphs with non-negative edge weights.

Approach:

  • Initialize a priority queue keyed by the tentative distance from the source to each node. Set the distance to the source as 0 and infinity for all other nodes.
  • Repeatedly take the unvisited node with the smallest distance and relax its edges: for each neighbor, if the current node’s distance plus the edge weight is smaller than the neighbor’s recorded distance, update it.
  • Mark the node as visited so it is not processed again. Repeat the process until all the nodes are visited.

Example:

Nodes: A, B, C, D
Edges: (A, B, 4), (A, C, 5), (A, D, 2), (B, C, 1), (B, D, 3), (C, D, 5)

Let’s find the shortest paths from node A to all other nodes using Dijkstra’s algorithm:

  • Start at node A. Set the distance to A as 0 and inf for all other nodes: (A, 0), (B, ∞), (C, ∞), (D, ∞).
  • Initialize the priority queue with these distances.
  • Select A (distance 0). Explore its neighbors: B (weight 4), C (weight 5), D (weight 2). Update distances: (A, 0), (B, 4), (C, 5), (D, 2). Mark A as visited.
  • Select D (distance 2), the closest unvisited node. Explore its neighbors: B via D costs 2 + 3 = 5, no improvement over 4; C via D costs 2 + 5 = 7, no improvement over 5. Mark D as visited.
  • Select B (distance 4). C via B costs 4 + 1 = 5, no improvement over 5. Mark B as visited.
  • Select C (distance 5). No neighbors improve; all nodes are now visited.
  • The shortest paths from A to other nodes are:
  • A to B: [A, B] (Total distance: 4)
  • A to C: [A, C] (Total distance: 5)
  • A to D: [A, D] (Total distance: 2)

Code:

import heapq

def dijkstra(graph, start):
    distances = {node: float('infinity') for node in graph}
    distances[start] = 0

    priority_queue = [(0, start)]

    while priority_queue:
        current_distance, current_node = heapq.heappop(priority_queue)

        # Skip stale queue entries for nodes already settled via a shorter path
        if current_distance > distances[current_node]:
            continue

        for neighbor, weight in graph[current_node].items():
            distance = current_distance + weight

            # Relax the edge if it improves the neighbor's distance
            if distance < distances[neighbor]:
                distances[neighbor] = distance
                heapq.heappush(priority_queue, (distance, neighbor))

    return distances

graph = {
    'A': {'B': 4, 'C': 5, 'D': 2},
    'B': {'A': 4, 'C': 1, 'D': 3},
    'C': {'A': 5, 'B': 1, 'D': 5},
    'D': {'A': 2, 'B': 3, 'C': 5}
}

start_node = 'A'
shortest_distances = dijkstra(graph, start_node)

print(f"Shortest distances from {start_node}:")
for node, distance in shortest_distances.items():
    print(f"To {node}: {distance}")
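The walkthrough above lists the shortest paths themselves, while the code returns only distances. A common extension (not part of the original code) is to record each node’s predecessor during relaxation and rebuild the path afterwards; a sketch of that variant:

```python
import heapq

def dijkstra_with_paths(graph, start):
    """Dijkstra's algorithm that also records predecessors for path recovery."""
    distances = {node: float("inf") for node in graph}
    previous = {node: None for node in graph}
    distances[start] = 0
    queue = [(0, start)]
    while queue:
        dist, node = heapq.heappop(queue)
        if dist > distances[node]:       # stale entry, skip
            continue
        for neighbor, weight in graph[node].items():
            candidate = dist + weight
            if candidate < distances[neighbor]:
                distances[neighbor] = candidate
                previous[neighbor] = node  # remember how we got here
                heapq.heappush(queue, (candidate, neighbor))
    return distances, previous

def rebuild_path(previous, target):
    """Walk the predecessor chain backwards from target to the source."""
    path = []
    while target is not None:
        path.append(target)
        target = previous[target]
    return path[::-1]

graph = {
    "A": {"B": 4, "C": 5, "D": 2},
    "B": {"A": 4, "C": 1, "D": 3},
    "C": {"A": 5, "B": 1, "D": 5},
    "D": {"A": 2, "B": 3, "C": 5},
}
distances, previous = dijkstra_with_paths(graph, "A")
print(distances)                    # {'A': 0, 'B': 4, 'C': 5, 'D': 2}
print(rebuild_path(previous, "B"))  # ['A', 'B']
```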

You can explore more problems related to greedy algorithms in this link.

Note: It is important to select the best approach for a problem, and a greedy algorithm may not always provide the optimal solution. For example, in the coin change problem, if the denominations are [3, 5] and the amount is 9, the greedy approach fails: it picks 5, then 3, and is left with an unmakeable remainder of 1, even though 3 + 3 + 3 works. There are still numerous problems where the greedy approach does produce the optimum, so choose your approach wisely.

You Can learn more about Data structures and Dynamic Programming using the below articles —

Understanding Backtracking using Python: Beginners Guide

Enhancing Python Code: Memoization vs Tabulation

I hope this article was helpful. Happy learning!


Published via Towards AI
