Free SERP-Based Keyword Clustering Tool for Large Keyword Datasets

SERP-Based Keyword Clustering with MinHash and LSH

To meet my own needs, I developed a keyword clustering tool that groups keywords by SERP similarity using MinHash and LSH (Locality-Sensitive Hashing). URLs do not always need to be vectorized: representing each keyword's SERP as a set of URLs already opens the door to set-based similarity techniques. I initially built a clusterer that computed the Jaccard index directly, but MinHash and LSH are more efficient and scale better to large keyword sets.
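To make the set representation concrete, here is a minimal sketch of the Jaccard index applied to two SERPs. The URLs and the `jaccard` helper are made up for illustration and are not taken from the tool itself.

```python
def jaccard(serp_a: set[str], serp_b: set[str]) -> float:
    """Jaccard index: shared URLs divided by all distinct URLs."""
    if not serp_a and not serp_b:
        return 0.0  # convention chosen here for two empty SERPs
    return len(serp_a & serp_b) / len(serp_a | serp_b)


# Hypothetical SERPs, each reduced to a set of result URLs.
serp_running = {"a.example/shoes", "b.example/run", "c.example/guide", "d.example/top10"}
serp_jogging = {"a.example/shoes", "b.example/run", "c.example/guide", "e.example/jog"}

similarity = jaccard(serp_running, serp_jogging)
print(similarity)  # 3 shared URLs / 5 distinct URLs = 0.6
```

Comparing every keyword against every other keyword this way needs a number of comparisons that grows quadratically with the keyword count, which is the cost MinHash and LSH are meant to avoid.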

Why MinHash and LSH?

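MinHash and LSH provide an efficient way to cluster keywords by SERP similarity. MinHash approximates the Jaccard index from compact signatures, and LSH limits comparisons to likely matches, which gives several advantages over computing the exact Jaccard index for every pair of keywords: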

MinHash and LSH vs. Jaccard Index

Efficiency and Scalability

Approximate Similarity

High Performance with Extensive Datasets

Resource Efficiency
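The sketch below illustrates the general idea behind these points using only the Python standard library. It is not the tool's code (the tool relies on the datasketch library), and the signature length, band count, helper names, and sample SERPs are all assumptions chosen for illustration.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

NUM_PERM = 128            # signature length (assumed value)
BANDS = 32                # number of LSH bands (assumed value)
ROWS = NUM_PERM // BANDS  # 4 signature rows per band


def _hash(seed: int, url: str) -> int:
    """Deterministic 64-bit hash of a URL; the seed acts as a salt."""
    digest = hashlib.blake2b(
        url.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(urls: set[str]) -> tuple[int, ...]:
    """For each seed, keep the smallest hash value seen over the URL set."""
    return tuple(min(_hash(seed, u) for u in urls) for seed in range(NUM_PERM))


def estimate_jaccard(sig_a: tuple[int, ...], sig_b: tuple[int, ...]) -> float:
    """The share of matching signature positions estimates the Jaccard index."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def lsh_candidate_pairs(signatures: dict[str, tuple[int, ...]]) -> set[tuple[str, str]]:
    """Keywords whose signatures agree on at least one whole band become candidates."""
    buckets = defaultdict(set)
    for keyword, sig in signatures.items():
        for band in range(BANDS):
            chunk = sig[band * ROWS:(band + 1) * ROWS]
            buckets[(band, chunk)].add(keyword)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


# Hypothetical SERPs: the first two share 9 of 11 distinct URLs, the third is unrelated.
shared = {f"site{i}.example/shoes" for i in range(9)}
serps = {
    "jogging shoes": shared | {"jog.example/top"},
    "running shoes": shared | {"run.example/top"},
    "pizza recipe": {f"food{i}.example/pizza" for i in range(10)},
}
signatures = {kw: minhash_signature(urls) for kw, urls in serps.items()}
candidates = lsh_candidate_pairs(signatures)
estimate = estimate_jaccard(signatures["jogging shoes"], signatures["running shoes"])
print(candidates)
print(round(estimate, 2))  # an approximation of the exact value 9/11
```

Each keyword is reduced to a fixed-length signature regardless of how many URLs its SERP contains, and only keywords that collide in some band are compared, so most dissimilar pairs are never examined at all.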

Key features

Download the tool here: SERP-Based Keyword Clustering Tool (Python)

Project on GitHub: GitHub Repository

Instructions

Important! This tool only clusters keywords; it does not collect search engine results. To fetch search results, use a separate service such as A-Parser.

Setup Instructions

1. Install Required Libraries

pip install pandas
pip install tqdm
pip install datasketch

2. Command-Line Usage Instructions

Required Arguments:

Optional Arguments:

Example Command in Terminal:

python minhash-cluster-cli.py for-clustering.csv clustered_keywords.csv -s ';' -k 'keyword' -u 'url' -t 0.6
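Judging from the flags in this command, the input appears to be a delimited file with a keyword column and a URL column. The sketch below assumes one row per keyword/URL pair, with `;` as the separator and columns named `keyword` and `url`. That layout, the sample rows, and the `load_serps` helper are illustrative assumptions rather than the tool's documented format.

```python
import csv
import io
from collections import defaultdict

# Hypothetical contents of for-clustering.csv: one row per keyword/URL pair.
sample = """keyword;url
running shoes;https://a.example/shoes
running shoes;https://b.example/run
jogging shoes;https://a.example/shoes
"""


def load_serps(handle, sep=";", keyword_col="keyword", url_col="url"):
    """Group URLs by keyword so that each keyword's SERP becomes a set."""
    serps = defaultdict(set)
    for row in csv.DictReader(handle, delimiter=sep):
        serps[row[keyword_col].strip()].add(row[url_col].strip())
    return dict(serps)


serps = load_serps(io.StringIO(sample))
print(sorted(serps["running shoes"]))
# ['https://a.example/shoes', 'https://b.example/run']
```

For a real file, `io.StringIO(sample)` would be replaced by an opened file handle, for example `open("for-clustering.csv", newline="", encoding="utf-8")`.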

Output File