Free SERP-Based Keyword Clustering Tool
For my own needs, I developed a keyword clusterer based on SERP similarity. I realized that URLs don't necessarily need to be vectorized: a SERP is essentially a set of URLs, and sets can be compared directly with set-similarity formulas. So I built a clusterer based on the Jaccard index.
SERP-based Jaccard similarity measures the similarity between two sets of search engine results by comparing the common URLs to the total unique URLs. It is calculated as the intersection divided by the union of the URL sets.
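The calculation described above can be sketched in a few lines of Python. This is only an illustration, not the tool's actual code; the function name `jaccard` and the example URLs are made up for the demonstration.

```python
def jaccard(urls_a, urls_b):
    """Jaccard index of two SERPs: shared URLs divided by all unique URLs."""
    a, b = set(urls_a), set(urls_b)
    union = a | b
    if not union:
        # Two empty SERPs have nothing in common to measure; treat as 0.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical SERPs for two keywords (placeholder URLs).
serp_a = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
serp_b = ["https://example.com/b", "https://example.com/c", "https://example.com/d"]

# 2 shared URLs / 4 unique URLs
print(jaccard(serp_a, serp_b))  # 0.5
```

Because the inputs are converted to sets, the ranking position of each URL is ignored; only membership in the SERP matters.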
I also tried other set-similarity metrics, such as the Dice coefficient and the overlap coefficient, but they performed worse. Using the Jaccard index, this tool groups keywords by SERP similarity and scales to large keyword sets for SEO purposes.
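For comparison, here is a minimal sketch of the three metrics side by side, using their standard definitions. The function names and the example sets are illustrative assumptions; the snippet only shows how the scores differ on the same input, not why one metric clustered better in practice.

```python
def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|"""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a, b):
    """2 * |A ∩ B| / (|A| + |B|)"""
    a, b = set(a), set(b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def overlap(a, b):
    """|A ∩ B| / min(|A|, |B|)"""
    a, b = set(a), set(b)
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


# Hypothetical case: one SERP is a strict subset of the other.
serp_a = {"u1", "u2", "u3", "u4"}
serp_b = {"u1", "u2"}

print(jaccard(serp_a, serp_b))         # 0.5
print(round(dice(serp_a, serp_b), 3))  # 0.667
print(overlap(serp_a, serp_b))         # 1.0
```

On the same pair of SERPs, Jaccard gives the lowest score of the three and the overlap coefficient reports a perfect match, so a threshold tuned for one metric does not carry over to another.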
Key features
- Platform Independence: No OS constraints; works in any environment.
- Accuracy: Computes exact Jaccard similarity rather than an approximation.
- High Performance: Handles extensive datasets without hanging.
- Language Agnostic: Supports clustering in any language.
- Efficient Resource Use: Low on computational resources.
Why did I create this clustering tool?
Here's what motivated me to create it:
- Platform Limitations: Most programs are designed for Windows.
- Service Constraints: Many SaaS services offer clustering only as an additional feature.
- Inaccuracy: Other algorithms, like MinHash, use approximate similarity.
- Performance Issues: Existing tools struggle with over 100k keywords; I needed to handle large volumes.
- ChatGPT Limitations: Offers clustering but fails with large datasets.
- Language Support: I needed support for any language.
Instructions
Important! This tool only clusters keywords; it does not collect search engine results. For fetching search results, you can use other services such as Ahrefs, selecting the option to include the top 10 positions from SERPs for each keyword.
Save a copy of the Colab file to your Google Drive via the File menu to avoid the Google Colab warning each time you open it.
How to Use
- Start the Process: Click the play button (▷) at the bottom.
- Prepare Your CSV: Upload a CSV file containing at least two columns: Keywords and URL.
- Input Column Delimiter: By default, this is a comma.
- Enter Column Labels: Specify the column labels for keywords and URLs (case-sensitive).
- Set Similarity Threshold: Enter a similarity threshold; 0.6 is recommended.
- Run: Click the Run button to start the clustering process.
- Save Results: After completion, click Save to download the results to your Downloads folder.
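To make the steps above more concrete, here is a rough sketch of what such a pipeline could look like. It is not the tool's actual implementation, which is not shown here. The assumptions: the CSV has one row per keyword/URL pair, the column labels are `Keywords` and `URL`, and keywords are merged into a group whenever any pair reaches the threshold (single-link merging via union-find). The function names `load_serps` and `cluster` are hypothetical.

```python
import csv
import io
from collections import defaultdict
from itertools import combinations


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def load_serps(csv_text, keyword_col="Keywords", url_col="URL", delimiter=","):
    """Collect the set of URLs for each keyword (column labels are case-sensitive)."""
    serps = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text), delimiter=delimiter):
        urls = serps[row[keyword_col]]  # registers the keyword even without URLs
        url = (row[url_col] or "").strip()
        if url:
            urls.add(url)
    return dict(serps)


def cluster(serps, threshold=0.6):
    """Assumed strategy: merge keywords whose SERPs reach the threshold."""
    parent = {k: k for k in serps}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    # Naive all-pairs comparison: fine for a demo, too slow for 100k keywords.
    for a, b in combinations(serps, 2):
        if serps[a] and serps[b] and jaccard(serps[a], serps[b]) >= threshold:
            parent[find(a)] = find(b)

    group_ids, labels = {}, {}
    for k in serps:
        if not serps[k]:
            labels[k] = -1  # no URLs collected for this keyword
        else:
            labels[k] = group_ids.setdefault(find(k), len(group_ids))
    return labels


SAMPLE = """\
Keywords,URL
buy shoes,https://a.example
buy shoes,https://b.example
buy shoes,https://c.example
shoes online,https://a.example
shoes online,https://b.example
shoes online,https://c.example
shoes online,https://d.example
python tutorial,https://x.example
no results,
"""

labels = cluster(load_serps(SAMPLE), threshold=0.6)
print(labels)
# {'buy shoes': 0, 'shoes online': 0, 'python tutorial': 1, 'no results': -1}
```

In this sample, "buy shoes" and "shoes online" share 3 of 4 unique URLs (0.75), which is above the 0.6 threshold, so they land in the same group. Raising the threshold to 0.8 would split them.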
Output File
- Group Column: Each group in the Group column is numbered starting from 0.
- Keyword Clustering: Keywords grouped together will have the same group number.
- Unclustered Keywords: If a keyword's SERP is not similar enough to any other keyword's, it is placed in a group of its own.
- URLs Not Collected: Keywords that have no associated URLs are all assigned to group -1.
- Browser Compatibility: This code does not work correctly in Safari. Please use Chrome.
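The group numbering described above makes the output easy to post-process. The snippet below is a small sketch, assuming the downloaded file is a comma-delimited CSV with a `Group` column and a keyword column labeled `Keywords`; the keyword column label, the function name `summarize_groups`, and the sample rows are assumptions for illustration.

```python
import csv
import io
from collections import defaultdict


def summarize_groups(csv_text, keyword_col="Keywords", group_col="Group"):
    """Split the output into real clusters and keywords without URLs (group -1)."""
    clusters = defaultdict(list)
    no_urls = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        group = int(row[group_col])
        if group == -1:
            no_urls.append(row[keyword_col])
        else:
            clusters[group].append(row[keyword_col])
    return dict(clusters), no_urls


# Hypothetical output file contents.
OUTPUT = """\
Keywords,Group
buy shoes,0
shoes online,0
python tutorial,1
no results,-1
"""

clusters, no_urls = summarize_groups(OUTPUT)
print(clusters)  # {0: ['buy shoes', 'shoes online'], 1: ['python tutorial']}
print(no_urls)   # ['no results']
```

Treating -1 separately matters because those keywords were never compared; they share a label only because no URLs were available for them.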