Screaming Frog and large sites

Screaming Frog

Screaming Frog is the essential app in any serious SEO's toolkit. Essentially, it is a tool that lets you crawl both small and large sites as if you were Google, Bing, or any other search engine, and analyze things such as meta description length, 404s, and H1 tags.

What Screaming Frog is not is a tool that tells you what to fix on your pages; that is something you will need to figure out yourself.

Screaming Frog itself doesn't really have a limit on how large a site you can crawl; at least I haven't yet run into a site that was too large to crawl with database mode turned on.

You can run Screaming Frog in either database mode, which saves the crawl to your hard drive, or memory storage mode, which keeps it in your RAM.

Although Screaming Frog itself has no issue working with large sites, you might run into trouble if you export your data to CSV in order to share it with, for example, your client.

For example, Excel becomes unusable when you are working with sheets of more than 20,000 rows, and Google Sheets doesn't allow imports larger than ~20 MB.

Splitting your CSV

The first step is always to exclude as many URLs as possible in Screaming Frog itself. But if, after excluding everything you can, you still have tens of thousands of relevant URLs to export, you will need to split the resulting CSV to work with it.

If you are using OS X, splitting the file into more manageable pieces is actually super easy. In the example below, the file I wish to split is named all_inlinks.csv, is 59 MB in size, and sits in a folder called split on my OS X desktop.

Use your Spotlight search function and search for Terminal.

Go to the folder on your desktop:

cd ~/Desktop/split

Query the number of rows in the file:

wc -l < all_inlinks.csv

Once you know the number of rows, divide it by however many chunks you need to bring each file down to your target size.

Although Google Sheets allows uploads up to 20 MB, in my experience files close to (but still under) the limit sometimes fail to upload for some reason. So, if you plan on uploading to Google Sheets, aim to get each file down to ~17 MB.
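The arithmetic above can be sketched as a couple of lines of shell. The row count here is a hypothetical placeholder (your real number comes from the wc -l step); the file size and target match the 59 MB / 17 MB figures from this example.

```shell
# Sketch: work out how many rows per chunk hit a ~17 MB target.
total_rows=590000   # hypothetical: substitute the output of `wc -l < all_inlinks.csv`
size_mb=59          # size of all_inlinks.csv in MB
target_mb=17        # stay safely under the Google Sheets upload limit

chunks=$(( (size_mb + target_mb - 1) / target_mb ))   # ceiling division: 59/17 -> 4 chunks
rows_per_chunk=$(( total_rows / chunks ))             # 590000/4 -> 147500 rows per chunk

echo "split -l $rows_per_chunk all_inlinks.csv"
```

This assumes rows are roughly equal in size, which is close enough for picking a split size.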

To split the file into smaller files of, say, 25,000 rows each, enter in Terminal:

split -l 25000 all_inlinks.csv
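One caveat: a plain split puts the CSV's header row only in the first chunk. If you want every chunk to open in a spreadsheet with its column names intact, you can strip the header, split the body, and re-attach the header to each piece. This is a self-contained sketch using a small generated sample; swap in all_inlinks.csv and your real row count.

```shell
# Build a small stand-in CSV: 1 header row + 10 data rows.
printf 'source,target\n' > sample.csv
for i in $(seq 1 10); do printf 'page%s,page%s\n' "$i" "$i" >> sample.csv; done

# Strip the header, split the body into 4-row chunks ("-" reads from stdin),
# then prepend the header to each chunk and add the .csv extension.
tail -n +2 sample.csv | split -l 4 - chunk_
for f in chunk_*; do
  { head -n 1 sample.csv; cat "$f"; } > "$f.csv" && rm "$f"
done

ls chunk_*.csv   # three files: chunk_aa.csv, chunk_ab.csv, chunk_ac.csv
```

Each resulting file now starts with the same header row, so it imports cleanly on its own.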

The resulting files do not have the .csv file extension. You can add it either manually or by entering the following:

for i in *; do mv "$i" "$i.csv"; done
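Before uploading anywhere, it's worth a quick sanity check that no rows were lost in the split: the chunks' row counts should add up to the original file's. A minimal self-contained sketch, using a generated stand-in for all_inlinks.csv:

```shell
# Stand-in for the real export: a 10-row file.
seq 1 10 > original_check.csv

# Split into 4-row chunks and add the .csv extension, as above.
split -l 4 original_check.csv part_
for i in part_*; do mv "$i" "$i.csv"; done

# The chunks together should contain exactly the original 10 rows.
cat part_*.csv | wc -l   # prints 10
```

Run the same check on your real chunks (the total from wc -l *.csv should match the original file's row count), and use du -h to confirm each file sits under your size target.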