Integrating Crowd-Sourced Cannabis from Tumblr

How do you get pictures of over 9,000 cannabis strains? You could start a very long and expensive journey of buying and documenting thousands of products. Or you can get the community to pitch in their pot (pics).

When it came time to fill up the Kushy database with a myriad of dank images, we looked to WeedPornDaily, an online cannabis publication and community. Users of WPD can submit photos to the website through a Tumblr submission form, and the staff also curate photos from known stoney sources. With over 7 years of pot photos stockpiled and organized neatly by tag, we were able to effortlessly search through the archive and fill our database.

The Code

This was a pretty simple task. We take our database of over 9,000 cannabis strains and run it against the public Tumblr XML. Each strain name is checked as a Tumblr tag, and if we get photos back, we insert them into an image database. The images are associated with an item_id and item_type, where the type is strains and the ID is the Strain ID. We also store the image caption, which contains additional credit for the image source.

With PHP, this is accomplished fairly crudely using a combination of cURL and non-PDO SQL interactions.
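A minimal sketch of the fetch-and-parse step could look like the following. It is not the production script. It assumes Tumblr's legacy XML read endpoint (/api/read with a tagged parameter) and its photo-url and photo-caption elements. The host example.tumblr.com, the function names, the row keys url and caption, and the sample XML are all hypothetical. The SQL insert is left out, so the rows are only built in memory.

```php
<?php
// Hypothetical blog host; substitute the real Tumblr hostname.
const BLOG_HOST = 'example.tumblr.com';

// Build the tag-filtered URL for the legacy XML read endpoint (assumed).
function tagUrl(string $strainName): string
{
    return 'https://' . BLOG_HOST . '/api/read?' . http_build_query([
        'type'   => 'photo',
        'tagged' => $strainName,
    ]);
}

// Fetch a URL with cURL; returns the body, or null on any failure.
function fetchXml(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 15,
    ]);
    $body = curl_exec($ch);
    return is_string($body) && $body !== '' ? $body : null;
}

// Turn one XML response into rows shaped for the image table.
function parsePhotos(string $xml, int $strainId): array
{
    libxml_use_internal_errors(true); // keep malformed XML from emitting warnings
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        return [];
    }
    $rows = [];
    foreach ($doc->xpath('//post[@type="photo"]') ?: [] as $post) {
        // Use the first photo-url element listed for the post.
        $url = (string) $post->{'photo-url'}[0];
        if ($url === '') {
            continue;
        }
        $rows[] = [
            'item_id'   => $strainId,
            'item_type' => 'strains',
            'url'       => $url,
            'caption'   => (string) $post->{'photo-caption'},
        ];
    }
    return $rows;
}

// Made-up response, so the parser can be tried without touching the network.
$sample = <<<XML
<tumblr version="1.0">
  <posts>
    <post id="1" type="photo">
      <photo-caption>Blue Dream, submitted by a reader</photo-caption>
      <photo-url max-width="1280">https://example.com/blue-dream.jpg</photo-url>
    </post>
  </posts>
</tumblr>
XML;

$rows = parsePhotos($sample, 42);
```

In a real run, each strain name would go through tagUrl() and fetchXml() first, and every row returned by parsePhotos() would be written to the image table.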

We also create a log (error-strains.txt) of every strain that isn't represented on Tumblr, so we can double check and have a better understanding of where the holes are in the data.
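The logging step can be sketched in a few lines. This is an illustration under assumptions: the function name logMissingStrain, the temp-directory location, and the $results map with its strain names are hypothetical stand-ins for whatever the scrape loop actually produces.

```php
<?php
// Append one strain name per line to the log of strains with no photos.
function logMissingStrain(string $logPath, string $strainName): void
{
    file_put_contents($logPath, $strainName . PHP_EOL, FILE_APPEND | LOCK_EX);
}

// Hypothetical location; the file name matches the log described above.
$logPath = sys_get_temp_dir() . '/error-strains.txt';
file_put_contents($logPath, ''); // start each run with an empty log

// Hypothetical results: strain name => number of photos found for its tag.
$results = ['Blue Dream' => 3, 'Made Up Strain' => 0];

foreach ($results as $name => $count) {
    if ($count === 0) {
        logMissingStrain($logPath, $name);
    }
}
```

Writing one name per line keeps the file easy to diff between runs and easy to feed back into a later scrape.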

The Result

After about 30 minutes of processing, we pulled over 3,000 photos for our 9,000+ strains.

And that's only pulling the first page of images for each strain that was picked up. Many strains have hundreds of pages of images, like O.G. Kush, which would give us easily over 10k images alone. We'll save that scrape for another day.
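Pulling the remaining pages would mostly be a matter of offsets. The sketch below assumes the legacy XML endpoint reports a total attribute on its posts element and accepts start and num parameters. The function names, the default page size of 20, and the sample XML are hypothetical.

```php
<?php
// Read the total number of matching posts from one XML response.
function totalPosts(string $xml): int
{
    libxml_use_internal_errors(true); // keep malformed XML from emitting warnings
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        return 0;
    }
    $nodes = $doc->xpath('/tumblr/posts/@total') ?: [];
    return isset($nodes[0]) ? (int) (string) $nodes[0] : 0;
}

// List the start offsets needed to walk every page of a tag.
function pageOffsets(int $total, int $perPage = 20): array
{
    $offsets = [];
    for ($start = 0; $start < $total; $start += $perPage) {
        $offsets[] = $start;
    }
    return $offsets;
}

// Made-up first-page response for a tag with 45 matching posts.
$firstPage = '<tumblr version="1.0"><posts start="0" total="45"></posts></tumblr>';
$offsets = pageOffsets(totalPosts($firstPage));
```

Each offset would then be sent as the start value of a follow-up request for the same tag, with num set to the page size.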

Thank you again to WeedPornDaily and its community for providing the immense wealth of well-sorted cannabis for us to parse and import into the Kushy API.

Oscar

