Pinboard bookmark to PDF with NodeJS and Puppeteer
I often bookmark articles, explanations, tutorials, etc. with the plan to one day come back and read them. Sadly, by the time I do, there are usually a few dead links, lost forever.
Recently I found myself frustrated at the sheer number of bookmarks in my browser and decided a bookmarking service was needed. In the past, I have used Delicious and even rolled my own, but I had also previously been a paying customer of Pinboard, and after some searching decided it best suited my needs. Those needs being tags, a browser plugin, and an API.
After reactivating my account I went through the ~100 bookmarks I added back in 2015. Removed any dead links, discovered some forgotten gems, and deleted the no longer relevant. Finally, I imported all ~650 of my Firefox bookmarks and set about sorting those. I hope to keep the tags to a minimum, favouring search over an abundance of tags.
To mitigate the loss of anything I found particularly interesting or important, such as this list of Undocumented DOS Commands, I wanted to be able to tag a link with topdf and have the site automatically printed to a PDF file and stored in Dropbox or Google Drive.
The first port of call was IFTTT (If This Then That) where I set up a pathway from my Pinboard account to a service called PDFmyURL. This worked great, but it turned out they don’t like free users printing to PDF using automation, and I wasn’t about to pay $19 a month for the privilege.
But I’m a professional software engineer! It shouldn’t be too hard to roll my own solution…
Looking up the Pinboard API, I found an endpoint that returns the most recent pins (that's what they call bookmarks) for a given tag, in my case topdf.
I had already used Puppeteer in a solution for a client looking to generate and print labels from a web interface, so I knew it could handle converting a URL to a full PDF document. Puppeteer is a Node library, which settled the question of which language to use.
Turns out there's a neat library on NPM for Pinboard called node-pinboard. I also chose to use slugify to convert the page titles to clean filenames, and something called scroll-to-bottomjs, which I'll get to later:
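Pulling in the dependencies is a one-liner with npm (a sketch of the setup; the original post doesn't show exact versions):

```shell
npm install node-pinboard slugify scroll-to-bottomjs puppeteer
```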
Getting the bookmarks
The first thing to do is grab our bookmarks:
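A sketch of the fetch, assuming node-pinboard's promise-based `recent()` wrapper around the posts/recent endpoint and an API token supplied via an environment variable (the `generatePDF` function is described below):

```javascript
const Pinboard = require('node-pinboard');

// Pinboard API token, e.g. "username:XXXX" from the Pinboard settings page
const pinboard = new Pinboard(process.env.PINBOARD_TOKEN);

async function processRecentPins(tag) {
  // posts/recent: the most recent pins carrying the given tag
  const { posts } = await pinboard.recent({ tag });

  // Kick off one PDF conversion per pin and run them in parallel,
  // eventually getting back an array of stored filenames
  const jobs = posts.map((pin) => generatePDF(pin.description, pin.href));
  return Promise.all(jobs);
}

processRecentPins('topdf').then((filenames) => console.log(filenames));
```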
The above code grabs all the recent pins tagged topdf. Looping through each one, we send both its description (the title of the pin, which I may have altered from the default page title) and its href, or URL, to our generatePDF function. By pushing these async functions onto an array and using Promise.all we can run them all in parallel, eventually getting back an array of stored filenames.
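That generatePDF function might look something like this (a sketch assuming Puppeteer's `page.pdf()` API, slugify for the filename, and scroll-to-bottomjs for lazy-loaded images):

```javascript
const puppeteer = require('puppeteer');
const slugify = require('slugify');
const ScrollToBottom = require('scroll-to-bottomjs');

async function generatePDF(title, url) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle2' });

  // Scroll to the bottom so lazy-loaded images are fetched before printing
  await page.evaluate(ScrollToBottom);

  // Clean filename from the pin's title, stored in /tmp
  const filename = `/tmp/${slugify(title, { lower: true })}.pdf`;
  await page.pdf({ path: filename, format: 'A4' });

  await browser.close();
  return filename;
}
```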
The above function creates an instance of Puppeteer, loads the URL, uses slugify to create a clean filename, then creates a PDF before returning the filename. The line await page.evaluate(ScrollToBottom); scrolls the page to the bottom before rendering the PDF; this ensures lazy-loaded images are rendered. I've chosen to store the PDF files in /tmp to ensure they will be deleted at some point. I did end up creating a cron job to clear any leftovers as well.
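Such a cron entry could look like this (the schedule and retention period here are hypothetical, not from the original post):

```shell
# Every day at 03:00, delete PDFs in /tmp older than 7 days
0 3 * * * find /tmp -name '*.pdf' -mtime +7 -delete
```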
Running this daily means that pins I processed yesterday could be processed again today. Since the local PDF files are deleted periodically, I need a different way to ensure I don't create duplicates.
The Pinboard API returns a fair amount of information on each pin:
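A single pin comes back looking something like this (field values are illustrative, but the field names match the Pinboard posts API):

```json
{
  "href": "http://www.example.com/undocumented-dos-commands",
  "description": "Undocumented DOS Commands",
  "extended": "",
  "meta": "92959a96fd69146c5fe7cbde6e5720f2",
  "hash": "8e004e01e04c3b7b22b45eeb0f06c185",
  "time": "2018-06-29T20:30:00Z",
  "shared": "yes",
  "toread": "no",
  "tags": "topdf dos retro"
}
```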
Most notable here is hash. Hashes are unique fingerprints, or signatures, so no two different URLs will give the same hash. Each time the program runs, the previous day's hashes are loaded as an array from a JSON file. The new pins pulled from the API are checked against this array: anything not yet processed has a PDF generated and uploaded, while anything already in the array is added to a new array, which we'll call new_hashes. If a URL gets all the way through and is successfully uploaded, it too is added to new_hashes, and this array is then converted to JSON and overwrites the previous day's hashes. If a URL is not successful, that is, it fails at either the PDF or upload stage, it will not be added to the new_hashes array and so will be processed again the next day.
As an early adopter I managed to snag a free 50 GB account on box.com which I've never really found a use for, so this was ideal for me. I'll cover uploading the files there in a different post, as it won't apply to everyone, but you can find tutorials elsewhere online for uploading to Dropbox, OneDrive, Google Drive, S3, etc.