Local Development With Subdomains on WSL2

I'm writing a webapp that loads content based on the subdomain. Developing this locally is problematic because you can't use a subdomain with http://localhost:8080.

I'm using WSL2 on Windows 10 and can therefore leverage the hosts file, usually found in C:\Windows\System32\drivers\etc\hosts. By adding these lines to it, I can create a domain with two subdomains locally and point them both at my WSL instance's IP address:

172.19.38.94     test-site.mysite.local
172.19.38.94     no-site.mysite.local
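
To find the WSL instance's IP address in the first place, running this inside the WSL shell should print it (on Ubuntu at least):

hostname -I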

After making any changes to the hosts file, it is important to flush the DNS cache by running the following in PowerShell:

ipconfig /flushdns

Visiting those URLs in my browser, with the port appended (e.g. http://test-site.mysite.local:8080), I can see my webapp running on WSL.

In JavaScript I can read the subdomain with the following:

String(window.location.host).split(`.`)[0]
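
From there the app can decide what to load. A minimal sketch using the two subdomains from my hosts file (the actual content-loading is only hinted at):

const subdomain = String(window.location.host).split(`.`)[0];

switch (subdomain) {
  case `test-site`:
    // Load this subdomain's content, e.g. fetch(`/content/${subdomain}`)
    document.title = `Test Site`;
    break;
  case `no-site`:
  default:
    document.title = `No Site Found`;
    break;
}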

JavaScript Railway Board

Using the Transport API I thought it might be fun to build a small web-based board that looked and felt like the electronic display boards seen at British railway stations.

Future plans would be to use a small screen attached to a Raspberry Pi Zero and 3D print a desktop standing sign.

I miss the 1990s

Pinboard bookmark to PDF with NodeJS and Puppeteer

I often bookmark articles, explanations, tutorials, etc. with the plan to one day come back and read them. Sadly, by the time I do, there are usually a few dead links, lost forever.

Recently I found myself frustrated at the sheer number of bookmarks in my browser and decided a bookmarking service was needed. In the past, I have used Delicious and even rolled my own, but I had also previously been a paying customer of Pinboard, and after some searching decided it best suited my needs. Those needs being tags, a browser plugin, and an API.

After reactivating my account I went through the ~100 bookmarks I added back in 2015. Removed any dead links, discovered some forgotten gems, and deleted the no longer relevant. Finally, I imported all ~650 of my Firefox bookmarks and set about sorting those. I hope to keep the tags to a minimum, favouring search over an abundance of tags.

To mitigate the loss of anything I found particularly interesting or important, such as this list of Undocumented DOS Commands, I wanted to be able to tag a link with topdf and have the site automatically printed to a PDF file and stored in Dropbox or Google Drive.

The first port of call was IFTTT (If This Then That) where I set up a pathway from my Pinboard account to a service called PDFmyURL. This worked great, but it turned out they don't like free users printing to PDF using automation, and I wasn't about to pay $19 a month for the privilege.

But I'm a professional software engineer! It shouldn't be too hard to roll my own solution...

Looking up the Pinboard API I found an endpoint that would return the most recent pins (that's what they call bookmarks) for a given tag, in my case topdf:

https://api.pinboard.in/v1/posts/recent

I had already used Puppeteer in a solution for a client looking to generate and print labels from a web interface, so I knew it could handle converting a URL to a full PDF document. Puppeteer is a Node library, which settled the question of which language to use.

Turns out there's a neat library on NPM for Pinboard called node-pinboard. I also chose to use slugify to convert the page titles to clean filenames, and something called scroll-to-bottomjs, which I'll get to later:

npm i --save puppeteer node-pinboard slugify scroll-to-bottomjs

Since Puppeteer is a pretty heavyweight tool, featuring a headless but otherwise fully functioning Chrome browser and V8 JavaScript runtime, I had to install some dependencies to get things working:

sudo apt install ca-certificates fonts-liberation gconf-service libappindicator1 libasound2 libatk-bridge2.0-0 libatk1.0-0 libc6 libcairo2 libcups2 libdbus-1-3 libexpat1 libfontconfig1 libgbm1 libgcc1 libgconf-2-4 libgdk-pixbuf2.0-0 libglib2.0-0 libgtk-3-0 libnspr4 libnss3 libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 lsb-release wget xdg-utils

Getting the bookmarks

The first thing to do is grab our bookmarks:

const Pinboard = require(`node-pinboard`).default;

const pinboard  = new Pinboard(<YOUR_PINBOARD_API_TOKEN>);

function getFromPinboard() {
  pinboard.recent({ tag : `topdf` })
    .then((data) => {
      if(data.posts.length === 0){ return process.exit(0); }  // If nothing, exit
      const promises = [];
      data.posts.forEach((p) => {
        promises.push(generatePDF(p.href, p.description));
      });
      if(promises.length === 0){ process.exit(0); }  // If nothing, exit
      return Promise.all(promises);
    })
    .then((result) => {
      console.log(result);  // An array of generated filenames (null for any that failed)
    })
    .catch((error) => {
      console.error(error);
    });
}

The above code grabs all the recent pins tagged topdf. Looping through each one, we send both its description (the title of the pin, which I may have altered from the default page title) and its href, or URL, to our generatePDF function. By pushing the returned promises onto an array and using Promise.all we can run them all in parallel, eventually getting back an array of stored filenames.

Generating PDFs

const Puppeteer      = require(`puppeteer`);
const ScrollToBottom = require(`scroll-to-bottomjs`);
const Slugify        = require(`slugify`);

async function generatePDF(url, title) {
  try {
    const browser = await Puppeteer.launch({ headless : true, args : ['--no-sandbox'] }); // Puppeteer can only generate pdf in headless mode.
    const page = await browser.newPage();
    await page.goto(url, { waitUntil : 'networkidle2', timeout : 0 }); // timeout : 0 disables the network timeout
    const filename = `${Slugify(title)}.pdf`.toLowerCase();
    const pdfConfig = {
      path            : `/tmp/${filename}`, 
      format          : 'A4',
      printBackground : true
    };
    await page.emulateMedia('screen');
    await page.evaluate(ScrollToBottom);
    await page.pdf(pdfConfig);
    await browser.close();
    return filename;
  } catch(e) {
    console.error(e);
    return null;
  }
}

The above function creates an instance of Puppeteer, loads the URL, uses Slugify to create a clean filename, then creates a PDF before returning the filename. The line await page.evaluate(ScrollToBottom); scrolls the page to the bottom before rendering the PDF, which ensures lazy-loaded images are rendered. I've chosen to store the PDF files in /tmp to ensure they will be deleted at some point. I did end up creating a cron job to clear any .pdf files once a week.
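
Something along these lines in the crontab does the job; the exact schedule and seven-day age threshold here are arbitrary:

0 3 * * 0   find /tmp -name "*.pdf" -mtime +7 -delete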

Preventing duplicates

Running this daily means that pins I processed yesterday could be processed again today, and because the local PDF files are deleted periodically I can't simply check whether a file already exists; I need a different way to ensure I don't create duplicates.

The Pinboard API returns a fair amount of information on each pin:

{
  href: 'https://developer.nvidia.com/video-encode-decode-gpu-support-matrix',
  description: 'Video Encode and Decode GPU Support Matrix | NVIDIA Developer',
  extended: '',
  meta: '39ca3d2e5174865610eff3f2f1b83970',
  hash: 'c31630a3d99256a99a3718f10e7ecf37',
  time: '2020-05-31T11:23:43Z',
  shared: 'no',
  toread: 'no',
  tags: 'ffmpeg encoding'
},

Most notable here is hash. Hashes are unique fingerprints, or signatures, so no two different URLs will give the same hash. Each time the program runs, the previous day's hashes are loaded as an array from a JSON file. The new pins pulled from the API are checked against this array: anything not yet processed has a PDF generated and uploaded, while anything already in the array is added to a new array, which we'll call new_hashes.

If a URL gets all the way through and is successfully uploaded, it too is added to new_hashes, and this is then converted to JSON and overwrites the previous day's hashes. If a URL is not successful, that is, it fails at either the PDF or upload stage, it will not be added to the new_hashes array and so will be processed again on the subsequent day.
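
A rough sketch of that logic; generatePDF is the function above, while uploadPDF and the hashes.json path are placeholders for the upload step and wherever the hash file lives:

const fs = require(`fs`);

const HASH_FILE = `hashes.json`;  // Placeholder path for the previous run's hashes

async function processPins(posts) {
  const oldHashes = fs.existsSync(HASH_FILE)
    ? JSON.parse(fs.readFileSync(HASH_FILE, `utf8`))
    : [];

  // Anything already handled on a previous run goes straight into the new list
  const newHashes = posts
    .filter((p) => oldHashes.includes(p.hash))
    .map((p) => p.hash);

  // Everything else gets a PDF generated and uploaded, in parallel
  const fresh   = posts.filter((p) => !oldHashes.includes(p.hash));
  const results = await Promise.all(fresh.map(async (p) => {
    const filename = await generatePDF(p.href, p.description);
    const uploaded = filename && await uploadPDF(filename);  // uploadPDF is a placeholder
    return uploaded ? p.hash : null;  // Only remember pins that made it all the way through
  }));

  results.filter(Boolean).forEach((h) => newHashes.push(h));

  // Overwrite yesterday's hashes with today's successful set
  fs.writeFileSync(HASH_FILE, JSON.stringify(newHashes));
}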

Uploading

As an early adopter I managed to snag a free 50GB account on box.com, which I've never really found a use for, so this was ideal for me. I'll go into uploading the files there in a different post, as it won't apply to everyone, but you can find tutorials elsewhere online for uploading to Dropbox, OneDrive, Google Drive, S3, etc.

JWTs and refresh tokens

A few notes on JWTs and refresh tokens whilst they're fresh in my mind, as I am likely to forget in the months between touching the relevant systems.

JWTs

JWTs (JSON Web Tokens) carry information necessary to access a resource directly. They are short-lived. They can be verified without needing to be checked against a database (for example).
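
A quick illustration using the jsonwebtoken package (my choice here, nothing special about it):

const jwt = require(`jsonwebtoken`);

const ACCESS_SECRET = `change-me`;  // Illustrative signing secret; keep the real one out of source control

// Issue a short-lived access token
const accessToken = jwt.sign({ sub : `user-42` }, ACCESS_SECRET, { expiresIn : `15m` });

// Any later request carrying the token can be checked with nothing but the secret
const payload = jwt.verify(accessToken, ACCESS_SECRET);  // Throws if expired or tampered with
console.log(payload.sub);  // user-42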

Refresh Tokens

Refresh tokens carry information to get new access tokens. They are usually just large strings of random characters.

How does it work?

  1. The client sends credentials (email, password) to the server.
  2. If valid, the server creates a JWT and a refresh token and sends them back to the client, along with the JWT expiry time.
  3. The client stores the JWT in memory and the refresh token in an HttpOnly cookie.
  4. Before the JWT expires, the refresh token is sent to the server to obtain a new JWT. The refresh token could also be regenerated at this time.
  5. If at any point we do not have a JWT in memory we can use the refresh token to obtain a new one.
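
A rough sketch of how steps 2 and 4 might look on the server, using Express, cookie-parser, and jsonwebtoken (all illustrative choices, not a prescribed stack):

const express      = require(`express`);
const cookieParser = require(`cookie-parser`);
const crypto       = require(`crypto`);
const jwt          = require(`jsonwebtoken`);

const app = express();
app.use(express.json());
app.use(cookieParser());

const ACCESS_SECRET = `change-me`;  // Illustrative signing secret
const refreshStore  = new Map();    // Stand-in for wherever refresh tokens are persisted

function issueTokens(res, userId) {
  const accessToken  = jwt.sign({ sub : userId }, ACCESS_SECRET, { expiresIn : `15m` });
  const refreshToken = crypto.randomBytes(64).toString(`hex`);  // Just a large random string
  refreshStore.set(refreshToken, userId);
  res.cookie(`refresh_token`, refreshToken, { httpOnly : true, secure : true, sameSite : `strict` });
  res.json({ accessToken, expiresIn : 900 });  // The JWT and its expiry; the client keeps these in memory
}

// Steps 1-2: credentials come in, a JWT and a refresh token go out
app.post(`/login`, (req, res) => {
  // ...check req.body.email / req.body.password against the user store (omitted)...
  issueTokens(res, req.body.email);
});

// Step 4: swap a valid refresh token for a new JWT, rotating the refresh token as we go
app.post(`/refresh`, (req, res) => {
  const old = req.cookies.refresh_token;
  if(!refreshStore.has(old)){ return res.status(401).json({ message : `refresh_refused` }); }
  const userId = refreshStore.get(old);
  refreshStore.delete(old);
  issueTokens(res, userId);
});

app.listen(3000);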

Should the JWT expire without a new one being generated, we have two options. I am not sure which I prefer.

  1. Have the server use the refresh token to obtain a new JWT and send that back with whatever payload was expected. The client would check every response for a new JWT and store it.
  2. Send back a 401 with the message expired_token; the client then requests a new JWT using the refresh token and retries the request (sketched below).

Of course, if at any point the refresh token is refused, the user is logged out or refused access to the resource.
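
On the client, option 2 might look something like a small wrapper around fetch; the /refresh endpoint and response shape are the ones assumed in the server sketch above:

let accessToken = null;  // The JWT lives only in memory (step 3)

async function refreshAccessToken() {
  const res = await fetch(`/refresh`, { method : `POST`, credentials : `include` });  // Sends the HttpOnly cookie
  if(!res.ok){ throw new Error(`logged_out`); }  // Refresh token refused: treat the user as logged out
  accessToken = (await res.json()).accessToken;
}

async function apiFetch(url, options = {}) {
  const withAuth = () => fetch(url, {
    ...options,
    headers : { ...options.headers, Authorization : `Bearer ${accessToken}` }
  });

  if(!accessToken){ await refreshAccessToken(); }  // No JWT in memory? Get one first (step 5)

  let res = await withAuth();
  if(res.status === 401){  // Assume expired_token: refresh and retry once
    await refreshAccessToken();
    res = await withAuth();
  }
  return res;
}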

Best Practices