Local Development With Subdomains on WSL2

I'm writing a webapp that loads content based on the subdomain. Developing this locally is problematic because you can't use a subdomain with http://localhost:8080.

I'm using WSL2 on Windows 10 and can therefore leverage the hosts file, usually found in C:\Windows\System32\drivers\etc\hosts. By adding these lines to it, I can create a domain with two subdomains locally and point them both at my WSL instance's IP address:

172.19.38.94     test-site.mysite.local
172.19.38.94     no-site.mysite.local
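
To find the WSL instance's IP address in the first place, running this inside the WSL shell should print it (on Ubuntu at least):

hostname -I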

After making any changes to the hosts file, it is important to flush the DNS cache by running the following in PowerShell:

ipconfig /flushdns

Visiting those URLs in my browser, with the port appended (e.g. http://test-site.mysite.local:8080), I can see my webapp running on WSL.

In JavaScript I can read the subdomain with the following:

String(window.location.host).split(`.`)[0]
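
From there the app can decide what to load. A minimal sketch using the two subdomains from my hosts file (the actual content-loading is only hinted at):

const subdomain = String(window.location.host).split(`.`)[0];

switch (subdomain) {
  case `test-site`:
    // Load this subdomain's content, e.g. fetch(`/content/${subdomain}`)
    document.title = `Test Site`;
    break;
  case `no-site`:
  default:
    document.title = `No Site Found`;
    break;
}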

JavaScript Railway Board

Using the Transport API I thought it might be fun to build a small web-based board that looked and felt like the electronic display boards seen at British railway stations.

Future plans would be to use a small screen attached to a Raspberry Pi Zero and 3D print a desktop standing sign.

I miss the 1990s

Pinboard bookmark to PDF with NodeJS and Puppeteer

I often bookmark articles, explanations, tutorials, etc. with the plan to one day come back and read them. Sadly, by the time I do, there are usually a few dead links, lost forever.

Recently I found myself frustrated at the sheer number of bookmarks in my browser and decided a bookmarking service was needed. In the past, I have used Delicious and even rolled my own, but I had also previously been a paying customer of Pinboard, and after some searching decided it best suited my needs. Those needs being tags, a browser plugin, and an API.

After reactivating my account I went through the ~100 bookmarks I added back in 2015. Removed any dead links, discovered some forgotten gems, and deleted the no longer relevant. Finally, I imported all ~650 of my Firefox bookmarks and set about sorting those. I hope to keep the tags to a minimum, favouring search over an abundance of tags.

To mitigate the loss of anything I found particularly interesting or important, such as this list of Undocumented DOS Commands, I wanted to be able to tag a link with topdf and have the site automatically printed to a PDF file and stored in Dropbox or Google Drive.

The first port of call was IFTTT (If This Then That) where I set up a pathway from my Pinboard account to a service called PDFmyURL. This worked great, but it turned out they don't like free users printing to PDF using automation, and I wasn't about to pay $19 a month for the privilege.

But I'm a professional software engineer! It shouldn't be too hard to roll my own solution...

Looking up the Pinboard API I found an endpoint that would return the most recent pins (that's what they call bookmarks) for a given tag, in my case topdf:

https://api.pinboard.in/v1/posts/recent

I had already used Puppeteer in a solution for a client looking to generate and print labels from a web interface, so I knew it could handle converting a URL to a full PDF document. Puppeteer is a Node library, which settled the question of which language to use.

Turns out there's a neat library on NPM for Pinboard called node-pinboard. I also chose to use slugify to convert the page titles to clean filenames, and something called scroll-to-bottomjs, which I'll get to later:

npm i --save puppeteer node-pinboard slugify scroll-to-bottomjs

Since Puppeteer is a pretty heavyweight tool, featuring a headless but otherwise fully functioning Chrome browser and V8 JavaScript runtime, I had to install some dependencies to get things working:

sudo apt install ca-certificates fonts-liberation gconf-service libappindicator1 libasound2 libatk-bridge2.0-0 libatk1.0-0 libc6 libcairo2 libcups2 libdbus-1-3 libexpat1 libfontconfig1 libgbm1 libgcc1 libgconf-2-4 libgdk-pixbuf2.0-0 libglib2.0-0 libgtk-3-0 libnspr4 libnss3 libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 lsb-release wget xdg-utils

Getting the bookmarks

The first thing to do is grab our bookmarks:

const Pinboard = require(`node-pinboard`).default;

const pinboard  = new Pinboard(<YOUR_PINBOARD_API_TOKEN>);

function getFromPinboard() {
  pinboard.recent({ tag : `topdf` })
    .then((data) => {
      if(data.posts.length === 0){ return process.exit(0); }  // If nothing, exit
      const promises = [];
      data.posts.forEach((p) => {
        promises.push(generatePDF(p.href, p.description));
      });
      if(promises.length === 0){ process.exit(0); }  // If nothing, exit
      return Promise.all(promises);
    })
    .then((result) => {
      console.log(result);  // An array of generated filenames (null for any that failed)
    })
    .catch((error) => {
      console.error(error);
    });
}

The above code grabs all the recent pins tagged topdf. Looping through each one, we send both its description (the title of the pin, which I may have altered from the default page title) and its href, or URL, to our generatePDF function. By pushing the returned promises onto an array and using Promise.all we can run them all in parallel, eventually getting back an array of stored filenames.

Generating PDFs

const Puppeteer      = require(`puppeteer`);
const ScrollToBottom = require(`scroll-to-bottomjs`);
const Slugify        = require(`slugify`);

async function generatePDF(url, title) {
  try {
    const browser = await Puppeteer.launch({ headless : true, args : ['--no-sandbox'] }); // Puppeteer can only generate pdf in headless mode.
    const page = await browser.newPage();
    await page.goto(url, { waitUntil : 'networkidle2', timeout : 0 }); // timeout : 0 disables the network timeout
    const filename = `${Slugify(title)}.pdf`.toLowerCase();
    const pdfConfig = {
      path            : `/tmp/${filename}`, 
      format          : 'A4',
      printBackground : true
    };
    await page.emulateMedia('screen');
    await page.evaluate(ScrollToBottom);
    await page.pdf(pdfConfig);
    await browser.close();
    return filename;
  } catch(e) {
    console.error(e);
    return null;
  }
}

The above function creates an instance of Puppeteer, loads the URL, uses Slugify to create a clean filename, then creates a PDF before returning the filename. The line await page.evaluate(ScrollToBottom); scrolls the page to the bottom before rendering the PDF, which ensures lazy-loaded images are rendered. I've chosen to store the PDF files in /tmp to ensure they will be deleted at some point. I did end up creating a cron job to clear any .pdf files once a week.
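
Something along these lines in the crontab does the job; the exact schedule and seven-day age threshold here are arbitrary:

0 3 * * 0   find /tmp -name "*.pdf" -mtime +7 -delete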

Preventing duplicates

Running this daily means that pins I processed yesterday could be processed again today, and because the local PDF files are deleted periodically I can't simply check whether a file already exists; I need a different way to ensure I don't create duplicates.

The Pinboard API returns a fair amount of information on each pin:

{
  href: 'https://developer.nvidia.com/video-encode-decode-gpu-support-matrix',
  description: 'Video Encode and Decode GPU Support Matrix | NVIDIA Developer',
  extended: '',
  meta: '39ca3d2e5174865610eff3f2f1b83970',
  hash: 'c31630a3d99256a99a3718f10e7ecf37',
  time: '2020-05-31T11:23:43Z',
  shared: 'no',
  toread: 'no',
  tags: 'ffmpeg encoding'
},

Most notable here is hash. Hashes are unique fingerprints, or signatures, so no two different URLs will give the same hash. Each time the program runs, the previous day's hashes are loaded as an array from a JSON file. The new pins pulled from the API are checked against this array: anything not yet processed has a PDF generated and uploaded, while anything already in the array is added to a new array, which we'll call new_hashes.

If a URL gets all the way through and is successfully uploaded, it too is added to new_hashes, and this is then converted to JSON and overwrites the previous day's hashes. If a URL is not successful, that is, it fails at either the PDF or upload stage, it will not be added to the new_hashes array and so will be processed again on the subsequent day.
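
A rough sketch of that logic; generatePDF is the function above, while uploadPDF and the hashes.json path are placeholders for the upload step and wherever the hash file lives:

const fs = require(`fs`);

const HASH_FILE = `hashes.json`;  // Placeholder path for the previous run's hashes

async function processPins(posts) {
  const oldHashes = fs.existsSync(HASH_FILE)
    ? JSON.parse(fs.readFileSync(HASH_FILE, `utf8`))
    : [];

  // Anything already handled on a previous run goes straight into the new list
  const newHashes = posts
    .filter((p) => oldHashes.includes(p.hash))
    .map((p) => p.hash);

  // Everything else gets a PDF generated and uploaded, in parallel
  const fresh   = posts.filter((p) => !oldHashes.includes(p.hash));
  const results = await Promise.all(fresh.map(async (p) => {
    const filename = await generatePDF(p.href, p.description);
    const uploaded = filename && await uploadPDF(filename);  // uploadPDF is a placeholder
    return uploaded ? p.hash : null;  // Only remember pins that made it all the way through
  }));

  results.filter(Boolean).forEach((h) => newHashes.push(h));

  // Overwrite yesterday's hashes with today's successful set
  fs.writeFileSync(HASH_FILE, JSON.stringify(newHashes));
}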

Uploading

As an early adopter I managed to snag a free 50GB account on box.com, which I've never really found a use for, so this was ideal for me. I'll go into uploading the files there in a different post, as it won't apply to everyone, but you can find tutorials elsewhere online for uploading to Dropbox, OneDrive, Google Drive, S3, etc.

JWTs and refresh tokens

A few notes on JWTs and refresh tokens whilst they're fresh in my mind, as I am likely to forget in the months between touching the relevant systems.

JWTs

JWTs (JSON Web Tokens) carry information necessary to access a resource directly. They are short-lived. They can be verified without needing to be checked against a database (for example).
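
A quick illustration using the jsonwebtoken package (my choice here, nothing special about it):

const jwt = require(`jsonwebtoken`);

const ACCESS_SECRET = `change-me`;  // Illustrative signing secret; keep the real one out of source control

// Issue a short-lived access token
const accessToken = jwt.sign({ sub : `user-42` }, ACCESS_SECRET, { expiresIn : `15m` });

// Any later request carrying the token can be checked with nothing but the secret
const payload = jwt.verify(accessToken, ACCESS_SECRET);  // Throws if expired or tampered with
console.log(payload.sub);  // user-42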

Refresh Tokens

Refresh tokens carry information to get new access tokens. They are usually just large strings of random characters.

How does it work?

  1. The client sends credentials (email, password) to the server.
  2. If valid, the server creates a JWT and a refresh token and sends them back to the client, along with the JWT expiry time.
  3. The client stores the JWT in memory and the refresh token in an HttpOnly cookie.
  4. Before the JWT expires, the refresh token is sent to the server to obtain a new JWT. The refresh token could also be regenerated at this time.
  5. If at any point we do not have a JWT in memory we can use the refresh token to obtain a new one.
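
A rough sketch of how steps 2 and 4 might look on the server, using Express, cookie-parser, and jsonwebtoken (all illustrative choices, not a prescribed stack):

const express      = require(`express`);
const cookieParser = require(`cookie-parser`);
const crypto       = require(`crypto`);
const jwt          = require(`jsonwebtoken`);

const app = express();
app.use(express.json());
app.use(cookieParser());

const ACCESS_SECRET = `change-me`;  // Illustrative signing secret
const refreshStore  = new Map();    // Stand-in for wherever refresh tokens are persisted

function issueTokens(res, userId) {
  const accessToken  = jwt.sign({ sub : userId }, ACCESS_SECRET, { expiresIn : `15m` });
  const refreshToken = crypto.randomBytes(64).toString(`hex`);  // Just a large random string
  refreshStore.set(refreshToken, userId);
  res.cookie(`refresh_token`, refreshToken, { httpOnly : true, secure : true, sameSite : `strict` });
  res.json({ accessToken, expiresIn : 900 });  // The JWT and its expiry; the client keeps these in memory
}

// Steps 1-2: credentials come in, a JWT and a refresh token go out
app.post(`/login`, (req, res) => {
  // ...check req.body.email / req.body.password against the user store (omitted)...
  issueTokens(res, req.body.email);
});

// Step 4: swap a valid refresh token for a new JWT, rotating the refresh token as we go
app.post(`/refresh`, (req, res) => {
  const old = req.cookies.refresh_token;
  if(!refreshStore.has(old)){ return res.status(401).json({ message : `refresh_refused` }); }
  const userId = refreshStore.get(old);
  refreshStore.delete(old);
  issueTokens(res, userId);
});

app.listen(3000);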

Should the JWT expire without a new one being generated, we have two options. I am not sure which I prefer.

  1. Have the server use the refresh token to obtain a new JWT and send that back with whatever payload was expected. The client would check every response for a new JWT and store it.
  2. Send back a 401 with the message expired_token; the client then requests a new JWT using the refresh token and retries the request (sketched below).

Of course, if at any point the refresh token is refused, the user is logged out or refused access to the resource.
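
On the client, option 2 might look something like a small wrapper around fetch; the /refresh endpoint and response shape are the ones assumed in the server sketch above:

let accessToken = null;  // The JWT lives only in memory (step 3)

async function refreshAccessToken() {
  const res = await fetch(`/refresh`, { method : `POST`, credentials : `include` });  // Sends the HttpOnly cookie
  if(!res.ok){ throw new Error(`logged_out`); }  // Refresh token refused: treat the user as logged out
  accessToken = (await res.json()).accessToken;
}

async function apiFetch(url, options = {}) {
  const withAuth = () => fetch(url, {
    ...options,
    headers : { ...options.headers, Authorization : `Bearer ${accessToken}` }
  });

  if(!accessToken){ await refreshAccessToken(); }  // No JWT in memory? Get one first (step 5)

  let res = await withAuth();
  if(res.status === 401){  // Assume expired_token: refresh and retry once
    await refreshAccessToken();
    res = await withAuth();
  }
  return res;
}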

Best Practices