Solving Internal Problems - API Documentation

This is case #2 in my series on solving internal problems to become a more valuable developer.

Case #2: API Documentation

At my current job, we are contracted to develop an internal frontend application for a Fortune 500 company. The employees of this company must interact with several applications to do their jobs. The application I lead frontend development for aims to alleviate that: to give employees one place to go to do their work. The application then communicates with those different applications behind the scenes via api requests. The backend is composed of several different expressjs applications that are all brought together behind a single webserver, so we, the frontend team, just need to make our requests against a single http port.

Example:

  • https://<current-domain>:8080/api/foo -> gets proxied to the "foo" service
  • https://<current-domain>:8080/api/bar -> gets proxied to the "bar" service

Each of these api services is developed by a different team within our client company.
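
We don't control the client's webserver, but conceptually the proxy layer behaves something like this sketch using Express and http-proxy-middleware (the service hosts and ports here are made up for illustration):

import express from 'express'
import { createProxyMiddleware } from 'http-proxy-middleware'

const app = express()

// forward /api/foo/* to the "foo" service and /api/bar/* to the "bar" service
app.use('/api/foo', createProxyMiddleware({ target: 'http://foo-service:3001', changeOrigin: true }))
app.use('/api/bar', createProxyMiddleware({ target: 'http://bar-service:3002', changeOrigin: true }))

app.listen(8080)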

Emulation

Developing within this company's network requires VPN access, and that comes with a whole host of challenges. Instead, we've set up a mock expressjs server to develop against locally. This removes any dependency on their backend development environment. All we need is api documentation. We don't need to wait for these apis to actually be developed and deployed. We can just take the api documentation they've provided and create mock endpoints and responses within our express app to develop against. This is really nice because we have complete control over everything and can develop against various responses for any given endpoint (think error responses).
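
For example, a mock endpoint just hardcodes a response lifted from the docs, and exercising an error case means swapping the hardcoded body (the path and payloads below are made up for illustration):

import express from 'express'

const app = express()

// happy-path response copied from the api docs we were given
app.get('/foo-service/widgets', (req, res) => {
  return res.json([
    { id: 1, name: 'Widget A' },
    { id: 2, name: 'Widget B' },
  ])

  // to develop against an error response, swap in a different hardcoded body, e.g.:
  // return res.status(500).json({ errors: [{ detail: 'Something went wrong' }] })
})

app.listen(8080)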

Receiving api documentation

We were (and still are) given "api documentation" in pretty much any and all formats:

  • emails
  • .txt documents
  • Word documents
  • PowerPoints
  • Slack messages

Because the backend consists of several teams maintaining their own api services, there is no "standard" way of giving us api documentation. We have attempted to get this client to provide api documentation in a specific format, but because we're just a vendor, we really have no control over them. We just take what we're given and put it in our express app. Because of this, and the fact that they do not maintain any sort of central api documentation, our mock express app has become the only central "api documentation" in existence for their api services.

The problems

At the time, our mock express app contained ~700 unique endpoints. There were a few problems I noticed:

Discarding request payloads

When adding mock POST/PUT endpoints, we were just making them respond with the expected success response, totally disregarding the request payloads.

app.post('/thing-service/thing', (req, res) => {
  // the request payload (req.body) is never even looked at
  return res.json({message: "Created thing successfully!"})
})

After we developed the frontend to send the expected payload based on the api docs we were sent, that request documentation was essentially gone. We'd have to search for when and where those api docs were sent if we ever needed to view them again.

The next problem was that the only "documentation" we had was our express app code, which was very hard to search, both because of the size it had grown to and because there was no way to exclude search terms from the hardcoded responses. We didn't really have a good way to search on urls. If we searched for "thing", that could show up in url patterns or, more commonly, in the hardcoded responses, making it really difficult to find anything.

Clumsy and tedious

If we wanted to test various responses from a specific endpoint, we'd have to go find it and then hardcode in a new response. This is a very manual and tedious task. Often we would end up losing the original response, which was still valid, because we'd overwrite it to test a new one.

The Solution

I wanted a way to:

  • keep various potential responses
  • keep request POST/PUT payloads
  • easily send/receive api documentation
  • easily explore all ~700 endpoints
  • keep our mock express app

Route files

That's when I took inspiration from Swagger (which I find very confusing to get started with).

My first thought was to identify the most relevant pieces of data for any given endpoint:

  • url pattern
  • http method
  • request payloads - if applicable
  • query params - if applicable
  • responses
    • success responses
    • error responses

Luckily this is all stuff that can be encapsulated in a json file, which I know our backend api counterparts are comfortable with (as opposed to something like yaml).

{
  "title": "Create thing for username",
  "path": "/thing-service/:username/thing",
  "params": {
    "username": {
      "value": "johndoe",
      "help": "The owner's username"
    }
  },
  "method": "POST",
  "payload": {
    "thing": {
      "name": "Foobar",
      "color": "Blue"
    }
  },
  "responses": [
    {
      "status": 200,
      "response": {
        "message": "Thing created successfully!"
      }
    },
    {
      "status": 400,
      "response": {
        "errors": [
          {
            "detail": "Invalid color submitted"
          }
        ]
      }
    }
  ]
}

I could define several of these route files and then write code in my express app to:

  • glob a certain directory for these route files
  • parse them into javascript objects
  • iterate over them and create functioning endpoints for each

import fs from 'fs'
import glob from 'glob'
import express from 'express'

const app = express()

app.get('/foo/bar', (req, res) => {
  return res.json({foo: "bar"})
})

// all the other ~700 routes

// find every route file and parse it into a javascript object
const routes = glob
  .sync('api-routes/**/*.json')
  .map((file) => JSON.parse(fs.readFileSync(file, 'utf8')))

for (const route of routes) {
  // express method names are lowercase ("post"), while route files use "POST"
  app[route.method.toLowerCase()](route.path, (req, res) => {
    // default to the first documented response
    const firstResponse = route.responses[0]
    return res.status(firstResponse.status).json(firstResponse.response)
  })
}

const PORT = process.env.PORT || 8080
app.listen(PORT, function () {
  console.log('Dev Express server running at localhost:' + PORT)
})

This was great! I instantly solved several problems. I communicated to my backend counterparts that I would strongly prefer they document their apis in these json route files. All I had to do was take the route files, dump them in my api-routes directory, and they'd instantly be available to develop against!

UI for the route files

I still wanted to solve searching and make these route files easier to read, though. That's when I decided this documentation needed a user interface. The initial idea was pretty simple:

  • add an api endpoint to the mock server that
    • gets a list of existing endpoints on the express app instance (method and url pattern)
    • gets all routes from files
    • combines those into a single list of known routes with as much information as possible
    • responds with that list of routes (a rough sketch of this endpoint follows this list)
  • create a react app to consume that endpoint and render documentation for the routes
  • serve that react app on a mock server endpoint
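
Here's what that route-listing endpoint could look like. It leans on Express 4's internal app._router.stack (not a public API) and assumes the app and routes variables from the earlier snippet, so treat it as an illustrative sketch rather than headlamp's actual implementation:

// assumes `app` and `routes` from the earlier mock-server snippet
app.get('/__headlamp/routes', (req, res) => {
  // everything actually mounted on this express instance (skips plain middleware layers)
  const mounted = app._router.stack
    .filter((layer) => layer.route)
    .map((layer) => ({
      path: layer.route.path,
      methods: Object.keys(layer.route.methods).map((m) => m.toUpperCase()),
    }))

  // everything we know about from the route files on disk
  const documented = routes.map((route) => ({
    title: route.title,
    path: route.path,
    method: route.method,
    params: route.params,
    payload: route.payload,
    responses: route.responses,
  }))

  return res.json({ mounted, documented })
})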

This whole thing seemed like it would shed light on my existing express app, so it seemed appropriate to call it "headlamp". Link to repo here. It started to take shape. Because I had a complete list of all the routes defined in the frontend, it was trivial to add searching by url pattern and http method.

[screenshot]

Clicking on an endpoint would expand it to show everything known about the endpoint...along with a few other added features...

[screenshot]

Some features I added:

  • ability to actually submit the request and see the response in the UI and in the browser's network tab
  • ability to toggle different responses directly from the UI
    • we can now toggle the error response on and test that in our application UI
  • ability to submit and activate ad hoc json responses
    • makes it incredibly easy to debug things if we know a certain production response is causing an issue
  • ability to search url patterns by omitting the route param names if you can't remember them
    • searching "/thing-service/:/thing" would find "/thing-service/:username/thing"
  • attempts to find where this endpoint is being used in the source code (see "References" section in the screenshot above)
  • shows the location of the route file if this endpoint was created from one
  • responds with headers to show
    • what route file is responsible for this request
    • a link to where you can view this request's UI documentation

[screenshot] (Disregard the fact that the link points to a different port than what's in the browser url. This is just a result of running headlamp in development mode to hack on it. It points to the same port when used normally.)

  • ability to use .js route files and create dynamic responses. See third code snippet here. (A hypothetical illustration follows this list.)
  • ability to expand the references to view the actual source code of where this endpoint is used

[screenshot]

  • ability to upload HAR files if you need to emulate a specific sequence of responses actually encountered in production
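
To illustrate the idea of a dynamic .js route file, a route module could export the same fields as the json files but compute its response from the incoming request. This is a hypothetical shape for illustration only; the real format is documented in the headlamp repo:

// hypothetical .js route file: computes its response from the incoming request
export default {
  title: 'Create thing for username',
  path: '/thing-service/:username/thing',
  method: 'POST',
  response: (req) => ({
    message: `Created thing for ${req.params.username}`,
  }),
}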

Result

This case of scratching our own itch has been immensely helpful and has made our development experience so much nicer and more productive than manually maintaining an express app.

Our client now asks us how they can run this to view documentation for their own apis.

Is the headlamp code the best, most organized, most test-driven code out there? Absolutely not, nor does it need to be. This was an internal tool developed to make our own lives easier. It is serving its purpose extremely well for us. I haven't needed to touch it in almost 2 years, and we've used it every day since its inception.

Looking for ways to keep your work interesting while improving your productivity at the same time? Take the time to assess your current development challenges. View them as opportunities to come up with effective solutions. I firmly believe this is a solid approach to proving your worth and advancing your career.

Fun fact: At the time of writing this our mock express app now has 1298 unique endpoints.

Link to headlamp

Career advice and server monitoring

I came across a tweet the other day that had some career advice that struck a chord with me. Unfortunately, I can't remember who wrote it so I can't find it. It listed a few bullet points to help in advancing your career. One of the bullet points was something along the lines of:

Look for places to work that give you enough agency to actually make changes.

This alone has been what has given me the most satisfaction at work. It allows me to solve new problems, make a tangible impact, and often explore new technologies. Not only is this immensely fulfilling to me, but it's also what I think has made me a valuable part of my team.

I've always considered myself to be entrepreneurial, and although I'm still waiting on my big break with my side hustles, this has often helped scratch that itch. Lots of businesses are born out of solving pain points for people, and this is very much like that.

In a series of posts, I'd like to share some of what I've done to solve problems where I work. This will be an outlet to list some accomplishments I can look back on, but hopefully it will also serve as inspiration and a source of ideas for you to impact your own work environment.

Case #1: Server Monitoring

This is the story of how and why I wrote a little Go application.

At work, we've been slowly transitioning more of our application servers from Windows to Linux boxes. As part of this migration, we've needed to find new tools to help manage and monitor these servers.

I helped set up, and now maintain, some of our system monitoring tools for this initiative. We use Grafana in conjunction with Prometheus and Prometheus's node exporter to monitor server metrics:

  • CPU usage
  • RAM usage
  • Disk space
  • etc

We use Alertmanager to send us Slack messages whenever problems arise.

Wait, what happened?

This has worked really well...while we're at our desks. If an alert occurred, I could ssh into the offending server and run a series of commands based on the alert type to figure out what was going on. However, sometimes we'd get alerts in the middle of the night that would resolve themselves before we were able to manually check them out. This was both worrisome and frustrating. We couldn't catch the problem in the act.

After encountering various alerts over time, I settled into a bit of a routine for discovering what was going on. If the server was encountering a CPU or RAM spike, I would log in and use ps to view the currently running processes and find the culprit(s). If it was a disk space alert, I knew which directories on certain servers were likely to be the cause, so I would cd to those and use the du command to see their sizes. This was all a manual process that I had to perform as soon as we got the alert. I wanted to automate it.

Choosing the tech

Prometheus's node exporter works by being an app that runs on every server you're monitoring. It exposes metrics, via an http endpoint, for your Prometheus instance to scrape. This seemed like a good model to use for my purposes. I wanted to be able to hit a port over http and have it respond with the output of those commands I run based on the alert type, so I just needed to settle on a language. We have several Linux servers that take on a wide variety of roles: nginx proxy, api server, static asset server, docker host, etc. I couldn't count on these various servers having any one language installed, which meant I either had to:

  • pick a language and install that on all of our servers
  • pick a language that could be compiled to a single executable

Because I mostly do frontend React work these days, I was excited to finally find a use case for Deno, a javascript runtime that compiles to a single executable file. I ran into two issues with Deno, though.

Deno wasn't going to work for me, so I settled on Go, which I had messed around with a couple years back and which seemed perfect for my use case. It compiles to a single executable, works on CentOS, and the executables are relatively small. What I ended up developing compiled to a 7 MB executable.

Architecture

I wanted to write an http application which accepts an alert name via query parameter, executes a predetermined set of commands based on that alert name, and outputs the result of those commands.

http://myserver.acme.com:5050/report?alertName=RAM Used

I wanted the commands to be configurable per server. Depending on the server, I would want to look in different places for a Disk Space alert. For a database server, I would want to check where that data is being stored. For a box serving as an nginx proxy, I would want to check the logs directory. These directories aren't all applicable for every server. For this, I made the Go application accept a cli argument that would be the path to a json configuration file, which would contain the list of commands to run for each alert name. I made sure the code reads this file on every http request, rather than reading it once at startup and keeping it in memory, so I can update the config file without having to bring the app down or redeploy it. This would be deployed and running on all the same servers that Prometheus is monitoring.
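
The exact config file isn't shown here, but conceptually it maps each alert name to the commands to run on that particular server. A hypothetical example (the commands and paths are made up for illustration):

{
  "RAM Used": [
    "ps aux --sort=-%mem | head -n 15"
  ],
  "CPU Used": [
    "ps aux --sort=-%cpu | head -n 15"
  ],
  "Disk Space": [
    "du -sh /var/log/nginx",
    "du -sh /var/lib/docker"
  ]
}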

Once I had this little "resource reporter" app with a single endpoint working, I moved on to the next step, which was the trigger. Luckily, Alertmanager can be configured to call a generic webhook when alerts occur. This would obviously be my trigger. Rather than develop a whole separate Go application for this, I decided to just add it to my existing reporter app.

I wrote a new endpoint in the same app that would:

  • accept the incoming alerts from Alertmanager
  • determine which server triggered the alert (based on the Alertmanager payload; a trimmed example follows this list)
  • make an http request to the resource reporter app running on the alerting server
    • which would report back the outputs of the predetermined commands
  • send those command outputs to our company's devops Slack channel
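
For reference, the Alertmanager webhook payload looks roughly like this trimmed-down example (values made up); the alerting server can be pulled from the instance label:

{
  "status": "firing",
  "commonLabels": {
    "alertname": "RAM Used"
  },
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "RAM Used",
        "instance": "myserver.acme.com:9100"
      },
      "annotations": {
        "summary": "RAM usage above threshold"
      },
      "startsAt": "2021-01-01T03:12:00Z"
    }
  ]
}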

So now my little resource reporter app has two endpoints:

  • /report?alertName=<alertName> - run commands and respond with the outputs of each
  • /webhook - Alertmanager webhook, which calls the /report endpoint for other servers

I compiled this, deployed it as a Linux service on all of our servers, configured Alertmanager to use it as a webhook, and it's been working beautifully.

Remember when I said things would spike the CPU in the middle of the night? Turns out it was our anti-virus conducting a scan. Luckily we can configure when those scans occur and can ignore certain alerts during specific times in Alertmanager.

Take-aways

Sometimes I'll look at a problem involving a new piece of tech and think, for some reason, that I'm bound by what other people have already built. "I can't find anything someone else has built to do what I want, so I guess it can't be done." This attitude is absurd. We are the builders.

I also find that I try to be too much of a purist and get hung up on "good" architecture. "This thing that I'm building needs to be infinitely flexible to accommodate every scenario"...even though I know I won't run into that scenario. I considered introducing an event dispatcher in my little application so I could add other ways to be notified: email, text, discord, etc. That was way too complicated for what I needed now. It reminds me of some advice I've heard regarding purchasing power tools:

Buy the cheap version first. If you find yourself using it a ton, then upgrade.

The advice I have to repeat to myself:

Build the version I need now first because, more often than not, it will suffice and I can move on to another problem to solve.

My advice for developers looking to advance their career: look for pain points within your company to solve. Take the initiative to attempt to solve those...and then blog about it :)

Bonus section

Docker has been a game changer for me. Not only does it allow me to package up applications but it's also a stress-free way to explore new technologies. There are a ton of applications that now have Getting Started instructions that use Docker. This negates the need to install and configure every little thing to get something up and running. That work has already been baked into a Docker image for you. I don't need to muddy up my system on something I might not end up using. When using Docker, if I find that a piece of tech isn't quite right for whatever reason, I simply stop and delete the container and image and I'm back to where I was with no remnants left on my machine.

Before installing Grafana and Prometheus on a company server, I spun up local docker containers running these apps to get familiar with them.

If you're at all interested in dev-ops I would strongly encourage you to get familiar with Docker. It has made exploring cheap and easy which has empowered me to discover new ways to improve our systems.

Time Machine for Developers

The problem

Time Machine will back up your files. For a developer with loads of node_modules and vendor directories across projects, this is a huge time suck and a waste of space, since those directories can be restored with npm, yarn, composer, etc.

The solution

Directories can be excluded from Time Machine using the cli.

Adding exclusions

First, verify that you can get a list of directories that you want to exclude:

cd "$HOME/code" && find $(pwd) \
    -maxdepth 3 \
    -type d \( -name vendor -o -name node_modules \)

The first part of the above assumes you keep your code in a code directory in your home directory. You'll need to change that if that's not the case for you.

Executing this command should output a list of vendor and node_modules directories. These are what you want excluded from Time Machine.

Next, we can extend the above command to pass each of those directories to the Time Machine utility, tmutil, to be excluded.

cd "$HOME/code" && find $(pwd) \
    -maxdepth 3 \
    -type d \( -name vendor -o -name node_modules \) \
    -prune \
    -exec tmutil addexclusion {} \; \
    -exec tmutil isexcluded {} \;

Viewing exclusions

You can see all excluded files/directories with the following command:

sudo mdfind "com_apple_backup_excludeItem = 'com.apple.backupd'"

You can see if a specific file or directory is excluded with:

tmutil isexcluded /some/file/path.txt

Verify that the directories we excluded earlier are in fact excluded:

cd "$HOME/code" && find $(pwd) \
  -maxdepth 3 \
  -type d \( -name vendor -o -name node_modules \) \
  -exec tmutil isexcluded {} + | grep -F "[Excluded]"

Remove exclusions

Or remove all the exclusions we added:

cd "$HOME/code" && find $(pwd) \
  -maxdepth 3 \
  -type d \( -name vendor -o -name node_modules \) \
  -prune \
  -exec tmutil removeexclusion {} \; \
  -exec tmutil isexcluded {} \;

Automate

We can automate this with a cron job, so if we create a new project we don't have to worry about manually adding the exclusion.

Create a bash file that will run the addexclusion commands:

#!/usr/bin/env bash

cd "$HOME/code" && find $(pwd) \
  -maxdepth 3 \
  -type d \( -name vendor -o -name node_modules \) \
  -prune \
  -exec tmutil addexclusion {} \; \
  -exec tmutil isexcluded {} \;

Remember to change the $HOME/code to wherever you keep your projects. I put this at $HOME/code/machine-utils/time-machine-exclusions.sh.

Make sure it's executable:

chmod +x $HOME/code/machine-utils/time-machine-exclusions.sh

Add this as a cronjob to run every hour:

  • crontab -e
  • press i to go into "insert" mode
  • paste the following line (Change according to where you saved the file on your machine. The full path is required.):
    • 0 * * * * /Users/davidadams/code/machine-utils/time-machine-exclusions.sh
  • press esc to exit "insert" mode
  • type :x to save and exit

Tags: Mac, Time Machine