One of our clients recently had a unique use case. They had a Wiki site where they wanted to restrict viewing of posts to only their app’s authorized users. Picture something like a SaaS app where the Wiki site had proprietary content that our client only wanted paying users to access.
The two obvious options to implement this would be:
- Create a Wiki user for each authorized user – this has the downside that we’d need to maintain two accounts, figure out how to keep users logged into both, and deal with synchronizing account data.
- Modify the Wiki’s application code to authorize the users in some fashion – this is problematic because it would make upgrading the Wiki software difficult.
Turns out there’s a third option which is much smoother! Nginx has a directive called auth_request which allows nginx to authorize access to a resource based on the result of a second, internal HTTP subrequest.
The way it works is:
- Your SaaS app is set up at platform.setfive.com where users are authenticated by a Symfony application.
- You configure your Symfony application to send a cookie back with a wildcard domain of “.setfive.com”
- Your wiki is running at wiki.setfive.com and configured to authorize requests to platform.setfive.com/is-authenticated
- Now, when users request wiki.setfive.com their browser will send your Symfony authentication cookie, nginx will make a request to platform.setfive.com/is-authenticated, and if they’re authenticated they’ll be granted access to your wiki.
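On the Symfony side, the wildcard cookie from the steps above might be configured like this. This is a sketch, not the exact config from the project — the file path and key names vary a bit by Symfony version:

```yaml
# config/packages/framework.yaml
framework:
    session:
        # Send the session cookie for every *.setfive.com subdomain,
        # so wiki.setfive.com requests carry it too
        cookie_domain: '.setfive.com'
```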
The nginx config for this is pretty straightforward as well. One thing to note: the auth_request module isn’t included in the standard nginx build, so on Ubuntu you’ll need to install the nginx-extras package to enable it.
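A minimal sketch of that config — the hostnames match the example above, but the upstream port and header names are illustrative:

```nginx
server {
    server_name wiki.setfive.com;

    location / {
        # Every request must pass the auth subrequest first
        auth_request /auth;
        # If authorized, proxy through to the wiki app
        proxy_pass http://127.0.0.1:8080;
    }

    location = /auth {
        internal;
        # nginx forwards the user's cookies with this subrequest;
        # a 2xx response grants access, a 401/403 denies it
        proxy_pass https://platform.setfive.com/is-authenticated;
        # The subrequest only needs headers, not the request body
        proxy_pass_request_body off;
        proxy_set_header Content-Length "";
        proxy_set_header X-Original-URI $request_uri;
    }
}
```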
Posted In: Tips n' Tricks
Phew! Been a while but we’re back!
NOTE: There’s a working Spring Boot application demonstrating this at https://github.com/Setfive/spring-demos
For many applications a security and authentication scheme centered around users makes sense, since the focus of the application is logged-in users taking some sort of action. Imagine a task tracking app: users “create tasks”, “complete tasks”, etc. For these use cases, Spring Boot’s Security system makes it easy to add application security, which then provides a “User” model to the rest of the application. This allows your code to do things like “getUser()” in a Controller and have ready access to the currently authenticated user.
But what about applications that don’t have a user-based model? Imagine something like an API which provides HTML to PDF conversions. There’s really no concept of “Users” but rather a need to authenticate that requests are coming from authorized partners via something like an API key. So from an application perspective you don’t really want to involve the user management system, there are no passwords to verify, and obviously the simpler the better.
It turns out it’s very straightforward to accomplish this with a Spring-managed Filter. Full code below:
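A sketch of what such a filter could look like — the ApiKeyRepository, its existsByKey method, and the X-API-KEY header name are illustrative assumptions, and these are the javax.servlet-era imports used by Spring Boot 2:

```java
import java.io.IOException;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.springframework.stereotype.Component;
import org.springframework.web.filter.OncePerRequestFilter;

// Registered as a Spring component so dependencies can be injected
@Component
public class ApiKeyFilter extends OncePerRequestFilter {

    // Hypothetical repository used to check keys against the database
    private final ApiKeyRepository apiKeyRepository;

    public ApiKeyFilter(ApiKeyRepository apiKeyRepository) {
        this.apiKeyRepository = apiKeyRepository;
    }

    @Override
    protected boolean shouldNotFilter(HttpServletRequest request) {
        // Only guard routes under /api; everything else skips the filter
        return !request.getRequestURI().startsWith("/api");
    }

    @Override
    protected void doFilterInternal(HttpServletRequest request,
                                    HttpServletResponse response,
                                    FilterChain chain)
            throws ServletException, IOException {
        final String key = request.getHeader("X-API-KEY");
        if (key == null || !apiKeyRepository.existsByKey(key)) {
            // Missing or invalid key: reject with a 401
            response.sendError(HttpServletResponse.SC_UNAUTHORIZED, "Invalid API key");
            return;
        }
        chain.doFilter(request, response);
    }
}
```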
The code is pretty straightforward but a couple of highlights are:
- It’s a Spring Component so that you can inject the repository that you need to check the database to see if the key is valid
- It’s setup to only activate on URLs which start with “/api” so your other routes wont need to include the Key header
- If the key is missing or invalid it correctly returns a 401 HTTP response code
That’s about it! As always questions and comments welcome!
Posted In: General
One of Setfive’s New Years Resolutions is to prioritize our internal marketing. In establishing online presence, an initial project included refreshing the @setfive Twitter following list. To do so, we built a list of target accounts that we wanted to follow and then started searching for tools to automate the following. After some research, it appeared the only existing tools were paid with weird, and “sketchy,” pricing models. So, we decided to look at using the Twitter API to implement this list ourselves.
As we started looking at the API, we learned you need to be approved by Twitter to use the API. In addition, you need to implement OAuth to get tokens for write actions on behalf of a user, like following an account. We were only planning to use this tool internally once, so we decided to avoid the API and just automate browser actions via Puppeteer. For the uninitiated, Puppeteer is a library that allows developers to programmatically control Google Chromium, which is Chrome’s open source cousin.
Puppeteer ships as an npm package, so getting started is really just an “npm install” and you’re off to the races. The Puppeteer docs provide multiple examples, so I was able to whip up what we needed in a handful of lines of code (see below). Overall, the experience was positive and I’d be happy to use Puppeteer again.
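A sketch along those lines — the accounts list is a placeholder and the follow-button selector is a guess you’d need to verify, since Twitter’s markup changes frequently:

```javascript
// Sketch: follow a list of Twitter accounts via browser automation
const puppeteer = require('puppeteer');

const accounts = [
  'https://twitter.com/nodejs',
  'https://twitter.com/github',
];

(async () => {
  // headless: false lets you watch it work (and log in manually first)
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();

  for (const url of accounts) {
    await page.goto(url, { waitUntil: 'networkidle2' });

    // Click the profile's "Follow" button; this selector is an
    // illustrative guess at Twitter's markup, not a stable API
    const followButton = await page.$('[data-testid$="-follow"]');
    if (followButton) {
      await followButton.click();
    }
  }

  await browser.close();
})();
```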
So why would Puppeteer be interesting to your business?
In 2019 APIs are popular: many businesses provide access to data and actions through programmatic means. However, not all of them do, and that’s where Puppeteer provides an advantage. For example, many legacy insurance companies only provide quotes after you fill out a web form, which you’d normally have to complete manually. If you automated that process with Puppeteer, you’d be able to process quotes at a much faster rate, 24 hours a day, giving yourself a competitive advantage.
Posted In: General
At the beginning of last year we tried something different and deployed an application on Amazon’s Elastic Container Service (ECS). This is in contrast to our normal approach of deploying applications directly on EC2 instances. From a devops perspective, using ECS involved two challenges: working with Docker containers and using ECS as a service. We’ll focus on ECS here since enough ink has been spilled about Docker.
For a bit of background, ECS is a managed service that allows you to run Docker containers on AWS cloud infrastructure (EC2s). Amazon extended the abstraction with “Fargate for ECS”, which launches your Docker containers on AWS-managed EC2s so you don’t have to manage or maintain any underlying hardware. With Fargate you define a “Task” consisting of a Docker image, a number of vCPUs, and an amount of RAM, which AWS uses to launch a Docker container on an EC2 that you don’t have any access to. And then naturally if you need to provision additional capacity you can just tick the task count from 1 to 2 and AWS will launch an additional container.
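A Fargate task definition capturing those three pieces might look like the fragment below — the family name, account ID, and sizes are illustrative, and fields like the execution role are trimmed for brevity:

```json
{
  "family": "demo-api",
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "cpu": "512",
  "memory": "1024",
  "containerDefinitions": [
    {
      "name": "demo-api",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/demo-api:latest",
      "portMappings": [{ "containerPort": 8080 }],
      "essential": true
    }
  ]
}
```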
The app we deployed on ECS is one we inherited from another team. The app is a consumer-facing Vert.x (Java) web app that provides a set of API endpoints for content-focused consumer sites. Before taking over the app we had learned that it had some “unique” (aka bugs) scaling requirements, which was one of the motivators to use ECS. On the AWS side, our setup consisted of a Fargate ECS cluster connected to an application load balancer (ALB) which handled SSL termination and health checks for the ECS Tasks. In addition, we connected CircleCI for continuous integration and continuous deployment. We’re using ecs-deploy to handle CD on ECS. ecs-deploy handles creating a new ECS task definition, bringing up a new container with that definition, and cycling out the old container if everything goes well. So, a year in, here are some takeaways from using Fargate ECS.
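An ecs-deploy invocation from a CI step might look something like this — the cluster, service, and image names are illustrative, and $CIRCLE_SHA1 assumes CircleCI’s built-in commit variable is used as the image tag:

```shell
# Roll the service to the image CI just pushed to ECR
ecs-deploy --cluster demo-cluster \
           --service-name demo-api \
           --image 123456789012.dkr.ecr.us-east-1.amazonaws.com/demo-api:$CIRCLE_SHA1 \
           --timeout 300
```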
Reinforces cattle, not pets
There’s a cloud devops mantra that you should treat your servers like a herd of cattle, not the family pets. The thinking being that, especially for horizontally scalable cloud servers, you want the servers to be easy to bring up and not “special” in any way. When using EC2s you can convince yourself that you’re adopting this philosophy but eventually something will leak through. Sure you use a configuration management tool but during an emergency someone will surely manually install something. And your developers aren’t supposed to use the local disk but at some point someone will rely on files in “/tmp” always being there.
In contrast, deploying on ECS reinforces thinking of individual servers as disposable because the state of your container is destroyed every time you launch a new task. So each time you deploy new code, a new container is launched without retaining any previous state. At the server level, this dynamic actually makes ad hoc server changes impossible: you can’t SSH into a Fargate container, so any change has to be present in your Dockerfile. Similarly, at the app level, since your disks don’t persist between deployments you’d quickly stop writing anything important just to disk.
When using Fargate on ECS, the only way to access output from your container is through a CloudWatch log group, viewed in the CloudWatch UI. At first glance this is great: you can view your logs right in the AWS console without having to SSH into any servers! But as time goes on you’ll start to miss being able to see and manipulate logs in a regular terminal.
The first stumbling block is actually the UI itself. It’s not uncommon for it to be a couple of minutes delayed, which ends up being a significant pain point when “shit is broken”. Related to the delays, sometimes logs are available in the ECS Task view before they show up in CloudWatch, which ends up confusing members of the team as they debug issues.
Additionally, although the UI has search and filtering capabilities, they’re fairly limited and difficult to use. Compounding this, there frustratingly isn’t an easy way to download the log files locally, which makes it difficult to parse and analyze logs with the usual Linux command line tools. It is possible to export your CloudWatch logs to S3 via the console and then download them locally, but the process involves a lot of clicks. You could automate this via the API, but it feels like something you shouldn’t have to build since, for example, the load balancer delivers its logs into S3 automatically.
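A sketch of automating that export with the AWS CLI — the log group, bucket, and timestamps are illustrative, and --from/--to are epoch milliseconds:

```shell
# Kick off an export of a CloudWatch log group to an S3 bucket
aws logs create-export-task \
    --log-group-name /ecs/demo-api \
    --from 1546300800000 \
    --to 1546387200000 \
    --destination my-log-bucket \
    --destination-prefix demo-api

# Poll the task until it completes, then pull the files down locally
aws logs describe-export-tasks --task-id <task-id-from-above>
aws s3 sync s3://my-log-bucket/demo-api ./logs
```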
You’ll (probably) still need an EC2
The ECS/container dream is that you’ll be able to run all of your apps on managed, abstract infrastructure where all you have to worry about is a Dockerfile. This might be true for some people but for typical organizations you’re probably going to need to run an EC2. The biggest pain points for us were running scheduled tasks (crontabs) and having the flexibility to run ad hoc commands inside AWS.
It is possible to run scheduled tasks on ECS but after experimenting with it we didn’t think it was a great fit. Instead, the approach we took was setting up Jenkins on an EC2 to run scheduled jobs which consisted of running some command in a Docker container. So ultimately our scheduled jobs shared Docker images with our ECS tasks since the images are hosted by AWS ECR. Because of this, the same CircleCI build process that updates the ECS task will also update the image that Jenkins runs so the presence of Jenkins is mostly transparent to developers.
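A Jenkins “Execute shell” step along those lines might look like the sketch below — the region, account ID, image, and job script are illustrative, and older AWS CLI versions use `aws ecr get-login` instead of `get-login-password`:

```shell
# Authenticate the Jenkins box's Docker daemon against ECR
aws ecr get-login-password --region us-east-1 \
  | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com

# Run the scheduled job using the same image the ECS tasks use
docker pull 123456789012.dkr.ecr.us-east-1.amazonaws.com/demo-api:latest
docker run --rm 123456789012.dkr.ecr.us-east-1.amazonaws.com/demo-api:latest \
  ./run-nightly-job.sh
```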
Not having an “inside AWS” environment to run commands is one of the most limiting aspects of using ECS exclusively. At some point an engineer on every team is going to find themselves needing to run a database dump or analyze some log files both of which will simply be orders of magnitude faster if run within AWS vs. over the internet. We took a pretty typical approach here with an EC2 configured via Ansible in an autoscale group.
Is it worth it?
After a year with ECS+Fargate I think it’s definitely a good solution for a set of deployment scenarios. Specifically, if you’re dealing with running a dynamic set of web frontends that are easily containerized it’ll probably be a great fit. The task scaling is “one click” (or API call) as advertised and it feels much snappier than bringing up a whole EC2, even with a snapshotted AMI. One final dimension to evaluate is naturally cost. ECS is billed roughly at the same rate as a regular EC2 but if you’re leaving tasks underutilized you’ll be paying for capacity you aren’t using. As noted above, there are some operational pain points with ECS but on the whole I think it’s a good option to evaluate when using AWS.
Posted In: Amazon AWS