Matt Daum, Author at {5} Setfive - Talking to the World

Have you ever tried pitching an upgrade to management? Odds are, you probably didn’t find yourself walking away with a blank check. Maybe you’re a network administrator for a real estate company whose boss doesn’t understand why the cheaper network infrastructure isn’t always the best option for scalability; or maybe you need to request an upgrade for an application that takes up a significant amount of your time every day to troubleshoot because it’s incompatible with other operating systems. The conversation goes something like:

You: “Boss, we really need to upgrade [xyz] software package.”

Them: “Why do we need the upgrade? If it ain’t broke, don’t fix it.”

Your: “Well, it’s creating a number of issues for our team. The manufacturer no longer supports the version we use, because it’s been obsolete for 10 years. Whenever an issue comes up we have to come up with a workaround.”

Them: “How much is the upgrade?”

You: “It’ll be $ X for a shared license for all team members.”

Them: “I just don’t think we don’t have the money in the budget for that kind of upgrade. We have a lot more pressing projects requiring capital right now, and I can’t see us justifying that expense to our board.”

Such a request may not be well received because of difference in perception of the situation –of the cost-reward assessment of the solution. The management team may not speak the same language, so to speak, as the technical support or engineers, so it’s crucial to put the request into terms they will understand and listen to. Better yet, frame that request as an offer.

Here’s three ways here that you can sell to that point, in language even your boss can understand.

1. Security

Technology has never had an obsolescence rate as fast as it is today. Failure to keep up to date is not just a matter of having the best and newest techy toys, though; it can lead to a security breach of personal information (like the Target PIN data breach of 2013), stolen identities and stolen money.

Cyber security expenses are perhaps the hardest sell to make, considering failure to upgrade presents a latent risk rather than an active one. It works until it doesn’t. Earlier in 2020, the stock market witnessed a reactionary boost in security spending in companies like FireEye, after celebrities like Elon Musk’s Twitter accounts were hacked.

As an engineer pitching an upgrade to management, convey the risk of not getting it and the potential fallout.

2. Developer productivity

A software version upgrade can be money in the bank if it saves man hours by making tasks less labor intensive and more efficient. Use terms like “faster,” “leaner,” or the military favorite “force multiplier.” If you’ve got estimates on time allotted to a given project that can be broken down into hourly direct labor savings, that’s always a great selling point.

3. Hiring and Retention

The success of any project depends not just on the tools, but on the people using those tools; to a large degree, the more cutting edge your tools are, the more cutting edge the people in your employ will be. When it comes to hiring and retaining employees, one deciding factor will surely be how modern your operating environment is.

If your team uses obsolete tools, it may even be more difficult to find someone with that skill. If you use Python 2 instead of Python 3, the syntax and many features are quite different, but all modern users are taught the most recent version, so it will present a small challenge to hire someone who’s willing to use (or learn) an obsolete version of that language.

Whatever the case, it’s safe to say meeting the bottom line is among any company’s top priorities, when it comes to spending. The more you can appeal to that end and sell a tangible ROI for the cost of the upgrade, the more likely you are to hear a ‘yes.’

Another tip is to present multiple options: a good, better, and best option with #1 being the most expensive. People are often more likely to choose to do something when it is presented as one of multiple options than alone.

When Amazon Web Services rolled out their version 4 signature we started seeing sporadic errors on a few projects when we created pre-authenticated link to S3 resources with a relative timestamp. Trying to track down the errors wasn’t easy. It seemed that it would occur rarely while executing the same exact code. Our code was simply to get a pre-authenticated URL that would expire in 7 days, the max duration V4 signatures are allowed to be valid. The error we’d get was “The expiration date of a signature version 4 presigned URL must be less than one week”. Weird, we kept passing in “7 days” as the expiration time. After the error occurred a couple of times over a few weeks I decided to look into it.

The code throwing the error was located right in the SignatureV4 class. The error is thrown when the end timestamp minus the start timestamp for the signature was greater than a week. Looking through the way the timestamps were generated it went something like this:

Generate the start timestamp as current time for the signature assuming one is not passed.
Do a few other quick things not related to this problem.
Do a check to insure that the end minus start timestamp is less than a week in seconds.

So a rough example with straight PHP could of the above steps for a ‘7 days’ expiration would be as follows:

Straight forward enough, right? the problem lies when a second “rolls” between generating the `$start` and the end timestamp check. For example, if you generate the `$start` at `2017-08-20 12:01:01.999999`. Let’s say this gets assigned the timestamp of `2017-08-20 12:01:01`. Then the check for the 7 weeks occurs at `2017-08-27 12:01:02.0000` it’ll throw an exception as duration between the start and end it’s actually now for 86,401 seconds total. It turns outs triggering this error is easier than you’d think. Run this script locally:

That will throw an exception within a few seconds of running most likely.

After I figured out the error, the next step was to submit a issue to make sure I’m not misunderstanding how the library should be used. The simplest fix for me was to generate the end expiration timestamp before generating the start timestamp. After I made the PR, Kevin S. from AWS pointed out that while this fixed the problem, the duration still wasn’t guaranteed to always be the same for the same relative time period. For example, if you created 1000 presigned URLs all with ‘+7 days’ as the valid period, some may be 86400 in duration others may be 86399. This isn’t a huge problem, but Kevin made a great point that we could solve the problem by locking the relative timestamp for the end based on the start timestamp. After adding that to the PR it was accepted. As of release 3.32.4 the fix is now included in the SDK.

On one of our projects that I am working on I had the following problem: I needed to create an aggregate temporary table in the database from a few different queries while still using Doctrine2. I needed to aggregate the results in the database rather than memory as the result set could be very large causing the PHP process to run out of memory. The reason I wanted to still use Doctrine to get the base queries was the application passes around a QueryBuilder object to add restrictions to the query which may be defined outside of the current function, every query in the application goes through this process for security purposes.

After looking around a bit, it was clear that Doctrine did not support (and shouldn’t support) what I was trying to do. My next step was to figure out how to get an executable query from Doctrine2 without ever running it. Doctrine2 has a built in SQL logger interface which basically lets you to listen for executed queries and to see what the actual SQL and parameters were for the executed query. The problem I had was I didn’t want to actually execute the query I had built in Doctrine, I just wanted the SQL that would be executed via PDO. After digging through the code a bit further I found the routines that Doctrine used to actually build the query and parameters for PDO to execute, however, the methods were all private and internalized. I came up with the following class to take a Doctrine Query and return a SQL statement, parameters, and parameter types that can be used to execute it via PDO.

In the ExampleUsage.php file above I take a query builder, get the runnable query, and then insert it into my temporary table. In my circumstance I had about 3-4 of these types of statements.

If you look at the QueryUtils::getRunnableQueryAndParametersForQuery function, it does a number of things.

First, it uses Reflection Classes to be able to access private member of the Query. This breaks a lot of programming principles and Doctrine could change the interworkings of the Query class and break this class. It’s not a good programming practice to be flipping private variables public, as generally they are private for a reason.
Second, Doctrine aliases any alias you give it in your select. For example if you do “SELECT u.myField as my_field” Doctrine may realias that to “my_field_0”. This make it difficult if you want to read out specific columns from the query without going back through Doctrine. This class flips the aliases back to your original alias, so you can reference ‘my_field’ for example.
Third, it returns an array of parameters and their types. The Doctrine Connection class uses these arrays to execute the query via PDO. I did not want to reimplement some of the actual parameters and types to PDO, so I opted to pass it through the Doctrine Connection class.

Overall this was the best solution I could find at the time for what I was trying to do. If I was ok with running the query first, capturing the actual SQL via an SQL Logger would have been the proper and best route to go, however I did not want to run the query.

Hope this helps if you find yourself in a similar situation!

On a project we were working on recently it appeared that we had data coming into our Extract, Transform, Load (ETL) processes which should have been filtered out. In this particular case the files which we imported only would exist at max up to 7 days and on any given day we’d have tens of thousands of files that would be created and imported. This presented a difficult problem to trace down if something inside our ETL had gone awry or if we were being fed bad data. Furthermore as the files always would be deleted after importing we didn’t keep where a data point was created from.

Instead of updating our ETL process to track where a specific piece of data originated from we wanted to basically ‘grep’ the files in S3. After looking around it doesn’t look like anyone has built a “Grep for S3”, so we built one. The reason we didn’t simply download the files locally and then process them one at a time is it’d take forever to transfer, then grep each one individual sequentially. Instead we wanted to do the search in parallel and not hold the entire files on the local disk.

With this we came up with our simple S3Grep java app (a pre-built jar is located in the releases) which will search all files in a specific bucket for a specific string. It currently supports both regex or non-regex search strings. You can specify how many threads you want it to use to process the files or it by default will try to use the same number of CPU’s on your machine. It utilizes the S3 Java adapter to read the files as a stream rather than a single transfer, than read from disk. Using the tool is very simple:

A the s3grep.properties file is a config file where you setup what you are searching for. An example:

For the most part this is self explanatory. The log level will default to INFO, however if you specify DEBUG it will output some more information such as what file’s it is currently checking. The logger_pattern parameter defaults to “%d{dd MMM yyyy HH:mm:ss} [%p] %m%n” and can be any pattern you want. For more information on the formatting visit the PatternLayout Documentation.

The default output format would look something like this:

If you want a little less verbose and more of just log lines you can update the logger_pattern to be just %m%n and end up with something similar to:

The format of the output is FILE:LINE_NUMBER:matching_string.

Anyways hope this helps you if you are trying to hunt down what file contains a text string in your S3 buckets. Let us know if you have any questions or if we can help!

In my last post I talked about setting up Symfony2 entities for translation and integrating it with Sonata Admin. One of the trickier parts of moving from a non-translatable entity to a translatable one is the migration of your data.

To understand some of the complexities with the migration you must understand the changes to the database that occur when taking an entity from being a regular entity to a translatable one. Any columns that are translatable will now live on a separate table and the old column is no longer used. Let’s use the following pre-translation entity DB schema as an example:

For this entity we’ll make visible_label translatable, following the instructions in my previous post. This will result in the following final schema:

The column “visible_label” has moved from the regular entity table to the entity’s translation table. If you had data in the visible_label previously it would be lost as that column no longer exists. Since we had tons of data in our case this wasn’t acceptable.

To make sure we didn’t lose data, we did the translatable migration in two stages. First, we kept the columns we were translating in the original entity and only removed the getters and setters. The reason we removed the getter and setters is we wanted to utilize the magic __call() method so it would return values from the translatable entity. All that was left was the original column declaration. At first it seemed like making the column variable public for the time being would be a quick and easy solution, then run a script that reads the public variable and migrates it to the translation. The problem with this approach is Twig will read out the public variable rather than calling through the __call() method to the translatable entity. Since we were testing at the same time as trying to build the migration, we needed the tests to access the translatable entity and not the old public variable. We ended up using Reflection Classes and keeping the column declared as a private. With reflection you can change properties to be accessible outside of the class even though they are declared private. For example:

By using the reflection we’re able to access the original “visible_label” column and migrate the data to the translation entity. We built similar routines for each of the entities that we had to migrate. After the migration and everyone confirmed that the live site was functioning properly, we removed the translated columns from the original entity and database.

By taking this two staged approach we were able to move to translatable entities while not losing any data in the migration. In our case we also marked (//START TRANS, //END TRANS) on each entity the start of translatable columns and end so that we could use sed to go through all of them and remove the old columns once the migration was finished.

Happy translating!

Author: Matt Daum

Are you struggling to pitch management on an upgrade?

An Edge Case of Time in AWS PHP SDK

Doctrine2 QueryBuilder Executable SQL Without Running The Query

S3Grep – Searching S3 Files and Buckets

Symfony2 – Moving to Translatable Entities