2023-04-23

On being an external developer

Project details:

Technical debt vs. being an external developer

The state in which the project got maneuvered into isn't even a big deal if you are an in-house developer and, of course, was not the intention from anyone wanting it to be like this in the first place, I surely don't want to blame anyone. It's just the regular mixture of being pressured developing new features and having no room to reevaluate technical decisions, doing this over nearly 8 years took its toll.

But from the inside you don't notice much pain, you have every tool you need available inside the intranet, every service works out of the box like it did for years now, no major infrastructure change is on the horizon, and the backlog is full of user-stories anyway, so why allocating time to change something that just works™️?

So as their working day was not really affected at all, as soon as we entered we got confronted with all sorts of issues that made working on the project quite challenging:

So what should we do? Luckily, there is DB2-LUW, a software that enables us to run a local DB2-Instance on our machines. Setting this up took me a lot of time and I don't know if I ever would call this task finished, because we still keep running in issues (2 years later).

I had to:

Problems

Here is just a list of memorable problems we had:

No support for M1 Macs

Even after 2 years after release no support for m1 macs, guess it'll never be there, but maybe emulation gets better in a few months.

There is a support thread inside the IBM Community-Forum about this topic.

Import time on x86
Import time on m1 macs

Current State (April 2023): It works with x86 Emulation, but it is terribly slow compared to native speed, so I have to run an additional mini pc at home that sips quite a lot of energy, quite unfavorable with today's energy prices, using another instance via VPN was impossible because the apps' performance suffered a lot with latency. There is a beta functionality right now to let Docker use Apples Rosetta-2 translation layer for x86-images, but that is currently not working for DB2-LUW:

Rosetta 2 Emulation Beta in Docker
..that is not working atm sadly

Incompatible queries between LUW and z/OS

Some SQL Queries are not 100% compatible, for example the "Inner Join" keyword is not allowed in recursive queries.

And the "Merge Into" Command fails on LUW if there is a "NOT ATOMIC CONTINUE ON SQLEXCEPTION" in it.

These 2 two errors were pretty easy to fix, but hard to spot because everything just crashes only on runtime, and we are still not sure if we found every compatibility issue just yet, what is pretty terrible from a workflow point of view: You actually want to work on a user story but worst case you have to troubleshoot yet another kind of obscure database-compatibility-error, and it happened a LOT.

Randomly corrupted archives after importing data

That error foreshadowed a little within the previous screenshots and was by far the hardest one to catch. At one point in time, a colleague contacted me because of this error:

DB2 Error -1448

That error appeared every time he used a specific feature of our app, I could not reproduce it with my local build, so I threw away his local docker infrastructure and rebuilt everything, but the error remained. So I did the same with my infrastructure and compared the log files:

Left: His log, Right: My log

Seems like that while importing our test-data, his importer chokes on some archives, looking at them with a hex-editor revealed that some files got corrupted.

Corrupted Data

We re-downloaded these exact archives and noticed that it happened again once the import finished. At first, I thought his HDD was broken, but I also looked at our docker-compose file again and noticed that the docker container had write access to these archives right now. I quickly changed the mount to be read-only, re-downloaded the archives once again and started over, no corrupted archives anymore.

Conclusion: Sometimes the importer destroys parts of archives as long as the process has write access to our files that we want to import, a "feature" that I certainly did not expect from an import tool. After that I wrote a few file validation and downloading scripts to be able to check pretty quickly if files were corrupted on our machines, turns out at least one archive was always corrupted on every developer machine and server-vm:

Corrupted Data

Til now I don't know what caused this behavior, if it is a bug, some kind of weird side effect or something else, I only know how to mitigate it by mounting everything read-only that is not supposed to get changed.

Our efforts to make the experience less catastrophic for externals

Switch over to JOOQ

JOOQ auto-generates boilerplate java code out a DDL and gives us a lot of compile-time-safety while writing queries. When the DDL changes, the boilerplate-code does also, and you get compile-time-errors everywhere you need to change your code. Without that, you are basically blind and have to rely on your unit- or integration-tests to make a good job of finding errors.

ScalikeJDBC doesn't have this functionality at all, what currently is one of our biggest problems. Doing large changes to the underlying database-structure is always coupled to do a lot of manual testing and refactoring, what costs us a lot of time and gives us nearly no freedom to refactor existing logic, code and data-structures.

Also, JOOQ provides some kind of "cross-database" support for databases, so we could use a MySQL or PostgreSQL database for our local machines and get rid of the horrible local db2 experience, but being database-agnostic prevents us from using currently needed database-specific feature-sets like temporal-tables from db2, or the merge-into statement. MySQL for example does not support any of it, PostgreSQL has extensions for it, but it would still be hard to set it up.

Furthermore, using JOOQ with DB2 as Database costs money because you need a corporate license, another hurdle you cross, e.g. you have to find some good arguments for your PO to justify the extra costs.

Also, the old tech-stack took its toll, there were no JOOQ-Autogen plugins available anymore for SBT that are compatible with Java 1.8. We could use older versions of it, but that would create other problems as well.

In the end we gave up the dream to have JOOQ inside the project because it would take too much time to move everything over, and the companies infrastructure does not allow it to move from DB2 to something else without that much friction anytime soon.

Trying a different Database

Same with JOOQ, too many incompatibilities and too little time to set up a database agnostic solution.

Introduce Database-Versioning with Flyway

We successfully did this with another MySQL-Database that got used inside the project, but for DB2 it won't happen. Firstly, we don't have write access to all schemas, and we decided to not allocate any resources to implement it there because the data-model is not supposed to change drastically anymore, so we are staying with the status-quo, but it would be so nice to have this for writing unit-tests for our DAOs.

tl;dr