Why I turned down an RPA project

For one of my former clients, I turned down an RPA project. I just couldn't see how the architecture and approach could result in a responsible piece of HIPAA software. The concept of an RPA makes me uncomfortable from an engineering perspective, let alone in a HIPAA/GDPR-regulated environment.

The security risks, PHI leakage, brittleness, silent failure modes, and the sheer number of (async) edge cases that would need to be considered made it all seem like a huge and costly challenge at best.

I just couldn't see how this was an appropriate direction when proper alternative solutions exist--or at least, when I thought they existed.

In the end, some legal insight from HIPAA experts helped me see that there is a way--though a long one. And as I found out later, RPAs might just be a necessary evil in the US.

What are RPAs?

RPA (Robotic Process Automation), in the form I'm talking about here, is basically software driving a web browser by itself. Not a desktop AI agent running your browser, not an AI let loose on the internet, just software running through (mostly) deterministic commands/flows in a web browser that it runs in the background.

The good: they're genuinely useful

The most canonical use case is automation. Have a flow you do every day that takes 10 minutes of clicking around a website and filling in a form? An RPA can just do that for you. You, or a programmer, can tell it how to navigate the website and fill in the form. Then you just give it the data it needs to fill in, and your time is freed up.

In healthcare specifically, it's frequently used to make one system effectively work together with another. Why this is so difficult in healthcare is a fascinating topic in and of itself, but in short: slow to non-existent adoption of standards and protocols. And even where they are present, the cost (in time and money) of certifying a connection often makes it unviable. When the clean route, a proper FHIR or HL7 integration, isn't on the table, RPAs offer a real, at least interim, solution.

The bad: the hack-job feel

So here's the problem with RPAs: they feel like a hack-job--circumventing the proper clean route with some patchwork. Any seasoned software engineer knows intuitively that work like that just leads to pain down the line: weekend call-ups, frequent maintenance, Star Wars intro-esque error logs, etc. But to list the concerns concretely, here are the three major areas that made me (and probably my professional liability insurer) squirm.

1. Security

Healthcare websites need an account and log-in. Having an RPA authenticate means storing a username and password, an inferior authentication method: a static, shared secret sitting in a vault somewhere. Compromise the store and you've handed over a full human account, especially since MFA (multi-factor authentication) often gets disabled to simplify these flows.

Another big issue is when the account belongs to a person in the company. This would violate HIPAA immediately, as you'd no longer be able to tell which actions were performed by the RPA and which by the account owner.

2. Brittleness

Unlike proper programmatic interfaces, websites change without notice or versioning; one button being moved could break the whole system. Or worse, the website change introduces a silent failure, putting corrupt/erroneous data into your system and/or theirs. And what happens when the RPA inevitably stops working at some point? You're back to where you started.

3. PHI-bleed

The RPA handles a bunch of PHI: not only the PHI you give it, but also the PHI it comes across as it navigates the platform. As more modern RPAs analyze screenshots and do semantic searches across pages (to make them less brittle), the image models used now also handle PHI. Most companies set up a BAA with the LLM provider, but the image model gets missed easily. The RPAs are also needlessly exposed to PHI that they're not directly working with. Say they need to find a patient in a list view: just to navigate to that patient, they'd see a whole list of other patients, with no regard for those patients' consent. In short, PHI bleeds around in RPAs.

The ugly: making it compliant

Now, they're still needed, and the above are real engineering challenges--non-trivial, but not impossible. Here are some of my hindsight findings on making an RPA a lot more mature.

[Image: connection pinboard meme]

1. Identity, ephemeral credentials & MFA

Get dedicated service accounts from the counterparty, ones that belong to the bot and aren't borrowed from an employee. This is an absolute must, perhaps the single most important thing. It keeps responsibility clear in the audit trail: every action the RPA takes is the RPA's, and you can prove it.

Getting one service account is the bare minimum, but ideally you get multiple to allow for privilege separation.

Store the credentials in a real secrets manager with rotation and least-privilege access, not just a config file. And yes, invest the effort to keep MFA enabled (some platforms even require it). If the platform supports TOTP, store the seed alongside the credentials and generate the code programmatically; you keep the second factor without a human in the loop. Though it's not true MFA, since both factors now live in the same store, it does add another layer of security.
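To make the TOTP part concrete, here is a minimal sketch of generating the code from a stored seed, following RFC 6238 with the common defaults (HMAC-SHA1, 30-second step, 6 digits). The function name `totp` and its parameters are my own; where the seed comes from (your secrets manager) is left out on purpose, and the counterparty may use different settings.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(seed_b32: str, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Return the RFC 6238 TOTP code for a base32 seed (HMAC-SHA1 assumed)."""
    # Seeds are often shown with spaces and without padding; normalise both.
    cleaned = seed_b32.replace(" ", "").upper()
    cleaned += "=" * (-len(cleaned) % 8)
    key = base64.b32decode(cleaned)

    # The moving factor is the number of whole time steps since the Unix epoch.
    now = time.time() if at is None else at
    counter = int(now // step)

    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()

    # Dynamic truncation (RFC 4226): the low nibble of the last byte picks the offset.
    offset = digest[-1] & 0x0F
    number = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF

    return str(number % 10 ** digits).zfill(digits)
```

The bot would call `totp(seed)` right before submitting the login form, so the code is never stored anywhere. Keep the seed out of logs and error messages, same as the password.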

Then of course, log everything the bot does on your side, immutably. If someone asks what touched a record and when, you should be able to answer down to the keystroke.
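One way to approximate "immutably" in application code is a hash-chained log, where every record commits to the one before it. This is a sketch under my own assumptions: the `AuditLog` class and its field names are hypothetical, and a hash chain only makes tampering detectable. For actual immutability you'd still want write-once storage underneath.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(prev_hash: str, entry: dict) -> str:
    # Canonical JSON so the same entry always hashes to the same value.
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only, tamper-evident log of bot actions (in-memory sketch)."""

    def __init__(self) -> None:
        self._records = []

    def append(self, actor: str, action: str, target_id: str,
               at: Optional[datetime] = None) -> dict:
        entry = {
            "at": (at or datetime.now(timezone.utc)).isoformat(),
            "actor": actor,          # the bot's dedicated service account
            "action": action,        # e.g. "fill", "click", "submit"
            "target_id": target_id,  # an opaque record ID, not PHI
        }
        prev = self._records[-1]["hash"] if self._records else GENESIS
        record = {"entry": entry, "prev": prev, "hash": _digest(prev, entry)}
        self._records.append(record)
        return record

    def verify(self) -> bool:
        """Recompute the chain; any edited or removed record breaks it."""
        prev = GENESIS
        for record in self._records:
            if record["prev"] != prev or record["hash"] != _digest(prev, record["entry"]):
                return False
            prev = record["hash"]
        return True
```

Calling `log.append("rpa-bot-1", "submit", "record-123")` after each action and running `log.verify()` periodically gives you a trail you can defend.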

2. Make it fail loud

A silent failure should really be public enemy #2 (safety first 🫡). Treat the RPA like a real system, not a script. Assert your assumptions before every action: 'is this the page and field I think it is?'. Consider the edge cases, especially race conditions--there will be many.

Verifying whether a write operation actually landed is well worth the extra effort. Validate the data going in and confirm it coming out.
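As a rough sketch of what "assert, act, confirm" can look like: the `Page` protocol below is a hypothetical, minimal stand-in for whatever your browser automation library exposes, and `FakePage` only exists to make the example runnable. The shape of `guarded_fill` is the point, not the names.

```python
from typing import Dict, Protocol


class Page(Protocol):
    """Hypothetical minimal browser surface; map it onto your automation library."""

    def url(self) -> str: ...
    def count(self, selector: str) -> int: ...
    def fill(self, selector: str, value: str) -> None: ...
    def read_value(self, selector: str) -> str: ...


class AssumptionFailed(RuntimeError):
    """Raised when the page is not in the state the bot expects."""


def guarded_fill(page: Page, expected_url_prefix: str, selector: str, value: str) -> None:
    # 1. Is this the page I think it is?
    if not page.url().startswith(expected_url_prefix):
        raise AssumptionFailed("unexpected page for this step")

    # 2. Is this the field I think it is? Zero or several matches both mean "stop".
    matches = page.count(selector)
    if matches != 1:
        raise AssumptionFailed(f"expected exactly one match for {selector!r}, found {matches}")

    page.fill(selector, value)

    # 3. Did the value actually land? The value itself stays out of the
    #    error message on purpose, since it may be PHI.
    if page.read_value(selector) != value:
        raise AssumptionFailed(f"read-back mismatch for {selector!r}")


class FakePage:
    """In-memory stand-in, used only to demonstrate the guard."""

    def __init__(self, url: str, fields: Dict[str, str]) -> None:
        self._url = url
        self.fields = dict(fields)

    def url(self) -> str:
        return self._url

    def count(self, selector: str) -> int:
        return 1 if selector in self.fields else 0

    def fill(self, selector: str, value: str) -> None:
        self.fields[selector] = value

    def read_value(self, selector: str) -> str:
        return self.fields[selector]
```

Reading the field back only proves the form holds the value. After submitting, the same comparison should be repeated against the re-opened record to confirm the write landed on their side.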

Especially for RPA, make sure you monitor it well: alert on failures, anomalies, and race conditions; basically any edge case needs to be thought through. Assume it will break, and design so that when it does, it hands off to a human.
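The hand-off itself can be very simple. A minimal sketch, assuming a hypothetical `HandoffQueue` that stands in for whatever ticketing or review queue you already have: the first failing step stops the task and escalates, and nothing after it runs.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class HandoffQueue:
    """Hypothetical stand-in for a real ticketing or human review queue."""

    items: List[dict] = field(default_factory=list)

    def escalate(self, task_id: str, step: str, error_type: str) -> None:
        self.items.append({"task_id": task_id, "step": step, "error_type": error_type})


def run_task(task_id: str, steps: List[Tuple[str, Callable[[], None]]],
             queue: HandoffQueue) -> bool:
    """Run steps in order; on the first failure, stop and hand off to a human."""
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            # Only the exception type is recorded: messages can contain PHI.
            queue.escalate(task_id, name, type(exc).__name__)
            return False  # no blind retries, no continuing half-done
    return True
```

Whether a step is safe to retry automatically is a per-step decision. For anything that writes, I'd default to stopping.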

3. Starve it of PHI

Obviously, give the RPA only the PHI it needs. But what gets missed surprisingly often: have BAAs with every external service involved, keep PHI out of logging, and avoid needless PHI exposure in dashboards or list views by navigating to URLs directly.
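For the logging part, a redacting filter makes a decent safety net. This is a sketch using Python's standard `logging` module; the patterns (and the `MRN` prefix format in particular) are illustrative assumptions, not a complete PHI detector. The primary control should still be to log opaque IDs and never PHI in the first place.

```python
import logging
import re

# Illustrative patterns only; tune these to the identifiers in your own data.
PHI_PATTERNS = [
    (re.compile(r"\bMRN[:#\s]*\d+\b"), "[MRN]"),       # assumed record-number format
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),   # US SSN shape
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "[DATE]"),  # ISO dates, e.g. a date of birth
]


class PhiRedactingFilter(logging.Filter):
    """Scrub known PHI shapes from a log record before any handler emits it."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the message with its arguments first, so PHI passed as
        # arguments is scrubbed too.
        message = record.getMessage()
        for pattern, label in PHI_PATTERNS:
            message = pattern.sub(label, message)
        record.msg = message
        record.args = None
        return True  # keep the (now redacted) record
```

Attach it with `handler.addFilter(PhiRedactingFilter())` on every handler that ships logs off the machine.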

Bonus points: redact or obfuscate PHI before passing it to a vision model by scrambling the text in PHI-containing components of the HTML.
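Here is a rough sketch of that idea on an HTML string, using only the standard library. It masks rather than literally scrambles: letters become `x`/`X` and digits become `0`, so the layout the vision model sees stays roughly the same. The `data-phi` marker attribute is an assumption of mine; on a third-party site you'd select the components by whatever classes or selectors they actually use, and in a real bot you'd apply the same masking to the live DOM before taking the screenshot.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def mask_text(text: str) -> str:
    """Replace letters and digits with placeholders, keeping length and punctuation."""
    out = []
    for ch in text:
        if ch.isdigit():
            out.append("0")
        elif ch.isalpha():
            out.append("X" if ch.isupper() else "x")
        else:
            out.append(ch)
    return "".join(out)


class PhiScrambler(HTMLParser):
    """Re-emit HTML, masking text inside elements that carry the marker attribute."""

    def __init__(self, marker_attr: str = "data-phi") -> None:
        super().__init__(convert_charrefs=True)
        self.marker_attr = marker_attr
        self.depth = 0   # > 0 while inside a marked element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        self.parts.append(self.get_starttag_text())
        if tag in VOID_TAGS:
            return
        if self.depth > 0 or any(name == self.marker_attr for name, _ in attrs):
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")
        if self.depth > 0 and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        text = mask_text(data) if self.depth > 0 else data
        self.parts.append(escape(text, quote=False))


def scramble_phi(html_text: str, marker_attr: str = "data-phi") -> str:
    parser = PhiScrambler(marker_attr)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)
```

Known limits of this sketch: it assumes well-formed markup, drops comments and doctypes, and would mangle inline script or style content. Masking also leaks the length of each value, which may or may not matter for your case.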

None of this is crazy out-of-the-box thinking in the HIPAA space. But RPAs need to carry extra weight and consideration, since it's such unorthodox software in this setting.

Conclusion

RPAs are here, and unfortunately I see them staying. Though we all want the simple, straightforward solution, in the current healthcare space we just need to go the long way around. But as we go that extra distance, it's important not to take shortcuts. RPAs are serious software solutions in HIPAA environments, not just simple scripts--and we need to treat them that way. They'll always be a bit of a hack-job, but they can be built responsibly.