Federal Court of Accounts (TCU) via Capgemini

Designing trust into a government-scale AI audit system

AI UX, Trust Architecture, Enterprise UX

I led the UX for an AI system that helped government auditors write complex legal instructions. By reshaping messy inputs and giving auditors full control to review and correct the AI's output, we cut writing time by 63%, turning months of work into hours. The system shipped, was adopted across teams, and won Best technology case in the public sector.

Year: 2017
Industry: Public sector / AI audit
Client Location: Remote / Distributed
Role: Senior UX Designer
Timeline: 2017

The Real Problem

The auditors did not trust the AI.

Auditors at Brazil's Federal Court of Accounts were spending months writing Audit and Merit Instructions: important legal documents sent to the Public Ministry. The Court wanted AI to speed this up.

But there was a bigger issue than speed: the auditors did not trust the AI.

These were senior experts who had done this work manually for decades. They were constitutionally skeptical. They would not sign their name to something they could not fully verify.

The challenge was not making the AI smarter. It was making the output trustworthy enough for guarded experts to actually use.

How I Approached It

The biggest trust lever was upstream.

I spent time observing how the auditors actually worked. I broke their process into small mental steps: what they looked at first, what made them pause, and what made them trust or reject a draft.

That is when I realized the biggest trust problem was not the AI model. It was the messy, inconsistent inputs the system was receiving. Different teams used different templates, so the AI produced unpredictable results.

My key decision: instead of trying to fix the output, I went upstream and reshaped the inputs.

We consolidated dozens of legacy templates into one clean, consistent pattern. The AI immediately started producing more predictable and accurate drafts. That single change became the biggest trust lever in the entire project.

Design Decisions

The AI should draft. The human should always decide.

Once the inputs were cleaner, I focused on giving auditors control and visibility. Every major decision made the AI easier to inspect, edit, and correct.

Decision 01

Reshape the inputs before polishing the output

Different teams used different legacy templates, so the AI received inconsistent inputs and produced unpredictable drafts.

Consolidated dozens of legacy templates into one clean, consistent instruction pattern before focusing on the generated output.

Auditors saw drafts that were more predictable, easier to compare, and easier to verify against their mental model of the work.

The AI became more trustworthy because the source material it saw became more structured and legible.
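The consolidation idea can be illustrated with a minimal sketch. This is not the project's actual code: the canonical fields (`subject`, `legal_basis`, `findings`) and the alias table are hypothetical, invented only to show how records from differently named legacy templates could be mapped onto one consistent pattern before they reach the model.

```python
from dataclasses import dataclass


# Hypothetical canonical pattern; the real field names are not given in this case study.
@dataclass(frozen=True)
class InstructionInput:
    subject: str
    legal_basis: str
    findings: str


# Hypothetical aliases that different legacy templates might use for the same field.
FIELD_ALIASES = {
    "subject": ("subject", "topic", "matter"),
    "legal_basis": ("legal_basis", "legal_grounds", "statute"),
    "findings": ("findings", "observations", "notes"),
}


def normalize(legacy: dict[str, str]) -> InstructionInput:
    """Map one legacy record onto the canonical pattern, failing loudly on gaps."""
    values = {}
    for canonical, aliases in FIELD_ALIASES.items():
        # Take the first alias that is present and non-empty after trimming.
        match = next(
            (legacy[a].strip() for a in aliases if legacy.get(a, "").strip()),
            None,
        )
        if match is None:
            raise ValueError(f"missing required field: {canonical}")
        values[canonical] = match
    return InstructionInput(**values)
```

Two teams that filled in different templates would end up with identical structured input, which is the property that made drafts comparable. Rejecting incomplete records, rather than guessing, is a design choice in this sketch, not something the source specifies.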

Decision 02

Make every AI-generated instruction editable

Senior auditors would not sign off on legal instructions they could not fully verify or correct.

Every AI-generated instruction remained fully editable, so auditors could change anything before accepting the draft.

The AI felt like a drafting partner instead of an authority. Auditors kept ownership of the final legal text.

The explicit right to correct lowered the risk of relying on a system they could not fully inspect internally.
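One way to picture "the AI drafts, the human decides" is as a small data structure. The sketch below is an assumption-laden illustration, not the shipped design: `InstructionDraft`, `edit`, and `accept` are hypothetical names. It keeps the model's original text untouched, records every human edit, and only marks a draft as accepted through an explicit human action.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InstructionDraft:
    ai_text: str                       # what the model proposed; never overwritten
    current_text: str = ""             # what the auditor would actually sign
    accepted_by: Optional[str] = None  # set only by an explicit human action
    edits: list[tuple[str, str]] = field(default_factory=list)  # (before, after)

    def __post_init__(self) -> None:
        # Until a human changes something, the working text is the AI's proposal.
        if not self.current_text:
            self.current_text = self.ai_text

    def edit(self, new_text: str) -> None:
        """Replace the working text and keep a record of the change."""
        if self.accepted_by is not None:
            raise RuntimeError("draft already accepted")
        self.edits.append((self.current_text, new_text))
        self.current_text = new_text

    def accept(self, auditor: str) -> None:
        """Record which auditor took ownership of the final text."""
        self.accepted_by = auditor
```

Keeping `ai_text` separate from `current_text` means a reviewer can always compare what the model wrote with what the auditor signed.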

Decision 03

Use conditional components to reduce verification load

Dense legal authoring flows exposed too much at once, making auditors work harder to find what mattered for a given instruction.

Introduced conditional components, guided by Hick's Law, so only the fields relevant to the auditor's choices appeared.

Auditors moved through complex authoring steps with less irrelevant material competing for attention.

Reducing choice noise made verification feel more deliberate and less like hunting for hidden errors.
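Hick's Law says decision time grows with the logarithm of the number of options, so showing fewer fields at each step lowers the cost of each choice. A minimal sketch of conditional visibility follows. The form definition is hypothetical: only the split between audit and merit instructions comes from the project, while the field names and the `Field` / `visible_fields` helpers are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable

Answers = dict[str, str]


@dataclass(frozen=True)
class Field:
    name: str
    # Predicate over the answers given so far; by default the field is always shown.
    visible_when: Callable[[Answers], bool] = lambda answers: True


# Hypothetical form: two shared fields plus one field per instruction type.
FORM = [
    Field("instruction_type"),
    Field("audited_entity"),
    Field("merit_grounds", lambda a: a.get("instruction_type") == "merit"),
    Field("audit_scope", lambda a: a.get("instruction_type") == "audit"),
]


def visible_fields(form: list[Field], answers: Answers) -> list[str]:
    """Return the names of the fields that should be shown for these answers."""
    return [f.name for f in form if f.visible_when(answers)]
```

Before the auditor picks an instruction type, only the two shared fields are listed. After choosing "merit", `merit_grounds` appears and `audit_scope` stays hidden.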

Decision 04

Make dense legal text easier to scan and verify

Audit instructions were legally dense. Trust depended on the auditor's ability to scan, compare, and catch issues quickly.

Used colour coding and visible reasoning cues so dense legal text became easier to review.

Auditors could move through the draft with clearer signposts instead of rereading long blocks of undifferentiated text.

The output became easier to verify, which mattered more than making the AI sound confident.

The Moment It Worked

A guarded expert chose to trust the output.

During a usability test, a notoriously picky senior Secretariat Director read an AI-generated draft. He paused, re-read it twice, then looked up and said:

This could turn months of work into hours.

That moment, watching a deeply skeptical expert choose to trust the output, became the foundation of how I think about AI design.

Impact

63% reduction in instruction-writing time.

Work that took months could be completed in hours.

The system shipped into production and was adopted across multiple teams.

The work was praised by ministries and won Best technology case in the public sector in Brazil.

What I Learned

Trust is an architecture, not a feature.

This project taught me that trust is an architecture, not a feature. You have to design it into the inputs, the outputs, and the correction surfaces from the beginning.

The same core pattern I used here, reshape inputs and give users the right to correct, is exactly what I later applied to LLM and GenUI work on Ask Markerr and TaleWeaver.