/2025/02/14/dynamic-data-masking-notes

Notes on Dynamic Data Masking

Dynamic data masking started for me as a way to keep privacy rules close to the API surface instead of scattering them through every query, converter, and response object.

The problem looks simple at first: hide phone numbers, emails, addresses, company names, or exact numbers when the current visitor should not see them. The difficult part is not the masking function itself. The difficult part is applying the right rule at the right boundary without turning normal business code into a pile of if statements.

1. Mask at the Response Boundary

The cleanest place to apply masking is usually after the business method has produced its result and before the result is serialized to the client.

That keeps the service logic focused on real business behavior:

  • query the data
  • calculate the result
  • enforce real authorization
  • return the normal domain shape

Masking is not the same as authorization. Authorization decides whether an operation is allowed. Masking decides how much of an allowed response should be visible.

I like keeping those two ideas separate. If they are mixed together, it becomes easy to accidentally change business behavior when the real goal was only to change presentation.

2. Describe the Rule Declaratively

The useful shape is an annotation or small configuration block near the API method:

@MaskResponse(fields = {
    @MaskField(path = "owner.phone", strategy = MOBILE_PHONE),
    @MaskField(path = "owner.email", strategy = EMAIL),
    @MaskField(path = "location.address", strategy = ADDRESS),
    @MaskField(path = "items.price", strategy = FUZZY_NUMBER, check = LoginCheck.class)
})
public Page<ResultDTO> page(Query query) {
    return service.page(query);
}

This reads like a contract. The method still returns the full DTO internally, but the external response gets filtered through a visible list of sensitive fields.

The important part is the path attribute. Real API responses are rarely flat. They contain lists, nested objects, paging wrappers, and sometimes arrays. A masking tool that only works on top-level fields becomes too weak very quickly.
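
For reference, here is a minimal sketch of how annotations like these could be declared and read back. The strategy enum, the RequestContext class, the NeverSkip default, DemoController, and the MaskRules helper are all assumptions made for illustration; the post does not show its real declarations.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical strategy names, mirroring the ones used in the example above.
enum MaskStrategy { MOBILE_PHONE, EMAIL, ADDRESS, FUZZY_NUMBER }

// Hypothetical request context; a real one would come from the web framework.
final class RequestContext {
    final String currentUser; // null when the visitor is logged out
    RequestContext(String currentUser) { this.currentUser = currentUser; }
}

// Returns true when the viewer may see the raw value, so masking is skipped.
interface MaskCheck {
    boolean shouldSkip(RequestContext context);
}

// Assumed default: never skip, so a rule without a check always masks.
final class NeverSkip implements MaskCheck {
    @Override
    public boolean shouldSkip(RequestContext context) { return false; }
}

@Retention(RetentionPolicy.RUNTIME)
@Target({}) // only usable nested inside another annotation
@interface MaskField {
    String path();
    MaskStrategy strategy();
    Class<? extends MaskCheck> check() default NeverSkip.class;
}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface MaskResponse {
    MaskField[] fields();
}

// Hypothetical endpoint used only to show the annotations in place.
class DemoController {
    @MaskResponse(fields = {
        @MaskField(path = "owner.phone", strategy = MaskStrategy.MOBILE_PHONE),
        @MaskField(path = "owner.email", strategy = MaskStrategy.EMAIL)
    })
    public String page() { return "ok"; }
}

final class MaskRules {
    // Reads the declared rules from the first annotated method with the given name.
    static MaskField[] rulesOf(Class<?> type, String methodName) {
        for (Method method : type.getDeclaredMethods()) {
            MaskResponse annotation = method.getAnnotation(MaskResponse.class);
            if (method.getName().equals(methodName) && annotation != null) {
                return annotation.fields();
            }
        }
        return new MaskField[0];
    }
}
```

Runtime retention is what lets an interceptor read the rules after the method returns. Defaulting check to a class that never skips keeps the safe behavior (always mask) as the one you get by writing less.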

3. Walk Nested Data Generically

The masking engine should not know the shape of every DTO. It should know how to walk a path.

A rough version looks like this:

function maskResponse(result, rules, requestContext):
    for rule in rules:
        if rule.check exists and rule.check.shouldSkip(requestContext):
            continue

        applyPath(result, split(rule.path, "."), rule.strategy)

    return result

function applyPath(value, parts, strategy):
    if value is null:
        return

    if value is a collection:
        for item in value:
            applyPath(item, parts, strategy)
        return

    current = parts[0]

    if parts has one item:
        original = readField(value, current)
        masked = strategy.apply(original)
        writeField(value, current, masked)
        return

    nextValue = readField(value, current)
    applyPath(nextValue, rest(parts), strategy)

This makes the API rule independent of whether the field lives directly on the object, under data, inside records, or below a nested relation.
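
The same walk can be sketched in Java with plain reflection. This is one possible shape, not the implementation behind the post: PathMasker and the Owner, Listing, and PageResult demo types are hypothetical, and the choice to throw on an unknown path segment is my assumption (a typo in a path then fails loudly instead of silently leaving a field unmasked).

```java
import java.lang.reflect.Field;
import java.util.Collection;
import java.util.List;
import java.util.function.UnaryOperator;

final class PathMasker {
    // Masks every value reachable through a dotted path such as "records.owner.phone".
    static void mask(Object root, String path, UnaryOperator<Object> strategy) {
        applyPath(root, path.split("\\."), 0, strategy);
    }

    private static void applyPath(Object value, String[] parts, int index,
                                  UnaryOperator<Object> strategy) {
        if (value == null) {
            return; // nothing to mask below a missing object
        }
        if (value instanceof Collection<?>) {
            // Collections are transparent: the same segment applies to each element.
            for (Object item : (Collection<?>) value) {
                applyPath(item, parts, index, strategy);
            }
            return;
        }
        if (value instanceof Object[]) {
            for (Object item : (Object[]) value) {
                applyPath(item, parts, index, strategy);
            }
            return;
        }
        Field field = findField(value.getClass(), parts[index]);
        try {
            field.setAccessible(true);
            Object current = field.get(value);
            if (index == parts.length - 1) {
                field.set(value, strategy.apply(current)); // leaf: overwrite in place
            } else {
                applyPath(current, parts, index + 1, strategy); // descend one level
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException("Cannot access " + parts[index], e);
        }
    }

    // Looks the field up on the class and its superclasses; fails loudly on a bad path.
    private static Field findField(Class<?> type, String name) {
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            try {
                return c.getDeclaredField(name);
            } catch (NoSuchFieldException ignored) {
                // keep looking in the superclass
            }
        }
        throw new IllegalArgumentException("No field '" + name + "' on " + type.getName());
    }
}

// Hypothetical DTOs: a paging wrapper, a list of records, and a nested relation.
class Owner {
    String phone;
    Owner(String phone) { this.phone = phone; }
}

class Listing {
    Owner owner;
    Listing(Owner owner) { this.owner = owner; }
}

class PageResult {
    List<Listing> records;
    PageResult(List<Listing> records) { this.records = records; }
}
```

Calling PathMasker.mask(page, "records.owner.phone", strategy) reaches every owner in the page, and records with a null owner are skipped without error.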

4. Strategies Should Be Small and Predictable

Masking strategies should be boring. They are not the place for clever business logic.

Good strategies are simple:

  • show only the first character
  • hide the middle of a phone number
  • hide most of an email on both sides of the @
  • replace a value with a "Login to see" placeholder
  • replace a street address with a city- or town-level hint
  • round a number into a less precise bucket
  • clear a value to null or empty string

The strategy receives the field value and returns the replacement:

strategy MOBILE_PHONE_HARD(value):
    if value is blank:
        return ""

    if length(value) <= 2:
        return repeat("*", length(value))

    return first char + repeat("*", length(value) - 2) + last char

strategy FUZZY_NUMBER_ROUND_UP(value):
    if value is null:
        return null

    return roundUpToReadableBucket(value)

The more predictable these functions are, the easier it is to review whether the API is leaking something.
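
As a concrete sketch, the two strategies above might look like this in Java. The class and method names are hypothetical, and the bucket rule (round up to the next multiple of the value's leading power of ten) is only my reading of "readable bucket"; the post does not define it.

```java
final class MaskStrategies {
    // Keeps the first and last character and stars out everything in between.
    static String keepEnds(String value) {
        if (value == null || value.isBlank()) {
            return "";
        }
        if (value.length() <= 2) {
            // Too short to keep both ends without revealing the whole value.
            return "*".repeat(value.length());
        }
        return value.charAt(0)
                + "*".repeat(value.length() - 2)
                + value.charAt(value.length() - 1);
    }

    // Rounds up to a multiple of the largest power of ten not above the value,
    // e.g. 1234 -> 2000 and 95 -> 100. Assumption: non-positive values become 0.
    // Values close to Long.MAX_VALUE are not handled in this sketch.
    static Long roundUpToBucket(Long value) {
        if (value == null) {
            return null;
        }
        long v = value;
        if (v <= 0) {
            return 0L;
        }
        long bucket = 1;
        while (bucket <= v / 10) {
            bucket *= 10;
        }
        return ((v + bucket - 1) / bucket) * bucket;
    }
}
```

With these definitions, keepEnds("13812345678") yields "1*********8" and roundUpToBucket(1234L) yields 2000. Both functions are pure, so a reviewer can check them against a handful of inputs without any request context.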

5. Visibility Checks Make It Dynamic

The dynamic part is not that the mask pattern changes randomly. The dynamic part is that the same endpoint can return different visibility depending on context.

Examples:

  • logged-out users see a city but not a full address
  • logged-in users see more detail
  • verified merchants see a company name
  • internal users skip masking
  • public pages show fuzzed quantities instead of exact ones

That can be represented as a small check object:

interface MaskCheck:
    function shouldSkip(context): boolean

class LoginCheck implements MaskCheck:
    function shouldSkip(context):
        return context.currentUser is not null

class VerifiedMerchantCheck implements MaskCheck:
    function shouldSkip(context):
        return context.currentUser has verified merchant profile

The naming matters. I prefer shouldSkip or canReveal to a vague check, because the wrong boolean direction is an easy source of bugs.
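
A small Java sketch of the check objects makes the boolean direction explicit. RequestContext, its fields, the "VERIFIED_MERCHANT" role string, and the MaskChecks helper are assumptions for illustration, not part of any real framework.

```java
import java.util.Set;

// Hypothetical request context; a real one would come from the web framework.
final class RequestContext {
    final String userId;      // null when the visitor is logged out
    final Set<String> roles;  // role names are made up for this sketch

    RequestContext(String userId, Set<String> roles) {
        this.userId = userId;
        this.roles = roles;
    }
}

interface MaskCheck {
    // true means "this viewer may see the raw value, skip the mask".
    boolean shouldSkip(RequestContext context);
}

final class LoginCheck implements MaskCheck {
    @Override
    public boolean shouldSkip(RequestContext context) {
        return context.userId != null;
    }
}

final class VerifiedMerchantCheck implements MaskCheck {
    @Override
    public boolean shouldSkip(RequestContext context) {
        return context.userId != null && context.roles.contains("VERIFIED_MERCHANT");
    }
}

final class MaskChecks {
    // One place that inverts the flag, so callers never negate it by hand.
    // A rule without a check is always masked.
    static boolean shouldMask(MaskCheck check, RequestContext context) {
        return check == null || !check.shouldSkip(context);
    }
}
```

Keeping the negation inside a single shouldMask helper means the rest of the engine reads in one direction only, which is the point of choosing an unambiguous name.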

6. Do Not Forget Translated Fields

Masking becomes more interesting when a response also has translated values.

If the visible field is derived from a language map, masking only the rendered field is not enough. The raw multilingual source may still be serialized somewhere else, or it may be used by another serializer later.

The masking layer should understand that:

field "address" may have translations:
    language["address"].cn
    language["address"].en

when masking address:
    mask address
    mask language["address"].cn
    mask language["address"].en

That keeps the response consistent. A user who sees a masked address should not still receive the unmasked English or Chinese translation of it.
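
A minimal Java sketch of that rule, assuming the translations live in a map keyed by field name and then by language code. LocalizedLocation and TranslationMasker are hypothetical names, and the real DTO layout may differ.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical DTO: a rendered field plus its raw multilingual source.
final class LocalizedLocation {
    String address;
    // field name -> language code -> translated value
    final Map<String, Map<String, String>> language = new HashMap<>();
}

final class TranslationMasker {
    // Masks the rendered address and every translation of it with the same strategy.
    static void maskAddress(LocalizedLocation location, UnaryOperator<String> strategy) {
        location.address = strategy.apply(location.address);
        Map<String, String> translations = location.language.get("address");
        if (translations != null) {
            // Assumes the inner map is mutable, as it is with HashMap.
            translations.replaceAll((languageCode, text) -> strategy.apply(text));
        }
    }
}
```

Applying one strategy to the field and to all of its translations in a single call makes it hard to mask one and forget the others.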

7. Keep It Visible in Code Review

The biggest advantage of declarative masking is reviewability.

When a new endpoint is added, I want to see sensitive decisions near the method:

@MaskResponse(fields = {
    @MaskField(path = "data.contacts.phone", strategy = MOBILE_PHONE),
    @MaskField(path = "data.contacts.email", strategy = EMAIL)
})

That is easier to audit than searching through converters, helper methods, and mapper XML.

Current Rule

For dynamic masking, my rule is:

return the same shape, reduce visibility by context

The API should stay predictable for clients, but sensitive fields should become less precise when the request context does not entitle the viewer to see them.