Dataset-level IAM is a blunt instrument. roles/bigquery.dataViewer on a dataset is all-or-nothing: the analyst who needs aggregate revenue also gets raw ssn, email, and every customer’s row regardless of region. Compliance teams hate it, and “just make another copy with the PII stripped out” is how you end up with twelve drifting shadow tables nobody can audit.
BigQuery’s answer is a stack of fine-grained controls that layer on top of IAM without copying data: policy tags for column-level security, dynamic data masking for partial exposure, row access policies for tenant and region filtering, and authorized views, datasets, and routines for controlled cross-dataset sharing. This guide builds them in the order you should actually deploy them, ending with a governed analytics zone you can hand to auditors.
1. Understand the access-control layers and their evaluation order
Before writing a single command, get the mental model right. BigQuery evaluates access in layers, and they compose rather than override:
| Layer | Granularity | Mechanism | Failure mode if misused |
|---|---|---|---|
| IAM | Project / dataset / table | roles/bigquery.dataViewer, custom roles |
Over-broad grants, all-or-nothing |
| Authorized views/datasets | Query surface | View runs as its owner, not the caller | Forgotten authorization = Access Denied |
| Column-level security | Column | Policy tags on a Data Catalog taxonomy | Tag without reader role blocks the column |
| Data masking | Column (value transform) | Data policy bound to a policy tag | Wrong role precedence leaks raw values |
| Row-level security | Row | Row access policies with predicates | Missing GRANT TO (... ) hides all rows |
The rule that trips people up: column-level security and row-level security are restrictive, not additive. A user must hold a table read role (via IAM) and the relevant fine-grained reader role (via the policy tag) and match a row access policy (or the table has none) to see a value. Any layer can subtract access; none can grant past IAM. Start broad with IAM, then carve down.
Design principle: grant table read to a wide group, then use policy tags and row policies to subtract sensitive columns and rows. This keeps the IAM surface small and pushes the nuance into governed, auditable objects.
2. Build a Data Catalog taxonomy for column-level security
Column-level security is driven by policy tags organized in a taxonomy. The taxonomy is the classification hierarchy (“PII > high”, “PII > low”, “Financial”); the policy tag is what you attach to a column. Enforce the taxonomy so tags actually gate access.
Create the taxonomy and tags in the same region as the data (a taxonomy in us cannot tag a table in eu):
# Create the taxonomy with access control enforced
gcloud data-catalog taxonomies create \
--location=us \
--display-name="Data Sensitivity" \
--description="Column classification for analytics zone" \
--activated-policy-types=FINE_GRAINED_ACCESS_CONTROL \
--project=acme-data-prod
# Capture the taxonomy resource ID it returns, then add policy tags
export TAXONOMY="projects/acme-data-prod/locations/us/taxonomies/1234567890123456789"
gcloud data-catalog taxonomies policy-tags create \
--taxonomy="$TAXONOMY" \
--location=us \
--display-name="pii_high"
gcloud data-catalog taxonomies policy-tags create \
--taxonomy="$TAXONOMY" \
--location=us \
--display-name="pii_low"
Grant the Fine-Grained Reader role on the policy tag (not the dataset) to the principals allowed to read tagged columns in the clear. This is the only role that lets a query return a tagged column’s raw value:
# Grant fine-grained read on a single policy tag
gcloud data-catalog taxonomies policy-tags add-iam-policy-binding \
"projects/acme-data-prod/locations/us/taxonomies/1234567890123456789/policyTags/pii_high_tag_id" \
--member="group:pii-cleared@acme.com" \
--role="roles/datacatalog.categoryFineGrainedReader"
The role lives in the
datacatalognamespace even though you consume it in BigQuery:roles/datacatalog.categoryFineGrainedReader. Do not confuse it withroles/datacatalog.viewer, which does nothing for column access.
3. Attach policy tags to columns
You cannot set policy tags in a CREATE TABLE DDL statement. Apply them by patching the table schema with bq update (or tables.patch via the API / Terraform). Export the current schema, add the policyTags block to the sensitive columns, and push it back:
bq show --schema --format=prettyjson acme-data-prod:raw.customers > schema.json
Edit the columns you want to protect so they carry the tag resource name:
[
{ "name": "customer_id", "type": "INT64" },
{
"name": "email",
"type": "STRING",
"policyTags": {
"names": ["projects/acme-data-prod/locations/us/taxonomies/1234567890123456789/policyTags/pii_high_tag_id"]
}
},
{
"name": "ssn",
"type": "STRING",
"policyTags": {
"names": ["projects/acme-data-prod/locations/us/taxonomies/1234567890123456789/policyTags/pii_high_tag_id"]
}
},
{ "name": "country", "type": "STRING" }
]
bq update acme-data-prod:raw.customers schema.json
A column carries at most one policy tag. From this moment, anyone without Fine-Grained Reader on pii_high who runs SELECT * gets Access Denied: ... due to column-level security, while SELECT customer_id, country succeeds. Column-level security fails closed: the query errors rather than silently dropping columns, which is what you want for a hard boundary.
4. Add dynamic data masking instead of blocking
Hard blocking is right for ssn. But many use cases want the analyst to see a masked form of email for joins and bucketing without exposing the real address. That is dynamic data masking: a data policy bound to a policy tag that rewrites the value at query time based on the caller’s role.
Masking introduces a second role. With both roles in play on the same tag, precedence matters:
- Fine-Grained Reader (
roles/datacatalog.categoryFineGrainedReader) sees the raw value. - Masked Reader (
roles/bigquerydatapolicy.maskedReader) sees the masked value. - A user with neither, on a tagged column, is blocked (column-level security).
Create a data policy on the tag and pick a predefined masking rule. The built-in rules are: Nullify, Default masking value, SHA-256 hash, Random (salted) hash, Email mask, Date year mask, First four characters, Last four characters, and Custom routine.
# Email mask: keeps the domain, replaces the local part (e.g. XXXXX@acme.com)
gcloud data-catalog policy-tags create-data-policy \
--location=us \
--data-policy-id=mask_email_policy \
--policy-tag="projects/acme-data-prod/locations/us/taxonomies/1234567890123456789/policyTags/pii_low_tag_id" \
--data-policy-type=DATA_MASKING_POLICY \
--masking-expression=EMAIL_MASK \
--project=acme-data-prod
# Grant Masked Reader to the broad analyst group on the masked tag
gcloud data-catalog policy-tags add-iam-policy-binding \
"projects/acme-data-prod/locations/us/taxonomies/1234567890123456789/policyTags/pii_low_tag_id" \
--member="group:analysts@acme.com" \
--role="roles/bigquerydatapolicy.maskedReader"
The query text never changes. An analyst runs SELECT email FROM raw.customers and receives XXXXX@acme.com; a cleared user runs the same statement and receives the real address. No view, no copy.
Masking gotcha: a masked column can still be used in
WHERE,GROUP BY, andJOIN, but the predicate operates on the masked value for a Masked Reader.WHERE email = 'jane@acme.com'returns nothing for them because they only ever seeXXXXX@acme.com. Tell your analysts, or they will file a bug against you.
Routine-based (custom) masking
When the built-in rules do not fit, point the data policy at a custom masking routine – a SQL UDF that takes the column value and returns the masked form. This is how you implement format-preserving masks, partial card numbers, or region-specific redaction.
-- A deterministic partial-mask UDF: show last 4, redact the rest
CREATE FUNCTION `acme-data-prod.masking.last4` (val STRING)
RETURNS STRING
OPTIONS (data_governance_type = 'DATA_MASKING')
AS (
CONCAT(REPEAT('X', GREATEST(LENGTH(val) - 4, 0)), RIGHT(val, 4))
);
The OPTIONS (data_governance_type = 'DATA_MASKING') clause is mandatory – BigQuery refuses to bind a routine to a data policy without it, which prevents an arbitrary UDF from being smuggled into the masking path. Bind it by setting the data policy type to the routine instead of a predefined expression, then grant Masked Reader as before.
5. Filter rows with row access policies
Column controls protect what fields; row access policies protect which rows. They are the right tool for regional data residency and multi-tenant isolation. The DDL is a first-class statement – no schema editing:
-- EU analysts see only EU rows
CREATE ROW ACCESS POLICY eu_rows
ON `acme-data-prod.raw.customers`
GRANT TO ('group:eu-analysts@acme.com')
FILTER USING (country IN ('FR', 'DE', 'ES', 'IT'));
-- US analysts see only US rows
CREATE ROW ACCESS POLICY us_rows
ON `acme-data-prod.raw.customers`
GRANT TO ('group:us-analysts@acme.com')
FILTER USING (country = 'US');
Two behaviors define correct predicate design:
- Policies are a union (OR) across all policies a user matches. If a user is in both groups, they see EU and US rows.
- The moment a table has at least one row access policy, any user who is not in any policy’s grantee list sees zero rows – even a project owner. This is the most common “my dashboard went blank” incident. Add a deliberate escape hatch for admins:
CREATE ROW ACCESS POLICY admin_all
ON `acme-data-prod.raw.customers`
GRANT TO ('group:data-admins@acme.com')
FILTER USING (TRUE);
For scale, drive the predicate from a mapping table instead of hardcoding groups – one policy that resolves the caller’s allowed regions with SESSION_USER():
CREATE ROW ACCESS POLICY region_by_lookup
ON `acme-data-prod.raw.customers`
GRANT TO ('group:all-analysts@acme.com')
FILTER USING (
country IN (
SELECT region_code
FROM `acme-data-prod.security.user_region_map`
WHERE user_email = SESSION_USER()
)
);
Now onboarding a region is an INSERT into a table, not a DDL change reviewed by three people. SESSION_USER() returns the email of the principal running the query, evaluated per query.
6. Share across datasets with authorized views, datasets, and routines
Fine-grained controls live on the source table. To expose a curated slice to another team or project without granting them access to the raw dataset, use authorized views. An authorized view runs with the view’s identity, so consumers need read on the view’s dataset only – never on the source:
-- Curated, pre-aggregated view in a separate "marts" dataset
CREATE VIEW `acme-data-prod.marts.customer_summary` AS
SELECT country, COUNT(*) AS customers, APPROX_COUNT_DISTINCT(email) AS distinct_emails
FROM `acme-data-prod.raw.customers`
GROUP BY country;
Then authorize the view on the source dataset. Prefer authorized datasets, which authorize every view in marts at once and survive new views without re-running grants:
# Authorize the entire marts dataset against the raw dataset (one-time)
bq update --source_dataset=acme-data-prod:marts acme-data-prod:raw
Consumers get roles/bigquery.dataViewer on marts and can query customer_summary, but SELECT * FROM raw.customers returns Access Denied. Because the view’s identity reads the source, row access policies and column masks on raw.customers still apply as the view owner – so author the view with a service account whose grants reflect exactly what consumers should see, not your own admin identity.
For logic that must touch raw data behind a tighter contract, authorized routines (UDFs and stored procedures) do the same for callable code: the routine reads the protected table, the caller cannot.
CREATE FUNCTION `acme-data-prod.marts.risk_score`(cust_id INT64)
RETURNS FLOAT64
AS (
(SELECT some_score FROM `acme-data-prod.raw.customers` WHERE customer_id = cust_id)
);
# Authorize the routine against the source dataset
bq update --source_dataset=acme-data-prod:marts acme-data-prod:raw
Security warning, straight from the docs: an authorized stored procedure can run DDL and DML. A caller granted execute can create, modify, or delete objects in the source dataset, bypassing the IAM they would normally hit. Treat authorized stored procedures as a privilege boundary – code review them like production infra, and prefer authorized views or table-valued/scalar UDFs (which only read) when you do not need write side effects. Remote functions (Cloud Run / Cloud Functions endpoints) extend this same controlled-path idea outside SQL, but you own the egress and authentication on the remote side, so scope their service accounts tightly.
7. Audit who can see what, and who did
Controls you cannot prove are controls auditors reject. Three sources together give full coverage.
Enumerate the policies in place with INFORMATION_SCHEMA:
-- Every row access policy on tables in a dataset
SELECT table_name, row_access_policy_name, creation_time, last_modified_time
FROM `acme-data-prod.raw`.INFORMATION_SCHEMA.ROW_ACCESS_POLICIES
ORDER BY table_name;
-- Which columns carry a policy tag (look for non-null policy_tags)
SELECT table_name, column_name, policy_tags
FROM `acme-data-prod.raw`.INFORMATION_SCHEMA.COLUMNS
WHERE policy_tags IS NOT NULL;
See who actually read sensitive data via Cloud Audit Logs. Turn on Data Access logs for BigQuery (they are off by default and do incur cost), then query the read events. The policy tags referenced by a query are recorded, so you can answer “who touched pii_high last month”:
resource.type="bigquery_dataset"
logName:"cloudaudit.googleapis.com%2Fdata_access"
protoPayload.metadata.tableDataRead.policyId:*
Audit caveat to set expectations: creating and deleting a row access policy is logged (the policy name appears), but the
filter_expressionandgrantee_listare deliberately omitted from the log because they can contain sensitive identifiers. Listing/viewing policies is not logged at all. So Cloud Logging tells you a policy changed and who changed it – to know what it now says, you must readINFORMATION_SCHEMAor the DDL. Pair a scheduled query overROW_ACCESS_POLICIESwith the audit log to reconstruct the full picture.
Enterprise scenario
A European retail bank ran a single customers table feeding both a fraud team and a marketing analytics team. GDPR and the bank’s data-residency policy demanded three things at once: marketing must never see raw iban or national_id; analysts in each country may only see that country’s customers; and fraud investigators need raw values but every access must be auditable. The platform team’s first instinct – maintain a masked copy per country – collapsed under drift: thirteen tables, nightly copy jobs, and a reconciliation backlog that itself became a compliance finding.
They rebuilt it as a single governed table with layered controls. iban and national_id got the pii_high tag with a DATA_MASKING_POLICY (last-four custom routine) for marketing’s Masked Reader role, and Fine-Grained Reader granted only to the fraud group. Residency was a lookup-driven row access policy keyed on SESSION_USER(), so adding a country was an INSERT, not a migration. Marketing consumed everything through an authorized dataset of pre-aggregated views, never touching raw. The decisive simplification was the lookup-driven predicate – it killed the thirteen-table sprawl outright:
CREATE ROW ACCESS POLICY residency
ON `bank-prod.raw.customers`
GRANT TO ('group:country-analysts@bank.eu')
FILTER USING (
country_code IN (
SELECT country_code
FROM `bank-prod.security.analyst_residency`
WHERE user_email = SESSION_USER()
)
);
Data Access logs scoped to policyId gave the auditor a clean monthly report of every raw-PII read, and the masked-versus-raw split fell straight out of IAM role membership. Thirteen tables became one, the copy pipeline was deleted, and the next audit closed with no findings on access control.
Verify
Validate each layer as a non-privileged principal – impersonate a test service account that holds only table read, not the fine-grained roles. Self-testing as an owner hides every restriction.
# Impersonate a least-privilege identity for the checks below
gcloud config set auth/impersonate_service_account test-analyst@acme-data-prod.iam.gserviceaccount.com
- Column block:
SELECT ssn FROM raw.customers LIMIT 1returnsAccess Denied: ... column-level securityfor a user without Fine-Grained Reader. - Masking:
SELECT email FROM raw.customers LIMIT 1returns the masked form (XXXXX@acme.com) for a Masked Reader and the raw value for a Fine-Grained Reader. - Row filter: a EU-group identity running
SELECT DISTINCT country FROM raw.customersreturns only EU countries; an ungranted identity returns zero rows. - Authorized view: the consumer can
SELECT * FROM marts.customer_summarybut getsAccess Deniedonraw.customers. - Audit trail: after a test read, the Data Access log shows a
tableDataReadentry with the expectedpolicyId.
# Reset impersonation when done
gcloud config unset auth/impersonate_service_account