BigQuery Sensitive Data Example

This example illustrates how to run a Java re-identification Dataflow Flex Template job in the Secured Data Warehouse, and how to use Data Catalog policy tags to restrict access to confidential columns in the re-identified table.

It uses:

  • The Secured Data Warehouse module, to create the Secured Data Warehouse infrastructure.
  • The de-identification template submodule, to create the regional structured DLP template.
  • A Dataflow Flex Template, to deploy the re-identification job.
  • A Dataflow Flex Template, to deploy the de-identification job.

Requirements

  1. A crypto_key and wrapped_key pair. Contact your Security Team to obtain the pair. The crypto_key location must be the same as the location used for the location variable.
  2. Pre-built Java Regional DLP De-identification and Re-identification flex templates. See Flex templates.
  3. The identity deploying the example must have permission to grant the role "roles/artifactregistry.reader" on the Docker and Python repositories of the Flex templates.
  4. A network and subnetwork in the data ingestion project, configured for Private Google Access.
  5. A network and subnetwork in the confidential data project, configured for Private Google Access.
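Requirement 1 can be checked before running Terraform. The sketch below is a hypothetical Python helper, not part of this example: it parses a Cloud KMS key resource name of the form `projects/PROJECT/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY` and compares its location with the value you intend to use for the location variable. The function names and the sample key are made up for illustration.

```python
import re

# Cloud KMS crypto key resource names have the form:
#   projects/PROJECT/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
_KMS_KEY_RE = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/keyRings/(?P<key_ring>[^/]+)/cryptoKeys/(?P<key>[^/]+)$"
)


def kms_key_location(crypto_key: str) -> str:
    """Return the location segment of a Cloud KMS crypto key resource name."""
    match = _KMS_KEY_RE.match(crypto_key)
    if match is None:
        raise ValueError(f"not a Cloud KMS crypto key resource name: {crypto_key!r}")
    return match.group("location")


def check_key_location(crypto_key: str, location: str) -> None:
    """Raise ValueError if the key does not live in the deployment location."""
    key_location = kms_key_location(crypto_key)
    if key_location != location:
        raise ValueError(
            f"crypto_key is in {key_location!r} but location is {location!r}"
        )


if __name__ == "__main__":
    # Hypothetical key name, for illustration only.
    key = "projects/my-kms-project/locations/us-east4/keyRings/my-ring/cryptoKeys/my-key"
    print(kms_key_location(key))  # us-east4
    check_key_location(key, "us-east4")  # passes silently
```

Comparing the parsed segment, rather than searching the whole string for the location, avoids false matches when a project or key ring name happens to contain a region name.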

Firewall rules

  • Deny all egress.
  • Allow egress to the Restricted Google APIs over TCP port 443.
  • Allow egress to the Private Google APIs over TCP port 443.
  • Allow ingress between Dataflow workers over TCP ports 12345 and 12346.
  • Allow egress between Dataflow workers over TCP ports 12345 and 12346.
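To reason about how these rules interact, the following Python sketch models them as data and evaluates a connection the way VPC firewall rules are evaluated: the matching rule with the lowest priority number wins, a deny wins over an allow at equal priority, and if no rule matches, the implied rules allow egress and deny ingress. It is a simplified model, not the Compute Engine API. The rule names and priorities are hypothetical, Dataflow workers are represented by a boolean instead of network tags, and the VIP ranges are an assumption taken from Google's Private Google Access documentation (199.36.153.4/30 for restricted.googleapis.com, 199.36.153.8/30 for private.googleapis.com) rather than from this example.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

# Assumption: documented VIP ranges for restricted.googleapis.com and
# private.googleapis.com; this README does not state them.
RESTRICTED_APIS = ipaddress.ip_network("199.36.153.4/30")
PRIVATE_APIS = ipaddress.ip_network("199.36.153.8/30")
DATAFLOW_PORTS = frozenset({12345, 12346})


@dataclass(frozen=True)
class Rule:
    name: str                 # hypothetical rule name
    direction: str            # "INGRESS" or "EGRESS"
    allow: bool               # True = allow, False = deny
    priority: int             # lower number takes precedence
    ports: Optional[frozenset] = None  # None matches any TCP port
    network: Optional[ipaddress.IPv4Network] = None  # None matches any peer address
    dataflow_peers_only: bool = False  # stand-in for a network-tag match


RULES = [
    Rule("deny-all-egress", "EGRESS", False, 65530),
    Rule("allow-restricted-api-egress", "EGRESS", True, 1000, frozenset({443}), RESTRICTED_APIS),
    Rule("allow-private-api-egress", "EGRESS", True, 1000, frozenset({443}), PRIVATE_APIS),
    Rule("allow-dataflow-ingress", "INGRESS", True, 1000, DATAFLOW_PORTS, dataflow_peers_only=True),
    Rule("allow-dataflow-egress", "EGRESS", True, 1000, DATAFLOW_PORTS, dataflow_peers_only=True),
]


def is_allowed(rules, direction, peer_ip, port, peer_is_dataflow_worker):
    """Decide whether a TCP connection to or from peer_ip:port is permitted."""
    address = ipaddress.ip_address(peer_ip)
    matching = [
        r for r in rules
        if r.direction == direction
        and (r.ports is None or port in r.ports)
        and (r.network is None or address in r.network)
        and (not r.dataflow_peers_only or peer_is_dataflow_worker)
    ]
    if not matching:
        # Implied VPC rules: allow all egress, deny all ingress.
        return direction == "EGRESS"
    # Lowest priority number wins; at equal priority a deny (False) sorts first.
    return min(matching, key=lambda r: (r.priority, r.allow)).allow


if __name__ == "__main__":
    print(is_allowed(RULES, "EGRESS", "199.36.153.5", 443, False))  # True
    print(is_allowed(RULES, "EGRESS", "8.8.8.8", 443, False))       # False
    print(is_allowed(RULES, "INGRESS", "10.0.0.7", 12345, True))    # True
```

The deny-all egress rule sits at a low precedence (high number) so that the narrower allow rules at priority 1000 override it only for the API VIPs and the Dataflow worker ports.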

DNS configuration

  • Restricted Google APIs
  • Private Google APIs
  • Restricted gcr.io
  • Restricted Artifact Registry
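The DNS configuration is what sends API and registry traffic to the Google API VIPs. A common setup for VPC Service Controls uses private zones in which a wildcard CNAME points every name in the zone at a hostname that resolves to the restricted VIP range. The Python sketch below simulates that resolution with an in-memory record table. The records are an assumption about how the four items above are typically configured, not a copy of this example's zones, and wildcard matching is simplified to replacing only the leftmost label.

```python
# Assumption: restricted and private VIPs as documented for Private Google Access.
RESTRICTED_VIPS = ["199.36.153.4", "199.36.153.5", "199.36.153.6", "199.36.153.7"]
PRIVATE_VIPS = ["199.36.153.8", "199.36.153.9", "199.36.153.10", "199.36.153.11"]

# Hypothetical private-zone records keyed by (fully qualified name, record type).
RECORDS = {
    ("restricted.googleapis.com.", "A"): RESTRICTED_VIPS,
    ("private.googleapis.com.", "A"): PRIVATE_VIPS,
    ("*.googleapis.com.", "CNAME"): ["restricted.googleapis.com."],
    ("gcr.io.", "A"): RESTRICTED_VIPS,
    ("*.gcr.io.", "CNAME"): ["gcr.io."],
    ("pkg.dev.", "A"): RESTRICTED_VIPS,
    ("*.pkg.dev.", "CNAME"): ["pkg.dev."],
}


def lookup(name, rtype):
    """Exact match first, then a wildcard on the leftmost label (simplified)."""
    if (name, rtype) in RECORDS:
        return RECORDS[(name, rtype)]
    _, _, parent = name.partition(".")
    return RECORDS.get(("*." + parent, rtype)) if parent else None


def resolve(hostname, max_hops=8):
    """Follow CNAMEs until an A record is found, as a stub resolver would."""
    name = hostname if hostname.endswith(".") else hostname + "."
    for _ in range(max_hops):
        addresses = lookup(name, "A")
        if addresses is not None:
            return addresses
        target = lookup(name, "CNAME")
        if target is None:
            raise LookupError(f"{hostname!r} is not covered by the private zones")
        name = target[0]
    raise LookupError(f"CNAME chain too long for {hostname!r}")


if __name__ == "__main__":
    print(resolve("bigquery.googleapis.com"))  # the four restricted VIPs
    print(resolve("us-docker.pkg.dev"))        # the four restricted VIPs
```

Pointing gcr.io and pkg.dev at the restricted range keeps image pulls for the Flex Templates on the same path as the other Google APIs, so the TCP 443 egress rule to the restricted VIPs covers them too.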

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| access_context_manager_policy_id | The ID of the default Access Context Manager policy. Can be obtained by running `gcloud access-context-manager policies list --organization YOUR-ORGANIZATION_ID --format="value(name)"`. | number | n/a | yes |
| confidential_data_project_id | The ID of the project where the confidential datasets and tables are created. | string | n/a | yes |
| confidential_subnets_self_link | The URI of the subnetwork where the confidential data Dataflow job is going to be deployed. | string | n/a | yes |
| crypto_key | The full resource name of the Cloud KMS key that wraps the data crypto key used by DLP. | string | n/a | yes |
| data_governance_project_id | The ID of the project in which the data governance resources will be created. | string | n/a | yes |
| data_ingestion_project_id | The ID of the project in which the data ingestion resources will be created. | string | n/a | yes |
| data_ingestion_subnets_self_link | The URI of the subnetwork where the data ingestion Dataflow job is going to be deployed. | string | n/a | yes |
| delete_contents_on_destroy | (Optional) If set to true, delete all the tables in the dataset when destroying the resource; otherwise, destroying the resource will fail if tables are present. | bool | false | no |
| external_flex_template_project_id | The ID of the external project that hosts the Dataflow flex templates. | string | n/a | yes |
| java_de_identify_template_gs_path | The Cloud Storage gs:// path to the JSON file of the built flex template that supports DLP de-identification. | string | n/a | yes |
| java_re_identify_template_gs_path | The Cloud Storage gs:// path to the JSON file of the built flex template that supports DLP re-identification. | string | n/a | yes |
| non_confidential_data_project_id | The ID of the project with the de-identified dataset and table. | string | n/a | yes |
| org_id | The GCP organization ID. | string | n/a | yes |
| perimeter_additional_members | The list of all members to be added to the perimeter access level, except the service accounts created by this module. Each member must be prefixed with `user:` or `serviceAccount:`. | list(string) | n/a | yes |
| sdx_project_number | The project number used to configure the Secure data exchange egress rule for the Dataflow flex templates. | string | n/a | yes |
| terraform_service_account | The email address of the service account that will run the Terraform config. | string | n/a | yes |
| wrapped_key | The base64-encoded data crypto key wrapped by KMS. | string | n/a | yes |
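Two of the inputs have format constraints that are easy to check locally before `terraform apply`: every entry in `perimeter_additional_members` needs a `user:` or `serviceAccount:` prefix, and `wrapped_key` must be base64-encoded. The Python helpers below are a hypothetical pre-flight check, not part of the module. They only check the format; they do not verify that the members exist or that the key was actually wrapped by `crypto_key`.

```python
import base64
import binascii

ALLOWED_MEMBER_PREFIXES = ("user:", "serviceAccount:")


def invalid_members(members):
    """Return entries that lack a required prefix or have nothing after it."""
    return [
        m for m in members
        if not any(m.startswith(p) and len(m) > len(p) for p in ALLOWED_MEMBER_PREFIXES)
    ]


def is_base64(value):
    """Return True if value is non-empty, strictly valid standard base64."""
    if not value:
        return False
    try:
        # validate=True rejects characters outside the base64 alphabet.
        base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(invalid_members(["user:alice@example.com", "alice@example.com"]))
    print(is_base64("not base64!"))  # False
```

Running these checks on a tfvars file before deployment surfaces a typo in a member prefix or a truncated key immediately, instead of as a later Access Context Manager or DLP error.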

Outputs

| Name | Description |
|------|-------------|
| taxonomy_name | The taxonomy display name. |