This example illustrates how to run a Flex Java re-identification Dataflow job in the Secured data warehouse and how to use Data Catalog policy tags to restrict access to confidential columns in the re-identified table.
It uses:
- The Secured data warehouse module to create the Secured data warehouse infrastructure.
- The de-identification template submodule to create the regional structured DLP template.
- A Dataflow flex template to deploy the re-identification job.
- A Dataflow flex template to deploy the de-identification job.

Prerequisites:

- A `crypto_key` and `wrapped_key` pair. Contact your Security Team to obtain the pair. The `crypto_key` location must be the same location used for the `location` variable.
- Pre-built Java Regional DLP De-identification and Re-identification flex templates. See Flex templates.
- The identity deploying the example must have permission to grant the `roles/artifactregistry.reader` role on the Docker and Python repositories of the flex templates.
- You need to create a network and subnetwork in the data ingestion project, configured for Private Google Access.
- You need to create a network and subnetwork in the confidential project, configured for Private Google Access.

Firewall rules:

- Deny all egress traffic.
- Allow egress to the Restricted Google APIs on TCP port 443 only.
- Allow egress to the Private Google APIs on TCP port 443 only.
- Allow ingress between Dataflow workers on TCP ports 12345 and 12346.
- Allow egress between Dataflow workers on TCP ports 12345 and 12346.

DNS configurations:

- Restricted Google APIs
- Private Google APIs
- Restricted gcr.io
- Restricted Artifact Registry
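
The firewall rules listed above can be sketched in Terraform roughly as follows. This is a minimal sketch, not the configuration this example ships with: the variables `project_id`, `network_self_link`, and `subnet_ip_cidr_range` and all rule names are hypothetical, and it assumes the Dataflow workers carry the default `dataflow` network tag. The `199.36.153.4/30` and `199.36.153.8/30` ranges are the documented virtual IP ranges for `restricted.googleapis.com` and `private.googleapis.com`. The DNS configurations are not shown.

```hcl
# Hypothetical inputs; adapt them to the network created in each project.
variable "project_id" { type = string }
variable "network_self_link" { type = string }
variable "subnet_ip_cidr_range" { type = string }

# Lowest-priority rule: deny every egress flow not allowed by a rule below.
resource "google_compute_firewall" "deny_all_egress" {
  project            = var.project_id
  name               = "fw-e-deny-all-egress"
  network            = var.network_self_link
  direction          = "EGRESS"
  priority           = 65530
  destination_ranges = ["0.0.0.0/0"]

  deny {
    protocol = "all"
  }
}

# HTTPS egress to the restricted.googleapis.com VIP range only.
resource "google_compute_firewall" "allow_restricted_api_egress" {
  project            = var.project_id
  name               = "fw-e-allow-restricted-api-443"
  network            = var.network_self_link
  direction          = "EGRESS"
  priority           = 1000
  destination_ranges = ["199.36.153.4/30"]

  allow {
    protocol = "tcp"
    ports    = ["443"]
  }
}

# HTTPS egress to the private.googleapis.com VIP range only.
resource "google_compute_firewall" "allow_private_api_egress" {
  project            = var.project_id
  name               = "fw-e-allow-private-api-443"
  network            = var.network_self_link
  direction          = "EGRESS"
  priority           = 1000
  destination_ranges = ["199.36.153.8/30"]

  allow {
    protocol = "tcp"
    ports    = ["443"]
  }
}

# Ingress between Dataflow workers (assumed tag "dataflow") on TCP 12345-12346.
resource "google_compute_firewall" "allow_dataflow_ingress" {
  project     = var.project_id
  name        = "fw-i-allow-dataflow-workers"
  network     = var.network_self_link
  direction   = "INGRESS"
  priority    = 1000
  source_tags = ["dataflow"]
  target_tags = ["dataflow"]

  allow {
    protocol = "tcp"
    ports    = ["12345", "12346"]
  }
}

# Egress from Dataflow workers to the subnetwork range on TCP 12345-12346.
resource "google_compute_firewall" "allow_dataflow_egress" {
  project            = var.project_id
  name               = "fw-e-allow-dataflow-workers"
  network            = var.network_self_link
  direction          = "EGRESS"
  priority           = 1000
  target_tags        = ["dataflow"]
  destination_ranges = [var.subnet_ip_cidr_range]

  allow {
    protocol = "tcp"
    ports    = ["12345", "12346"]
  }
}
```

The deny-all rule uses a high priority number so that each allow rule, with its lower number, is evaluated first. The same set of rules would be applied to both the data ingestion network and the confidential network.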
Name | Description | Type | Default | Required |
---|---|---|---|---|
access_context_manager_policy_id | The ID of the default Access Context Manager policy. It can be obtained by running `gcloud access-context-manager policies list --organization YOUR-ORGANIZATION_ID --format="value(name)"`. | `number` | n/a | yes |
confidential_data_project_id | Project where the confidential datasets and tables are created. | `string` | n/a | yes |
confidential_subnets_self_link | The URI of the subnetwork where the confidential Dataflow job is going to be deployed. | `string` | n/a | yes |
crypto_key | The full resource name of the Cloud KMS key that wraps the data crypto key used by DLP. | `string` | n/a | yes |
data_governance_project_id | The ID of the project in which the data governance resources will be created. | `string` | n/a | yes |
data_ingestion_project_id | The ID of the project in which the data ingestion resources will be created. | `string` | n/a | yes |
data_ingestion_subnets_self_link | The URI of the subnetwork where Data Ingestion Dataflow is going to be deployed. | `string` | n/a | yes |
delete_contents_on_destroy | (Optional) If set to true, delete all the tables in the dataset when destroying the resource; otherwise, destroying the resource will fail if tables are present. | `bool` | `false` | no |
external_flex_template_project_id | The ID of the external project that hosts the flex Dataflow templates. | `string` | n/a | yes |
java_de_identify_template_gs_path | The Cloud Storage `gs://` path to the JSON file of the built flex template that supports DLP de-identification. | `string` | n/a | yes |
java_re_identify_template_gs_path | The Cloud Storage `gs://` path to the JSON file of the built flex template that supports DLP re-identification. | `string` | n/a | yes |
non_confidential_data_project_id | Project with the de-identified dataset and table. | `string` | n/a | yes |
org_id | GCP Organization ID. | `string` | n/a | yes |
perimeter_additional_members | The list of all members to be added to the perimeter access level, except the service accounts created by this module. The `user:` or `serviceAccount:` prefix is required (for example, `user:EMAIL` or `serviceAccount:EMAIL`). | `list(string)` | n/a | yes |
sdx_project_number | The project number used to configure the Secure data exchange egress rule for the flex Dataflow templates. | `string` | n/a | yes |
terraform_service_account | The email address of the service account that will run the Terraform config. | `string` | n/a | yes |
wrapped_key | The base64-encoded data crypto key wrapped by KMS. | `string` | n/a | yes |
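
A `terraform.tfvars` file supplying the inputs above might look like the following sketch. Every value is a hypothetical placeholder (project IDs, bucket and template file names, key names, emails) to be replaced with your own; only the variable names and types come from the table, and the Cloud KMS key and subnetwork self-link formats follow the standard Google Cloud resource name patterns.

```hcl
# terraform.tfvars: hypothetical placeholder values only.

# Placeholder; use the numeric policy ID returned by the gcloud command in the table.
access_context_manager_policy_id = 123456789012

org_id                            = "ORG_ID"
terraform_service_account         = "TERRAFORM_SA_EMAIL"
data_governance_project_id        = "DATA_GOVERNANCE_PROJECT_ID"
data_ingestion_project_id         = "DATA_INGESTION_PROJECT_ID"
confidential_data_project_id      = "CONFIDENTIAL_DATA_PROJECT_ID"
non_confidential_data_project_id  = "NON_CONFIDENTIAL_DATA_PROJECT_ID"
external_flex_template_project_id = "FLEX_TEMPLATE_PROJECT_ID"
sdx_project_number                = "SDX_PROJECT_NUMBER"

# Subnetworks created beforehand with Private Google Access enabled.
data_ingestion_subnets_self_link = "https://www.googleapis.com/compute/v1/projects/DATA_INGESTION_PROJECT_ID/regions/REGION/subnetworks/SUBNET_NAME"
confidential_subnets_self_link   = "https://www.googleapis.com/compute/v1/projects/CONFIDENTIAL_DATA_PROJECT_ID/regions/REGION/subnetworks/SUBNET_NAME"

# Key pair provided by your Security Team.
crypto_key  = "projects/KMS_PROJECT_ID/locations/REGION/keyRings/KEY_RING/cryptoKeys/KEY_NAME"
wrapped_key = "BASE64_WRAPPED_KEY"

# JSON files of the pre-built Java flex templates (hypothetical bucket and file names).
java_de_identify_template_gs_path = "gs://TEMPLATE_BUCKET/DE_IDENTIFY_TEMPLATE.json"
java_re_identify_template_gs_path = "gs://TEMPLATE_BUCKET/RE_IDENTIFY_TEMPLATE.json"

perimeter_additional_members = ["user:USER_EMAIL"]

# Optional; defaults to false.
delete_contents_on_destroy = false
```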
Name | Description |
---|---|
taxonomy_name | The taxonomy display name. |