This is an exercise to test Fuzzy Magic in Google Cloud using Cloud Dataprep by Trifacta. The goal is to standardise the company names in a file (VANILLA Ltd, *** VANILLA LTD ***, vanilla ltd, vanila ltd, Vanilla Ltd., etc.) before loading them into the target table in BigQuery.
Cloud Dataprep combines Trifacta's data preparation software with Cloud Dataflow, which runs the preparation jobs and loads the data. As a reminder, Cloud Dataflow is a managed runner for Apache Beam pipelines.
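To make the goal concrete, here is a minimal sketch of the same kind of standardisation in plain Python. It is not how Cloud Dataprep does it internally: it simply normalises case and punctuation, then uses the standard library's difflib to group near-duplicates. The function names (normalise, standardise) and the 0.85 similarity threshold are assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher


def normalise(name: str) -> str:
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    return " ".join(cleaned.split())


def standardise(names, threshold=0.85):
    """Map each raw name to the first earlier name it closely resembles.

    The threshold is a hypothetical cut-off; tune it on real data.
    """
    canonical = []  # (normalised key, display name) of each distinct company
    mapping = {}
    for raw in names:
        key = normalise(raw)
        match = None
        for known_key, display in canonical:
            # ratio() is 1.0 for identical strings and lower for dissimilar ones
            if SequenceMatcher(None, key, known_key).ratio() >= threshold:
                match = display
                break
        if match is None:
            # No close match so far: this name starts a new group
            canonical.append((key, raw))
            match = raw
        mapping[raw] = match
    return mapping


if __name__ == "__main__":
    sample = [
        "VANILLA Ltd",
        "*** VANILLA LTD ***",
        "vanilla ltd",
        "vanila ltd",
        "Vanilla Ltd.",
    ]
    for raw, standard in standardise(sample).items():
        print(f"{raw!r} -> {standard!r}")
```

In this sketch the first spelling encountered becomes the standard form for its group, so the order of the input matters. A real clean-up would usually pick the standard name explicitly, for example from a reference list of companies.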