This exercise is part of Data Engineering and Machine Learning Fundamental training provided by Google. ‘Engineer Data in Google Cloud Challenge Lab’ is a non-guided lab under the quest ‘Engineer Data in Google Cloud’ from Google Skills Boost.
You have started a new role as a Data Engineer for TaxiCab Inc. You are expected to import some historical data to a working BigQuery dataset, and build a basic model that predicts fares based on information available when a new ride starts. Leadership is interested in building an app and estimating for users how much a ride will cost. The source data will be provided in your project.
As soon as you sit down at your desk and open your new laptop you receive your first assignment: build a basic BQML fare prediction model for leadership. Perform the following tasks to import and clean the data, then build the model and perform batch predictions with new data so that leadership can review model performance and make a go/no-go decision on deploying the app functionality.
Lab: https://www.cloudskillsboost.google/focuses/12379?parent=catalog
You’ve already completed the first step, and have created a dataset taxirides and imported the historical data to table, historical_taxi_rides_raw. This is data prior for rides to 2015. You may need to wait 1-3 minutes for the data to be fully populated in your project. To complete this task you will need to:
CREATE OR REPLACE TABLE taxirides.taxi_training_data_328 AS
SELECT
(tolls_amount + fare_amount) AS fare_amount_818,
pickup_datetime,
pickup_longitude AS pickuplon,
pickup_latitude AS pickuplat,
dropoff_longitude AS dropofflon,
dropoff_latitude AS dropofflat,
passenger_count AS passengers,
FROM
taxirides.historical_taxi_rides_raw
WHERE
RAND() < 0.001
AND trip_distance > 0
AND fare_amount >= 2
AND pickup_longitude > -78
AND pickup_longitude < -70
AND dropoff_longitude > -78
AND dropoff_longitude < -70
AND pickup_latitude > 37
AND pickup_latitude < 45
AND dropoff_latitude > 37
AND dropoff_latitude < 45
AND passenger_count > 0
Based on the data you have in taxirides.taxi_training_data_328 , build a BQML model that predicts fare_amount_818 . Call the model taxirides.fare_model_477 . Your model will need an RMSE of 10 or less to complete the task. Some helpful hints:
CREATE OR REPLACE MODEL taxirides.fare_model_477
TRANSFORM (
* EXCEPT(pickup_datetime),
ST_Distance(ST_GeogPoint(pickuplon, pickuplat),
ST_GeogPoint(dropofflon, dropofflat)) AS euclidean,
CAST(EXTRACT(DAYOFWEEK FROM pickup_datetime) AS STRING) AS dayofweek,
CAST(EXTRACT(HOUR FROM pickup_datetime) AS STRING) AS hourofday
)
OPTIONS(input_label_cols=['fare_amount_818'], model_type='linear_reg')
AS
SELECT * FROM `qwiklabs-gcp-03-24bc07c21bd0.taxirides.taxi_training_data_328`
Leadership is curious to see how well your model performs over new data, in this case, all of the data they’ve collected in 2015. This data is in taxirides.report_prediction_data. Only values known at prediction time are included in the table. Use ML.PREDICT and your model to predict Fare amount and store your results in a table called 2015_fare_amount_predictions
CREATE OR REPLACE TABLE taxirides.2015_fare_amount_predictions
AS
SELECT * FROM ML.PREDICT
(MODEL taxirides.fare_model_477,
(SELECT * FROM taxirides.report_prediction_data))