Data Engineering on Google Cloud Platform, Reston

Monday, February 5, 2018 - 09:00
Reston, VA, United States

Data Engineering on Google Cloud Platform

(4 days)

This four-day instructor-led class gives participants a hands-on introduction to designing and building data processing systems on Google Cloud Platform. Through a combination of presentations, demos, and hands-on labs, participants will learn how to design data processing systems, build end-to-end data pipelines, analyze data, and carry out machine learning. The course covers structured, unstructured, and streaming data. A laptop is required for all workshops and will not be provided.

Objectives

This course teaches participants the following skills:

  • Design and build data processing systems on Google Cloud Platform
  • Process batch and streaming data by implementing autoscaling data pipelines on Cloud Dataflow
  • Derive business insights from extremely large datasets using Google BigQuery
  • Train, evaluate, and predict using machine learning models built with TensorFlow and Cloud ML
  • Leverage unstructured data using Spark and machine learning APIs on Cloud Dataproc
  • Enable instant insights from streaming data

Audience

This class is intended for experienced developers who are responsible for managing big data transformations, including:

  • Extracting, loading, transforming, cleaning, and validating data
  • Designing pipelines and architectures for data processing
  • Creating and maintaining machine learning and statistical models
  • Querying datasets, visualizing query results, and creating reports

Prerequisites

To get the most out of this course, participants should have:

  • Completed the Google Cloud Fundamentals: Big Data and Machine Learning course, or equivalent experience
  • Basic proficiency with a common query language such as SQL
  • Experience with data modeling and extract, transform, load (ETL) activities
  • Experience developing applications using a common programming language
  • Familiarity with machine learning and/or statistics

Course Outline

Module 1: Google Cloud Dataproc Overview

  • Creating and managing clusters.
  • Leveraging custom machine types and preemptible worker nodes.
  • Scaling and deleting clusters.
  • Lab: Creating Hadoop Clusters with Google Cloud Dataproc.

Module 2: Running Dataproc Jobs

  • Running Pig and Hive jobs.
  • Separation of storage and compute.
  • Lab: Running Hadoop and Spark Jobs with Dataproc.
  • Lab: Submit and monitor jobs.

Module 3: Integrating Dataproc with Google Cloud Platform

  • Customizing clusters with initialization actions.
  • BigQuery support.
  • Lab: Leveraging Google Cloud Platform Services.

Module 4: Making Sense of Unstructured Data with Google's Machine Learning APIs

  • Google's Machine Learning APIs.
  • Common ML use cases.
  • Invoking ML APIs.
  • Lab: Adding Machine Learning Capabilities to Big Data Analysis.

Module 5: Serverless data analysis with BigQuery

  • What is BigQuery?
  • Queries and Functions.
  • Lab: Writing queries in BigQuery.
  • Loading data into BigQuery.
  • Exporting data from BigQuery.
  • Lab: Loading and exporting data.
  • Nested and repeated fields.
  • Querying multiple tables.
  • Lab: Complex queries.
  • Performance and pricing.

Module 6: Serverless, autoscaling pipelines with Dataflow

  • The Beam programming model.
  • Data pipelines in Beam Python.
  • Data pipelines in Beam Java.
  • Lab: Writing a Dataflow pipeline.
  • Scalable big data processing using Beam.
  • Lab: MapReduce in Dataflow.
  • Incorporating additional data.
  • Lab: Side inputs.
  • Handling stream data.
  • GCP reference architecture.

Module 7: Getting started with machine learning

  • What is machine learning (ML)?
  • Effective ML: concepts, types.
  • ML datasets: generalization.
  • Lab: Explore and create ML datasets.

Module 8: Building ML models with TensorFlow

  • Getting started with TensorFlow.
  • Lab: Using tf.learn.
  • TensorFlow graphs and loops + lab.
  • Lab: Using low-level TensorFlow + early stopping.
  • Monitoring ML training.
  • Lab: Charts and graphs of TensorFlow training.

Module 9: Scaling ML models with Cloud ML

  • Why Cloud ML?
  • Packaging up a TensorFlow model.
  • End-to-end training.
  • Lab: Run an ML model locally and on the cloud.

Module 10: Feature engineering

  • Creating good features.
  • Transforming inputs.
  • Synthetic features.
  • Preprocessing with Cloud ML.
  • Lab: Feature engineering.

Module 11: Architecture of streaming analytics pipelines

  • Stream data processing: challenges.
  • Handling variable data volumes.
  • Dealing with unordered/late data.
  • Lab: Designing a streaming pipeline.

Module 12: Ingesting Variable Volumes

  • What is Cloud Pub/Sub?
  • How it works: Topics and Subscriptions.
  • Lab: Simulator.

Module 13: Implementing streaming pipelines

  • Challenges in stream processing.
  • Handling late data: watermarks, triggers, accumulation.
  • Lab: Stream data processing pipeline for live traffic data.

Module 14: Streaming analytics and dashboards

  • Streaming analytics: from data to decisions.
  • Querying streaming data with BigQuery.
  • What is Google Data Studio?
  • Lab: Build a real-time dashboard to visualize processed data.

Module 15: High throughput and low-latency with Bigtable

  • What is Cloud Bigtable?
  • Designing a Bigtable schema.
  • Ingesting data into Bigtable.
  • Lab: Streaming data into Bigtable.

Organizer: ROI Training, Inc.