July 2018

My day at Google Cloud OnBoard

On the 12th of March 2018, I had the pleasure of participating in Cloud OnBoard in London, a free introductory training event for Google Cloud Platform (GCP). The event was created for IT Managers, System Engineers and Operations professionals, Developers, Solution Architects and business leaders searching for Cloud solutions.

Why Google Cloud?

There are a few reasons to consider Google:

  • Google has been contributing significantly to the open source world throughout the years, namely:
    • Kubernetes, the now open source container orchestrator behind Google Container Engine.
    • TensorFlow, an open source framework for machine learning.
    • Apache Beam, a portable, unified and extensible programming model for ETL that supports both batch and stream processing.
    • Some strong contributions on Node.js.
    • More than 2000 contributions in open source projects.
  • Currently, most of my tasks as a consultant are undertaken in a well-known bank, and it is public knowledge that this institution is linked to Google.
  • Other big companies and start-ups use GCP, such as Snapchat, Revolut, Philips, Ocado, BNP, etc.

During this training, I discovered a few more reasons to consider Google:

  • Google cares about the environment:
    • It has the first data centres certified to ISO 14001.
    • It is 100% carbon neutral and has used renewable energy since 2017.
  • Billing is by the second, not by the minute.
  • Huge discounts exist for bigger commitments or 24/7 use of Google.
  • You can customise machines if the standard ones do not meet your needs.
  • It is possible to scale up and down, right down to zero (this is automatically managed, based on requests or the amount of data).
  • Multiple security layers, from HTTPS to storage (not even Google can access the data: the client holds a master key through their password).
  • Google also creates hardware. As of right now, it has TPUs (supposedly, more hardware development is expected in the roadmap).
  • The long-term goal is to move from virtualization to a No-Ops (serverless) model.
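
To make the per-second billing point above concrete, here is a minimal sketch comparing the two billing granularities for a short job. The hourly rate is a made-up figure for illustration (not a real GCP price), the helper name `billed_cost` is hypothetical, and any minimum charge a provider may apply is ignored.

```python
import math

HOURLY_RATE = 0.10  # hypothetical price per hour, not a real GCP rate


def billed_cost(runtime_seconds: int, granularity_seconds: int) -> float:
    """Cost when usage is rounded up to the next billing increment."""
    increments = math.ceil(runtime_seconds / granularity_seconds)
    billed_seconds = increments * granularity_seconds
    return billed_seconds * HOURLY_RATE / 3600


runtime = 95  # a job that runs for 1 minute and 35 seconds
per_second = billed_cost(runtime, 1)   # billed for 95 seconds
per_minute = billed_cost(runtime, 60)  # rounded up to 120 seconds
print(f"per-second billing: {per_second:.6f}")
print(f"per-minute billing: {per_minute:.6f}")
```

The difference is small for a single job, but it adds up for workloads made of many short-lived machines.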

About the Cloud OnBoard event: The event discusses the main technologies available in GCP, from development to analysis.

Computing

App Engine

I remember that when GCP was launched, App Engine was one of the main products, even though most people did not really understand the concept. Essentially, I see it as a PaaS for developing apps, with Node.js, Java, Ruby, C#, Python and PHP supported out of the box. Some interesting features are:

  • It is possible to take the app and deploy it on-premise, for example, by using a Docker container or Kubernetes.
  • App versioning is available. For example, it is possible to use load balancing to serve one version for requests from Europe and another version for requests from the USA.
  • GCP automatically manages scaling based on requests (which may be defined by region). This functionality was shown in the training, by using ApacheBench. However, in my opinion, the response was a little slow, maybe because it was such a small panel. Nevertheless, it proved its “scalability” both up and down to zero in a multi-region context, by using load balancing to alternate between versions of the app.
  • It is easy to integrate an app developed in App Engine with other GCP products, such as databases, dataflows, monitoring, etc.
  • In addition, Google provides a mobile app that allows you to manage GCP if, at some point, you cannot access your computer.
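
The version routing and scale-to-zero behaviour described in the list above can be pictured with a small sketch. This is only a toy model under my own assumptions: the region-to-version mapping, the per-instance capacity and the function names are hypothetical, and it is not how App Engine actually decides anything.

```python
import math

# Hypothetical mapping of request region to the app version that serves it.
VERSION_BY_REGION = {"europe": "v1", "usa": "v2"}
DEFAULT_VERSION = "v1"


def pick_version(region: str) -> str:
    """Route a request to a version based on where it comes from."""
    return VERSION_BY_REGION.get(region.lower(), DEFAULT_VERSION)


def instances_needed(requests_per_second: float, capacity_per_instance: float) -> int:
    """Scale with load; no traffic means no running instances."""
    if requests_per_second <= 0:
        return 0
    return math.ceil(requests_per_second / capacity_per_instance)


for region in ("Europe", "USA"):
    print(region, "->", pick_version(region))

for rps in (0, 5, 25):
    print(rps, "req/s ->", instances_needed(rps, 10), "instance(s)")
```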

Compute Engine

This is GCP’s virtual machine offering and is similar to Amazon EC2.

In this event, it was revealed that the acquisition of AI start-up DeepMind in 2014, for $500m, helped reduce the operating costs of Google’s data centres, to the point where they broke even.

Kubernetes Engine

Previously known as Google Container Engine, the name changed to Kubernetes Engine, after the Kubernetes project that Google made open source. For those familiar with Docker, Kubernetes is a well-known engine for deploying, managing and scaling containers. Since Kubernetes originated at Google, GCP is probably the best place to use it on the Cloud.

Networking

Load Balancing

This is a normal functionality for any Cloud provider. The only reason I listed it here is because this product is used by Google services (Google Search, Gmail, Google Maps, YouTube…). If your website or app has times where it receives large numbers of requests, GCP does not need prior notice to be able to scale automatically. Obviously, this also depends on the architecture chosen internally for the app.

Storage and Databases

Cloud Storage

Its storage model is similar to Google Drive, Dropbox or Box, but it is easier to integrate with other GCP products. In this product, an object (PDF, DOC, MP3…) is immutable and storage scales easily. This offering is very similar to Amazon’s S3.

Cloud SQL

Basically, this is MySQL prepared by Google to be scalable and to deliver high performance. At the time of the event, PostgreSQL was only available in a beta version, but now it is generally available.

Cloud Spanner

Essentially, this is a database that scales horizontally and is strongly consistent for relational data (it would be interesting if Cloud Spanner were open source, but even then you would not be guaranteed access to its full capabilities, since it requires a highly specialised infrastructure and the TrueTime API – see below). It is probably the dream database to support:

  • Transactions (globally consistent)
  • Automatic replications
  • SQL (Standard ANSI 2011 with add-ons)
  • Scalability
  • High availability

This database puts into perspective one of the most well-known theorems in computer science: Brewer’s theorem, or the CAP theorem. This theorem says that, since no distributed system is free from network failures, it is theoretically impossible to guarantee three properties simultaneously in a distributed database: consistency, availability and partition tolerance.

How did Google manage something theoretically impossible? Google data centres use a special API, called TrueTime, which acts as Google’s globally synchronized clock. However, a big part also comes from the highly specialised infrastructure Google has. For those who would like to know more, you can read the papers here and here.
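
To give an idea of how a clock API can help with consistency, here is a minimal sketch of the "commit wait" idea from the published Spanner design: the clock returns an interval rather than a single instant, and a transaction waits until its commit timestamp is certainly in the past before it becomes visible. Everything here is simulated in one process under my own assumptions: `tt_now`, `commit_with_wait` and the `EPSILON` value are illustrative names and numbers, not Google’s API.

```python
import time
from dataclasses import dataclass

EPSILON = 0.005  # assumed clock uncertainty in seconds (illustrative value)


@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float


def tt_now() -> TTInterval:
    """Simulated TrueTime: the true time lies somewhere in the interval."""
    t = time.monotonic()
    return TTInterval(t - EPSILON, t + EPSILON)


def commit_with_wait() -> float:
    """Pick a commit timestamp, then wait until it is certainly in the past."""
    commit_ts = tt_now().latest
    while tt_now().earliest <= commit_ts:
        time.sleep(EPSILON / 5)
    return commit_ts


first = commit_with_wait()
second = commit_with_wait()
print("second commit is ordered after the first:", second > first)
```

The price of this guarantee is the wait itself, roughly twice the clock uncertainty per commit, which is why keeping that uncertainty small depends so much on the underlying infrastructure.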

Bigtable

A widely known NoSQL database, since it is used by Google services such as Google Search, Gmail, Google Maps, Google Analytics… Even though Bigtable itself is not open source, there is a very similar open source database, Apache HBase.

Big Data

Dataflow

A service used to transform and enrich data in stream and batch modes, which works as an ETL tool. The main feature of Dataflow is its Apache Beam support, which makes it possible to develop pipelines on-premise (by using the Java and/or Python SDKs) and then easily move them to GCP with Dataflow.
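
The idea of one pipeline definition serving both batch and stream modes can be sketched in plain Python. This is not the Apache Beam SDK: the `enrich` transform, the record fields and the `event_stream` source are all hypothetical, and the "stream" here is just a generator standing in for an unbounded source.

```python
from typing import Iterable, Iterator


def enrich(records: Iterable[dict]) -> Iterator[dict]:
    """Transform step: drop invalid rows and add a derived field."""
    for record in records:
        if record.get("amount") is None:
            continue  # skip rows without an amount
        yield {**record, "is_large": record["amount"] >= 15}


def event_stream() -> Iterator[dict]:
    """Stand-in for an unbounded source; here it simply yields three events."""
    yield {"id": 1, "amount": 10}
    yield {"id": 2, "amount": None}
    yield {"id": 3, "amount": 20}


batch = [{"id": 1, "amount": 10}, {"id": 2, "amount": None}, {"id": 3, "amount": 20}]

batch_result = list(enrich(batch))             # bounded input, processed at once
stream_result = list(enrich(event_stream()))   # same transform, element by element
print("same output in both modes:", batch_result == stream_result)
```

The transform does not know or care which kind of source feeds it, which is the property that lets a pipeline be developed locally and later run on a managed service.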

Dataproc

This is Google’s take on Apache Spark and Apache Hadoop, and it also provides other tools from the Hadoop ecosystem, such as Hive. GCP uses the open source code as its base, but makes a few alterations so that it can connect to some of its other products, such as Cloud Storage.

Pub/Sub

This is GCP’s event-focused streaming service. If you are searching for an on-premise solution, its equivalent would be Apache Kafka.
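
For readers new to the publish/subscribe pattern, here is a toy in-memory sketch of the idea: publishers and subscribers only share a topic name and never call each other directly. The `InMemoryBroker` class and the topic and message names are hypothetical; a real service such as Pub/Sub or Kafka is asynchronous and durable, which this synchronous sketch is not.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str], None]


class InMemoryBroker:
    """Toy broker: publishers and subscribers only share a topic name."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: str) -> int:
        """Deliver the message to every subscriber; return how many got it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


broker = InMemoryBroker()
received: List[str] = []
broker.subscribe("orders", received.append)
broker.subscribe("orders", lambda msg: print(f"audit log: {msg}"))
delivered = broker.publish("orders", "order-42 created")
print("subscribers reached:", delivered)
```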

Artificial Intelligence

TensorFlow

GCP has several machine learning services, and the majority use the open source framework TensorFlow.

As with Kubernetes and Apache Beam, you can use TensorFlow on-premise and, when you feel comfortable, migrate to GCP. Furthermore, it is possible to use GCP’s APIs, which already have models trained for detecting items in an image, translating text, recognising speech, extracting video metadata, etc.

Lastly, here is a comparison of services in GCP, AWS and Azure:

Google Cloud Platform    | Amazon Web Services          | Microsoft Azure
Google Compute Engine    | Amazon EC2                   | Azure Virtual Machines
Google App Engine        | AWS Elastic Beanstalk        | Azure Cloud Services
Google Kubernetes Engine | Amazon EC2 Container Service | Azure Container Service
Google Cloud Bigtable    | Amazon DynamoDB              | Azure Cosmos DB
Google BigQuery          | Amazon Redshift              | Azure SQL Data Warehouse
Google Cloud Functions   | AWS Lambda                   | Azure Functions
Google Cloud Datastore   | Amazon DynamoDB              | Azure Cosmos DB
Google Storage           | Amazon S3                    | Azure Blob Storage
Google Cloud Dataflow    | AWS Glue / Kinesis / EMR     | Azure Data Factory / Stream Analytics / Data Lake Analytics
Google Cloud Dataproc    | Amazon EMR                   | Azure HDInsight

Joel Latino

Senior BI & Big Data Consultant, Xpand IT


Artificial Intelligence: The Future is now

It’s not exactly breaking news that the concept of Artificial Intelligence (AI) has been gaining ground. A new wave of platforms that achieve maximum performance using the latest generation of processors is obtaining really positive results. However, the question still stands: what defines this subject and what practical uses does it offer?

Ana Paneiro

Xpand IT opens new office in the USA

Lisbon, Viana do Castelo, Porto, Braga, London … and now, San Francisco. This is Xpand IT’s sixth office, a move that represents an investment in a city outside Europe, looking to respond to the quick growth that the product area has been experiencing all over the world, but particularly in the USA.

Ana Paneiro

Welcome to Tableau Prep!

According to a Harvard Business Review article from 2017, people spend 80% of their time prepping data and only 20% analysing it. Tableau acknowledged that problem and came up with a simple solution: Tableau Prep.

Ana Paneiro

Big Data: the state of the art

Xpand IT cannot define the state of the art of Big Data without reflecting upon the huge annual increase in the adoption of Big Data technologies, from which we highlight the Confluent and Cloudera platforms.

Nuno Barreto

GDPR – Our commitment

GDPR is the trending abbreviation and does not need any introduction. The new General Data Protection Regulation (GDPR) entered into force on the 25th of May and marks a new era concerning data and personal data processing. It concerns companies and individuals, in and outside of the European Union, as long as a European citizen is part of the transaction.

Ana Paneiro

Blockchain: society’s new paradigm

Blockchain has been much discussed recently. We decided to gather our three Blockchain experts to talk about the subject in order to clarify its concept, the differences between public Blockchain and private Blockchain and its practical uses in some industries, such as in Health and in Supply Chain.

Ana Paneiro