On 12 March 2018, I had the pleasure of participating in Cloud OnBoard in London, a free introductory training event for Google Cloud Platform (GCP). The event was created for IT managers, system engineers, operations professionals, developers, solution architects and business leaders searching for cloud solutions.
Why Google Cloud?
There are a few reasons to consider Google:
- Google has been contributing significantly to the open source world throughout the years, namely:
- Kubernetes, the open-source container orchestrator behind Google Kubernetes Engine (formerly Google Container Engine).
- TensorFlow, an open source framework for machine learning.
- Apache Beam, a portable, unified and extensible programming model for ETL that supports both batch and stream processing.
- Strong contributions to Node.js.
- More than 2,000 contributions to open source projects.
- Currently, most of my consulting work is for a well-known bank, and it is public knowledge that this institution is linked to Google.
- Other big companies and start-ups use GCP, such as Snapchat, Revolut, Philips, Ocado and BNP.
During this training, I discovered a few more reasons to consider Google:
- Google cares about the environment:
- It has the first data centres certified to ISO 14001.
- It is 100% carbon neutral and has used renewable energy since 2017.
- Usage is billed per second, not per minute.
- Substantial discounts are available for bigger commitments or for workloads that run 24/7 on Google.
- You can customise machines if the standard ones do not meet your needs.
- It is possible to scale up and down, right down to zero (this is managed automatically, based on requests or the amount of data).
- Multiple security layers, from HTTPS through to storage (not even Google can access the data: the client holds a master key derived from their password).
- Google also builds its own hardware. Right now that means TPUs, and reportedly more hardware development is on the roadmap.
- The long-term goal is to move from virtualisation towards a No-Ops (serverless) model.
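The per-second billing point is easy to quantify. Here is a minimal sketch, using a hypothetical hourly rate (not an actual GCP price), of how per-second billing differs from rounding usage up to whole minutes:

```python
# Per-second vs per-minute billing for a short-lived workload.
# HOURLY_RATE is a made-up figure for illustration only.
HOURLY_RATE = 0.0475  # hypothetical price per hour, in dollars

def cost_per_second(seconds: int) -> float:
    """Bill exactly the seconds used."""
    return HOURLY_RATE * seconds / 3600

def cost_per_minute(seconds: int) -> float:
    """Round usage up to whole minutes before billing."""
    minutes = -(-seconds // 60)  # ceiling division
    return HOURLY_RATE * minutes / 60

used = 90  # a 90-second batch job
print(f"per-second billing: ${cost_per_second(used):.6f}")
print(f"per-minute billing: ${cost_per_minute(used):.6f}")
```

For short, bursty jobs the rounding difference adds up quickly, which is why per-second billing matters.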
About the Cloud OnBoard event: it covered the main technologies available in GCP, from development to analytics.
Google App Engine

I remember that when GCP was launched, App Engine was one of its main products, even though most people did not really understand the concept. Essentially, I see it as a PaaS for developing apps, with Node.js, Java, Ruby, C#, Python and PHP supported out of the box. Some interesting features are:
- It is possible to take the app and deploy it on-premise, for example using a Docker container or Kubernetes.
- Versioning of the app is available. For example, it is possible to use load balancing to serve one version to requests from Europe and another version to requests from the USA.
- GCP automatically manages priorities based on requests (which can be segmented by region). This functionality was demonstrated in the training using ApacheBench (ab). In my opinion, the response was a little slow, perhaps because the demo setup was so small; nevertheless, it showed scaling both up and down to zero in a multi-region context, using load balancing to alternate between versions of the app.
- It is easy to integrate an app developed in App Engine with other GCP products, such as databases, dataflows, monitoring, etc.
- In addition, Google provides a mobile app that allows you to manage GCP if, at some point, you cannot access your computer.
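App Engine's standard Python environment serves WSGI applications (frameworks such as Flask ultimately produce the same callable). A minimal, stdlib-only sketch of such an app, exercised locally without a real server:

```python
# Minimal WSGI application of the kind App Engine's Python standard
# environment can serve. Pure stdlib; the handler responds with a
# plain-text greeting to any request.
from wsgiref.util import setup_testing_defaults

def app(environ, start_response):
    """Respond with a plain-text greeting to any request."""
    body = b"Hello from App Engine!"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]

# Exercise the app locally with a synthetic WSGI environment.
environ = {}
setup_testing_defaults(environ)
status_holder = {}
def start_response(status, headers):
    status_holder["status"] = status
result = b"".join(app(environ, start_response))
print(status_holder["status"], result.decode())
```

In practice you would pair a callable like this with an `app.yaml` describing the runtime and let GCP handle serving and scaling.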
Google Compute Engine

This is GCP's virtual machine offering and is similar to Amazon EC2.
In this event, it was revealed that the acquisition of AI start-up DeepMind in 2014, for $500m, helped reduce the operating costs of Google's data centres, reportedly to the point where the savings covered the acquisition.
Google Kubernetes Engine

Previously known as Google Container Engine, this service was renamed after Kubernetes, the container orchestrator that Google open-sourced. For those familiar with Docker, Kubernetes is a well-known engine for deploying, managing and scaling containers. Since Kubernetes originated at Google, GCP is probably the best place to use it in the cloud.
Autoscaling

This is standard functionality for any cloud provider. The only reason I list it here is that the same infrastructure underpins Google's own services (Google Search, Gmail, Google Maps, YouTube…). If your website or app has periods with large numbers of requests, GCP can scale automatically without prior notice. Obviously, this also depends on the architecture chosen for the app.
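The scale-to-zero idea can be sketched as a toy decision function: choose a replica count from the current request rate. The capacity threshold below is made up for illustration; real GCP autoscaling is considerably more sophisticated:

```python
# Toy sketch of scale-to-zero autoscaling: derive an instance count
# from the request rate. REQS_PER_INSTANCE is a hypothetical figure.
import math

REQS_PER_INSTANCE = 100  # assumed capacity of one instance (made up)

def desired_instances(requests_per_second: float) -> int:
    """Scale up with load, and all the way down to zero when idle."""
    if requests_per_second <= 0:
        return 0
    return math.ceil(requests_per_second / REQS_PER_INSTANCE)

for load in (0, 50, 250, 1000):
    print(load, "->", desired_instances(load))
```

The ceiling division is what produces the "up and down to zero" behaviour described above: any non-zero load gets at least one instance, and zero load gets none.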
Storage and Databases
Google Cloud Storage

It offers a storage model similar to Google Drive, Dropbox or Box, but it is easier to integrate with other GCP products. In this product, an object (PDF, DOC, MP3…) is immutable, and the service scales easily. The offer is very similar to Amazon's S3.
Google Cloud SQL

Basically, this is MySQL run by Google for scalability and high performance. At the time of the event, PostgreSQL was only available in beta, but it is now widely available.
Google Cloud Spanner

Essentially, this is a database that scales horizontally while remaining strongly consistent over relational data (it would be interesting if Cloud Spanner were open source, but even then it could not be fully reproduced, since it requires Google's exquisite infrastructure and the TrueTime API – see below). It is probably the dream database to support:
- Transactions (globally consistent)
- Automatic replications
- SQL (Standard ANSI 2011 with add-ons)
- High availability.
This database puts into perspective one of the best-known theorems in computer science: Brewer's theorem, or the CAP theorem. It states that no distributed system is free from network failures, and therefore it is impossible to guarantee all three of the following properties simultaneously: consistency, availability and partition tolerance.
How did Google manage something theoretically impossible? Google's data centres use a special API called TrueTime, Google's globally synchronised clock. A big part of the answer, however, is the exquisite infrastructure Google has. For those who would like to know more, you can read the papers here and here.
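The core TrueTime idea from the Spanner paper can be sketched in a few lines: TT.now() returns an interval guaranteed to contain the true time, and a transaction waits out that uncertainty ("commit wait") before its timestamp becomes visible, so timestamps respect real-time order. A toy model, with a made-up uncertainty bound:

```python
# Toy model of the TrueTime idea described in the Spanner paper:
# TT.now() returns an interval (earliest, latest) guaranteed to
# contain the true time. A transaction takes latest as its commit
# timestamp, then waits until earliest has passed that timestamp
# ("commit wait") before returning.
import time

UNCERTAINTY = 0.002  # pretend clock uncertainty of +/- 2 ms (made up)

def tt_now():
    """Return an (earliest, latest) interval around the local clock."""
    t = time.monotonic()
    return (t - UNCERTAINTY, t + UNCERTAINTY)

def commit():
    """Pick a timestamp and wait out the uncertainty before returning."""
    _, latest = tt_now()
    ts = latest
    while tt_now()[0] < ts:   # commit wait
        time.sleep(0.0005)
    return ts

t1 = commit()
t2 = commit()
assert t1 < t2  # timestamps reflect real-time commit order
```

The commit wait is what lets externally consistent timestamps coexist with distributed replicas: by the time a commit is visible, no other node's clock can still be "behind" its timestamp.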
Google Cloud Bigtable

A widely known NoSQL database, used by Google services such as Google Search, Gmail, Google Maps and Google Analytics. Even though Bigtable itself is not open source, there is a very similar open-source project, HBase.
Google Cloud Dataflow

A service for transforming and enriching data in both stream and batch modes, working as an ETL tool. The main feature of Dataflow is its Apache Beam support, which makes it possible to develop pipelines (using the Java and/or Python SDKs) on-premise and easily move them to Dataflow on GCP.
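The pipeline model behind Beam and Dataflow (the same transforms composed over a bounded batch source or an unbounded stream) can be illustrated in plain Python; this is only a sketch of the shape, not Beam's actual API:

```python
# Pure-Python sketch of the ETL pipeline idea: extract -> transform
# -> load, with transforms written against any iterable so the same
# code serves a bounded (batch) or unbounded (stream) source.
def extract(source):
    """Read raw records from any iterable source."""
    yield from source

def transform(records):
    """Clean and enrich: uppercase each record and tag its length."""
    for r in records:
        yield {"value": r.upper(), "length": len(r)}

def load(records):
    """'Write' the results; here we simply collect them."""
    return list(records)

batch = ["alpha", "beta"]  # a bounded source
output = load(transform(extract(batch)))
print(output)
```

Because each stage is a generator, records flow through lazily; swapping the bounded list for an unbounded source would not change the transforms, which is the essence of Beam's unified model.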
Google Cloud Dataproc

Google's take on Apache Spark and Apache Hadoop, which also provides other tools from the Hadoop ecosystem, such as Hive. GCP uses the open-source code as its base but makes a few alterations so it can connect to some of its other products, such as Cloud Storage.
Google Cloud Pub/Sub

The GCP service for event-focused streaming. Its equivalent would be Apache Kafka, if you are searching for an on-premise solution.
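The publish/subscribe model behind Pub/Sub (and Kafka) is simple to sketch: publishers push messages to a topic, and every subscriber receives each message. A toy in-memory illustration, not the real client library:

```python
# Minimal in-memory publish/subscribe sketch: one topic fans each
# published message out to every registered subscriber callback.
from collections import defaultdict

class Topic:
    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        """Register a callback that receives every future message."""
        self.subscribers.append(callback)

    def publish(self, message):
        """Fan the message out to all subscribers."""
        for callback in self.subscribers:
            callback(message)

received = defaultdict(list)
events = Topic()
events.subscribe(lambda m: received["billing"].append(m))
events.subscribe(lambda m: received["audit"].append(m))
events.publish({"event": "user_signup", "id": 42})
print(dict(received))
```

The real service adds durable storage, delivery acknowledgements and horizontal scaling, but the decoupling of producers from consumers shown here is the core idea.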
Machine Learning

GCP has several machine learning services, and the majority use the open-source framework TensorFlow.
Similarly to Kubernetes and Apache Beam, you can use TensorFlow on-premise and, when you feel comfortable, migrate to GCP. Furthermore, it is possible to use GCP's APIs, which provide pre-trained models for detecting items in an image, translating text, speech recognition, extracting video metadata, etc.
Lastly, here is a comparison of equivalent services in GCP, AWS and Azure:
| Google Cloud Platform | Amazon Web Services | Microsoft Azure |
| --- | --- | --- |
| Google Compute Engine | Amazon EC2 | Azure Virtual Machines |
| Google App Engine | AWS Elastic Beanstalk | Azure Cloud Services |
| Google Kubernetes Engine | Amazon EC2 Container Service | Azure Container Service |
| Google Cloud Bigtable | | Azure Cosmos DB / Azure SQL Data Warehouse |
| Google Cloud Functions | | |
| Google Cloud Datastore | | Azure Blob Storage |
| Google Cloud Dataflow | AWS Glue / Kinesis / EMR | Azure Data Factory / Stream Analytics / Data Lake Analytics |
| Google Cloud Dataproc | | |
Senior BI & Big Data Consultant, Xpand IT