VantageCloud Lake on AWS: Network configuration

This post contains a cheat sheet summarising the key elements of the network setup you must choose to connect your Teradata VantageCloud Lake instance on AWS with your AWS account. It also describes the AWS services Teradata supports for these connections, each of which is explained below.

Teradata VantageCloud Lake on AWS - Cheat Sheet for Networking options

You can download a high-resolution network cheat sheet for VantageCloud Lake on AWS in a repository in my GitHub account.

Network components in detail

AWS Connectivity Options for VantageCloud Lake

On Prem-to-Cloud connection

AWS Direct Connect

An AWS Direct Connect connection links your internal network to an AWS Direct Connect location over a standard Ethernet fibre-optic cable. One end of the cable is connected to one of your routers and the other to an AWS Direct Connect router. Over this connection, you can create virtual interfaces to public AWS services (e.g., Amazon S3) or to a VPC, bypassing internet service providers in the network path.

AWS Direct Connect is available from 50 Mbps up to 100 Gbps. Therefore, before choosing a connection, estimate the bandwidth you'll need for migrations and for regular operations with VantageCloud Lake on AWS.
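A quick way to size the link is to turn data volume and time window into bandwidth, and vice versa. The sketch below does this arithmetic. The `efficiency` factor (the share of nominal bandwidth you actually get after protocol overhead and contention) is an assumption; 0.8 is only illustrative, not a measured value.

```python
def transfer_time_hours(data_tb, link_mbps, efficiency=0.8):
    """Hours needed to move data_tb decimal terabytes over a link of link_mbps.

    efficiency is an assumed share of nominal bandwidth actually achieved
    (protocol overhead, contention); 0.8 is illustrative, not a measurement.
    """
    bits = data_tb * 1e12 * 8                  # TB -> bytes -> bits
    usable_bps = link_mbps * 1e6 * efficiency  # Mbps -> usable bits per second
    return bits / usable_bps / 3600            # seconds -> hours


def required_mbps(data_tb, window_hours, efficiency=0.8):
    """Nominal link speed (Mbps) needed to move data_tb within window_hours."""
    bits = data_tb * 1e12 * 8
    return bits / (window_hours * 3600) / efficiency / 1e6


print(round(transfer_time_hours(10, 1_000), 1))      # 27.8 (hours for 10 TB over 1 Gbps)
print(round(required_mbps(10, 24, efficiency=1.0)))  # 926 (Mbps to move 10 TB in 24 hours)
```

With these assumptions, a 10 TB initial migration takes a little over a day on a 1 Gbps link. Regular operations usually need far less bandwidth, so size the two cases separately.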

There are two types of AWS Direct Connect connections:

  • Dedicated Connection: A physical Ethernet connection associated with a single customer. Available port speeds are 1 Gbps, 10 Gbps, and 100 Gbps.
  • Hosted Connection: A physical Ethernet connection that an AWS Direct Connect Partner (e.g., Megaport) provisions on behalf of a customer. Available port speeds range from 50 Mbps to 10 Gbps.
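As a first filter, the port speeds above can be turned into a shortlist of connection types for a given bandwidth requirement. This is a minimal sketch that only uses the figures listed in this post. It ignores redundancy, multiple links and pricing, and the function name is hypothetical.

```python
# Port speeds as listed above, in Mbps; check current AWS documentation.
DEDICATED_PORTS_MBPS = (1_000, 10_000, 100_000)  # 1, 10 and 100 Gbps
HOSTED_MAX_MBPS = 10_000                         # hosted: 50 Mbps up to 10 Gbps


def direct_connect_options(required_mbps):
    """Shortlist Direct Connect connection types for a single link.

    Ignores redundancy, multiple links and pricing; a first filter only.
    """
    # A hosted connection fits if the need is within the partner-provisioned range
    # (needs below 50 Mbps are still served by the smallest hosted speed).
    hosted = required_mbps <= HOSTED_MAX_MBPS
    # Smallest dedicated port that covers the requirement, or None if none does.
    dedicated = next((p for p in DEDICATED_PORTS_MBPS if p >= required_mbps), None)
    return {"hosted": hosted, "dedicated_port_mbps": dedicated}


print(direct_connect_options(926))     # {'hosted': True, 'dedicated_port_mbps': 1000}
print(direct_connect_options(20_000))  # {'hosted': False, 'dedicated_port_mbps': 100000}
```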

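Each connection type has its own limitations (for example, in available port speeds and in the number of virtual interfaces it supports), so check the AWS Direct Connect quotas before choosing one.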

Regarding security, AWS Direct Connect does not encrypt traffic by default because it is a private circuit; i.e., data does not flow over the public internet.

To encrypt data over a Direct Connect, you can:

  • Configure application-level encryption (e.g., TLS 1.2 encryption for TTU drivers),
  • Deploy an AWS Site-to-Site VPN over the Direct Connect to create an IPsec-encrypted tunnel, or
  • Use MACsec for Layer 2 encryption over a Dedicated Connection. MACsec is only available at specific locations and at higher port speeds; consult the AWS documentation.

Direct Connect allows bidirectional traffic. Thus, to secure this connection, you should place a firewall in either the co-location data centre or your on-premises data centre.

AWS Site-to-Site VPN

A Virtual Private Network (VPN) extends a private network across a public network. It enables users to send and receive data across shared or public networks as if their computing devices were directly connected to the private network.

An AWS Site-to-Site VPN can therefore connect on-premises networks to the cloud, connect two cloud service providers, or link different VPCs within the cloud, although Teradata does not support the last two patterns.

While you can use an AWS Site-to-Site VPN for the On Prem-to-Cloud connection, it offers lower bandwidth than AWS Direct Connect, and its performance is less predictable because it runs over the public internet.

Regarding security, a VPN is the only one of these connectivity methods that is encrypted at the network (route) level by default: AWS Site-to-Site VPNs use Internet Protocol security (IPsec) encrypted tunnels. You can also configure application-level encryption on top of the IPsec encryption (e.g., TLS 1.2 encryption for TTU drivers).

On a separate note, Site-to-Site VPNs allow bidirectional traffic. Thus, you should secure this connection with a firewall in your on-premises data centre.

Cloud-to-Cloud connection

We call the communication between the VantageCloud Lake account and your AWS account the “handshake”; it is how you access your data, load more data, and consume it. Thus, this connection must:

  1. Be secure,
  2. Be fast and handle large amounts of data, and
  3. Preserve the built-in parallelism in VantageCloud Lake when possible. Letting the database handle the workload will improve its performance.

AWS PrivateLink

AWS PrivateLink provides private connectivity between VPCs, AWS services, and on-premises networks without exposing traffic to the public internet or opening the networks to one another.

PrivateLink endpoints appear inside your VPC with private IP addresses, as part of your internal network, and the traffic between your VPC and the service travels over the AWS backbone network.

The AWS PrivateLink lets you securely connect a VPC to Teradata VantageCloud Lake with a uni-directional traffic pattern. Thus, session traffic is initiated only from your side of the link, and VantageCloud Lake can’t start a connection back into your network.

Consequently, there is no need to configure firewall rules or special routing tables since the two sides of the network are not directly joined.

As for security, PrivateLink does not encrypt traffic by default because the traffic stays on the AWS network; data does not flow over the public internet. To encrypt data over a PrivateLink, configure application-level encryption (e.g., TLS 1.2 encryption for TTU drivers).
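Application-level encryption means the client itself negotiates TLS, independently of the network path. The sketch below shows the idea with Python's standard `ssl` module: a client context that refuses anything older than TLS 1.2. This is a generic illustration, not how TTU drivers are configured. The `ca_file` argument is a hypothetical path to your CA bundle.

```python
import ssl


def tls12_client_context(ca_file=None):
    """Client-side TLS context that refuses protocol versions older than TLS 1.2.

    ca_file is a hypothetical path to your CA bundle; None uses the system store.
    """
    ctx = ssl.create_default_context(cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # reject TLS 1.0/1.1 handshakes
    # create_default_context() already enables hostname checking and
    # certificate verification (CERT_REQUIRED); keep them on.
    return ctx


ctx = tls12_client_context()
print(ctx.minimum_version)  # TLSVersion.TLSv1_2
```

You would then wrap the socket that goes through the PrivateLink endpoint with `ctx.wrap_socket(sock, server_hostname=...)`, using the hostname your endpoint's certificate is issued for.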

On a separate note, if you want to use LDAP, which requires bidirectional traffic between your AWS account and the VantageCloud Lake account, you must create additional PrivateLinks (sometimes referred to as “Reverse PrivateLinks” or “Customer PrivateLinks”). These allow VantageCloud Lake to initiate traffic into your network.

QueryGrid and Data Copy also need bidirectional traffic. However, Teradata configures them on your behalf in the VantageCloud Lake account using bridge nodes instead of Reverse PrivateLinks. The bridge nodes aggregate traffic and reduce the number of PrivateLinks needed. This architecture requires you to deploy a server in your VPC.

Additionally, if you need your Lake instance to access one of your object stores holding Open Table Format (OTF) data, you will require a Reverse PrivateLink. Note that for cross-cloud access, NOS and OTF requests are currently routed through Valtix-controlled egress and ingress gateways, which may become a bottleneck.
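The rules above can be summarised as a small lookup table when you plan which extra connectivity to request. This sketch only encodes what this post states. It is not a Teradata API, and the feature labels and function name are hypothetical.

```python
# feature -> (does Lake need to initiate traffic into your network?, extra mechanism)
# Encodes the rules described in this post; labels are illustrative only.
FEATURES = {
    "Viewpoint / third-party apps": (False, "PrivateLink, or public internet from an allowlisted CIDR"),
    "LDAP": (True, "Reverse (Customer) PrivateLink"),
    "QueryGrid": (True, "Teradata-managed bridge nodes plus a server in your VPC"),
    "Data Copy": (True, "Teradata-managed bridge nodes plus a server in your VPC"),
    "OTF object store in your account": (True, "Reverse (Customer) PrivateLink"),
}


def extra_setup(selected):
    """Return the extra connectivity each selected feature needs beyond a standard PrivateLink."""
    return {name: FEATURES[name][1] for name in selected if FEATURES[name][0]}


print(extra_setup(["LDAP", "Viewpoint / third-party apps"]))
# {'LDAP': 'Reverse (Customer) PrivateLink'}
```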

Public Internet

Notably, you can connect third-party applications and Viewpoint through the public internet. However, other solutions, such as QueryGrid or LDAP, must communicate with VantageCloud Lake through PrivateLink.

To access your VantageCloud Lake account through the public internet, you must provide the source CIDR block to whitelist in the Console. Note that Lake doesn't allow whitelisting the 0.0.0.0/0 CIDR block.
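Before entering CIDR blocks in the Console, you can sanity-check them locally. The sketch below uses Python's standard `ipaddress` module to parse the blocks, reject the catch-all block that Lake does not accept, and test whether a given source address would be allowed. The function names are hypothetical. Rejecting the IPv6 catch-all `::/0` as well is our own precaution, not a documented Lake rule. The example addresses come from the reserved documentation ranges.

```python
import ipaddress


def validate_allowlist(cidrs):
    """Parse source CIDR blocks and reject the catch-all block Lake does not accept."""
    networks = []
    for cidr in cidrs:
        # strict=True raises ValueError if host bits are set (e.g. "203.0.113.5/24").
        net = ipaddress.ip_network(cidr, strict=True)
        if net.prefixlen == 0:  # 0.0.0.0/0 (or ::/0) matches every address
            raise ValueError(f"{cidr} opens access to every address")
        networks.append(net)
    return networks


def is_allowed(source_ip, networks):
    """True if source_ip falls inside any allowlisted network."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in networks)


nets = validate_allowlist(["203.0.113.0/24"])
print(is_allowed("203.0.113.10", nets))  # True
print(is_allowed("198.51.100.1", nets))  # False
```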

Bear in mind that if the source network already has access to the public internet, you won't need to make any additional network changes.

However, if you access Lake from outside your company network, you should connect through your company's VPN so that your traffic originates from the allowed CIDR block.

API Calls to NOS Buckets and OTFs

VantageCloud Lake can read data from (and write data to) AWS S3 through a Teradata feature called NOS (Native Object Storage) Reads (and Writes). Additionally, Teradata supports reading and writing data stored in Open Table Formats (OTFs).

To access an S3 bucket, you use the S3 API, and by default, S3 API calls run through the public internet.

Teradata secures the connections between its VantageCloud Lake instances and AWS S3 buckets with AWS Gateway VPC Endpoints. Teradata also configures Lake with internal IP addresses only (no public IPs), and Lake's VPCs have Gateway Endpoints enabled. Thus, when you call the S3 API from Teradata VantageCloud Lake and the bucket is in the same region as Lake, the traffic travels through the AWS backbone network. I.e., data never transfers through the public internet when you use NOS Reads and Writes between Teradata VantageCloud Lake and an S3 bucket in Lake's region.

Note that the S3 endpoint and the Gateway Endpoint are regional resources. So, S3 API calls always run over the public internet when you access an S3 bucket in a different region from Lake. You will also incur egress (data transfer) costs for cross-region reads; check the rates in the AWS Pricing Calculator. Furthermore, latency will likely be higher than when you access a bucket in the same region.
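The same-region versus cross-region rule can be captured in a small helper for planning purposes. It classifies the traffic path as described in this post and estimates the cross-region transfer charge. `price_per_gb` is a hypothetical input: take the actual inter-region rate for your regions from the AWS Pricing Calculator. The 0.02 used below is a placeholder, not a quoted price.

```python
from dataclasses import dataclass


@dataclass
class S3AccessEstimate:
    path: str
    cross_region_transfer_cost: float


def estimate_s3_access(lake_region, bucket_region, data_gb, price_per_gb):
    """Classify NOS/OTF traffic to S3 per this post's rules and estimate cross-region cost.

    price_per_gb is a placeholder input; use the real rate from the AWS Pricing Calculator.
    """
    if lake_region == bucket_region:
        # Same region: the Gateway VPC Endpoint keeps traffic on the AWS backbone.
        return S3AccessEstimate("AWS backbone via Gateway VPC Endpoint", 0.0)
    # Different region: the regional Gateway Endpoint cannot be used.
    return S3AccessEstimate("public internet (cross-region)", data_gb * price_per_gb)


print(estimate_s3_access("eu-west-1", "us-east-1", 100, 0.02).path)  # public internet (cross-region)
```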

Incidentally, when you copy DSA Backups or Snapshots between a bucket within the Lake account and another bucket, you also use the AWS S3 API. Therefore, this section's discussion of where the traffic goes when you read from buckets also applies to this scenario.

Main security elements in the VantageCloud Lake architecture

In terms of security, there are five key aspects of VantageCloud Lake architecture:

  • Teradata configures the virtual machines that run the VantageCloud Lake instances with internal IP addresses only, i.e., they don't have public IPs.
  • These virtual machines’ VPCs have AWS Gateway VPC Endpoints enabled.
  • NOS traffic initiated from these virtual machines stays within Teradata's VPC and goes to an AWS S3 API endpoint.
  • With Gateway Endpoints enabled, requests to the AWS S3 API endpoint are routed through the Gateway Endpoint (via a route-table entry for S3) rather than out to the internet, as long as Lake and the bucket are in the same region.
  • Teradata configures VantageCloud Lake to use HTTPS for these calls by default.

The NOS Orange Book has all the details on reading and writing data in NOS storage.

Network & Encryption

All Teradata network connections can be encrypted. Some examples of connectivity encryption options are:

  • Protocol transit encryption for SQLe (Quality of Protection [QOP] or TTU v17.10 + TLS).
  • All HTTP interfaces will be HTTPS (such as Viewpoint).
  • By default, all SQLe traffic is encrypted with a Teradata-provided cipher on port 1025 and with TLS 1.2 on port 443.
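On the client side, the two defaults above correspond to two ways of connecting. The sketch below builds keyword arguments for Teradata's Python driver, `teradatasql`. The parameter names (`dbs_port`, `encryptdata`, `sslmode`, `https_port`) reflect our understanding of that driver's documentation and are assumptions: verify them against the driver version you use. The function name and hostname are hypothetical.

```python
def lake_connect_params(host, user, password, use_tls=True):
    """Build keyword arguments for teradatasql.connect().

    Parameter names are assumed from the teradatasql driver documentation;
    verify them against your driver version. Values are passed as strings.
    """
    params = {"host": host, "user": user, "password": password}
    if use_tls:
        # TLS 1.2 path on port 443.
        params.update({"sslmode": "REQUIRE", "https_port": "443"})
    else:
        # Teradata-provided cipher on the classic database port 1025.
        params.update({"sslmode": "DISABLE", "dbs_port": "1025", "encryptdata": "true"})
    return params


print(lake_connect_params("lake.example.com", "analyst", "secret")["https_port"])  # 443
```

With the driver installed, you would pass the result as `teradatasql.connect(**params)`. Prefer the TLS variant unless a client cannot reach port 443.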

Even though Teradata Clients (Teradata Tools and Utilities or TTUs) are not specific to any Cloud Service Provider, when you consider the security of your connections, remember that you can enable TLS on TTUs 17.10 and above. Moreover, you can enable Teradata generic encryption (256 bits) for TTUs 16.20 onwards.
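Those version thresholds are easy to check across a fleet of client machines. This is a minimal sketch that compares only the major and minor parts of a TTU version string against the two thresholds stated above. The function name is hypothetical.

```python
def ttu_encryption_options(version):
    """Encryption options available for a TTU version, per the thresholds in this post.

    version is a string such as "17.10" or "16.20.07"; only major.minor is compared.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    v = (major, minor)
    options = []
    if v >= (16, 20):
        options.append("Teradata generic encryption (256-bit)")
    if v >= (17, 10):
        options.append("TLS")
    return options


print(ttu_encryption_options("17.20.03"))
# ['Teradata generic encryption (256-bit)', 'TLS']
```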

Other Cloud Service Providers

You can also find posts and cheat sheets on connecting VantageCloud Lake on Azure and GCP in this blog.


I updated this article on 25 July 2024 to add the link to the post about network configuration for VantageCloud Lake on GCP.
