Celia Muriel https://celiamuriel.com/ Deep-sea data diving VantageCloud Lake on Azure: Network configuration https://celiamuriel.com/vantagecloud-lake-on-azure-network-configuration/ Cheat sheet with the key network elements you need to connect with your Teradata VantageCloud Lake on Azure and a detailed explanation. Celia Cloud Tue, 16 Apr 2024 15:12:06 +0200

VantageCloud Lake on Azure: Network configuration

This post contains a cheat sheet summarising the key elements of the network setup you must choose to connect your Teradata VantageCloud Lake instance on Azure with your account. It also describes the Azure services Teradata supports for the connections. Below, you can find an explanation of all of them.

Cheat Sheet for the Network options for VantageCloud Lake on Azure

Figure: VantageCloud Lake on Azure cheat sheet for Networking options

You can download a high-resolution network cheat sheet for VantageCloud Lake on Azure in a repository in my GitHub account.

Network components in detail

Azure Connectivity Options for VantageCloud Lake

On Prem-to-Cloud connection

Azure ExpressRoute

Azure ExpressRoute links an internal network to an Azure ExpressRoute location over a standard Ethernet fibre-optic cable. One end of the cable is connected to your router in your data centre, and the other to an Azure ExpressRoute router. From this connection, you can create virtual interfaces directly to public Azure services (e.g., Blob Storage, Azure Data Lake Storage, etc.) or a VNet, bypassing internet service providers in the network path.

ExpressRoute is the primary method customers use to connect on-prem networks to their Azure networks. It is also Teradata’s preferred option because it provides higher bandwidth and more predictable performance than a VPN.

Azure ExpressRoute is available at speeds from 50 Mbps up to 10 Gbps. Thus, you must know what bandwidth you’ll need to plan for migrations and regular operations with VantageCloud Lake on Azure.

Furthermore, you can connect multiple ExpressRoute circuits to a VNet if you need more than 10 Gbps connectivity. The default limit is 10 circuits, and you can achieve up to 100 Gbps.
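To get a feel for what these bandwidth figures mean for a migration, a rough transfer-time estimate helps. The sketch below is only back-of-the-envelope arithmetic. The function name and the `utilisation` factor (the share of the nominal bandwidth the transfer actually sustains) are assumptions for illustration, not Teradata or Azure figures.

```python
def transfer_hours(data_tb: float, link_mbps: float, utilisation: float = 0.7) -> float:
    """Estimate the hours needed to move `data_tb` terabytes over a link.

    Uses decimal units (1 TB = 8,000,000 megabits). `utilisation` is an
    assumed share of the nominal bandwidth that the transfer sustains.
    """
    if link_mbps <= 0 or not 0 < utilisation <= 1:
        raise ValueError("link_mbps must be > 0 and utilisation in (0, 1]")
    megabits = data_tb * 8_000_000
    seconds = megabits / (link_mbps * utilisation)
    return seconds / 3600


# Compare the smallest and largest single-circuit speeds for a 10 TB load.
for mbps in (50, 1_000, 10_000):
    print(f"{mbps:>6} Mbps: {transfer_hours(10, mbps):8.1f} h for 10 TB")
```

Running it with your own data volume and a realistic utilisation gives a first indication of whether a single circuit is enough or whether you need to combine several.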

On a separate note, you can connect an ExpressRoute Gateway to the VNet, which allows traffic to flow between the edge router and the individual VNet.

The speeds that the Gateway allows, the number of circuits it can support, and its cost will depend on the Stock Keeping Unit (SKU) you choose for the individual Gateway.

Alternatively, you can route traffic from an ExpressRoute to VantageCloud Lake either directly or through an Azure Private Link.

Regarding security, Azure ExpressRoute does not encrypt traffic by default because it is a private circuit; i.e., data does not flow over the public internet.

To encrypt data over an ExpressRoute, you can:

  • Configure application-level encryption (e.g., TLS 1.2 encryption for TTU drivers),
  • Deploy an Azure VPN over the ExpressRoute to create an IPsec-encrypted tunnel, or
  • Leverage MACsec for Layer 2 encryption over an ExpressRoute Direct connection (only available in specific locations and at high bandwidths).

Consider that Teradata can initiate traffic over an ExpressRoute into your network. Thus, to secure this connection, you should place a firewall in either the co-location data centre (e.g., Equinix) or the on-premises customer data centre.

Azure Site-to-Site VPN

A Virtual Private Network (VPN) extends a private network across a public network. It enables users to send and receive data across shared or public networks as if their computing devices were directly connected to the private network.

So, an Azure Site-to-Site VPN can connect on-premises networks to the cloud, attach two cloud service providers, and link different VNets within the cloud. Nevertheless, Teradata does not support the last two patterns.

While you can use an Azure Site-to-Site VPN for the On Prem-to-cloud connection, it has a lower bandwidth than an Azure ExpressRoute. Moreover, its performance is less predictable since it runs over the public internet.

Regarding security, a VPN is the only connectivity method encrypted at the route level by default. Azure Site-to-Site VPNs support Internet Protocol security (IPsec) encrypted tunnels. Also, you can configure application-level encryption on top of the IPsec route-level encryption (e.g., TLS 1.2 encryption for TTU drivers).

On a separate note, Site-to-Site VPNs allow bidirectional traffic. Thus, it would be best to secure this connection through a firewall in your on-premises data centre.

Cloud-to-Cloud connection

We call the Teradata VaaS account communication with your Azure account the “handshake”. It is your means to access your data, load more and consume it. Thus, this connection must:

  1. Be secure,
  2. Be fast and handle large amounts of data, and
  3. Preserve the built-in parallelism in Vantage when possible. Letting the database handle the workload will improve its performance.

Azure Private Link

The Azure Private Link enables you to access Azure PaaS Services (for example, Azure Storage and SQL Database). It also lets you access Azure-hosted customer-owned or partner services over a private endpoint in your virtual network.

Also, in the case of the Azure Private Link, the traffic between the virtual network and the service travels over the Microsoft backbone network.

Azure Private Link lets you securely connect a VNet to Teradata VantageCloud Lake with a uni-directional traffic pattern. Thus, you start the session traffic from your side of the link, and VantageCloud Lake can’t start a connection back into your network.

Consequently, there is no need to configure firewall rules or special routing tables since the two sides of the network are not directly joined.

As for security, Private Link does not encrypt traffic because it is a private circuit — data does not flow on the public Internet. You can configure application-level encryption to encrypt data over a Private Link (e.g., TLS 1.2 encryption for TTU drivers).

On a separate note, if you want to use LDAP, which requires bi-directional traffic between your Azure account and the Vantage one, you must create additional Private Links (sometimes referred to as “Reverse Private Links” or “Customer Private Links”). They will allow VantageCloud Lake to initiate traffic into your network.

QueryGrid and Data Copy also need bidirectional traffic. However, Teradata configures them on your behalf in the VantageCloud Lake account with bridge nodes instead of Reverse Private Links. The bridge nodes allow aggregating traffic and reducing the number of Private Links. This architecture requires that you deploy a server in your VNet.

Public Internet

Notably, you can connect third-party applications and Viewpoint through the public internet. However, other solutions, such as QueryGrid or LDAP, must communicate with VantageCloud Lake through Private Link.

To access your VantageCloud Lake account through the public internet, you must provide the source CIDR block to whitelist access in the Console. Note that Lake doesn’t allow whitelisting of the 0.0.0.0/0 CIDR block.

Bear in mind that you won’t need to perform additional network changes if you allow the source to access the public internet.

However, if you are accessing Lake outside your company network, you should use your company’s VPN to route traffic from the allowed CIDR block.
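Before you submit a source CIDR block in the Console, it can be worth checking it locally. The following is a minimal sketch using Python’s standard `ipaddress` module. The function names are hypothetical, and the addresses come from the reserved documentation ranges, so replace them with your own.

```python
import ipaddress


def validate_source_cidr(cidr: str) -> ipaddress.IPv4Network:
    """Parse an IPv4 CIDR block and reject the catch-all 0.0.0.0/0."""
    network = ipaddress.ip_network(cidr, strict=True)
    if network.version != 4:
        raise ValueError("this sketch only handles IPv4 blocks")
    if network.prefixlen == 0:
        raise ValueError("0.0.0.0/0 cannot be allow-listed")
    return network


def is_allowed(client_ip: str, allowed: list[ipaddress.IPv4Network]) -> bool:
    """Return True when the client address falls inside any allowed block."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in allowed)


allowed_blocks = [validate_source_cidr("203.0.113.0/24")]
print(is_allowed("203.0.113.25", allowed_blocks))  # inside the block
print(is_allowed("198.51.100.7", allowed_blocks))  # outside, e.g. a home network
```

The second call mirrors the situation described above: a client outside the company network is not covered by the allowed block unless its traffic goes through the company VPN.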

NOS Reads

You can read (and write) with VaaS from Azure Blob Storage through a Teradata feature called NOS (Native Object Storage) Reads (and Writes).

To access Blob Storage, you have to use its APIs. By default, the Azure Blob Storage API calls run through the public Internet.

Teradata leverages Virtual Network service endpoints to secure Blob Storage connections with its VantageCloud systems. Additionally, Teradata configures the database’s virtual machines with internal IP addresses only (they don’t have public IPs). They also have service endpoints enabled. Thus, the traffic will travel through the Azure backbone network when you call the Blob Storage API from Teradata VantageCloud Lake, with the bucket and Lake in the same region. I.e., data never transfers through the public internet when you use NOS Reads and Writes from Teradata VantageCloud Lake into a Blob Storage bucket in the same region as Lake.

Note that the Blob Storage and VNet service endpoints are regional resources. So, the Blob Storage API calls always run on the public internet if you access a Blob Storage bucket in a region different from VantageCloud Lake. You will also likely have a larger latency than when you access a bucket in the same region.
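The regional rule above boils down to a single comparison. Here is a small sketch of it. The function name and the two return labels are made up for illustration, and the region names are only examples.

```python
def nos_traffic_path(lake_region: str, storage_region: str) -> str:
    """Say which path NOS traffic takes, following the regional rule above.

    Same region: the service endpoint keeps traffic on the Azure backbone.
    Different region: the Blob Storage API calls go over the public internet.
    """
    same_region = lake_region.strip().lower() == storage_region.strip().lower()
    return "azure-backbone" if same_region else "public-internet"


print(nos_traffic_path("westeurope", "westeurope"))
print(nos_traffic_path("westeurope", "northeurope"))
```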

On a separate note, as per the Azure Pricing Calculator, reading from the Blob Storage buckets generally does not incur egress costs, but Azure charges based on reads and writes, among other factors.

Incidentally, when you copy DSA Backups or Snapshots between a bucket within the Vantage account and another bucket, you also use Azure Blob Storage APIs. Therefore, the discussion in this section about where the traffic goes when you read from buckets also applies to this scenario.

In terms of security, there are five key aspects of VantageCloud Lake architecture:

  • Teradata configures the Compute Engine virtual machines that run the VantageCloud Lake instances with internal IP addresses only, i.e., they don’t have public IPs.
  • These virtual machines’ Virtual Networks have Azure Private Endpoints enabled.
  • NOS traffic initiated from these virtual machines goes to an Azure Blob Storage API endpoint.
  • With Private Endpoints enabled, the Blob Storage bucket will use an internal IP provided by the Teradata account, while Lake and the bucket are in the same region.
  • Teradata configures VantageCloud Lake to use the HTTPS call by default.

You can find all the details on reading and writing data in NOS storage in the NOS Orange Book.

Network & Encryption

All Teradata network connections can be encrypted. Some examples of connectivity encryption options are:

  • Protocol transit encryption for SQL-E (Quality of Protection or TTU v17.10 + TLS).
  • All HTTP interfaces will be HTTPS (such as Viewpoint).
  • By default, all the SQL-E traffic is encrypted with a Teradata-provided cipher on port 1025 and with a TLS 1.2 certificate on port 443.

Teradata Clients (Teradata Tools and Utilities) are not specific to any Cloud Service Provider. However, when you consider the security of your connections, you can enable TLS on TTUs 17.10 and above. Moreover, you can enable Teradata generic encryption (256 bits) for TTUs 16.20 onwards.
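As an illustration of the two client-side options, the sketch below builds connection parameters for the Teradata SQL Driver for Python (`teradatasql`). I am assuming the driver’s `sslmode` and `encryptdata` connection parameters here, so check the documentation of your driver version before relying on them. The helper name, host, and credentials are hypothetical.

```python
def build_connection_params(host: str, user: str, password: str,
                            use_tls: bool = True) -> dict[str, str]:
    """Build keyword arguments for a teradatasql-style connect() call."""
    params = {"host": host, "user": user, "password": password}
    if use_tls:
        # TLS connection (port 443 by default); needs a TLS-capable client.
        params["sslmode"] = "REQUIRE"
    else:
        # No TLS: use Teradata's own encryption on the legacy port (1025).
        params["sslmode"] = "DISABLE"
        params["encryptdata"] = "true"
    return params


params = build_connection_params("lake.example.com", "demo_user", "demo_password")
print({key: value for key, value in params.items() if key != "password"})

# With the driver installed, the connection itself would look like:
#   import teradatasql
#   connection = teradatasql.connect(**params)
```

Keeping the choice in one helper makes it easy to switch older clients to the Teradata encryption path without touching the rest of the code.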

Other Cloud Service Providers

You can also find a post and cheat sheet on connecting VantageCloud Lake on AWS in this blog.

VantageCloud Lake on AWS: Network configuration https://celiamuriel.com/vantagecloud-lake-on-aws-network-configuration/ Cheat sheet with the key network elements you need to connect with your Teradata VantageCloud Lake on AWS and a detailed explanation. Celia Cloud Tue, 16 Apr 2024 15:04:48 +0200

VantageCloud Lake on AWS: Network configuration

This post contains a cheat sheet summarising the key elements of the network setup you must choose to connect your Teradata VantageCloud Lake instance on AWS with your account. It also describes the AWS services Teradata supports for the connections. Below, you can find an explanation of all of them.

Figure: VantageCloud Lake on AWS cheat sheet for Networking options

You can download a high-resolution network cheat sheet for VantageCloud Lake on AWS in a repository in my GitHub account.

Network components in detail

AWS Connectivity Options for VantageCloud Lake

On Prem-to-Cloud connection

AWS Direct Connect

An AWS Direct Connect links an internal network to an AWS Direct Connect location over a standard Ethernet fibre-optic cable. One end of the cable is connected to one of your routers, and the other to an AWS Direct Connect router. From this connection, you can create virtual interfaces to public AWS services (e.g., Amazon S3) or a VPC, bypassing internet service providers in the network path.

AWS Direct Connect is available starting at 50 Mbps and scaling up to 100 Gbps. Thus, you must know what bandwidth you’ll need to plan for migrations and regular operations with VantageCloud Lake on AWS.

There are two types of AWS Direct Connect connections:

  • Dedicated Connection: A physical Ethernet connection associated with a single customer. Available port speeds are 1 Gbps, 10 Gbps, and 100 Gbps.
  • Hosted Connection: A physical Ethernet connection that an AWS Direct Connect Partner (e.g., Megaport) provisions on behalf of a customer. Available port speeds range from 50 Mbps to 10 Gbps.

Every type of connection has different limitations.
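The port speeds listed above can be turned into a quick check of which connection types could cover a given bandwidth requirement. This is only a sketch based on the figures in this post. The function name and the shape of the result are my own choices, and the hosted range is treated as continuous, although partners offer specific steps within it.

```python
DEDICATED_PORTS_MBPS = (1_000, 10_000, 100_000)  # 1, 10 and 100 Gbps
HOSTED_RANGE_MBPS = (50, 10_000)                 # 50 Mbps to 10 Gbps


def direct_connect_options(required_mbps: float) -> dict[str, object]:
    """Return the smallest fitting dedicated port and whether hosted can fit."""
    if required_mbps <= 0:
        raise ValueError("required_mbps must be positive")
    dedicated = next(
        (port for port in DEDICATED_PORTS_MBPS if port >= required_mbps), None
    )
    hosted_possible = required_mbps <= HOSTED_RANGE_MBPS[1]
    return {"dedicated_port_mbps": dedicated, "hosted_possible": hosted_possible}


for need in (200, 2_500, 40_000):
    print(need, direct_connect_options(need))
```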

Regarding security, the AWS Direct Connect does not encrypt traffic by default because it is a private circuit; i.e., data does not flow over the public internet.

To encrypt data over a Direct Connect, you can:

  • Configure application-level encryption (e.g., TLS 1.2 encryption for TTU drivers),
  • Deploy an AWS Site-to-Site VPN over the Direct Connect to create an IPsec-encrypted tunnel, or
  • Leverage MACsec for Layer 2 encryption over a Dedicated Direct Connect. This service is only available in specific locations and at high bandwidths; consult the AWS documentation.

Direct Connect allows for bidirectional traffic. Thus, to secure this connection, you should place a firewall in either the co-location data centre or your on-premises data centre.

AWS Site-to-Site VPN

A Virtual Private Network (VPN) extends a private network across a public network. It enables users to send and receive data across shared or public networks as if they have connected their computing devices directly to the private network.

So, an AWS Site-to-Site VPN can connect on-premises networks to the cloud, attach two cloud service providers, and link different VPCs within the cloud, although Teradata does not support the last two patterns.

While you can use an AWS Site-to-Site VPN for the On Prem-to-cloud connection, it has a lower bandwidth than an AWS Direct Connect, and its performance is less predictable since it runs over the public internet.

Regarding security, a VPN is the only connectivity method encrypted at the route level by default. AWS Site-to-Site VPNs support Internet Protocol security (IPsec) encrypted tunnels. You can also configure application-level encryption on top of the IPsec route-level encryption (e.g., TLS 1.2 encryption for TTU drivers).

On a separate note, Site-to-Site VPNs allow bidirectional traffic. Thus, it would be best to secure this connection through a firewall in your on-premises data centre.

Cloud-to-Cloud connection

We call the communication between the VantageCloud Lake account and your AWS account the “handshake”; it is your means to access your data, load more, and consume it. Thus, this connection must:

  1. Be secure,
  2. Be fast and handle large amounts of data, and
  3. Preserve the built-in parallelism in VantageCloud Lake when possible. Letting the database handle the workload will improve its performance.

AWS PrivateLink

AWS PrivateLink (https://docs.aws.amazon.com/vpc/latest/privatelink/what-is-privatelink.html) provides private connectivity between VPCs, AWS services, and on-premises networks without exposing traffic to the public internet or opening networks to one another.

PrivateLink endpoints appear inside your network with private IPs as part of your internal network. I.e., the traffic between the virtual network and the service travels over the AWS backbone network.

The AWS PrivateLink lets you securely connect a VPC to Teradata VantageCloud Lake with a uni-directional traffic pattern. Thus, session traffic is initiated only from your side of the link, and VantageCloud Lake can’t start a connection back into your network.

Consequently, there is no need to configure firewall rules or special routing tables since the two sides of the network are not directly joined.

As for security, PrivateLink does not encrypt traffic because it is a private circuit — data does not flow on the public Internet. You can configure application-level encryption to encrypt data over a PrivateLink (e.g., TLS 1.2 encryption for TTU drivers).

On a separate note, if you want to use LDAP, which requires bi-directional traffic between your AWS account and the Vantage one, you must create additional PrivateLinks (sometimes referred to as “Reverse PrivateLinks” or “Customer PrivateLinks”). They will allow VantageCloud Lake to initiate traffic into your network.

QueryGrid and Data Copy also need bidirectional traffic. However, Teradata configures them on your behalf in the VantageCloud Lake account with bridge nodes instead of Reverse PrivateLinks. The bridge nodes allow aggregating traffic and reducing the number of PrivateLinks. This architecture requires that you deploy a server in your VPC.
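The rules in this section can be summarised as a small lookup: which mechanism, if any, lets VantageCloud Lake start traffic back into your network for a given feature. The sketch below only restates the text above. The enum, the dictionary, and the "sql client" entry (standing for an ordinary session started from your side) are names I made up for illustration.

```python
from enum import Enum


class ReturnPath(Enum):
    NONE = "none needed: the session starts from your side"
    REVERSE_PRIVATELINK = "Reverse PrivateLink that you create"
    BRIDGE_NODES = "bridge nodes that Teradata configures"


RETURN_PATH_BY_FEATURE = {
    "sql client": ReturnPath.NONE,
    "ldap": ReturnPath.REVERSE_PRIVATELINK,
    "querygrid": ReturnPath.BRIDGE_NODES,
    "data copy": ReturnPath.BRIDGE_NODES,
}


def return_path(feature: str) -> ReturnPath:
    """Look up how Lake can reach back into your network for a feature."""
    key = feature.strip().lower()
    if key not in RETURN_PATH_BY_FEATURE:
        raise KeyError(f"feature not covered in this sketch: {feature!r}")
    return RETURN_PATH_BY_FEATURE[key]


for feature in ("SQL client", "LDAP", "QueryGrid", "Data Copy"):
    print(f"{feature}: {return_path(feature).value}")
```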

Public Internet

Notably, you can connect third-party applications and Viewpoint through the public internet. However, other solutions, such as QueryGrid or LDAP, must communicate with VantageCloud Lake through PrivateLink.

To access your VantageCloud Lake account through the public internet, you must provide the source CIDR block to whitelist access in the Console. Note that Lake doesn’t allow whitelisting of the 0.0.0.0/0 CIDR block.

Bear in mind that you won’t need to perform additional network changes if you allow the source to access the public internet.

However, if you are accessing Lake outside your company network, you should use your company’s VPN to route traffic from the allowed CIDR block.

NOS Reads

You can read (and write) with VantageCloud Lake from AWS S3 through a Teradata feature called NOS (Native Object Storage) Reads (and Writes).

To access an S3 bucket, you have to use its APIs. The AWS S3 API calls run through the public internet by default.

Teradata leverages AWS Gateway VPC Endpoints to secure the connections to AWS S3 buckets with its VantageCloud Lake instances. Additionally, Teradata configures Lake with internal IP addresses only (they don’t have public IPs). They also have Gateway Endpoints enabled. Thus, the traffic will travel through the AWS backbone network when you call the S3 API from Teradata VantageCloud Lake, with both the bucket and Lake in the same region. I.e., data never transfers through the public internet when you use NOS Reads and Writes from Teradata VantageCloud Lake into an S3 bucket in the same region as Lake.

Note that the S3 endpoint and the Gateway Endpoint are regional resources. So, the S3 API calls always run on the public internet if you access an S3 bucket in a region different from Vantage. You will also incur egress costs with cross-region reads as per the AWS Pricing Calculator. Furthermore, you will likely have a larger latency than when you access a bucket in the same region.

Incidentally, when you copy DSA Backups or Snapshots between a bucket within the Lake account and another bucket, you also use AWS S3 APIs. Therefore, the discussion in this section about where the traffic goes when you read from buckets also applies to this scenario.

In terms of security, there are five key aspects of VantageCloud Lake architecture:

  • Teradata configures the Compute Engine virtual machines that run the VantageCloud Lake instances with internal IP addresses only, i.e., they don’t have public IPs.
  • These virtual machines’ VPCs have AWS Gateway VPC Endpoints enabled.
  • NOS traffic initiated from these virtual machines stays within Teradata’s Virtual Network and goes to an AWS S3 API endpoint.
  • With Gateway Endpoints enabled, the AWS S3 API endpoint IP will resolve to an internal address, while Lake and the bucket are in the same region.
  • Teradata configures VantageCloud Lake to use the HTTPS call by default.

You can find all the details on reading and writing data in NOS storage in the NOS Orange Book.

Network & Encryption

All Teradata network connections can be encrypted. Some examples of connectivity encryption options are:

  • Protocol transit encryption for SQL-E (Quality of Protection or TTU v17.10 + TLS).
  • All HTTP interfaces will be HTTPS (such as Viewpoint).
  • By default, all the SQL-E traffic is encrypted with a Teradata-provided cipher on port 1025 and with a TLS 1.2 certificate on port 443.

Even though Teradata Clients (Teradata Tools and Utilities or TTUs) are not specific to any Cloud Service Provider, when you consider the security of your connections, remember that you can enable TLS on TTUs 17.10 and above. Moreover, you can enable Teradata generic encryption (256 bits) for TTUs 16.20 onwards.

Other Cloud Service Providers

You can also find a post and cheat sheet on connecting VantageCloud Lake on Azure in this blog.

]]>
Flow in VantageCloud Lake: From Bust to Boom in Data Ingestion to Insights https://celiamuriel.com/flow-in-vantagecloud-lake-from-bust-to-boom-in-data-ingestion-to-insights/ VantageCloud Lake service that allows data users to upload external files into Lake quickly and easily. Flow paired with the Visualization features added to the Lake Console democratises getting quick insights into any information. See the video below for a demo. Celia Cloud Tue, 27 Feb 2024 21:17:57 +0200

Flow in VantageCloud Lake: From Bust to Boom in Data Ingestion to Insights

Flow is a VantageCloud Lake service that allows data users to upload external files into Lake quickly and easily.

Flow paired with the Visualization feature democratises getting quick insights into any information in Lake. See the video below for a demo.

VantageCloud Lake: Flow + Visualization demo

How Flow works

Flow is a self-service, low-code/no-code utility. It lets data teams get started with Lake analytics quickly by simplifying the onboarding of data from Native Object Storage into BFS and OFS tables, using the Primary Cluster’s resources.

Flow - VantageCloud Lake
Flow architecture

You can use Flow through the Lake Console GUI, which allows you to create, configure and monitor Flow ingest jobs (called “flows”).

An important characteristic of Flow is that it does not require users to write scripts, code jobs, or maintain ETL servers. It also creates the target table if it doesn’t exist.

On a separate note, Flow allows up to twenty simultaneously running “flows”, each of which can ingest up to five data sources.

Flow is a data ingestion tool, so it can complement both ETL and ELT patterns:

  • ETL, where the transformations happen on data that is in Native Object Storage, a common pattern in the Cloud.
  • ELT, where the transformations take place in the database. Flow facilitates the data movement in this case, while tools such as dbt + Airflow perform transformations.

Aside from the above, Flow supports three load modes:

  • One-time batches.
  • Continuous loads, i.e., stream mode, which checks for new data at a time interval that the user provides (e.g., every minute) and brings in only the new data.
  • Automated scheduled jobs, which let users define a time window in which Flow checks for new data and brings in only the data that arrived in the Native Object Storage bucket since the previous scheduled load.

Note that Flow automatically detects if there is a new file in the bucket and uploads it in Continuous and Scheduled mode. It checks based on the user-defined interval (continuous) or schedule.

One-time flows can also pick up new files in the AWS S3 bucket, but only when you run them again manually, i.e., when you execute the Run operation.
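The idea of "bringing in only the new data" can be pictured with a small Python sketch. This is a conceptual illustration only: the post does not describe how Flow tracks files internally, and the class name, method name, and object keys below are hypothetical.

```python
class NewFileTracker:
    """Remember which object keys were already ingested."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def new_files(self, bucket_listing: list[str]) -> list[str]:
        """Return keys not seen in earlier checks, in listing order."""
        fresh = []
        for key in bucket_listing:
            if key not in self._seen:
                self._seen.add(key)
                fresh.append(key)
        return fresh


tracker = NewFileTracker()
# First check (e.g. the first interval or scheduled run): everything is new.
first_check = tracker.new_files(["sales/2024-01.csv", "sales/2024-02.csv"])
# Second check: only the file that arrived since the previous check is returned.
second_check = tracker.new_files(
    ["sales/2024-01.csv", "sales/2024-02.csv", "sales/2024-03.csv"]
)
print(first_check)
print(second_check)
```

In this picture, a continuous flow would call the check at every user-defined interval, a scheduled flow at every scheduled run, and a one-time flow only when you press Run again.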

Get Started with Flow

Teradata offers online documentation about Flow. In addition, I wrote a cookbook to make Flow work for the first time on AWS, where I gathered my lessons learnt on:

  1. How to grant Flow permissions on an AWS S3 bucket (a one-time task), and
  2. How to create a flow (an ad hoc task).

Incidentally, I gave some tips in a previous post on loading data in Lake. This post doesn’t focus on Flow, but it considers several solutions you have available.

]]>
VantageCloud Lake in a Nutshell https://celiamuriel.com/vantagecloud-lake-in-a-nutshell/ Post that includes an infographic summarising the critical VantageCloud Lake elements and the basis for using them for a quick start. Celia Cloud Sat, 09 Dec 2023 13:19:45 +0200

VantageCloud Lake in a Nutshell infographic.

VantageCloud Lake in a Nutshell

I have talked extensively about different aspects of VantageCloud Lake in previous posts.

In this post, I summarise the key aspects of VantageCloud Lake if you want to start quickly. You can learn other features as you need them.

What You Need to Know about Lake

The infographic that opens this post summarises the critical Lake elements and the basis for using them. You can find a high-resolution VantageCloud Lake in a Nutshell image in a repository in my GitHub account.

Main Lake components

In summary, the main Lake components are:

  • Lake instance: Cloud-native data and analytics platform to build your data lakehouse. I.e., Lake can operate as a central repository of structured, unstructured, textual, and analogue/IoT data with an analytical infrastructure, which allows you to inspect and derive insights from the data. The instance runs in the Coordinated Universal Time (UTC) time zone.
  • Console: Self-service console for browsing, development, system configuration and monitoring.
  • Session Manager routes requests from the client application to the Primary Cluster.
  • The Primary Cluster receives the logins and requests. The Primary Cluster also handles the query planning and distribution of work across the environment. Additionally, it supports tactical SLA-related query access on small tables or indexes stored in the Block File System (BFS). Note that it also accesses the Object File System (OFS). Every Lake instance requires one Primary Cluster.
  • The Compute Clusters isolate and run stateless query processing to support ad hoc, exploratory, and departmental requirements. Multiple clusters of varying sizes can access the Object File System (OFS). The Compute Clusters provide self-service autonomy and policy-driven autoscaling. They are optional.
  • Block File System (BFS): Proprietary Teradata File System (Teradata manages it). It is built on virtual disks (block storage). BFS is the fastest storage type.
  • The Object File System (OFS) is also a proprietary Teradata File System. However, OFS is built on Native Object Storage (NOS). Therefore, it is a file storage. OFS includes the Time Travel feature, which is the ability to access the data in a table as it looked at a previous moment. OFS is a cheaper, slower storage than BFS.
  • QueryFabric: Communication layer that enables bi-directional work and data movement between one cluster and another.
  • The Compute Router directs a request from the Primary Cluster to one active Cluster within the Group to satisfy a query.

Other Solutions in the Ecosystem

Additionally, you should know that the above components have a tight relationship with:

  • Native Object Storage (NOS): File storage provided by the cloud providers (AWS S3, Azure Blob Storage and Google Cloud Storage). You own and manage this storage (i.e., it’s in your account). Note that Lake can apply a table header to data and access it as if it were a table within Lake.
  • QueryGrid: Teradata solution that connects Lake with other Teradata systems, such as VantageCore (on-prem) or other VantageCloud (Enterprise and Lake), and non-Teradata systems.

Load Data in Lake

Just as important as knowing the Lake architecture at a high level is knowing how to load data and where to place it.

First of all, if you need to load data in a local time zone, you must define the time zone at the session level if the load solution allows it, or generate the source data in UTC. Data Copy (integrated service in Lake to transfer data from different Vantage systems) and TPT allow you to include the source time zone in their jobs at the session level.
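If you choose to generate the source data in UTC, the conversion itself is simple. The Python sketch below shows one way to do it, assuming the source timestamps are naive values recorded at a known, fixed UTC offset. The function name is hypothetical, and a fixed offset ignores daylight-saving changes, so a real pipeline would probably use named time zones instead.

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_naive: datetime, utc_offset_hours: float) -> datetime:
    """Interpret a naive timestamp at a fixed UTC offset and convert it to UTC."""
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    return local_naive.replace(tzinfo=local_tz).astimezone(timezone.utc)


# 15:12 at UTC+2 corresponds to 13:12 UTC.
print(to_utc(datetime(2024, 4, 16, 15, 12), 2).isoformat())
```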

Regarding where to place the data, as a rule of thumb you will create all tables in OFS, especially large ones, because it is cheaper storage. However, you will place a table in BFS to achieve better performance, for example, when you execute tactical queries. Furthermore, BFS includes several features you won’t find in OFS, e.g., temporal tables, row-level security, Security Zones and Referential Integrity (review Teradata documentation).
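The rule of thumb above can be written down as a tiny decision helper. This Python sketch only encodes the criteria named in this post; the function name and feature labels are hypothetical, and the feature set is not an exhaustive list of BFS-only capabilities.

```python
# Features the post names as available in BFS but not in OFS.
BFS_ONLY_FEATURES = {
    "temporal tables",
    "row-level security",
    "security zones",
    "referential integrity",
}


def suggest_storage(tactical_queries: bool, required_features: set[str]) -> str:
    """Suggest "BFS" or "OFS" for a table, following the post's rule of thumb."""
    needs_bfs_feature = any(
        feature.lower() in BFS_ONLY_FEATURES for feature in required_features
    )
    if tactical_queries or needs_bfs_feature:
        return "BFS"
    # Default: OFS, the cheaper storage, especially for large tables.
    return "OFS"


print(suggest_storage(tactical_queries=False, required_features=set()))
print(suggest_storage(tactical_queries=False, required_features={"Row-level security"}))
```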

The Primary Cluster is similar to a VantageCloud Enterprise system, so you load data into it as you would on Enterprise. Nevertheless, if you want to load data directly into OFS, you can use any of the following solutions:

  • The Flow Service loads data from NOS into BFS and OFS tables using the Primary Cluster‘s resources.
  • “INSERT SELECT” and “CREATE TABLE AS” from BFS tables and NOS files (with READ_NOS).
  • Data Copy.
  • QueryGrid.

It is important to realise that Teradata doesn’t support TPT for loading into OFS tables.

In-Depth Lake Posts

As I mentioned at the beginning, I’ve written posts that cover different aspects of Lake in detail. If you want to know more, check them out:

  • VantageCloud Lake Architecture.
  • Considerations To Load Data Into VantageCloud Lake.
  • VantageCloud Lake: Autoscaling the Compute Clusters.
  • Time Travel in VantageCloud Lake.
  • Session Manager in Lake: The Key to High Availability.
  • The Path of a Query in VantageCloud Lake.
  • VantageCloud Lake on AWS: Network configuration.
  • VantageCloud Lake on Azure: Network configuration.

This article was amended on 16 April 2024 to add the links to the posts about network configuration for VantageCloud Lake on AWS and Azure.

]]>