OTF: What It Is and Where It Fits

In this post I explain what the Open Table Format (OTF) is. First, though, some context: I start by describing the gap in the analytical ecosystem that OTF fills, and the specific needs for which it makes sense to architect with OTF.

Data Fabric with Open File Formats

Data Fabric, or “data virtualisation” technology, enables borderless analytics by orchestrating the analytical ecosystem to operate as one environment. In other words, the Data Fabric simplifies and unifies data access across complex, distributed environments. It allows organisations to manage, integrate, and analyse data from multiple sources (whether on-premises, in the cloud, or across hybrid systems) without needing to move or replicate it excessively.

To create a Data Fabric, organisations may use multiple existing technologies in combination to enable a metadata-driven implementation and augmented orchestration design.

Open File Format (OFF) refers to industry-standard data formats that are widely supported across platforms and tools. They include Parquet, Avro, ORC, and Arrow, among others. These formats are designed for interoperability, allowing multiple systems to read and write the data without proprietary constraints. OFF may be part of a Data Fabric, alongside other means of storing data, such as tables in a database.

Companies typically implement a Data Fabric that includes OFF in the following scenarios:

  • When they want their analytical architecture to leverage distinct compute engines for diverse workloads.
  • When they want flexibility in terms of:
    • Open formats (e.g., file formats),
    • The availability of structured and unstructured data, and
    • Decoupled storage and compute, to optimise resources and costs.

These scenarios are not mutually exclusive.
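
The idea of several compute engines sharing one copy of the data can be sketched in a few lines of Python. This is only an illustration under loud assumptions: a CSV file stands in for an open file format such as Parquet, a temporary directory stands in for the storage layer, and the two "engines" (`engine_a_total_by_region` and `engine_b_total_by_region`) are hypothetical names invented for this example.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path

# "Storage": one file in an open format. CSV is a stand-in for Parquet or ORC.
storage_dir = Path(tempfile.mkdtemp())
sales_file = storage_dir / "sales.csv"
sales_file.write_text("region,amount\nnorth,10\nsouth,25\nnorth,5\n")


def engine_a_total_by_region(path):
    """Hypothetical engine A: a plain Python scan over the file."""
    totals = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            totals[row["region"]] = totals.get(row["region"], 0) + int(row["amount"])
    return totals


def engine_b_total_by_region(path):
    """Hypothetical engine B: loads the same file into in-memory SQLite and uses SQL."""
    with open(path, newline="") as f:
        rows = [(r["region"], int(r["amount"])) for r in csv.DictReader(f)]
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    result = dict(conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"))
    conn.close()
    return result


# Both engines read the same stored file; neither owns the data.
totals_a = engine_a_total_by_region(sales_file)
totals_b = engine_b_total_by_region(sales_file)
print(totals_a == totals_b)
```

Because the data lives in one place in a format both engines understand, either engine can be swapped or scaled without copying or converting the file.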

Data Fabric with Open File Formats and Compute Engines

Data Lakehouse


A Data Lakehouse is a central repository of structured, unstructured, textual and analogue/IoT data with an analytical infrastructure, which allows us to inspect and derive insights from the data in the lakehouse.

Roughly speaking:

  • The Data Warehouse is a relational database for business reporting.
  • The Data Lake is for data science and machine learning, using data of any structure or file format.
  • The Data Lakehouse is a single data architecture that combines and unifies the architectures and capabilities of data lakes and data warehouses. It enables greater agility for all types of analytics, with fewer data redundancies, a simpler architecture, and a more consistent view of semantics. A fundamental concept in a Lakehouse is that it superimposes the relational paradigm for data management onto data in open, standardised formats residing in cheap object storage. Thus, it allows us to store our data inexpensively and access it via SQL.

The Data Fabric is a helpful element in a Data Lakehouse architecture.

Open Table Formats

An Open Table Format (OTF) is a format for storing tabular data so that it is easily accessible and interoperable across various data processing and analytics tools. In other words, a table format is a method of structuring a data set’s files to present them as a unified “table.” To do this, table formats consist of a data layer (the data files, stored in an open file format) and a metadata layer.
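
To make the split between the data layer and the metadata layer concrete, here is a toy sketch in Python. It is not how Apache Iceberg, Delta Lake, or Apache Hudi lay out their files; the directory layout, the JSON Lines data files, and the `commit` and `read_table` helpers are all assumptions made up for illustration. The only point is that readers discover the table through metadata snapshots rather than by listing data files directly.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical table location: a data/ folder and a metadata/ folder.
table_dir = Path(tempfile.mkdtemp()) / "orders"
(table_dir / "data").mkdir(parents=True)
(table_dir / "metadata").mkdir()


def commit(rows):
    """Write rows to a new data file, then publish a snapshot that lists it."""
    snapshots = sorted((table_dir / "metadata").glob("v*.json"))
    previous = json.loads(snapshots[-1].read_text())["files"] if snapshots else []
    version = len(snapshots) + 1
    data_file = f"data/part-{version:05d}.jsonl"
    (table_dir / data_file).write_text("\n".join(json.dumps(r) for r in rows))
    snapshot = {"version": version, "files": previous + [data_file]}
    (table_dir / "metadata" / f"v{version:05d}.json").write_text(json.dumps(snapshot))
    return version


def read_table(version=None):
    """Read the table as of a snapshot (latest by default) via the metadata layer."""
    snapshots = sorted((table_dir / "metadata").glob("v*.json"))
    chosen = snapshots[-1] if version is None else snapshots[version - 1]
    rows = []
    for name in json.loads(chosen.read_text())["files"]:
        for line in (table_dir / name).read_text().splitlines():
            rows.append(json.loads(line))
    return rows


first = commit([{"id": 1, "item": "book"}])
second = commit([{"id": 2, "item": "pen"}, {"id": 3, "item": "lamp"}])
print(len(read_table()), len(read_table(version=first)))
```

Since every snapshot is kept, the same files can be presented as the table at its latest state or as it was after an earlier commit, which is the kind of capability a metadata layer adds on top of plain files.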

Data stored in Open Table Formats typically sits on an object store, which provides low-cost storage. It can also be hosted on disks.

Organisations leverage OTF to build their Data Fabric, either fully or in part. Later on, we will explain key OTF features and capabilities, illustrating how OTF simplifies many of the tasks in the Data Fabric.

The OTF pioneers include Netflix with Apache Iceberg, Databricks with Delta Lake, and Uber with Apache Hudi.

An Analytical System Based on Files

There are several ways to use data stored in files in analytical ecosystems. For example, the data can be in files with no particular table structure (for example, OFF) stored in the cloud native object store (NOS) or on disks, or in a tabular format such as OTF. Either way, an analytical ecosystem based on files needs the components shown in the diagram below to work.

Analytical system based on files
There is a high-resolution version of this infographic in my GitHub account, OTF repository.

However, choosing OTF instead of plain OFF makes a difference when organisations want to use files for analysis. Compared to OFF, OTF includes advanced features for query optimisation, integration with other elements of the ecosystem (such as Evolution), and some basic Data Protection features.
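
One common form of query optimisation that a metadata layer makes possible is skipping data files that cannot contain matching rows. The sketch below assumes the metadata records a minimum and maximum value per file for one column; the `DataFileEntry` class, the `files_to_scan` helper, the `order_date` column, and the file paths are all hypothetical and are not taken from any specific table format.

```python
from dataclasses import dataclass


@dataclass
class DataFileEntry:
    """Hypothetical metadata entry: one data file plus column statistics."""
    path: str
    min_order_date: str  # ISO dates compare correctly as plain strings
    max_order_date: str


# Hypothetical manifest kept in the table's metadata layer.
manifest = [
    DataFileEntry("data/part-00001.parquet", "2024-01-01", "2024-01-31"),
    DataFileEntry("data/part-00002.parquet", "2024-02-01", "2024-02-29"),
    DataFileEntry("data/part-00003.parquet", "2024-03-01", "2024-03-31"),
]


def files_to_scan(entries, date_from, date_to):
    """Keep only files whose [min, max] range overlaps the queried range."""
    return [
        e.path
        for e in entries
        if e.max_order_date >= date_from and e.min_order_date <= date_to
    ]


# A query for mid-February only needs to open one of the three files.
selected = files_to_scan(manifest, "2024-02-10", "2024-02-20")
print(selected)
```

With plain files and no such statistics, an engine would have to open every file to answer the same query.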

OTF Community

OTF was designed so that a company can store its tables once and any employee can then use any OTF-compatible SQL engine, streaming engine, API/SDK, integrated solution, or speciality tool to access and manipulate those OTF tables. Two conditions apply: the employee must have the appropriate security permissions, and they must not be subject to any vendor lock-in.

OTF Community

I added the “OTF Community” section on 15 April 2026.

