"how numbers are stored and used in computers"
ClickHouse provides a flexible and efficient method for creating tables using the CREATE TABLE
SQL command. This command allows defining a table with various configurations depending on the specific requirements, including explicit schemas, schema cloning, table functions, and more. To accommodate distributed databases, an ON CLUSTER
clause is available for distributed Data Definition Language (DDL) queries.
Here is a basic table that stores a unique identifier, name, age, and creation date of a table of people.
code.txt1CREATE TABLE people 2( 3 id UInt32 NOT NULL, 4 name String, 5 age Nullable(Int), 6 created_at DateTime DEFAULT now() 7) 8ENGINE = MergeTree() 9ORDER BY id;
This command creates a MergeTree
table named my_table
with an id
column as the primary key and some additional fields. Default values and nullability specifications help ensure data integrity.
You can define a table directly with its schema using the following structure:
code.txt1CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster] 2( 3 column_name1 [data_type1] [NULL|NOT NULL] [DEFAULT|MATERIALIZED|EPHEMERAL|ALIAS expr1] [COMMENT 'column_comment'], 4 column_name2 [data_type2] [NULL|NOT NULL] [DEFAULT|MATERIALIZED|EPHEMERAL|ALIAS expr2] [COMMENT 'column_comment'], 5 ... 6) ENGINE = engine 7 [COMMENT 'table_comment']
In this syntax, each column is described with its name and data type, and can be further customized with modifiers like NULL
or NOT NULL
, default value expressions (DEFAULT
, MATERIALIZED
, EPHEMERAL
, or ALIAS
), and optional comments for columns and the table as a whole.
You can create a table based on the schema of an existing table using the syntax:
code.txt1CREATE TABLE [IF NOT EXISTS] [db.]new_table AS [db2.]existing_table [ENGINE = engine]
If no engine is specified, the new table inherits the engine from the existing table.
code.txt1CREATE TABLE new_table AS old_table;
This is a simple example of duplicating the schema of old_table
into new_table
.
To copy both schema and data, use the CLONE
keyword.
code.txt1CREATE TABLE [IF NOT EXISTS] [db.]new_table CLONE AS [db2.]existing_table [ENGINE = engine]
This not only replicates the structure but also duplicates all the existing data from the original table.
ClickHouse also allows creating tables based on the results of a SELECT
statement:
code.txt1CREATE TABLE [IF NOT EXISTS] [db.]table_name[(name1 [type1], name2 [type2], ...)] ENGINE = engine AS SELECT ...
This technique is particularly useful for ETL processes where transformation or filtering is necessary before data insertion.
code.txt1CREATE TABLE summary_table ENGINE = MergeTree ORDER BY id AS 2SELECT id, sum(value) as total_value FROM detailed_table GROUP BY id;
Columns can have default values specified using the DEFAULT
keyword followed by an expression. This helps in providing a fallback when a value is not provided during data insertion.
These are special column types.
These are computed automatically upon insertion and are not stored directly in the table. This saves space and allows dynamic computation on read if required.
Alias columns allow computation at query time, similar to views, which helps maintain performant queries without redundant data storage.
Primary keys can be specified inside or outside the column list. They are crucial for ensuring uniqueness and optimizing data retrieval.
ClickHouse supports advanced compression, which can be customized per column to optimize storage based on data characteristics. Popular codecs include LZ4
, ZSTD
, and Delta, enhancing both storage efficiency and query performance.
Temporary tables in ClickHouse exist only for the duration of a session and can be created similarly as permanent ones but without a specified database.
code.txt1CREATE TEMPORARY TABLE temp_table (id Int, value String);
Temporary tables are ideal for short-term data analysis or joining small datasets within a session.