In ClickHouse, dictionaries map keys to values and hold that mapping in memory. This makes key-based lookups fast, and it lets large tables store compact keys while the associated values are resolved at query time. To create a dictionary, you specify its structure, source, layout, and refresh settings with a CREATE DICTIONARY DDL statement.
The basic syntax for creating a dictionary in ClickHouse is as follows:
```sql
CREATE [OR REPLACE] DICTIONARY [IF NOT EXISTS] [db.]dictionary_name [ON CLUSTER cluster]
(
    key1 type1 [DEFAULT|EXPRESSION expr1] [IS_OBJECT_ID],
    key2 type2 [DEFAULT|EXPRESSION expr2],
    attr1 type2 [DEFAULT|EXPRESSION expr3] [HIERARCHICAL|INJECTIVE],
    attr2 type2 [DEFAULT|EXPRESSION expr4] [HIERARCHICAL|INJECTIVE]
)
PRIMARY KEY key1, key2
SOURCE(SOURCE_NAME([param1 value1 ... paramN valueN]))
LAYOUT(LAYOUT_NAME([param_name param_value]))
LIFETIME({MIN min_val MAX max_val | max_val})
SETTINGS(setting_name = setting_value, setting_name = setting_value, ...)
COMMENT 'Comment'
```
The optional ON CLUSTER clause creates the dictionary on every node of a cluster as a distributed DDL operation. Attributes are defined much like table columns, and PRIMARY KEY names the key attributes. The data can come from various sources, from local tables to remote services, and the layout determines how the dictionary is stored in memory.
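To make the template concrete, here is a sketch of a dictionary with a two-column key, created across a cluster. The cluster `my_cluster`, the database `db`, the source table `fx_rates`, and the dictionary `db.fx_rates_dict` are hypothetical names. The key has more than one column, so the sketch uses the `COMPLEX_KEY_HASHED` layout. The simple layouts such as `FLAT` and `HASHED` accept only a single numeric key.

```sql
-- Hypothetical names: my_cluster, db, fx_rates, fx_rates_dict
CREATE DICTIONARY IF NOT EXISTS db.fx_rates_dict ON CLUSTER my_cluster
(
    currency String,          -- first key column
    day Date,                 -- second key column
    rate Float64 DEFAULT 0    -- value returned when a key is not found
)
PRIMARY KEY currency, day
SOURCE(CLICKHOUSE(TABLE 'fx_rates' DB 'db'))
-- Composite keys require a complex_key_* layout
LAYOUT(COMPLEX_KEY_HASHED())
-- Reload at a random point between 300 and 600 seconds
LIFETIME(MIN 300 MAX 600)
COMMENT 'Daily FX rates keyed by (currency, day)';

-- A composite key is passed to dictGet as a tuple
SELECT dictGet('db.fx_rates_dict', 'rate', ('EUR', toDate('2024-01-02')));
```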
A dictionary's source defines where it retrieves data. Options include local ClickHouse tables, remote ClickHouse tables, HTTP(S) accessible files, and other databases.
To create a dictionary from a local table in ClickHouse, use a syntax structure like this:
```sql
CREATE DICTIONARY id_value_dictionary
(
    id UInt64,
    value String
)
PRIMARY KEY id
SOURCE(CLICKHOUSE(TABLE 'source_table'))
LAYOUT(FLAT())
LIFETIME(MIN 0 MAX 1000)
```
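This example builds a dictionary from source_table, using its id column as the primary key. The FLAT() layout stores the data in memory as flat arrays indexed by key. It is the fastest layout, but it requires UInt64 keys below max_array_size (500,000 by default). LIFETIME(MIN 0 MAX 1000) tells ClickHouse to reload the data at a uniformly random time between 0 and 1000 seconds, which spreads the load on the source when many servers refresh.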
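The following is a minimal sketch of the full round trip for the local-table example. It assumes `source_table` does not exist yet; the table definition and the two sample rows are illustrative and not part of the original example. `dictGet` reads an attribute by key, `dictGetOrDefault` supplies a fallback for missing keys, and `dictHas` tests whether a key is present.

```sql
-- Illustrative source table backing id_value_dictionary
CREATE TABLE source_table
(
    id UInt64,
    value String
)
ENGINE = MergeTree
ORDER BY id;

INSERT INTO source_table VALUES (1, 'first'), (2, 'second');

-- Load the new rows now instead of waiting for LIFETIME to expire
SYSTEM RELOAD DICTIONARY id_value_dictionary;

-- Key lookups against the in-memory dictionary
SELECT
    dictGet('id_value_dictionary', 'value', toUInt64(1)) AS found,
    dictGetOrDefault('id_value_dictionary', 'value', toUInt64(42), 'missing') AS fallback,
    dictHas('id_value_dictionary', toUInt64(42)) AS has_42;
```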
To source a dictionary from a remote ClickHouse service, define the host, port, and authentication details. Here's an example:
```sql
CREATE DICTIONARY id_value_dictionary
(
    id UInt64,
    value String
)
PRIMARY KEY id
SOURCE(CLICKHOUSE(HOST 'HOSTNAME' PORT 9000 USER 'default' PASSWORD 'PASSWORD' TABLE 'source_table' DB 'default'))
LAYOUT(FLAT())
LIFETIME(MIN 0 MAX 1000)
```
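When the remote service is reached over an untrusted network, you may want an encrypted connection, and perhaps only a subset of the remote rows. The sketch below is a variant of the example above under those assumptions. `HOSTNAME` and `PASSWORD` remain placeholders, the dictionary name `id_value_dictionary_tls` is hypothetical, and 9440 is the usual secure native-protocol port, which may differ on your server.

```sql
-- Placeholders: HOSTNAME and PASSWORD; dictionary name is hypothetical
CREATE DICTIONARY id_value_dictionary_tls
(
    id UInt64,
    value String
)
PRIMARY KEY id
SOURCE(CLICKHOUSE(
    HOST 'HOSTNAME'
    PORT 9440            -- native protocol over TLS
    SECURE 1             -- encrypt the connection
    USER 'default'
    PASSWORD 'PASSWORD'
    DB 'default'
    TABLE 'source_table'
    WHERE 'id < 1000'    -- load only matching rows from the remote table
))
LAYOUT(FLAT())
LIFETIME(MIN 0 MAX 1000);
```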
Dictionaries can also be created from data files available over HTTP(S). For example:
```sql
CREATE DICTIONARY default.taxi_zone_dictionary
(
    `LocationID` UInt16 DEFAULT 0,
    `Borough` String,
    `Zone` String,
    `service_zone` String
)
PRIMARY KEY LocationID
SOURCE(HTTP(URL 'https://datasets-documentation.s3.eu-west-3.amazonaws.com/nyc-taxi/taxi_zone_lookup.csv' FORMAT 'CSVWithNames'))
LIFETIME(MIN 0 MAX 0)
LAYOUT(HASHED())
```
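Here, the data is read directly from the CSV file at that URL, and the HASHED() layout stores it in an in-memory hash table. LIFETIME(MIN 0 MAX 0) disables periodic reloads, which suits a lookup file that does not change.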
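Once the taxi zone dictionary exists, you can query it in two ways: look up individual keys with `dictGet`, or read the whole dictionary like a table. In the sketch below, the LocationID value 132 is an arbitrary key chosen for illustration.

```sql
-- Look up one attribute for a single LocationID
SELECT dictGet('default.taxi_zone_dictionary', 'Borough', toUInt16(132)) AS borough;

-- A dictionary created with DDL can also be read like a table
SELECT LocationID, Borough, Zone
FROM default.taxi_zone_dictionary
ORDER BY LocationID
LIMIT 5;
```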
Dictionaries may alternatively source data from other types of databases; further details can be found in the dictionary sources documentation.
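As one illustration of a non-ClickHouse source, here is a sketch that loads the same id/value shape from PostgreSQL. The host, credentials, database, table, and dictionary name are all hypothetical placeholders. Check the dictionary sources documentation for the full parameter list of your ClickHouse version.

```sql
-- All connection values and names below are hypothetical placeholders
CREATE DICTIONARY pg_id_value_dictionary
(
    id UInt64,
    value String
)
PRIMARY KEY id
SOURCE(POSTGRESQL(
    host 'postgres-host'
    port 5432
    user 'postgres_user'
    password 'postgres_password'
    db 'db_name'
    table 'table_name'
))
LAYOUT(HASHED())
LIFETIME(MIN 300 MAX 360);
```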
When you create a dictionary from the SQL console in ClickHouse Cloud, you must specify a user (default, or another user granted default_role) and that user's password. For example, you can create a dedicated user like this:
```sql
CREATE USER IF NOT EXISTS clickhouse_admin
IDENTIFIED WITH sha256_password BY 'passworD43$x';

GRANT default_role TO clickhouse_admin;
```
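The user's credentials are then passed in the dictionary's SOURCE clause. The sketch below adapts the local-table example under that assumption. It also assumes that `source_table` lives in the `default` database, which you should adjust to your own setup.

```sql
-- Assumes clickhouse_admin exists and source_table is in the default database
CREATE DICTIONARY id_value_dictionary
(
    id UInt64,
    value String
)
PRIMARY KEY id
SOURCE(CLICKHOUSE(
    TABLE 'source_table'
    DB 'default'
    USER 'clickhouse_admin'     -- user granted default_role
    PASSWORD 'passworD43$x'
))
LAYOUT(FLAT())
LIFETIME(MIN 0 MAX 1000);
```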
Choosing an appropriate source, layout, and lifetime for each dictionary lets lookups run against memory instead of re-reading reference data on every query. To see meta-information about every dictionary in your ClickHouse instance, query the system.dictionaries table.
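A minimal sketch of such a query follows. It lists each dictionary's load status, layout type, size, and most recent error, with the unhealthy dictionaries sorted first. The column selection is one reasonable choice; the table exposes more, such as the lifetime bounds and hit rate.

```sql
-- Overview of every dictionary: state, layout, size, last error
SELECT
    database,
    name,
    status,
    type,
    element_count,
    formatReadableSize(bytes_allocated) AS memory,
    last_successful_update_time,
    last_exception
FROM system.dictionaries
ORDER BY status = 'LOADED', database, name;
```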