ClickHouse is a high-performance, column-oriented database optimized for real-time analytics on large datasets.
This is a deep dive into the internals of ClickHouse. It traces a learner-friendly path through the source code to show how the core components work. We'll explore the data structures, algorithms, and design principles that make it one of the fastest open-source OLAP databases and a notable contribution to the open-source ecosystem. Whether you're a systems engineer, database developer, or open-source contributor, this deep dive should give you a detailed understanding of how ClickHouse operates from the inside out.
Introduction to ClickHouse Architecture
Storage Engine Internals
StorageMergeTree.cpp, MergeTreeDataPart.cpp
Query Execution Pipeline
InterpreterSelectQuery.cpp, QueryPipeline.cpp
Data Compression and Encoding
CompressionCodec.cpp, CompressedReadBuffer.cpp
Vectorized Execution Engine
IProcessor.h, Block.cpp, Column*.cpp
Merge and Mutation Mechanics
MergeTreeDataMergerMutator.cpp, ReplicatedMergeTree*.cpp
Distributed Query Execution
RemoteBlockInputStream and DistributedQueryExecutor
Cluster.cpp, DistributedBlockInputStream.cpp
Caching and Memory Management
MarkCache.cpp, Arena.cpp, MemoryTracker.cpp
Replication and Fault Tolerance
ReplicatedMergeTreeLogEntry.cpp, ZooKeeper.cpp
Extensibility and Plugin Interfaces
TableFunctionFactory.cpp, AggregateFunctionFactory.cpp
Performance Tuning and Observability
SystemLog.cpp, TraceCollector.cpp
Contributing