"how numbers are stored and used in computers"
ClickHouse is a high-performance, column-oriented database optimized for real-time analytics on large datasets.
This is a deep dive into the internals of ClickHouse that traces a learner's path through the source code to uncover how its core components work. We'll explore the data structures, algorithms, and design principles that make it one of the fastest OLAP databases in the world, and a brilliant contribution to the open-source ecosystem. Whether you're a systems engineer, database developer, or open-source contributor, this deep dive should give you a detailed understanding of how ClickHouse operates from the inside out.
Introduction to ClickHouse Architecture
Storage Engine Internals
StorageMergeTree.cpp, MergeTreeDataPart.cpp
Query Execution Pipeline
InterpreterSelectQuery.cpp, QueryPipeline.cpp
Data Compression and Encoding
CompressionCodec.cpp, CompressedReadBuffer.cpp
Vectorized Execution Engine
IProcessor.h, Block.cpp, Column*.cpp
Merge and Mutation Mechanics
MergeTreeDataMergerMutator.cpp, ReplicatedMergeTree*.cpp
Distributed Query Execution
RemoteBlockInputStream and DistributedQueryExecutor
Cluster.cpp, DistributedBlockInputStream.cpp
Caching and Memory Management
MarkCache.cpp, Arena.cpp, MemoryTracker.cpp
Replication and Fault Tolerance
ReplicatedMergeTreeLogEntry.cpp, ZooKeeper.cpp
Extensibility and Plugin Interfaces
TableFunctionFactory.cpp, AggregateFunctionFactory.cpp
Performance Tuning and Observability
SystemLog.cpp, TraceCollector.cpp
Contributing