ClickHouse Setup

1. Environment Requirements

Per the official documentation, the system must be x86_64 and support the SSE 4.2 instruction set. This command checks whether the current CPU supports SSE 4.2:

grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"

2. Installation

Download the latest RPM packages from the official site: clickhouse-common-static-22.7.3.5.x86_64.rpm, clickhouse-server-22.7.3.5.x86_64.rpm, and clickhouse-client-22.7.3.5.x86_64.rpm.
Then run:

sudo rpm -ivh clickhouse-*.rpm

Installing the server package prompts for a password for the default user.
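The hash the installer records can be reproduced by hand. This is a sketch under the assumption that the server package stores the password as a plain SHA-256 hex digest in /etc/clickhouse-server/users.d/default-password.xml (path and format are my recollection of the packaging, not stated in this document; verify against your own installation):

```shell
# Reproduce the sha256_hex digest the installer writes for the chosen password.
# Assumption: the package stores it as <password_sha256_hex> in
# /etc/clickhouse-server/users.d/default-password.xml.
printf '%s' 'abc123456' | sha256sum | awk '{print $1}'
```

Compare the printed digest with the value in that file to confirm which password is actually in effect.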

Installation complete.

3. Starting the Service

# Start the service
sudo systemctl start clickhouse-server.service
# Check its status
sudo systemctl status clickhouse-server.service

4. Connecting

clickhouse-client --host="127.0.0.1" --port="9000" --user="default" --password='abc123456'
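A quick smoke test after the first connection. The flags below assume the default password chosen during installation and a locally running server; the block falls back to a note where the client is not on PATH:

```shell
# Run a trivial query non-interactively to confirm the server answers.
if command -v clickhouse-client >/dev/null 2>&1; then
  clickhouse-client --host 127.0.0.1 --port 9000 --user default \
    --password 'abc123456' --query 'SELECT version()'
else
  echo "clickhouse-client not installed"
fi
```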

5. Changing the Data Storage Location

The relevant settings are in the server config file, which defaults to /etc/clickhouse-server/config.xml:

<path>/data/clickhouse/</path>

<tmp_path>/data/clickhouse/tmp/</tmp_path>

<user_files_path>/data/clickhouse/user_files/</user_files_path>

<format_schema_path>/data/clickhouse/format_schemas/</format_schema_path>


<user_directories>
    <users_xml>
        <path>users.xml</path>
    </users_xml>
    <local_directory>
        <path>/data/clickhouse/access/</path>
    </local_directory>
</user_directories>

After editing, restart the service for the changes to take effect.
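Before restarting, the directories referenced above must exist and be owned by the clickhouse service user, or the server will refuse to start. A sketch, shown against a scratch root so it is safe to run anywhere; on the real server set base=/data/clickhouse and run the chown as root:

```shell
# Create the directory tree referenced by the config above.
base=$(mktemp -d)                # on the real server: base=/data/clickhouse
mkdir -p "$base/tmp" "$base/user_files" "$base/format_schemas" "$base/access"
# chown -R clickhouse:clickhouse "$base"   # required on the real server
ls "$base"
```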

6. Cluster Mode

Cluster mode requires a ZooKeeper ensemble. ClickHouse puts a heavy load on ZooKeeper, so deploy a dedicated ensemble rather than sharing one with other services.

The company has five servers, so the plan is a cluster of five shards with two replicas each, laid out circularly so that every node hosts two instances. (Reference layout diagram omitted.)
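The circular layout can be spelled out without the diagram. A small sketch; the node numbering is mine, mapping to the hosts 10.172.134.200-204 that appear in metrika.xml:

```shell
# Shard i keeps replica 1 on node i (native instance, port 9000) and
# replica 2 on node i+1 with wrap-around (second instance, port 19000).
for i in 1 2 3 4 5; do
  r2=$(( i % 5 + 1 ))
  echo "shard $i: replica1=node$i:9000 replica2=node$r2:19000"
done
```

Losing any single node therefore costs one replica of two different shards, never both replicas of the same shard.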

Since each node runs two ClickHouse processes, their config files, data directories, log directories, and ports must be kept separate. A sample config file for the replica instance:

<?xml version="1.0"?>

<clickhouse>
    <logger>

        <level>information</level>
        <log>/var/log/replica-clickhouse/replica-clickhouse.log</log>
        <errorlog>/var/log/replica-clickhouse/replica-clickhouse.err.log</errorlog>

        <size>1000M</size>
        <count>10</count>

    </logger>

    <http_port>18123</http_port>

    <tcp_port>19000</tcp_port>

    <mysql_port>19004</mysql_port>

    <postgresql_port>19005</postgresql_port>

    <interserver_http_port>19009</interserver_http_port>

    <listen_host>0.0.0.0</listen_host>

    <zookeeper incl="zookeeper-servers" optional="true"/>

    <macros incl="macros" optional="true"/>

    <remote_servers incl="remote_servers" optional="true"/>

    <include_from>/etc/clickhouse-config/replica/metrika.xml</include_from>

    <max_connections>4096</max_connections>

    <keep_alive_timeout>3</keep_alive_timeout>

    <grpc>
        <enable_ssl>false</enable_ssl>

        <ssl_cert_file>/path/to/ssl_cert_file</ssl_cert_file>
        <ssl_key_file>/path/to/ssl_key_file</ssl_key_file>

        <ssl_require_client_auth>false</ssl_require_client_auth>

        <ssl_ca_cert_file>/path/to/ssl_ca_cert_file</ssl_ca_cert_file>

        <transport_compression_type>none</transport_compression_type>

        <transport_compression_level>0</transport_compression_level>

        <max_send_message_size>-1</max_send_message_size>
        <max_receive_message_size>-1</max_receive_message_size>

        <verbose_logs>false</verbose_logs>
    </grpc>

    <openSSL>
        <server> <!-- Used for https server AND secure tcp port -->
            <certificateFile>/etc/replica-clickhouse/server.crt</certificateFile>
            <privateKeyFile>/etc/replica-clickhouse/server.key</privateKeyFile>
            <verificationMode>none</verificationMode>
            <loadDefaultCAFile>true</loadDefaultCAFile>
            <cacheSessions>true</cacheSessions>
            <disableProtocols>sslv2,sslv3</disableProtocols>
            <preferServerCiphers>true</preferServerCiphers>
        </server>

        <client> <!-- Used for connecting to https dictionary source and secured Zookeeper communication -->
            <loadDefaultCAFile>true</loadDefaultCAFile>
            <cacheSessions>true</cacheSessions>
            <disableProtocols>sslv2,sslv3</disableProtocols>
            <preferServerCiphers>true</preferServerCiphers>
            <!-- Use for self-signed: <verificationMode>none</verificationMode> -->
            <invalidCertificateHandler>
                <!-- Use for self-signed: <name>AcceptCertificateHandler</name> -->
                <name>RejectCertificateHandler</name>
            </invalidCertificateHandler>
        </client>
    </openSSL>

    <max_concurrent_queries>100</max_concurrent_queries>


    <max_server_memory_usage>0</max_server_memory_usage>


    <max_thread_pool_size>10000</max_thread_pool_size>


    <max_server_memory_usage_to_ram_ratio>0.9</max_server_memory_usage_to_ram_ratio>


    <total_memory_profiler_step>4194304</total_memory_profiler_step>

    <total_memory_tracker_sample_probability>0</total_memory_tracker_sample_probability>


    <uncompressed_cache_size>8589934592</uncompressed_cache_size>

    <mark_cache_size>5368709120</mark_cache_size>
    <mmap_cache_size>1000</mmap_cache_size>

    <compiled_expression_cache_size>134217728</compiled_expression_cache_size>

    <compiled_expression_cache_elements_size>10000</compiled_expression_cache_elements_size>

    <path>/data/replica/</path>

    <tmp_path>/data/replica/tmp/</tmp_path>

    <user_files_path>/data/replica/user_files/</user_files_path>

    <ldap_servers>
    </ldap_servers>

    <user_directories>
        <users_xml>
            <path>users.xml</path>
        </users_xml>
        <local_directory>
            <path>/data/replica/access/</path>
        </local_directory>
    </user_directories>

    <default_profile>default</default_profile>

    <default_database>default</default_database>

    <mlock_executable>true</mlock_executable>

    <remap_executable>false</remap_executable>

    <builtin_dictionaries_reload_interval>3600</builtin_dictionaries_reload_interval>

    <max_session_timeout>3600</max_session_timeout>

    <default_session_timeout>60</default_session_timeout>

    <prometheus>
        <endpoint>/metrics</endpoint>
        <port>19363</port>

        <metrics>true</metrics>
        <events>true</events>
        <asynchronous_metrics>true</asynchronous_metrics>
        <status_info>true</status_info>
    </prometheus>

    <query_log>
        <database>system</database>
        <table>query_log</table>

        <partition_by>toYYYYMM(event_date)</partition_by>

        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </query_log>

    <trace_log>
        <database>system</database>
        <table>trace_log</table>

        <partition_by>toYYYYMM(event_date)</partition_by>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </trace_log>

    <query_thread_log>
        <database>system</database>
        <table>query_thread_log</table>
        <partition_by>toYYYYMM(event_date)</partition_by>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </query_thread_log>

    <query_views_log>
        <database>system</database>
        <table>query_views_log</table>
        <partition_by>toYYYYMM(event_date)</partition_by>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </query_views_log>

    <part_log>
        <database>system</database>
        <table>part_log</table>
        <partition_by>toYYYYMM(event_date)</partition_by>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </part_log>

    <metric_log>
        <database>system</database>
        <table>metric_log</table>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
        <collect_interval_milliseconds>1000</collect_interval_milliseconds>
    </metric_log>

    <asynchronous_metric_log>
        <database>system</database>
        <table>asynchronous_metric_log</table>

        <flush_interval_milliseconds>7000</flush_interval_milliseconds>
    </asynchronous_metric_log>

    <opentelemetry_span_log>
        <engine>
            engine MergeTree
            partition by toYYYYMM(finish_date)
            order by (finish_date, finish_time_us, trace_id)
        </engine>
        <database>system</database>
        <table>opentelemetry_span_log</table>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </opentelemetry_span_log>

    <crash_log>
        <database>system</database>
        <table>crash_log</table>

        <partition_by/>
        <flush_interval_milliseconds>1000</flush_interval_milliseconds>
    </crash_log>

    <session_log>
        <database>system</database>
        <table>session_log</table>

        <partition_by>toYYYYMM(event_date)</partition_by>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </session_log>

    <top_level_domains_lists>
    </top_level_domains_lists>

    <dictionaries_config>*_dictionary.xml</dictionaries_config>

    <user_defined_executable_functions_config>*_function.xml</user_defined_executable_functions_config>

    <encryption_codecs>

    </encryption_codecs>

    <distributed_ddl>
        <path>/clickhouse/task_queue/ddl</path>

    </distributed_ddl>

    <graphite_rollup_example>
        <pattern>
            <regexp>click_cost</regexp>
            <function>any</function>
            <retention>
                <age>0</age>
                <precision>3600</precision>
            </retention>
            <retention>
                <age>86400</age>
                <precision>60</precision>
            </retention>
        </pattern>
        <default>
            <function>max</function>
            <retention>
                <age>0</age>
                <precision>60</precision>
            </retention>
            <retention>
                <age>3600</age>
                <precision>300</precision>
            </retention>
            <retention>
                <age>86400</age>
                <precision>3600</precision>
            </retention>
        </default>
    </graphite_rollup_example>

    <format_schema_path>/data/replica/format_schemas/</format_schema_path>

    <query_masking_rules>
        <rule>
            <name>hide encrypt/decrypt arguments</name>
            <regexp>((?:aes_)?(?:encrypt|decrypt)(?:_mysql)?)\s*\(\s*(?:'(?:\\'|.)+'|.*?)\s*\)</regexp>
            <replace>\1(???)</replace>
        </rule>
    </query_masking_rules>


    <send_crash_reports>
        <enabled>false</enabled>
        <anonymize>false</anonymize>
        <endpoint>https://6f33034cfe684dd7a3ab9875e57b1c8d@o388870.ingest.sentry.io/5226277</endpoint>
    </send_crash_reports>
</clickhouse>
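The replica instance's ports above follow a simple convention (mine, not mandated by ClickHouse): each default port gains a leading 1, e.g. 8123→18123, 9000→19000, 9009→19009. Starting from a copy of the stock config, a sed pass can do the renumbering; demonstrated here on a two-line sample file:

```shell
# Prefix the known port settings with a 1 (8123 -> 18123, 9000 -> 19000, ...).
cat > /tmp/port-demo.xml <<'EOF'
<http_port>8123</http_port>
<tcp_port>9000</tcp_port>
EOF
sed -E 's/<(http_port|tcp_port|mysql_port|postgresql_port|interserver_http_port)>([0-9]+)/<\1>1\2/' /tmp/port-demo.xml
```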

The metrika.xml file, which defines the shard and replica assignment:

<yandex>
    <remote_servers>
        <wxpt_cluster>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>10.172.134.200</host>
                    <port>9000</port>
                    <user>default</user>
                    <password>abc123456</password>
                </replica>
                <replica>
                    <host>10.172.134.201</host>
                    <port>19000</port>
                    <user>default</user>
                    <password>abc123456</password>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>10.172.134.201</host>
                    <port>9000</port>
                    <user>default</user>
                    <password>abc123456</password>
                </replica>
                <replica>
                    <host>10.172.134.202</host>
                    <port>19000</port>
                    <user>default</user>
                    <password>abc123456</password>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>10.172.134.202</host>
                    <port>9000</port>
                    <user>default</user>
                    <password>abc123456</password>
                </replica>
                <replica>
                    <host>10.172.134.203</host>
                    <port>19000</port>
                    <user>default</user>
                    <password>abc123456</password>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>10.172.134.203</host>
                    <port>9000</port>
                    <user>default</user>
                    <password>abc123456</password>
                </replica>
                <replica>
                    <host>10.172.134.204</host>
                    <port>19000</port>
                    <user>default</user>
                    <password>abc123456</password>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>10.172.134.204</host>
                    <port>9000</port>
                    <user>default</user>
                    <password>abc123456</password>
                </replica>
                <replica>
                    <host>10.172.134.200</host>
                    <port>19000</port>
                    <user>default</user>
                    <password>abc123456</password>
                </replica>
            </shard>
        </wxpt_cluster>
    </remote_servers>

    <zookeeper-servers>
        <node index="1">
            <host>wxpt-ck-01</host>
            <port>2181</port>
        </node>
        <node index="2">
            <host>wxpt-ck-02</host>
            <port>2181</port>
        </node>
        <node index="3">
            <host>wxpt-ck-03</host>
            <port>2181</port>
        </node>
    </zookeeper-servers>
    <!-- The two macros below differ for every instance -->
    <!-- Replica 1 of shard 1 -->
    <macros>
        <shard>SHARD-01</shard>
        <replica>REPLICA-01</replica>
    </macros>
</yandex>
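The macros above cover one instance only; every one of the ten instances needs its own shard/replica pair. Under the circular layout, the assignment works out as follows (an inference from the remote_servers section above, so double-check it against your own plan):

```shell
# Native instance on node i serves shard i as replica 1; the second instance
# on node i serves shard i-1 (wrapping 1 -> 5) as replica 2.
for i in 1 2 3 4 5; do
  prev=$(( (i + 3) % 5 + 1 ))   # i-1 with wrap-around
  printf 'node%d native : <shard>SHARD-0%d</shard> <replica>REPLICA-01</replica>\n' "$i" "$i"
  printf 'node%d second : <shard>SHARD-0%d</shard> <replica>REPLICA-02</replica>\n' "$i" "$prev"
done
```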

The replica-clickhouse.service unit: copy clickhouse-server.service wholesale, then change the config file path and the service name.

sudo vim /lib/systemd/system/replica-clickhouse.service

# Paste the following
[Unit]
Description=ClickHouse Server (analytic DBMS for big data)
Requires=network-online.target
# NOTE: that After/Wants=time-sync.target is not enough, you need to ensure
# that the time was adjusted already, if you use systemd-timesyncd you are
# safe, but if you use ntp or some other daemon, you should configure it
# additionally.
After=time-sync.target network-online.target
Wants=time-sync.target

[Service]
Type=simple
User=clickhouse
Group=clickhouse
Restart=always
RestartSec=30
RuntimeDirectory=replica-clickhouse
ExecStart=/usr/bin/clickhouse-server --config=/etc/clickhouse-config/replica/config.xml --pid-file=/run/replica-clickhouse/replica-clickhouse.pid
# Minus means that this file is optional.
EnvironmentFile=-/etc/default/clickhouse
LimitCORE=infinity
LimitNOFILE=500000
CapabilityBoundingSet=CAP_NET_ADMIN CAP_IPC_LOCK CAP_SYS_NICE CAP_NET_BIND_SERVICE

[Install]
# ClickHouse should not start from the rescue shell (rescue.target).
WantedBy=multi-user.target

With everything configured, start the cluster services:

# Reload unit files
sudo systemctl daemon-reload
sudo systemctl restart clickhouse-server.service
sudo systemctl status clickhouse-server.service
sudo systemctl restart replica-clickhouse.service
sudo systemctl status replica-clickhouse.service

You may hit problems along the way; most come down to mistakes in the config files, so work through them with the error logs.
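Once both instances are up on every node, one way to confirm the cluster definition was loaded is to query system.clusters (the cluster name wxpt_cluster is taken from metrika.xml; the block falls back to a note where the client is absent):

```shell
# On a healthy 5-shard / 2-replica cluster this should list ten entries.
if command -v clickhouse-client >/dev/null 2>&1; then
  clickhouse-client --host 127.0.0.1 --port 9000 --user default --password 'abc123456' \
    --query "SELECT shard_num, replica_num, host_name, port FROM system.clusters WHERE cluster = 'wxpt_cluster'"
else
  echo "clickhouse-client not installed"
fi
```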


To this day, you are still my light.