
# Deploying a Databend Cluster on Kubernetes

This topic explains how to install and configure a Databend cluster on Kubernetes.

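## Before You Begin

* Make sure the `helm` command is installed. See the Helm documentation for installation instructions.

* Make sure you have a running Kubernetes cluster. Lightweight Kubernetes engines can also be used for local testing.

* Create a cloud object storage bucket and obtain its credentials, that is, `access_key_id` and `secret_access_key`. Supported services include:

  * AWS S3 or other S3-compatible storage services
  * Azure Storage Blob
  * Other storage services supported by OpenDAL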
:::tip For advanced users
Databend also supports authentication methods that do not require access keys.
:::

* Make sure your Kubernetes cluster has a default storage class.

:::tip Cloud platforms
On AWS, the Amazon Elastic Block Store (EBS) CSI driver is recommended. When adding storage classes, remember to set the annotation that marks the default class, for example:

```yaml
storageClasses:
  - name: gp3
    annotations:
      storageclass.kubernetes.io/is-default-class: "true"
    allowVolumeExpansion: true
    volumeBindingMode: WaitForFirstConsumer
    reclaimPolicy: Delete
    parameters:
      type: gp3
```

```shell
❯ kubectl get sc
NAME            PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
gp2             kubernetes.io/aws-ebs   Delete          WaitForFirstConsumer   true                   16d
gp3 (default)   ebs.csi.aws.com         Delete          WaitForFirstConsumer   true                   15d
```
:::
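In the output above, `kubectl get sc` marks the default class by printing `(default)` after its name. That makes the check easy to script. The sketch below is a hypothetical helper: `default_storage_class` is not part of kubectl or any chart. It assumes kubectl's default table output; structured output such as `-o json` would be more robust.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl get sc` table output on stdin and print
# the name of each storage class flagged "(default)". Exits non-zero if none is.
default_storage_class() {
  awk '
    NR > 1 && $2 == "(default)" { print $1; found = 1 }  # skip header, match flag
    END { exit !found }                                   # status 1 when nothing matched
  '
}
```

For example: `kubectl get sc | default_storage_class || echo "no default storage class"`.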


* **[Recommended]** To monitor the status of the Databend Meta and Databend Query nodes, make sure the Prometheus Operator is running in your Kubernetes cluster.
:::tip Kube Prometheus Stack installation steps
1. Add the chart repository for kube-prometheus-stack:

```shell
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update prometheus-community
```

2. Prepare a values file:

```yaml title="values.yaml"
grafana:
  grafana.ini:
    auth.anonymous:
      enabled: true
      org_role: Admin
prometheus:
  prometheusSpec:
    ruleNamespaceSelector: {}
    ruleSelectorNilUsesHelmValues: false
    serviceMonitorNamespaceSelector: {}
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorNamespaceSelector: {}
    podMonitorSelectorNilUsesHelmValues: false
```

3. Install [Kube Prometheus Stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack) with helm:

```shell
helm upgrade --install monitoring \
prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--create-namespace \
--values values.yaml
```

4. Verify that Prometheus and Grafana are running:

```shell
❯ kubectl -n monitoring get pods
NAME                                                     READY   STATUS    RESTARTS      AGE
monitoring-prometheus-node-exporter-7km6w                1/1     Running   0             19m
monitoring-kube-prometheus-operator-876c99fb8-qjnpd      1/1     Running   0             19m
monitoring-kube-state-metrics-7c9f7fc49b-4884t           1/1     Running   0             19m
alertmanager-monitoring-kube-prometheus-alertmanager-0   2/2     Running   1 (18m ago)   18m
monitoring-grafana-654b4bb58c-sf9wp                      3/3     Running   0             19m
prometheus-monitoring-kube-prometheus-prometheus-0       2/2     Running   0             18m
```


:::
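When you script the installation, checking that the pods are up comes down to reading the `READY` and `STATUS` columns of `kubectl get pods`. The following is a minimal sketch under that assumption. `all_pods_ready` is a hypothetical helper, and it only inspects the text output shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl get pods` table output on stdin and
# succeed only if there is at least one pod and every pod is Running with
# all of its containers ready (READY column "n/n").
all_pods_ready() {
  awk '
    NR == 1 || NF == 0 { next }            # skip the header and blank lines
    {
      pods++
      split($2, r, "/")                    # READY, e.g. "2/2" -> r[1]=2, r[2]=2
      if ($3 != "Running" || r[1] != r[2]) {
        print "not ready: " $1
        bad = 1
      }
    }
    END { exit (pods == 0 || bad) }        # fail on no pods or any unready pod
  '
}
```

For example: `kubectl -n monitoring get pods | all_pods_ready && echo "monitoring is up"`. The same check applies to the `databend-meta` and `databend-query` namespaces used below.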


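## Deploy a Databend Cluster

### Step 1. Deploy a Databend Meta Cluster

For high availability, it is **strongly recommended** that each cluster have at least 3 nodes.

1. Create a values file:

For all parameters and their default values, see the [documentation](https://github.com/datafuselabs/helm-charts/blob/main/charts/databend-meta/values.yaml).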

```yaml title="values.yaml"
replicaCount: 3
persistence:
  size: 20Gi
serviceMonitor:
  enabled: true
```

2. Deploy the Meta cluster in the `databend-meta` namespace:

```shell
helm repo add databend https://charts.databend.rs
helm repo update databend

helm upgrade --install databend-meta databend/databend-meta \
--namespace databend-meta --create-namespace \
--values values.yaml
```

3. Verify that the Meta service is running:

```shell
❯ kubectl -n databend-meta get pods
NAME              READY   STATUS    RESTARTS        AGE
databend-meta-0   1/1     Running   0               5m36s
databend-meta-1   1/1     Running   1 (4m38s ago)   4m53s
databend-meta-2   1/1     Running   1 (4m2s ago)    4m18s

❯ kubectl -n databend-meta get pvc
NAME                   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-databend-meta-0   Bound    pvc-578ec207-bf7e-4bac-a9a1-3f0e4b140b8d   20Gi       RWO            local-path     5m45s
data-databend-meta-1   Bound    pvc-693a0350-6b87-491d-8575-90bf62179b59   20Gi       RWO            local-path     5m2s
data-databend-meta-2   Bound    pvc-08bd4ceb-15c2-47f3-a637-c1cc10441874   20Gi       RWO            local-path     4m27s
```
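The query cluster in the next step needs the address of every Meta node. These addresses follow a fixed pattern, `<pod>.<service>.<namespace>.svc:<port>`, so they can be generated from the release name, namespace and replica count used above. The sketch below assumes the service is named after the release and listens on port 9191, as in the endpoint examples of this guide. `meta_endpoints` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the YAML list entries for the query chart's
# config.meta.endpoints, one per Meta pod. Assumes pods named <release>-<i>
# behind a service named <release> in <namespace>, on port 9191 by default.
meta_endpoints() {
  local release=$1 namespace=$2 replicas=$3 port=${4:-9191}
  local i
  for ((i = 0; i < replicas; i++)); do
    printf -- '- "%s-%d.%s.%s.svc:%s"\n' "$release" "$i" "$release" "$namespace" "$port"
  done
}
```

For example, `meta_endpoints databend-meta databend-meta 3` prints the three endpoint entries used in the query values file below.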


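### Step 2. Deploy a Databend Query Cluster

1. Create a values file that defines a built-in user `databend:databend` and a cluster named `example_cluster` with 3 nodes.

For all parameters and their default values, see the [documentation](https://github.com/datafuselabs/helm-charts/blob/main/charts/databend-query/values.yaml).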

```yaml title="values.yaml"
replicaCount: 3
config:
  query:
    clusterId: example_cluster
    # add builtin user
    users:
      - name: databend
        # available types: sha256_password, double_sha1_password, no_password, jwt
        authType: double_sha1_password
        # echo -n "databend" | sha1sum | cut -d' ' -f1 | xxd -r -p | sha1sum
        authString: 3081f32caef285c232d066033c89a78d88a6d8a5
  meta:
    # Set endpoints to use the remote meta service;
    # they depend on the previously deployed meta service, namespace and nodes
    endpoints:
      - "databend-meta-0.databend-meta.databend-meta.svc:9191"
      - "databend-meta-1.databend-meta.databend-meta.svc:9191"
      - "databend-meta-2.databend-meta.databend-meta.svc:9191"
  storage:
    # s3, oss
    type: s3
    s3:
      bucket: "<bucket>"
      region: "<region>"
      access_key_id: "<key>"
      secret_access_key: "<secret>"
      root: ""
# [recommended] enable monitoring service
serviceMonitor:
  enabled: true
# [recommended] enable access from outside cluster
service:
  type: LoadBalancer
```
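The comment in the values file computes `authString` for `double_sha1_password`. The value is the hex SHA-1 of the raw bytes of the password's SHA-1. The sketch below reproduces that pipeline without `xxd`: bash's `printf` turns the first hex digest back into raw bytes. `double_sha1_hex` is a hypothetical helper name, and the sketch assumes bash plus the coreutils `sha1sum` command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compute the authString for a double_sha1_password user,
# equivalent to: echo -n "$1" | sha1sum | cut -d' ' -f1 | xxd -r -p | sha1sum
double_sha1_hex() {
  local first escaped
  # First pass: hex SHA-1 of the plain-text password.
  first=$(printf '%s' "$1" | sha1sum | cut -d' ' -f1)
  # Rewrite "ab12..." as "\xab\x12..." so bash printf emits the raw 20 bytes.
  escaped=$(printf '%s' "$first" | sed 's/../\\x&/g')
  # Second pass: hex SHA-1 of those raw bytes.
  printf "$escaped" | sha1sum | cut -d' ' -f1
}
```

For the built-in user above, `double_sha1_hex databend` should print the same `authString` as in the values file.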

````mdx-code-block

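:::caution LoadBalancer
When the service type is set to `LoadBalancer`, almost all cloud platforms assign a public IP address to the query service, which may cause security issues.

In that case, create an internal load balancer on your cloud platform by adding annotations to the service.

The annotations differ between cloud platforms: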


<Tabs>
<TabItem value="aws" label="AWS">

Installing the [AWS Load Balancer Controller](https://github.com/kubernetes-sigs/aws-load-balancer-controller) is recommended.

```yaml
service:
  type: LoadBalancer
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: external
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
    service.beta.kubernetes.io/aws-load-balancer-scheme: internal
```

</TabItem>

<TabItem value="aliyun" label="Alibaba Cloud">

```yaml
service:
  type: LoadBalancer
  annotations:
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-address-type: "intranet"
```

</TabItem>
</Tabs>

:::
````

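:::tip Cloud storage
To set the S3 endpoint explicitly, add `endpoint_url` to the storage configuration. `s3.amazonaws.com` is the default endpoint:

```yaml
config:
  storage:
    type: s3
    s3:
      # default endpoint
      endpoint_url: "s3.amazonaws.com"
      bucket: "<bucket>"
      region: "<region>"
      access_key_id: "<key>"
      secret_access_key: "<secret>"
      root: ""
```
:::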
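Before you run `helm upgrade --install`, it helps to confirm that no placeholder values are left in `values.yaml`. The sketch below assumes the placeholders are exactly the quoted `"<bucket>"`, `"<region>"`, `"<key>"` and `"<secret>"` strings used in this guide. `check_placeholders` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: fail if a values file still contains any of
# the "<bucket>", "<region>", "<key>" or "<secret>" placeholders from this guide.
check_placeholders() {
  local file=$1
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 2; }
  # grep -n prints each offending line with its line number.
  if grep -nE '"<(bucket|region|key|secret)>"' "$file"; then
    echo "$file still contains placeholder values; replace them first" >&2
    return 1
  fi
}
```

For example: `check_placeholders values.yaml && helm upgrade --install tenant1 databend/databend-query --namespace databend-query --create-namespace --values values.yaml`.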
2. Deploy the query cluster for `tenant1` in the `databend-query` namespace:

```shell
helm repo add databend https://charts.databend.rs
helm repo update databend

helm upgrade --install tenant1 databend/databend-query \
--namespace databend-query --create-namespace \
--values values.yaml
```
3. Verify that the query service is running:

```shell
❯ kubectl -n databend-query get pods
NAME                                     READY   STATUS    RESTARTS   AGE
tenant1-databend-query-66647594c-lkkm9   1/1     Running   0          36s
tenant1-databend-query-66647594c-lpl2s   1/1     Running   0          36s
tenant1-databend-query-66647594c-4hlpw   1/1     Running   0          36s

❯ kubectl -n databend-query get svc
NAME                     TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)                                                                                     AGE
tenant1-databend-query   LoadBalancer   10.43.84.243   172.20.0.2    8080:32063/TCP,9000:31196/TCP,9090:30472/TCP,8000:30050/TCP,7070:31253/TCP,3307:31367/TCP   17m
```
4. Access the query cluster. Here we use the built-in user `databend`.

* Access from inside the cluster:

  ```shell
  mysql -htenant1-databend-query.databend-query.svc -udatabend -P3307 -pdatabend
  ```

* Access from outside the cluster through the load balancer:

  ```shell
  # the address here is the `EXTERNAL-IP` of the service tenant1-databend-query shown above
  mysql -h172.20.0.2 -udatabend -P3307 -pdatabend
  ```

* Local access with kubectl port forwarding:

  ```shell
  nohup kubectl port-forward -n databend-query svc/tenant1-databend-query 3307:3307 &
  mysql -h127.0.0.1 -udatabend -P3307 -pdatabend
  ```
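The external-access command takes its address from the `EXTERNAL-IP` column of `kubectl get svc`. The sketch below reads that column for one service from the table output. It assumes the default column order shown above. `svc_external_ip` is a hypothetical helper. While the load balancer is still being provisioned, kubectl shows `<pending>`, and the helper treats that as a failure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl get svc` table output on stdin and print
# the EXTERNAL-IP (4th column) of the named service. Fails if the service is
# missing or its address is still <pending> or <none>.
svc_external_ip() {
  awk -v svc="$1" '
    NR > 1 && $1 == svc { ip = $4 }
    END {
      if (ip == "" || ip == "<pending>" || ip == "<none>") exit 1
      print ip
    }'
}
```

For example: `mysql -h"$(kubectl -n databend-query get svc | svc_external_ip tenant1-databend-query)" -udatabend -P3307 -pdatabend`.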
5. Deploy a second cluster for `tenant2`.

Modify `values.yaml` for `tenant2`, then install it as a separate release:

```shell
# optional
helm repo update databend

helm upgrade --install tenant2 databend/databend-query \
--namespace databend-query --create-namespace \
--values values.yaml
```
Verify that the query service for `tenant2` is running:

```shell
❯ kubectl -n databend-query get pods
NAME                                      READY   STATUS    RESTARTS   AGE
tenant1-databend-query-66647594c-lkkm9    1/1     Running   0          55m
tenant1-databend-query-66647594c-lpl2s    1/1     Running   0          55m
tenant1-databend-query-66647594c-4hlpw    1/1     Running   0          55m
tenant2-databend-query-59dcc4949f-9qg9b   1/1     Running   0          53s
tenant2-databend-query-59dcc4949f-pfxxj   1/1     Running   0          53s
tenant2-databend-query-59dcc4949f-mmwr9   1/1     Running   0          53s
```
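Both tenants share one namespace, so a per-tenant check has to group pods by release name. The sketch below counts the Running pods of one release in `kubectl get pods` output. It assumes pod names follow the `<release>-databend-query-<hash>` pattern shown above. `count_release_pods` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count Running pods of one query release in the
# `kubectl get pods` output read from stdin, matching on the name prefix
# "<release>-databend-query-".
count_release_pods() {
  awk -v prefix="$1-databend-query-" '
    NR > 1 && index($1, prefix) == 1 && $3 == "Running" { n++ }
    END { print n + 0 }                    # print 0 rather than an empty line
  '
}
```

For example, `kubectl -n databend-query get pods | count_release_pods tenant2` should print `3` once the second cluster is up.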

## Maintain Databend Query Clusters

### Scale the Cluster

There are two ways to scale a query cluster up or down:

* Use kubectl directly:

  ```shell
  # scale the query cluster down to 0 nodes
  kubectl -n databend-query scale deployment tenant1-databend-query --replicas=0

  # scale the query cluster to 5 nodes
  kubectl -n databend-query scale deployment tenant1-databend-query --replicas=5
  ```
* Update `replicaCount` in `values.yaml`, then run `helm upgrade`:

  ```diff title="values.yaml"
  -replicaCount: 3
  +replicaCount: 5
  ```

  ```shell
  helm upgrade --install tenant1 databend/databend-query \
  --namespace databend-query --create-namespace \
  --values values.yaml
  ```
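The `values.yaml` route can be scripted by rewriting the top-level `replicaCount` line before `helm upgrade`. The sketch below assumes the key sits at column 0 on a single line, as in the values file of this guide. `set_replica_count` is a hypothetical helper. It avoids `sed -i` because that option behaves differently in GNU and BSD sed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set the top-level `replicaCount:` of a Helm values file.
# Assumes the key is at column 0 on one line, as in this guide's values.yaml.
set_replica_count() {
  local file=$1 count=$2 tmp rc
  case $count in
    ''|*[!0-9]*) echo "replica count must be a non-negative integer" >&2; return 1 ;;
  esac
  grep -q '^replicaCount:' "$file" || { echo "no replicaCount in $file" >&2; return 1; }
  tmp=$(mktemp) || return 1
  # Write to a temp file, then copy back so the original file keeps its permissions.
  sed "s/^replicaCount:.*/replicaCount: ${count}/" "$file" > "$tmp" && cat "$tmp" > "$file"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

For example: `set_replica_count values.yaml 5`, followed by the same `helm upgrade` command.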

### Upgrade

To upgrade a query cluster, set the image tag in `values.yaml`:

```diff title="values.yaml"
 replicaCount: 3
+image:
+  tag: "v0.8.123-nightly"
 config:
   query:
     clusterId: example_cluster
```

Then run `helm upgrade` again:

```shell
# optional
helm repo update databend

helm upgrade --install tenant1 databend/databend-query \
--namespace databend-query --create-namespace \
--values values.yaml
```

### Check Cluster Information

```sql
MySQL [(none)]> select * from system.clusters;
+------------------------+------------+------+------------------------------------------------------------------------------+
| name                   | host       | port | version                                                                      |
+------------------------+------------+------+------------------------------------------------------------------------------+
| TJoPIFqvwU6l6IuZzwVmj  | 10.42.0.29 | 9090 | v0.8.122-nightly-5d3a308(rust-1.67.0-nightly-2022-11-20T16:27:23.284298522Z) |
| e7leCg352OPa7bIBTi3ZK  | 10.42.0.30 | 9090 | v0.8.122-nightly-5d3a308(rust-1.67.0-nightly-2022-11-20T16:27:23.284298522Z) |
| uGD38DVaWDAnJV5jupK4p4 | 10.42.0.28 | 9090 | v0.8.122-nightly-5d3a308(rust-1.67.0-nightly-2022-11-20T16:27:23.284298522Z) |
+------------------------+------------+------+------------------------------------------------------------------------------+
3 rows in set (0.009 sec)
```
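After an upgrade, the `version` column of `system.clusters` shows whether every node is running the new build. The sketch below prints each distinct version, without the `(rust-...)` build details, from the table printed by the MySQL client. It assumes the four-column layout shown above. `cluster_versions` is a hypothetical helper. A single output line means all nodes report the same version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the system.clusters table (name | host | port |
# version) on stdin and print each distinct version once, dropping the
# "(rust-...)" compiler details.
cluster_versions() {
  awk -F'|' 'NF >= 5 && $5 ~ /v[0-9]/ {
      v = $5
      sub(/^[ \t]+/, "", v)                # trim leading spaces
      sub(/\(.*$/, "", v)                  # drop "(rust-...)" build details
      sub(/[ \t]+$/, "", v)                # trim trailing spaces
      if (!(v in seen)) { seen[v] = 1; print v }
    }'
}
```

For example: `mysql -h127.0.0.1 -udatabend -P3307 -pdatabend -t -e 'select * from system.clusters' | cluster_versions`.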

### Verify Distributed Queries

```sql
MySQL [(none)]> EXPLAIN SELECT max(number), sum(number) FROM numbers_mt(10000000000) GROUP BY number % 3, number % 4, number % 5 LIMIT 10;
+-------------------------------------------------------------------------------------------------------------------------------------+
| explain                                                                                                                             |
+-------------------------------------------------------------------------------------------------------------------------------------+
| Limit                                                                                                                               |
| ├── limit: 10                                                                                                                       |
| ├── offset: 0                                                                                                                       |
| └── Exchange                                                                                                                        |
|     ├── exchange type: Merge                                                                                                        |
|     └── EvalScalar                                                                                                                  |
|         ├── expressions: [max(number) (#6), sum(number) (#7)]                                                                       |
|         └── AggregateFinal                                                                                                          |
|             ├── group by: [number % 3, number % 4, number % 5]                                                                      |
|             ├── aggregate functions: [max(number), sum(number)]                                                                     |
|             └── Exchange                                                                                                            |
|                 ├── exchange type: Hash(_group_by_key)                                                                              |
|                 └── AggregatePartial                                                                                                |
|                     ├── group by: [number % 3, number % 4, number % 5]                                                              |
|                     ├── aggregate functions: [max(number), sum(number)]                                                             |
|                     └── EvalScalar                                                                                                  |
|                         ├── expressions: [%(numbers_mt.number (#0), 3), %(numbers_mt.number (#0), 4), %(numbers_mt.number (#0), 5)] |
|                         └── TableScan                                                                                               |
|                             ├── table: default.system.numbers_mt                                                                    |
|                             ├── read rows: 10000000000                                                                              |
|                             ├── read bytes: 80000000000                                                                             |
|                             ├── partitions total: 152588                                                                            |
|                             ├── partitions scanned: 152588                                                                          |
|                             └── push downs: [filters: [], limit: NONE]                                                              |
+-------------------------------------------------------------------------------------------------------------------------------------+
24 rows in set (0.008 sec)
```

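The distributed query succeeds. The plan contains `Exchange` operators, and the cluster nodes transfer data between each other efficiently through `flight_api_address`.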
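A script can check a plan for distribution by looking for these `Exchange` operators. The sketch below lists the exchange types found in `EXPLAIN` output. It assumes, as in the plan above, that a distributed plan contains `exchange type:` lines. `plan_exchanges` is a hypothetical helper. An empty result suggests the plan was not distributed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the exchange type of every Exchange operator in
# EXPLAIN output read from stdin (e.g. "Merge", "Hash(_group_by_key)").
plan_exchanges() {
  sed -n 's/.*exchange type: \([^ |]*\).*/\1/p'
}
```

For example, piping the `EXPLAIN` output above (obtained with `mysql ... -t -e 'EXPLAIN SELECT ...'`) through `plan_exchanges` should list `Merge` and `Hash(_group_by_key)`.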

### Load Data into the Cluster

```sql
CREATE TABLE t1(i INT, j INT);
INSERT INTO t1 SELECT number, number + 300 FROM numbers(10000000);
SELECT count(*) FROM t1;
+----------+
| count()  |
+----------+
| 10000000 |
+----------+
```

## Monitor the Meta and Query Clusters

:::info
`serviceMonitor` should be enabled when deploying the Meta and query clusters.
:::

  • datafuselabs/helm-charts下载 grafana dashboard 文件。

  • 为您的集群打开Grafana Web。

  • 在左侧边栏选择 + Import 并上传已下载的两个 JSON 文件。

  • 然后您应该看到两个控制面板:

    • Databend Meta Runtime
    • Databend Query Runtime