跳转到主要内容

使用 Databend 分析 Hits 数据集

这个用例讲解如何使用 Databend 分析 Hits 数据集。

第1步: 部署 Databend

请确保您已经成功安装了 Databend。安装指导:

第2步: 载入 Hits 数据集

2.1 创建用户

使用 MySQL 客户端连接到 Databend 服务器:

mysql -h127.0.0.1 -uroot -P3307 

创建用户:

CREATE USER user1 IDENTIFIED BY 'abc123';

授予用户权限:

GRANT ALL ON *.* TO user1;

参考如何创建用户

2.2 创建表

创建 SQL

2.3 加载数据到 Hits 表

hits_1m.tsv.gz
wget --no-check-certificate https://repo.databend.rs/hits/hits_1m.tsv.gz
## If you want to load full version of hits dataset, please download from clickhouse's dataset:
## wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz'
gzip -d hits_1m.csv.gz
Load CSV files into Databend
curl -H "insert_sql:insert into hits file_format = (type = 'TSV')"  -F  "upload=@./hits_1m.tsv"  -XPUT http://user1:abc123@127.0.0.1:8000/v1/streaming_load

第3步: 查询

执行查询:

mysql -h127.0.0.1 -P3307 -uroot
SELECT SUM(AdvEngineID), COUNT(*), AVG(ResolutionWidth) FROM hits;

示例查询:

编号查询
Q1SELECT COUNT(*) FROM hits;
Q2SELECT COUNT(*) FROM hits WHERE AdvEngineID <> 0;
Q3SELECT SUM(AdvEngineID), COUNT(*), AVG(ResolutionWidth) FROM hits;
Q4SELECT AVG(UserID) FROM hits;
Q5SELECT COUNT(DISTINCT UserID) FROM hits;
Q6SELECT COUNT(DISTINCT SearchPhrase) FROM hits;
Q7SELECT MIN(EventDate), MAX(EventDate) FROM hits;
Q8SELECT AdvEngineID, COUNT(*) FROM hits WHERE AdvEngineID <> 0 GROUP BY AdvEngineID ORDER BY COUNT(*) DESC;
Q9SELECT RegionID, COUNT(DISTINCT UserID) AS u FROM hits GROUP BY RegionID ORDER BY u DESC LIMIT 10;
Q10SELECT RegionID, SUM(AdvEngineID), COUNT(*) AS c, AVG(ResolutionWidth), COUNT(DISTINCT UserID) FROM hits GROUP BY RegionID ORDER BY c DESC LIMIT 10;