使用 Databend 分析 Hits 数据集
这个用例讲解如何使用 Databend 分析 Hits 数据集。
第1步: 部署 Databend
请确保您已经成功安装了 Databend。安装指导:
第2步: 载入 Hits 数据集
2.1 创建用户
使用 MySQL 客户端连接到 Databend 服务器:
mysql -h127.0.0.1 -uroot -P3307
创建用户:
CREATE USER user1 IDENTIFIED BY 'abc123';
授予用户权限:
GRANT ALL ON *.* TO user1;
参考如何创建用户。
2.2 创建表
2.3 加载数据到 Hits 表
hits_1m.tsv.gz
wget --no-check-certificate https://repo.databend.rs/hits/hits_1m.tsv.gz
## If you want to load full version of hits dataset, please download from clickhouse's dataset:
## wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz'
gzip -d hits_1m.csv.gz
Load CSV files into Databend
curl -H "insert_sql:insert into hits file_format = (type = 'TSV')" -F "upload=@./hits_1m.tsv" -XPUT http://user1:abc123@127.0.0.1:8000/v1/streaming_load
第3步: 查询
执行查询:
mysql -h127.0.0.1 -P3307 -uroot
SELECT SUM(AdvEngineID), COUNT(*), AVG(ResolutionWidth) FROM hits;
示例查询:
编号 | 查询 |
---|---|
Q1 | SELECT COUNT(*) FROM hits; |
Q2 | SELECT COUNT(*) FROM hits WHERE AdvEngineID <> 0; |
Q3 | SELECT SUM(AdvEngineID), COUNT(*), AVG(ResolutionWidth) FROM hits; |
Q4 | SELECT AVG(UserID) FROM hits; |
Q5 | SELECT COUNT(DISTINCT UserID) FROM hits; |
Q6 | SELECT COUNT(DISTINCT SearchPhrase) FROM hits; |
Q7 | SELECT MIN(EventDate), MAX(EventDate) FROM hits; |
Q8 | SELECT AdvEngineID, COUNT(*) FROM hits WHERE AdvEngineID <> 0 GROUP BY AdvEngineID ORDER BY COUNT(*) DESC; |
Q9 | SELECT RegionID, COUNT(DISTINCT UserID) AS u FROM hits GROUP BY RegionID ORDER BY u DESC LIMIT 10; |
Q10 | SELECT RegionID, SUM(AdvEngineID), COUNT(*) AS c, AVG(ResolutionWidth), COUNT(DISTINCT UserID) FROM hits GROUP BY RegionID ORDER BY c DESC LIMIT 10; |