教程 - 从内部 Stage 加载
在本教程中,您将创建一个内部 Stage,上传一个示例文件,然后使用 COPY INTO 命令将文件中的数据加载到 Databend 中。
第1步: 创建 Stage
使用 CREATE STAGE 命令创建一个内部 Stage。
mysql -h127.0.0.1 -uroot -P3307
CREATE STAGE my_int_stage;
DESC STAGE my_int_stage;
+--------------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------+--------------------------------------------------------------------------------------------------------------------+---------+
| name | stage_type | stage_params | copy_options | file_format_options | comment |
+--------------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------+--------------------------------------------------------------------------------------------------------------------+---------+
| my_int_stage | Internal | StageParams { storage: S3(StageS3Storage { bucket: "", path: "", credentials_aws_key_id: "", credentials_aws_secret_key: "", encryption_master_key: "" }) } | CopyOptions { on_error: None, size_limit: 0 } | FileFormatOptions { format: Csv, skip_header: 0, field_delimiter: ",", record_delimiter: "\n", compression: None } | |
+--------------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------+--------------------------------------------------------------------------------------------------------------------+---------+
第2步: 上传文件到 Stage
下载示例数据文件(选择 CSV 或 Parquet),该文件包含两条记录:
Transaction Processing,Jim Gray,1992
Readings in Database Systems,Michael Stonebraker,2004
- CSV
- Parquet
下载 books.csv
- CSV
- Parquet
上传 books.csv
到 Stage
Request /v1/upload_to_stage
curl -H "stage_name:my_int_stage"\
-F "upload=@./books.csv"\
-XPUT http://root:@localhost:8000/v1/upload_to_stage
Response
{"id":"50880048-f397-4d32-994c-ce3d38af430f","stage_name":"my_int_stage","state":"SUCCESS","files":["books.csv"]}
tip
http://127.0.0.1:8000/v1/upload_to_stage
127.0.0.1
:databend-query.toml 中参数http_handler_host
的值8000
:databend-query.toml 中参数http_handler_port
的值
-F \"upload=@./books.csv\"
- 文件 books.csv 的存储位置
上传 books.parquet
到 Stage:
Request /v1/upload_to_stage
curl -H "stage_name:my_int_stage"\
-F "upload=@./books.parquet"\
-XPUT http://root:@localhost:8000/v1/upload_to_stage
Response
{"id":"50880048-f397-4d32-994c-ce3d38af430f","stage_name":"my_int_stage","state":"SUCCESS","files":["books.parquet"]}
tip
http://127.0.0.1:8000/v1/upload_to_stage
127.0.0.1
:databend-query.toml 中参数http_handler_host
的值8000
:databend-query.toml 中参数http_handler_port
的值
-F \"upload=@./books.csv\"
- 文件 books.csv 的存储位置
第3步: 列出 Stage 上文件(可选)
mysql -h127.0.0.1 -uroot -P3307
LIST @my_int_stage;
+---------------+------+------+-------------------------------+--------------------+
| name | size | md5 | last_modified | creator |
+---------------+------+------+-------------------------------+--------------------+
| books.csv | 91 | NULL | 2022-06-10 12:01:40.000 +0000 | 'root'@'127.0.0.1' |
| books.parquet | 91 | NULL | 2022-06-10 12:01:40.000 +0000 | 'root'@'127.0.0.1' |
+---------------+------+------+-------------------------------+--------------------+
第4步: 创建数据库和表
CREATE DATABASE book_db;
USE book_db;
CREATE TABLE books
(
title VARCHAR,
author VARCHAR,
date VARCHAR
);
第5步: 复制数据到目标表格
使用 COPY INTO 命令把 Stage 文件加载到目标表。
- CSV
- Parquet
COPY INTO books FROM @my_int_stage files=('books.csv') file_format = (type = 'CSV' field_delimiter = ',' record_delimiter = '\n' skip_header = 0);
tip
files = ( 'file_name' [ , 'file_name' ... ] )
指定要加载的一个或多个文件名(以逗号分隔)。
- file_format
参数 | 描述 | Required |
---|---|---|
record_delimiter | 文件中的记录分割符 默认 '\n' | Optional |
field_delimiter | 文件中的字段分隔符 默认 ',' | 可选 |
skip_header | 要跳过的文件开头的行数 默认 0 | 可选 |
COPY INTO books FROM @my_int_stage files=('books.parquet') file_format = (type = 'Parquet');
第6步: 验证加载的数据
SELECT * FROM books;
+------------------------------+----------------------+-------+
| title | author | date |
+------------------------------+----------------------+-------+
| Transaction Processing | Jim Gray | 1992 |
| Readings in Database Systems | Michael Stonebraker | 2004 |
+------------------------------+----------------------+-------+