跳转到主要内容

教程 - 从内部 Stage 加载

image

在本教程中,您将创建一个内部 Stage,上传一个示例文件,然后使用 COPY INTO 命令将文件中的数据加载到 Databend 中。

第1步: 创建 Stage

使用 CREATE STAGE 命令创建一个内部 Stage。

mysql -h127.0.0.1 -uroot -P3307
CREATE STAGE my_int_stage;
DESC STAGE my_int_stage;
+--------------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------+--------------------------------------------------------------------------------------------------------------------+---------+
| name | stage_type | stage_params | copy_options | file_format_options | comment |
+--------------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------+--------------------------------------------------------------------------------------------------------------------+---------+
| my_int_stage | Internal | StageParams { storage: S3(StageS3Storage { bucket: "", path: "", credentials_aws_key_id: "", credentials_aws_secret_key: "", encryption_master_key: "" }) } | CopyOptions { on_error: None, size_limit: 0 } | FileFormatOptions { format: Csv, skip_header: 0, field_delimiter: ",", record_delimiter: "\n", compression: None } | |
+--------------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------+--------------------------------------------------------------------------------------------------------------------+---------+

第2步: 上传文件到 Stage

下载示例数据文件(选择 CSV 或 Parquet),该文件包含两条记录:

Transaction Processing,Jim Gray,1992
Readings in Database Systems,Michael Stonebraker,2004

上传 books.csv 到 Stage

Request /v1/upload_to_stage
curl -H "stage_name:my_int_stage"\
-F "upload=@./books.csv"\
-XPUT http://root:@localhost:8000/v1/upload_to_stage
Response
{"id":"50880048-f397-4d32-994c-ce3d38af430f","stage_name":"my_int_stage","state":"SUCCESS","files":["books.csv"]}
tip
  • http://127.0.0.1:8000/v1/upload_to_stage

    • 127.0.0.1databend-query.toml 中参数 http_handler_host 的值
    • 8000databend-query.toml 中参数 http_handler_port 的值
  • -F \"upload=@./books.csv\"

    • 文件 books.csv 的存储位置

第3步: 列出 Stage 上文件(可选)

mysql -h127.0.0.1 -uroot -P3307
LIST @my_int_stage;
+---------------+------+------+-------------------------------+--------------------+
| name | size | md5 | last_modified | creator |
+---------------+------+------+-------------------------------+--------------------+
| books.csv | 91 | NULL | 2022-06-10 12:01:40.000 +0000 | 'root'@'127.0.0.1' |
| books.parquet | 91 | NULL | 2022-06-10 12:01:40.000 +0000 | 'root'@'127.0.0.1' |
+---------------+------+------+-------------------------------+--------------------+

第4步: 创建数据库和表

CREATE DATABASE book_db;
USE book_db;
CREATE TABLE books
(
title VARCHAR,
author VARCHAR,
date VARCHAR
);

第5步: 复制数据到目标表格

使用 COPY INTO 命令把 Stage 文件加载到目标表。

COPY INTO books FROM @my_int_stage files=('books.csv') file_format = (type = 'CSV' field_delimiter = ','  record_delimiter = '\n' skip_header = 0);
tip
  • files = ( 'file_name' [ , 'file_name' ... ] )

    指定要加载的一个或多个文件名(以逗号分隔)。

  • file_format
参数描述Required
record_delimiter文件中的记录分割符 默认 '\n'Optional
field_delimiter文件中的字段分隔符 默认 ','可选
skip_header要跳过的文件开头的行数 默认 0可选

第6步: 验证加载的数据

SELECT * FROM books;
+------------------------------+----------------------+-------+
| title | author | date |
+------------------------------+----------------------+-------+
| Transaction Processing | Jim Gray | 1992 |
| Readings in Database Systems | Michael Stonebraker | 2004 |
+------------------------------+----------------------+-------+