Skip to main content

This Week in Databend #72

· 5 min read

Databend is a powerful cloud data warehouse. Built for elasticity and efficiency. Free and open. Also available in the cloud: https://app.databend.com .

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements ✨

Multiple Catalogs

  • extends show databases SQL (#9152)

Stage

  • support select from URI (#9247)

Streaming Load

  • support file_format syntax in streaming load insert sql (#9063)

Planner

  • push down limit to union (#9210)

Query

  • use analyze table instead of optimize table statistic (#9143)
  • fast parse insert values (#9214)

Storage

  • use distinct count calculated by the xor hash function (#9159)
  • read_parquet read meta before read data (#9154)
  • push down filter to parquet reader (#9199)
  • prune row groups before reading (#9228)

Open Sharing

  • add prototype open sharing and add sharing stateful tests (#9177)

Code Refactoring 🎉

*

  • simplify the global data registry logic (#9187)

Storage

  • refactor deletion (#8824)

Build/Testing/CI Infra Changes 🔌

  • release databend deb package and databend with hive (#9138, #9241, etc.)

Bug Fixes 🔧

Format

  • support ASCII control code hex as format field delimiter (#9160)

Planner

  • prewhere_column empty and predicate is not const will return empty (#9116)
  • don't push down topk to Merge when it's child is Aggregate (#9183)
  • fix nullable column validity not equal (#9220)

Query

  • address unit test hang on test_insert (#9242)

Storage

  • too many io requests for read blocks during compact (#9128)
  • collect orphan snapshots (#9108)

What's On In Databend

Stay connected with the latest news about Databend.

Breaking Change: Unified File Format Options

To simplify, we're rolling out a set of unified file format options as follows for the COPY INTO command, the Streaming Load API, and all the other cases where users need to describe their file formats:

[ FILE_FORMAT = ( TYPE = { CSV | TSV | NDJSON | PARQUET | XML} [ formatTypeOptions ] ) ]
  • Please note that the current format options starting with format_* will be deprecated.
  • ... FORMAT CSV ... will still be accepted by the ClickHouse handler.
  • Support for customized formats created by CREATE FILE FORMAT ... will be added in a future release: ... FILE_FORMAT = (format_name = 'MyCustomCSV') .... .

Learn More

Open Sharing

Open Sharing is a simple and secure data-sharing protocol designed for databend-query nodes running in a multi-cloud environment.

  • Simple & Free: Open Sharing is open-source and basically a RESTful API implementation.
  • Secure: Open Sharing verifies incoming requesters' identities and access permissions, and provides an audit log.
  • Multi-Cloud: Open Sharing supports a variety of public cloud platforms, including AWS, Azure, GCP, etc.

Learn More

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

We're about to run stage-related tests again using the Streaming Load API to move files to a stage instead of an AWS command like this:

aws --endpoint-url ${STORAGE_S3_ENDPOINT_URL} s3 cp s3://testbucket/admin/data/ontime_200.csv s3://testbucket/admin/stage/internal/s1/ontime_200.csv >/dev/null 2>&1

This is because Databend users do not need to take care of, or do not even know the stage paths that the AWS command requires.

Issue 8528: refactor stage related tests

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

ariesdevilb41shBohuTANGChasen-ZhangClSlaiddantengsky
ariesdevilb41shBohuTANGChasen-ZhangClSlaiddantengsky
drmingdrmerhantmaclichuangmergify[bot]PsiACERinChanNOWWW
drmingdrmerhantmaclichuangmergify[bot]PsiACERinChanNOWWW
soyeric128sundy-liwubxXuanwoxudong963youngsofun
soyeric128sundy-liwubxXuanwoxudong963youngsofun
ZhiHanZzhyasszzzdong
ZhiHanZzhyasszzzdong

Connect With Us

We'd love to hear from you. Feel free to run the code and see if Databend works for you. Submit an issue with your problem if you need help.

DatafuseLabs Community is open to everyone who loves data warehouses. Please join the community and share your thoughts.