Browsing Tag

DuckDB

How to read OSM data with DuckDB

Jessie Hobb Mar 2, 2024

A deep dive into OpenStreetMap data structure and how to utilize it in a scalable way

This article will provide an in-depth look at how to read OpenStreetMap data using the DuckDB database. The steps described in this guide will allow the reader to load the OSM data using the Monaco example divided into nodes, ways, and relations. The final result of OSM elements read using…

Using DuckDB with Polars. Learn how to use SQL to query your… | by Wei-Meng Lee | Apr, 2023

Jessie Hobb Apr 14, 2023

Learn how to use SQL to query your Polars DataFrames

In my previous few articles on data analytics, I talked about two important up-and-coming libraries that are currently gaining a lot of traction in the industry:

DuckDB: where you can query your dataset in-memory using SQL statements.
Polars: a much more efficient DataFrame library compared to the venerable Pandas library.

What about combining the power of these two libraries? In fact, you can directly query a Polars dataframe through…

Forget about SQLite, Use DuckDB Instead — And Thank Me Later | by Pol Marin | Mar, 2023

Jessie Hobb Mar 16, 2023

Introduction to DuckDB and its Python integration

We programmers tend to default to SQLite when we want to work on local environments with an embedded database. While that works fine most of the time, it's like using a bicycle to travel 100 km away: probably not the best option.

Introducing DuckDB.

I first learned about DuckDB in September 2022, while at PyCon Spain in Granada. Now, after 6 months of using it, I can't live without it. And I want to contribute to the community by…

Running SQL Queries in Jupyter Notebook using JupySQL, DuckDB, and MySQL | by Wei-Meng Lee | Feb, 2023

Jessie Hobb Feb 24, 2023

Learn how to run SQL in your Jupyter Notebooks

Traditionally, data scientists use Jupyter Notebook to pull data from database servers, or from external datasets (such as CSV, JSON files, etc.) and store them into Pandas dataframes. They then use the dataframes for visualization purposes. This approach has a couple of drawbacks:

Querying a database server may degrade the performance of the database server, which may not be optimized for analytical…

Boost Your Cloud Data Applications with DuckDB and Iceberg API | by Alon Agmon | Dec, 2022

Jessie Hobb Dec 24, 2022

Use Iceberg API with DuckDB to optimize analytics queries on massive Iceberg tables in your cloud storage

Apache Iceberg is mostly known for making it possible for popular query engines, such as Spark, Dremio and Trino, to reliably query and manipulate records in huge tables stored in data lakes, and to do so at scale while ensuring safe concurrent reads and writes. As such, it addresses some of the major concerns that characterize modern data-lake platforms, such as data integrity,…