Option B: DuckDB Parquet Export
Convert the .artifacts/snippets.db snapshot into a partitioned Parquet dataset for analysis with DuckDB.
Install the requirement if needed:
pip install -r requirements/lock.txt
Export
python tools/export_to_parquet.py
The script attaches .artifacts/snippets.db, installs the httpfs and azure extensions, and writes Parquet files under parquet/.
Each partition corresponds to a snippet.id value.
Query in DuckDB
INSTALL azure;
LOAD azure;
-- Configure credentials (replace with real values)
SET azure_storage_account='ACCOUNT_NAME';
SET azure_storage_access_key='ACCESS_KEY';
-- Read data from Azure Blob Storage
SELECT *
FROM read_parquet('azure://container/snippet/id=1/*.parquet');
For a local dataset:
SELECT * FROM read_parquet('parquet/snippet/id=1/*.parquet');