Explore how chunking and linearization strategies affect the efficiency of various read patterns
When working with n-dimensional datasets in cloud storage, we face a fundamental challenge: n-dimensional data must be linearized into one dimension for storage and transfer. This linearization creates a mismatch between how we conceptualize our data and how it's physically stored.
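To make the idea concrete, here is a minimal Python sketch of the two simplest linearization orders; the function names are illustrative and not part of this tool:

# Map a 2-D cell (row, col) in an (n_rows x n_cols) array to a 1-D offset.

def row_major_offset(row, col, n_rows, n_cols):
    # Consecutive columns within a row are adjacent in memory (C order).
    return row * n_cols + col

def column_major_offset(row, col, n_rows, n_cols):
    # Consecutive rows within a column are adjacent in memory (Fortran order).
    return col * n_rows + row

# Example: in a 4x4 array, cell (1, 2) lands at offset 6 in row-major
# order but at offset 9 in column-major order.
print(row_major_offset(1, 2, 4, 4))     # 6
print(column_major_offset(1, 2, 4, 4))  # 9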
Chunking divides large datasets into smaller blocks that serve as both the unit of compression and the minimum unit of reading. Especially in systems where data is read over a network, such as when cloud-native data is hosted in an object store like S3, chunk design critically impacts performance. Too small, and you may have to wait on the latency associated with the overhead of excessive read requests. Too large, and you may have to wait to download excessive amounts of unwanted data.
This tool demonstrates how different chunking and linearization strategies affect read efficiency. Watch how query patterns interact with chunk boundaries to create read amplification (the ratio of data actually read and transmitted to the data requested). Notice that smaller chunks decrease read amplification, but can increase request count. See how different linearization algorithms (row-major, column-major, Z-order, Hilbert) change how well spatial locality is preserved after linearization, and what effects that may have on read coalescing.
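As a rough sketch of the read-amplification arithmetic, assuming a 2-D array divided into uniform chunks, full-chunk reads, and a rectangular query (and ignoring partial edge chunks), something like the following captures the tradeoff; all names here are illustrative, not the tool's internals:

def read_amplification(query_rows, query_cols, chunk_rows, chunk_cols):
    # Every chunk that the query region touches must be read in full,
    # so cells read are counted chunk by chunk.
    (r0, r1), (c0, c1) = query_rows, query_cols        # inclusive cell ranges
    chunk_r0, chunk_r1 = r0 // chunk_rows, r1 // chunk_rows
    chunk_c0, chunk_c1 = c0 // chunk_cols, c1 // chunk_cols
    chunks_touched = (chunk_r1 - chunk_r0 + 1) * (chunk_c1 - chunk_c0 + 1)
    cells_read = chunks_touched * chunk_rows * chunk_cols
    cells_requested = (r1 - r0 + 1) * (c1 - c0 + 1)
    return cells_requested, cells_read, cells_read / cells_requested

# A 10x10 query that straddles chunk boundaries in a grid of 8x8 chunks
# touches four chunks and reads 256 cells to deliver 100:
print(read_amplification((4, 13), (4, 13), 8, 8))  # (100, 256, 2.56)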
To learn more about the complexities of chunking and how it has come to be such a concern, check out these blog posts: The Tyranny of the Chunk | An Origin Story
Green to Red Gradient: Shows the linearization order from first (green) to last (red) in memory. This helps you understand how different algorithms arrange data sequentially.
Blue Outlines: Indicate query regions and all cells that need to be read to satisfy the query. This includes both requested cells and extra cells read due to chunking.
White Overlays: Show currently hovered elements. Direct hovers have both overlay and outline, while cross-highlighted elements show only the overlay.
Array Cells: Shows how individual cells are organized and linearized within chunks. Each cell's color represents its position in the combined linearization order.
Array Chunks: Displays chunk-level organization where all cells in the same chunk share the same color. This shows how chunks are linearized relative to each other.
Interactive Cross-highlighting: Hovering in one view highlights corresponding elements in all other views, helping you trace relationships between logical and physical layouts.
Storage Linearization — Cells: Shows how data is actually stored in linear memory, with the same cell-level coloring as the logical array. Blue regions indicate cells that need to be read.
Storage Linearization — Chunks: Same linear layout but colored by chunks to show how chunking affects the distribution of read operations. Blue bars below show the byte ranges that need to be fetched.
Byte Range Indicators: Lines below each linear view show how many separate read operations are required; fewer ranges mean better I/O performance. A small range-coalescing sketch follows below.
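The range-coalescing idea behind these indicators can be sketched as merging byte ranges that touch (or nearly touch) into fewer requests; this helper is illustrative, not the tool's actual logic:

def coalesce_ranges(ranges, max_gap=0):
    # Merge byte ranges (start, end) that overlap or sit within max_gap
    # bytes of each other, so several chunks come back in one request.
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + max_gap:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(r) for r in merged]

# Three chunks stored back to back collapse into a single range read,
# while a distant fourth chunk still needs its own request.
print(coalesce_ranges([(0, 100), (100, 200), (200, 300), (1000, 1100)]))
# [(0, 300), (1000, 1100)]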
Requested Cells: The number of cells in your query region that you actually want to read.
Actual Cells Read: The total number of cells that must be read due to chunking, including both requested and extra cells.
Read Amplification: The ratio of cells actually read to cells requested, i.e., how much extra data you read due to chunking. Values > 1.0 indicate wasted bandwidth. Lower is better.
Read Efficiency: Percentage of useful data in each read operation. Higher percentages indicate better performance with less wasted I/O.
Chunks Touched: How many chunks intersect with your query region. Fewer chunks generally mean more efficient access patterns.
Range Reads: Number of separate read operations needed. Fewer range reads reduce I/O overhead and improve performance.
Coalescing Factor: Shows how much read coalescing improves I/O efficiency compared to worst-case (chunks touched ÷ range reads). Values > 1.0 indicate that multiple chunks are being read in fewer operations due to spatial locality. Note that this does not apply to formats that split chunks into individual files, like unsharded Zarr, where each chunk has to be read via a separate read request.
Storage Alignment: Overall measure of how well your query aligns with the storage layout, combining read amplification and coalescing in a weighted, normalized average in the range 0 to 1. Higher values indicate better alignment between your access pattern and chunking strategy. A sketch of how these metrics fit together appears below.
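Here is a minimal sketch of how these metrics relate, given counts like those displayed above; the storage-alignment weighting below is an illustrative assumption, since the exact weights are not specified here:

def summarize(cells_requested, cells_read, chunks_touched, range_reads):
    read_amplification = cells_read / cells_requested    # > 1.0 means wasted bandwidth
    read_efficiency = cells_requested / cells_read        # fraction of useful data per read
    coalescing_factor = chunks_touched / range_reads       # > 1.0 means reads were merged
    # Illustrative alignment score only: equal-weight average of efficiency
    # and normalized coalescing, clamped to the 0-1 range.
    alignment = 0.5 * read_efficiency + 0.5 * min(coalescing_factor / chunks_touched, 1.0)
    return {
        "read_amplification": read_amplification,
        "read_efficiency": read_efficiency,
        "coalescing_factor": coalescing_factor,
        "storage_alignment": alignment,
    }

# Example: 100 cells requested, 256 cells read from 4 chunks fetched in 2 range reads.
print(summarize(cells_requested=100, cells_read=256, chunks_touched=4, range_reads=2))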
Row-Major: Best for queries that access consecutive rows. Common in C/C++ and most programming languages.
Column-Major: Optimal for column-wise access patterns. Used in Fortran, R, and some scientific computing applications.
Z-Order (Morton): Space-filling curve that preserves spatial locality well. Good for 2D range queries and spatial databases; a minimal Morton-encoding sketch appears below.
Hilbert Curve: Space-filling curve with the best spatial locality preservation of the four. Excellent for 2D spatial queries but more complex to compute.
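For a concrete sense of how a space-filling curve interleaves coordinates, here is a minimal 2-D Z-order (Morton) encoder; it is a sketch, not the visualization's implementation:

def morton_encode(row, col, bits=16):
    # Interleave the bits of the two coordinates: column bits take the even
    # positions and row bits the odd positions of the result, so cells that
    # are close in 2-D tend to stay close in the 1-D order.
    code = 0
    for i in range(bits):
        code |= ((col >> i) & 1) << (2 * i)
        code |= ((row >> i) & 1) << (2 * i + 1)
    return code

# The first few cells of a 4x4 array in Z-order trace the characteristic "Z" shape.
cells = sorted(((r, c) for r in range(4) for c in range(4)),
               key=lambda rc: morton_encode(*rc))
print(cells[:6])  # [(0, 0), (0, 1), (1, 0), (1, 1), (0, 2), (0, 3)]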
Match Access Patterns: Choose linearization algorithms that align with how your application accesses data most frequently.
Chunk Size Balance: Larger chunks reduce metadata overhead and the number of reads, but increase read amplification. Smaller chunks decrease read amplification, but increase request count and add chunk-management overhead.
Query Shape Matters: Square queries often work best with space-filling curves, while rectangular queries favor row- or column-major ordering, depending on the rectangle's orientation.
Monitor Metrics: Use the metrics to compare different configurations and find the optimal balance for your specific use cases and data geometries.